Compare commits

18 Commits

Author SHA1 Message Date
170719414d release 2017.06.23 2017-06-23 02:13:21 +07:00
38dad4737f [ChangeLog] Actualize 2017-06-23 02:10:54 +07:00
ddbb4c5c3e [youtube] Adapt to new automatic captions rendition (closes #13467) 2017-06-23 02:00:19 +07:00
fa3ea7223a [hgtv.com:show] Relax video config regex and update test (closes #13279, closes #13461) 2017-06-23 00:42:42 +07:00
0f4a5a73e7 [drtuber] Fix formats extraction (fixes #12058) 2017-06-23 00:08:36 +07:00
18166bb8e8 [youporn] Fix upload date extraction 2017-06-22 00:47:02 +07:00
d4893e764b [youporn] Improve formats extraction 2017-06-22 00:40:15 +07:00
97b6e30113 [youporn] Fix title extraction (closes #13456) 2017-06-22 00:20:45 +07:00
9be9ec5980 [googledrive] Fix formats' sorting (closes #13443) 2017-06-20 22:58:33 +07:00
048b55804d [watchindianporn] Fix extraction (closes #13411) 2017-06-20 04:30:45 +07:00
6ce79d7ac0 [abcotvs] Fix test md5 2017-06-20 04:07:00 +07:00
1641ca402d [vimeo] Add fallback mp4 extension for original format 2017-06-20 01:27:59 +07:00
85cbcede5b [ruv] Improve, extract all formats and metadata (closes #13396) 2017-06-19 23:46:03 +07:00
Orn
a1de83e5f0 [ruv] Add extractor 2017-06-19 23:45:45 +07:00
fee00b3884 [viu] Fix extraction on older python 2.6 2017-06-19 22:57:37 +07:00
2d2132ac6e [adobepass] Fix extraction on older python 2.6 2017-06-19 22:54:53 +07:00
cc2ffe5afe [pandora.tv] Fix upload_date extraction (closes #12846) 2017-06-19 16:20:36 +08:00
560050669b [asiancrush] Add extractor (closes #13420) 2017-06-18 20:18:51 +07:00
18 changed files with 407 additions and 103 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.06.18*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.06.23*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.06.18** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.06.23**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.06.18 [debug] youtube-dl version 2017.06.23
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -1,3 +1,24 @@
version 2017.06.23
Core
* [adobepass] Fix extraction on older python 2.6
Extractors
* [youtube] Adapt to new automatic captions rendition (#13467)
* [hgtv.com:show] Relax video config regular expression (#13279, #13461)
* [drtuber] Fix formats extraction (#12058)
* [youporn] Fix upload date extraction
* [youporn] Improve formats extraction
* [youporn] Fix title extraction (#13456)
* [googledrive] Fix formats sorting (#13443)
* [watchindianporn] Fix extraction (#13411, #13415)
+ [vimeo] Add fallback mp4 extension for original format
+ [ruv] Add support for ruv.is (#13396)
* [viu] Fix extraction on older python 2.6
* [pandora.tv] Fix upload_date extraction (#12846)
+ [asiancrush] Add support for asiancrush.com (#13420)
version 2017.06.18 version 2017.06.18
Core Core

View File

@ -67,6 +67,8 @@
- **arte.tv:info** - **arte.tv:info**
- **arte.tv:magazine** - **arte.tv:magazine**
- **arte.tv:playlist** - **arte.tv:playlist**
- **AsianCrush**
- **AsianCrushPlaylist**
- **AtresPlayer** - **AtresPlayer**
- **ATTTechChannel** - **ATTTechChannel**
- **ATVAt** - **ATVAt**
@ -686,6 +688,7 @@
- **rutube:person**: Rutube person videos - **rutube:person**: Rutube person videos
- **RUTV**: RUTV.RU - **RUTV**: RUTV.RU
- **Ruutu** - **Ruutu**
- **Ruv**
- **safari**: safaribooksonline.com online video - **safari**: safaribooksonline.com online video
- **safari:api** - **safari:api**
- **safari:course**: safaribooksonline.com online courses - **safari:course**: safaribooksonline.com online courses

View File

@ -22,7 +22,7 @@ class ABCOTVSIE(InfoExtractor):
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers', 'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4', 'ext': 'mp4',
'title': 'East Bay museum celebrates vintage synthesizers', 'title': 'East Bay museum celebrates vintage synthesizers',
'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10', 'description': 'md5:24ed2bd527096ec2a5c67b9d5a9005f3',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1421123075, 'timestamp': 1421123075,
'upload_date': '20150113', 'upload_date': '20150113',

View File

@ -6,7 +6,10 @@ import time
import xml.etree.ElementTree as etree import xml.etree.ElementTree as etree
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import (
compat_kwargs,
compat_urlparse,
)
from ..utils import ( from ..utils import (
unescapeHTML, unescapeHTML,
urlencode_postdata, urlencode_postdata,
@ -1317,7 +1320,8 @@ class AdobePassIE(InfoExtractor):
headers = kwargs.get('headers', {}) headers = kwargs.get('headers', {})
headers.update(self.geo_verification_headers()) headers.update(self.geo_verification_headers())
kwargs['headers'] = headers kwargs['headers'] = headers
return super(AdobePassIE, self)._download_webpage_handle(*args, **kwargs) return super(AdobePassIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
@staticmethod @staticmethod
def _get_mvpd_resource(provider_id, title, guid, rating): def _get_mvpd_resource(provider_id, title, guid, rating):

View File

@ -0,0 +1,93 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
extract_attributes,
remove_end,
urlencode_postdata,
)
class AsianCrushIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/video/(?:[^/]+/)?0+(?P<id>\d+)v\b'
_TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
'md5': 'c3b740e48d0ba002a42c0b72857beae6',
'info_dict': {
'id': '1_y4tmjm5r',
'ext': 'mp4',
'title': 'Women Who Flirt',
'description': 'md5:3db14e9186197857e7063522cb89a805',
'timestamp': 1496936429,
'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com',
},
}, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://www.asiancrush.com/wp-admin/admin-ajax.php', video_id,
data=urlencode_postdata({
'postid': video_id,
'action': 'get_channel_kaltura_vars',
}))
entry_id = data['entry_id']
return self.url_result(
'kaltura:%s:%s' % (data['partner_id'], entry_id),
ie=KalturaIE.ie_key(), video_id=entry_id,
video_title=data.get('vid_label'))
class AsianCrushPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/series/0+(?P<id>\d+)s\b'
_TEST = {
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
'info_dict': {
'id': '12481',
'title': 'Scholar Who Walks the Night',
'description': 'md5:7addd7c5132a09fd4741152d96cce886',
},
'playlist_count': 20,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = []
for mobj in re.finditer(
r'<a[^>]+href=(["\'])(?P<url>%s.*?)\1[^>]*>' % AsianCrushIE._VALID_URL,
webpage):
attrs = extract_attributes(mobj.group(0))
if attrs.get('class') == 'clearfix':
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = remove_end(
self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False),
' | AsianCrush')
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'twitter:description', webpage, 'description', fatal=False)
return self.playlist_result(entries, playlist_id, title, description)

View File

@ -44,8 +44,23 @@ class DrTuberIE(InfoExtractor):
webpage = self._download_webpage( webpage = self._download_webpage(
'http://www.drtuber.com/video/%s' % video_id, display_id) 'http://www.drtuber.com/video/%s' % video_id, display_id)
video_url = self._html_search_regex( video_data = self._download_json(
r'<source src="([^"]+)"', webpage, 'video URL') 'http://www.drtuber.com/player_config_json/', video_id, query={
'vid': video_id,
'embed': 0,
'aid': 0,
'domain_id': 0,
})
formats = []
for format_id, video_url in video_data['files'].items():
if video_url:
formats.append({
'format_id': format_id,
'quality': 2 if format_id == 'hq' else 1,
'url': video_url
})
self._sort_formats(formats)
title = self._html_search_regex( title = self._html_search_regex(
(r'class="title_watch"[^>]*><(?:p|h\d+)[^>]*>([^<]+)<', (r'class="title_watch"[^>]*><(?:p|h\d+)[^>]*>([^<]+)<',
@ -75,7 +90,7 @@ class DrTuberIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'url': video_url, 'formats': formats,
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'like_count': like_count, 'like_count': like_count,
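
A minimal standalone sketch of the format-building loop in this hunk, outside the extractor. The `files` mapping, its keys, and the `build_drtuber_formats` helper name are hypothetical; a plain sort on `quality` stands in for `self._sort_formats`, on the assumption that formats are ordered from worst to best.

```python
def build_drtuber_formats(files):
    """Build a format list from a {format_id: url} mapping, skipping empty URLs."""
    formats = []
    for format_id, video_url in files.items():
        if video_url:
            formats.append({
                'format_id': format_id,
                # 'hq' is ranked above every other format id
                'quality': 2 if format_id == 'hq' else 1,
                'url': video_url,
            })
    # Stand-in for _sort_formats: lowest quality first, best last
    formats.sort(key=lambda f: f['quality'])
    return formats


# Hypothetical player config payload; the empty '4k' entry is dropped.
sample_files = {
    'hq': 'https://example.com/hq.mp4',
    'lq': 'https://example.com/lq.mp4',
    '4k': '',
}
drtuber_formats = build_drtuber_formats(sample_files)
```

Returning `formats` instead of a single `url` lets the downloader choose between the available qualities.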

View File

@ -71,6 +71,10 @@ from .arte import (
TheOperaPlatformIE, TheOperaPlatformIE,
ArteTVPlaylistIE, ArteTVPlaylistIE,
) )
from .asiancrush import (
AsianCrushIE,
AsianCrushPlaylistIE,
)
from .atresplayer import AtresPlayerIE from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE from .atttechchannel import ATTTechChannelIE
from .atvat import ATVAtIE from .atvat import ATVAtIE
@ -871,6 +875,7 @@ from .rutube import (
) )
from .rutv import RUTVIE from .rutv import RUTVIE
from .ruutu import RuutuIE from .ruutu import RuutuIE
from .ruv import RuvIE
from .sandia import SandiaIE from .sandia import SandiaIE
from .safari import ( from .safari import (
SafariIE, SafariIE,

View File

@ -69,19 +69,32 @@ class GoogleDriveIE(InfoExtractor):
r'"fmt_stream_map"\s*,\s*"([^"]+)', webpage, 'fmt stream map').split(',') r'"fmt_stream_map"\s*,\s*"([^"]+)', webpage, 'fmt stream map').split(',')
fmt_list = self._search_regex(r'"fmt_list"\s*,\s*"([^"]+)', webpage, 'fmt_list').split(',') fmt_list = self._search_regex(r'"fmt_list"\s*,\s*"([^"]+)', webpage, 'fmt_list').split(',')
resolutions = {}
for fmt in fmt_list:
mobj = re.search(
r'^(?P<format_id>\d+)/(?P<width>\d+)[xX](?P<height>\d+)', fmt)
if mobj:
resolutions[mobj.group('format_id')] = (
int(mobj.group('width')), int(mobj.group('height')))
formats = [] formats = []
for fmt, fmt_stream in zip(fmt_list, fmt_stream_map): for fmt_stream in fmt_stream_map:
fmt_id, fmt_url = fmt_stream.split('|') fmt_stream_split = fmt_stream.split('|')
resolution = fmt.split('/')[1] if len(fmt_stream_split) < 2:
width, height = resolution.split('x') continue
formats.append({ format_id, format_url = fmt_stream_split[:2]
'url': lowercase_escape(fmt_url), f = {
'format_id': fmt_id, 'url': lowercase_escape(format_url),
'resolution': resolution, 'format_id': format_id,
'width': int_or_none(width), 'ext': self._FORMATS_EXT[format_id],
'height': int_or_none(height), }
'ext': self._FORMATS_EXT[fmt_id], resolution = resolutions.get(format_id)
if resolution:
f.update({
'width': resolution[0],
'height': resolution[1],
}) })
formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
return { return {
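
The point of this hunk is to look resolutions up by format id instead of zipping `fmt_list` with `fmt_stream_map` by position. A minimal sketch under stated assumptions: the sample lists and the `pair_streams_with_resolutions` helper name are made up for illustration, and the real extractor additionally sets `ext` and unescapes the URL.

```python
import re


def pair_streams_with_resolutions(fmt_list, fmt_stream_map):
    """Attach width/height to each stream by format id, not by list position."""
    resolutions = {}
    for fmt in fmt_list:
        mobj = re.search(
            r'^(?P<format_id>\d+)/(?P<width>\d+)[xX](?P<height>\d+)', fmt)
        if mobj:
            resolutions[mobj.group('format_id')] = (
                int(mobj.group('width')), int(mobj.group('height')))

    formats = []
    for fmt_stream in fmt_stream_map:
        parts = fmt_stream.split('|')
        if len(parts) < 2:
            continue  # skip malformed entries instead of raising
        format_id, format_url = parts[:2]
        f = {'format_id': format_id, 'url': format_url}
        resolution = resolutions.get(format_id)
        if resolution:
            f['width'], f['height'] = resolution
        formats.append(f)
    return formats


# Hypothetical input: the lists are ordered differently and one entry is malformed.
sample_fmt_list = ['22/1280x720/9/0/115', '18/640x360/9/0/115']
sample_stream_map = [
    '18|https://example.com/18',
    'broken',
    '22|https://example.com/22',
]
gdrive_formats = pair_streams_with_resolutions(sample_fmt_list, sample_stream_map)
```

With positional pairing, format 18 would have been given the 1280x720 resolution in this example; the lookup keeps each stream with its own dimensions.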

View File

@ -7,14 +7,19 @@ from .common import InfoExtractor
class HGTVComShowIE(InfoExtractor): class HGTVComShowIE(InfoExtractor):
IE_NAME = 'hgtv.com:show' IE_NAME = 'hgtv.com:show'
_VALID_URL = r'https?://(?:www\.)?hgtv\.com/shows/[^/]+/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?hgtv\.com/shows/[^/]+/(?P<id>[^/?#&]+)'
_TEST = { _TESTS = [{
'url': 'http://www.hgtv.com/shows/flip-or-flop/flip-or-flop-full-episodes-videos', # data-module="video"
'url': 'http://www.hgtv.com/shows/flip-or-flop/flip-or-flop-full-episodes-season-4-videos',
'info_dict': { 'info_dict': {
'id': 'flip-or-flop-full-episodes-videos', 'id': 'flip-or-flop-full-episodes-season-4-videos',
'title': 'Flip or Flop Full Episodes', 'title': 'Flip or Flop Full Episodes',
}, },
'playlist_mincount': 15, 'playlist_mincount': 15,
} }, {
# data-deferred-module="video"
'url': 'http://www.hgtv.com/shows/good-bones/episodes/an-old-victorian-house-gets-a-new-facelift',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -23,7 +28,7 @@ class HGTVComShowIE(InfoExtractor):
config = self._parse_json( config = self._parse_json(
self._search_regex( self._search_regex(
r'(?s)data-module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script', r'(?s)data-(?:deferred-)?module=["\']video["\'][^>]*>.*?<script[^>]+type=["\']text/x-config["\'][^>]*>(.+?)</script',
webpage, 'video config'), webpage, 'video config'),
display_id)['channels'][0] display_id)['channels'][0]
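
The two test comments in this file name the attribute forms the relaxed expression is meant to accept, `data-module="video"` and `data-deferred-module="video"`. A small check of just that attribute part, assuming the optional group covers `deferred-` including its trailing hyphen; the HTML snippets are invented samples, not real page markup.

```python
import re

# Only the attribute part of the full expression; the optional group
# includes the trailing hyphen so the plain form still matches.
VIDEO_MODULE_RE = r'data-(?:deferred-)?module=["\']video["\']'

# Invented samples mapped to whether they are expected to match.
expected = {
    '<div data-module="video">': True,
    '<div data-deferred-module="video">': True,
    '<div data-module="photo">': False,
}
results = {
    html: bool(re.search(VIDEO_MODULE_RE, html)) for html in expected
}
```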

View File

@ -19,7 +19,7 @@ class PandoraTVIE(InfoExtractor):
IE_NAME = 'pandora.tv' IE_NAME = 'pandora.tv'
IE_DESC = '판도라TV' IE_DESC = '판도라TV'
_VALID_URL = r'https?://(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?' _VALID_URL = r'https?://(?:.+?\.)?channel\.pandora\.tv/channel/video\.ptv\?'
_TEST = { _TESTS = [{
'url': 'http://jp.channel.pandora.tv/channel/video.ptv?c1=&prgid=53294230&ch_userid=mikakim&ref=main&lot=cate_01_2', 'url': 'http://jp.channel.pandora.tv/channel/video.ptv?c1=&prgid=53294230&ch_userid=mikakim&ref=main&lot=cate_01_2',
'info_dict': { 'info_dict': {
'id': '53294230', 'id': '53294230',
@ -34,7 +34,26 @@ class PandoraTVIE(InfoExtractor):
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
} }
} }, {
'url': 'http://channel.pandora.tv/channel/video.ptv?ch_userid=gogoucc&prgid=54721744',
'info_dict': {
'id': '54721744',
'ext': 'flv',
'title': '[HD] JAPAN COUNTDOWN 170423',
'description': '[HD] JAPAN COUNTDOWN 170423',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1704.9,
'upload_date': '20170423',
'uploader': 'GOGO_UCC',
'uploader_id': 'gogoucc',
'view_count': int,
'like_count': int,
},
'params': {
# Test metadata only
'skip_download': True,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query) qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
@ -86,7 +105,7 @@ class PandoraTVIE(InfoExtractor):
'description': info.get('body'), 'description': info.get('body'),
'thumbnail': info.get('thumbnail') or info.get('poster'), 'thumbnail': info.get('thumbnail') or info.get('poster'),
'duration': float_or_none(info.get('runtime'), 1000) or parse_duration(info.get('time')), 'duration': float_or_none(info.get('runtime'), 1000) or parse_duration(info.get('time')),
'upload_date': info['fid'][:8] if isinstance(info.get('fid'), compat_str) else None, 'upload_date': info['fid'].split('/')[-1][:8] if isinstance(info.get('fid'), compat_str) else None,
'uploader': info.get('nickname'), 'uploader': info.get('nickname'),
'uploader_id': info.get('upload_userid'), 'uploader_id': info.get('upload_userid'),
'view_count': str_to_int(info.get('hit')), 'view_count': str_to_int(info.get('hit')),
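
A sketch of the `upload_date` change above: the date is now taken from the first eight characters of the last path component of `fid` rather than of the whole string. The sample `fid` values are hypothetical (the real field format is not shown in this diff), `str` replaces `compat_str`, and `upload_date_from_fid` is an illustrative helper name.

```python
def upload_date_from_fid(fid):
    """Return the YYYYMMDD prefix of the last path component, or None."""
    if not isinstance(fid, str):
        return None
    return fid.split('/')[-1][:8]


# Hypothetical values: one bare id and one with a directory prefix.
bare = upload_date_from_fid('20170423120000abc')
prefixed = upload_date_from_fid('some/dir/20170423120000abc')
missing = upload_date_from_fid(None)
```

A value without any `/` is handled exactly as before, since `split('/')[-1]` then returns the whole string.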

101
youtube_dl/extractor/ruv.py Normal file
View File

@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
unified_timestamp,
)
class RuvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ruv\.is/(?:sarpurinn/[^/]+|node)/(?P<id>[^/]+(?:/\d+)?)'
_TESTS = [{
# m3u8
'url': 'http://ruv.is/sarpurinn/ruv-aukaras/fh-valur/20170516',
'md5': '66347652f4e13e71936817102acc1724',
'info_dict': {
'id': '1144499',
'display_id': 'fh-valur/20170516',
'ext': 'mp4',
'title': 'FH - Valur',
'description': 'Bein útsending frá 3. leik FH og Vals í úrslitum Olísdeildar karla í handbolta.',
'timestamp': 1494963600,
'upload_date': '20170516',
},
}, {
# mp3
'url': 'http://ruv.is/sarpurinn/ras-2/morgunutvarpid/20170619',
'md5': '395ea250c8a13e5fdb39d4670ef85378',
'info_dict': {
'id': '1153630',
'display_id': 'morgunutvarpid/20170619',
'ext': 'mp3',
'title': 'Morgunútvarpið',
'description': 'md5:a4cf1202c0a1645ca096b06525915418',
'timestamp': 1497855000,
'upload_date': '20170619',
},
}, {
'url': 'http://ruv.is/sarpurinn/ruv/frettir/20170614',
'only_matching': True,
}, {
'url': 'http://www.ruv.is/node/1151854',
'only_matching': True,
}, {
'url': 'http://ruv.is/sarpurinn/klippa/secret-soltice-hefst-a-morgun',
'only_matching': True,
}, {
'url': 'http://ruv.is/sarpurinn/ras-1/morgunvaktin/20170619',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._og_search_title(webpage)
FIELD_RE = r'video\.%s\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'
media_url = self._html_search_regex(
FIELD_RE % 'src', webpage, 'video URL', group='url')
video_id = self._search_regex(
r'<link\b[^>]+\bhref=["\']https?://www\.ruv\.is/node/(\d+)',
webpage, 'video id', default=display_id)
ext = determine_ext(media_url)
if ext == 'm3u8':
formats = self._extract_m3u8_formats(
media_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
elif ext == 'mp3':
formats = [{
'format_id': 'mp3',
'url': media_url,
'vcodec': 'none',
}]
else:
formats = [{
'url': media_url,
}]
description = self._og_search_description(webpage, default=None)
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._search_regex(
FIELD_RE % 'poster', webpage, 'thumbnail', fatal=False)
timestamp = unified_timestamp(self._html_search_meta(
'article:published_time', webpage, 'timestamp', fatal=False))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'formats': formats,
}

View File

@ -615,7 +615,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
if download_url and not source_file.get('is_cold') and not source_file.get('is_defrosting'): if download_url and not source_file.get('is_cold') and not source_file.get('is_defrosting'):
source_name = source_file.get('public_name', 'Original') source_name = source_file.get('public_name', 'Original')
if self._is_valid_url(download_url, video_id, '%s video' % source_name): if self._is_valid_url(download_url, video_id, '%s video' % source_name):
ext = source_file.get('extension', determine_ext(download_url)).lower() ext = (try_get(
source_file, lambda x: x['extension'],
compat_str) or determine_ext(
download_url, None) or 'mp4').lower()
formats.append({ formats.append({
'url': download_url, 'url': download_url,
'ext': ext, 'ext': ext,

View File

@ -4,7 +4,10 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import (
compat_kwargs,
compat_str,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
@ -36,7 +39,8 @@ class ViuBaseIE(InfoExtractor):
headers.update(kwargs.get('headers', {})) headers.update(kwargs.get('headers', {}))
kwargs['headers'] = headers kwargs['headers'] = headers
response = self._download_json( response = self._download_json(
'https://www.viu.com/api/' + path, *args, **kwargs)['response'] 'https://www.viu.com/api/' + path, *args,
**compat_kwargs(kwargs))['response']
if response.get('status') != 'success': if response.get('status') != 'success':
raise ExtractorError('%s said: %s' % ( raise ExtractorError('%s said: %s' % (
self.IE_NAME, response['message']), expected=True) self.IE_NAME, response['message']), expected=True)

View File

@ -4,11 +4,7 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import parse_duration
unified_strdate,
parse_duration,
int_or_none,
)
class WatchIndianPornIE(InfoExtractor): class WatchIndianPornIE(InfoExtractor):
@ -23,11 +19,8 @@ class WatchIndianPornIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Hot milf from kerala shows off her gorgeous large breasts on camera', 'title': 'Hot milf from kerala shows off her gorgeous large breasts on camera',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'LoveJay',
'upload_date': '20160428',
'duration': 226, 'duration': 226,
'view_count': int, 'view_count': int,
'comment_count': int,
'categories': list, 'categories': list,
'age_limit': 18, 'age_limit': 18,
} }
@ -40,51 +33,36 @@ class WatchIndianPornIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_url = self._html_search_regex( info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
r"url: escape\('([^']+)'\)", webpage, 'url')
title = self._html_search_regex( title = self._html_search_regex((
r'<h2 class="he2"><span>(.*?)</span>', r'<title>(.+?)\s*-\s*Indian\s+Porn</title>',
webpage, 'title') r'<h4>(.+?)</h4>'
thumbnail = self._html_search_regex( ), webpage, 'title')
r'<span id="container"><img\s+src="([^"]+)"',
webpage, 'thumbnail', fatal=False)
uploader = self._html_search_regex(
r'class="aupa">\s*(.*?)</a>',
webpage, 'uploader')
upload_date = unified_strdate(self._html_search_regex(
r'Added: <strong>(.+?)</strong>', webpage, 'upload date', fatal=False))
duration = parse_duration(self._search_regex( duration = parse_duration(self._search_regex(
r'<td>Time:\s*</td>\s*<td align="right"><span>\s*(.+?)\s*</span>', r'Time:\s*<strong>\s*(.+?)\s*</strong>',
webpage, 'duration', fatal=False)) webpage, 'duration', fatal=False))
view_count = int_or_none(self._search_regex( view_count = int(self._search_regex(
r'<td>Views:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>', r'(?s)Time:\s*<strong>.*?</strong>.*?<strong>\s*(\d+)\s*</strong>',
webpage, 'view count', fatal=False)) webpage, 'view count', fatal=False))
comment_count = int_or_none(self._search_regex(
r'<td>Comments:\s*</td>\s*<td align="right"><span>\s*(\d+)\s*</span>',
webpage, 'comment count', fatal=False))
categories = re.findall( categories = re.findall(
r'<a href="[^"]+/search/video/desi"><span>([^<]+)</span></a>', r'<a[^>]+class=[\'"]categories[\'"][^>]*>\s*([^<]+)\s*</a>',
webpage) webpage)
return { info_dict.update({
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'url': video_url,
'http_headers': { 'http_headers': {
'Referer': url, 'Referer': url,
}, },
'title': title, 'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
'duration': duration, 'duration': duration,
'view_count': view_count, 'view_count': view_count,
'comment_count': comment_count,
'categories': categories, 'categories': categories,
'age_limit': 18, 'age_limit': 18,
} })
return info_dict

View File

@ -3,6 +3,7 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
sanitized_Request, sanitized_Request,
@ -26,7 +27,7 @@ class YouPornIE(InfoExtractor):
'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?', 'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Ask Dan And Jennifer', 'uploader': 'Ask Dan And Jennifer',
'upload_date': '20101221', 'upload_date': '20101217',
'average_rating': int, 'average_rating': int,
'view_count': int, 'view_count': int,
'comment_count': int, 'comment_count': int,
@ -45,7 +46,7 @@ class YouPornIE(InfoExtractor):
'description': 'http://sweetlivegirls.com Big Tits Awesome Brunette On amazing webcam show.mp4', 'description': 'http://sweetlivegirls.com Big Tits Awesome Brunette On amazing webcam show.mp4',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Unknown', 'uploader': 'Unknown',
'upload_date': '20111125', 'upload_date': '20110418',
'average_rating': int, 'average_rating': int,
'view_count': int, 'view_count': int,
'comment_count': int, 'comment_count': int,
@ -68,28 +69,46 @@ class YouPornIE(InfoExtractor):
webpage = self._download_webpage(request, display_id) webpage = self._download_webpage(request, display_id)
title = self._search_regex( title = self._search_regex(
[r'(?:video_titles|videoTitle)\s*[:=]\s*(["\'])(?P<title>.+?)\1', [r'(?:video_titles|videoTitle)\s*[:=]\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
r'<h1[^>]+class=["\']heading\d?["\'][^>]*>([^<])<'], r'<h1[^>]+class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<'],
webpage, 'title', group='title') webpage, 'title', group='title',
default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'title', webpage, fatal=True)
links = [] links = []
# Main source
definitions = self._parse_json(
self._search_regex(
r'mediaDefinition\s*=\s*(\[.+?\]);', webpage,
'media definitions', default='[]'),
video_id, fatal=False)
if definitions:
for definition in definitions:
if not isinstance(definition, dict):
continue
video_url = definition.get('videoUrl')
if isinstance(video_url, compat_str) and video_url:
links.append(video_url)
# Fallback #1, this also contains extra low quality 180p format
for _, link in re.findall(r'<a[^>]+href=(["\'])(http.+?)\1[^>]+title=["\']Download [Vv]ideo', webpage):
links.append(link)
# Fallback #2 (unavailable as at 22.06.2017)
sources = self._search_regex( sources = self._search_regex(
r'(?s)sources\s*:\s*({.+?})', webpage, 'sources', default=None) r'(?s)sources\s*:\s*({.+?})', webpage, 'sources', default=None)
if sources: if sources:
for _, link in re.findall(r'[^:]+\s*:\s*(["\'])(http.+?)\1', sources): for _, link in re.findall(r'[^:]+\s*:\s*(["\'])(http.+?)\1', sources):
links.append(link) links.append(link)
# Fallback #1 # Fallback #3 (unavailable as at 22.06.2017)
for _, link in re.findall( for _, link in re.findall(
r'(?:videoUrl|videoSrc|videoIpadUrl|html5PlayerSrc)\s*[:=]\s*(["\'])(http.+?)\1', webpage): r'(?:videoSrc|videoIpadUrl|html5PlayerSrc)\s*[:=]\s*(["\'])(http.+?)\1', webpage):
links.append(link) links.append(link)
# Fallback #2, this also contains extra low quality 180p format # Fallback #4, encrypted links (unavailable as at 22.06.2017)
for _, link in re.findall(r'<a[^>]+href=(["\'])(http.+?)\1[^>]+title=["\']Download [Vv]ideo', webpage):
links.append(link)
# Fallback #3, encrypted links
for _, encrypted_link in re.findall( for _, encrypted_link in re.findall(
r'encryptedQuality\d{3,4}URL\s*=\s*(["\'])([\da-zA-Z+/=]+)\1', webpage): r'encryptedQuality\d{3,4}URL\s*=\s*(["\'])([\da-zA-Z+/=]+)\1', webpage):
links.append(aes_decrypt_text(encrypted_link, title, 32).decode('utf-8')) links.append(aes_decrypt_text(encrypted_link, title, 32).decode('utf-8'))
@ -124,7 +143,8 @@ class YouPornIE(InfoExtractor):
r'(?s)<div[^>]+class=["\']submitByLink["\'][^>]*>(.+?)</div>', r'(?s)<div[^>]+class=["\']submitByLink["\'][^>]*>(.+?)</div>',
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._html_search_regex( upload_date = unified_strdate(self._html_search_regex(
r'(?s)<div[^>]+class=["\']videoInfo(?:Date|Time)["\'][^>]*>(.+?)</div>', [r'Date\s+[Aa]dded:\s*<span>([^<]+)',
r'(?s)<div[^>]+class=["\']videoInfo(?:Date|Time)["\'][^>]*>(.+?)</div>'],
webpage, 'upload date', fatal=False)) webpage, 'upload date', fatal=False))
age_limit = self._rta_search(webpage) age_limit = self._rta_search(webpage)

View File

@ -1269,37 +1269,57 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
sub_lang_list[sub_lang] = sub_formats sub_lang_list[sub_lang] = sub_formats
return sub_lang_list return sub_lang_list
# Some videos don't provide ttsurl but rather caption_tracks and def make_captions(sub_url, sub_langs):
# caption_translation_languages (e.g. 20LmZk1hakA) parsed_sub_url = compat_urllib_parse_urlparse(sub_url)
caption_tracks = args['caption_tracks'] caption_qs = compat_parse_qs(parsed_sub_url.query)
caption_translation_languages = args['caption_translation_languages'] captions = {}
caption_url = compat_parse_qs(caption_tracks.split(',')[0])['u'][0] for sub_lang in sub_langs:
parsed_caption_url = compat_urllib_parse_urlparse(caption_url)
caption_qs = compat_parse_qs(parsed_caption_url.query)
sub_lang_list = {}
for lang in caption_translation_languages.split(','):
lang_qs = compat_parse_qs(compat_urllib_parse_unquote_plus(lang))
sub_lang = lang_qs.get('lc', [None])[0]
if not sub_lang:
continue
sub_formats = [] sub_formats = []
for ext in self._SUBTITLE_FORMATS: for ext in self._SUBTITLE_FORMATS:
caption_qs.update({ caption_qs.update({
'tlang': [sub_lang], 'tlang': [sub_lang],
'fmt': [ext], 'fmt': [ext],
}) })
sub_url = compat_urlparse.urlunparse(parsed_caption_url._replace( sub_url = compat_urlparse.urlunparse(parsed_sub_url._replace(
query=compat_urllib_parse_urlencode(caption_qs, True))) query=compat_urllib_parse_urlencode(caption_qs, True)))
sub_formats.append({ sub_formats.append({
'url': sub_url, 'url': sub_url,
'ext': ext, 'ext': ext,
}) })
sub_lang_list[sub_lang] = sub_formats captions[sub_lang] = sub_formats
return sub_lang_list return captions
# New captions format as of 22.06.2017
player_response = args.get('player_response')
if player_response and isinstance(player_response, compat_str):
player_response = self._parse_json(
player_response, video_id, fatal=False)
if player_response:
renderer = player_response['captions']['playerCaptionsTracklistRenderer']
base_url = renderer['captionTracks'][0]['baseUrl']
sub_lang_list = []
for lang in renderer['translationLanguages']:
lang_code = lang.get('languageCode')
if lang_code:
sub_lang_list.append(lang_code)
return make_captions(base_url, sub_lang_list)
# Some videos don't provide ttsurl but rather caption_tracks and
# caption_translation_languages (e.g. 20LmZk1hakA)
# No longer used as of 22.06.2017
caption_tracks = args['caption_tracks']
caption_translation_languages = args['caption_translation_languages']
caption_url = compat_parse_qs(caption_tracks.split(',')[0])['u'][0]
sub_lang_list = []
for lang in caption_translation_languages.split(','):
lang_qs = compat_parse_qs(compat_urllib_parse_unquote_plus(lang))
sub_lang = lang_qs.get('lc', [None])[0]
if sub_lang:
sub_lang_list.append(sub_lang)
return make_captions(caption_url, sub_lang_list)
# An extractor error can be raised by the download process if there are # An extractor error can be raised by the download process if there are
# no automatic captions but there are subtitles # no automatic captions but there are subtitles
except (KeyError, ExtractorError): except (KeyError, IndexError, ExtractorError):
self._downloader.report_warning(err_msg) self._downloader.report_warning(err_msg)
return {} return {}
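
The refactored `make_captions` helper builds one URL per language and subtitle format by rewriting the query string of a base caption URL. A standalone sketch using the Python 3 standard library in place of the `compat_*` wrappers; the base URL is a placeholder and the two-entry `SUBTITLE_FORMATS` tuple is an assumption standing in for `self._SUBTITLE_FORMATS`.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

SUBTITLE_FORMATS = ('ttml', 'vtt')  # assumed stand-in for self._SUBTITLE_FORMATS


def make_captions(sub_url, sub_langs):
    """Map each language code to a list of {'url', 'ext'} subtitle formats."""
    parsed_sub_url = urlparse(sub_url)
    caption_qs = parse_qs(parsed_sub_url.query)
    captions = {}
    for sub_lang in sub_langs:
        sub_formats = []
        for ext in SUBTITLE_FORMATS:
            # Override the target language and format, keep all other parameters
            caption_qs.update({'tlang': [sub_lang], 'fmt': [ext]})
            sub_formats.append({
                'url': urlunparse(parsed_sub_url._replace(
                    query=urlencode(caption_qs, True))),
                'ext': ext,
            })
        captions[sub_lang] = sub_formats
    return captions


# Placeholder base URL, not a real caption endpoint.
captions = make_captions(
    'https://example.com/api/timedtext?v=abc&lang=en', ['de', 'fr'])
```

Both the new `player_response` path and the older `caption_tracks` path only need to supply a base URL and a list of language codes, which is why the shared helper removes the duplicated loop.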

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.06.18' __version__ = '2017.06.23'