Compare commits


24 Commits

Author SHA1 Message Date
Sergey M․  4a01befb34  release 2016.08.07  2016-08-07 21:12:41 +07:00
Sergey M․  845dfcdc40  [ChangeLog] Actualize  2016-08-07 21:10:48 +07:00
Sergey M․  d92cb46305  [discoverygo] Add extractor (Closes #10245)  2016-08-07 20:57:05 +07:00
Sergey M․  a8795327ca  [utils] Add support TV Parental Guidelines ratings in parse_age_limit  2016-08-07 20:45:18 +07:00
Sergey M․  d34995a9e3  [flipagram] Make _search_json_ld non fatal  2016-08-07 19:06:55 +07:00
Sergey M․  958849275f  [extractor/generic] Make _search_json_ld non fatal  2016-08-07 19:04:22 +07:00
Sergey M․  998f094452  [bbc] Remove proxy from test  2016-08-07 18:13:05 +07:00
Sergey M․  aaa42cf0cf  [bbc] PEP 8  2016-08-07 18:05:13 +07:00
Sergey M․  9fb64c04cd  [bbc] Add support for morph embeds (Closes #10239)  2016-08-07 18:01:50 +07:00
Remita Amine  f9622868e7  [bbc] preserve format_id backward compatibility  2016-08-07 11:14:15 +01:00
Remita Amine  37768f9242  [common] correctly lower the preference of m3u8 master manifest format  2016-08-07 10:59:09 +01:00
Sergey M․  a1aadd09a4  [tnaflixnetworkbase] Improve title extraction  2016-08-07 16:00:09 +07:00
Sergey M․  b47a75017b  [tnaflix] Fix metadata extraction (Closes #10249)  2016-08-07 16:00:03 +07:00
Remita Amine  e37b54b140  [fox] fix theplatform release url query  2016-08-06 20:53:39 +01:00
Yen Chi Hsuan  c1decda58c  [openload] Fix extraction (closes #9706)  2016-08-07 02:44:15 +08:00
Yen Chi Hsuan  d3f8e038fe  [utils] Add decode_png for openload (#9706)  2016-08-07 02:42:58 +08:00
Remita Amine  ad152e2d95  [bbc] fix test  2016-08-06 19:36:12 +01:00
Remita Amine  b0af12154e  [bbc] reduce requests and improve format_id  2016-08-06 19:24:59 +01:00
Remita Amine  d16b3c6677  [common] extract partOfTVSeries info in json-ld  2016-08-06 18:58:38 +01:00
Remita Amine  c57244cdb1  [common] lower the preference of m3u8 master manifest format  2016-08-06 18:55:05 +01:00
Remita Amine  a7e5f27412  [bbc] improve extraction  2016-08-06 18:48:09 +01:00
    - extract f4m and dash formats
    - improve format sorting and listing
    - improve extraction of articles with `otherSettings.playlist`
Remita Amine  089a40955c  [pokemon] improve _VALID_URL  2016-08-06 12:08:14 +01:00
Remita Amine  d73ebac100  [pokemon] Add new extractor(closes #10093)  2016-08-06 11:18:14 +01:00
Remita Amine  e563c0d73b  [condenast] fallback to loader.js if video.js fail  2016-08-05 21:01:16 +01:00
17 changed files with 580 additions and 185 deletions

.github/ISSUE_TEMPLATE.md

@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.06*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.06**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.07*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.07**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.08.06
[debug] youtube-dl version 2016.08.07
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

ChangeLog

@@ -1,3 +1,29 @@
version 2016.08.07
Core
+ Add support for TV Parental Guidelines ratings in parse_age_limit
+ Add decode_png (#9706)
+ Add support for partOfTVSeries in JSON-LD
* Lower master M3U8 manifest preference for better format sorting
Extractors
+ [discoverygo] Add extractor (#10245)
* [flipagram] Make JSON-LD extraction non fatal
* [generic] Make JSON-LD extraction non fatal
+ [bbc] Add support for morph embeds (#10239)
* [tnaflixnetworkbase] Improve title extraction
* [tnaflix] Fix metadata extraction (#10249)
* [fox] Fix theplatform release URL query
* [openload] Fix extraction (#9706)
* [bbc] Skip duplicate manifest URLs
* [bbc] Improve format code
+ [bbc] Add support for DASH and F4M
* [bbc] Improve format sorting and listing
* [bbc] Improve playlist extraction
+ [pokemon] Add extractor (#10093)
+ [condenast] Add fallback scenario for video info extraction
version 2016.08.06
Core

docs/supportedsites.md

@@ -182,6 +182,7 @@
- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
- **DiscoveryGo**
- **Dotsub**
- **DouyuTV**: 斗鱼
- **DPlay**
@@ -518,6 +519,7 @@
- **plus.google**: Google Plus
- **pluzz.francetv.fr**
- **podomatic**
- **Pokemon**
- **PolskieRadio**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla

test/test_utils.py

@@ -42,6 +42,7 @@ from youtube_dl.utils import (
ohdave_rsa_encrypt,
OnDemandPagedList,
orderedSet,
parse_age_limit,
parse_duration,
parse_filesize,
parse_count,
@@ -432,6 +433,20 @@ class TestUtil(unittest.TestCase):
url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
'trailer.mp4')
def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None)
self.assertEqual(parse_age_limit(False), None)
self.assertEqual(parse_age_limit('invalid'), None)
self.assertEqual(parse_age_limit(0), 0)
self.assertEqual(parse_age_limit(18), 18)
self.assertEqual(parse_age_limit(21), 21)
self.assertEqual(parse_age_limit(22), None)
self.assertEqual(parse_age_limit('18'), 18)
self.assertEqual(parse_age_limit('18+'), 18)
self.assertEqual(parse_age_limit('PG-13'), 13)
self.assertEqual(parse_age_limit('TV-14'), 14)
self.assertEqual(parse_age_limit('TV-MA'), 17)
def test_parse_duration(self):
self.assertEqual(parse_duration(None), None)
self.assertEqual(parse_duration(False), None)
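
The new ratings handling covered by these tests can also be checked interactively; a minimal sketch (requires this release or newer):

```python
from youtube_dl.utils import parse_age_limit  # available as of 2016.08.07

# US TV Parental Guidelines labels now map to numeric age limits
print(parse_age_limit('TV-MA'))    # 17
print(parse_age_limit('TV-14'))    # 14
print(parse_age_limit('TV-Y7'))    # 7
# Existing behaviour for MPAA-style ratings and plain ages is unchanged
print(parse_age_limit('PG-13'))    # 13
print(parse_age_limit('18+'))      # 18
print(parse_age_limit('invalid'))  # None
```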

youtube_dl/extractor/bbc.py

@@ -5,11 +5,13 @@ import re
from .common import InfoExtractor
from ..utils import (
dict_get,
ExtractorError,
float_or_none,
int_or_none,
parse_duration,
parse_iso8601,
try_get,
unescapeHTML,
)
from ..compat import (
@@ -229,51 +231,6 @@ class BBCCoUkIE(InfoExtractor):
asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
return [ref.get('href') for ref in asx.findall('./Entry/ref')]
def _extract_connection(self, connection, programme_id):
formats = []
kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
if protocol == 'http':
href = connection.get('href')
transfer_format = connection.get('transferFormat')
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, supplier),
})
# Skip DASH until supported
elif transfer_format == 'dash':
pass
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=supplier, fatal=False))
# Direct link
else:
formats.append({
'url': href,
'format_id': supplier or kind or protocol,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
formats.append({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
'format_id': supplier,
})
return formats
def _extract_items(self, playlist):
return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
@@ -294,46 +251,6 @@ class BBCCoUkIE(InfoExtractor):
def _extract_connections(self, media):
return self._findall_ns(media, './{%s}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int_or_none(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
if service:
format['format_id'] = '%s_%s' % (service, format['format_id'])
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int_or_none(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
'vcodec': 'none',
})
formats.extend(conn_formats)
return formats
def _get_subtitles(self, media, programme_id):
subtitles = {}
for connection in self._extract_connections(media):
@@ -379,13 +296,87 @@ class BBCCoUkIE(InfoExtractor):
def _process_media_selector(self, media_selection, programme_id):
formats = []
subtitles = None
urls = []
for media in self._extract_medias(media_selection):
kind = media.get('kind')
if kind == 'audio':
formats.extend(self._extract_audio(media, programme_id))
elif kind == 'video':
formats.extend(self._extract_video(media, programme_id))
if kind in ('video', 'audio'):
bitrate = int_or_none(media.get('bitrate'))
encoding = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
href = connection.get('href')
if href in urls:
continue
if href:
urls.append(href)
conn_kind = connection.get('kind')
protocol = connection.get('protocol')
supplier = connection.get('supplier')
transfer_format = connection.get('transferFormat')
format_id = supplier or conn_kind or protocol
if service:
format_id = '%s_%s' % (service, format_id)
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, format_id),
})
elif transfer_format == 'dash':
formats.extend(self._extract_mpd_formats(
href, programme_id, mpd_id=format_id, fatal=False))
elif transfer_format == 'hls':
formats.extend(self._extract_m3u8_formats(
href, programme_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id=format_id, fatal=False))
elif transfer_format == 'hds':
formats.extend(self._extract_f4m_formats(
href, programme_id, f4m_id=format_id, fatal=False))
else:
if not service and not supplier and bitrate:
format_id += '-%d' % bitrate
fmt = {
'format_id': format_id,
'filesize': file_size,
}
if kind == 'video':
fmt.update({
'width': width,
'height': height,
'vbr': bitrate,
'vcodec': encoding,
})
else:
fmt.update({
'abr': bitrate,
'acodec': encoding,
'vcodec': 'none',
})
if protocol == 'http':
# Direct link
fmt.update({
'url': href,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
fmt.update({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
})
formats.append(fmt)
elif kind == 'captions':
subtitles = self.extract_subtitles(media, programme_id)
return formats, subtitles
@@ -589,7 +580,7 @@ class BBCIE(BBCCoUkIE):
'info_dict': {
'id': '150615_telabyad_kentin_cogu',
'ext': 'mp4',
'title': "Tel Abyad'da IŞİD bayrağı indirildi YPG bayrağı çekildi",
'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
'description': 'md5:33a4805a855c9baf7115fcbde57e7025',
'timestamp': 1434397334,
'upload_date': '20150615',
@@ -654,6 +645,23 @@ class BBCIE(BBCCoUkIE):
# rtmp download
'skip_download': True,
}
}, {
# single video embedded with Morph
'url': 'http://www.bbc.co.uk/sport/live/olympics/36895975',
'info_dict': {
'id': 'p041vhd0',
'ext': 'mp4',
'title': "Nigeria v Japan - Men's First Round",
'description': 'Live coverage of the first round from Group B at the Amazonia Arena.',
'duration': 7980,
'uploader': 'BBC Sport',
'uploader_id': 'bbc_sport',
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to UK',
}, {
# single video with playlist.sxml URL in playlist param
'url': 'http://www.bbc.com/sport/0/football/33653409',
@@ -820,13 +828,19 @@ class BBCIE(BBCCoUkIE):
# http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani)
playlist = data_playable.get('otherSettings', {}).get('playlist', {})
if playlist:
for key in ('progressiveDownload', 'streaming'):
entry = None
for key in ('streaming', 'progressiveDownload'):
playlist_url = playlist.get('%sUrl' % key)
if not playlist_url:
continue
try:
entries.append(self._extract_from_playlist_sxml(
playlist_url, playlist_id, timestamp))
info = self._extract_from_playlist_sxml(
playlist_url, playlist_id, timestamp)
if not entry:
entry = info
else:
entry['title'] = info['title']
entry['formats'].extend(info['formats'])
except Exception as e:
# Some playlist URL may fail with 500, at the same time
# the other one may work fine (e.g.
@@ -834,6 +848,9 @@ class BBCIE(BBCCoUkIE):
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
continue
raise
if entry:
self._sort_formats(entry['formats'])
entries.append(entry)
if entries:
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
@@ -866,6 +883,50 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles,
}
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# Several setPayload calls may be present but the video
# seems to be always related to the first one
morph_payload = self._parse_json(
self._search_regex(
r'Morph\.setPayload\([^,]+,\s*({.+?})\);',
webpage, 'morph payload', default='{}'),
playlist_id, fatal=False)
if morph_payload:
components = try_get(morph_payload, lambda x: x['body']['components'], list) or []
for component in components:
if not isinstance(component, dict):
continue
lead_media = try_get(component, lambda x: x['props']['leadMedia'], dict)
if not lead_media:
continue
identifiers = lead_media.get('identifiers')
if not identifiers or not isinstance(identifiers, dict):
continue
programme_id = identifiers.get('vpid') or identifiers.get('playablePid')
if not programme_id:
continue
title = lead_media.get('title') or self._og_search_title(webpage)
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
description = lead_media.get('summary')
uploader = lead_media.get('masterBrand')
uploader_id = lead_media.get('mid')
duration = None
duration_d = lead_media.get('duration')
if isinstance(duration_d, dict):
duration = parse_duration(dict_get(
duration_d, ('rawDuration', 'formattedDuration', 'spokenDuration')))
return {
'id': programme_id,
'title': title,
'description': description,
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats,
'subtitles': subtitles,
}
def extract_all(pattern):
return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False),
@@ -883,7 +944,7 @@ class BBCIE(BBCCoUkIE):
r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
if entries:
return self.playlist_result(
[self.url_result(entry, 'BBCCoUk') for entry in entries],
[self.url_result(entry_, 'BBCCoUk') for entry_ in entries],
playlist_id, playlist_title, playlist_description)
# Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)
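
For reference, the new Morph handling above looks for the first `Morph.setPayload(...)` call in the page and reads the lead media out of its JSON argument. A self-contained json/re sketch of that step, using the same regex; the HTML fragment is invented for illustration, not taken from a real BBC page:

```python
import json
import re

# Invented markup; a real BBC Sport live page embeds a much larger payload
webpage = '''<script type="text/javascript">
Morph.setPayload('/mcs/lead-media', {"body": {"components": [{"props": {"leadMedia": {"title": "Nigeria v Japan - Men's First Round", "identifiers": {"vpid": "p041vhd0"}}}}]}});
</script>'''

morph_payload = json.loads(re.search(
    r'Morph\.setPayload\([^,]+,\s*({.+?})\);', webpage).group(1))
lead_media = morph_payload['body']['components'][0]['props']['leadMedia']
# The vpid is then fed to the usual media selector download
print(lead_media['identifiers']['vpid'], '-', lead_media['title'])
# p041vhd0 - Nigeria v Japan - Men's First Round
```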

youtube_dl/extractor/common.py

@@ -846,7 +846,7 @@ class InfoExtractor(object):
part_of_season = e.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
part_of_series = e.get('partOfSeries')
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
info['series'] = unescapeHTML(part_of_series.get('name'))
elif item_type == 'Article':
@@ -1140,7 +1140,7 @@ class InfoExtractor(object):
'url': m3u8_url,
'ext': ext,
'protocol': 'm3u8',
'preference': preference - 1 if preference else -1,
'preference': preference - 100 if preference else -100,
'resolution': 'multiple',
'format_note': 'Quality selection URL',
}
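
The first hunk lets the JSON-LD parser fall back to `partOfTVSeries` when a site does not use `partOfSeries`; the second pushes the master M3U8 pseudo-format far enough down that it can never outrank real variant formats. A small illustrative check of the JSON-LD fallback, with made-up markup:

```python
import json

# Invented JSON-LD snippet; some sites use partOfTVSeries instead of partOfSeries
json_ld = json.loads('''{
    "@context": "http://schema.org",
    "@type": "TVEpisode",
    "name": "Kiss First, Ask Questions Later!",
    "partOfTVSeries": {"@type": "TVSeries", "name": "Love at First Kiss"}
}''')

# Mirrors the updated fallback in _json_ld
part_of_series = json_ld.get('partOfSeries') or json_ld.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
    print(part_of_series['name'])  # Love at First Kiss
```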

youtube_dl/extractor/condenast.py

@@ -113,11 +113,19 @@ class CondeNastIE(InfoExtractor):
'target': params['id'],
})
video_id = query['videoId']
video_info = None
info_page = self._download_webpage(
'http://player.cnevids.com/player/video.js',
video_id, 'Downloading video info', query=query)
video_info = self._parse_json(self._search_regex(
r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
video_id, 'Downloading video info', query=query, fatal=False)
if info_page:
video_info = self._parse_json(self._search_regex(
r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
else:
info_page = self._download_webpage(
'http://player.cnevids.com/player/loader.js',
video_id, 'Downloading loader info', query=query)
video_info = self._parse_json(self._search_regex(
r'var\s+video\s*=\s*({.+?});', info_page, 'video info'), video_id)
title = video_info['title']
formats = []

youtube_dl/extractor/discoverygo.py

@@ -0,0 +1,98 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
extract_attributes,
int_or_none,
parse_age_limit,
unescapeHTML,
)
class DiscoveryGoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?discoverygo\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
'info_dict': {
'id': '57a33c536b66d1cd0345eeb1',
'ext': 'mp4',
'title': 'Kiss First, Ask Questions Later!',
'description': 'md5:fe923ba34050eae468bffae10831cb22',
'duration': 2579,
'series': 'Love at First Kiss',
'season_number': 1,
'episode_number': 1,
'age_limit': 14,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
container = extract_attributes(
self._search_regex(
r'(<div[^>]+class=["\']video-player-container[^>]+>)',
webpage, 'video container'))
video = self._parse_json(
unescapeHTML(container.get('data-video') or container.get('data-json')),
display_id)
title = video['name']
stream = video['stream']
STREAM_URL_SUFFIX = 'streamUrl'
formats = []
for stream_kind in ('', 'hds'):
suffix = STREAM_URL_SUFFIX.capitalize() if stream_kind else STREAM_URL_SUFFIX
stream_url = stream.get('%s%s' % (stream_kind, suffix))
if not stream_url:
continue
if stream_kind == '':
formats.extend(self._extract_m3u8_formats(
stream_url, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif stream_kind == 'hds':
formats.extend(self._extract_f4m_formats(
stream_url, display_id, f4m_id=stream_kind, fatal=False))
self._sort_formats(formats)
video_id = video.get('id') or display_id
description = video.get('description', {}).get('detailed')
duration = int_or_none(video.get('duration'))
series = video.get('show', {}).get('name')
season_number = int_or_none(video.get('season', {}).get('number'))
episode_number = int_or_none(video.get('episodeNumber'))
tags = video.get('tags')
age_limit = parse_age_limit(video.get('parental', {}).get('rating'))
subtitles = {}
captions = stream.get('captions')
if isinstance(captions, list):
for caption in captions:
subtitle_url = caption.get('fileUrl')
if (not subtitle_url or not isinstance(subtitle_url, compat_str) or
not subtitle_url.startswith('http')):
continue
lang = caption.get('fileLang', 'en')
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'duration': duration,
'series': series,
'season_number': season_number,
'episode_number': episode_number,
'tags': tags,
'age_limit': age_limit,
'formats': formats,
'subtitles': subtitles,
}
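
A quick way to exercise the new extractor through the public YoutubeDL API (a sketch; it assumes the test URL above is still reachable and not blocked in your region):

```python
import youtube_dl  # 2016.08.07 or newer

# Metadata-only run against the URL from the test case above
with youtube_dl.YoutubeDL({'quiet': True}) as ydl:
    info = ydl.extract_info(
        'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
        download=False)

# Expected per the test case: title, series and age_limit from the page JSON
print(info['title'])          # Kiss First, Ask Questions Later!
print(info.get('series'))     # Love at First Kiss
print(info.get('age_limit'))  # 14
```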

youtube_dl/extractor/extractors.py

@@ -221,6 +221,7 @@ from .dvtv import DVTVIE
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .discoverygo import DiscoveryGoIE
from .dispeak import DigitallySpeakingIE
from .dropbox import DropboxIE
from .dw import (
@@ -636,6 +637,7 @@ from .pluralsight import (
PluralsightCourseIE,
)
from .podomatic import PodomaticIE
from .pokemon import PokemonIE
from .polskieradio import PolskieRadioIE
from .porn91 import Porn91IE
from .pornhd import PornHdIE

youtube_dl/extractor/flipagram.py

@@ -48,7 +48,7 @@ class FlipagramIE(InfoExtractor):
flipagram = video_data['flipagram']
video = flipagram['video']
json_ld = self._search_json_ld(webpage, video_id, default=False)
json_ld = self._search_json_ld(webpage, video_id, fatal=False)
title = json_ld.get('title') or flipagram['captionText']
description = json_ld.get('description') or flipagram.get('captionText')

youtube_dl/extractor/fox.py

@@ -2,7 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
from ..utils import (
smuggle_url,
update_url_query,
)
class FOXIE(InfoExtractor):
@@ -29,11 +32,12 @@ class FOXIE(InfoExtractor):
release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url'] + '&switch=http'
video_id)['release_url']
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(release_url, {'force_smil_url': True}),
'url': smuggle_url(update_url_query(
release_url, {'switch': 'http'}), {'force_smil_url': True}),
'id': video_id,
}
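
The fix replaces bare string concatenation of `&switch=http` with `update_url_query`, which merges the parameter into whatever query string the release URL already carries. A small illustration; the URL itself is made up:

```python
from youtube_dl.utils import update_url_query

# Made-up theplatform release URL; the real one comes from the fox_pdk_player JSON
release_url = 'http://link.theplatform.com/s/fox/pABCDEFG?mbr=true'
print(update_url_query(release_url, {'switch': 'http'}))
# -> same URL with switch=http merged into the existing query string
```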

youtube_dl/extractor/generic.py

@@ -2241,7 +2241,7 @@ class GenericIE(InfoExtractor):
# Looking for http://schema.org/VideoObject
json_ld = self._search_json_ld(
webpage, video_id, default=None, expected_type='VideoObject')
webpage, video_id, fatal=False, expected_type='VideoObject')
if json_ld and json_ld.get('url'):
info_dict.update({
'title': video_title or info_dict['title'],

youtube_dl/extractor/openload.py

@@ -1,15 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
from __future__ import unicode_literals, division
import re
import math
from .common import InfoExtractor
from ..compat import compat_chr
from ..utils import (
decode_png,
determine_ext,
encode_base_n,
ExtractorError,
mimetype2ext,
)
@@ -41,60 +40,6 @@ class OpenloadIE(InfoExtractor):
'only_matching': True,
}]
@staticmethod
def openload_level2_debase(m):
radix, num = int(m.group(1)) + 27, int(m.group(2))
return '"' + encode_base_n(num, radix) + '"'
@classmethod
def openload_level2(cls, txt):
# The function name is ǃ \u01c3
# Using escaped unicode literals does not work in Python 3.2
return re.sub(r'ǃ\((\d+),(\d+)\)', cls.openload_level2_debase, txt, re.UNICODE).replace('"+"', '')
# Openload uses a variant of aadecode
# openload_decode and related functions are originally written by
# vitas@matfyz.cz and released with public domain
# See https://github.com/rg3/youtube-dl/issues/8489
@classmethod
def openload_decode(cls, txt):
symbol_table = [
('_', '(゚Д゚) [゚Θ゚]'),
('a', '(゚Д゚) [゚ω゚ノ]'),
('b', '(゚Д゚) [゚Θ゚ノ]'),
('c', '(゚Д゚) [\'c\']'),
('d', '(゚Д゚) [゚ー゚ノ]'),
('e', '(゚Д゚) [゚Д゚ノ]'),
('f', '(゚Д゚) [1]'),
('o', '(゚Д゚) [\'o\']'),
('u', '(o゚ー゚o)'),
('c', '(゚Д゚) [\'c\']'),
('7', '((゚ー゚) + (o^_^o))'),
('6', '((o^_^o) +(o^_^o) +(c^_^o))'),
('5', '((゚ー゚) + (゚Θ゚))'),
('4', '(-~3)'),
('3', '(-~-~1)'),
('2', '(-~1)'),
('1', '(-~0)'),
('0', '((c^_^o)-(c^_^o))'),
]
delim = '(゚Д゚)[゚ε゚]+'
ret = ''
for aachar in txt.split(delim):
for val, pat in symbol_table:
aachar = aachar.replace(pat, val)
aachar = aachar.replace('+ ', '')
m = re.match(r'^\d+', aachar)
if m:
ret += compat_chr(int(m.group(0), 8))
else:
m = re.match(r'^u([\da-f]+)', aachar)
if m:
ret += compat_chr(int(m.group(1), 16))
return cls.openload_level2(ret)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
@@ -102,29 +47,77 @@ class OpenloadIE(InfoExtractor):
if 'File not found' in webpage:
raise ExtractorError('File not found', expected=True)
code = self._search_regex(
r'</video>\s*</div>\s*<script[^>]+>[^>]+</script>\s*<script[^>]+>([^<]+)</script>',
webpage, 'JS code')
# The following extraction logic is proposed by @Belderak and @gdkchan
# and declared to be used freely in youtube-dl
# See https://github.com/rg3/youtube-dl/issues/9706
decoded = self.openload_decode(code)
numbers_js = self._download_webpage(
'https://openload.co/assets/js/obfuscator/n.js', video_id,
note='Downloading signature numbers')
signums = self._search_regex(
r'window\.signatureNumbers\s*=\s*[\'"](?P<data>[a-z]+)[\'"]',
numbers_js, 'signature numbers', group='data')
video_url = self._search_regex(
r'return\s+"(https?://[^"]+)"', decoded, 'video URL')
linkimg_uri = self._search_regex(
r'<img[^>]+id="linkimg"[^>]+src="([^"]+)"', webpage, 'link image')
linkimg = self._request_webpage(
linkimg_uri, video_id, note=False).read()
width, height, pixels = decode_png(linkimg)
output = ''
for y in range(height):
for x in range(width):
r, g, b = pixels[y][3 * x:3 * x + 3]
if r == 0 and g == 0 and b == 0:
break
else:
output += compat_chr(r)
output += compat_chr(g)
output += compat_chr(b)
img_str_length = len(output) // 200
img_str = [[0 for x in range(img_str_length)] for y in range(10)]
sig_str_length = len(signums) // 260
sig_str = [[0 for x in range(sig_str_length)] for y in range(10)]
for i in range(10):
for j in range(img_str_length):
begin = i * img_str_length * 20 + j * 20
img_str[i][j] = output[begin:begin + 20]
for j in range(sig_str_length):
begin = i * sig_str_length * 26 + j * 26
sig_str[i][j] = signums[begin:begin + 26]
parts = []
# TODO: find better names for str_, chr_ and sum_
str_ = ''
for i in [2, 3, 5, 7]:
str_ = ''
sum_ = float(99)
for j in range(len(sig_str[i])):
for chr_idx in range(len(img_str[i][j])):
if sum_ > float(122):
sum_ = float(98)
chr_ = compat_chr(int(math.floor(sum_)))
if sig_str[i][j][chr_idx] == chr_ and j >= len(str_):
sum_ += float(2.5)
str_ += img_str[i][j][chr_idx]
parts.append(str_.replace(',', ''))
video_url = 'https://openload.co/stream/%s~%s~%s~%s' % (parts[3], parts[1], parts[2], parts[0])
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
'title', default=None) or self._html_search_meta(
'description', webpage, 'title', fatal=True)
ext = mimetype2ext(self._search_regex(
r'window\.vt\s*=\s*(["\'])(?P<mimetype>.+?)\1', decoded,
'mimetype', default=None, group='mimetype')) or determine_ext(
video_url, 'mp4')
return {
'id': video_id,
'title': title,
'ext': ext,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'url': video_url,
# Seems all videos have extensions in their titles
'ext': determine_ext(title),
}

youtube_dl/extractor/pokemon.py

@@ -0,0 +1,58 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
int_or_none,
)
class PokemonIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/[^/]+/\d+_\d+-(?P<display_id>[^/?#]+))'
_TESTS = [{
'url': 'http://www.pokemon.com/us/pokemon-episodes/19_01-from-a-to-z/?play=true',
'md5': '9fb209ae3a569aac25de0f5afc4ee08f',
'info_dict': {
'id': 'd0436c00c3ce4071ac6cee8130ac54a1',
'ext': 'mp4',
'title': 'From A to Z!',
'description': 'Bonnie makes a new friend, Ash runs into an old friend, and a terrifying premonition begins to unfold!',
'timestamp': 1460478136,
'upload_date': '20160412',
},
'add_ie': ['LimelightMedia']
}, {
'url': 'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2',
'only_matching': True,
}, {
'url': 'http://www.pokemon.com/fr/episodes-pokemon/18_09-un-hiver-inattendu/',
'only_matching': True,
}, {
'url': 'http://www.pokemon.com/de/pokemon-folgen/01_20-bye-bye-smettbo/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id or display_id)
video_data = extract_attributes(self._search_regex(
r'(<[^>]+data-video-id="%s"[^>]*>)' % (video_id if video_id else '[a-z0-9]{32}'),
webpage, 'video data element'))
video_id = video_data['data-video-id']
title = video_data['data-video-title']
return {
'_type': 'url_transparent',
'id': video_id,
'url': 'limelight:media:%s' % video_id,
'title': title,
'description': video_data.get('data-video-summary'),
'thumbnail': video_data.get('data-video-poster'),
'series': 'Pokémon',
'season_number': int_or_none(video_data.get('data-video-season')),
'episode': title,
'episode_number': int_or_none(video_data.get('data-video-episode')),
'ie_key': 'LimelightMedia',
}
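
The improved `_VALID_URL` accepts both the `?play=<32-char id>` form and plain episode-page URLs; a small check of the two branches, using the pattern and the test URLs from above:

```python
import re

_VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/[^/]+/\d+_\d+-(?P<display_id>[^/?#]+))'

# Explicit play= parameter -> the 32-character video id group matches
m = re.match(_VALID_URL, 'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2')
print(m.group('id'), m.group('display_id'))  # 2e8b5c761f1d4a9286165d7748c1ece2 None

# Episode-page URL -> only the display_id group matches
m = re.match(_VALID_URL, 'http://www.pokemon.com/fr/episodes-pokemon/18_09-un-hiver-inattendu/')
print(m.group('id'), m.group('display_id'))  # None un-hiver-inattendu
```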

youtube_dl/extractor/tnaflix.py

@@ -118,8 +118,12 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
xpath_text(cfg_xml, './startThumb', 'thumbnail'), 'http:')
thumbnails = self._extract_thumbnails(cfg_xml)
title = self._html_search_regex(
self._TITLE_REGEX, webpage, 'title') if self._TITLE_REGEX else self._og_search_title(webpage)
title = None
if self._TITLE_REGEX:
title = self._html_search_regex(
self._TITLE_REGEX, webpage, 'title', default=None)
if not title:
title = self._og_search_title(webpage)
age_limit = self._rta_search(webpage) or 18
@@ -189,9 +193,9 @@ class TNAFlixNetworkEmbedIE(TNAFlixNetworkBaseIE):
class TNAFlixIE(TNAFlixNetworkBaseIE):
_VALID_URL = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'
_TITLE_REGEX = r'<title>(.+?) - TNAFlix Porn Videos</title>'
_DESCRIPTION_REGEX = r'<meta[^>]+name="description"[^>]+content="([^"]+)"'
_UPLOADER_REGEX = r'<i>\s*Verified Member\s*</i>\s*<h1>(.+?)</h1>'
_TITLE_REGEX = r'<title>(.+?) - (?:TNAFlix Porn Videos|TNAFlix\.com)</title>'
_DESCRIPTION_REGEX = r'(?s)>Description:</[^>]+>(.+?)<'
_UPLOADER_REGEX = r'<i>\s*Verified Member\s*</i>\s*<h\d+>(.+?)<'
_CATEGORIES_REGEX = r'(?s)<span[^>]*>Categories:</span>(.+?)</div>'
_TESTS = [{

youtube_dl/utils.py

@@ -47,6 +47,7 @@ from .compat import (
compat_socket_create_connection,
compat_str,
compat_struct_pack,
compat_struct_unpack,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_parse_urlencode,
@@ -1983,11 +1984,27 @@ US_RATINGS = {
}
TV_PARENTAL_GUIDELINES = {
'TV-Y': 0,
'TV-Y7': 7,
'TV-G': 0,
'TV-PG': 0,
'TV-14': 14,
'TV-MA': 17,
}
def parse_age_limit(s):
if s is None:
if type(s) == int:
return s if 0 <= s <= 21 else None
if not isinstance(s, compat_basestring):
return None
m = re.match(r'^(?P<age>\d{1,2})\+?$', s)
return int(m.group('age')) if m else US_RATINGS.get(s)
if m:
return int(m.group('age'))
if s in US_RATINGS:
return US_RATINGS[s]
return TV_PARENTAL_GUIDELINES.get(s)
def strip_jsonp(code):
@@ -2969,3 +2986,110 @@ def parse_m3u8_attributes(attrib):
def urshift(val, n):
return val >> n if val >= 0 else (val + 0x100000000) >> n
# Based on png2str() written by @gdkchan and improved by @yokrysty
# Originally posted at https://github.com/rg3/youtube-dl/issues/9706
def decode_png(png_data):
# Reference: https://www.w3.org/TR/PNG/
header = png_data[8:]
if png_data[:8] != b'\x89PNG\x0d\x0a\x1a\x0a' or header[4:8] != b'IHDR':
raise IOError('Not a valid PNG file.')
int_map = {1: '>B', 2: '>H', 4: '>I'}
unpack_integer = lambda x: compat_struct_unpack(int_map[len(x)], x)[0]
chunks = []
while header:
length = unpack_integer(header[:4])
header = header[4:]
chunk_type = header[:4]
header = header[4:]
chunk_data = header[:length]
header = header[length:]
header = header[4:] # Skip CRC
chunks.append({
'type': chunk_type,
'length': length,
'data': chunk_data
})
ihdr = chunks[0]['data']
width = unpack_integer(ihdr[:4])
height = unpack_integer(ihdr[4:8])
idat = b''
for chunk in chunks:
if chunk['type'] == b'IDAT':
idat += chunk['data']
if not idat:
raise IOError('Unable to read PNG data.')
decompressed_data = bytearray(zlib.decompress(idat))
stride = width * 3
pixels = []
def _get_pixel(idx):
x = idx % stride
y = idx // stride
return pixels[y][x]
for y in range(height):
basePos = y * (1 + stride)
filter_type = decompressed_data[basePos]
current_row = []
pixels.append(current_row)
for x in range(stride):
color = decompressed_data[1 + basePos + x]
basex = y * stride + x
left = 0
up = 0
if x > 2:
left = _get_pixel(basex - 3)
if y > 0:
up = _get_pixel(basex - stride)
if filter_type == 1: # Sub
color = (color + left) & 0xff
elif filter_type == 2: # Up
color = (color + up) & 0xff
elif filter_type == 3: # Average
color = (color + ((left + up) >> 1)) & 0xff
elif filter_type == 4: # Paeth
a = left
b = up
c = 0
if x > 2 and y > 0:
c = _get_pixel(basex - stride - 3)
p = a + b - c
pa = abs(p - a)
pb = abs(p - b)
pc = abs(p - c)
if pa <= pb and pa <= pc:
color = (color + a) & 0xff
elif pb <= pc:
color = (color + b) & 0xff
else:
color = (color + c) & 0xff
current_row.append(color)
return width, height, pixels
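
`decode_png` only needs the PNG signature, the IHDR chunk and the zlib-compressed IDAT scanlines (chunk CRCs are read but not verified), so it can be exercised with a tiny hand-built image. A sketch; the 2x1 red/green image is of course made up:

```python
import struct
import zlib

from youtube_dl.utils import decode_png  # added in this release

def png_chunk(ctype, data):
    # decode_png skips the CRC, but write a valid one anyway
    return (struct.pack('>I', len(data)) + ctype + data +
            struct.pack('>I', zlib.crc32(ctype + data) & 0xffffffff))

ihdr = struct.pack('>IIBBBBB', 2, 1, 8, 2, 0, 0, 0)  # 2x1 px, 8-bit RGB, no interlace
scanline = b'\x00\xff\x00\x00\x00\xff\x00'           # filter 0, red pixel, green pixel
png = (b'\x89PNG\r\n\x1a\n' + png_chunk(b'IHDR', ihdr) +
       png_chunk(b'IDAT', zlib.compress(scanline)) + png_chunk(b'IEND', b''))

print(decode_png(png))  # (2, 1, [[255, 0, 0, 0, 255, 0]])
```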

youtube_dl/version.py

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.08.06'
__version__ = '2016.08.07'