Compare commits

...

77 Commits

Author SHA1 Message Date
a3ccd6bd11 release 2017.03.24 2017-03-24 00:24:23 +07:00
7963b6cba8 [ChangeLog] Actualize 2017-03-24 00:19:58 +07:00
bea7af6947 [channel9] Remove expired comment and sort imports 2017-03-23 23:58:12 +07:00
a5d783f525 [channel9] Extract more formats 2017-03-23 23:47:43 +07:00
d0572557c2 [ninecninemedia] remove mp4 url extraction request 2017-03-23 13:53:07 +01:00
52d5ecabd5 [bellmedia] add support for etalk.ca(closes #12447) 2017-03-23 13:52:45 +01:00
b0f7f21cb9 [channel9] fix extraction(closes #11323) 2017-03-23 09:22:37 +01:00
579c99a284 [cloudy] Fix extraction (closes #12525) 2017-03-22 23:48:06 +07:00
ca5ed022e9 [hbo] add support for free episode urls and new formats extraction(closes #12519) 2017-03-22 17:28:53 +01:00
391d076d7c [condenast] Fix extraction and style (closes #12526) 2017-03-22 23:22:14 +07:00
c183e14f89 [viu] Relax _VALID_URL (closes #12529) 2017-03-22 22:26:59 +07:00
093dad9e25 release 2017.03.22 2017-03-22 02:36:50 +07:00
e8686e51d7 [ChangeLog] Actualize 2017-03-22 02:35:09 +07:00
8e5a7c5e67 [pluralsight] Omit module title from video title (closes #12506) 2017-03-22 02:28:04 +07:00
e1e35d1ac6 [pornhub] Improve extraction and style (closes #12515) 2017-03-22 01:59:27 +07:00
21fbf0f955 [pornhub] Decode obfuscated video URL (closes #12470) 2017-03-22 01:51:45 +07:00
97952bdb78 [generic] Add test for Senate ISVP iframe embed 2017-03-22 01:12:14 +08:00
8a8cc339b6 [senateisvp] Allow https URL scheme for embeds 2017-03-20 23:35:13 +08:00
957f453429 [Openload.co] Fixed Extraction
They did it again. just a minor change though. here's quick fix
2017-03-20 16:15:00 +08:00
0e9a73e612 release 2017.03.20 2017-03-20 00:07:57 +07:00
0ecdd3adbd [ChangeLog] Actualize 2017-03-20 00:03:58 +07:00
9487ce03e9 [YoutubeDL] Allow multiple input URLs to be used with stdout as output template 2017-03-19 23:59:40 +07:00
45e6ad21b4 Credit @mrBliss for vtm (#11912) 2017-03-19 23:48:02 +07:00
68220649fa [ChangeLog] Update after #12099 2017-03-19 20:42:17 +08:00
46b18f2349 [BostonGlobe] New. Nonstandard version of Brightcove.
Has a "data-brightcove-video-id" instead of a "data-video-id," otherwise
pretty much just Brightcove. Except the Globe isn't all Brightcove
videos, so fallback to Generic, too.

Also, abstract playlist_from_matches() from generic.py to common.py, and use
it here.

History of these changes can be found in
51170427d4b1143572a498dedaee61863a5b2c5b.
2017-03-19 20:40:31 +08:00
772b5ff57f [toongoggles] Add new extractor(closes #12171) 2017-03-19 00:45:38 +01:00
f68ef1e2ab [medialaan] Remove unrelated test 2017-03-18 23:23:47 +07:00
febfe1e262 [adobepass] Detect and output error on authz token extraction (#12472) 2017-03-18 06:21:31 +07:00
5f0daab1ca [openload] Fix extraction 2017-03-18 07:02:55 +08:00
2a721cdff2 [medialaan] Fix and improve extraction (closes #11912) 2017-03-18 05:58:54 +07:00
e7a51a4c02 [vtm] Add extractor (closes #9974)
Implementation of the approach described in #9974.
2017-03-18 00:27:04 +07:00
3e5856d860 [discoverynetworks] add support for more domains and bypass geo restiction 2017-03-17 09:53:44 +01:00
ea883a687c [openload] Fix extraction (closes #10408)
Thanks to @makgun02

Ref: http://pastebin.com/raw/JX9gHFUz
2017-03-17 15:22:34 +08:00
7f3590c43b [test_InfoExtractor] Add some realworld tests for _extract_jwplayer_data 2017-03-17 00:00:01 +07:00
7d539ee10a release 2017.03.16 2017-03-16 22:42:12 +07:00
6ad476079d [ChangeLog] Actualize 2017-03-16 22:39:48 +07:00
0efbc6b56d [options] Mention flac support and sort alphabetically among the audio formats 2017-03-16 12:54:47 +01:00
21bfcd3d6e [postprocessor/ffmpeg] Add support for flac
Requested at http://stackoverflow.com/q/42828041/35070
2017-03-16 12:50:45 +01:00
b51dc9db0e [extractor/common] Extract SMIL formats from jwplayer 2017-03-16 03:30:53 +07:00
a309684285 [extractor/generic] Add forgotten return for jwplayer formats 2017-03-16 03:28:01 +07:00
ba448445b8 [redbull] improve extraction
- extract 1080p quality
- correct ttml subtitle ext
- catch api errors
- reduce request size
2017-03-15 01:40:54 +01:00
5db83d79bf release 2017.03.15 2017-03-15 02:01:24 +07:00
2a751e137f [ChangeLog] Actualize 2017-03-15 02:00:10 +07:00
398887b4c0 [Openload] Fixed Extraction
They did changed it again.
2017-03-14 14:03:52 +08:00
66bf351f80 [facebook] Make title optional (closes #12443) 2017-03-14 00:38:07 +07:00
9d08963022 [telecinco] Add test for #12430 2017-03-13 22:41:28 +07:00
e313d209c2 [mitele] Add support for ooyala videos (closes #12430) 2017-03-13 22:39:15 +07:00
ff9d509d20 [openload] Fix extraction
Just a minor fix for openload
2017-03-13 04:22:35 +08:00
c1795ca6c8 [streamable] Update API URL 2017-03-13 02:51:59 +08:00
8c99623259 [crunchyroll] Extract season name 2017-03-12 12:18:10 +08:00
57b0ddb35f [discoverygo] Actualize test 2017-03-11 23:21:08 +07:00
a28f8d7396 [discoverygo] Bypass geo restriction 2017-03-11 23:18:42 +07:00
7049799470 [discoverygo:playlist] Add extractor (closes #12424) 2017-03-11 23:16:51 +07:00
4605c94d1a [__init__] Fix missing subtitles if --add-metadata is used (#12423)
The previous fix for #5594 is incorrect
2017-03-11 19:37:45 +08:00
a8e687a4da release 2017.03.10 2017-03-10 23:26:28 +07:00
f9e5c92c94 [ChangeLog] Actualize 2017-03-10 23:23:24 +07:00
c2ee861c6d [extractor/generic] Make title optional for jwplayer embeds (closes #12410) 2017-03-10 23:16:53 +07:00
bd34c32bd7 [wdr] Actualize comment 2017-03-10 23:07:36 +07:00
f802c48660 [wdr:maus] Fix extraction and update tests 2017-03-10 23:59:32 +08:00
76bee08fe7 [prosiebensat1] Improve title extraction and add test 2017-03-09 23:42:07 +07:00
2913821723 [prosiebensat1] Improve title extraction (closes #12318) 2017-03-10 00:18:37 +08:00
0e7f9a9b48 [dplayit] Relax playback info URL extraction 2017-03-08 21:30:30 +07:00
0cf2352e85 [dplayit] Separate and rewrite extractor and bypass geo restriction (closes #12393) 2017-03-08 21:20:01 +07:00
0f6b87d067 [miomio] Fix extraction
Closes #12291
Closes #12388
Closes #12402
2017-03-08 19:46:58 +08:00
d7344d33b1 [telequebec] Fix description extraction and update test (closes #12399) 2017-03-08 18:25:59 +07:00
b08cc749d6 [openload] Fix extraction 2017-03-08 06:01:27 +08:00
b68a812ea8 [extractor/generic] Add test for brigthcove UUID-like videoPlayer 2017-03-07 23:00:21 +07:00
2e76bdc850 [brightcove:legacy] Relax videoPlayer validation check (closes #12381) 2017-03-07 22:59:33 +07:00
fe646a2f10 [twitch] PEP8 2017-03-07 15:34:06 +08:00
9df53ea36e Credit @puxlit for twitch 2fa (#11974) 2017-03-07 04:05:47 +07:00
d7d7f84c95 Credit @benages for redbull.tv (#11948) 2017-03-07 04:05:47 +07:00
dccd0ab35d release 2017.03.07 2017-03-07 03:59:22 +07:00
80146dcc6c [ChangeLog] Actualize 2017-03-07 03:57:54 +07:00
e30ccf7047 [soundcloud] Update client id (closes #12376) 2017-03-06 23:05:38 +07:00
54a3a8827b [__init__] Metadata should be added after conversion
Fixes #5594
2017-03-06 18:09:12 +08:00
92cb5763f4 [ChangeLog] Update after #12357 2017-03-06 18:04:19 +08:00
da92da4b88 Openload fix extraction (#12357)
* Fix extraction
2017-03-06 18:00:17 +08:00
45 changed files with 1431 additions and 524 deletions

View File

@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.03.06*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.03.06**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.03.24*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.03.24**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.03.06
[debug] youtube-dl version 2017.03.24
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -207,3 +207,6 @@ Marek Rusinowski
Tobias Gruetzmacher
Olivier Bilodeau
Lars Vierbergen
Juanjo Benages
Xiao Di Guan
Thomas Winant

View File

@ -1,3 +1,87 @@
version 2017.03.24
Extractors
- [9c9media] Remove mp4 URL extraction request
+ [bellmedia] Add support for etalk.ca and space.ca (#12447)
* [channel9] Fix extraction (#11323)
* [cloudy] Fix extraction (#12525)
+ [hbo] Add support for free episode URLs and new formats extraction (#12519)
* [condenast] Fix extraction and style (#12526)
* [viu] Relax URL regular expression (#12529)
version 2017.03.22
Extractors
- [pluralsight] Omit module title from video title (#12506)
* [pornhub] Decode obfuscated video URL (#12470, #12515)
* [senateisvp] Allow https URL scheme for embeds (#12512)
version 2017.03.20
Core
+ [YoutubeDL] Allow multiple input URLs to be used with stdout (-) as
output template
+ [adobepass] Detect and output error on authz token extraction (#12472)
Extractors
+ [bostonglobe] Add extractor for bostonglobe.com (#12099)
+ [toongoggles] Add support for toongoggles.com (#12171)
+ [medialaan] Add support for Medialaan sites (#9974, #11912)
+ [discoverynetworks] Add support for more domains and bypass geo restiction
* [openload] Fix extraction (#10408)
version 2017.03.16
Core
+ [postprocessor/ffmpeg] Add support for flac
+ [extractor/common] Extract SMIL formats from jwplayer
Extractors
+ [generic] Add forgotten return for jwplayer formats
* [redbulltv] Improve extraction
version 2017.03.15
Core
* Fix missing subtitles if --add-metadata is used (#12423)
Extractors
* [facebook] Make title optional (#12443)
+ [mitele] Add support for ooyala videos (#12430)
* [openload] Fix extraction (#12435, #12446)
* [streamable] Update API URL (#12433)
+ [crunchyroll] Extract season name (#12428)
* [discoverygo] Bypass geo restriction
+ [discoverygo:playlist] Add support for playlists (#12424)
version 2017.03.10
Extractors
* [generic] Make title optional for jwplayer embeds (#12410)
* [wdr:maus] Fix extraction (#12373)
* [prosiebensat1] Improve title extraction (#12318, #12327)
* [dplayit] Separate and rewrite extractor and bypass geo restriction (#12393)
* [miomio] Fix extraction (#12291, #12388, #12402)
* [telequebec] Fix description extraction (#12399)
* [openload] Fix extraction (#12357)
* [brightcove:legacy] Relax videoPlayer validation check (#12381)
version 2017.03.07
Core
* Metadata are now added after conversion (#5594)
Extractors
* [soundcloud] Update client id (#12376)
* [openload] Fix extraction (#10408, #12357)
version 2017.03.06
Core

View File

@ -375,8 +375,9 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
(requires ffmpeg or avconv and ffprobe or
avprobe)
--audio-format FORMAT Specify audio format: "best", "aac",
"vorbis", "mp3", "m4a", "opus", or "wav";
"best" by default; No effect without -x
"flac", "mp3", "m4a", "opus", "vorbis", or
"wav"; "best" by default; No effect without
-x
--audio-quality QUALITY Specify ffmpeg/avconv audio quality, insert
a value between 0 (better) and 9 (worse)
for VBR or a specific bitrate like 128K

View File

@ -108,6 +108,7 @@
- **blinkx**
- **Bloomberg**
- **BokeCC**
- **BostonGlobe**
- **Bpb**: Bundeszentrale für politische Bildung
- **BR**: Bayerischer Rundfunk Mediathek
- **BravoTV**
@ -208,10 +209,13 @@
- **Digiteka**
- **Discovery**
- **DiscoveryGo**
- **DiscoveryGoPlaylist**
- **DiscoveryNetworksDe**
- **Disney**
- **Dotsub**
- **DouyuTV**: 斗鱼
- **DPlay**
- **DPlayIt**
- **dramafever**
- **dramafever:series**
- **DRBonanza**
@ -308,8 +312,8 @@
- **GPUTechConf**
- **Groupon**
- **Hark**
- **HBO**
- **HBOEpisode**
- **hbo**
- **hbo:episode**
- **HearThisAt**
- **Heise**
- **HellPorno**
@ -423,6 +427,7 @@
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
- **Medialaan**
- **Meipai**: 美拍
- **MelonVOD**
- **META**
@ -775,12 +780,12 @@
- **ThisAV**
- **ThisOldHouse**
- **tinypic**: tinypic.com videos
- **tlc.de**
- **TMZ**
- **TMZArticle**
- **TNAFlix**
- **TNAFlixNetworkEmbed**
- **toggle**
- **ToonGoggles**
- **Tosh**: Tosh.0
- **tou.tv**
- **Toypics**: Toypics user profile

View File

@ -8,7 +8,7 @@ import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from test.helper import FakeYDL, expect_dict
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
@ -84,6 +84,97 @@ class TestInfoExtractor(unittest.TestCase):
self.assertRaises(ExtractorError, self.ie._download_json, uri, None)
self.assertEqual(self.ie._download_json(uri, None, fatal=False), None)
def test_extract_jwplayer_data_realworld(self):
# from http://www.suffolk.edu/sjc/
expect_dict(
self,
self.ie._extract_jwplayer_data(r'''
<script type='text/javascript'>
jwplayer('my-video').setup({
file: 'rtmp://192.138.214.154/live/sjclive',
fallback: 'true',
width: '95%',
aspectratio: '16:9',
primary: 'flash',
mediaid:'XEgvuql4'
});
</script>
''', None, require_title=False),
{
'id': 'XEgvuql4',
'formats': [{
'url': 'rtmp://192.138.214.154/live/sjclive',
'ext': 'flv'
}]
})
# from https://www.pornoxo.com/videos/7564/striptease-from-sexy-secretary/
expect_dict(
self,
self.ie._extract_jwplayer_data(r'''
<script type="text/javascript">
jwplayer("mediaplayer").setup({
'videoid': "7564",
'width': "100%",
'aspectratio': "16:9",
'stretching': "exactfit",
'autostart': 'false',
'flashplayer': "https://t04.vipstreamservice.com/jwplayer/v5.10/player.swf",
'file': "https://cdn.pornoxo.com/key=MF+oEbaxqTKb50P-w9G3nA,end=1489689259,ip=104.199.146.27/ip=104.199.146.27/speed=6573765/buffer=3.0/2009-12/4b2157147afe5efa93ce1978e0265289c193874e02597.flv",
'image': "https://t03.vipstreamservice.com/thumbs/pxo-full/2009-12/14/a4b2157147afe5efa93ce1978e0265289c193874e02597.flv-full-13.jpg",
'filefallback': "https://cdn.pornoxo.com/key=9ZPsTR5EvPLQrBaak2MUGA,end=1489689259,ip=104.199.146.27/ip=104.199.146.27/speed=6573765/buffer=3.0/2009-12/m_4b2157147afe5efa93ce1978e0265289c193874e02597.mp4",
'logo.hide': true,
'skin': "https://t04.vipstreamservice.com/jwplayer/skin/modieus-blk.zip",
'plugins': "https://t04.vipstreamservice.com/jwplayer/dock/dockableskinnableplugin.swf",
'dockableskinnableplugin.piclink': "/index.php?key=ajax-videothumbsn&vid=7564&data=2009-12--14--4b2157147afe5efa93ce1978e0265289c193874e02597.flv--17370",
'controlbar': 'bottom',
'modes': [
{type: 'flash', src: 'https://t04.vipstreamservice.com/jwplayer/v5.10/player.swf'}
],
'provider': 'http'
});
//noinspection JSAnnotator
invideo.setup({
adsUrl: "/banner-iframe/?zoneId=32",
adsUrl2: "",
autostart: false
});
</script>
''', 'dummy', require_title=False),
{
'thumbnail': 'https://t03.vipstreamservice.com/thumbs/pxo-full/2009-12/14/a4b2157147afe5efa93ce1978e0265289c193874e02597.flv-full-13.jpg',
'formats': [{
'url': 'https://cdn.pornoxo.com/key=MF+oEbaxqTKb50P-w9G3nA,end=1489689259,ip=104.199.146.27/ip=104.199.146.27/speed=6573765/buffer=3.0/2009-12/4b2157147afe5efa93ce1978e0265289c193874e02597.flv',
'ext': 'flv'
}]
})
# from http://www.indiedb.com/games/king-machine/videos
expect_dict(
self,
self.ie._extract_jwplayer_data(r'''
<script>
jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/\/www.indiedb.com\/","displaytitle":false,"autostart":false,"repeat":false,"title":"king machine trailer 1","sharing":{"link":"http:\/\/www.indiedb.com\/games\/king-machine\/videos\/king-machine-trailer-1","code":"<iframe width=\"560\" height=\"315\" src=\"http:\/\/www.indiedb.com\/media\/iframe\/1522983\" frameborder=\"0\" allowfullscreen><\/iframe><br><a href=\"http:\/\/www.indiedb.com\/games\/king-machine\/videos\/king-machine-trailer-1\">king machine trailer 1 - Indie DB<\/a>"},"related":{"file":"http:\/\/rss.indiedb.com\/media\/recommended\/1522983\/feed\/rss.xml","dimensions":"160x120","onclick":"link"},"sources":[{"file":"http:\/\/cdn.dbolical.com\/cache\/videos\/games\/1\/50\/49678\/encode_mp4\/king-machine-trailer.mp4","label":"360p SD","default":"true"},{"file":"http:\/\/cdn.dbolical.com\/cache\/videos\/games\/1\/50\/49678\/encode720p_mp4\/king-machine-trailer.mp4","label":"720p HD"}],"image":"http:\/\/media.indiedb.com\/cache\/images\/games\/1\/50\/49678\/thumb_620x2000\/king-machine-trailer.mp4.jpg","advertising":{"client":"vast","tag":"http:\/\/ads.intergi.com\/adrawdata\/3.0\/5205\/4251742\/0\/1013\/ADTECH;cors=yes;width=560;height=315;referring_url=http:\/\/www.indiedb.com\/games\/king-machine\/videos\/king-machine-trailer-1;content_url=http:\/\/www.indiedb.com\/games\/king-machine\/videos\/king-machine-trailer-1;media_id=1522983;title=king+machine+trailer+1;device=__DEVICE__;model=__MODEL__;os=Windows+OS;osversion=__OSVERSION__;ua=__UA__;ip=109.171.17.81;uniqueid=1522983;tags=__TAGS__;number=58cac25928151;time=1489683033"},"width":620,"height":349}).once("play", function(event) {
videoAnalytics("play");
}).once("complete", function(event) {
videoAnalytics("completed");
});
</script>
''', 'dummy'),
{
'title': 'king machine trailer 1',
'thumbnail': 'http://media.indiedb.com/cache/images/games/1/50/49678/thumb_620x2000/king-machine-trailer.mp4.jpg',
'formats': [{
'url': 'http://cdn.dbolical.com/cache/videos/games/1/50/49678/encode_mp4/king-machine-trailer.mp4',
'height': 360,
'ext': 'mp4'
}, {
'url': 'http://cdn.dbolical.com/cache/videos/games/1/50/49678/encode720p_mp4/king-machine-trailer.mp4',
'height': 720,
'ext': 'mp4'
}]
})
if __name__ == '__main__':
unittest.main()

View File

@ -1872,6 +1872,7 @@ class YoutubeDL(object):
"""Download a given list of URLs."""
outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
if (len(url_list) > 1 and
outtmpl != '-' and
'%' not in outtmpl and
self.params.get('max_downloads') != 1):
raise SameFileError(outtmpl)

View File

@ -196,7 +196,7 @@ def _real_main(argv=None):
if opts.playlistend not in (-1, None) and opts.playlistend < opts.playliststart:
raise ValueError('Playlist end must be greater than playlist start')
if opts.extractaudio:
if opts.audioformat not in ['best', 'aac', 'mp3', 'm4a', 'opus', 'vorbis', 'wav']:
if opts.audioformat not in ['best', 'aac', 'flac', 'mp3', 'm4a', 'opus', 'vorbis', 'wav']:
parser.error('invalid audio format specified')
if opts.audioquality:
opts.audioquality = opts.audioquality.strip('k').strip('K')
@ -242,14 +242,11 @@ def _real_main(argv=None):
# PostProcessors
postprocessors = []
# Add the metadata pp first, the other pps will copy it
if opts.metafromtitle:
postprocessors.append({
'key': 'MetadataFromTitle',
'titleformat': opts.metafromtitle
})
if opts.addmetadata:
postprocessors.append({'key': 'FFmpegMetadata'})
if opts.extractaudio:
postprocessors.append({
'key': 'FFmpegExtractAudio',
@ -262,6 +259,16 @@ def _real_main(argv=None):
'key': 'FFmpegVideoConvertor',
'preferedformat': opts.recodevideo,
})
# FFmpegMetadataPP should be run after FFmpegVideoConvertorPP and
# FFmpegExtractAudioPP as containers before conversion may not support
# metadata (3gp, webm, etc.)
# And this post-processor should be placed before other metadata
# manipulating post-processors (FFmpegEmbedSubtitle) to prevent loss of
# extra metadata. By default ffmpeg preserves metadata applicable for both
# source and target containers. From this point the container won't change,
# so metadata can be added here.
if opts.addmetadata:
postprocessors.append({'key': 'FFmpegMetadata'})
if opts.convertsubtitles:
postprocessors.append({
'key': 'FFmpegSubtitlesConvertor',

View File

@ -1458,6 +1458,8 @@ class AdobePassIE(InfoExtractor):
self._downloader.cache.store(self._MVPD_CACHE, requestor_id, {})
count += 1
continue
if '<error' in authorize:
raise ExtractorError(xml_text(authorize, 'details'), expected=True)
authz_token = unescapeHTML(xml_text(authorize, 'authzToken'))
requestor_info[guid] = authz_token
self._downloader.cache.store(self._MVPD_CACHE, requestor_id, requestor_info)

View File

@ -21,10 +21,11 @@ class BellMediaIE(InfoExtractor):
animalplanet|
bravo|
mtv|
space
space|
etalk
)\.ca|
much\.com
)/.*?(?:\bvid=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
)/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0',
@ -58,6 +59,9 @@ class BellMediaIE(InfoExtractor):
}, {
'url': 'http://www.ctv.ca/DCs-Legends-of-Tomorrow/Video/S2E11-Turncoat-vid1051430',
'only_matching': True,
}, {
'url': 'http://www.etalk.ca/video?videoid=663455',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',
@ -65,6 +69,7 @@ class BellMediaIE(InfoExtractor):
'sciencechannel': 'discsci',
'investigationdiscovery': 'invdisc',
'animalplanet': 'aniplan',
'etalk': 'ctv',
}
def _real_extract(self, url):

View File

@ -0,0 +1,72 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
)
class BostonGlobeIE(InfoExtractor):
_VALID_URL = r'(?i)https?://(?:www\.)?bostonglobe\.com/.*/(?P<id>[^/]+)/\w+(?:\.html)?'
_TESTS = [
{
'url': 'http://www.bostonglobe.com/metro/2017/02/11/tree-finally-succumbs-disease-leaving-hole-neighborhood/h1b4lviqzMTIn9sVy8F3gP/story.html',
'md5': '0a62181079c85c2d2b618c9a738aedaf',
'info_dict': {
'title': 'A tree finally succumbs to disease, leaving a hole in a neighborhood',
'id': '5320421710001',
'ext': 'mp4',
'description': 'It arrived as a sapling when the Back Bay was in its infancy, a spindly American elm tamped down into a square of dirt cut into the brick sidewalk of 1880s Marlborough Street, no higher than the first bay window of the new brownstone behind it.',
'timestamp': 1486877593,
'upload_date': '20170212',
'uploader_id': '245991542',
},
},
{
# Embedded youtube video; we hand it off to the Generic extractor.
'url': 'https://www.bostonglobe.com/lifestyle/names/2017/02/17/does-ben-affleck-play-matt-damon-favorite-version-batman/ruqkc9VxKBYmh5txn1XhSI/story.html',
'md5': '582b40327089d5c0c949b3c54b13c24b',
'info_dict': {
'title': "Who Is Matt Damon's Favorite Batman?",
'id': 'ZW1QCnlA6Qc',
'ext': 'mp4',
'upload_date': '20170217',
'description': 'md5:3b3dccb9375867e0b4d527ed87d307cb',
'uploader': 'The Late Late Show with James Corden',
'uploader_id': 'TheLateLateShow',
},
'expected_warnings': ['404'],
},
]
def _real_extract(self, url):
page_id = self._match_id(url)
webpage = self._download_webpage(url, page_id)
page_title = self._og_search_title(webpage, default=None)
# <video data-brightcove-video-id="5320421710001" data-account="245991542" data-player="SJWAiyYWg" data-embed="default" class="video-js" controls itemscope itemtype="http://schema.org/VideoObject">
entries = []
for video in re.findall(r'(?i)(<video[^>]+>)', webpage):
attrs = extract_attributes(video)
video_id = attrs.get('data-brightcove-video-id')
account_id = attrs.get('data-account')
player_id = attrs.get('data-player')
embed = attrs.get('data-embed')
if video_id and account_id and player_id and embed:
entries.append(
'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
% (account_id, player_id, embed, video_id))
if len(entries) == 0:
return self.url_result(url, 'Generic')
elif len(entries) == 1:
return self.url_result(entries[0], 'BrightcoveNew')
else:
return self.playlist_from_matches(entries, page_id, page_title, ie='BrightcoveNew')

View File

@ -193,7 +193,13 @@ class BrightcoveLegacyIE(InfoExtractor):
if videoPlayer is not None:
if isinstance(videoPlayer, list):
videoPlayer = videoPlayer[0]
if not (videoPlayer.isdigit() or videoPlayer.startswith('ref:')):
videoPlayer = videoPlayer.strip()
# UUID is also possible for videoPlayer (e.g.
# http://www.popcornflix.com/hoodies-vs-hooligans/7f2d2b87-bbf2-4623-acfb-ea942b4f01dd
# or http://www8.hp.com/cn/zh/home.html)
if not (re.match(
r'^(?:\d+|[\da-fA-F]{8}-?[\da-fA-F]{4}-?[\da-fA-F]{4}-?[\da-fA-F]{4}-?[\da-fA-F]{12})$',
videoPlayer) or videoPlayer.startswith('ref:')):
return None
params['@videoPlayer'] = videoPlayer
linkBase = find_param('linkBaseURL')

View File

@ -4,62 +4,62 @@ import re
from .common import InfoExtractor
from ..utils import (
clean_html,
ExtractorError,
parse_filesize,
int_or_none,
parse_iso8601,
qualities,
unescapeHTML,
)
class Channel9IE(InfoExtractor):
'''
Common extractor for channel9.msdn.com.
The type of provided URL (video or playlist) is determined according to
meta Search.PageType from web page HTML rather than URL itself, as it is
not always possible to do.
'''
IE_DESC = 'Channel 9'
IE_NAME = 'channel9'
_VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'
_VALID_URL = r'https?://(?:www\.)?(?:channel9\.msdn\.com|s\.ch9\.ms)/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'
_TESTS = [{
'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
'md5': '32083d4eaf1946db6d454313f44510ca',
'info_dict': {
'id': 'Events/TechEd/Australia/2013/KOS002',
'ext': 'mp4',
'id': '6c413323-383a-49dc-88f9-a22800cab024',
'ext': 'wmv',
'title': 'Developer Kick-Off Session: Stuff We Love',
'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
'description': 'md5:b80bf9355a503c193aff7ec6cd5a7731',
'duration': 4576,
'thumbnail': r're:http://.*\.jpg',
'thumbnail': r're:https?://.*\.jpg',
'timestamp': 1377717420,
'upload_date': '20130828',
'session_code': 'KOS002',
'session_day': 'Day 1',
'session_room': 'Arena 1A',
'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug',
'Mads Kristensen'],
'session_speakers': ['Andrew Coates', 'Brady Gaster', 'Mads Kristensen', 'Ed Blankenship', 'Patrick Klug'],
},
}, {
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
'md5': 'dcf983ee6acd2088e7188c3cf79b46bc',
'info_dict': {
'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing',
'ext': 'mp4',
'id': 'fe8e435f-bb93-4e01-8e97-a28c01887024',
'ext': 'wmv',
'title': 'Self-service BI with Power BI - nuclear testing',
'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
'description': 'md5:2d17fec927fc91e9e17783b3ecc88f54',
'duration': 1540,
'thumbnail': r're:http://.*\.jpg',
'thumbnail': r're:https?://.*\.jpg',
'timestamp': 1386381991,
'upload_date': '20131207',
'authors': ['Mike Wilmot'],
},
}, {
# low quality mp4 is best
'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'info_dict': {
'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'id': '33ad69d2-6a4e-4172-83a1-a523013dec76',
'ext': 'mp4',
'title': 'Ranges for the Standard Library',
'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
'description': 'md5:9895e0a9fd80822d2f01c454b8f4a372',
'duration': 5646,
'thumbnail': r're:http://.*\.jpg',
'thumbnail': r're:https?://.*\.jpg',
'upload_date': '20150930',
'timestamp': 1443640735,
},
'params': {
'skip_download': True,
@ -70,7 +70,7 @@ class Channel9IE(InfoExtractor):
'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
'title': 'Channel 9',
},
'playlist_count': 2,
'playlist_mincount': 100,
}, {
'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
'only_matching': True,
@ -81,189 +81,6 @@ class Channel9IE(InfoExtractor):
_RSS_URL = 'http://channel9.msdn.com/%s/RSS'
def _formats_from_html(self, html):
FORMAT_REGEX = r'''
(?x)
<a\s+href="(?P<url>[^"]+)">(?P<quality>[^<]+)</a>\s*
<span\s+class="usage">\((?P<note>[^\)]+)\)</span>\s*
(?:<div\s+class="popup\s+rounded">\s*
<h3>File\s+size</h3>\s*(?P<filesize>.*?)\s*
</div>)? # File size part may be missing
'''
quality = qualities((
'MP3', 'MP4',
'Low Quality WMV', 'Low Quality MP4',
'Mid Quality WMV', 'Mid Quality MP4',
'High Quality WMV', 'High Quality MP4'))
formats = [{
'url': x.group('url'),
'format_id': x.group('quality'),
'format_note': x.group('note'),
'format': '%s (%s)' % (x.group('quality'), x.group('note')),
'filesize_approx': parse_filesize(x.group('filesize')),
'quality': quality(x.group('quality')),
'vcodec': 'none' if x.group('note') == 'Audio only' else None,
} for x in list(re.finditer(FORMAT_REGEX, html))]
self._sort_formats(formats)
return formats
def _extract_title(self, html):
title = self._html_search_meta('title', html, 'title')
if title is None:
title = self._og_search_title(html)
TITLE_SUFFIX = ' (Channel 9)'
if title is not None and title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)]
return title
def _extract_description(self, html):
DESCRIPTION_REGEX = r'''(?sx)
<div\s+class="entry-content">\s*
<div\s+id="entry-body">\s*
(?P<description>.+?)\s*
</div>\s*
</div>
'''
m = re.search(DESCRIPTION_REGEX, html)
if m is not None:
return m.group('description')
return self._html_search_meta('description', html, 'description')
def _extract_duration(self, html):
m = re.search(r'"length": *"(?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"', html)
return ((int(m.group('hours')) * 60 * 60) + (int(m.group('minutes')) * 60) + int(m.group('seconds'))) if m else None
def _extract_slides(self, html):
m = re.search(r'<a href="(?P<slidesurl>[^"]+)" class="slides">Slides</a>', html)
return m.group('slidesurl') if m is not None else None
def _extract_zip(self, html):
m = re.search(r'<a href="(?P<zipurl>[^"]+)" class="zip">Zip</a>', html)
return m.group('zipurl') if m is not None else None
def _extract_avg_rating(self, html):
m = re.search(r'<p class="avg-rating">Avg Rating: <span>(?P<avgrating>[^<]+)</span></p>', html)
return float(m.group('avgrating')) if m is not None else 0
def _extract_rating_count(self, html):
m = re.search(r'<div class="rating-count">\((?P<ratingcount>[^<]+)\)</div>', html)
return int(self._fix_count(m.group('ratingcount'))) if m is not None else 0
def _extract_view_count(self, html):
m = re.search(r'<li class="views">\s*<span class="count">(?P<viewcount>[^<]+)</span> Views\s*</li>', html)
return int(self._fix_count(m.group('viewcount'))) if m is not None else 0
def _extract_comment_count(self, html):
m = re.search(r'<li class="comments">\s*<a href="#comments">\s*<span class="count">(?P<commentcount>[^<]+)</span> Comments\s*</a>\s*</li>', html)
return int(self._fix_count(m.group('commentcount'))) if m is not None else 0
def _fix_count(self, count):
return int(str(count).replace(',', '')) if count is not None else None
def _extract_authors(self, html):
m = re.search(r'(?s)<li class="author">(.*?)</li>', html)
if m is None:
return None
return re.findall(r'<a href="/Niners/[^"]+">([^<]+)</a>', m.group(1))
def _extract_session_code(self, html):
m = re.search(r'<li class="code">\s*(?P<code>.+?)\s*</li>', html)
return m.group('code') if m is not None else None
def _extract_session_day(self, html):
m = re.search(r'<li class="day">\s*<a href="/Events/[^"]+">(?P<day>[^<]+)</a>\s*</li>', html)
return m.group('day').strip() if m is not None else None
def _extract_session_room(self, html):
m = re.search(r'<li class="room">\s*(?P<room>.+?)\s*</li>', html)
return m.group('room') if m is not None else None
def _extract_session_speakers(self, html):
return re.findall(r'<a href="/Events/Speakers/[^"]+">([^<]+)</a>', html)
def _extract_content(self, html, content_path):
# Look for downloadable content
formats = self._formats_from_html(html)
slides = self._extract_slides(html)
zip_ = self._extract_zip(html)
# Nothing to download
if len(formats) == 0 and slides is None and zip_ is None:
self._downloader.report_warning('None of recording, slides or zip are available for %s' % content_path)
return
# Extract meta
title = self._extract_title(html)
description = self._extract_description(html)
thumbnail = self._og_search_thumbnail(html)
duration = self._extract_duration(html)
avg_rating = self._extract_avg_rating(html)
rating_count = self._extract_rating_count(html)
view_count = self._extract_view_count(html)
comment_count = self._extract_comment_count(html)
common = {
'_type': 'video',
'id': content_path,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'avg_rating': avg_rating,
'rating_count': rating_count,
'view_count': view_count,
'comment_count': comment_count,
}
result = []
if slides is not None:
d = common.copy()
d.update({'title': title + '-Slides', 'url': slides})
result.append(d)
if zip_ is not None:
d = common.copy()
d.update({'title': title + '-Zip', 'url': zip_})
result.append(d)
if len(formats) > 0:
d = common.copy()
d.update({'title': title, 'formats': formats})
result.append(d)
return result
def _extract_entry_item(self, html, content_path):
contents = self._extract_content(html, content_path)
if contents is None:
return contents
if len(contents) > 1:
raise ExtractorError('Got more than one entry')
result = contents[0]
result['authors'] = self._extract_authors(html)
return result
def _extract_session(self, html, content_path):
contents = self._extract_content(html, content_path)
if contents is None:
return contents
session_meta = {
'session_code': self._extract_session_code(html),
'session_day': self._extract_session_day(html),
'session_room': self._extract_session_room(html),
'session_speakers': self._extract_session_speakers(html),
}
for content in contents:
content.update(session_meta)
return self.playlist_result(contents)
def _extract_list(self, video_id, rss_url=None):
if not rss_url:
rss_url = self._RSS_URL % video_id
@ -274,9 +91,7 @@ class Channel9IE(InfoExtractor):
return self.playlist_result(entries, video_id, title_text)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
content_path = mobj.group('contentpath')
rss = mobj.group('rss')
content_path, rss = re.match(self._VALID_URL, url).groups()
if rss:
return self._extract_list(content_path, url)
@ -284,17 +99,158 @@ class Channel9IE(InfoExtractor):
webpage = self._download_webpage(
url, content_path, 'Downloading web page')
page_type = self._search_regex(
r'<meta[^>]+name=(["\'])WT\.entryid\1[^>]+content=(["\'])(?P<pagetype>[^:]+).+?\2',
webpage, 'page type', default=None, group='pagetype')
if page_type:
if page_type == 'Entry': # Any 'item'-like page, may contain downloadable content
return self._extract_entry_item(webpage, content_path)
elif page_type == 'Session': # Event session page, may contain downloadable content
return self._extract_session(webpage, content_path)
elif page_type == 'Event':
return self._extract_list(content_path)
episode_data = self._search_regex(
r"data-episode='([^']+)'", webpage, 'episode data', default=None)
if episode_data:
episode_data = self._parse_json(unescapeHTML(
episode_data), content_path)
content_id = episode_data['contentId']
is_session = '/Sessions(' in episode_data['api']
content_url = 'https://channel9.msdn.com/odata' + episode_data['api']
if is_session:
content_url += '?$expand=Speakers'
else:
raise ExtractorError('Unexpected WT.entryid %s' % page_type, expected=True)
else: # Assuming list
content_url += '?$expand=Authors'
content_data = self._download_json(content_url, content_id)
title = content_data['Title']
QUALITIES = (
'mp3',
'wmv', 'mp4',
'wmv-low', 'mp4-low',
'wmv-mid', 'mp4-mid',
'wmv-high', 'mp4-high',
)
quality_key = qualities(QUALITIES)
def quality(quality_id, format_url):
return (len(QUALITIES) if '_Source.' in format_url
else quality_key(quality_id))
formats = []
urls = set()
SITE_QUALITIES = {
'MP3': 'mp3',
'MP4': 'mp4',
'Low Quality WMV': 'wmv-low',
'Low Quality MP4': 'mp4-low',
'Mid Quality WMV': 'wmv-mid',
'Mid Quality MP4': 'mp4-mid',
'High Quality WMV': 'wmv-high',
'High Quality MP4': 'mp4-high',
}
formats_select = self._search_regex(
r'(?s)<select[^>]+name=["\']format[^>]+>(.+?)</select', webpage,
'formats select', default=None)
if formats_select:
for mobj in re.finditer(
r'<option\b[^>]+\bvalue=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>\s*(?P<format>[^<]+?)\s*<',
formats_select):
format_url = mobj.group('url')
if format_url in urls:
continue
urls.add(format_url)
format_id = mobj.group('format')
quality_id = SITE_QUALITIES.get(format_id, format_id)
formats.append({
'url': format_url,
'format_id': quality_id,
'quality': quality(quality_id, format_url),
'vcodec': 'none' if quality_id == 'mp3' else None,
})
API_QUALITIES = {
'VideoMP4Low': 'mp4-low',
'VideoWMV': 'wmv-mid',
'VideoMP4Medium': 'mp4-mid',
'VideoMP4High': 'mp4-high',
'VideoWMVHQ': 'wmv-hq',
}
for format_id, q in API_QUALITIES.items():
q_url = content_data.get(format_id)
if not q_url or q_url in urls:
continue
urls.add(q_url)
formats.append({
'url': q_url,
'format_id': q,
'quality': quality(q, q_url),
})
self._sort_formats(formats)
slides = content_data.get('Slides')
zip_file = content_data.get('ZipFile')
if not formats and not slides and not zip_file:
raise ExtractorError(
'None of recording, slides or zip are available for %s' % content_path)
subtitles = {}
for caption in content_data.get('Captions', []):
caption_url = caption.get('Url')
if not caption_url:
continue
subtitles.setdefault(caption.get('Language', 'en'), []).append({
'url': caption_url,
'ext': 'vtt',
})
common = {
'id': content_id,
'title': title,
'description': clean_html(content_data.get('Description') or content_data.get('Body')),
'thumbnail': content_data.get('Thumbnail') or content_data.get('VideoPlayerPreviewImage'),
'duration': int_or_none(content_data.get('MediaLengthInSeconds')),
'timestamp': parse_iso8601(content_data.get('PublishedDate')),
'avg_rating': int_or_none(content_data.get('Rating')),
'rating_count': int_or_none(content_data.get('RatingCount')),
'view_count': int_or_none(content_data.get('Views')),
'comment_count': int_or_none(content_data.get('CommentCount')),
'subtitles': subtitles,
}
if is_session:
speakers = []
for s in content_data.get('Speakers', []):
speaker_name = s.get('FullName')
if not speaker_name:
continue
speakers.append(speaker_name)
common.update({
'session_code': content_data.get('Code'),
'session_room': content_data.get('Room'),
'session_speakers': speakers,
})
else:
authors = []
for a in content_data.get('Authors', []):
author_name = a.get('DisplayName')
if not author_name:
continue
authors.append(author_name)
common['authors'] = authors
contents = []
if slides:
d = common.copy()
d.update({'title': title + '-Slides', 'url': slides})
contents.append(d)
if zip_file:
d = common.copy()
d.update({'title': title + '-Zip', 'url': zip_file})
contents.append(d)
if formats:
d = common.copy()
d.update({'title': title, 'formats': formats})
contents.append(d)
return self.playlist_result(contents)
else:
return self._extract_list(content_path)

View File

@ -1,97 +1,56 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_HTTPError,
)
from ..utils import (
ExtractorError,
HEADRequest,
remove_end,
str_to_int,
unified_strdate,
)
class CloudyIE(InfoExtractor):
_IE_DESC = 'cloudy.ec'
_VALID_URL = r'''(?x)
https?://(?:www\.)?cloudy\.ec/
(?:v/|embed\.php\?id=)
(?P<id>[A-Za-z0-9]+)
'''
_EMBED_URL = 'http://www.cloudy.ec/embed.php?id=%s'
_API_URL = 'http://www.cloudy.ec/api/player.api.php'
_MAX_TRIES = 2
_TEST = {
_VALID_URL = r'https?://(?:www\.)?cloudy\.ec/(?:v/|embed\.php\?.*?\bid=)(?P<id>[A-Za-z0-9]+)'
_TESTS = [{
'url': 'https://www.cloudy.ec/v/af511e2527aac',
'md5': '5cb253ace826a42f35b4740539bedf07',
'md5': '29832b05028ead1b58be86bf319397ca',
'info_dict': {
'id': 'af511e2527aac',
'ext': 'flv',
'ext': 'mp4',
'title': 'Funny Cats and Animals Compilation june 2013',
'upload_date': '20130913',
'view_count': int,
}
}
def _extract_video(self, video_id, file_key, error_url=None, try_num=0):
if try_num > self._MAX_TRIES - 1:
raise ExtractorError('Unable to extract video URL', expected=True)
form = {
'file': video_id,
'key': file_key,
}
if error_url:
form.update({
'numOfErrors': try_num,
'errorCode': '404',
'errorUrl': error_url,
})
player_data = self._download_webpage(
self._API_URL, video_id, 'Downloading player data', query=form)
data = compat_parse_qs(player_data)
try_num += 1
if 'error' in data:
raise ExtractorError(
'%s error: %s' % (self.IE_NAME, ' '.join(data['error_msg'])),
expected=True)
title = data.get('title', [None])[0]
if title:
title = remove_end(title, '&asdasdas').strip()
video_url = data.get('url', [None])[0]
if video_url:
try:
self._request_webpage(HEADRequest(video_url), video_id, 'Checking video URL')
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in [404, 410]:
self.report_warning('Invalid video URL, requesting another', video_id)
return self._extract_video(video_id, file_key, video_url, try_num)
return {
'id': video_id,
'url': video_url,
'title': title,
}
}, {
'url': 'http://www.cloudy.ec/embed.php?autoplay=1&id=af511e2527aac',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)
url = self._EMBED_URL % video_id
webpage = self._download_webpage(url, video_id)
webpage = self._download_webpage(
'http://www.cloudy.ec/embed.php?id=%s' % video_id, video_id)
file_key = self._search_regex(
[r'key\s*:\s*"([^"]+)"', r'filekey\s*=\s*"([^"]+)"'],
webpage, 'file_key')
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
return self._extract_video(video_id, file_key)
webpage = self._download_webpage(
'https://www.cloudy.ec/v/%s' % video_id, video_id, fatal=False)
if webpage:
info.update({
'title': self._search_regex(
r'<h\d[^>]*>([^<]+)<', webpage, 'title'),
'upload_date': unified_strdate(self._search_regex(
r'>Published at (\d{4}-\d{1,2}-\d{1,2})', webpage,
'upload date', fatal=False)),
'view_count': str_to_int(self._search_regex(
r'([\d,.]+) views<', webpage, 'view count', fatal=False)),
})
if not info.get('title'):
info['title'] = video_id
info['id'] = video_id
return info

View File

@ -36,34 +36,35 @@ from ..utils import (
clean_html,
compiled_regex_type,
determine_ext,
determine_protocol,
error_to_compat_str,
ExtractorError,
extract_attributes,
fix_xml_ampersands,
float_or_none,
GeoRestrictedError,
GeoUtils,
int_or_none,
js_to_json,
mimetype2ext,
orderedSet,
parse_codecs,
parse_duration,
parse_iso8601,
parse_m3u8_attributes,
RegexNotFoundError,
sanitize_filename,
sanitized_Request,
sanitize_filename,
unescapeHTML,
unified_strdate,
unified_timestamp,
update_Request,
update_url_query,
urljoin,
url_basename,
xpath_element,
xpath_text,
xpath_with_ns,
determine_protocol,
parse_duration,
mimetype2ext,
update_Request,
update_url_query,
parse_m3u8_attributes,
extract_attributes,
parse_codecs,
urljoin,
)
@ -714,6 +715,13 @@ class InfoExtractor(object):
video_info['title'] = video_title
return video_info
def playlist_from_matches(self, matches, video_id, video_title, getter=None, ie=None):
urlrs = orderedSet(
self.url_result(self._proto_relative_url(getter(m) if getter else m), ie)
for m in matches)
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
@staticmethod
def playlist_result(entries, playlist_id=None, playlist_title=None, playlist_description=None):
"""Returns a playlist"""
@ -2247,6 +2255,9 @@ class InfoExtractor(object):
elif ext == 'mpd':
formats.extend(self._extract_mpd_formats(
source_url, video_id, mpd_id=mpd_id, fatal=False))
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
source_url, video_id, fatal=False))
# https://github.com/jwplayer/jwplayer/blob/master/src/js/providers/default.js#L67
elif source_type.startswith('audio') or ext in (
'oga', 'aac', 'mp3', 'mpeg', 'vorbis'):

View File

@ -9,13 +9,14 @@ from ..compat import (
compat_urlparse,
)
from ..utils import (
orderedSet,
remove_end,
extract_attributes,
mimetype2ext,
determine_ext,
extract_attributes,
int_or_none,
js_to_json,
mimetype2ext,
orderedSet,
parse_iso8601,
remove_end,
)
@ -66,6 +67,16 @@ class CondeNastIE(InfoExtractor):
'upload_date': '20130314',
'timestamp': 1363219200,
}
}, {
'url': 'http://video.gq.com/watch/the-closer-with-keith-olbermann-the-only-true-surprise-trump-s-an-idiot?c=series',
'info_dict': {
'id': '58d1865bfd2e6126e2000015',
'ext': 'mp4',
'title': 'The Only True Surprise? Trumps an Idiot',
'uploader': 'gq',
'upload_date': '20170321',
'timestamp': 1490126427,
},
}, {
# JS embed
'url': 'http://player.cnevids.com/embedjs/55f9cf8b61646d1acf00000c/5511d76261646d5566020000.js',
@ -114,26 +125,33 @@ class CondeNastIE(InfoExtractor):
})
video_id = query['videoId']
video_info = None
info_page = self._download_webpage(
info_page = self._download_json(
'http://player.cnevids.com/player/video.js',
video_id, 'Downloading video info', query=query, fatal=False)
video_id, 'Downloading video info', fatal=False, query=query)
if info_page:
video_info = self._parse_json(self._search_regex(
r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
else:
video_info = info_page.get('video')
if not video_info:
info_page = self._download_webpage(
'http://player.cnevids.com/player/loader.js',
video_id, 'Downloading loader info', query=query)
video_info = self._parse_json(self._search_regex(
r'var\s+video\s*=\s*({.+?});', info_page, 'video info'), video_id)
video_info = self._parse_json(
self._search_regex(
r'(?s)var\s+config\s*=\s*({.+?});', info_page, 'config'),
video_id, transform_source=js_to_json)['video']
title = video_info['title']
formats = []
for fdata in video_info.get('sources', [{}])[0]:
for fdata in video_info['sources']:
src = fdata.get('src')
if not src:
continue
ext = mimetype2ext(fdata.get('type')) or determine_ext(src)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
quality = fdata.get('quality')
formats.append({
'format_id': ext + ('-%s' % quality if quality else ''),
@ -169,7 +187,6 @@ class CondeNastIE(InfoExtractor):
path=remove_end(parsed_url.path, '.js').replace('/embedjs/', '/embed/')))
url_type = 'embed'
self.to_screen('Extracting from %s with the Condé Nast extractor' % self._SITES[site])
webpage = self._download_webpage(url, item_id)
if url_type == 'series':

View File

@ -177,6 +177,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
'uploader': 'Kadokawa Pictures Inc.',
'upload_date': '20170118',
'series': "KONOSUBA -God's blessing on this wonderful world!",
'season': "KONOSUBA -God's blessing on this wonderful world! 2",
'season_number': 2,
'episode': 'Give Me Deliverance from this Judicial Injustice!',
'episode_number': 1,
@ -222,6 +223,23 @@ class CrunchyrollIE(CrunchyrollBaseIE):
# just test metadata extraction
'skip_download': True,
},
}, {
# A video with a vastly different season name compared to the series name
'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532',
'info_dict': {
'id': '590532',
'ext': 'mp4',
'title': 'Haiyoru! Nyaruani (ONA) Episode 1 Test',
'description': 'Mahiro and Nyaruko talk about official certification.',
'uploader': 'TV TOKYO',
'upload_date': '20120305',
'series': 'Nyarko-san: Another Crawling Chaos',
'season': 'Haiyoru! Nyaruani (ONA)',
},
'params': {
# Just test metadata extraction
'skip_download': True,
},
}]
_FORMAT_IDS = {
@ -491,7 +509,8 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
# webpage provide more accurate data than series_title from XML
series = self._html_search_regex(
r'id=["\']showmedia_about_episode_num[^>]+>\s*<a[^>]+>([^<]+)',
webpage, 'series', default=xpath_text(metadata, 'series_title'))
webpage, 'series', fatal=False)
season = xpath_text(metadata, 'series_title')
episode = xpath_text(metadata, 'episode_title')
episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
@ -508,6 +527,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'uploader': video_uploader,
'upload_date': video_upload_date,
'series': series,
'season': season,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,

View File

@ -1,17 +1,21 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
extract_attributes,
ExtractorError,
int_or_none,
parse_age_limit,
ExtractorError,
remove_end,
unescapeHTML,
)
class DiscoveryGoIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:
class DiscoveryGoBaseIE(InfoExtractor):
_VALID_URL_TEMPLATE = r'''(?x)https?://(?:www\.)?(?:
discovery|
investigationdiscovery|
discoverylife|
@ -21,18 +25,23 @@ class DiscoveryGoIE(InfoExtractor):
sciencechannel|
tlc|
velocitychannel
)go\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'''
)go\.com/%s(?P<id>[^/?#&]+)'''
class DiscoveryGoIE(DiscoveryGoBaseIE):
_VALID_URL = DiscoveryGoBaseIE._VALID_URL_TEMPLATE % r'(?:[^/]+/)+'
_GEO_COUNTRIES = ['US']
_TEST = {
'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
'url': 'https://www.discoverygo.com/bering-sea-gold/reaper-madness/',
'info_dict': {
'id': '57a33c536b66d1cd0345eeb1',
'id': '58c167d86b66d12f2addeb01',
'ext': 'mp4',
'title': 'Kiss First, Ask Questions Later!',
'description': 'md5:fe923ba34050eae468bffae10831cb22',
'duration': 2579,
'series': 'Love at First Kiss',
'season_number': 1,
'episode_number': 1,
'title': 'Reaper Madness',
'description': 'md5:09f2c625c99afb8946ed4fb7865f6e78',
'duration': 2519,
'series': 'Bering Sea Gold',
'season_number': 8,
'episode_number': 6,
'age_limit': 14,
},
}
@ -113,3 +122,46 @@ class DiscoveryGoIE(InfoExtractor):
'formats': formats,
'subtitles': subtitles,
}
class DiscoveryGoPlaylistIE(DiscoveryGoBaseIE):
_VALID_URL = DiscoveryGoBaseIE._VALID_URL_TEMPLATE % ''
_TEST = {
'url': 'https://www.discoverygo.com/bering-sea-gold/',
'info_dict': {
'id': 'bering-sea-gold',
'title': 'Bering Sea Gold',
'description': 'md5:cc5c6489835949043c0cc3ad66c2fa0e',
},
'playlist_mincount': 6,
}
@classmethod
def suitable(cls, url):
return False if DiscoveryGoIE.suitable(url) else super(
DiscoveryGoPlaylistIE, cls).suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
entries = []
for mobj in re.finditer(r'data-json=(["\'])(?P<json>{.+?})\1', webpage):
data = self._parse_json(
mobj.group('json'), display_id,
transform_source=unescapeHTML, fatal=False)
if not isinstance(data, dict) or data.get('type') != 'episode':
continue
episode_url = data.get('socialUrl')
if not episode_url:
continue
entries.append(self.url_result(
episode_url, ie=DiscoveryGoIE.ie_key(),
video_id=data.get('id')))
return self.playlist_result(
entries, display_id,
remove_end(self._og_search_title(
webpage, fatal=False), ' | Discovery GO'),
self._og_search_description(webpage))

View File

@ -9,13 +9,13 @@ from ..compat import (
compat_parse_qs,
compat_urlparse,
)
from ..utils import smuggle_url
class TlcDeIE(InfoExtractor):
IE_NAME = 'tlc.de'
_VALID_URL = r'https?://(?:www\.)?tlc\.de/(?:[^/]+/)*videos/(?P<title>[^/?#]+)?(?:.*#(?P<id>\d+))?'
class DiscoveryNetworksDeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:discovery|tlc|animalplanet|dmax)\.de/(?:.*#(?P<id>\d+)|(?:[^/]+/)*videos/(?P<title>[^/?#]+))'
_TEST = {
_TESTS = [{
'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001',
'info_dict': {
'id': '3235167922001',
@ -29,7 +29,13 @@ class TlcDeIE(InfoExtractor):
'upload_date': '20140404',
'uploader_id': '1659832546',
},
}
}, {
'url': 'http://www.dmax.de/programme/storage-hunters-uk/videos/storage-hunters-uk-episode-6/',
'only_matching': True,
}, {
'url': 'http://www.discovery.de/#5332316765001',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1659832546/default_default/index.html?videoId=%s'
def _real_extract(self, url):
@ -39,5 +45,8 @@ class TlcDeIE(InfoExtractor):
title = mobj.group('title')
webpage = self._download_webpage(url, title)
brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
brightcove_id = compat_parse_qs(compat_urlparse.urlparse(brightcove_legacy_url).query)['@videoPlayer'][0]
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
brightcove_id = compat_parse_qs(compat_urlparse.urlparse(
brightcove_legacy_url).query)['@videoPlayer'][0]
return self.url_result(smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {'geo_countries': ['DE']}),
'BrightcoveNew', brightcove_id)

View File

@ -6,37 +6,24 @@ import re
import time
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..compat import (
compat_urlparse,
compat_HTTPError,
)
from ..utils import (
USER_AGENTS,
ExtractorError,
int_or_none,
unified_strdate,
remove_end,
update_url_query,
)
class DPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?P<domain>www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{
# geo restricted, via direct unsigned hls URL
'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
'info_dict': {
'id': '1255600',
'display_id': 'stagione-1-episodio-25',
'ext': 'mp4',
'title': 'Episodio 25',
'description': 'md5:cae5f40ad988811b197d2d27a53227eb',
'duration': 2761,
'timestamp': 1454701800,
'upload_date': '20160205',
'creator': 'RTIT',
'series': 'Take me out',
'season_number': 1,
'episode_number': 25,
'age_limit': 0,
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
# non geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'info_dict': {
@ -168,3 +155,90 @@ class DPlayIE(InfoExtractor):
'formats': formats,
'subtitles': subtitles,
}
class DPlayItIE(InfoExtractor):
_VALID_URL = r'https?://it\.dplay\.com/[^/]+/[^/]+/(?P<id>[^/?#]+)'
_GEO_COUNTRIES = ['IT']
_TEST = {
'url': 'http://it.dplay.com/nove/biografie-imbarazzanti/luigi-di-maio-la-psicosi-di-stanislawskij/',
'md5': '2b808ffb00fc47b884a172ca5d13053c',
'info_dict': {
'id': '6918',
'display_id': 'luigi-di-maio-la-psicosi-di-stanislawskij',
'ext': 'mp4',
'title': 'Biografie imbarazzanti: Luigi Di Maio: la psicosi di Stanislawskij',
'description': 'md5:3c7a4303aef85868f867a26f5cc14813',
'thumbnail': r're:^https?://.*\.jpe?g',
'upload_date': '20160524',
'series': 'Biografie imbarazzanti',
'season_number': 1,
'episode': 'Luigi Di Maio: la psicosi di Stanislawskij',
'episode_number': 1,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
info_url = self._search_regex(
r'url\s*:\s*["\']((?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)',
webpage, 'video id')
title = remove_end(self._og_search_title(webpage), ' | Dplay')
try:
info = self._download_json(
info_url, display_id, headers={
'Authorization': 'Bearer %s' % self._get_cookies(url).get(
'dplayit_token').value,
'Referer': url,
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403):
info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
error = info['errors'][0]
if error.get('code') == 'access.denied.geoblocked':
self.raise_geo_restricted(
msg=error.get('detail'), countries=self._GEO_COUNTRIES)
raise ExtractorError(info['errors'][0]['detail'], expected=True)
raise
hls_url = info['data']['attributes']['streaming']['hls']['url']
formats = self._extract_m3u8_formats(
hls_url, display_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
series = self._html_search_regex(
r'(?s)<h1[^>]+class=["\'].*?\bshow_title\b.*?["\'][^>]*>(.+?)</h1>',
webpage, 'series', fatal=False)
episode = self._search_regex(
r'<p[^>]+class=["\'].*?\bdesc_ep\b.*?["\'][^>]*>\s*<br/>\s*<b>([^<]+)',
webpage, 'episode', fatal=False)
mobj = re.search(
r'(?s)<span[^>]+class=["\']dates["\'][^>]*>.+?\bS\.(?P<season_number>\d+)\s+E\.(?P<episode_number>\d+)\s*-\s*(?P<upload_date>\d{2}/\d{2}/\d{4})',
webpage)
if mobj:
season_number = int(mobj.group('season_number'))
episode_number = int(mobj.group('episode_number'))
upload_date = unified_strdate(mobj.group('upload_date'))
else:
season_number = episode_number = upload_date = None
return {
'id': info_url.rpartition('/')[-1],
'display_id': display_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage),
'series': series,
'season_number': season_number,
'episode': episode,
'episode_number': episode_number,
'upload_date': upload_date,
'formats': formats,
}

View File

@ -117,6 +117,7 @@ from .bleacherreport import (
from .blinkx import BlinkxIE
from .bloomberg import BloombergIE
from .bokecc import BokeCCIE
from .bostonglobe import BostonGlobeIE
from .bpb import BpbIE
from .br import BRIE
from .bravotv import BravoTVIE
@ -246,7 +247,10 @@ from .dfb import DFBIE
from .dhm import DHMIE
from .dotsub import DotsubIE
from .douyutv import DouyuTVIE
from .dplay import DPlayIE
from .dplay import (
DPlayIE,
DPlayItIE,
)
from .dramafever import (
DramaFeverIE,
DramaFeverSeriesIE,
@ -262,7 +266,11 @@ from .dvtv import DVTVIE
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .discoverygo import DiscoveryGoIE
from .discoverygo import (
DiscoveryGoIE,
DiscoveryGoPlaylistIE,
)
from .discoverynetworks import DiscoveryNetworksDeIE
from .disney import DisneyIE
from .dispeak import DigitallySpeakingIE
from .dropbox import DropboxIE
@ -967,7 +975,6 @@ from .thisav import ThisAVIE
from .thisoldhouse import ThisOldHouseIE
from .threeqsdn import ThreeQSDNIE
from .tinypic import TinyPicIE
from .tlc import TlcDeIE
from .tmz import (
TMZIE,
TMZArticleIE,
@ -980,6 +987,7 @@ from .tnaflix import (
)
from .toggle import ToggleIE
from .tonline import TOnlineIE
from .toongoggles import ToonGogglesIE
from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE
@ -1168,6 +1176,7 @@ from .voxmedia import VoxMediaIE
from .vporn import VpornIE
from .vrt import VRTIE
from .vrak import VrakIE
from .medialaan import MedialaanIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vvvvid import VVVVIDIE

View File

@ -196,6 +196,10 @@ class FacebookIE(InfoExtractor):
}, {
'url': 'https://www.facebookcorewwwi.onion/video.php?v=274175099429670',
'only_matching': True,
}, {
# no title
'url': 'https://www.facebook.com/onlycleverentertainment/videos/1947995502095005/',
'only_matching': True,
}]
@staticmethod
@ -353,15 +357,15 @@ class FacebookIE(InfoExtractor):
self._sort_formats(formats)
video_title = self._html_search_regex(
r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage, 'title',
default=None)
r'<h2\s+[^>]*class="uiHeaderTitle"[^>]*>([^<]*)</h2>', webpage,
'title', default=None)
if not video_title:
video_title = self._html_search_regex(
r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>',
webpage, 'alternative title', default=None)
if not video_title:
video_title = self._html_search_meta(
'description', webpage, 'title')
'description', webpage, 'title', default=None)
if video_title:
video_title = limit_length(video_title, 80)
else:

View File

@ -449,6 +449,23 @@ class GenericIE(InfoExtractor):
},
}],
},
{
# Brightcove with UUID in videoPlayer
'url': 'http://www8.hp.com/cn/zh/home.html',
'info_dict': {
'id': '5255815316001',
'ext': 'mp4',
'title': 'Sprocket Video - China',
'description': 'Sprocket Video - China',
'uploader': 'HP-Video Gallery',
'timestamp': 1482263210,
'upload_date': '20161220',
'uploader_id': '1107601872001',
},
'params': {
'skip_download': True, # m3u8 download
},
},
# ooyala video
{
'url': 'http://www.rollingstone.com/music/videos/norwegian-dj-cashmere-cat-goes-spartan-on-with-me-premiere-20131219',
@ -1525,6 +1542,17 @@ class GenericIE(InfoExtractor):
'url': 'http://www.golfchannel.com/topics/shows/golftalkcentral.htm',
'only_matching': True,
},
{
# Senate ISVP iframe https
'url': 'https://www.hsgac.senate.gov/hearings/canadas-fast-track-refugee-plan-unanswered-questions-and-implications-for-us-national-security',
'md5': 'fb8c70b0b515e5037981a2492099aab8',
'info_dict': {
'id': 'govtaff020316',
'ext': 'mp4',
'title': 'Integrated Senate Video Player',
},
'add_ie': [SenateISVPIE.ie_key()],
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@ -1824,14 +1852,6 @@ class GenericIE(InfoExtractor):
video_description = self._og_search_description(webpage, default=None)
video_thumbnail = self._og_search_thumbnail(webpage, default=None)
# Helper method
def _playlist_from_matches(matches, getter=None, ie=None):
urlrs = orderedSet(
self.url_result(self._proto_relative_url(getter(m) if getter else m), ie)
for m in matches)
return self.playlist_result(
urlrs, playlist_id=video_id, playlist_title=video_title)
# Look for Brightcove Legacy Studio embeds
bc_urls = BrightcoveLegacyIE._extract_brightcove_urls(webpage)
if bc_urls:
@ -1852,28 +1872,28 @@ class GenericIE(InfoExtractor):
# Look for Brightcove New Studio embeds
bc_urls = BrightcoveNewIE._extract_urls(webpage)
if bc_urls:
return _playlist_from_matches(bc_urls, ie='BrightcoveNew')
return self.playlist_from_matches(bc_urls, video_id, video_title, ie='BrightcoveNew')
# Look for ThePlatform embeds
tp_urls = ThePlatformIE._extract_urls(webpage)
if tp_urls:
return _playlist_from_matches(tp_urls, ie='ThePlatform')
return self.playlist_from_matches(tp_urls, video_id, video_title, ie='ThePlatform')
# Look for Vessel embeds
vessel_urls = VesselIE._extract_urls(webpage)
if vessel_urls:
return _playlist_from_matches(vessel_urls, ie=VesselIE.ie_key())
return self.playlist_from_matches(vessel_urls, video_id, video_title, ie=VesselIE.ie_key())
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"',
webpage)
if matches:
return _playlist_from_matches(matches, ie='RtlNl')
return self.playlist_from_matches(matches, video_id, video_title, ie='RtlNl')
vimeo_urls = VimeoIE._extract_urls(url, webpage)
if vimeo_urls:
return _playlist_from_matches(vimeo_urls, ie=VimeoIE.ie_key())
return self.playlist_from_matches(vimeo_urls, video_id, video_title, ie=VimeoIE.ie_key())
vid_me_embed_url = self._search_regex(
r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]',
@ -1895,25 +1915,25 @@ class GenericIE(InfoExtractor):
(?:embed|v|p)/.+?)
\1''', webpage)
if matches:
return _playlist_from_matches(
matches, lambda m: unescapeHTML(m[1]))
return self.playlist_from_matches(
matches, video_id, video_title, lambda m: unescapeHTML(m[1]))
# Look for lazyYT YouTube embed
matches = re.findall(
r'class="lazyYT" data-youtube-id="([^"]+)"', webpage)
if matches:
return _playlist_from_matches(matches, lambda m: unescapeHTML(m))
return self.playlist_from_matches(matches, video_id, video_title, lambda m: unescapeHTML(m))
# Look for Wordpress "YouTube Video Importer" plugin
matches = re.findall(r'''(?x)<div[^>]+
class=(?P<q1>[\'"])[^\'"]*\byvii_single_video_player\b[^\'"]*(?P=q1)[^>]+
data-video_id=(?P<q2>[\'"])([^\'"]+)(?P=q2)''', webpage)
if matches:
return _playlist_from_matches(matches, lambda m: m[-1])
return self.playlist_from_matches(matches, video_id, video_title, lambda m: m[-1])
matches = DailymotionIE._extract_urls(webpage)
if matches:
return _playlist_from_matches(matches)
return self.playlist_from_matches(matches, video_id, video_title)
# Look for embedded Dailymotion playlist player (#3822)
m = re.search(
@ -1922,8 +1942,8 @@ class GenericIE(InfoExtractor):
playlists = re.findall(
r'list\[\]=/playlist/([^/]+)/', unescapeHTML(m.group('url')))
if playlists:
return _playlist_from_matches(
playlists, lambda p: '//dailymotion.com/playlist/%s' % p)
return self.playlist_from_matches(
playlists, video_id, video_title, lambda p: '//dailymotion.com/playlist/%s' % p)
# Look for embedded Wistia player
match = re.search(
@ -2030,8 +2050,9 @@ class GenericIE(InfoExtractor):
if mobj is not None:
embeds = self._parse_json(mobj.group(1), video_id, fatal=False)
if embeds:
return _playlist_from_matches(
embeds, getter=lambda v: OoyalaIE._url_for_embed_code(smuggle_url(v['provider_video_id'], {'domain': url})), ie='Ooyala')
return self.playlist_from_matches(
embeds, video_id, video_title,
getter=lambda v: OoyalaIE._url_for_embed_code(smuggle_url(v['provider_video_id'], {'domain': url})), ie='Ooyala')
# Look for Aparat videos
mobj = re.search(r'<iframe .*?src="(http://www\.aparat\.com/video/[^"]+)"', webpage)
@ -2093,13 +2114,13 @@ class GenericIE(InfoExtractor):
# Look for funnyordie embed
matches = re.findall(r'<iframe[^>]+?src="(https?://(?:www\.)?funnyordie\.com/embed/[^"]+)"', webpage)
if matches:
return _playlist_from_matches(
matches, getter=unescapeHTML, ie='FunnyOrDie')
return self.playlist_from_matches(
matches, video_id, video_title, getter=unescapeHTML, ie='FunnyOrDie')
# Look for BBC iPlayer embed
matches = re.findall(r'setPlaylist\("(https?://www\.bbc\.co\.uk/iplayer/[^/]+/[\da-z]{8})"\)', webpage)
if matches:
return _playlist_from_matches(matches, ie='BBCCoUk')
return self.playlist_from_matches(matches, video_id, video_title, ie='BBCCoUk')
# Look for embedded RUTV player
rutv_url = RUTVIE._extract_url(webpage)
@ -2114,32 +2135,32 @@ class GenericIE(InfoExtractor):
# Look for embedded SportBox player
sportbox_urls = SportBoxEmbedIE._extract_urls(webpage)
if sportbox_urls:
return _playlist_from_matches(sportbox_urls, ie='SportBoxEmbed')
return self.playlist_from_matches(sportbox_urls, video_id, video_title, ie='SportBoxEmbed')
# Look for embedded XHamster player
xhamster_urls = XHamsterEmbedIE._extract_urls(webpage)
if xhamster_urls:
return _playlist_from_matches(xhamster_urls, ie='XHamsterEmbed')
return self.playlist_from_matches(xhamster_urls, video_id, video_title, ie='XHamsterEmbed')
# Look for embedded TNAFlixNetwork player
tnaflix_urls = TNAFlixNetworkEmbedIE._extract_urls(webpage)
if tnaflix_urls:
return _playlist_from_matches(tnaflix_urls, ie=TNAFlixNetworkEmbedIE.ie_key())
return self.playlist_from_matches(tnaflix_urls, video_id, video_title, ie=TNAFlixNetworkEmbedIE.ie_key())
# Look for embedded PornHub player
pornhub_urls = PornHubIE._extract_urls(webpage)
if pornhub_urls:
return _playlist_from_matches(pornhub_urls, ie=PornHubIE.ie_key())
return self.playlist_from_matches(pornhub_urls, video_id, video_title, ie=PornHubIE.ie_key())
# Look for embedded DrTuber player
drtuber_urls = DrTuberIE._extract_urls(webpage)
if drtuber_urls:
return _playlist_from_matches(drtuber_urls, ie=DrTuberIE.ie_key())
return self.playlist_from_matches(drtuber_urls, video_id, video_title, ie=DrTuberIE.ie_key())
# Look for embedded RedTube player
redtube_urls = RedTubeIE._extract_urls(webpage)
if redtube_urls:
return _playlist_from_matches(redtube_urls, ie=RedTubeIE.ie_key())
return self.playlist_from_matches(redtube_urls, video_id, video_title, ie=RedTubeIE.ie_key())
# Look for embedded Tvigle player
mobj = re.search(
@ -2185,12 +2206,12 @@ class GenericIE(InfoExtractor):
# Look for embedded soundcloud player
soundcloud_urls = SoundcloudIE._extract_urls(webpage)
if soundcloud_urls:
return _playlist_from_matches(soundcloud_urls, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
return self.playlist_from_matches(soundcloud_urls, video_id, video_title, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
# Look for tunein player
tunein_urls = TuneInBaseIE._extract_urls(webpage)
if tunein_urls:
return _playlist_from_matches(tunein_urls)
return self.playlist_from_matches(tunein_urls, video_id, video_title)
# Look for embedded mtvservices player
mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage)
@ -2473,35 +2494,35 @@ class GenericIE(InfoExtractor):
# Look for DBTV embeds
dbtv_urls = DBTVIE._extract_urls(webpage)
if dbtv_urls:
return _playlist_from_matches(dbtv_urls, ie=DBTVIE.ie_key())
return self.playlist_from_matches(dbtv_urls, video_id, video_title, ie=DBTVIE.ie_key())
# Look for Videa embeds
videa_urls = VideaIE._extract_urls(webpage)
if videa_urls:
return _playlist_from_matches(videa_urls, ie=VideaIE.ie_key())
return self.playlist_from_matches(videa_urls, video_id, video_title, ie=VideaIE.ie_key())
# Look for 20 minuten embeds
twentymin_urls = TwentyMinutenIE._extract_urls(webpage)
if twentymin_urls:
return _playlist_from_matches(
twentymin_urls, ie=TwentyMinutenIE.ie_key())
return self.playlist_from_matches(
twentymin_urls, video_id, video_title, ie=TwentyMinutenIE.ie_key())
# Look for Openload embeds
openload_urls = OpenloadIE._extract_urls(webpage)
if openload_urls:
return _playlist_from_matches(
openload_urls, ie=OpenloadIE.ie_key())
return self.playlist_from_matches(
openload_urls, video_id, video_title, ie=OpenloadIE.ie_key())
# Look for VideoPress embeds
videopress_urls = VideoPressIE._extract_urls(webpage)
if videopress_urls:
return _playlist_from_matches(
videopress_urls, ie=VideoPressIE.ie_key())
return self.playlist_from_matches(
videopress_urls, video_id, video_title, ie=VideoPressIE.ie_key())
# Look for Rutube embeds
rutube_urls = RutubeIE._extract_urls(webpage)
if rutube_urls:
return _playlist_from_matches(
return self.playlist_from_matches(
rutube_urls, ie=RutubeIE.ie_key())
# Looking for http://schema.org/VideoObject
@ -2533,7 +2554,11 @@ class GenericIE(InfoExtractor):
try:
jwplayer_data = self._parse_json(
jwplayer_data_str, video_id, transform_source=js_to_json)
return self._parse_jwplayer_data(jwplayer_data, video_id)
info = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False)
if not info.get('title'):
info['title'] = video_title
return info
except ExtractorError:
pass

View File

@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
xpath_text,
xpath_element,
@ -14,14 +15,26 @@ from ..utils import (
class HBOBaseIE(InfoExtractor):
_FORMATS_INFO = {
'pro7': {
'width': 1280,
'height': 720,
},
'1920': {
'width': 1280,
'height': 720,
},
'pro6': {
'width': 768,
'height': 432,
},
'640': {
'width': 768,
'height': 432,
},
'pro5': {
'width': 640,
'height': 360,
},
'highwifi': {
'width': 640,
'height': 360,
@ -78,6 +91,17 @@ class HBOBaseIE(InfoExtractor):
formats.extend(self._extract_m3u8_formats(
video_url.replace('.tar', '/base_index_w8.m3u8'),
video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
elif source.tag == 'hls':
# #EXT-X-BYTERANGE is not supported by native hls downloader
# and ffmpeg (#10955)
# formats.extend(self._extract_m3u8_formats(
# video_url.replace('.tar', '/base_index.m3u8'),
# video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
continue
elif source.tag == 'dash':
formats.extend(self._extract_mpd_formats(
video_url.replace('.tar', '/manifest.mpd'),
video_id, mpd_id='dash', fatal=False))
else:
format_info = self._FORMATS_INFO.get(source.tag, {})
formats.append({
@ -112,10 +136,11 @@ class HBOBaseIE(InfoExtractor):
class HBOIE(HBOBaseIE):
IE_NAME = 'hbo'
_VALID_URL = r'https?://(?:www\.)?hbo\.com/video/video\.html\?.*vid=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.hbo.com/video/video.html?autoplay=true&g=u&vid=1437839',
'md5': '1c33253f0c7782142c993c0ba62a8753',
'md5': '2c6a6bc1222c7e91cb3334dad1746e5a',
'info_dict': {
'id': '1437839',
'ext': 'mp4',
@ -131,11 +156,12 @@ class HBOIE(HBOBaseIE):
class HBOEpisodeIE(HBOBaseIE):
_VALID_URL = r'https?://(?:www\.)?hbo\.com/(?!video)([^/]+/)+video/(?P<id>[0-9a-z-]+)\.html'
IE_NAME = 'hbo:episode'
_VALID_URL = r'https?://(?:www\.)?hbo\.com/(?P<path>(?!video)(?:(?:[^/]+/)+video|watch-free-episodes)/(?P<id>[0-9a-z-]+))(?:\.html)?'
_TESTS = [{
'url': 'http://www.hbo.com/girls/episodes/5/52-i-love-you-baby/video/ep-52-inside-the-episode.html?autoplay=true',
'md5': '689132b253cc0ab7434237fc3a293210',
'md5': '61ead79b9c0dfa8d3d4b07ef4ac556fb',
'info_dict': {
'id': '1439518',
'display_id': 'ep-52-inside-the-episode',
@ -147,16 +173,19 @@ class HBOEpisodeIE(HBOBaseIE):
}, {
'url': 'http://www.hbo.com/game-of-thrones/about/video/season-5-invitation-to-the-set.html?autoplay=true',
'only_matching': True,
}, {
'url': 'http://www.hbo.com/watch-free-episodes/last-week-tonight-with-john-oliver',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
path, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
content = self._download_json(
'http://www.hbo.com/api/content/' + path, display_id)['content']
video_id = self._search_regex(
r'(?P<q1>[\'"])videoId(?P=q1)\s*:\s*(?P<q2>[\'"])(?P<video_id>\d+)(?P=q2)',
webpage, 'video ID', group='video_id')
video_id = compat_str((content.get('parsed', {}).get(
'common:FullBleedVideo', {}) or content['selectedEpisode'])['videoId'])
info_dict = self._extract_from_id(video_id)
info_dict['display_id'] = display_id

View File

@ -0,0 +1,259 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
parse_duration,
try_get,
unified_timestamp,
urlencode_postdata,
)
class MedialaanIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
(?P<site_id>vtm|q2|vtmkzoom)\.be/
(?:
video(?:/[^/]+/id/|/?\?.*?\baid=)|
(?:[^/]+/)*
)
)
(?P<id>[^/?#&]+)
'''
_NETRC_MACHINE = 'medialaan'
_APIKEY = '3_HZ0FtkMW_gOyKlqQzW5_0FHRC7Nd5XpXJZcDdXY4pk5eES2ZWmejRW5egwVm4ug-'
_SITE_TO_APP_ID = {
'vtm': 'vtm_watch',
'q2': 'q2',
'vtmkzoom': 'vtmkzoom',
}
_TESTS = [{
# vod
'url': 'http://vtm.be/video/volledige-afleveringen/id/vtm_20170219_VM0678361_vtmwatch',
'info_dict': {
'id': 'vtm_20170219_VM0678361_vtmwatch',
'ext': 'mp4',
'title': 'Allemaal Chris afl. 6',
'description': 'md5:4be86427521e7b07e0adb0c9c554ddb2',
'timestamp': 1487533280,
'upload_date': '20170219',
'duration': 2562,
'series': 'Allemaal Chris',
'season': 'Allemaal Chris',
'season_number': 1,
'season_id': '256936078124527',
'episode': 'Allemaal Chris afl. 6',
'episode_number': 6,
'episode_id': '256936078591527',
},
'params': {
'skip_download': True,
},
'skip': 'Requires account credentials',
}, {
# clip
'url': 'http://vtm.be/video?aid=168332',
'info_dict': {
'id': '168332',
'ext': 'mp4',
'title': '"Veronique liegt!"',
'description': 'md5:1385e2b743923afe54ba4adc38476155',
'timestamp': 1489002029,
'upload_date': '20170308',
'duration': 96,
},
}, {
# vod
'url': 'http://vtm.be/video/volledige-afleveringen/id/257107153551000',
'only_matching': True,
}, {
# vod
'url': 'http://vtm.be/video?aid=163157',
'only_matching': True,
}, {
# vod
'url': 'http://www.q2.be/video/volledige-afleveringen/id/2be_20170301_VM0684442_q2',
'only_matching': True,
}, {
# clip
'url': 'http://vtmkzoom.be/k3-dansstudio/een-nieuw-seizoen-van-k3-dansstudio',
'only_matching': True,
}]
def _real_initialize(self):
self._logged_in = False
def _login(self):
username, password = self._get_login_info()
if username is None:
self.raise_login_required()
auth_data = {
'APIKey': self._APIKEY,
'sdk': 'js_6.1',
'format': 'json',
'loginID': username,
'password': password,
}
auth_info = self._download_json(
'https://accounts.eu1.gigya.com/accounts.login', None,
note='Logging in', errnote='Unable to log in',
data=urlencode_postdata(auth_data))
error_message = auth_info.get('errorDetails') or auth_info.get('errorMessage')
if error_message:
raise ExtractorError(
'Unable to login: %s' % error_message, expected=True)
self._uid = auth_info['UID']
self._uid_signature = auth_info['UIDSignature']
self._signature_timestamp = auth_info['signatureTimestamp']
self._logged_in = True
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id, site_id = mobj.group('id', 'site_id')
webpage = self._download_webpage(url, video_id)
config = self._parse_json(
self._search_regex(
r'videoJSConfig\s*=\s*JSON\.parse\(\'({.+?})\'\);',
webpage, 'config', default='{}'), video_id,
transform_source=lambda s: s.replace(
'\\\\', '\\').replace(r'\"', '"').replace(r"\'", "'"))
vod_id = config.get('vodId') or self._search_regex(
(r'\\"vodId\\"\s*:\s*\\"(.+?)\\"',
r'<[^>]+id=["\']vod-(\d+)'),
webpage, 'video_id', default=None)
# clip, no authentication required
if not vod_id:
player = self._parse_json(
self._search_regex(
r'vmmaplayer\(({.+?})\);', webpage, 'vmma player',
default=''),
video_id, transform_source=lambda s: '[%s]' % s, fatal=False)
if player:
video = player[-1]
info = {
'id': video_id,
'url': video['videoUrl'],
'title': video['title'],
'thumbnail': video.get('imageUrl'),
'timestamp': int_or_none(video.get('createdDate')),
'duration': int_or_none(video.get('duration')),
}
else:
info = self._parse_html5_media_entries(
url, webpage, video_id, m3u8_id='hls')[0]
info.update({
'id': video_id,
'title': self._html_search_meta('description', webpage),
'duration': parse_duration(self._html_search_meta('duration', webpage)),
})
# vod, authentication required
else:
if not self._logged_in:
self._login()
settings = self._parse_json(
self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings', default='{}'),
video_id)
def get(container, item):
return try_get(
settings, lambda x: x[container][item],
compat_str) or self._search_regex(
r'"%s"\s*:\s*"([^"]+)' % item, webpage, item,
default=None)
app_id = get('vod', 'app_id') or self._SITE_TO_APP_ID.get(site_id, 'vtm_watch')
sso = get('vod', 'gigyaDatabase') or 'vtm-sso'
data = self._download_json(
'http://vod.medialaan.io/api/1.0/item/%s/video' % vod_id,
video_id, query={
'app_id': app_id,
'user_network': sso,
'UID': self._uid,
'UIDSignature': self._uid_signature,
'signatureTimestamp': self._signature_timestamp,
})
formats = self._extract_m3u8_formats(
data['response']['uri'], video_id, entry_protocol='m3u8_native',
ext='mp4', m3u8_id='hls')
self._sort_formats(formats)
info = {
'id': vod_id,
'formats': formats,
}
api_key = get('vod', 'apiKey')
channel = get('medialaanGigya', 'channel')
if api_key:
videos = self._download_json(
'http://vod.medialaan.io/vod/v2/videos', video_id, fatal=False,
query={
'channels': channel,
'ids': vod_id,
'limit': 1,
'apikey': api_key,
})
if videos:
video = try_get(
videos, lambda x: x['response']['videos'][0], dict)
if video:
def get(container, item, expected_type=None):
return try_get(
video, lambda x: x[container][item], expected_type)
def get_string(container, item):
return get(container, item, compat_str)
info.update({
'series': get_string('program', 'title'),
'season': get_string('season', 'title'),
'season_number': int_or_none(get('season', 'number')),
'season_id': get_string('season', 'id'),
'episode': get_string('episode', 'title'),
'episode_number': int_or_none(get('episode', 'number')),
'episode_id': get_string('episode', 'id'),
'duration': int_or_none(
video.get('duration')) or int_or_none(
video.get('durationMillis'), scale=1000),
'title': get_string('episode', 'title'),
'description': get_string('episode', 'text'),
'timestamp': unified_timestamp(get_string(
'publication', 'begin')),
})
if not info.get('title'):
info['title'] = try_get(
config, lambda x: x['videoConfig']['title'],
compat_str) or self._html_search_regex(
r'\\"title\\"\s*:\s*\\"(.+?)\\"', webpage, 'title',
default=None) or self._og_search_title(webpage)
if not info.get('description'):
info['description'] = self._html_search_regex(
r'<div[^>]+class="field-item\s+even">\s*<p>(.+?)</p>',
webpage, 'description', default=None)
return info

View File

@ -51,6 +51,7 @@ class MioMioIE(InfoExtractor):
'ext': 'mp4',
'title': 'マツコの知らない世界【劇的進化SPビニール傘冷凍食品2016】 1_2 - 16 05 31',
},
'skip': 'Unable to load videos',
}]
def _extract_mioplayer(self, webpage, video_id, title, http_headers):
@ -94,9 +95,18 @@ class MioMioIE(InfoExtractor):
return entries
def _download_chinese_webpage(self, *args, **kwargs):
# Requests with English locales return garbage
headers = {
'Accept-Language': 'zh-TW,en-US;q=0.7,en;q=0.3',
}
kwargs.setdefault('headers', {}).update(headers)
return self._download_webpage(*args, **kwargs)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
webpage = self._download_chinese_webpage(
url, video_id)
title = self._html_search_meta(
'description', webpage, 'title', fatal=True)
@ -106,7 +116,7 @@ class MioMioIE(InfoExtractor):
if '_h5' in mioplayer_path:
player_url = compat_urlparse.urljoin(url, mioplayer_path)
player_webpage = self._download_webpage(
player_webpage = self._download_chinese_webpage(
player_url, video_id,
note='Downloading player webpage', headers={'Referer': url})
entries = self._parse_html5_media_entries(player_url, player_webpage, video_id)

View File

@ -4,6 +4,7 @@ from __future__ import unicode_literals
import uuid
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..compat import (
compat_str,
compat_urllib_parse_urlencode,
@ -24,6 +25,9 @@ class MiTeleBaseIE(InfoExtractor):
r'(?s)(<ms-video-player.+?</ms-video-player>)',
webpage, 'ms video player'))
video_id = player_data['data-media-id']
if player_data.get('data-cms-id') == 'ooyala':
return self.url_result(
'ooyala:%s' % video_id, ie=OoyalaIE.ie_key(), video_id=video_id)
config_url = compat_urlparse.urljoin(url, player_data['data-config'])
config = self._download_json(
config_url, video_id, 'Downloading config JSON')

View File

@ -34,12 +34,6 @@ class NineCNineMediaStackIE(NineCNineMediaBaseIE):
formats.extend(self._extract_f4m_formats(
stack_base_url + 'f4m', stack_id,
f4m_id='hds', fatal=False))
mp4_url = self._download_webpage(stack_base_url + 'pd', stack_id, fatal=False)
if mp4_url:
formats.append({
'url': mp4_url,
'format_id': 'mp4',
})
self._sort_formats(formats)
return {

View File

@ -75,22 +75,51 @@ class OpenloadIE(InfoExtractor):
'<span[^>]+id="[^"]+"[^>]*>([0-9A-Za-z]+)</span>',
webpage, 'openload ID')
first_char = int(ol_id[0])
urlcode = []
num = 1
video_url_chars = []
while num < len(ol_id):
i = ord(ol_id[num])
key = 0
if i <= 90:
key = i - 65
elif i >= 97:
key = 25 + i - 97
urlcode.append((key, compat_chr(int(ol_id[num + 2:num + 5]) // int(ol_id[num + 1]) - first_char)))
num += 5
first_char = ord(ol_id[0])
key = first_char - 55
maxKey = max(2, key)
key = min(maxKey, len(ol_id) - 38)
t = ol_id[key:key + 36]
video_url = 'https://openload.co/stream/' + ''.join(
[value for _, value in sorted(urlcode, key=lambda x: x[0])])
hashMap = {}
v = ol_id.replace(t, '')
h = 0
while h < len(t):
f = t[h:h + 3]
i = int(f, 8)
hashMap[h / 3] = i
h += 3
h = 0
H = 0
while h < len(v):
B = ''
C = ''
if len(v) >= h + 2:
B = v[h:h + 2]
if len(v) >= h + 3:
C = v[h:h + 3]
i = int(B, 16)
h += 2
if H % 3 == 0:
i = int(C, 8)
h += 1
elif H % 2 == 0 and H != 0 and ord(v[H - 1]) < 60:
i = int(C, 10)
h += 1
index = H % 7
A = hashMap[index]
i ^= 213
i ^= A
video_url_chars.append(compat_chr(i))
H += 1
video_url = 'https://openload.co/stream/%s?mime=true'
video_url = video_url % (''.join(video_url_chars))
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,

View File

@ -40,7 +40,7 @@ class PluralsightIE(PluralsightBaseIE):
'info_dict': {
'id': 'hosting-sql-server-windows-azure-iaas-m7-mgmt-04',
'ext': 'mp4',
'title': 'Management of SQL Server - Demo Monitoring',
'title': 'Demo Monitoring',
'duration': 338,
},
'skip': 'Requires pluralsight account credentials',
@ -187,7 +187,7 @@ class PluralsightIE(PluralsightBaseIE):
if not clip:
raise ExtractorError('Unable to resolve clip')
title = '%s - %s' % (module['title'], clip['title'])
title = clip['title']
QUALITIES = {
'low': {'width': 640, 'height': 480},

View File

@ -1,7 +1,9 @@
# coding: utf-8
from __future__ import unicode_literals
import functools
import itertools
import operator
# import os
import re
@ -18,6 +20,7 @@ from ..utils import (
js_to_json,
orderedSet,
# sanitized_Request,
remove_quotes,
str_to_int,
)
# from ..aes import (
@ -129,9 +132,32 @@ class PornHubIE(InfoExtractor):
tv_webpage = dl_webpage('tv')
video_url = self._search_regex(
r'<video[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//.+?)\1', tv_webpage,
'video url', group='url')
assignments = self._search_regex(
r'(var.+?mediastring.+?)</script>', tv_webpage,
'encoded url').split(';')
js_vars = {}
def parse_js_value(inp):
inp = re.sub(r'/\*(?:(?!\*/).)*?\*/', '', inp)
if '+' in inp:
inps = inp.split('+')
return functools.reduce(
operator.concat, map(parse_js_value, inps))
inp = inp.strip()
if inp in js_vars:
return js_vars[inp]
return remove_quotes(inp)
for assn in assignments:
assn = assn.strip()
if not assn:
continue
assn = re.sub(r'var\s+', '', assn)
vname, value = assn.split('=', 1)
js_vars[vname] = parse_js_value(value)
video_url = js_vars['mediastring']
title = self._search_regex(
r'<h1>([^>]+)</h1>', tv_webpage, 'title', default=None)

View File

@ -300,6 +300,21 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'skip_download': True,
},
},
{
# title in <h2 class="subtitle">
'url': 'http://www.prosieben.de/stars/oscar-award/videos/jetzt-erst-enthuellt-das-geheimnis-von-emma-stones-oscar-robe-clip',
'info_dict': {
'id': '4895826',
'ext': 'mp4',
'title': 'Jetzt erst enthüllt: Das Geheimnis von Emma Stones Oscar-Robe',
'description': 'md5:e5ace2bc43fadf7b63adc6187e9450b9',
'upload_date': '20170302',
},
'params': {
'skip_download': True,
},
'skip': 'geo restricted to Germany',
},
{
# geo restricted to Germany
'url': 'http://www.kabeleinsdoku.de/tv/mayday-alarm-im-cockpit/video/102-notlandung-im-hudson-river-ganze-folge',
@ -338,6 +353,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
r'<header class="module_header">\s*<h2>([^<]+)</h2>\s*</header>',
r'<h2 class="video-title" itemprop="name">\s*(.+?)</h2>',
r'<div[^>]+id="veeseoTitle"[^>]*>(.+?)</div>',
r'<h2[^>]+class="subtitle"[^>]*>([^<]+)</h2>',
]
_DESCRIPTION_REGEXES = [
r'<p itemprop="description">\s*(.+?)</p>',
@ -369,7 +385,9 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
def _extract_clip(self, url, webpage):
clip_id = self._html_search_regex(
self._CLIPID_REGEXES, webpage, 'clip id')
title = self._html_search_regex(self._TITLE_REGEXES, webpage, 'title')
title = self._html_search_regex(
self._TITLE_REGEXES, webpage, 'title',
default=None) or self._og_search_title(webpage)
info = self._extract_video_info(url, clip_id)
description = self._html_search_regex(
self._DESCRIPTION_REGEXES, webpage, 'description', default=None)

View File

@ -2,11 +2,13 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
float_or_none,
int_or_none,
try_get,
unified_timestamp,
# unified_timestamp,
ExtractorError,
)
@ -15,15 +17,15 @@ class RedBullTVIE(InfoExtractor):
_TESTS = [{
# film
'url': 'https://www.redbull.tv/video/AP-1Q756YYX51W11/abc-of-wrc',
'md5': '78e860f631d7a846e712fab8c5fe2c38',
'md5': 'fb0445b98aa4394e504b413d98031d1f',
'info_dict': {
'id': 'AP-1Q756YYX51W11',
'ext': 'mp4',
'title': 'ABC of...WRC',
'description': 'md5:5c7ed8f4015c8492ecf64b6ab31e7d31',
'duration': 1582.04,
'timestamp': 1488405786,
'upload_date': '20170301',
# 'timestamp': 1488405786,
# 'upload_date': '20170301',
},
}, {
# episode
@ -34,8 +36,8 @@ class RedBullTVIE(InfoExtractor):
'title': 'Grime - Hashtags S2 E4',
'description': 'md5:334b741c8c1ce65be057eab6773c1cf5',
'duration': 904.6,
'timestamp': 1487290093,
'upload_date': '20170217',
# 'timestamp': 1487290093,
# 'upload_date': '20170217',
'series': 'Hashtags',
'season_number': 2,
'episode_number': 4,
@ -48,29 +50,40 @@ class RedBullTVIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
access_token = self._download_json(
'https://api-v2.redbull.tv/start', video_id,
session = self._download_json(
'https://api-v2.redbull.tv/session', video_id,
note='Downloading access token', query={
'build': '4.0.9',
'category': 'smartphone',
'os_version': 23,
'os_family': 'android',
})['auth']['access_token']
'build': '4.370.0',
'category': 'personal_computer',
'os_version': '1.0',
'os_family': 'http',
})
if session.get('code') == 'error':
raise ExtractorError('%s said: %s' % (
self.IE_NAME, session['message']))
auth = '%s %s' % (session.get('token_type', 'Bearer'), session['access_token'])
info = self._download_json(
'https://api-v2.redbull.tv/views/%s' % video_id,
video_id, note='Downloading video information',
headers={'Authorization': 'Bearer ' + access_token}
)['blocks'][0]['top'][0]
try:
info = self._download_json(
'https://api-v2.redbull.tv/content/%s' % video_id,
video_id, note='Downloading video information',
headers={'Authorization': auth}
)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
error_message = self._parse_json(
e.cause.read().decode(), video_id)['message']
raise ExtractorError('%s said: %s' % (
self.IE_NAME, error_message), expected=True)
raise
video = info['video_product']
title = info['title'].strip()
m3u8_url = video['url']
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
video['url'], video_id, 'mp4', 'm3u8_native')
self._sort_formats(formats)
subtitles = {}
for _, captions in (try_get(
@ -82,9 +95,12 @@ class RedBullTVIE(InfoExtractor):
caption_url = caption.get('url')
if not caption_url:
continue
ext = caption.get('format')
if ext == 'xml':
ext = 'ttml'
subtitles.setdefault(caption.get('lang') or 'en', []).append({
'url': caption_url,
'ext': caption.get('format'),
'ext': ext,
})
subheading = info.get('subheading')
@ -97,7 +113,7 @@ class RedBullTVIE(InfoExtractor):
'description': info.get('long_description') or info.get(
'short_description'),
'duration': float_or_none(video.get('duration'), scale=1000),
'timestamp': unified_timestamp(info.get('published')),
# 'timestamp': unified_timestamp(info.get('published')),
'series': info.get('show_title'),
'season_number': int_or_none(info.get('season_number')),
'episode_number': int_or_none(info.get('episode_number')),

View File

@ -89,7 +89,7 @@ class SenateISVPIE(InfoExtractor):
@staticmethod
def _search_iframe_url(webpage):
mobj = re.search(
r"<iframe[^>]+src=['\"](?P<url>http://www\.senate\.gov/isvp/?\?[^'\"]+)['\"]",
r"<iframe[^>]+src=['\"](?P<url>https?://www\.senate\.gov/isvp/?\?[^'\"]+)['\"]",
webpage)
if mobj:
return mobj.group('url')

View File

@ -121,7 +121,7 @@ class SoundcloudIE(InfoExtractor):
},
]
_CLIENT_ID = 'fDoItMDbsbZz8dY16ZzARCZmzgHBPotA'
_CLIENT_ID = '2t9loNQH90kzJcsFCODdigxfp325aq4z'
_IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'
@staticmethod

View File

@ -65,7 +65,7 @@ class StreamableIE(InfoExtractor):
# to return video info like the title properly sometimes, and doesn't
# include info like the video duration
video = self._download_json(
'https://streamable.com/ajax/videos/%s' % video_id, video_id)
'https://ajax.streamable.com/videos/%s' % video_id, video_id)
# Format IDs:
# 0 The video is being uploaded

View File

@ -44,6 +44,10 @@ class TelecincoIE(MiTeleBaseIE):
}, {
'url': 'http://www.telecinco.es/espanasinirmaslejos/Espana-gran-destino-turistico_2_1240605043.html',
'only_matching': True,
}, {
# ooyala video
'url': 'http://www.cuatro.com/chesterinlove/a-carta/chester-chester_in_love-chester_edu_2_2331030022.html',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -2,15 +2,17 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
int_or_none,
smuggle_url,
try_get,
)
class TeleQuebecIE(InfoExtractor):
_VALID_URL = r'https?://zonevideo\.telequebec\.tv/media/(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://zonevideo.telequebec.tv/media/20984/le-couronnement-de-new-york/couronnement-de-new-york',
'md5': 'fe95a0957e5707b1b01f5013e725c90f',
'info_dict': {
@ -18,10 +20,14 @@ class TeleQuebecIE(InfoExtractor):
'ext': 'mp4',
'title': 'Le couronnement de New York',
'description': 'md5:f5b3d27a689ec6c1486132b2d687d432',
'upload_date': '20160220',
'timestamp': 1455965438,
'upload_date': '20170201',
'timestamp': 1485972222,
}
}
}, {
# no description
'url': 'http://zonevideo.telequebec.tv/media/30261',
'only_matching': True,
}]
def _real_extract(self, url):
media_id = self._match_id(url)
@ -31,9 +37,13 @@ class TeleQuebecIE(InfoExtractor):
return {
'_type': 'url_transparent',
'id': media_id,
'url': smuggle_url('limelight:media:' + media_data['streamInfo']['sourceId'], {'geo_countries': ['CA']}),
'url': smuggle_url(
'limelight:media:' + media_data['streamInfo']['sourceId'],
{'geo_countries': ['CA']}),
'title': media_data['title'],
'description': media_data.get('descriptions', [{'text': None}])[0].get('text'),
'duration': int_or_none(media_data.get('durationInMilliseconds'), 1000),
'description': try_get(
media_data, lambda x: x['descriptions'][0]['text'], compat_str),
'duration': int_or_none(
media_data.get('durationInMilliseconds'), 1000),
'ie_key': 'LimelightMedia',
}

View File

@ -0,0 +1,81 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
)
class ToonGogglesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?toongoggles\.com/shows/(?P<show_id>\d+)(?:/[^/]+/episodes/(?P<episode_id>\d+))?'
_TESTS = [{
'url': 'http://www.toongoggles.com/shows/217143/bernard-season-2/episodes/217147/football',
'md5': '18289fc2b951eff6b953a9d8f01e6831',
'info_dict': {
'id': '217147',
'ext': 'mp4',
'title': 'Football',
'uploader_id': '1',
'description': 'Bernard decides to play football in order to be better than Lloyd and tries to beat him no matter how, he even cheats.',
'upload_date': '20160718',
'timestamp': 1468879330,
}
}, {
'url': 'http://www.toongoggles.com/shows/227759/om-nom-stories-around-the-world',
'info_dict': {
'id': '227759',
'title': 'Om Nom Stories Around The World',
},
'playlist_mincount': 11,
}]
def _call_api(self, action, page_id, query):
query.update({
'for_ng': 1,
'for_web': 1,
'show_meta': 1,
'version': 7.0,
})
return self._download_json('http://api.toongoggles.com/' + action, page_id, query=query)
def _parse_episode_data(self, episode_data):
title = episode_data['episode_name']
return {
'_type': 'url_transparent',
'id': episode_data['episode_id'],
'title': title,
'url': 'kaltura:513551:' + episode_data['entry_id'],
'thumbnail': episode_data.get('thumbnail_url'),
'description': episode_data.get('description'),
'duration': parse_duration(episode_data.get('hms')),
'series': episode_data.get('show_name'),
'season_number': int_or_none(episode_data.get('season_num')),
'episode_id': episode_data.get('episode_id'),
'episode': title,
'episode_number': int_or_none(episode_data.get('episode_num')),
'categories': episode_data.get('categories'),
'ie_key': 'Kaltura',
}
def _real_extract(self, url):
show_id, episode_id = re.match(self._VALID_URL, url).groups()
if episode_id:
episode_data = self._call_api('search', episode_id, {
'filter': 'episode',
'id': episode_id,
})['objects'][0]
return self._parse_episode_data(episode_data)
else:
show_data = self._call_api('getepisodesbyshow', show_id, {
'max': 1000000000,
'showid': show_id,
})
entries = []
for episode_data in show_data.get('objects', []):
entries.append(self._parse_episode_data(episode_data))
return self.playlist_result(entries, show_id, show_data.get('show_name'))

View File

@ -104,7 +104,7 @@ class TwitchBaseIE(InfoExtractor):
login_page, handle, 'Logging in as %s' % username, {
'username': username,
'password': password,
})
})
if re.search(r'(?i)<form[^>]+id="two-factor-submit"', redirect_page) is not None:
# TODO: Add mechanism to request an SMS or phone call

View File

@ -44,7 +44,7 @@ class ViuBaseIE(InfoExtractor):
class ViuIE(ViuBaseIE):
_VALID_URL = r'(?:viu:|https?://www\.viu\.com/[a-z]{2}/media/)(?P<id>\d+)'
_VALID_URL = r'(?:viu:|https?://[^/]+\.viu\.com/[a-z]{2}/media/)(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.viu.com/en/media/1116705532?containerId=playlist-22168059',
'info_dict': {
@ -69,6 +69,9 @@ class ViuIE(ViuBaseIE):
'skip_download': 'm3u8 download',
},
'skip': 'Geo-restricted to Indonesia',
}, {
'url': 'https://india.viu.com/en/media/1126286865',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -19,9 +19,10 @@ class WDRBaseIE(InfoExtractor):
def _extract_wdr_video(self, webpage, display_id):
# for wdr.de the data-extension is in a tag with the class "mediaLink"
# for wdr.de radio players, in a tag with the class "wdrrPlayerPlayBtn"
# for wdrmaus its in a link to the page in a multiline "videoLink"-tag
# for wdrmaus, in a tag with the class "videoButton" (previously a link
# to the page in a multiline "videoLink"-tag)
json_metadata = self._html_search_regex(
r'class=(?:"(?:mediaLink|wdrrPlayerPlayBtn)\b[^"]*"[^>]+|"videoLink\b[^"]*"[\s]*>\n[^\n]*)data-extension="([^"]+)"',
r'class=(?:"(?:mediaLink|wdrrPlayerPlayBtn|videoButton)\b[^"]*"[^>]+|"videoLink\b[^"]*"[\s]*>\n[^\n]*)data-extension="([^"]+)"',
webpage, 'media link', default=None, flags=re.MULTILINE)
if not json_metadata:
@ -32,7 +33,7 @@ class WDRBaseIE(InfoExtractor):
jsonp_url = media_link_obj['mediaObj']['url']
metadata = self._download_json(
jsonp_url, 'metadata', transform_source=strip_jsonp)
jsonp_url, display_id, transform_source=strip_jsonp)
metadata_tracker_data = metadata['trackerData']
metadata_media_resource = metadata['mediaResource']
@ -161,23 +162,23 @@ class WDRIE(WDRBaseIE):
{
'url': 'http://www.wdrmaus.de/aktuelle-sendung/index.php5',
'info_dict': {
'id': 'mdb-1096487',
'ext': 'flv',
'id': 'mdb-1323501',
'ext': 'mp4',
'upload_date': 're:^[0-9]{8}$',
'title': 're:^Die Sendung mit der Maus vom [0-9.]{10}$',
'description': '- Die Sendung mit der Maus -',
'description': 'Die Seite mit der Maus -',
},
'skip': 'The id changes from week to week because of the new episode'
},
{
'url': 'http://www.wdrmaus.de/sachgeschichten/sachgeschichten/achterbahn.php5',
'url': 'http://www.wdrmaus.de/filme/sachgeschichten/achterbahn.php5',
'md5': '803138901f6368ee497b4d195bb164f2',
'info_dict': {
'id': 'mdb-186083',
'ext': 'mp4',
'upload_date': '20130919',
'title': 'Sachgeschichte - Achterbahn ',
'description': '- Die Sendung mit der Maus -',
'description': 'Die Seite mit der Maus -',
},
},
{
@ -186,7 +187,7 @@ class WDRIE(WDRBaseIE):
'info_dict': {
'id': 'mdb-869971',
'ext': 'flv',
'title': 'Funkhaus Europa Livestream',
'title': 'COSMO Livestream',
'description': 'md5:2309992a6716c347891c045be50992e4',
'upload_date': '20160101',
},

View File

@ -773,7 +773,7 @@ def parseOpts(overrideArguments=None):
help='Convert video files to audio-only files (requires ffmpeg or avconv and ffprobe or avprobe)')
postproc.add_option(
'--audio-format', metavar='FORMAT', dest='audioformat', default='best',
help='Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "%default" by default; No effect without -x')
help='Specify audio format: "best", "aac", "flac", "mp3", "m4a", "opus", "vorbis", or "wav"; "%default" by default; No effect without -x')
postproc.add_option(
'--audio-quality', metavar='QUALITY',
dest='audioquality', default='5',

View File

@ -26,15 +26,25 @@ from ..utils import (
EXT_TO_OUT_FORMATS = {
"aac": "adts",
"m4a": "ipod",
"mka": "matroska",
"mkv": "matroska",
"mpg": "mpeg",
"ogv": "ogg",
"ts": "mpegts",
"wma": "asf",
"wmv": "asf",
'aac': 'adts',
'flac': 'flac',
'm4a': 'ipod',
'mka': 'matroska',
'mkv': 'matroska',
'mpg': 'mpeg',
'ogv': 'ogg',
'ts': 'mpegts',
'wma': 'asf',
'wmv': 'asf',
}
ACODECS = {
'mp3': 'libmp3lame',
'aac': 'aac',
'flac': 'flac',
'm4a': 'aac',
'opus': 'opus',
'vorbis': 'libvorbis',
'wav': None,
}
@ -237,7 +247,7 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
acodec = 'copy'
extension = 'm4a'
more_opts = ['-bsf:a', 'aac_adtstoasc']
elif filecodec in ['aac', 'mp3', 'vorbis', 'opus']:
elif filecodec in ['aac', 'flac', 'mp3', 'vorbis', 'opus']:
# Lossless if possible
acodec = 'copy'
extension = filecodec
@ -256,8 +266,8 @@ class FFmpegExtractAudioPP(FFmpegPostProcessor):
else:
more_opts += ['-b:a', self._preferredquality + 'k']
else:
# We convert the audio (lossy)
acodec = {'mp3': 'libmp3lame', 'aac': 'aac', 'm4a': 'aac', 'opus': 'opus', 'vorbis': 'libvorbis', 'wav': None}[self._preferredcodec]
# We convert the audio (lossy if codec is lossy)
acodec = ACODECS[self._preferredcodec]
extension = self._preferredcodec
more_opts = []
if self._preferredquality is not None:

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2017.03.06'
__version__ = '2017.03.24'