Compare commits

94 Commits

Author SHA1 Message Date
Sergey M․
90634acfcf release 2019.07.27 2019-07-27 03:44:55 +07:00
Sergey M․
eaba9dd6c2 [ChangeLog] Actualize
[ci skip]
2019-07-27 03:43:33 +07:00
Kitten King
843ad1796b Fix typos (#21901) 2019-07-26 22:30:18 +07:00
Kyle
608b8a4300 [yahoo:japannews] Add extractor (closes #21698) (#21265) 2019-07-22 00:59:36 +07:00
Sergey M․
ab794a553c [ctsnews] PEP 8 2019-07-21 14:59:53 +07:00
Remita Amine
3b446ab351 [discovery] add support go.discovery.com URLs 2019-07-20 20:20:53 +01:00
Sergey M․
13a75688a5 [youtube] Fix some tests 2019-07-21 00:01:46 +07:00
Sergey M․
2e18adec98 [youtube:playlist] Relax _VIDEO_RE (closes #21844) 2019-07-20 23:46:34 +07:00
Sergey M․
9c1da4a9f9 [extractor/generic] Restrict --default-search schemeless URLs detection pattern (closes #21842) 2019-07-20 23:08:26 +07:00
Petr Vaněk
5e1c39ac85 [extractor/common] Fix typo in thumbnails resolution description (#21817) 2019-07-17 22:47:53 +07:00
Remita Amine
1824bfdcdf [vrv] fix CMS signing query extraction(closes #21809) 2019-07-16 22:51:10 +01:00
Sergey M․
2f1991ff14 release 2019.07.16 2019-07-16 00:01:46 +07:00
Sergey M․
8b4a0ebf10 [ChangeLog] Actualize
[ci skip]
2019-07-15 23:59:23 +07:00
Sergey M․
f61496863d [asiancrush] Add support for yuyutv.com, midnightpulp.com and cocoro.tv (closes #21281, closes #21290) 2019-07-15 23:58:08 +07:00
Sergey M․
799756a3b3 [kaltura] Check source format URL (#21290) 2019-07-15 23:58:08 +07:00
chien-yu
7d4dd3e5b4 [ctsnews] Fix YouTube embeds extraction (#21678) 2019-07-15 23:03:03 +07:00
tlonic
f2a213d025 [einthusan] Add support for einthusan.com (closes #21748) (#21775) 2019-07-15 22:58:55 +07:00
geditorit
791d2e8117 [youtube] Add support for invidious.mastodon.host (#21777) 2019-07-15 22:54:22 +07:00
Gary
2adedc477e [gfycat] Extend _VALID_URL (closes #21779) (#21780) 2019-07-15 22:53:20 +07:00
Sergey M․
898238e9f8 [youtube] Restrict is_live extraction (closes #21782) 2019-07-14 20:30:05 +07:00
Sergey M․
ce80cacefd release 2019.07.14 2019-07-14 03:10:49 +07:00
Sergey M․
0250161c52 [yandexmusic] Add missing import 2019-07-14 03:09:16 +07:00
Sergey M․
364a2cb658 [ChangeLog] Actualize
[ci skip]
2019-07-14 03:07:02 +07:00
hrimfaxi
2fe074a960 [porn91] Fix extraction (#21312) 2019-07-14 02:57:43 +07:00
aerworker
c452790a79 [yandexmusic] Add support for multi disk albums and extract track number and disk number (closes #21420) (#21421)
* [yandexmusic] extract tracks from all volumes of an album (closes #21420)

* [yandexmusic] extract genre, disk_number and track_number

* [yandexmusic] extract decomposed artist names

* Update yandexmusic.py

* Update yandexmusic.py

* Update yandexmusic.py
2019-07-14 02:38:47 +07:00
Sergey M․
d89a0a8026 [lynda] Handle missing subtitles (closes #20490, closes #20513) 2019-07-14 01:45:28 +07:00
geditorit
ba036333bf [youtube] Add more invidious instances to _VALID_URL (#21694) 2019-07-14 01:23:22 +07:00
Sergey M․
b7ef93f0ab [twitter] Improve uploader id extraction (closes #21705) 2019-07-14 01:19:17 +07:00
Sergey M․
f9eeeda31c [spankbang] Fix and improve metadata extraction 2019-07-14 00:21:39 +07:00
Sergey M․
5f562bd4bb [spankbang] Fix extraction (closes #21763, closes #21764) 2019-07-14 00:13:26 +07:00
Remita Amine
b99f11a56b [dlive] restrict DLive Stream _VALID_URL regex 2019-07-13 14:11:57 +01:00
Remita Amine
4a71ef6da6 [dlive] Add new extractor(closes #18080) 2019-07-13 13:08:19 +01:00
Remita Amine
fd95105ed4 [livejournal] Add new extractor(closes #21526) 2019-07-13 12:47:02 +01:00
Remita Amine
c72dc20d09 [roosterteeth] fix free episode extraction(#16094) 2019-07-13 10:13:07 +01:00
Remita Amine
272355c172 [dbtv] fix extraction 2019-07-12 23:26:46 +01:00
Remita Amine
57227618fe [spike] fix Bellator extraction 2019-07-12 22:50:37 +01:00
Remita Amine
0441d6266c [rudo] remove extractor(closes #18430)(closes #18474)
Covered by generic extractor
2019-07-12 22:31:11 +01:00
Remita Amine
82f68e4a01 [facebook] fallback to twitter:image meta for thumbnail extraction(closes #21224) 2019-07-12 22:02:06 +01:00
Remita Amine
d4ece5d359 [bleacherreport] fix Bleacher Report CMS extraction 2019-07-12 21:56:49 +01:00
Remita Amine
16d3672ad7 [espn] fix fivethirtyeight.com extraction 2019-07-11 23:37:34 +01:00
Remita Amine
0dd58a523f [fivetv] relax video URL regex and support https URLs 2019-07-11 23:10:35 +01:00
Sergey M․
27019dbb4b [youtube] Fix is_live extraction (closes #21734) 2019-07-12 03:45:58 +07:00
Sergey M․
baf67a604d [youtube] Fix authentication (closes #11270) 2019-07-12 02:26:05 +07:00
Sergey M․
0d1f4af39d release 2019.07.12 2019-07-12 00:43:54 +07:00
Sergey M․
7612406bf9 [ChangeLog] Actualize
[ci skip]
2019-07-12 00:34:03 +07:00
Sergey M․
4dcd4b7b16 [mgtv] Pass Referer HTTP header for format URLs (closes #21726) 2019-07-12 00:04:25 +07:00
Sergey M․
5fc0896168 [beeg] Add support for api/v6 v2 URLs without t argument (closes #21701) 2019-07-11 23:37:09 +07:00
Remita Amine
e4d53148f5 [funnyordie] move extraction to VoxMedia extractor and improve vox volume embed extraction(closes #16846) 2019-07-10 16:47:37 +01:00
Remita Amine
cfe781d4fa [gameinformer] fix extraction(closes #8895)(closes #15363)(closes #17206) 2019-07-10 15:45:00 +01:00
Remita Amine
253289656f [extractors] update funk.net import 2019-07-10 13:57:43 +01:00
Remita Amine
4b30282616 [funk] fix extraction(closes #17915) 2019-07-10 13:54:49 +01:00
Remita Amine
c9b0564ac1 [packtpub] Relax lesson _VALID_URL regex(closes #21695) 2019-07-09 11:56:16 +01:00
Remita Amine
25d71fb058 [packtpub] fix extraction(closes #21268) 2019-07-09 08:28:56 +01:00
Sergey M․
a6389abfd7 [philharmoniedeparis] Relax _VALID_URL (closes #21672) 2019-07-06 23:17:45 +07:00
Sergey M․
d18003a141 [peertube] Detect embed URLs in generic extraction (closes #21666) 2019-07-06 00:50:56 +07:00
Hendrik Schröter
d1850c1a97 [mixer:vod] Relax _VALID_URL (closes #21657) (#21658) 2019-07-05 22:47:32 +07:00
Remita Amine
c9fa84d88e [lecturio] add support id based URLs(closes #21630) 2019-07-04 15:59:45 +01:00
Sergey M․
a30c2f4055 [go] Add site info for disneynow (closes #21613) 2019-07-04 04:01:30 +07:00
Sergey M․
5ae9b8b3a3 [adobepass] Add support for AT&T U-verse (mso ATT) (closes #13938, closes #21016) 2019-07-04 03:57:11 +07:00
Sergey M․
cdb7c7d147 [ted] Restrict info regex (closes #21631) 2019-07-04 02:04:23 +07:00
David Caldwell
2da4316e48 [twitch:vod] Actualize m3u8 URL (#21538, #21607) 2019-07-03 23:22:23 +07:00
Sergey M․
313877c6a2 [vzaar] Fix videos with empty title (closes #21606) 2019-07-03 23:16:40 +07:00
Remita Amine
e61ac1a09c [tvland] fix extraction(closes #21384) 2019-07-03 13:31:47 +01:00
Remita Amine
ff0f4cfeba [arte] clean extractor(closes #15583)(closes #21614) 2019-07-02 22:09:40 +01:00
Sergey M․
1335bf10f6 release 2019.07.02 2019-07-02 01:09:59 +07:00
Sergey M․
c8343f0a43 [ChangeLog] Actualize
[ci skip]
2019-07-02 01:07:54 +07:00
nyuszika7h
d1e4116427 [vevo] Add support for embed.vevo.com URLs (#21565) 2019-07-02 00:13:23 +07:00
smed79
9baf69af45 [openload] Add support for oload.biz (#21574) 2019-07-02 00:11:38 +07:00
Fai
918398092c [xiami] Update API base URL (#21575) 2019-07-02 00:10:55 +07:00
xyssy
4e2491f066 [yourporn] Fix extraction (#21585) 2019-07-02 00:05:51 +07:00
Remita Amine
976e1ff7f9 [acast] add support for URLs with episode id(closes #21444) 2019-07-01 12:05:18 +01:00
Remita Amine
5e3da0d42b [dailymotion] add support embed with DM.player js call 2019-07-01 08:37:21 +01:00
Sergey M․
c560680247 [soundcloud] Update client id 2019-06-29 00:33:35 +07:00
Sergey M․
f7a147e3b6 [utils] Introduce random_user_agent and use as default User-Agent (closes #21546) 2019-06-29 00:32:43 +07:00
Sergey M․
8c8cae91ec release 2019.06.27 2019-06-27 23:57:33 +07:00
Sergey M․
232331c0d2 [ChangeLog] Actualize
[ci skip]
2019-06-27 23:55:15 +07:00
Sergey M․
4f71473ef1 [go] Add support for disneynow.com (closes #21528) 2019-06-27 22:59:30 +07:00
Mike Fährmann
6625bf200d [mixer:vod] Relax _VALID_URL (closes #21531) (#21536) 2019-06-27 22:24:46 +07:00
Sergey M․
f562994660 [drtv] Relax _VALID_URL 2019-06-27 22:18:10 +07:00
Remita Amine
509bcec37b [fusion] fix extraction(closes #17775)(closes #21269) 2019-06-27 12:06:31 +01:00
Sergey M․
1d83e9bd4b [nfb] Remove extractor (closes #21518)
Covered by generic extractor
2019-06-25 00:12:31 +07:00
Sergey M․
27cef8885d [beeg] Add support for api/v6 v2 URLs (closes #21511) 2019-06-24 23:01:52 +07:00
Kyle
3031b7c4ed [brightcove:new] Add support for playlists (#21331) 2019-06-23 17:04:05 +07:00
smed79
695720ebe8 [openload] Add support for oload.life (#21495) 2019-06-23 04:31:43 +07:00
Sergey M․
2605043d6d [vimeo:channel,group] Make title extraction no fatal 2019-06-23 02:16:09 +07:00
Sergey M․
091c9b4316 [vimeo:likes] Implement extrator in terms of channel extractor
This allows to obtain videos' ids before extraction (#21493)
2019-06-23 02:13:46 +07:00
Sergey M․
9634de178d [pornhub] Add support for more paged video sources 2019-06-22 08:37:07 +07:00
Sergey M․
1f7a563ab0 [pornhub] Add support for downloading single pages and search pages (closes #15570) 2019-06-22 06:01:43 +07:00
Sergey M․
21b08463a7 [pornhub] Rework extractors (closes #11922, closes #16078, closes #17454, closes #17936) 2019-06-22 05:34:46 +07:00
Sergey M․
31ce6e9966 [youtube] Add another signature function pattern 2019-06-22 02:22:41 +07:00
Sergey M․
1c11204056 [tf1] Improve extraction and fix issues (closes #21372) 2019-06-22 00:40:06 +07:00
Emmanuel Froissart
9c2aaac268 [tf1] Fix wat id extraction (closes #21365) 2019-06-22 00:40:00 +07:00
Sergey M․
d415957dbc [crunchyroll] Move Accept-Language workaround to video extractor since it causes playlists not to list any videos 2019-06-22 00:15:52 +07:00
Sergey M․
4681441d2f [crunchyroll:playlist] Fix and relax title extraction (closes #21291, closes #21443) 2019-06-22 00:07:26 +07:00
75 changed files with 3105 additions and 2899 deletions

@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.21. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.27. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2019.06.21**
- [ ] I've verified that I'm running youtube-dl version **2019.07.27**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.06.21
[debug] youtube-dl version 2019.07.27
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

@@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.21. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.27. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2019.06.21**
- [ ] I've verified that I'm running youtube-dl version **2019.07.27**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones

@@ -18,13 +18,13 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.21. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.27. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2019.06.21**
- [ ] I've verified that I'm running youtube-dl version **2019.07.27**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones

@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.21. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.27. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2019.06.21**
- [ ] I've verified that I'm running youtube-dl version **2019.07.27**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.06.21
[debug] youtube-dl version 2019.07.27
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

@@ -19,13 +19,13 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.06.21. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.07.27. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2019.06.21**
- [ ] I've verified that I'm running youtube-dl version **2019.07.27**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

113
ChangeLog
@@ -1,3 +1,116 @@
version 2019.07.27
Extractors
+ [yahoo:japannews] Add support for yahoo.co.jp (#21698, #21265)
+ [discovery] Add support go.discovery.com URLs
* [youtube:playlist] Relax video regular expression (#21844)
* [generic] Restrict --default-search schemeless URLs detection pattern
(#21842)
* [vrv] Fix CMS signing query extraction (#21809)
version 2019.07.16
Extractors
+ [asiancrush] Add support for yuyutv.com, midnightpulp.com and cocoro.tv
(#21281, #21290)
* [kaltura] Check source format URL (#21290)
* [ctsnews] Fix YouTube embeds extraction (#21678)
+ [einthusan] Add support for einthusan.com (#21748, #21775)
+ [youtube] Add support for invidious.mastodon.host (#21777)
+ [gfycat] Extend URL regular expression (#21779, #21780)
* [youtube] Restrict is_live extraction (#21782)
version 2019.07.14
Extractors
* [porn91] Fix extraction (#21312)
+ [yandexmusic] Extract track number and disk number (#21421)
+ [yandexmusic] Add support for multi disk albums (#21420, #21421)
* [lynda] Handle missing subtitles (#20490, #20513)
+ [youtube] Add more invidious instances to URL regular expression (#21694)
* [twitter] Improve uploader id extraction (#21705)
* [spankbang] Fix and improve metadata extraction
* [spankbang] Fix extraction (#21763, #21764)
+ [dlive] Add support for dlive.tv (#18080)
+ [livejournal] Add support for livejournal.com (#21526)
* [roosterteeth] Fix free episode extraction (#16094)
* [dbtv] Fix extraction
* [bellator] Fix extraction
- [rudo] Remove extractor (#18430, #18474)
* [facebook] Fallback to twitter:image meta for thumbnail extraction (#21224)
* [bleacherreport] Fix Bleacher Report CMS extraction
* [espn] Fix fivethirtyeight.com extraction
* [5tv] Relax video URL regular expression and support https URLs
* [youtube] Fix is_live extraction (#21734)
* [youtube] Fix authentication (#11270)
version 2019.07.12
Core
+ [adobepass] Add support for AT&T U-verse (mso ATT) (#13938, #21016)
Extractors
+ [mgtv] Pass Referer HTTP header for format URLs (#21726)
+ [beeg] Add support for api/v6 v2 URLs without t argument (#21701)
* [voxmedia:volume] Improve vox embed extraction (#16846)
* [funnyordie] Move extraction to VoxMedia extractor (#16846)
* [gameinformer] Fix extraction (#8895, #15363, #17206)
* [funk] Fix extraction (#17915)
* [packtpub] Relax lesson URL regular expression (#21695)
* [packtpub] Fix extraction (#21268)
* [philharmoniedeparis] Relax URL regular expression (#21672)
* [peertube] Detect embed URLs in generic extraction (#21666)
* [mixer:vod] Relax URL regular expression (#21657, #21658)
+ [lecturio] Add support id based URLs (#21630)
+ [go] Add site info for disneynow (#21613)
* [ted] Restrict info regular expression (#21631)
* [twitch:vod] Actualize m3u8 URL (#21538, #21607)
* [vzaar] Fix videos with empty title (#21606)
* [tvland] Fix extraction (#21384)
* [arte] Clean extractor (#15583, #21614)
version 2019.07.02
Core
+ [utils] Introduce random_user_agent and use as default User-Agent (#21546)
Extractors
+ [vevo] Add support for embed.vevo.com URLs (#21565)
+ [openload] Add support for oload.biz (#21574)
* [xiami] Update API base URL (#21575)
* [yourporn] Fix extraction (#21585)
+ [acast] Add support for URLs with episode id (#21444)
+ [dailymotion] Add support for DM.player embeds
* [soundcloud] Update client id
version 2019.06.27
Extractors
+ [go] Add support for disneynow.com (#21528)
* [mixer:vod] Relax URL regular expression (#21531, #21536)
* [drtv] Relax URL regular expression
* [fusion] Fix extraction (#17775, #21269)
- [nfb] Remove extractor (#21518)
+ [beeg] Add support for api/v6 v2 URLs (#21511)
+ [brightcove:new] Add support for playlists (#21331)
+ [openload] Add support for oload.life (#21495)
* [vimeo:channel,group] Make title extraction non fatal
* [vimeo:likes] Implement extractor in terms of channel extractor (#21493)
+ [pornhub] Add support for more paged video sources
+ [pornhub] Add support for downloading single pages and search pages (#15570)
* [pornhub] Rework extractors (#11922, #16078, #17454, #17936)
+ [youtube] Add another signature function pattern
* [tf1] Fix extraction (#21365, #21372)
* [crunchyroll] Move Accept-Language workaround to video extractor since
it causes playlists not to list any videos
* [crunchyroll:playlist] Fix and relax title extraction (#21291, #21443)
version 2019.06.21
Core

@@ -58,16 +58,8 @@
- **ARD:mediathek**
- **ARDBetaMediathek**
- **Arkena**
- **arte.tv**
- **arte.tv:+7**
- **arte.tv:cinema**
- **arte.tv:concert**
- **arte.tv:creative**
- **arte.tv:ddc**
- **arte.tv:embed**
- **arte.tv:future**
- **arte.tv:info**
- **arte.tv:magazine**
- **arte.tv:playlist**
- **AsianCrush**
- **AsianCrushPlaylist**
@@ -231,6 +223,8 @@
- **DiscoveryNetworksDe**
- **DiscoveryVR**
- **Disney**
- **dlive:stream**
- **dlive:vod**
- **Dotsub**
- **DouyuShow**
- **DouyuTV**: 斗鱼
@@ -313,9 +307,7 @@
- **FrontendMastersCourse**
- **FrontendMastersLesson**
- **Funimation**
- **FunkChannel**
- **FunkMix**
- **FunnyOrDie**
- **Funk**
- **Fusion**
- **Fux**
- **FXNetworks**
@@ -458,6 +450,7 @@
- **linkedin:learning:course**
- **LinuxAcademy**
- **LiTV**
- **LiveJournal**
- **LiveLeak**
- **LiveLeakEmbed**
- **livestream**
@@ -581,7 +574,6 @@
- **NextTV**: 壹電視
- **Nexx**
- **NexxEmbed**
- **nfb**: National Film Board of Canada
- **nfl.com**
- **NhkVod**
- **nhl.com**
@@ -692,8 +684,9 @@
- **PornerBros**
- **PornHd**
- **PornHub**: PornHub and Thumbzilla
- **PornHubPlaylist**
- **PornHubUserVideos**
- **PornHubPagedVideoList**
- **PornHubUser**
- **PornHubUserVideosUpload**
- **Pornotube**
- **PornoVoisines**
- **PornoXO**
@@ -764,7 +757,6 @@
- **rtve.es:television**
- **RTVNH**
- **RTVS**
- **Rudo**
- **RUHD**
- **rutube**: Rutube videos
- **rutube:channel**: Rutube channels
@@ -896,7 +888,6 @@
- **TF1**
- **TFO**
- **TheIntercept**
- **theoperaplatform**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
@@ -1126,6 +1117,7 @@
- **Yahoo**: Yahoo screen and movies
- **yahoo:gyao**
- **yahoo:gyao:player**
- **yahoo:japannews**: Yahoo! Japan News
- **YandexDisk**
- **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист

@@ -53,7 +53,7 @@ class DashSegmentsFD(FragmentFD):
except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the
# whole download to fail. However if the same fragment is immediately
# retried with the same request data this usually succeeds (1-2 attemps
# retried with the same request data this usually succeeds (1-2 attempts
# is usually enough) thus allowing to download the whole file successfully.
# To be future-proof we will retry all fragments that fail with any
# HTTP error.
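
The retry behaviour described in the comment above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the downloader's actual code: `FragmentHTTPError`, `download_with_retries`, `make_flaky_fetch` and the `max_retries` parameter are hypothetical names, with the exception class standing in for `compat_urllib_error.HTTPError`.

```python
class FragmentHTTPError(Exception):
    """Hypothetical stand-in for compat_urllib_error.HTTPError."""


def download_with_retries(fetch, max_retries=3):
    """Call fetch(); on any HTTP error retry, up to max_retries extra attempts."""
    attempt = 0
    while True:
        try:
            return fetch()
        except FragmentHTTPError:
            attempt += 1
            if attempt > max_retries:
                # Retry budget exhausted: let the caller see the error.
                raise


def make_flaky_fetch(failures):
    """Return a fetch() that fails `failures` times before succeeding."""
    state = {'calls': 0}

    def fetch():
        state['calls'] += 1
        if state['calls'] <= failures:
            raise FragmentHTTPError('HTTP Error 404: Not Found')
        return b'fragment-bytes'

    return fetch
```

Retrying on every HTTP error, rather than only on 404, matches the "future-proof" intent stated in the comment; the retry cap keeps a permanently missing fragment from looping forever.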

@@ -146,7 +146,7 @@ def write_piff_header(stream, params):
sps, pps = codec_private_data.split(u32.pack(1))[1:]
avcc_payload = u8.pack(1) # configuration version
avcc_payload += sps[1:4] # avc profile indication + profile compatibility + avc level indication
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete represenation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(0xfc | (params.get('nal_unit_length_field', 4) - 1)) # complete representation (1) + reserved (11111) + length size minus one
avcc_payload += u8.pack(1) # reserved (0) + number of sps (0000001)
avcc_payload += u16.pack(len(sps))
avcc_payload += sps

@@ -15,10 +15,13 @@ class AbcNewsVideoIE(AMPIE):
IE_NAME = 'abcnews:video'
_VALID_URL = r'''(?x)
https?://
abcnews\.go\.com/
(?:
[^/]+/video/(?P<display_id>[0-9a-z-]+)-|
video/embed\?.*?\bid=
abcnews\.go\.com/
(?:
[^/]+/video/(?P<display_id>[0-9a-z-]+)-|
video/embed\?.*?\bid=
)|
fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
)
(?P<id>\d+)
'''
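
To see what the widened pattern accepts, the new `_VALID_URL` can be exercised on its own. The sketch below copies the pattern as it reads after this change; the `parse` helper and the sample URLs used with it are illustrative assumptions, not test cases taken from the extractor.

```python
import re

# Pattern as it reads after this change (verbose mode ignores the whitespace).
VALID_URL = r'''(?x)
    https?://
        (?:
            abcnews\.go\.com/
            (?:
                [^/]+/video/(?P<display_id>[0-9a-z-]+)-|
                video/embed\?.*?\bid=
            )|
            fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/
        )
        (?P<id>\d+)
'''


def parse(url):
    """Return (display_id, id) for a matching URL, or None (hypothetical helper)."""
    m = re.match(VALID_URL, url)
    if not m:
        return None
    return m.group('display_id'), m.group('id')
```

Only the first `abcnews.go.com` branch captures `display_id`, so embed URLs and the new `fivethirtyeight.abcnews.go.com` embed URLs yield `None` for that group and only an `id`.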

@@ -7,6 +7,7 @@ import functools
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
float_or_none,
int_or_none,
try_get,
@@ -27,7 +28,7 @@ class ACastIE(InfoExtractor):
'''
_TESTS = [{
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'md5': 'a02393c74f3bdb1801c3ec2695577ce0',
'md5': '16d936099ec5ca2d5869e3a813ee8dc4',
'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3',
@@ -46,28 +47,37 @@ class ACastIE(InfoExtractor):
}, {
'url': 'https://play.acast.com/s/rattegangspodden/s04e09-styckmordet-i-helenelund-del-22',
'only_matching': True,
}, {
'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9',
'only_matching': True,
}]
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
s = self._download_json(
'https://play-api.acast.com/stitch/%s/%s' % (channel, display_id),
display_id)['result']
'https://feeder.acast.com/api/v1/shows/%s/episodes/%s' % (channel, display_id),
display_id)
media_url = s['url']
if re.search(r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}', display_id):
episode_url = s.get('episodeUrl')
if episode_url:
display_id = episode_url
else:
channel, display_id = re.match(self._VALID_URL, s['link']).groups()
cast_data = self._download_json(
'https://play-api.acast.com/splash/%s/%s' % (channel, display_id),
display_id)['result']
e = cast_data['episode']
title = e['name']
title = e.get('name') or s['title']
return {
'id': compat_str(e['id']),
'display_id': display_id,
'url': media_url,
'title': title,
'description': e.get('description') or e.get('summary'),
'description': e.get('summary') or clean_html(e.get('description') or s.get('description')),
'thumbnail': e.get('image'),
'timestamp': unified_timestamp(e.get('publishingDate')),
'duration': float_or_none(s.get('duration') or e.get('duration')),
'timestamp': unified_timestamp(e.get('publishingDate') or s.get('publishDate')),
'duration': float_or_none(e.get('duration') or s.get('duration')),
'filesize': int_or_none(e.get('contentLength')),
'creator': try_get(cast_data, lambda x: x['show']['author'], compat_str),
'series': try_get(cast_data, lambda x: x['show']['name'], compat_str),
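
The new branch in `_real_extract` hinges on recognising a UUID-style episode id in the URL. A minimal sketch of that check follows, using the same regular expression as the hunk above; `looks_like_episode_id` is a hypothetical helper name, and the sample ids come from the test URLs shown in this diff.

```python
import re

# Same expression as in the hunk above: 8-4-4-4-12 lowercase hex digits.
UUID_RE = r'[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}'


def looks_like_episode_id(display_id):
    """True when display_id contains a UUID-style episode id (hypothetical helper)."""
    return re.search(UUID_RE, display_id) is not None
```

When the check succeeds, the extractor above swaps the id for a slug taken from the feeder API response before calling the splash endpoint; slug-style ids skip that step.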

@@ -25,6 +25,11 @@ MSO_INFO = {
'username_field': 'username',
'password_field': 'password',
},
'ATT': {
'name': 'AT&T U-verse',
'username_field': 'userid',
'password_field': 'password',
},
'ATTOTT': {
'name': 'DIRECTV NOW',
'username_field': 'email',

@@ -4,17 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
compat_urllib_parse_urlparse,
)
from ..compat import compat_str
from ..utils import (
ExtractorError,
find_xpath_attr,
get_element_by_attribute,
int_or_none,
NO_DEFAULT,
qualities,
try_get,
unified_strdate,
@@ -25,59 +18,7 @@ from ..utils import (
# add tests.
class ArteTvIE(InfoExtractor):
_VALID_URL = r'https?://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
IE_NAME = 'arte.tv'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
lang = mobj.group('lang')
video_id = mobj.group('id')
ref_xml_url = url.replace('/videos/', '/do_delegate/videos/')
ref_xml_url = ref_xml_url.replace('.html', ',view,asPlayerXml.xml')
ref_xml_doc = self._download_xml(
ref_xml_url, video_id, note='Downloading metadata')
config_node = find_xpath_attr(ref_xml_doc, './/video', 'lang', lang)
config_xml_url = config_node.attrib['ref']
config = self._download_xml(
config_xml_url, video_id, note='Downloading configuration')
formats = [{
'format_id': q.attrib['quality'],
# The playpath starts at 'mp4:', if we don't manually
# split the url, rtmpdump will incorrectly parse them
'url': q.text.split('mp4:', 1)[0],
'play_path': 'mp4:' + q.text.split('mp4:', 1)[1],
'ext': 'flv',
'quality': 2 if q.attrib['quality'] == 'hd' else 1,
} for q in config.findall('./urls/url')]
self._sort_formats(formats)
title = config.find('.//name').text
thumbnail = config.find('.//firstThumbnailUrl').text
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'formats': formats,
}
class ArteTVBaseIE(InfoExtractor):
@classmethod
def _extract_url_info(cls, url):
mobj = re.match(cls._VALID_URL, url)
lang = mobj.group('lang')
query = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
if 'vid' in query:
video_id = query['vid'][0]
else:
# This is not a real id, it can be for example AJT for the news
# http://www.arte.tv/guide/fr/emissions/AJT/arte-journal
video_id = mobj.group('id')
return video_id, lang
def _extract_from_json_url(self, json_url, video_id, lang, title=None):
info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer']
@@ -108,13 +49,15 @@ class ArteTVBaseIE(InfoExtractor):
'upload_date': unified_strdate(upload_date_str),
'thumbnail': player_info.get('programImage') or player_info.get('VTU', {}).get('IUR'),
}
qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ'])
LANGS = {
'fr': 'F',
'de': 'A',
'en': 'E[ANG]',
'es': 'E[ESP]',
'it': 'E[ITA]',
'pl': 'E[POL]',
}
langcode = LANGS.get(lang, lang)
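
The effect of reordering the list passed to `qualities` is easier to see with a stand-alone version of the helper. The re-implementation below is an assumption about how the `utils` helper behaves (rank equals position in the list, unknown ids rank lowest), written locally so the sketch runs without youtube-dl installed.

```python
def qualities(quality_ids):
    """Local re-implementation, assumed to mirror the utils helper."""
    def q(qid):
        try:
            # A later position in the list means a higher preference.
            return quality_ids.index(qid)
        except ValueError:
            # Unknown quality ids sort below every known one.
            return -1
    return q


old_qfunc = qualities(['HQ', 'MQ', 'EQ', 'SQ'])
new_qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ'])
```

Under that assumption, the change swaps the two lowest ranks: `HQ` now outranks `MQ`, while `EQ` and `SQ` keep their positions at the top.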
@@ -126,8 +69,8 @@ class ArteTVBaseIE(InfoExtractor):
l = re.escape(langcode)
# Language preference from most to least priority
# Reference: section 5.6.3 of
# http://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-05.pdf
# Reference: section 6.8 of
# https://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-07-1.pdf
PREFERENCES = (
# original version in requested language, without subtitles
r'VO{0}$'.format(l),
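
The `re.escape` call above matters because several language codes contain square brackets. The sketch below rebuilds only the first preference pattern; `original_version_pattern` is a hypothetical helper, and the version strings passed to it are illustrative values rather than codes confirmed from the ARTE API.

```python
import re

LANGS = {
    'fr': 'F',
    'de': 'A',
    'en': 'E[ANG]',
    'es': 'E[ESP]',
    'it': 'E[ITA]',
    'pl': 'E[POL]',
}


def original_version_pattern(lang):
    """Build the 'original version, no subtitles' pattern for a language."""
    langcode = LANGS.get(lang, lang)
    # Without escaping, '[ANG]' would be read as a character class.
    return r'VO{0}$'.format(re.escape(langcode))
```

Formatting the raw code into the pattern without escaping would make `E[ANG]` match `EA`, `EN` or `EG` instead of the literal bracketed code.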
@@ -193,274 +136,59 @@ class ArteTVBaseIE(InfoExtractor):
class ArteTVPlus7IE(ArteTVBaseIE):
IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/(?:[^/]+/)?(?P<lang>fr|de|en|es)/(?:videos/)?(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>\d{6}-\d{3}-[AF])'
_TESTS = [{
'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
'only_matching': True,
}, {
'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
'only_matching': True,
}, {
'url': 'http://www.arte.tv/de/videos/048696-000-A/der-kluge-bauch-unser-zweites-gehirn',
'only_matching': True,
'url': 'https://www.arte.tv/en/videos/088501-000-A/mexico-stealing-petrol-to-survive/',
'info_dict': {
'id': '088501-000-A',
'ext': 'mp4',
'title': 'Mexico: Stealing Petrol to Survive',
'upload_date': '20190628',
},
}]
@classmethod
def suitable(cls, url):
return False if ArteTVPlaylistIE.suitable(url) else super(ArteTVPlus7IE, cls).suitable(url)
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
webpage = self._download_webpage(url, video_id)
return self._extract_from_webpage(webpage, video_id, lang)
def _extract_from_webpage(self, webpage, video_id, lang):
patterns_templates = (r'arte_vp_url=["\'](.*?%s.*?)["\']', r'data-url=["\']([^"]+%s[^"]+)["\']')
ids = (video_id, '')
# some pages contain multiple videos (like
# http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D),
# so we first try to look for json URLs that contain the video id from
# the 'vid' parameter.
patterns = [t % re.escape(_id) for _id in ids for t in patterns_templates]
json_url = self._html_search_regex(
patterns, webpage, 'json vp url', default=None)
if not json_url:
def find_iframe_url(webpage, default=NO_DEFAULT):
return self._html_search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+\bjson_url=.+?)\1',
webpage, 'iframe url', group='url', default=default)
iframe_url = find_iframe_url(webpage, None)
if not iframe_url:
embed_url = self._html_search_regex(
r'arte_vp_url_oembed=\'([^\']+?)\'', webpage, 'embed url', default=None)
if embed_url:
player = self._download_json(
embed_url, video_id, 'Downloading player page')
iframe_url = find_iframe_url(player['html'])
# en and es URLs produce react-based pages with different layout (e.g.
# http://www.arte.tv/guide/en/053330-002-A/carnival-italy?zone=world)
if not iframe_url:
program = self._search_regex(
r'program\s*:\s*({.+?["\']embed_html["\'].+?}),?\s*\n',
webpage, 'program', default=None)
if program:
embed_html = self._parse_json(program, video_id)
if embed_html:
iframe_url = find_iframe_url(embed_html['embed_html'])
if iframe_url:
json_url = compat_parse_qs(
compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
if json_url:
title = self._search_regex(
r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
webpage, 'title', default=None, group='title')
return self._extract_from_json_url(json_url, video_id, lang, title=title)
# Different kind of embed URL (e.g.
# http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
entries = [
self.url_result(url)
for _, url in re.findall(r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', webpage)]
return self.playlist_result(entries)
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://creative.arte.tv/fr/episode/osmosis-episode-1',
'info_dict': {
'id': '057405-001-A',
'ext': 'mp4',
'title': 'OSMOSIS - N\'AYEZ PLUS PEUR D\'AIMER (1)',
'upload_date': '20150716',
},
}, {
'url': 'http://creative.arte.tv/fr/Monty-Python-Reunion',
'playlist_count': 11,
'add_ie': ['Youtube'],
}, {
'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
'only_matching': True,
}]
class ArteTVInfoIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:info'
_VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
'info_dict': {
'id': '067528-000-A',
'ext': 'mp4',
'title': 'Service civique, un cache misère ?',
'upload_date': '20160403',
},
}]
class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://future.arte.tv/fr/info-sciences/les-ecrevisses-aussi-sont-anxieuses',
'info_dict': {
'id': '050940-028-A',
'ext': 'mp4',
'title': 'Les écrevisses aussi peuvent être anxieuses',
'upload_date': '20140902',
},
}, {
'url': 'http://future.arte.tv/fr/la-science-est-elle-responsable',
'only_matching': True,
}]
class ArteTVDDCIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:ddc'
_VALID_URL = r'https?://ddc\.arte\.tv/(?P<lang>emission|folge)/(?P<id>[^/?#&]+)'
_TESTS = []
def _real_extract(self, url):
video_id, lang = self._extract_url_info(url)
if lang == 'folge':
lang = 'de'
elif lang == 'emission':
lang = 'fr'
webpage = self._download_webpage(url, video_id)
scriptElement = get_element_by_attribute('class', 'visu_video_block', webpage)
script_url = self._html_search_regex(r'src="(.*?)"', scriptElement, 'script url')
javascriptPlayerGenerator = self._download_webpage(script_url, video_id, 'Download javascript player generator')
json_url = self._search_regex(r"json_url=(.*)&rendering_place.*", javascriptPlayerGenerator, 'json url')
return self._extract_from_json_url(json_url, video_id, lang)
class ArteTVConcertIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:concert'
_VALID_URL = r'https?://concert\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://concert.arte.tv/de/notwist-im-pariser-konzertclub-divan-du-monde',
'md5': '9ea035b7bd69696b67aa2ccaaa218161',
'info_dict': {
'id': '186',
'ext': 'mp4',
'title': 'The Notwist im Pariser Konzertclub "Divan du Monde"',
'upload_date': '20140128',
'description': 'md5:486eb08f991552ade77439fe6d82c305',
},
}]
class ArteTVCinemaIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:cinema'
_VALID_URL = r'https?://cinema\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>.+)'
_TESTS = [{
'url': 'http://cinema.arte.tv/fr/article/les-ailes-du-desir-de-julia-reck',
'md5': 'a5b9dd5575a11d93daf0e3f404f45438',
'info_dict': {
'id': '062494-000-A',
'ext': 'mp4',
'title': 'Film lauréat du concours web - "Les ailes du désir" de Julia Reck',
'upload_date': '20150807',
},
}]
class ArteTVMagazineIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:magazine'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/magazine/[^/]+/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
# Embedded via <iframe src="http://www.arte.tv/arte_vp/index.php?json_url=..."
'url': 'http://www.arte.tv/magazine/trepalium/fr/entretien-avec-le-realisateur-vincent-lannoo-trepalium',
'md5': '2a9369bcccf847d1c741e51416299f25',
'info_dict': {
'id': '065965-000-A',
'ext': 'mp4',
'title': 'Trepalium - Extrait Ep.01',
'upload_date': '20160121',
},
}, {
# Embedded via <iframe src="http://www.arte.tv/guide/fr/embed/054813-004-A/medium"
'url': 'http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium',
'md5': 'fedc64fc7a946110fe311634e79782ca',
'info_dict': {
'id': '054813-004_PLUS7-F',
'ext': 'mp4',
'title': 'Trepalium (4/6)',
'description': 'md5:10057003c34d54e95350be4f9b05cb40',
'upload_date': '20160218',
},
}, {
'url': 'http://www.arte.tv/magazine/metropolis/de/frank-woeste-german-paris-metropolis',
'only_matching': True,
}]
lang, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_from_json_url(
'https://api.arte.tv/api/player/v1/config/%s/%s' % (lang, video_id),
video_id, lang)
class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
http://www\.arte\.tv
/(?:playerv2/embed|arte_vp/index)\.php\?json_url=
https://www\.arte\.tv
/player/v3/index\.php\?json_url=
(?P<json_url>
http://arte\.tv/papi/tvguide/videos/stream/player/
(?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*
https?://api\.arte\.tv/api/player/v1/config/
(?P<lang>[^/]+)/(?P<id>\d{6}-\d{3}-[AF])
)
'''
_TESTS = []
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
lang = mobj.group('lang')
json_url = mobj.group('json_url')
json_url, lang, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_from_json_url(json_url, video_id, lang)
class TheOperaPlatformIE(ArteTVPlus7IE):
IE_NAME = 'theoperaplatform'
_VALID_URL = r'https?://(?:www\.)?theoperaplatform\.eu/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.theoperaplatform.eu/de/opera/verdi-otello',
'md5': '970655901fa2e82e04c00b955e9afe7b',
'info_dict': {
'id': '060338-009-A',
'ext': 'mp4',
'title': 'Verdi - OTELLO',
'upload_date': '20160927',
},
}]
class ArteTVPlaylistIE(ArteTVBaseIE):
IE_NAME = 'arte.tv:playlist'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/[^#]*#collection/(?P<id>PL-\d+)'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>RC-\d{6})'
_TESTS = [{
'url': 'http://www.arte.tv/guide/de/plus7/?country=DE#collection/PL-013263/ARTETV',
'url': 'https://www.arte.tv/en/videos/RC-016954/earn-a-living/',
'info_dict': {
'id': 'PL-013263',
'title': 'Areva & Uramin',
'description': 'md5:a1dc0312ce357c262259139cfd48c9bf',
'id': 'RC-016954',
'title': 'Earn a Living',
'description': 'md5:d322c55011514b3a7241f7fb80d494c2',
},
'playlist_mincount': 6,
}, {
'url': 'http://www.arte.tv/guide/de/playlists?country=DE#collection/PL-013190/ARTETV',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id, lang = self._extract_url_info(url)
lang, playlist_id = re.match(self._VALID_URL, url).groups()
collection = self._download_json(
'https://api.arte.tv/api/player/v1/collectionData/%s/%s?source=videos'
% (lang, playlist_id), playlist_id)
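
Note on the arte.tv hunks above: a minimal standalone sketch of what the narrowed `_VALID_URL` and the direct player-config lookup appear to do. The pattern and the config endpoint are copied from the added lines; `config_url_for` is a hypothetical helper name, and `qualities` here is a stand-in written for illustration, assuming the real utility ranks an ID by its position in the list. Under that assumption, swapping 'MQ' and 'HQ' makes 'HQ' rank above 'MQ'.

```python
import re

# Pattern and endpoint copied from the added lines of the diff.
VALID_URL = r'https?://(?:www\.)?arte\.tv/(?P<lang>fr|de|en|es|it|pl)/videos/(?P<id>\d{6}-\d{3}-[AF])'
CONFIG_URL = 'https://api.arte.tv/api/player/v1/config/%s/%s'


def config_url_for(url):
    """Hypothetical helper: map a page URL to its player config URL."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        # Legacy guide/sites URLs no longer match the narrowed pattern.
        return None
    lang, video_id = mobj.groups()
    return CONFIG_URL % (lang, video_id)


def qualities(quality_ids):
    """Illustrative stand-in: rank is the list position, -1 if unknown."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


qfunc = qualities(['MQ', 'HQ', 'EQ', 'SQ'])
```

With this sketch, the test URL from the diff resolves to a config URL under `/config/en/088501-000-A`, while the old `sites.arte.tv` style URL is rejected.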


@@ -5,14 +5,12 @@ import re
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils import (
extract_attributes,
remove_end,
)
from ..utils import extract_attributes
class AsianCrushIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/video/(?:[^/]+/)?0+(?P<id>\d+)v\b'
_VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))'
_VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % _VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/video/012869v/women-who-flirt/',
'md5': 'c3b740e48d0ba002a42c0b72857beae6',
@@ -20,7 +18,7 @@ class AsianCrushIE(InfoExtractor):
'id': '1_y4tmjm5r',
'ext': 'mp4',
'title': 'Women Who Flirt',
'description': 'md5:3db14e9186197857e7063522cb89a805',
'description': 'md5:7e986615808bcfb11756eb503a751487',
'timestamp': 1496936429,
'upload_date': '20170608',
'uploader_id': 'craig@crifkin.com',
@@ -28,10 +26,27 @@ class AsianCrushIE(InfoExtractor):
}, {
'url': 'https://www.asiancrush.com/video/she-was-pretty/011886v-pretty-episode-3/',
'only_matching': True,
}, {
'url': 'https://www.yuyutv.com/video/013886v/the-act-of-killing/',
'only_matching': True,
}, {
'url': 'https://www.yuyutv.com/video/peep-show/013922v-warring-factions/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/video/010400v/drifters/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/video/mononoke/016378v-zashikiwarashi-part-1/',
'only_matching': True,
}, {
'url': 'https://www.cocoro.tv/video/the-wonderful-wizard-of-oz/008878v-the-wonderful-wizard-of-oz-ep01/',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
@@ -51,7 +66,7 @@ class AsianCrushIE(InfoExtractor):
r'\bentry_id["\']\s*:\s*["\'](\d+)', webpage, 'entry id')
player = self._download_webpage(
'https://api.asiancrush.com/embeddedVideoPlayer', video_id,
'https://api.%s/embeddedVideoPlayer' % host, video_id,
query={'id': entry_id})
kaltura_id = self._search_regex(
@@ -63,15 +78,23 @@ class AsianCrushIE(InfoExtractor):
r'/p(?:artner_id)?/(\d+)', player, 'partner id',
default='513551')
return self.url_result(
'kaltura:%s:%s' % (partner_id, kaltura_id),
ie=KalturaIE.ie_key(), video_id=kaltura_id,
video_title=title)
description = self._html_search_regex(
r'(?s)<div[^>]+\bclass=["\']description["\'][^>]*>(.+?)</div>',
webpage, 'description', fatal=False)
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
'ie_key': KalturaIE.ie_key(),
'id': video_id,
'title': title,
'description': description,
}
class AsianCrushPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?asiancrush\.com/series/0+(?P<id>\d+)s\b'
_TEST = {
_VALID_URL = r'%s/series/0+(?P<id>\d+)s\b' % AsianCrushIE._VALID_URL_BASE
_TESTS = [{
'url': 'https://www.asiancrush.com/series/012481s/scholar-walks-night/',
'info_dict': {
'id': '12481',
@@ -79,7 +102,16 @@ class AsianCrushPlaylistIE(InfoExtractor):
'description': 'md5:7addd7c5132a09fd4741152d96cce886',
},
'playlist_count': 20,
}
}, {
'url': 'https://www.yuyutv.com/series/013920s/peep-show/',
'only_matching': True,
}, {
'url': 'https://www.midnightpulp.com/series/016375s/mononoke/',
'only_matching': True,
}, {
'url': 'https://www.cocoro.tv/series/008549s/the-wonderful-wizard-of-oz/',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
@@ -96,15 +128,15 @@ class AsianCrushPlaylistIE(InfoExtractor):
entries.append(self.url_result(
mobj.group('url'), ie=AsianCrushIE.ie_key()))
title = remove_end(
self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False),
' | AsianCrush')
title = self._html_search_regex(
r'(?s)<h1\b[^>]\bid=["\']movieTitle[^>]+>(.+?)</h1>', webpage,
'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or self._search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
if title:
title = re.sub(r'\s*\|\s*.+?$', '', title)
description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
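
Note on the AsianCrush hunks above: a small sketch, outside the extractor class, of how the shared `_VALID_URL_BASE` captures the host so the API call can target the matching site, and how the new `re.sub` strips a site suffix from a title. The two patterns are copied from the added lines; `parse_video_url`, `player_api_url` and `strip_site_suffix` are hypothetical names used only for illustration.

```python
import re

# Patterns copied from the added lines of the diff.
VALID_URL_BASE = r'https?://(?:www\.)?(?P<host>(?:(?:asiancrush|yuyutv|midnightpulp)\.com|cocoro\.tv))'
VALID_URL = r'%s/video/(?:[^/]+/)?0+(?P<id>\d+)v\b' % VALID_URL_BASE


def parse_video_url(url):
    """Hypothetical helper: return (host, video_id) for a supported URL."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('host'), mobj.group('id')


def player_api_url(host):
    """Build the per-site player endpoint the same way the diff does."""
    return 'https://api.%s/embeddedVideoPlayer' % host


def strip_site_suffix(title):
    """Drop everything from the first ' | ' onward, whatever the site name."""
    return re.sub(r'\s*\|\s*.+?$', '', title)
```

The generic suffix pattern replaces the earlier `remove_end(..., ' | AsianCrush')`, which would presumably have left the other sites' suffixes in place.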


@@ -99,8 +99,8 @@ class BeamProLiveIE(BeamProBaseIE):
class BeamProVodIE(BeamProBaseIE):
IE_NAME = 'Mixer:vod'
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>\d+)'
_TEST = {
_VALID_URL = r'https?://(?:\w+\.)?(?:beam\.pro|mixer\.com)/[^/?#&]+\?.*?\bvod=(?P<id>[^?#&]+)'
_TESTS = [{
'url': 'https://mixer.com/willow8714?vod=2259830',
'md5': 'b2431e6e8347dc92ebafb565d368b76b',
'info_dict': {
@@ -119,7 +119,13 @@ class BeamProVodIE(BeamProBaseIE):
'params': {
'skip_download': True,
},
}
}, {
'url': 'https://mixer.com/streamer?vod=IxFno1rqC0S_XJ1a2yGgNw',
'only_matching': True,
}, {
'url': 'https://mixer.com/streamer?vod=Rh3LY0VAqkGpEQUe2pN-ig',
'only_matching': True,
}]
@staticmethod
def _extract_format(vod, vod_type):


@@ -1,7 +1,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
int_or_none,
unified_timestamp,
@@ -11,6 +14,7 @@ from ..utils import (
class BeegIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?beeg\.(?:com|porn(?:/video)?)/(?P<id>\d+)'
_TESTS = [{
# api/v6 v1
'url': 'http://beeg.com/5416503',
'md5': 'a1a1b1a8bc70a89e49ccfd113aed0820',
'info_dict': {
@@ -24,6 +28,14 @@ class BeegIE(InfoExtractor):
'tags': list,
'age_limit': 18,
}
}, {
# api/v6 v2
'url': 'https://beeg.com/1941093077?t=911-1391',
'only_matching': True,
}, {
# api/v6 v2 w/o t
'url': 'https://beeg.com/1277207756',
'only_matching': True,
}, {
'url': 'https://beeg.porn/video/5416503',
'only_matching': True,
@@ -41,11 +53,25 @@ class BeegIE(InfoExtractor):
r'beeg_version\s*=\s*([\da-zA-Z_-]+)', webpage, 'beeg version',
default='1546225636701')
if len(video_id) >= 10:
query = {
'v': 2,
}
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
t = qs.get('t', [''])[0].split('-')
if len(t) > 1:
query.update({
's': t[0],
'e': t[1],
})
else:
query = {'v': 1}
for api_path in ('', 'api.'):
video = self._download_json(
'https://%sbeeg.com/api/v6/%s/video/%s'
% (api_path, beeg_version, video_id), video_id,
fatal=api_path == 'api.')
fatal=api_path == 'api.', query=query)
if video:
break
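
Note on the Beeg hunk above: the query selection can be read in isolation as the sketch below. It assumes Python 3's `urllib.parse` in place of the `compat_urlparse` shim, and `build_api_query` is a hypothetical function name; the branching itself follows the added lines.

```python
from urllib.parse import parse_qs, urlparse


def build_api_query(url, video_id):
    """Hypothetical helper mirroring the query logic added in the diff."""
    if len(video_id) >= 10:
        # Long IDs go through version 2 of the API.
        query = {'v': 2}
        qs = parse_qs(urlparse(url).query)
        # "t" looks like "<start>-<end>"; it is optional.
        t = qs.get('t', [''])[0].split('-')
        if len(t) > 1:
            query.update({'s': t[0], 'e': t[1]})
    else:
        query = {'v': 1}
    return query
```

A URL without `t` still yields `{'v': 2}` for a long ID, which matches the "api/v6 v2 w/o t" test case.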


@@ -6,7 +6,6 @@ from ..utils import (
ExtractorError,
remove_end,
)
from .rudo import RudoIE
class BioBioChileTVIE(InfoExtractor):
@@ -41,11 +40,15 @@ class BioBioChileTVIE(InfoExtractor):
}, {
'url': 'http://www.biobiochile.cl/noticias/bbtv/comentarios-bio-bio/2016/07/08/edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos.shtml',
'info_dict': {
'id': 'edecanes-del-congreso-figuras-decorativas-que-le-cuestan-muy-caro-a-los-chilenos',
'id': 'b4xd0LK3SK',
'ext': 'mp4',
'uploader': '(none)',
'upload_date': '20160708',
'title': 'Edecanes del Congreso: Figuras decorativas que le cuestan muy caro a los chilenos',
# TODO: fix url_transparent information overriding
# 'uploader': 'Juan Pablo Echenique',
'title': 'Comentario Oscar Cáceres',
},
'params': {
# empty m3u8 manifest
'skip_download': True,
},
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
@@ -60,7 +63,9 @@ class BioBioChileTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
rudo_url = RudoIE._extract_url(webpage)
rudo_url = self._search_regex(
r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
webpage, 'embed URL', None, group='url')
if not rudo_url:
raise ExtractorError('No videos found')
@@ -68,7 +73,7 @@ class BioBioChileTVIE(InfoExtractor):
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://(?:busca|www)\.biobiochile\.cl/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
r'<a[^>]+href=["\'](?:https?://(?:busca|www)\.biobiochile\.cl)?/(?:lista/)?(?:author|autor)[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False)
return {


@@ -71,7 +71,7 @@ class BleacherReportIE(InfoExtractor):
video = article_data.get('video')
if video:
video_type = video['type']
if video_type == 'cms.bleacherreport.com':
if video_type in ('cms.bleacherreport.com', 'vid.bleacherreport.com'):
info['url'] = 'http://bleacherreport.com/video_embed?id=%s' % video['id']
elif video_type == 'ooyala.com':
info['url'] = 'ooyala:%s' % video['id']
@@ -87,9 +87,9 @@ class BleacherReportIE(InfoExtractor):
class BleacherReportCMSIE(AMPIE):
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36})'
_VALID_URL = r'https?://(?:www\.)?bleacherreport\.com/video_embed\?id=(?P<id>[0-9a-f-]{36}|\d{5})'
_TESTS = [{
'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
'url': 'http://bleacherreport.com/video_embed?id=8fd44c2f-3dc5-4821-9118-2c825a98c0e1&library=video-cms',
'md5': '2e4b0a997f9228ffa31fada5c53d1ed1',
'info_dict': {
'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
@@ -101,6 +101,6 @@ class BleacherReportCMSIE(AMPIE):
def _real_extract(self, url):
video_id = self._match_id(url)
info = self._extract_feed_info('http://cms.bleacherreport.com/media/items/%s/akamai.json' % video_id)
info = self._extract_feed_info('http://vid.bleacherreport.com/videos/%s.akamai' % video_id)
info['id'] = video_id
return info


@@ -483,7 +483,7 @@ class BrightcoveLegacyIE(InfoExtractor):
class BrightcoveNewIE(AdobePassIE):
IE_NAME = 'brightcove:new'
_VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>\d+|ref:[^&]+)'
_VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*(?P<content_type>video|playlist)Id=(?P<video_id>\d+|ref:[^&]+)'
_TESTS = [{
'url': 'http://players.brightcove.net/929656772001/e41d32dc-ec74-459e-a845-6c69f7b724ea_default/index.html?videoId=4463358922001',
'md5': 'c8100925723840d4b0d243f7025703be',
@@ -516,6 +516,21 @@ class BrightcoveNewIE(AdobePassIE):
# m3u8 download
'skip_download': True,
}
}, {
# playlist stream
'url': 'https://players.brightcove.net/1752604059001/S13cJdUBz_default/index.html?playlistId=5718313430001',
'info_dict': {
'id': '5718313430001',
'title': 'No Audio Playlist',
},
'playlist_count': 7,
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://players.brightcove.net/5690807595001/HyZNerRl7_default/index.html?playlistId=5743160747001',
'only_matching': True,
}, {
# ref: prefixed video id
'url': 'http://players.brightcove.net/3910869709001/21519b5c-4b3b-4363-accb-bdc8f358f823_default/index.html?videoId=ref:7069442',
@@ -715,7 +730,7 @@ class BrightcoveNewIE(AdobePassIE):
'ip_blocks': smuggled_data.get('geo_ip_blocks'),
})
account_id, player_id, embed, video_id = re.match(self._VALID_URL, url).groups()
account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(
'http://players.brightcove.net/%s/%s_%s/index.min.js'
@@ -736,7 +751,7 @@ class BrightcoveNewIE(AdobePassIE):
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk')
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' % (account_id, video_id)
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id)
headers = {
'Accept': 'application/json;pk=%s' % policy_key,
}
@@ -771,5 +786,12 @@ class BrightcoveNewIE(AdobePassIE):
'tveToken': tve_token,
})
if content_type == 'playlist':
return self.playlist_result(
[self._parse_brightcove_metadata(vid, vid.get('id'), headers)
for vid in json_data.get('videos', []) if vid.get('id')],
json_data.get('id'), json_data.get('name'),
json_data.get('description'))
return self._parse_brightcove_metadata(
json_data, video_id, headers=headers)
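
Note on the Brightcove hunks above: a sketch of how the new `content_type` group feeds the playback API path. The pattern and URL template are copied from the added lines; `playback_api_url` is a hypothetical helper, and the real extractor goes on to fetch a policy key and parse the response, which is omitted here.

```python
import re

# Pattern copied from the added _VALID_URL line.
VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*(?P<content_type>video|playlist)Id=(?P<video_id>\d+|ref:[^&]+)'


def playback_api_url(url):
    """Hypothetical helper: derive the playback API URL from a player URL."""
    account_id, player_id, embed, content_type, video_id = re.match(VALID_URL, url).groups()
    # "%ss" pluralises the captured type: video -> videos, playlist -> playlists.
    return 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (
        account_id, content_type, video_id)
```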


@@ -220,7 +220,7 @@ class InfoExtractor(object):
* "preference" (optional, int) - quality of the image
* "width" (optional, int)
* "height" (optional, int)
* "resolution" (optional, string "{width}x{height"},
* "resolution" (optional, string "{width}x{height}",
deprecated)
* "filesize" (optional, int)
thumbnail: Full URL to a video thumbnail image.


@@ -103,19 +103,6 @@ class CrunchyrollBaseIE(InfoExtractor):
def _real_initialize(self):
self._login()
def _download_webpage(self, url_or_request, *args, **kwargs):
request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
# similar to https://github.com/ytdl-org/youtube-dl/issues/6797.
# Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
# should be imposed or not (from what I can see it just takes the first language
# ignoring the priority and requires it to correspond the IP). By the way this causes
# Crunchyroll to not work in georestriction cases in some browsers that don't place
# the locale lang first in header. However allowing any language seems to workaround the issue.
request.add_header('Accept-Language', '*')
return super(CrunchyrollBaseIE, self)._download_webpage(request, *args, **kwargs)
@staticmethod
def _add_skip_wall(url):
parsed_url = compat_urlparse.urlparse(url)
@@ -269,6 +256,19 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
'1080': ('80', '108'),
}
def _download_webpage(self, url_or_request, *args, **kwargs):
request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
else sanitized_Request(url_or_request))
# Accept-Language must be set explicitly to accept any language to avoid issues
# similar to https://github.com/ytdl-org/youtube-dl/issues/6797.
# Along with IP address Crunchyroll uses Accept-Language to guess whether georestriction
# should be imposed or not (from what I can see it just takes the first language
# ignoring the priority and requires it to correspond the IP). By the way this causes
# Crunchyroll to not work in georestriction cases in some browsers that don't place
# the locale lang first in header. However allowing any language seems to workaround the issue.
request.add_header('Accept-Language', '*')
return super(CrunchyrollBaseIE, self)._download_webpage(request, *args, **kwargs)
def _decrypt_subtitles(self, data, iv, id):
data = bytes_to_intlist(compat_b64decode(data))
iv = bytes_to_intlist(compat_b64decode(iv))
@@ -661,9 +661,8 @@ class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
webpage = self._download_webpage(
self._add_skip_wall(url), show_id,
headers=self.geo_verification_headers())
title = self._html_search_regex(
r'(?s)<h1[^>]*>\s*<span itemprop="name">(.*?)</span>',
webpage, 'title')
title = self._html_search_meta('name', webpage, default=None)
episode_paths = re.findall(
r'(?s)<li id="showview_videos_media_(\d+)"[^>]+>.*?<a href="([^"]+)"',
webpage)


@@ -3,6 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_timestamp
from .youtube import YoutubeIE
class CtsNewsIE(InfoExtractor):
@@ -14,8 +15,8 @@ class CtsNewsIE(InfoExtractor):
'info_dict': {
'id': '201501291578109',
'ext': 'mp4',
'title': '以色列.真主黨交火 3人死亡',
'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人...',
'title': '以色列.真主黨交火 3人死亡 - 華視新聞網',
'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人員也不幸罹難。大陸陝西、河南、安徽、江蘇和湖北五個省份出現大暴雪,嚴重影響陸空交通,不過九華山卻出現...',
'timestamp': 1422528540,
'upload_date': '20150129',
}
@@ -26,7 +27,7 @@ class CtsNewsIE(InfoExtractor):
'info_dict': {
'id': '201309031304098',
'ext': 'mp4',
'title': '韓國31歲童顏男 貌如十多歲小孩',
'title': '韓國31歲童顏男 貌如十多歲小孩 - 華視新聞網',
'description': '越有年紀的人越希望看起來年輕一點而南韓卻有一位31歲的男子看起來像是11、12歲的小孩身...',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1378205880,
@@ -62,8 +63,7 @@ class CtsNewsIE(InfoExtractor):
video_url = mp4_feed['source_url']
else:
self.to_screen('Not CTSPlayer video, trying Youtube...')
youtube_url = self._search_regex(
r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url')
youtube_url = YoutubeIE._extract_url(page)
return self.url_result(youtube_url, ie='Youtube')


@@ -137,10 +137,16 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
@staticmethod
def _extract_urls(webpage):
urls = []
# Look for embedded Dailymotion player
matches = re.findall(
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
return list(map(lambda m: unescapeHTML(m[1]), matches))
# https://developer.dailymotion.com/player#player-parameters
for mobj in re.finditer(
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage):
urls.append(unescapeHTML(mobj.group('url')))
for mobj in re.finditer(
r'(?s)DM\.player\([^,]+,\s*{.*?video[\'"]?\s*:\s*["\']?(?P<id>[0-9a-zA-Z]+).+?}\s*\);', webpage):
urls.append('https://www.dailymotion.com/embed/video/' + mobj.group('id'))
return urls
def _real_extract(self, url):
video_id = self._match_id(url)
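
Note on the Dailymotion hunk above: the second loop picks up players created through the JavaScript `DM.player(...)` call. The sketch below exercises only that branch, with the regex copied from the added lines; the iframe/embed branch is left out, and both the function name `extract_dm_player_urls` and the sample markup in the usage note are made up for illustration.

```python
import re

# Regex copied from the added lines; matches DM.player(<element>, {... video: <id> ...});
DM_PLAYER_RE = r'(?s)DM\.player\([^,]+,\s*{.*?video[\'"]?\s*:\s*["\']?(?P<id>[0-9a-zA-Z]+).+?}\s*\);'


def extract_dm_player_urls(webpage):
    """Hypothetical helper: turn DM.player() calls into embed URLs."""
    return [
        'https://www.dailymotion.com/embed/video/' + mobj.group('id')
        for mobj in re.finditer(DM_PLAYER_RE, webpage)
    ]
```

For a page containing `DM.player(document.getElementById("player"), {video: "x5kesuj", width: "100%"});` this returns a single embed URL for `x5kesuj`.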


@@ -7,50 +7,51 @@ from .common import InfoExtractor
class DBTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dbtv\.no/(?:[^/]+/)?(?P<id>[0-9]+)(?:#(?P<display_id>.+))?'
_VALID_URL = r'https?://(?:www\.)?dagbladet\.no/video/(?:(?:embed|(?P<display_id>[^/]+))/)?(?P<id>[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8})'
_TESTS = [{
'url': 'http://dbtv.no/3649835190001#Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'md5': '2e24f67936517b143a234b4cadf792ec',
'url': 'https://www.dagbladet.no/video/PynxJnNWChE/',
'md5': 'b8f850ba1860adbda668d367f9b77699',
'info_dict': {
'id': '3649835190001',
'display_id': 'Skulle_teste_ut_fornøyelsespark,_men_kollegaen_var_bare_opptatt_av_bikinikroppen',
'id': 'PynxJnNWChE',
'ext': 'mp4',
'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
'description': 'md5:49cc8370e7d66e8a2ef15c3b4631fd3f',
'thumbnail': r're:https?://.*\.jpg',
'timestamp': 1404039863,
'upload_date': '20140629',
'duration': 69.544,
'uploader_id': '1027729757001',
'upload_date': '20160916',
'duration': 69,
'uploader_id': 'UCk5pvsyZJoYJBd7_oFPTlRQ',
'uploader': 'Dagbladet',
},
'add_ie': ['BrightcoveNew']
'add_ie': ['Youtube']
}, {
'url': 'http://dbtv.no/3649835190001',
'url': 'https://www.dagbladet.no/video/embed/xlGmyIeN9Jo/?autoplay=false',
'only_matching': True,
}, {
'url': 'http://www.dbtv.no/lazyplayer/4631135248001',
'only_matching': True,
}, {
'url': 'http://dbtv.no/vice/5000634109001',
'only_matching': True,
}, {
'url': 'http://dbtv.no/filmtrailer/3359293614001',
'url': 'https://www.dagbladet.no/video/truer-iran-bor-passe-dere/PalfB2Cw',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [url for _, url in re.findall(
r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dbtv\.no/(?:lazy)?player/\d+.*?)\1',
r'<iframe[^>]+src=(["\'])((?:https?:)?//(?:www\.)?dagbladet\.no/video/embed/(?:[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8}).*?)\1',
webpage)]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()
return {
display_id, video_id = re.match(self._VALID_URL, url).groups()
info = {
'_type': 'url_transparent',
'url': 'http://players.brightcove.net/1027729757001/default_default/index.html?videoId=%s' % video_id,
'id': video_id,
'display_id': display_id,
'ie_key': 'BrightcoveNew',
}
if len(video_id) == 11:
info.update({
'url': video_id,
'ie_key': 'Youtube',
})
else:
info.update({
'url': 'jwplatform:' + video_id,
'ie_key': 'JWPlatform',
})
return info
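
Note on the DBTV hunk above: the rewritten extractor appears to dispatch on the length of the captured ID, treating 11-character IDs as YouTube and 8-character IDs as JW Platform. A standalone sketch of that routing, with the pattern copied from the added `_VALID_URL` and `route` as a hypothetical function name:

```python
import re

# Pattern copied from the added _VALID_URL line.
VALID_URL = r'https?://(?:www\.)?dagbladet\.no/video/(?:(?:embed|(?P<display_id>[^/]+))/)?(?P<id>[0-9A-Za-z_-]{11}|[a-zA-Z0-9]{8})'


def route(url):
    """Hypothetical helper: build the url_transparent result for a page URL."""
    # Group order in the pattern is display_id first, then id.
    display_id, video_id = re.match(VALID_URL, url).groups()
    info = {
        '_type': 'url_transparent',
        'id': video_id,
        'display_id': display_id,
    }
    if len(video_id) == 11:
        info.update({'url': video_id, 'ie_key': 'Youtube'})
    else:
        info.update({'url': 'jwplatform:' + video_id, 'ie_key': 'JWPlatform'})
    return info
```

Note that the group order changed along with the pattern, which is why the unpacking in the diff was swapped to `display_id, video_id`.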


@@ -19,9 +19,9 @@ from ..compat import compat_HTTPError
class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://
(?P<site>
(?:(?:www|go)\.)?discovery|
(?:www\.)?
(?:
discovery|
investigationdiscovery|
discoverylife|
animalplanet|
@@ -56,6 +56,9 @@ class DiscoveryIE(DiscoveryGoBaseIE):
}, {
'url': 'https://www.investigationdiscovery.com/tv-shows/final-vision/full-episodes/final-vision',
'only_matching': True,
}, {
'url': 'https://go.discovery.com/tv-shows/alaskan-bush-people/videos/follow-your-own-road',
'only_matching': True,
}]
_GEO_COUNTRIES = ['US']
_GEO_BYPASS = False


@@ -0,0 +1,94 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import int_or_none
class DLiveVODIE(InfoExtractor):
IE_NAME = 'dlive:vod'
_VALID_URL = r'https?://(?:www\.)?dlive\.tv/p/(?P<uploader_id>.+?)\+(?P<id>[a-zA-Z0-9]+)'
_TEST = {
'url': 'https://dlive.tv/p/pdp+3mTzOl4WR',
'info_dict': {
'id': '3mTzOl4WR',
'ext': 'mp4',
'title': 'Minecraft with james charles epic',
'upload_date': '20190701',
'timestamp': 1562011015,
'uploader_id': 'pdp',
}
}
def _real_extract(self, url):
uploader_id, vod_id = re.match(self._VALID_URL, url).groups()
broadcast = self._download_json(
'https://graphigo.prd.dlive.tv/', vod_id,
data=json.dumps({'query': '''query {
pastBroadcast(permlink:"%s+%s") {
content
createdAt
length
playbackUrl
title
thumbnailUrl
viewCount
}
}''' % (uploader_id, vod_id)}).encode())['data']['pastBroadcast']
title = broadcast['title']
formats = self._extract_m3u8_formats(
broadcast['playbackUrl'], vod_id, 'mp4', 'm3u8_native')
self._sort_formats(formats)
return {
'id': vod_id,
'title': title,
'uploader_id': uploader_id,
'formats': formats,
'description': broadcast.get('content'),
'thumbnail': broadcast.get('thumbnailUrl'),
'timestamp': int_or_none(broadcast.get('createdAt'), 1000),
'view_count': int_or_none(broadcast.get('viewCount')),
}
class DLiveStreamIE(InfoExtractor):
IE_NAME = 'dlive:stream'
_VALID_URL = r'https?://(?:www\.)?dlive\.tv/(?!p/)(?P<id>[\w.-]+)'
def _real_extract(self, url):
display_name = self._match_id(url)
user = self._download_json(
'https://graphigo.prd.dlive.tv/', display_name,
data=json.dumps({'query': '''query {
userByDisplayName(displayname:"%s") {
livestream {
content
createdAt
title
thumbnailUrl
watchingCount
}
username
}
}''' % display_name}).encode())['data']['userByDisplayName']
livestream = user['livestream']
title = livestream['title']
username = user['username']
formats = self._extract_m3u8_formats(
'https://live.prd.dlive.tv/hls/live/%s.m3u8' % username,
display_name, 'mp4')
self._sort_formats(formats)
return {
'id': display_name,
'title': self._live_title(title),
'uploader': display_name,
'uploader_id': username,
'formats': formats,
'description': livestream.get('content'),
'thumbnail': livestream.get('thumbnailUrl'),
'is_live': True,
'timestamp': int_or_none(livestream.get('createdAt'), 1000),
'view_count': int_or_none(livestream.get('watchingCount')),
}
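Reviewer sketch (not part of the commit): both DLive extractors POST a GraphQL query as a JSON body and scale the millisecond `createdAt` value down to seconds. The helper names `build_vod_query` and `ms_to_seconds` and the raw value `1562011015000` are assumptions for illustration only; the extractor itself uses `_download_json` and `int_or_none(..., 1000)`.

```python
import json


def build_vod_query(uploader_id, vod_id):
    """Return the UTF-8 encoded JSON body for a pastBroadcast lookup (hypothetical helper)."""
    query = '''query {
  pastBroadcast(permlink:"%s+%s") {
    title
    playbackUrl
    createdAt
  }
}''' % (uploader_id, vod_id)
    return json.dumps({'query': query}).encode()


def ms_to_seconds(value):
    """Convert a millisecond timestamp (int or numeric string) to whole seconds, or None."""
    try:
        return int(value) // 1000
    except (TypeError, ValueError):
        return None


body = build_vod_query('pdp', '3mTzOl4WR')
print(json.loads(body.decode())['query'])
print(ms_to_seconds('1562011015000'))
```

Returning None for unusable input keeps an optional field such as the timestamp from aborting the whole extraction.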

@@ -24,7 +24,7 @@ from ..utils import (
class DRTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio/ondemand)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
_VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder|radio(?:/ondemand)?)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['DK']
IE_NAME = 'drtv'
@@ -80,6 +80,9 @@ class DRTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.dr.dk/radio/p4kbh/regionale-nyheder-kh4/p4-nyheder-2019-06-26-17-30-9',
'only_matching': True,
}]
def _real_extract(self, url):

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..compat import (
@@ -18,7 +19,7 @@ from ..utils import (
class EinthusanIE(InfoExtractor):
_VALID_URL = r'https?://einthusan\.tv/movie/watch/(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?P<host>einthusan\.(?:tv|com))/movie/watch/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://einthusan.tv/movie/watch/9097/',
'md5': 'ff0f7f2065031b8a2cf13a933731c035',
@@ -32,6 +33,9 @@ class EinthusanIE(InfoExtractor):
}, {
'url': 'https://einthusan.tv/movie/watch/51MZ/?lang=hindi',
'only_matching': True,
}, {
'url': 'https://einthusan.com/movie/watch/9097/',
'only_matching': True,
}]
# reversed from jsoncrypto.prototype.decrypt() in einthusan-PGMovieWatcher.js
@@ -41,7 +45,9 @@ class EinthusanIE(InfoExtractor):
)).decode('utf-8'), video_id)
def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
@@ -53,7 +59,7 @@ class EinthusanIE(InfoExtractor):
page_id = self._html_search_regex(
'<html[^>]+data-pageid="([^"]+)"', webpage, 'page ID')
video_data = self._download_json(
'https://einthusan.tv/ajax/movie/watch/%s/' % video_id, video_id,
'https://%s/ajax/movie/watch/%s/' % (host, video_id), video_id,
data=urlencode_postdata({
'xEvent': 'UIVideoPlayer.PingOutcome',
'xJson': json.dumps({
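Reviewer sketch (not part of the commit): the new `host` group lets the AJAX endpoint follow whichever domain the page URL used. The function name `ajax_url` is an assumption; the pattern and URL template are copied from the hunk above.

```python
import re

EINTHUSAN_URL_RE = r'https?://(?P<host>einthusan\.(?:tv|com))/movie/watch/(?P<id>[^/?#&]+)'


def ajax_url(page_url):
    """Build the AJAX endpoint on the same host as the page URL (hypothetical helper)."""
    mobj = re.match(EINTHUSAN_URL_RE, page_url)
    if mobj is None:
        return None
    return 'https://%s/ajax/movie/watch/%s/' % (mobj.group('host'), mobj.group('id'))


print(ajax_url('https://einthusan.com/movie/watch/9097/'))
print(ajax_url('https://einthusan.tv/movie/watch/51MZ/?lang=hindi'))
```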

@@ -216,17 +216,14 @@ class FiveThirtyEightIE(InfoExtractor):
_TEST = {
'url': 'http://fivethirtyeight.com/features/how-the-6-8-raiders-can-still-make-the-playoffs/',
'info_dict': {
'id': '21846851',
'ext': 'mp4',
'id': '56032156',
'ext': 'flv',
'title': 'FiveThirtyEight: The Raiders can still make the playoffs',
'description': 'Neil Paine breaks down the simplest scenario that will put the Raiders into the playoffs at 8-8.',
'timestamp': 1513960621,
'upload_date': '20171222',
},
'params': {
'skip_download': True,
},
'expected_warnings': ['Unable to download f4m manifest'],
}
def _real_extract(self, url):
@@ -234,9 +231,8 @@ class FiveThirtyEightIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'data-video-id=["\'](?P<id>\d+)',
webpage, 'video id', group='id')
embed_url = self._search_regex(
r'<iframe[^>]+src=["\'](https?://fivethirtyeight\.abcnews\.go\.com/video/embed/\d+/\d+)',
webpage, 'embed url')
return self.url_result(
'http://espn.go.com/video/clip?id=%s' % video_id, ESPNIE.ie_key())
return self.url_result(embed_url, 'AbcNewsVideo')

@@ -58,17 +58,8 @@ from .ard import (
ARDMediathekIE,
)
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVConcertIE,
ArteTVInfoIE,
ArteTVFutureIE,
ArteTVCinemaIE,
ArteTVDDCIE,
ArteTVMagazineIE,
ArteTVEmbedIE,
TheOperaPlatformIE,
ArteTVPlaylistIE,
)
from .asiancrush import (
@@ -404,11 +395,7 @@ from .frontendmasters import (
FrontendMastersCourseIE
)
from .funimation import FunimationIE
from .funk import (
FunkMixIE,
FunkChannelIE,
)
from .funnyordie import FunnyOrDieIE
from .funk import FunkIE
from .fusion import FusionIE
from .fxnetworks import FXNetworksIE
from .gaia import GaiaIE
@@ -592,6 +579,7 @@ from .linkedin import (
)
from .linuxacademy import LinuxAcademyIE
from .litv import LiTVIE
from .livejournal import LiveJournalIE
from .liveleak import (
LiveLeakIE,
LiveLeakEmbedIE,
@@ -745,7 +733,6 @@ from .nexx import (
NexxIE,
NexxEmbedIE,
)
from .nfb import NFBIE
from .nfl import NFLIE
from .nhk import NhkVodIE
from .nhl import NHLIE
@@ -892,8 +879,9 @@ from .porncom import PornComIE
from .pornhd import PornHdIE
from .pornhub import (
PornHubIE,
PornHubPlaylistIE,
PornHubUserVideosIE,
PornHubUserIE,
PornHubPagedVideoListIE,
PornHubUserVideosUploadIE,
)
from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE
@@ -980,7 +968,6 @@ from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE, RTVETelevisionIE
from .rtvnh import RTVNHIE
from .rtvs import RTVSIE
from .rudo import RudoIE
from .ruhd import RUHDIE
from .rutube import (
RutubeIE,
@@ -1268,6 +1255,10 @@ from .udn import UDNEmbedIE
from .ufctv import UFCTVIE
from .uktvplay import UKTVPlayIE
from .digiteka import DigitekaIE
from .dlive import (
DLiveVODIE,
DLiveStreamIE,
)
from .umg import UMGDeIE
from .unistra import UnistraIE
from .unity import UnityIE
@@ -1457,6 +1448,7 @@ from .yahoo import (
YahooSearchIE,
YahooGyaOPlayerIE,
YahooGyaOIE,
YahooJapanNewsIE,
)
from .yandexdisk import YandexDiskIE
from .yandexmusic import (

@@ -428,7 +428,7 @@ class FacebookIE(InfoExtractor):
timestamp = int_or_none(self._search_regex(
r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
'timestamp', default=None))
thumbnail = self._og_search_thumbnail(webpage)
thumbnail = self._html_search_meta(['og:image', 'twitter:image'], webpage)
view_count = parse_count(self._search_regex(
r'\bviewCount\s*:\s*["\']([\d,.]+)', webpage, 'view count',

@@ -9,7 +9,7 @@ from ..utils import int_or_none
class FiveTVIE(InfoExtractor):
_VALID_URL = r'''(?x)
http://
https?://
(?:www\.)?5-tv\.ru/
(?:
(?:[^/]+/)+(?P<id>\d+)|
@@ -39,6 +39,7 @@ class FiveTVIE(InfoExtractor):
'duration': 180,
},
}, {
# redirect to https://www.5-tv.ru/projects/1000095/izvestia-glavnoe/
'url': 'http://www.5-tv.ru/glavnoe/#itemDetails',
'info_dict': {
'id': 'glavnoe',
@@ -46,6 +47,7 @@ class FiveTVIE(InfoExtractor):
'title': r're:^Итоги недели с \d+ по \d+ \w+ \d{4} года$',
'thumbnail': r're:^https?://.*\.jpg$',
},
'skip': 'redirect to «Известия. Главное» project page',
}, {
'url': 'http://www.5-tv.ru/glavnoe/broadcasts/508645/',
'only_matching': True,
@@ -70,7 +72,7 @@ class FiveTVIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
[r'<div[^>]+?class="flowplayer[^>]+?data-href="([^"]+)"',
[r'<div[^>]+?class="(?:flow)?player[^>]+?data-href="([^"]+)"',
r'<a[^>]+?href="([^"]+)"[^>]+?class="videoplayer"'],
webpage, 'video url')

@@ -1,89 +1,21 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import re
from .common import InfoExtractor
from .nexx import NexxIE
from ..compat import compat_str
from ..utils import (
int_or_none,
try_get,
str_or_none,
)
class FunkBaseIE(InfoExtractor):
_HEADERS = {
'Accept': '*/*',
'Accept-Language': 'en-US,en;q=0.9,ru;q=0.8',
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4',
}
_AUTH = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4'
@staticmethod
def _make_headers(referer):
headers = FunkBaseIE._HEADERS.copy()
headers['Referer'] = referer
return headers
def _make_url_result(self, video):
return {
'_type': 'url_transparent',
'url': 'nexx:741:%s' % video['sourceId'],
'ie_key': NexxIE.ie_key(),
'id': video['sourceId'],
'title': video.get('title'),
'description': video.get('description'),
'duration': int_or_none(video.get('duration')),
'season_number': int_or_none(video.get('seasonNr')),
'episode_number': int_or_none(video.get('episodeNr')),
}
class FunkMixIE(FunkBaseIE):
_VALID_URL = r'https?://(?:www\.)?funk\.net/mix/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
class FunkIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funk\.net/(?:channel|playlist)/[^/]+/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.funk.net/mix/59d65d935f8b160001828b5b/die-realste-kifferdoku-aller-zeiten',
'md5': '8edf617c2f2b7c9847dfda313f199009',
'info_dict': {
'id': '123748',
'ext': 'mp4',
'title': '"Die realste Kifferdoku aller Zeiten"',
'description': 'md5:c97160f5bafa8d47ec8e2e461012aa9d',
'timestamp': 1490274721,
'upload_date': '20170323',
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
mix_id = mobj.group('id')
alias = mobj.group('alias')
lists = self._download_json(
'https://www.funk.net/api/v3.1/curation/curatedLists/',
mix_id, headers=self._make_headers(url), query={
'size': 100,
})['_embedded']['curatedListList']
metas = next(
l for l in lists
if mix_id in (l.get('entityId'), l.get('alias')))['videoMetas']
video = next(
meta['videoDataDelegate']
for meta in metas
if try_get(
meta, lambda x: x['videoDataDelegate']['alias'],
compat_str) == alias)
return self._make_url_result(video)
class FunkChannelIE(FunkBaseIE):
_VALID_URL = r'https?://(?:www\.)?funk\.net/channel/(?P<id>[^/]+)/(?P<alias>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.funk.net/channel/ba/die-lustigsten-instrumente-aus-dem-internet-teil-2',
'url': 'https://www.funk.net/channel/ba-793/die-lustigsten-instrumente-aus-dem-internet-teil-2-1155821',
'md5': '8dd9d9ab59b4aa4173b3197f2ea48e81',
'info_dict': {
'id': '1155821',
'ext': 'mp4',
@@ -92,83 +24,26 @@ class FunkChannelIE(FunkBaseIE):
'timestamp': 1514507395,
'upload_date': '20171229',
},
'params': {
'skip_download': True,
},
}, {
# only available via byIdList API
'url': 'https://www.funk.net/channel/informr/martin-sonneborn-erklaert-die-eu',
'info_dict': {
'id': '205067',
'ext': 'mp4',
'title': 'Martin Sonneborn erklärt die EU',
'description': 'md5:050f74626e4ed87edf4626d2024210c0',
'timestamp': 1494424042,
'upload_date': '20170510',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
'url': 'https://www.funk.net/playlist/neuesteVideos/kameras-auf-dem-fusion-festival-1618699',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
channel_id = mobj.group('id')
alias = mobj.group('alias')
headers = self._make_headers(url)
video = None
# Id-based channels are currently broken on their side: webplayer
# tries to process them via byChannelAlias endpoint and fails
# predictably.
for page_num in itertools.count():
by_channel_alias = self._download_json(
'https://www.funk.net/api/v3.1/webapp/videos/byChannelAlias/%s'
% channel_id,
'Downloading byChannelAlias JSON page %d' % (page_num + 1),
headers=headers, query={
'filterFsk': 'false',
'sort': 'creationDate,desc',
'size': 100,
'page': page_num,
}, fatal=False)
if not by_channel_alias:
break
video_list = try_get(
by_channel_alias, lambda x: x['_embedded']['videoList'], list)
if not video_list:
break
try:
video = next(r for r in video_list if r.get('alias') == alias)
break
except StopIteration:
pass
if not try_get(
by_channel_alias, lambda x: x['_links']['next']):
break
if not video:
by_id_list = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/byIdList',
channel_id, 'Downloading byIdList JSON', headers=headers,
query={
'ids': alias,
}, fatal=False)
if by_id_list:
video = try_get(by_id_list, lambda x: x['result'][0], dict)
if not video:
results = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/filter',
channel_id, 'Downloading filter JSON', headers=headers, query={
'channelId': channel_id,
'size': 100,
})['result']
video = next(r for r in results if r.get('alias') == alias)
return self._make_url_result(video)
display_id, nexx_id = re.match(self._VALID_URL, url).groups()
video = self._download_json(
'https://www.funk.net/api/v4.0/videos/' + nexx_id, nexx_id)
return {
'_type': 'url_transparent',
'url': 'nexx:741:' + nexx_id,
'ie_key': NexxIE.ie_key(),
'id': nexx_id,
'title': video.get('title'),
'description': video.get('description'),
'duration': int_or_none(video.get('duration')),
'channel_id': str_or_none(video.get('channelId')),
'display_id': display_id,
'tags': video.get('tags'),
'thumbnail': video.get('imageUrlLandscape'),
}
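Reviewer sketch (not part of the commit): the rewritten `FunkIE` takes the numeric Nexx id from the tail of the URL slug instead of paging through the curation API. The snippet below only exercises the new `_VALID_URL`; `parse_funk_url` is a hypothetical helper name.

```python
import re

FUNK_URL_RE = (
    r'https?://(?:www\.)?funk\.net/(?:channel|playlist)/[^/]+/'
    r'(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
)


def parse_funk_url(url):
    """Return (display_id, nexx_id) for a new-style funk.net URL, else None."""
    mobj = re.match(FUNK_URL_RE, url)
    if mobj is None:
        return None
    return mobj.group('display_id'), mobj.group('id')


print(parse_funk_url(
    'https://www.funk.net/channel/ba-793/'
    'die-lustigsten-instrumente-aus-dem-internet-teil-2-1155821'))
```

Old-style `/mix/` URLs no longer match this pattern, which is consistent with the removal of `FunkMixIE` in this diff.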

@@ -1,162 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
unified_timestamp,
)
class FunnyOrDieIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?funnyordie\.com/(?P<type>embed|articles|videos)/(?P<id>[0-9a-f]+)(?:$|[?#/])'
_TESTS = [{
'url': 'http://www.funnyordie.com/videos/0732f586d7/heart-shaped-box-literal-video-version',
'md5': 'bcd81e0c4f26189ee09be362ad6e6ba9',
'info_dict': {
'id': '0732f586d7',
'ext': 'mp4',
'title': 'Heart-Shaped Box: Literal Video Version',
'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
'thumbnail': r're:^http:.*\.jpg$',
'uploader': 'DASjr',
'timestamp': 1317904928,
'upload_date': '20111006',
'duration': 318.3,
},
}, {
'url': 'http://www.funnyordie.com/embed/e402820827',
'info_dict': {
'id': 'e402820827',
'ext': 'mp4',
'title': 'Please Use This Song (Jon Lajoie)',
'description': 'Please use this to sell something. www.jonlajoie.com',
'thumbnail': r're:^http:.*\.jpg$',
'timestamp': 1398988800,
'upload_date': '20140502',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.funnyordie.com/articles/ebf5e34fc8/10-hours-of-walking-in-nyc-as-a-man',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
links = re.findall(r'<source src="([^"]+/v)[^"]+\.([^"]+)" type=\'video', webpage)
if not links:
raise ExtractorError('No media links available for %s' % video_id)
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
m3u8_url = self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
webpage, 'm3u8 url', group='url')
formats = []
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False)
source_formats = list(filter(
lambda f: f.get('vcodec') != 'none', m3u8_formats))
bitrates = [int(bitrate) for bitrate in re.findall(r'[,/]v(\d+)(?=[,/])', m3u8_url)]
bitrates.sort()
if source_formats:
self._sort_formats(source_formats)
for bitrate, f in zip(bitrates, source_formats or [{}] * len(bitrates)):
for path, ext in links:
ff = f.copy()
if ff:
if ext != 'mp4':
ff = dict(
[(k, v) for k, v in ff.items()
if k in ('height', 'width', 'format_id')])
ff.update({
'format_id': ff['format_id'].replace('hls', ext),
'ext': ext,
'protocol': 'http',
})
else:
ff.update({
'format_id': '%s-%d' % (ext, bitrate),
'vbr': bitrate,
})
ff['url'] = self._proto_relative_url(
'%s%d.%s' % (path, bitrate, ext))
formats.append(ff)
self._check_formats(formats, video_id)
formats.extend(m3u8_formats)
self._sort_formats(
formats, field_preference=('height', 'width', 'tbr', 'format_id'))
subtitles = {}
for src, src_lang in re.findall(r'<track kind="captions" src="([^"]+)" srclang="([^"]+)"', webpage):
subtitles[src_lang] = [{
'ext': src.split('/')[-1],
'url': 'http://www.funnyordie.com%s' % src,
}]
timestamp = unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp', default=None))
uploader = self._html_search_regex(
r'<h\d[^>]+\bclass=["\']channel-preview-name[^>]+>(.+?)</h',
webpage, 'uploader', default=None)
title, description, thumbnail, duration = [None] * 4
medium = self._parse_json(
self._search_regex(
r'jsonMedium\s*=\s*({.+?});', webpage, 'JSON medium',
default='{}'),
video_id, fatal=False)
if medium:
title = medium.get('title')
duration = float_or_none(medium.get('duration'))
if not timestamp:
timestamp = unified_timestamp(medium.get('publishDate'))
post = self._parse_json(
self._search_regex(
r'fb_post\s*=\s*(\{.*?\});', webpage, 'post details',
default='{}'),
video_id, fatal=False)
if post:
if not title:
title = post.get('name')
description = post.get('description')
thumbnail = post.get('picture')
if not title:
title = self._og_search_title(webpage)
if not description:
description = self._og_search_description(webpage)
if not duration:
duration = int_or_none(self._html_search_meta(
('video:duration', 'duration'), webpage, 'duration', default=False))
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}

@@ -1,35 +1,84 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..utils import (
determine_ext,
int_or_none,
mimetype2ext,
parse_iso8601,
)
class FusionIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fusion\.(?:net|tv)/video/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?fusion\.(?:net|tv)/(?:video/|show/.+?\bvideo=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://fusion.tv/video/201781/u-s-and-panamanian-forces-work-together-to-stop-a-vessel-smuggling-drugs/',
'info_dict': {
'id': 'ZpcWNoMTE6x6uVIIWYpHh0qQDjxBuq5P',
'id': '3145868',
'ext': 'mp4',
'title': 'U.S. and Panamanian forces work together to stop a vessel smuggling drugs',
'description': 'md5:0cc84a9943c064c0f46b128b41b1b0d7',
'duration': 140.0,
'timestamp': 1442589635,
'uploader': 'UNIVISON',
'upload_date': '20150918',
},
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
'add_ie': ['Anvato'],
}, {
'url': 'http://fusion.tv/video/201781',
'only_matching': True,
}, {
'url': 'https://fusion.tv/show/food-exposed-with-nelufar-hedayat/?ancla=full-episodes&video=588644',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._match_id(url)
video = self._download_json(
'https://platform.fusion.net/wp-json/fusiondotnet/v1/video/' + video_id, video_id)
ooyala_code = self._search_regex(
r'data-ooyala-id=(["\'])(?P<code>(?:(?!\1).)+)\1',
webpage, 'ooyala code', group='code')
info = {
'id': video_id,
'title': video['title'],
'description': video.get('excerpt'),
'timestamp': parse_iso8601(video.get('published')),
'series': video.get('show'),
}
return OoyalaIE._build_url_result(ooyala_code)
formats = []
src = video.get('src') or {}
for f_id, f in src.items():
for q_id, q in f.items():
q_url = q.get('url')
if not q_url:
continue
ext = determine_ext(q_url, mimetype2ext(q.get('type')))
if ext == 'smil':
formats.extend(self._extract_smil_formats(q_url, video_id, fatal=False))
elif f_id == 'm3u8-variant' or (ext == 'm3u8' and q_id == 'Variant'):
formats.extend(self._extract_m3u8_formats(
q_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': '-'.join([f_id, q_id]),
'url': q_url,
'width': int_or_none(q.get('width')),
'height': int_or_none(q.get('height')),
'tbr': int_or_none(self._search_regex(r'_(\d+)\.m(?:p4|3u8)', q_url, 'bitrate', default=None)),
'ext': 'mp4' if ext == 'm3u8' else ext,
'protocol': 'm3u8_native' if ext == 'm3u8' else 'https',
})
if formats:
self._sort_formats(formats)
info['formats'] = formats
else:
info.update({
'_type': 'url',
'url': 'anvato:uni:' + video['video_ids']['anvato'],
'ie_key': 'Anvato',
})
return info
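Reviewer sketch (not part of the commit): the new Fusion code walks a two-level `src` mapping (format id, then quality id) and skips entries without a URL. The standalone version below mirrors that walk for direct links only; the sample data, `collect_direct_formats` and `bitrate_from_url` are assumptions for illustration, and the real JSON layout is only known from the loop in the hunk above.

```python
import re


def bitrate_from_url(url):
    """Pull a trailing '_<digits>' bitrate out of a media URL, or return None."""
    mobj = re.search(r'_(\d+)\.m(?:p4|3u8)', url)
    return int(mobj.group(1)) if mobj else None


def collect_direct_formats(src):
    """Flatten {format_id: {quality_id: {...}}} into a list of format dicts."""
    formats = []
    for f_id, qualities in (src or {}).items():
        for q_id, q in qualities.items():
            q_url = q.get('url')
            if not q_url:
                continue  # entries without a URL cannot be downloaded
            formats.append({
                'format_id': '-'.join([f_id, q_id]),
                'url': q_url,
                'height': q.get('height'),
                'tbr': bitrate_from_url(q_url),
            })
    return formats


# Hypothetical payload shaped like the loop above expects.
sample_src = {
    'mp4': {
        'High': {'url': 'https://example.com/video_1200.mp4', 'height': 720},
        'Low': {'url': None, 'height': 360},
    },
}
print(collect_direct_formats(sample_src))
```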

@@ -1,12 +1,19 @@
# coding: utf-8
from __future__ import unicode_literals
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import (
clean_html,
get_element_by_class,
get_element_by_id,
)
class GameInformerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>.+)\.aspx'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>[^.?&#]+)'
_TESTS = [{
# normal Brightcove embed code extracted with BrightcoveNewIE._extract_url
'url': 'http://www.gameinformer.com/b/features/archive/2015/09/26/replay-animal-crossing.aspx',
'md5': '292f26da1ab4beb4c9099f1304d2b071',
'info_dict': {
@@ -18,16 +25,25 @@ class GameInformerIE(InfoExtractor):
'upload_date': '20150928',
'uploader_id': '694940074001',
},
}
}, {
# Brightcove id inside unique element with field--name-field-brightcove-video-id class
'url': 'https://www.gameinformer.com/video-feature/new-gameplay-today/2019/07/09/new-gameplay-today-streets-of-rogue',
'info_dict': {
'id': '6057111913001',
'ext': 'mp4',
'title': 'New Gameplay Today Streets Of Rogue',
'timestamp': 1562699001,
'upload_date': '20190709',
'uploader_id': '694940074001',
},
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/694940074001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(
url, display_id, headers=self.geo_verification_headers())
brightcove_id = self._search_regex(
[r'<[^>]+\bid=["\']bc_(\d+)', r"getVideo\('[^']+video_id=(\d+)"],
webpage, 'brightcove id')
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew',
brightcove_id)
brightcove_id = clean_html(get_element_by_class('field--name-field-brightcove-video-id', webpage) or get_element_by_id('video-source-content', webpage))
brightcove_url = self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id if brightcove_id else BrightcoveNewIE._extract_url(self, webpage)
return self.url_result(brightcove_url, 'BrightcoveNew', brightcove_id)

@@ -2104,6 +2104,23 @@ class GenericIE(InfoExtractor):
},
'expected_warnings': ['Failed to download MPD manifest'],
},
{
# DailyMotion embed with DM.player
'url': 'https://www.beinsports.com/us/copa-del-rey/video/the-locker-room-valencia-beat-barca-in-copa/1203804',
'info_dict': {
'id': 'k6aKkGHd9FJs4mtJN39',
'ext': 'mp4',
'title': 'The Locker Room: Valencia Beat Barca In Copa del Rey Final',
'description': 'This video is private.',
'uploader_id': 'x1jf30l',
'uploader': 'beIN SPORTS USA',
'upload_date': '20190528',
'timestamp': 1559062971,
},
'params': {
'skip_download': True,
},
},
# {
# # TODO: find another test
# # http://schema.org/VideoObject
@@ -2209,7 +2226,7 @@ class GenericIE(InfoExtractor):
default_search = 'fixup_error'
if default_search in ('auto', 'auto_warning', 'fixup_error'):
if '/' in url:
if re.match(r'^[^\s/]+\.[^\s/]+/', url):
self._downloader.report_warning('The url doesn\'t specify the protocol, trying with http')
return self.url_result('http://' + url)
elif default_search != 'fixup_error':
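Reviewer sketch (not part of the commit): the hunk above replaces the `'/' in url` check with a stricter pattern, so a search phrase that merely contains a slash is no longer treated as a schemeless URL. The helper name `looks_like_schemeless_url` is an assumption; the regular expression is the one from the diff.

```python
import re

SCHEMELESS_URL_RE = r'^[^\s/]+\.[^\s/]+/'


def looks_like_schemeless_url(text):
    """True when text starts with 'host.tld/' and has no whitespace before the slash."""
    return re.match(SCHEMELESS_URL_RE, text) is not None


for candidate in ('example.com/watch?v=1', 'AC/DC live', 'example.com'):
    # The old check would have accepted every candidate containing '/'.
    print(candidate, '/' in candidate, looks_like_schemeless_url(candidate))
```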

@@ -11,7 +11,7 @@ from ..utils import (
class GfycatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ru/|ifr/|gifs/detail/)?(?P<id>[^-/?#]+)'
_TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': {
@@ -44,6 +44,9 @@ class GfycatIE(InfoExtractor):
'categories': list,
'age_limit': 0,
}
}, {
'url': 'https://gfycat.com/ru/RemarkableDrearyAmurstarfish',
'only_matching': True
}, {
'url': 'https://gfycat.com/gifs/detail/UnconsciousLankyIvorygull',
'only_matching': True

@@ -34,9 +34,13 @@ class GoIE(AdobePassIE):
'watchdisneyxd': {
'brand': '009',
'resource_id': 'DisneyXD',
},
'disneynow': {
'brand': '011',
'resource_id': 'Disney',
}
}
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
_VALID_URL = r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|(?P<sub_domain_2>disneynow))\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
% '|'.join(list(_SITE_INFO.keys()) + ['disneynow'])
_TESTS = [{
'url': 'http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643',
@@ -71,6 +75,9 @@ class GoIE(AdobePassIE):
# brand 008
'url': 'http://disneynow.go.com/shows/minnies-bow-toons/video/happy-campers/vdka4872013',
'only_matching': True,
}, {
'url': 'https://disneynow.com/shows/minnies-bow-toons/video/happy-campers/vdka4872013',
'only_matching': True,
}]
def _extract_videos(self, brand, video_id='-1', show_id='-1'):
@@ -80,7 +87,9 @@ class GoIE(AdobePassIE):
display_id)['video']
def _real_extract(self, url):
sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
mobj = re.match(self._VALID_URL, url)
sub_domain = mobj.group('sub_domain') or mobj.group('sub_domain_2')
video_id, display_id = mobj.group('id', 'display_id')
site_info = self._SITE_INFO.get(sub_domain, {})
brand = site_info.get('brand')
if not video_id or not site_info:
@@ -89,7 +98,7 @@ class GoIE(AdobePassIE):
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
r'data-video-id=["\']*(VDKA\w+)', webpage, 'video id',
default=None)
default=video_id)
if not site_info:
brand = self._search_regex(
(r'data-brand=\s*["\']\s*(\d+)',
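Reviewer sketch (not part of the commit): Python's `re` does not allow the same group name twice in one pattern, which is presumably why the bare `disneynow.com` host gets its own `sub_domain_2` group and the two are merged with `or`. The `SITES` list is an assumed subset of `_SITE_INFO` keys and `site_key` is a hypothetical helper; only the host part of the pattern is reproduced.

```python
import re

# Assumed subset of the site keys, for illustration only.
SITES = ['abc', 'freeform', 'watchdisneyxd', 'disneynow']

GO_HOST_RE = (
    r'https?://(?:(?:(?P<sub_domain>%s)\.)?go|(?P<sub_domain_2>disneynow))\.com/'
    % '|'.join(SITES)
)


def site_key(url):
    """Return the site key from either the go.com subdomain or the disneynow.com host."""
    mobj = re.match(GO_HOST_RE, url)
    if mobj is None:
        return None
    return mobj.group('sub_domain') or mobj.group('sub_domain_2')


print(site_key('http://abc.go.com/shows/designated-survivor/video/most-recent/VDKA3807643'))
print(site_key('https://disneynow.com/shows/minnies-bow-toons/video/happy-campers/vdka4872013'))
```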

@@ -103,6 +103,11 @@ class KalturaIE(InfoExtractor):
{
'url': 'https://www.kaltura.com:443/index.php/extwidget/preview/partner_id/1770401/uiconf_id/37307382/entry_id/0_58u8kme7/embed/iframe?&flashvars[streamerType]=auto',
'only_matching': True,
},
{
# unavailable source format
'url': 'kaltura:513551:1_66x4rg7o',
'only_matching': True,
}
]
@@ -306,12 +311,17 @@ class KalturaIE(InfoExtractor):
f['fileExt'] = 'mp4'
video_url = sign_url(
'%s/flavorId/%s' % (data_url, f['id']))
format_id = '%(fileExt)s-%(bitrate)s' % f
# Source format may not be available (e.g. kaltura:513551:1_66x4rg7o)
if f.get('isOriginal') is True and not self._is_valid_url(
video_url, entry_id, format_id):
continue
# audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
# -f mp4-56)
vcodec = 'none' if 'videoCodecId' not in f and f.get(
'frameRate') == 0 else f.get('videoCodecId')
formats.append({
'format_id': '%(fileExt)s-%(bitrate)s' % f,
'format_id': format_id,
'ext': f.get('fileExt'),
'tbr': int_or_none(f['bitrate']),
'fps': int_or_none(f.get('frameRate')),

@@ -6,8 +6,8 @@ import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
clean_html,
determine_ext,
extract_attributes,
ExtractorError,
float_or_none,
int_or_none,
@@ -19,6 +19,7 @@ from ..utils import (
class LecturioBaseIE(InfoExtractor):
_API_BASE_URL = 'https://app.lecturio.com/api/en/latest/html5/'
_LOGIN_URL = 'https://app.lecturio.com/en/login'
_NETRC_MACHINE = 'lecturio'
@@ -67,51 +68,56 @@ class LecturioIE(LecturioBaseIE):
_VALID_URL = r'''(?x)
https://
(?:
app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.lecture|
(?:www\.)?lecturio\.de/[^/]+/(?P<id_de>[^/?#&]+)\.vortrag
app\.lecturio\.com/([^/]+/(?P<nt>[^/?#&]+)\.lecture|(?:\#/)?lecture/c/\d+/(?P<id>\d+))|
(?:www\.)?lecturio\.de/[^/]+/(?P<nt_de>[^/?#&]+)\.vortrag
)
'''
_TESTS = [{
'url': 'https://app.lecturio.com/medical-courses/important-concepts-and-terms-introduction-to-microbiology.lecture#tab/videos',
'md5': 'f576a797a5b7a5e4e4bbdfc25a6a6870',
'md5': '9a42cf1d8282a6311bf7211bbde26fde',
'info_dict': {
'id': '39634',
'ext': 'mp4',
'title': 'Important Concepts and Terms Introduction to Microbiology',
'title': 'Important Concepts and Terms Introduction to Microbiology',
},
'skip': 'Requires lecturio account credentials',
}, {
'url': 'https://www.lecturio.de/jura/oeffentliches-recht-staatsexamen.vortrag',
'only_matching': True,
}, {
'url': 'https://app.lecturio.com/#/lecture/c/6434/39634',
'only_matching': True,
}]
_CC_LANGS = {
'Arabic': 'ar',
'Bulgarian': 'bg',
'German': 'de',
'English': 'en',
'Spanish': 'es',
'Persian': 'fa',
'French': 'fr',
'Japanese': 'ja',
'Polish': 'pl',
'Pashto': 'ps',
'Russian': 'ru',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id') or mobj.group('id_de')
webpage = self._download_webpage(
'https://app.lecturio.com/en/lecture/%s/player.html' % display_id,
display_id)
lecture_id = self._search_regex(
r'lecture_id\s*=\s*(?:L_)?(\d+)', webpage, 'lecture id')
api_url = self._search_regex(
r'lectureDataLink\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'api url', group='url')
video = self._download_json(api_url, display_id)
nt = mobj.group('nt') or mobj.group('nt_de')
lecture_id = mobj.group('id')
display_id = nt or lecture_id
api_path = 'lectures/' + lecture_id if lecture_id else 'lecture/' + nt + '.json'
video = self._download_json(
self._API_BASE_URL + api_path, display_id)
title = video['title'].strip()
if not lecture_id:
pid = video.get('productId') or video.get('uid')
if pid:
spid = pid.split('_')
if spid and len(spid) == 2:
lecture_id = spid[1]
formats = []
for format_ in video['content']['media']:
@@ -129,24 +135,30 @@ class LecturioIE(LecturioBaseIE):
continue
label = str_or_none(format_.get('label'))
filesize = int_or_none(format_.get('fileSize'))
formats.append({
f = {
'url': file_url,
'format_id': label,
'filesize': float_or_none(filesize, invscale=1000)
})
}
if label:
mobj = re.match(r'(\d+)p\s*\(([^)]+)\)', label)
if mobj:
f.update({
'format_id': mobj.group(2),
'height': int(mobj.group(1)),
})
formats.append(f)
self._sort_formats(formats)
subtitles = {}
automatic_captions = {}
cc = self._parse_json(
self._search_regex(
r'subtitleUrls\s*:\s*({.+?})\s*,', webpage, 'subtitles',
default='{}'), display_id, fatal=False)
for cc_label, cc_url in cc.items():
cc_url = url_or_none(cc_url)
captions = video.get('captions') or []
for cc in captions:
cc_url = cc.get('url')
if not cc_url:
continue
lang = self._search_regex(
cc_label = cc.get('translatedCode')
lang = cc.get('languageCode') or self._search_regex(
r'/([a-z]{2})_', cc_url, 'lang',
default=cc_label.split()[0] if cc_label else 'en')
original_lang = self._search_regex(
@@ -160,7 +172,7 @@ class LecturioIE(LecturioBaseIE):
})
return {
'id': lecture_id,
'id': lecture_id or nt,
'title': title,
'formats': formats,
'subtitles': subtitles,
@@ -169,37 +181,40 @@ class LecturioIE(LecturioBaseIE):
class LecturioCourseIE(LecturioBaseIE):
_VALID_URL = r'https://app\.lecturio\.com/[^/]+/(?P<id>[^/?#&]+)\.course'
_TEST = {
_VALID_URL = r'https://app\.lecturio\.com/(?:[^/]+/(?P<nt>[^/?#&]+)\.course|(?:#/)?course/c/(?P<id>\d+))'
_TESTS = [{
'url': 'https://app.lecturio.com/medical-courses/microbiology-introduction.course#/',
'info_dict': {
'id': 'microbiology-introduction',
'title': 'Microbiology: Introduction',
'description': 'md5:13da8500c25880c6016ae1e6d78c386a',
},
'playlist_count': 45,
'skip': 'Requires lecturio account credentials',
}
}, {
'url': 'https://app.lecturio.com/#/course/c/6434',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
nt, course_id = re.match(self._VALID_URL, url).groups()
display_id = nt or course_id
api_path = 'courses/' + course_id if course_id else 'course/content/' + nt + '.json'
course = self._download_json(
self._API_BASE_URL + api_path, display_id)
entries = []
for mobj in re.finditer(
r'(?s)<[^>]+\bdata-url=(["\'])(?:(?!\1).)+\.lecture\b[^>]+>',
webpage):
params = extract_attributes(mobj.group(0))
lecture_url = urljoin(url, params.get('data-url'))
lecture_id = params.get('data-id')
for lecture in course.get('lectures', []):
lecture_id = str_or_none(lecture.get('id'))
lecture_url = lecture.get('url')
if lecture_url:
lecture_url = urljoin(url, lecture_url)
else:
lecture_url = 'https://app.lecturio.com/#/lecture/c/%s/%s' % (course_id, lecture_id)
entries.append(self.url_result(
lecture_url, ie=LecturioIE.ie_key(), video_id=lecture_id))
title = self._search_regex(
r'<span[^>]+class=["\']content-title[^>]+>([^<]+)', webpage,
'title', default=None)
return self.playlist_result(entries, display_id, title)
return self.playlist_result(
entries, display_id, course.get('title'),
clean_html(course.get('description')))
class LecturioDeCourseIE(LecturioBaseIE):
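Reviewer sketch (not part of the commit): the Lecturio extractors now pick an API path depending on whether the URL carried a numeric id or a normalized title (`nt`). The function below restates that choice for lectures; `lecture_api_url` is a hypothetical name, and the base URL and path shapes are taken from the diff above.

```python
API_BASE_URL = 'https://app.lecturio.com/api/en/latest/html5/'


def lecture_api_url(lecture_id=None, nt=None):
    """Prefer the numeric id endpoint; fall back to the normalized-title endpoint."""
    if lecture_id:
        path = 'lectures/' + lecture_id
    elif nt:
        path = 'lecture/' + nt + '.json'
    else:
        raise ValueError('either lecture_id or nt is required')
    return API_BASE_URL + path


print(lecture_api_url(lecture_id='39634'))
print(lecture_api_url(nt='important-concepts-and-terms-introduction-to-microbiology'))
```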

@@ -326,7 +326,7 @@ class LetvCloudIE(InfoExtractor):
elif play_json.get('code'):
raise ExtractorError('Letv cloud returned error %d' % play_json['code'], expected=True)
else:
raise ExtractorError('Letv cloud returned an unknwon error')
raise ExtractorError('Letv cloud returned an unknown error')
def b64decode(s):
return compat_b64decode(s).decode('utf-8')

@@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import int_or_none
class LiveJournalIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?livejournal\.com/video/album/\d+.+?\bid=(?P<id>\d+)'
_TEST = {
'url': 'https://andrei-bt.livejournal.com/video/album/407/?mode=view&id=51272',
'md5': 'adaf018388572ced8a6f301ace49d4b2',
'info_dict': {
'id': '1263729',
'ext': 'mp4',
'title': 'Истребители против БПЛА',
'upload_date': '20190624',
'timestamp': 1561406715,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
record = self._parse_json(self._search_regex(
r'Site\.page\s*=\s*({.+?});', webpage,
'page data'), video_id)['video']['record']
storage_id = compat_str(record['storageid'])
title = record.get('name')
if title:
# remove filename extension (.mp4, .mov, etc.)
title = title.rsplit('.', 1)[0]
return {
'_type': 'url_transparent',
'id': video_id,
'title': title,
'thumbnail': record.get('thumbnail'),
'timestamp': int_or_none(record.get('timecreate')),
'url': 'eagleplatform:vc.videos.livejournal.com:' + storage_id,
'ie_key': 'EaglePlatform',
}
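Reviewer note: the title cleanup relies on `rsplit('.', 1)[0]`, which drops everything after the last dot. A minimal sketch of that behaviour follows; `strip_extension` is a hypothetical helper and the sample names are made up.

```python
def strip_extension(name):
    """Drop everything after the last dot, as title.rsplit('.', 1)[0] does."""
    if not name:
        return name
    return name.rsplit('.', 1)[0]


for sample in ('clip.mp4', 'no_extension', 'v1.2 demo'):
    print(repr(sample), '->', repr(strip_extension(sample)))
```

Names without a dot pass through unchanged, but a title that contains a dot and no real extension is shortened as well, which may be worth keeping in mind when reviewing.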

@@ -117,6 +117,10 @@ class LyndaIE(LyndaBaseIE):
}, {
'url': 'https://www.lynda.com/de/Graphic-Design-tutorials/Willkommen-Grundlagen-guten-Gestaltung/393570/393572-4.html',
'only_matching': True,
}, {
# Status="NotFound", Message="Transcript not found"
'url': 'https://www.lynda.com/ASP-NET-tutorials/What-you-should-know/5034180/2811512-4.html',
'only_matching': True,
}]
def _raise_unavailable(self, video_id):
@@ -247,12 +251,17 @@ class LyndaIE(LyndaBaseIE):
def _get_subtitles(self, video_id):
url = 'https://www.lynda.com/ajax/player?videoId=%s&type=transcript' % video_id
subs = self._download_json(url, None, False)
subs = self._download_webpage(
url, video_id, 'Downloading subtitles JSON', fatal=False)
if not subs or 'Status="NotFound"' in subs:
return {}
subs = self._parse_json(subs, video_id, fatal=False)
if not subs:
return {}
fixed_subs = self._fix_subtitles(subs)
if fixed_subs:
return {'en': [{'ext': 'srt', 'data': fixed_subs}]}
else:
return {}
return {}
class LyndaCourseIE(LyndaBaseIE):
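Reviewer note: the `_get_subtitles` change downloads the transcript as text, rejects the `Status="NotFound"` marker, and only then tries to parse JSON. The sketch below shows that ordering with the standard `json` module; `parse_transcript` is a hypothetical helper, and the sample bodies are assumptions rather than captured responses.

```python
import json


def parse_transcript(body):
    """Return parsed transcript JSON, or None when nothing usable came back."""
    # Check the plain-text marker first: such a body is not valid JSON.
    if not body or 'Status="NotFound"' in body:
        return None
    try:
        return json.loads(body)
    except ValueError:
        # Mirrors fatal=False: a parse failure yields "no subtitles".
        return None


print(parse_transcript('{ Status="NotFound", Message="Transcript not found" }'))
print(parse_transcript('[{"Caption": "Hello"}]'))
```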

@@ -79,6 +79,9 @@ class MGTVIE(InfoExtractor):
'ext': 'mp4',
'tbr': tbr,
'protocol': 'm3u8_native',
'http_headers': {
'Referer': url,
},
})
self._sort_formats(formats)

@@ -1,112 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
int_or_none,
qualities,
urlencode_postdata,
xpath_text,
)
class NFBIE(InfoExtractor):
IE_NAME = 'nfb'
IE_DESC = 'National Film Board of Canada'
_VALID_URL = r'https?://(?:www\.)?(?:nfb|onf)\.ca/film/(?P<id>[\da-z_-]+)'
_TEST = {
'url': 'https://www.nfb.ca/film/qallunaat_why_white_people_are_funny',
'info_dict': {
'id': 'qallunaat_why_white_people_are_funny',
'ext': 'flv',
'title': 'Qallunaat! Why White People Are Funny ',
'description': 'md5:6b8e32dde3abf91e58857b174916620c',
'duration': 3128,
'creator': 'Mark Sandiford',
'uploader': 'Mark Sandiford',
},
'params': {
# rtmp download
'skip_download': True,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
config = self._download_xml(
'https://www.nfb.ca/film/%s/player_config' % video_id,
video_id, 'Downloading player config XML',
data=urlencode_postdata({'getConfig': 'true'}),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'X-NFB-Referer': 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf'
})
title, description, thumbnail, duration, uploader, author = [None] * 6
thumbnails, formats = [[]] * 2
subtitles = {}
for media in config.findall('./player/stream/media'):
if media.get('type') == 'posterImage':
quality_key = qualities(('low', 'high'))
thumbnails = []
for asset in media.findall('assets/asset'):
asset_url = xpath_text(asset, 'default/url', default=None)
if not asset_url:
continue
quality = asset.get('quality')
thumbnails.append({
'url': asset_url,
'id': quality,
'preference': quality_key(quality),
})
elif media.get('type') == 'video':
title = xpath_text(media, 'title', fatal=True)
for asset in media.findall('assets/asset'):
quality = asset.get('quality')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', quality or '', 'height', default=None))
for node in asset:
streamer = xpath_text(node, 'streamerURI', default=None)
if not streamer:
continue
play_path = xpath_text(node, 'url', default=None)
if not play_path:
continue
formats.append({
'url': streamer,
'app': streamer.split('/', 3)[3],
'play_path': play_path,
'rtmp_live': False,
'ext': 'flv',
'format_id': '%s-%s' % (node.tag, quality) if quality else node.tag,
'height': height,
})
self._sort_formats(formats)
description = clean_html(xpath_text(media, 'description'))
uploader = xpath_text(media, 'author')
duration = int_or_none(media.get('duration'))
for subtitle in media.findall('./subtitles/subtitle'):
subtitle_url = xpath_text(subtitle, 'url', default=None)
if not subtitle_url:
continue
lang = xpath_text(subtitle, 'lang', default='en')
subtitles.setdefault(lang, []).append({
'url': subtitle_url,
'ext': (subtitle.get('format') or determine_ext(subtitle_url)).lower(),
})
return {
'id': video_id,
'title': title,
'description': description,
'thumbnails': thumbnails,
'duration': duration,
'creator': uploader,
'uploader': uploader,
'formats': formats,
'subtitles': subtitles,
}

File diff suppressed because it is too large

@@ -5,26 +5,27 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_str,
# compat_str,
compat_HTTPError,
)
from ..utils import (
clean_html,
ExtractorError,
remove_end,
# remove_end,
str_or_none,
strip_or_none,
unified_timestamp,
urljoin,
# urljoin,
)
class PacktPubBaseIE(InfoExtractor):
_PACKT_BASE = 'https://www.packtpub.com'
_MAPT_REST = '%s/mapt-rest' % _PACKT_BASE
# _PACKT_BASE = 'https://www.packtpub.com'
_STATIC_PRODUCTS_BASE = 'https://static.packt-cdn.com/products/'
class PacktPubIE(PacktPubBaseIE):
_VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>\d+)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:(?:www\.)?packtpub\.com/mapt|subscription\.packtpub\.com)/video/[^/]+/(?P<course_id>\d+)/(?P<chapter_id>[^/]+)/(?P<id>[^/]+)(?:/(?P<display_id>[^/?&#]+))?'
_TESTS = [{
'url': 'https://www.packtpub.com/mapt/video/web-development/9781787122215/20528/20530/Project+Intro',
@@ -40,6 +41,9 @@ class PacktPubIE(PacktPubBaseIE):
}, {
'url': 'https://subscription.packtpub.com/video/web_development/9781787122215/20528/20530/project-intro',
'only_matching': True,
}, {
'url': 'https://subscription.packtpub.com/video/programming/9781838988906/p1/video1_1/business-card-project',
'only_matching': True,
}]
_NETRC_MACHINE = 'packtpub'
_TOKEN = None
@@ -50,9 +54,9 @@ class PacktPubIE(PacktPubBaseIE):
return
try:
self._TOKEN = self._download_json(
self._MAPT_REST + '/users/tokens', None,
'https://services.packtpub.com/auth-v1/users/tokens', None,
'Downloading Authorization Token', data=json.dumps({
'email': username,
'username': username,
'password': password,
}).encode())['data']['access']
except ExtractorError as e:
@@ -61,54 +65,40 @@ class PacktPubIE(PacktPubBaseIE):
raise ExtractorError(message, expected=True)
raise
def _handle_error(self, response):
if response.get('status') != 'success':
raise ExtractorError(
'% said: %s' % (self.IE_NAME, response['message']),
expected=True)
def _download_json(self, *args, **kwargs):
response = super(PacktPubIE, self)._download_json(*args, **kwargs)
self._handle_error(response)
return response
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
course_id, chapter_id, video_id = mobj.group(
'course_id', 'chapter_id', 'id')
course_id, chapter_id, video_id, display_id = re.match(self._VALID_URL, url).groups()
headers = {}
if self._TOKEN:
headers['Authorization'] = 'Bearer ' + self._TOKEN
video = self._download_json(
'%s/users/me/products/%s/chapters/%s/sections/%s'
% (self._MAPT_REST, course_id, chapter_id, video_id), video_id,
'Downloading JSON video', headers=headers)['data']
try:
video_url = self._download_json(
'https://services.packtpub.com/products-v1/products/%s/%s/%s' % (course_id, chapter_id, video_id), video_id,
'Downloading JSON video', headers=headers)['data']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
self.raise_login_required('This video is locked')
raise
content = video.get('content')
if not content:
self.raise_login_required('This video is locked')
# TODO: find a better way to avoid duplicating course requests
# metadata = self._download_json(
# '%s/products/%s/chapters/%s/sections/%s/metadata'
# % (self._MAPT_REST, course_id, chapter_id, video_id),
# video_id)['data']
video_url = content['file']
metadata = self._download_json(
'%s/products/%s/chapters/%s/sections/%s/metadata'
% (self._MAPT_REST, course_id, chapter_id, video_id),
video_id)['data']
title = metadata['pageTitle']
course_title = metadata.get('title')
if course_title:
title = remove_end(title, ' - %s' % course_title)
timestamp = unified_timestamp(metadata.get('publicationDate'))
thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
# title = metadata['pageTitle']
# course_title = metadata.get('title')
# if course_title:
# title = remove_end(title, ' - %s' % course_title)
# timestamp = unified_timestamp(metadata.get('publicationDate'))
# thumbnail = urljoin(self._PACKT_BASE, metadata.get('filepath'))
return {
'id': video_id,
'url': video_url,
'title': title,
'thumbnail': thumbnail,
'timestamp': timestamp,
'title': display_id or video_id, # title,
# 'thumbnail': thumbnail,
# 'timestamp': timestamp,
}
@@ -119,6 +109,7 @@ class PacktPubCourseIE(PacktPubBaseIE):
'info_dict': {
'id': '9781787122215',
'title': 'Learn Nodejs by building 12 projects [Video]',
'description': 'md5:489da8d953f416e51927b60a1c7db0aa',
},
'playlist_count': 90,
}, {
@@ -136,35 +127,38 @@ class PacktPubCourseIE(PacktPubBaseIE):
url, course_id = mobj.group('url', 'id')
course = self._download_json(
'%s/products/%s/metadata' % (self._MAPT_REST, course_id),
course_id)['data']
self._STATIC_PRODUCTS_BASE + '%s/toc' % course_id, course_id)
metadata = self._download_json(
self._STATIC_PRODUCTS_BASE + '%s/summary' % course_id,
course_id, fatal=False) or {}
entries = []
for chapter_num, chapter in enumerate(course['tableOfContents'], 1):
if chapter.get('type') != 'chapter':
continue
children = chapter.get('children')
if not isinstance(children, list):
for chapter_num, chapter in enumerate(course['chapters'], 1):
chapter_id = str_or_none(chapter.get('id'))
sections = chapter.get('sections')
if not chapter_id or not isinstance(sections, list):
continue
chapter_info = {
'chapter': chapter.get('title'),
'chapter_number': chapter_num,
'chapter_id': chapter.get('id'),
'chapter_id': chapter_id,
}
for section in children:
if section.get('type') != 'section':
continue
section_url = section.get('seoUrl')
if not isinstance(section_url, compat_str):
for section in sections:
section_id = str_or_none(section.get('id'))
if not section_id or section.get('contentType') != 'video':
continue
entry = {
'_type': 'url_transparent',
'url': urljoin(url + '/', section_url),
'url': '/'.join([url, chapter_id, section_id]),
'title': strip_or_none(section.get('title')),
'description': clean_html(section.get('summary')),
'thumbnail': metadata.get('coverImage'),
'timestamp': unified_timestamp(metadata.get('publicationDate')),
'ie_key': PacktPubIE.ie_key(),
}
entry.update(chapter_info)
entries.append(entry)
return self.playlist_result(entries, course_id, course.get('title'))
return self.playlist_result(
entries, course_id, metadata.get('title'),
clean_html(metadata.get('about')))
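Reviewer note: the course extractor now walks `chapters` and `sections` from the table of contents and builds each entry URL by joining the course URL with the chapter and section ids. A simplified sketch of that walk follows; `video_section_urls` is a hypothetical helper and the `toc` dictionary is invented sample data, shaped only after the keys this hunk reads.

```python
def video_section_urls(course_url, toc):
    """Collect URLs for sections whose contentType is 'video'."""
    urls = []
    for chapter in toc.get('chapters') or []:
        chapter_id = chapter.get('id')
        sections = chapter.get('sections')
        if chapter_id is None or not isinstance(sections, list):
            continue
        for section in sections:
            section_id = section.get('id')
            if section_id is None or section.get('contentType') != 'video':
                continue
            urls.append('/'.join([course_url, str(chapter_id), str(section_id)]))
    return urls


# Hypothetical table of contents for illustration only.
toc = {
    'chapters': [
        {'id': 'p1', 'sections': [
            {'id': 'video1_1', 'contentType': 'video'},
            {'id': 'text1_2', 'contentType': 'text'},
        ]},
        {'id': 'p2', 'sections': None},
    ],
}
print(video_section_urls(
    'https://subscription.packtpub.com/video/programming/9781838988906', toc))
```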

@@ -168,7 +168,7 @@ class PeerTubeIE(InfoExtractor):
@staticmethod
def _extract_peertube_url(webpage, source_url):
mobj = re.match(
r'https?://(?P<host>[^/]+)/videos/watch/(?P<id>%s)'
r'https?://(?P<host>[^/]+)/videos/(?:watch|embed)/(?P<id>%s)'
% PeerTubeIE._UUID_RE, source_url)
if mobj and any(p in webpage for p in (
'<title>PeerTube<',

@@ -14,7 +14,7 @@ class PhilharmonieDeParisIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:
live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|misc/Playlist\.ashx\?id=)|
live\.philharmoniedeparis\.fr/(?:[Cc]oncert/|embed(?:app)?/|misc/Playlist\.ashx\?id=)|
pad\.philharmoniedeparis\.fr/doc/CIMU/
)
(?P<id>\d+)
@@ -40,6 +40,12 @@ class PhilharmonieDeParisIE(InfoExtractor):
}, {
'url': 'http://live.philharmoniedeparis.fr/misc/Playlist.ashx?id=1030324&track=&lang=fr',
'only_matching': True,
}, {
'url': 'https://live.philharmoniedeparis.fr/embedapp/1098406/berlioz-fantastique-lelio-les-siecles-national-youth-choir-of.html?lang=fr-FR',
'only_matching': True,
}, {
'url': 'https://live.philharmoniedeparis.fr/embed/1098406/berlioz-fantastique-lelio-les-siecles-national-youth-choir-of.html?lang=fr-FR',
'only_matching': True,
}]
_LIVE_URL = 'https://live.philharmoniedeparis.fr'

@@ -39,7 +39,12 @@ class Porn91IE(InfoExtractor):
r'<div id="viewvideo-title">([^<]+)</div>', webpage, 'title')
title = title.replace('\n', '')
info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
video_link_url = self._search_regex(
r'<textarea[^>]+id=["\']fm-video_link[^>]+>([^<]+)</textarea>',
webpage, 'video link')
videopage = self._download_webpage(video_link_url, video_id)
info_dict = self._parse_html5_media_entries(url, videopage, video_id)[0]
duration = parse_duration(self._search_regex(
r'时长:\s*</span>\s*(\d+:\d+)', webpage, 'duration', fatal=False))

@@ -372,37 +372,92 @@ class PornHubPlaylistBaseIE(PornHubBaseIE):
entries, playlist_id, title, playlist.get('description'))
class PornHubPlaylistIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/playlist/(?P<id>\d+)'
class PornHubUserIE(PornHubPlaylistBaseIE):
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?pornhub\.(?:com|net)/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
_TESTS = [{
'url': 'http://www.pornhub.com/playlist/4667351',
'info_dict': {
'id': '4667351',
'title': 'Nataly Hot',
},
'playlist_mincount': 2,
'url': 'https://www.pornhub.com/model/zoe_ph',
'playlist_mincount': 118,
}, {
'url': 'https://de.pornhub.com/playlist/4667351',
'url': 'https://www.pornhub.com/pornstar/liz-vicious',
'info_dict': {
'id': 'liz-vicious',
},
'playlist_mincount': 118,
}, {
'url': 'https://www.pornhub.com/users/russianveet69',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/channels/povd',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/model/zoe_ph?abc=1',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('id')
return self.url_result(
'%s/videos' % mobj.group('url'), ie=PornHubPagedVideoListIE.ie_key(),
video_id=user_id)
class PornHubUserVideosIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos'
class PornHubPagedPlaylistBaseIE(PornHubPlaylistBaseIE):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
item_id = mobj.group('id')
page = int_or_none(self._search_regex(
r'\bpage=(\d+)', url, 'page', default=None))
page_url = self._make_page_url(url)
entries = []
for page_num in (page, ) if page is not None else itertools.count(1):
try:
webpage = self._download_webpage(
page_url, item_id, 'Downloading page %d' % page_num,
query={'page': page_num})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
break
raise
page_entries = self._extract_entries(webpage, host)
if not page_entries:
break
entries.extend(page_entries)
if not self._has_more(webpage):
break
return self.playlist_result(orderedSet(entries), item_id)
class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
_VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
_TESTS = [{
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'info_dict': {
'id': 'zoe_ph',
},
'playlist_mincount': 171,
'url': 'https://www.pornhub.com/model/zoe_ph/videos',
'only_matching': True,
}, {
'url': 'http://www.pornhub.com/users/rushandlia/videos',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos',
'info_dict': {
'id': 'pornstar/jenny-blighe/videos',
},
'playlist_mincount': 149,
}, {
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos?page=3',
'info_dict': {
'id': 'pornstar/jenny-blighe/videos',
},
'playlist_mincount': 40,
}, {
# default sorting as Top Rated Videos
'url': 'https://www.pornhub.com/channels/povd/videos',
'info_dict': {
'id': 'povd',
'id': 'channels/povd/videos',
},
'playlist_mincount': 293,
}, {
@@ -421,31 +476,107 @@ class PornHubUserVideosIE(PornHubPlaylistBaseIE):
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/model/jayndrea/videos/upload',
# Most Viewed Videos
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos?o=mv',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
# Top Rated Videos
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos?o=tr',
'only_matching': True,
}, {
# Longest Videos
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos?o=lg',
'only_matching': True,
}, {
# Newest Videos
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos?o=cm',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos/paid',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/pornstar/liz-vicious/videos/fanonly',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/video',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/video?page=3',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/video/search?search=123',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/categories/teen',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/categories/teen?page=3',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/hd',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/hd?page=3',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/described-video',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/described-video?page=2',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/video/incategories/60fps-1/hd-porn',
'only_matching': True,
}, {
'url': 'https://www.pornhub.com/playlist/44121572',
'info_dict': {
'id': 'playlist/44121572',
},
'playlist_mincount': 132,
}, {
'url': 'https://www.pornhub.com/playlist/4667351',
'only_matching': True,
}, {
'url': 'https://de.pornhub.com/playlist/4667351',
'only_matching': True,
}]
def _real_extract(self, url):
@classmethod
def suitable(cls, url):
return (False
if PornHubIE.suitable(url) or PornHubUserIE.suitable(url) or PornHubUserVideosUploadIE.suitable(url)
else super(PornHubPagedVideoListIE, cls).suitable(url))
def _make_page_url(self, url):
return url
@staticmethod
def _has_more(webpage):
return re.search(
r'''(?x)
<li[^>]+\bclass=["\']page_next|
<link[^>]+\brel=["\']next|
<button[^>]+\bid=["\']moreDataBtn
''', webpage) is not None
class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
_VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
_TESTS = [{
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
'info_dict': {
'id': 'jenny-blighe',
},
'playlist_mincount': 129,
}, {
'url': 'https://www.pornhub.com/model/zoe_ph/videos/upload',
'only_matching': True,
}]
def _make_page_url(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
user_id = mobj.group('id')
return '%s/ajax' % mobj.group('url')
entries = []
for page_num in itertools.count(1):
try:
webpage = self._download_webpage(
url, user_id, 'Downloading page %d' % page_num,
query={'page': page_num})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
break
raise
page_entries = self._extract_entries(webpage, host)
if not page_entries:
break
entries.extend(page_entries)
return self.playlist_result(entries, user_id)
@staticmethod
def _has_more(webpage):
return True
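Reviewer note: the new paged base class requests either the single page named in the URL or pages 1, 2, 3 and so on. It stops on an empty page or when `_has_more` reports no further pages, then de-duplicates the result. The sketch below reproduces that control flow without any network access; `collect_pages`, `fetch_page`, `has_more` and the `PAGES` table are all hypothetical stand-ins.

```python
import itertools


def collect_pages(fetch_page, has_more, page=None):
    """Gather entries page by page, then drop duplicates in order."""
    entries = []
    page_numbers = (page,) if page is not None else itertools.count(1)
    for page_num in page_numbers:
        page_entries = fetch_page(page_num)
        if not page_entries:
            break
        entries.extend(page_entries)
        if not has_more(page_num):
            break
    unique = []
    seen = set()
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return unique


# Hypothetical in-memory "site": page number -> entries on that page.
PAGES = {1: ['a', 'b'], 2: ['b', 'c'], 3: ['d']}

print(collect_pages(PAGES.get, lambda n: True))
print(collect_pages(PAGES.get, lambda n: True, page=2))
```

An override that always answers "more" (as `PornHubUserVideosUploadIE._has_more` does) still terminates, because the first empty page ends the loop.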

@@ -4,11 +4,14 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..utils import (
ExtractorError,
int_or_none,
strip_or_none,
unescapeHTML,
str_or_none,
urlencode_postdata,
)
@@ -21,15 +24,14 @@ class RoosterTeethIE(InfoExtractor):
'url': 'http://roosterteeth.com/episode/million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'md5': 'e2bd7764732d785ef797700a2489f212',
'info_dict': {
'id': '26576',
'id': '9156',
'display_id': 'million-dollars-but-season-2-million-dollars-but-the-game-announcement',
'ext': 'mp4',
'title': 'Million Dollars, But...: Million Dollars, But... The Game Announcement',
'description': 'md5:0cc3b21986d54ed815f5faeccd9a9ca5',
'title': 'Million Dollars, But... The Game Announcement',
'description': 'md5:168a54b40e228e79f4ddb141e89fe4f5',
'thumbnail': r're:^https?://.*\.png$',
'series': 'Million Dollars, But...',
'episode': 'Million Dollars, But... The Game Announcement',
'comment_count': int,
},
}, {
'url': 'http://achievementhunter.roosterteeth.com/episode/off-topic-the-achievement-hunter-podcast-2016-i-didn-t-think-it-would-pass-31',
@@ -89,60 +91,55 @@ class RoosterTeethIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
api_episode_url = 'https://svod-be.roosterteeth.com/api/v1/episodes/%s' % display_id
webpage = self._download_webpage(url, display_id)
episode = strip_or_none(unescapeHTML(self._search_regex(
(r'videoTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
r'<title>(?P<title>[^<]+)</title>'), webpage, 'title',
default=None, group='title')))
title = strip_or_none(self._og_search_title(
webpage, default=None)) or episode
m3u8_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>http.+?\.m3u8.*?)\1',
webpage, 'm3u8 url', default=None, group='url')
if not m3u8_url:
if re.search(r'<div[^>]+class=["\']non-sponsor', webpage):
self.raise_login_required(
'%s is only available for FIRST members' % display_id)
if re.search(r'<div[^>]+class=["\']golive-gate', webpage):
self.raise_login_required('%s is not available yet' % display_id)
raise ExtractorError('Unable to extract m3u8 URL')
try:
m3u8_url = self._download_json(
api_episode_url + '/videos', display_id,
'Downloading video JSON metadata')['data'][0]['attributes']['url']
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
if self._parse_json(e.cause.read().decode(), display_id).get('access') is False:
self.raise_login_required(
'%s is only available for FIRST members' % display_id)
raise
formats = self._extract_m3u8_formats(
m3u8_url, display_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id='hls')
m3u8_url, display_id, 'mp4', 'm3u8_native', m3u8_id='hls')
self._sort_formats(formats)
description = strip_or_none(self._og_search_description(webpage))
thumbnail = self._proto_relative_url(self._og_search_thumbnail(webpage))
episode = self._download_json(
api_episode_url, display_id,
'Downloading episode JSON metadata')['data'][0]
attributes = episode['attributes']
title = attributes.get('title') or attributes['display_title']
video_id = compat_str(episode['id'])
series = self._search_regex(
(r'<h2>More ([^<]+)</h2>', r'<a[^>]+>See All ([^<]+) Videos<'),
webpage, 'series', fatal=False)
comment_count = int_or_none(self._search_regex(
r'>Comments \((\d+)\)<', webpage,
'comment count', fatal=False))
video_id = self._search_regex(
(r'containerId\s*=\s*["\']episode-(\d+)\1',
r'<div[^<]+id=["\']episode-(\d+)'), webpage,
'video id', default=display_id)
thumbnails = []
for image in episode.get('included', {}).get('images', []):
if image.get('type') == 'episode_image':
img_attributes = image.get('attributes') or {}
for k in ('thumb', 'small', 'medium', 'large'):
img_url = img_attributes.get(k)
if img_url:
thumbnails.append({
'id': k,
'url': img_url,
})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'series': series,
'episode': episode,
'comment_count': comment_count,
'description': attributes.get('description') or attributes.get('caption'),
'thumbnails': thumbnails,
'series': attributes.get('show_title'),
'season_number': int_or_none(attributes.get('season_number')),
'season_id': attributes.get('season_id'),
'episode': title,
'episode_number': int_or_none(attributes.get('number')),
'episode_id': str_or_none(episode.get('uuid')),
'formats': formats,
'channel_id': attributes.get('channel_id'),
'duration': int_or_none(attributes.get('length')),
}

@@ -32,7 +32,7 @@ class RtlNlIE(InfoExtractor):
'duration': 1167.96,
},
}, {
# best format avaialble a3t
# best format available a3t
'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
'md5': 'dea7474214af1271d91ef332fb8be7ea',
'info_dict': {

@@ -1,53 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
js_to_json,
get_element_by_class,
unified_strdate,
)
class RudoIE(InfoExtractor):
_VALID_URL = r'https?://rudo\.video/vod/(?P<id>[0-9a-zA-Z]+)'
_TEST = {
'url': 'http://rudo.video/vod/oTzw0MGnyG',
'md5': '2a03a5b32dd90a04c83b6d391cf7b415',
'info_dict': {
'id': 'oTzw0MGnyG',
'ext': 'mp4',
'title': 'Comentario Tomás Mosciatti',
'upload_date': '20160617',
},
}
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, encoding='iso-8859-1')
jwplayer_data = self._parse_json(self._search_regex(
r'(?s)playerInstance\.setup\(({.+?})\)', webpage, 'jwplayer data'), video_id,
transform_source=lambda s: js_to_json(re.sub(r'encodeURI\([^)]+\)', '""', s)))
info_dict = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, m3u8_id='hls', mpd_id='dash')
info_dict.update({
'title': self._og_search_title(webpage),
'upload_date': unified_strdate(get_element_by_class('date', webpage)),
})
return info_dict

@@ -197,7 +197,7 @@ class SoundcloudIE(InfoExtractor):
'skip_download': True,
},
},
# not avaialble via api.soundcloud.com/i1/tracks/id/streams
# not available via api.soundcloud.com/i1/tracks/id/streams
{
'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
@@ -221,7 +221,7 @@ class SoundcloudIE(InfoExtractor):
}
]
_CLIENT_ID = 'FweeGBOOEOYJWLJN3oEyToGLKhmSz0I7'
_CLIENT_ID = 'BeGVhOrGmfboy1LtiHTQF6Ejpt9ULJCI'
@staticmethod
def _extract_urls(webpage):

@@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
merge_dicts,
orderedSet,
parse_duration,
parse_resolution,
@@ -26,6 +27,8 @@ class SpankBangIE(InfoExtractor):
'description': 'dillion harper masturbates on a bed',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'silly2587',
'timestamp': 1422571989,
'upload_date': '20150129',
'age_limit': 18,
}
}, {
@@ -106,31 +109,36 @@ class SpankBangIE(InfoExtractor):
for format_id, format_url in stream.items():
if format_id.startswith(STREAM_URL_PREFIX):
if format_url and isinstance(format_url, list):
format_url = format_url[0]
extract_format(
format_id[len(STREAM_URL_PREFIX):], format_url)
self._sort_formats(formats)
info = self._search_json_ld(webpage, video_id, default={})
title = self._html_search_regex(
r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title')
r'(?s)<h1[^>]*>(.+?)</h1>', webpage, 'title', default=None)
description = self._search_regex(
r'<div[^>]+\bclass=["\']bottom[^>]+>\s*<p>[^<]*</p>\s*<p>([^<]+)',
webpage, 'description', fatal=False)
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._search_regex(
r'class="user"[^>]*><img[^>]+>([^<]+)',
webpage, 'description', default=None)
thumbnail = self._og_search_thumbnail(webpage, default=None)
uploader = self._html_search_regex(
(r'(?s)<li[^>]+class=["\']profile[^>]+>(.+?)</a>',
r'class="user"[^>]*><img[^>]+>([^<]+)'),
webpage, 'uploader', default=None)
duration = parse_duration(self._search_regex(
r'<div[^>]+\bclass=["\']right_side[^>]+>\s*<span>([^<]+)',
webpage, 'duration', fatal=False))
webpage, 'duration', default=None))
view_count = str_to_int(self._search_regex(
r'([\d,.]+)\s+plays', webpage, 'view count', fatal=False))
r'([\d,.]+)\s+plays', webpage, 'view count', default=None))
age_limit = self._rta_search(webpage)
return {
return merge_dicts({
'id': video_id,
'title': title,
'title': title or video_id,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
@@ -138,7 +146,8 @@ class SpankBangIE(InfoExtractor):
'view_count': view_count,
'formats': formats,
'age_limit': age_limit,
}
}, info
)
class SpankBangPlaylistIE(InfoExtractor):
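Reviewer note: the return value now combines the scraped fields with the JSON-LD `info` dict, with the scraped dict passed first. The sketch below is a simplified stand-in for that kind of merge, not the library's `merge_dicts` implementation: it assumes earlier dicts win and that `None` values never overwrite or block a later value. `merge_first_wins` and the sample values are hypothetical.

```python
def merge_first_wins(*dicts):
    """Merge dicts left to right; the first non-None value for a key is kept."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            if value is None:
                continue
            merged.setdefault(key, value)
    return merged


scraped = {'id': 'abc', 'title': 'Page title', 'description': None}
json_ld = {
    'title': 'JSON-LD title',
    'description': 'from JSON-LD',
    'timestamp': 1422571989,
}
print(merge_first_wins(scraped, json_ld))
```

Under these assumptions the scraped title is kept, while fields the page scrape missed (here the description and timestamp) are filled in from JSON-LD.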

@@ -22,7 +22,7 @@ class BellatorIE(MTVServicesInfoExtractor):
'only_matching': True,
}]
_FEED_URL = 'http://www.spike.com/feeds/mrss/'
_FEED_URL = 'http://www.bellator.com/feeds/mrss/'
_GEO_COUNTRIES = ['US']

@@ -133,7 +133,7 @@ class TEDIE(InfoExtractor):
def _extract_info(self, webpage):
info_json = self._search_regex(
r'(?s)q\(\s*"\w+.init"\s*,\s*({.+})\)\s*</script>',
r'(?s)q\(\s*"\w+.init"\s*,\s*({.+?})\)\s*</script>',
webpage, 'info json')
return json.loads(info_json)
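Reviewer note: the only change in this hunk is `{.+}` becoming `{.+?}`. The sketch below applies both patterns from the diff to a made-up page with two `q(...)` script blocks, to show why the greedy form can capture too much. `PAGE` is invented sample markup, not a real TED page.

```python
import json
import re

# Hypothetical page with two init calls in separate script tags.
PAGE = ('<script>q("talkPage.init", {"id": 1})</script>'
        '<script>q("other.init", {"id": 2})</script>')

GREEDY = r'(?s)q\(\s*"\w+.init"\s*,\s*({.+})\)\s*</script>'
LAZY = r'(?s)q\(\s*"\w+.init"\s*,\s*({.+?})\)\s*</script>'

greedy_match = re.search(GREEDY, PAGE).group(1)
lazy_match = re.search(LAZY, PAGE).group(1)

print(greedy_match)            # runs on into the second script block
print(json.loads(lazy_match))  # parses cleanly
```

With the greedy pattern the capture extends to the last `})</script>` on the page, so `json.loads` would reject it; the non-greedy pattern stops at the first one.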

@@ -2,6 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
class TF1IE(InfoExtractor):
@@ -43,12 +44,49 @@ class TF1IE(InfoExtractor):
}, {
'url': 'http://www.tf1.fr/hd1/documentaire/videos/mylene-farmer-d-une-icone.html',
'only_matching': True,
}, {
'url': 'https://www.tf1.fr/tmc/quotidien-avec-yann-barthes/videos/quotidien-premiere-partie-11-juin-2019.html',
'info_dict': {
'id': '13641379',
'ext': 'mp4',
'title': 'md5:f392bc52245dc5ad43771650c96fb620',
'description': 'md5:44bc54f0a21322f5b91d68e76a544eae',
'upload_date': '20190611',
},
'params': {
# Sometimes wat serves the whole file with the --test option
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
wat_id = self._html_search_regex(
r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
webpage, 'wat id', group='id')
wat_id = None
data = self._parse_json(
self._search_regex(
r'__APOLLO_STATE__\s*=\s*({.+?})\s*(?:;|</script>)', webpage,
'data', default='{}'), video_id, fatal=False)
if data:
try:
wat_id = next(
video.get('streamId')
for key, video in data.items()
if isinstance(video, dict)
and video.get('slug') == video_id)
if not isinstance(wat_id, compat_str) or not wat_id.isdigit():
wat_id = None
except StopIteration:
pass
if not wat_id:
wat_id = self._html_search_regex(
(r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
r'(["\']?)streamId\1\s*:\s*(["\']?)(?P<id>\d+)\2'),
webpage, 'wat id', group='id')
return self.url_result('wat:%s' % wat_id, 'Wat')
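Reviewer note: the new lookup scans the `__APOLLO_STATE__` mapping for the first dict whose `slug` equals the page id, takes its `streamId`, and discards it unless it is a string of digits. A standalone sketch of that lookup follows; `find_stream_id` is a hypothetical helper, the sample state is invented, and plain `str` stands in for `compat_str` on Python 3.

```python
def find_stream_id(data, slug):
    """Return the digit-only streamId for the entry matching slug, else None."""
    try:
        stream_id = next(
            video.get('streamId')
            for video in data.values()
            if isinstance(video, dict) and video.get('slug') == slug)
    except StopIteration:
        # No entry carries this slug.
        return None
    if not isinstance(stream_id, str) or not stream_id.isdigit():
        return None
    return stream_id


# Hypothetical state object for illustration only.
state = {
    'ROOT_QUERY': 'not a dict',
    'Video:1': {'slug': 'some-video', 'streamId': '13641379'},
    'Video:2': {'slug': 'broken-video', 'streamId': None},
}
print(find_stream_id(state, 'some-video'))
print(find_stream_id(state, 'broken-video'))
```

Returning `None` in every failure case is what lets the extractor fall back to the regex search that follows.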

@@ -1,32 +1,35 @@
# coding: utf-8
from __future__ import unicode_literals
from .mtv import MTVServicesInfoExtractor
from .spike import ParamountNetworkIE
class TVLandIE(MTVServicesInfoExtractor):
class TVLandIE(ParamountNetworkIE):
IE_NAME = 'tvland.com'
_VALID_URL = r'https?://(?:www\.)?tvland\.com/(?:video-clips|(?:full-)?episodes)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://www.tvland.com/feeds/mrss/'
_TESTS = [{
# Geo-restricted. Without a proxy metadata are still there. With a
# proxy it redirects to http://m.tvland.com/app/
'url': 'http://www.tvland.com/episodes/hqhps2/everybody-loves-raymond-the-invasion-ep-048',
'url': 'https://www.tvland.com/episodes/s04pzf/everybody-loves-raymond-the-dog-season-1-ep-19',
'info_dict': {
'description': 'md5:80973e81b916a324e05c14a3fb506d29',
'title': 'The Invasion',
'description': 'md5:84928e7a8ad6649371fbf5da5e1ad75a',
'title': 'The Dog',
},
'playlist': [],
'playlist_mincount': 5,
}, {
'url': 'http://www.tvland.com/video-clips/zea2ev/younger-younger--hilary-duff---little-lies',
'url': 'https://www.tvland.com/video-clips/4n87f2/younger-a-first-look-at-younger-season-6',
'md5': 'e2c6389401cf485df26c79c247b08713',
'info_dict': {
'id': 'b8697515-4bbe-4e01-83d5-fa705ce5fa88',
'id': '891f7d3c-5b5b-4753-b879-b7ba1a601757',
'ext': 'mp4',
'title': 'Younger|December 28, 2015|2|NO-EPISODE#|Younger: Hilary Duff - Little Lies',
'description': 'md5:7d192f56ca8d958645c83f0de8ef0269',
'upload_date': '20151228',
'timestamp': 1451289600,
'title': 'Younger|April 30, 2019|6|NO-EPISODE#|A First Look at Younger Season 6',
'description': 'md5:595ea74578d3a888ae878dfd1c7d4ab2',
'upload_date': '20190430',
'timestamp': 1556658000,
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.tvland.com/full-episodes/iu0hz6/younger-a-kiss-is-just-a-kiss-season-3-ep-301',

@@ -317,7 +317,7 @@ class TwitchVodIE(TwitchItemBaseIE):
'Downloading %s access token' % self._ITEM_TYPE)
formats = self._extract_m3u8_formats(
'%s/vod/%s?%s' % (
'%s/vod/%s.m3u8?%s' % (
self._USHER_BASE, item_id,
compat_urllib_parse_urlencode({
'allow_source': 'true',

@@ -428,11 +428,22 @@ class TwitterIE(InfoExtractor):
'params': {
'skip_download': True, # requires ffmpeg
},
}, {
'url': 'https://twitter.com/foobar/status/1087791357756956680',
'info_dict': {
'id': '1087791357756956680',
'ext': 'mp4',
'title': 'Twitter - A new is coming. Some of you got an opt-in to try it now. Check out the emoji button, quick keyboard shortcuts, upgraded trends, advanced search, and more. Let us know your thoughts!',
'thumbnail': r're:^https?://.*\.jpg',
'description': 'md5:66d493500c013e3e2d434195746a7f78',
'uploader': 'Twitter',
'uploader_id': 'Twitter',
'duration': 61.567,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user_id')
twid = mobj.group('id')
webpage, urlh = self._download_webpage_handle(
@@ -441,8 +452,13 @@ class TwitterIE(InfoExtractor):
if 'twitter.com/account/suspended' in urlh.geturl():
raise ExtractorError('Account suspended by Twitter.', expected=True)
if user_id is None:
mobj = re.match(self._VALID_URL, urlh.geturl())
user_id = None
redirect_mobj = re.match(self._VALID_URL, urlh.geturl())
if redirect_mobj:
user_id = redirect_mobj.group('user_id')
if not user_id:
user_id = mobj.group('user_id')
username = remove_end(self._og_search_title(webpage), ' on Twitter')

@@ -34,6 +34,7 @@ class VevoIE(VevoBaseIE):
(?:https?://(?:www\.)?vevo\.com/watch/(?!playlist|genre)(?:[^/]+/(?:[^/]+/)?)?|
https?://cache\.vevo\.com/m/html/embed\.html\?video=|
https?://videoplayer\.vevo\.com/embed/embedded\?videoId=|
https?://embed\.vevo\.com/.*?[?&]isrc=|
vevo:)
(?P<id>[^&?#]+)'''
@@ -144,6 +145,9 @@ class VevoIE(VevoBaseIE):
# Geo-restricted to Netherlands/Germany
'url': 'http://www.vevo.com/watch/boostee/pop-corn-clip-officiel/FR1A91600909',
'only_matching': True,
}, {
'url': 'https://embed.vevo.com/?isrc=USH5V1923499&partnerId=4d61b777-8023-4191-9ede-497ed6c24647&partnerAdCode=',
'only_matching': True,
}]
_VERSIONS = {
0: 'youtube', # only in AuthenticateVideo videoVersions

@@ -16,7 +16,6 @@ from ..utils import (
determine_ext,
ExtractorError,
js_to_json,
InAdvancePagedList,
int_or_none,
merge_dicts,
NO_DEFAULT,
@@ -814,7 +813,8 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
return '%s/videos/page:%d/' % (base_url, pagenum)
def _extract_list_title(self, webpage):
return self._TITLE or self._html_search_regex(self._TITLE_RE, webpage, 'list title')
return self._TITLE or self._html_search_regex(
self._TITLE_RE, webpage, 'list title', fatal=False)
def _login_list_password(self, page_url, list_id, webpage):
login_form = self._search_regex(
@@ -955,7 +955,7 @@ class VimeoGroupsIE(VimeoAlbumIE):
}]
def _extract_list_title(self, webpage):
return self._og_search_title(webpage)
return self._og_search_title(webpage, fatal=False)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -1065,7 +1065,7 @@ class VimeoWatchLaterIE(VimeoChannelIE):
return self._extract_videos('watchlater', 'https://vimeo.com/watchlater')
class VimeoLikesIE(InfoExtractor):
class VimeoLikesIE(VimeoChannelIE):
_VALID_URL = r'https://(?:www\.)?vimeo\.com/(?P<id>[^/]+)/likes/?(?:$|[?#]|sort:)'
IE_NAME = 'vimeo:likes'
IE_DESC = 'Vimeo user likes'
@@ -1073,55 +1073,20 @@ class VimeoLikesIE(InfoExtractor):
'url': 'https://vimeo.com/user755559/likes/',
'playlist_mincount': 293,
'info_dict': {
'id': 'user755559_likes',
'description': 'See all the videos urza likes',
'title': 'Videos urza likes',
'id': 'user755559',
'title': 'urzas Likes',
},
}, {
'url': 'https://vimeo.com/stormlapse/likes',
'only_matching': True,
}]
def _page_url(self, base_url, pagenum):
return '%s/page:%d/' % (base_url, pagenum)
def _real_extract(self, url):
user_id = self._match_id(url)
webpage = self._download_webpage(url, user_id)
page_count = self._int(
self._search_regex(
r'''(?x)<li><a\s+href="[^"]+"\s+data-page="([0-9]+)">
.*?</a></li>\s*<li\s+class="pagination_next">
''', webpage, 'page count', default=1),
'page count', fatal=True)
PAGE_SIZE = 12
title = self._html_search_regex(
r'(?s)<h1>(.+?)</h1>', webpage, 'title', fatal=False)
description = self._html_search_meta('description', webpage)
def _get_page(idx):
page_url = 'https://vimeo.com/%s/likes/page:%d/sort:date' % (
user_id, idx + 1)
webpage = self._download_webpage(
page_url, user_id,
note='Downloading page %d/%d' % (idx + 1, page_count))
video_list = self._search_regex(
r'(?s)<ol class="js-browse_list[^"]+"[^>]*>(.*?)</ol>',
webpage, 'video content')
paths = re.findall(
r'<li[^>]*>\s*<a\s+href="([^"]+)"', video_list)
for path in paths:
yield {
'_type': 'url',
'url': compat_urlparse.urljoin(page_url, path),
}
pl = InAdvancePagedList(_get_page, page_count, PAGE_SIZE)
return {
'_type': 'playlist',
'id': '%s_likes' % user_id,
'title': title,
'description': description,
'entries': pl,
}
return self._extract_videos(user_id, 'https://vimeo.com/%s/likes' % user_id)
class VHXEmbedIE(InfoExtractor):

@@ -4,7 +4,10 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from .once import OnceIE
from ..compat import compat_urllib_parse_unquote
from ..utils import ExtractorError
from ..utils import (
ExtractorError,
int_or_none,
)
class VoxMediaVolumeIE(OnceIE):
@@ -13,18 +16,43 @@ class VoxMediaVolumeIE(OnceIE):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(self._search_regex(
r'Volume\.createVideo\(({.+})\s*,\s*{.*}\s*,\s*\[.*\]\s*,\s*{.*}\);', webpage, 'video data'), video_id)
setup = self._parse_json(self._search_regex(
r'setup\s*=\s*({.+});', webpage, 'setup'), video_id)
video_data = setup.get('video') or {}
info = {
'id': video_id,
'title': video_data.get('title_short'),
'description': video_data.get('description_long') or video_data.get('description_short'),
'thumbnail': video_data.get('brightcove_thumbnail')
}
asset = setup.get('asset') or setup.get('params') or {}
formats = []
hls_url = asset.get('hls_url')
if hls_url:
formats.extend(self._extract_m3u8_formats(
hls_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
mp4_url = asset.get('mp4_url')
if mp4_url:
tbr = self._search_regex(r'-(\d+)k\.', mp4_url, 'bitrate', default=None)
format_id = 'http'
if tbr:
format_id += '-' + tbr
formats.append({
'format_id': format_id,
'url': mp4_url,
'tbr': int_or_none(tbr),
})
if formats:
self._sort_formats(formats)
info['formats'] = formats
return info
for provider_video_type in ('ooyala', 'youtube', 'brightcove'):
provider_video_id = video_data.get('%s_id' % provider_video_type)
if not provider_video_id:
continue
info = {
'id': video_id,
'title': video_data.get('title_short'),
'description': video_data.get('description_long') or video_data.get('description_short'),
'thumbnail': video_data.get('brightcove_thumbnail')
}
if provider_video_type == 'brightcove':
info['formats'] = self._extract_once_formats(provider_video_id)
self._sort_formats(info['formats'])
@@ -39,46 +67,49 @@ class VoxMediaVolumeIE(OnceIE):
class VoxMediaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:(?:theverge|vox|sbnation|eater|polygon|curbed|racked)\.com|recode\.net)/(?:[^/]+/)*(?P<id>[^/?]+)'
_VALID_URL = r'https?://(?:www\.)?(?:(?:theverge|vox|sbnation|eater|polygon|curbed|racked|funnyordie)\.com|recode\.net)/(?:[^/]+/)*(?P<id>[^/?]+)'
_TESTS = [{
# Volume embed, Youtube
'url': 'http://www.theverge.com/2014/6/27/5849272/material-world-how-google-discovered-what-software-is-made-of',
'info_dict': {
'id': '11eXZobjrG8DCSTgrNjVinU-YmmdYjhe',
'id': 'j4mLW6x17VM',
'ext': 'mp4',
'title': 'Google\'s new material design direction',
'description': 'md5:2f44f74c4d14a1f800ea73e1c6832ad2',
'title': 'Material world: how Google discovered what software is made of',
'description': 'md5:dfc17e7715e3b542d66e33a109861382',
'upload_date': '20190710',
'uploader_id': 'TheVerge',
'uploader': 'The Verge',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
'add_ie': ['Youtube'],
}, {
# data-ooyala-id
# Volume embed, Youtube
'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
'md5': 'd744484ff127884cd2ba09e3fa604e4b',
'md5': '4c8f4a0937752b437c3ebc0ed24802b5',
'info_dict': {
'id': 'RkZXU4cTphOCPDMZg5oEounJyoFI0g-B',
'id': 'Gy8Md3Eky38',
'ext': 'mp4',
'title': 'The Nexus 6: hands-on with Google\'s phablet',
'description': 'md5:87a51fe95ff8cea8b5bdb9ac7ae6a6af',
'description': 'md5:d9f0216e5fb932dd2033d6db37ac3f1d',
'uploader_id': 'TheVerge',
'upload_date': '20141021',
'uploader': 'The Verge',
},
'add_ie': ['Ooyala'],
'skip': 'Video Not Found',
'add_ie': ['Youtube'],
'skip': 'similar to the previous test',
}, {
# volume embed
# Volume embed, Youtube
'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
'info_dict': {
'id': 'wydzk3dDpmRz7PQoXRsTIX6XTkPjYL0b',
'id': 'YCjDnX-Xzhg',
'ext': 'mp4',
'title': 'The new frontier of LGBTQ civil rights, explained',
'description': 'md5:0dc58e94a465cbe91d02950f770eb93f',
'title': "Mississippi's laws are so bad that its anti-LGBTQ law isn't needed to allow discrimination",
'description': 'md5:fc1317922057de31cd74bce91eb1c66c',
'uploader_id': 'voxdotcom',
'upload_date': '20150915',
'uploader': 'Vox',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['Ooyala'],
'add_ie': ['Youtube'],
'skip': 'similar to the previous test',
}, {
# youtube embed
'url': 'http://www.vox.com/2016/3/24/11291692/robot-dance',
@@ -93,6 +124,7 @@ class VoxMediaIE(InfoExtractor):
'uploader': 'Vox',
},
'add_ie': ['Youtube'],
'skip': 'Page no longer contain videos',
}, {
# SBN.VideoLinkset.entryGroup multiple ooyala embeds
'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
@@ -118,10 +150,11 @@ class VoxMediaIE(InfoExtractor):
'description': 'md5:e02d56b026d51aa32c010676765a690d',
},
}],
'skip': 'Page no longer contain videos',
}, {
# volume embed, Brightcove Once
'url': 'https://www.recode.net/2014/6/17/11628066/post-post-pc-ceo-the-full-code-conference-video-of-microsofts-satya',
'md5': '01571a896281f77dc06e084138987ea2',
'md5': '2dbc77b8b0bff1894c2fce16eded637d',
'info_dict': {
'id': '1231c973d',
'ext': 'mp4',

@@ -64,7 +64,15 @@ class VRVBaseIE(InfoExtractor):
def _call_cms(self, path, video_id, note):
if not self._CMS_SIGNING:
self._CMS_SIGNING = self._call_api('index', video_id, 'CMS Signing')['cms_signing']
index = self._call_api('index', video_id, 'CMS Signing')
self._CMS_SIGNING = index.get('cms_signing') or {}
if not self._CMS_SIGNING:
for signing_policy in index.get('signing_policies', []):
signing_path = signing_policy.get('path')
if signing_path and signing_path.startswith('/cms/'):
name, value = signing_policy.get('name'), signing_policy.get('value')
if name and value:
self._CMS_SIGNING[name] = value
return self._download_json(
self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING,
note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers())
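
The fallback in the hunk above can be sketched as a standalone function, assuming the `index` response is a dict and each signing policy is a dict with `path`, `name` and `value` keys. `build_cms_signing` is a hypothetical helper name; the extractor stores the result on `self._CMS_SIGNING` instead.

```python
def build_cms_signing(index):
    """Collect CMS signing query parameters from an API index response."""
    # Prefer the legacy 'cms_signing' mapping when it is present and non-empty.
    signing = index.get('cms_signing') or {}
    if not signing:
        # Otherwise gather name/value pairs from policies scoped to /cms/ paths.
        for policy in index.get('signing_policies', []):
            path = policy.get('path')
            if path and path.startswith('/cms/'):
                name, value = policy.get('name'), policy.get('value')
                if name and value:
                    signing[name] = value
    return signing
```

Policies with a missing name or an empty value are skipped, so the resulting dict can be passed directly as a `query` argument.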

@@ -32,6 +32,10 @@ class VzaarIE(InfoExtractor):
'ext': 'mp3',
'title': 'MP3',
},
}, {
# with null videoTitle
'url': 'https://view.vzaar.com/20313539/download',
'only_matching': True,
}]
@staticmethod
@@ -45,7 +49,7 @@ class VzaarIE(InfoExtractor):
video_data = self._download_json(
'http://view.vzaar.com/v2/%s/video' % video_id, video_id)
title = video_data['videoTitle']
title = video_data.get('videoTitle') or video_id
formats = []

@@ -7,7 +7,7 @@ from ..utils import int_or_none
class XiamiBaseIE(InfoExtractor):
_API_BASE_URL = 'http://www.xiami.com/song/playlist/cat/json/id'
_API_BASE_URL = 'https://emumo.xiami.com/song/playlist/cat/json/id'
def _download_webpage_handle(self, *args, **kwargs):
webpage = super(XiamiBaseIE, self)._download_webpage_handle(*args, **kwargs)

@@ -1,12 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import itertools
import json
import re
from .common import InfoExtractor, SearchInfoExtractor
from ..compat import (
compat_str,
compat_urllib_parse,
compat_urlparse,
)
@@ -18,7 +20,9 @@ from ..utils import (
int_or_none,
mimetype2ext,
smuggle_url,
try_get,
unescapeHTML,
url_or_none,
)
from .brightcove import (
@@ -556,3 +560,130 @@ class YahooGyaOIE(InfoExtractor):
'https://gyao.yahoo.co.jp/player/%s/' % video_id.replace(':', '/'),
YahooGyaOPlayerIE.ie_key(), video_id))
return self.playlist_result(entries, program_id)
class YahooJapanNewsIE(InfoExtractor):
IE_NAME = 'yahoo:japannews'
IE_DESC = 'Yahoo! Japan News'
_VALID_URL = r'https?://(?P<host>(?:news|headlines)\.yahoo\.co\.jp)[^\d]*(?P<id>\d[\d-]*\d)?'
_GEO_COUNTRIES = ['JP']
_TESTS = [{
'url': 'https://headlines.yahoo.co.jp/videonews/ann?a=20190716-00000071-ann-int',
'info_dict': {
'id': '1736242',
'ext': 'mp4',
'title': 'ムン大統領が対日批判を強化“現金化”効果はテレビ朝日系ANN - Yahoo!ニュース',
'description': '韓国の元徴用工らを巡る裁判の原告が弁護士が差し押さえた三菱重工業の資産を売却して - Yahoo!ニュース(テレビ朝日系ANN)',
'thumbnail': r're:^https?://.*\.[a-zA-Z\d]{3,4}$',
},
'params': {
'skip_download': True,
},
}, {
# geo restricted
'url': 'https://headlines.yahoo.co.jp/hl?a=20190721-00000001-oxv-l04',
'only_matching': True,
}, {
'url': 'https://headlines.yahoo.co.jp/videonews/',
'only_matching': True,
}, {
'url': 'https://news.yahoo.co.jp',
'only_matching': True,
}, {
'url': 'https://news.yahoo.co.jp/byline/hashimotojunji/20190628-00131977/',
'only_matching': True,
}, {
'url': 'https://news.yahoo.co.jp/feature/1356',
'only_matching': True
}]
def _extract_formats(self, json_data, content_id):
formats = []
video_data = try_get(
json_data,
lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
list)
for vid in video_data or []:
delivery = vid.get('delivery')
url = url_or_none(vid.get('Url'))
if not delivery or not url:
continue
elif delivery == 'hls':
formats.extend(
self._extract_m3u8_formats(
url, content_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': url,
'format_id': 'http-%s' % compat_str(vid.get('bitrate', '')),
'height': int_or_none(vid.get('height')),
'width': int_or_none(vid.get('width')),
'tbr': int_or_none(vid.get('bitrate')),
})
self._remove_duplicate_formats(formats)
self._sort_formats(formats)
return formats
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host')
display_id = mobj.group('id') or host
webpage = self._download_webpage(url, display_id)
title = self._html_search_meta(
['og:title', 'twitter:title'], webpage, 'title', default=None
) or self._html_search_regex('<title>([^<]+)</title>', webpage, 'title')
if display_id == host:
# Headline page (w/ multiple BC playlists) ('news.yahoo.co.jp', 'headlines.yahoo.co.jp/videonews/', ...)
stream_plists = re.findall(r'plist=(\d+)', webpage) or re.findall(r'plist["\']:\s*["\']([^"\']+)', webpage)
entries = [
self.url_result(
smuggle_url(
'http://players.brightcove.net/5690807595001/HyZNerRl7_default/index.html?playlistId=%s' % plist_id,
{'geo_countries': ['JP']}),
ie='BrightcoveNew', video_id=plist_id)
for plist_id in stream_plists]
return self.playlist_result(entries, playlist_title=title)
# Article page
description = self._html_search_meta(
['og:description', 'description', 'twitter:description'],
webpage, 'description', default=None)
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._html_search_meta(
'twitter:image', webpage, 'thumbnail', default=None)
space_id = self._search_regex([
r'<script[^>]+class=["\']yvpub-player["\'][^>]+spaceid=([^&"\']+)',
r'YAHOO\.JP\.srch\.\w+link\.onLoad[^;]+spaceID["\' ]*:["\' ]+([^"\']+)',
r'<!--\s+SpaceID=(\d+)'
], webpage, 'spaceid')
content_id = self._search_regex(
r'<script[^>]+class=["\']yvpub-player["\'][^>]+contentid=(?P<contentid>[^&"\']+)',
webpage, 'contentid', group='contentid')
json_data = self._download_json(
'https://feapi-yvpub.yahooapis.jp/v1/content/%s' % content_id,
content_id,
query={
'appid': 'dj0zaiZpPVZMTVFJR0FwZWpiMyZzPWNvbnN1bWVyc2VjcmV0Jng9YjU-',
'output': 'json',
'space_id': space_id,
'domain': host,
'ak': hashlib.md5('_'.join((space_id, host)).encode()).hexdigest(),
'device_type': '1100',
})
formats = self._extract_formats(json_data, content_id)
return {
'id': content_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'formats': formats,
}
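
The `ak` query parameter built in `_real_extract` above is the MD5 hex digest of the space id and host joined by an underscore. A minimal sketch of that one step follows; `compute_ak` is a hypothetical helper name and the sample space id is made up.

```python
import hashlib


def compute_ak(space_id, host):
    """Return the MD5 hex digest of '<space_id>_<host>', as the extractor does."""
    return hashlib.md5('_'.join((space_id, host)).encode()).hexdigest()


# Hypothetical values; the real space id is scraped from the article page.
ak = compute_ak('2078710353', 'headlines.yahoo.co.jp')
```

Because the digest depends on `host`, the same article served from `news.yahoo.co.jp` and `headlines.yahoo.co.jp` yields different `ak` values.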

@@ -10,6 +10,7 @@ from ..utils import (
ExtractorError,
int_or_none,
float_or_none,
try_get,
)
@@ -51,23 +52,43 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
IE_DESC = 'Яндекс.Музыка - Трек'
_VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<album_id>\d+)/track/(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://music.yandex.ru/album/540508/track/4878838',
'md5': 'f496818aa2f60b6c0062980d2e00dc20',
'info_dict': {
'id': '4878838',
'ext': 'mp3',
'title': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari - Gypsy Eyes 1',
'title': 'Carlo Ambrosio & Fabio Di Bari - Gypsy Eyes 1',
'filesize': 4628061,
'duration': 193.04,
'track': 'Gypsy Eyes 1',
'album': 'Gypsy Soul',
'album_artist': 'Carlo Ambrosio',
'artist': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari',
'artist': 'Carlo Ambrosio & Fabio Di Bari',
'release_year': 2009,
},
'skip': 'Travis CI servers blocked by YandexMusic',
}
}, {
# multiple disks
'url': 'http://music.yandex.ru/album/3840501/track/705105',
'md5': 'ebe7b4e2ac7ac03fe11c19727ca6153e',
'info_dict': {
'id': '705105',
'ext': 'mp3',
'title': 'Hooverphonic - Sometimes',
'filesize': 5743386,
'duration': 239.27,
'track': 'Sometimes',
'album': 'The Best of Hooverphonic',
'album_artist': 'Hooverphonic',
'artist': 'Hooverphonic',
'release_year': 2016,
'genre': 'pop',
'disc_number': 2,
'track_number': 9,
},
'skip': 'Travis CI servers blocked by YandexMusic',
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -110,9 +131,21 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
'abr': int_or_none(download_data.get('bitrate')),
}
def extract_artist_name(artist):
decomposed = artist.get('decomposed')
if not isinstance(decomposed, list):
return artist['name']
parts = [artist['name']]
for element in decomposed:
if isinstance(element, dict) and element.get('name'):
parts.append(element['name'])
elif isinstance(element, compat_str):
parts.append(element)
return ''.join(parts)
def extract_artist(artist_list):
if artist_list and isinstance(artist_list, list):
artists_names = [a['name'] for a in artist_list if a.get('name')]
artists_names = [extract_artist_name(a) for a in artist_list if a.get('name')]
if artists_names:
return ', '.join(artists_names)
@@ -121,10 +154,17 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
album = albums[0]
if isinstance(album, dict):
year = album.get('year')
disc_number = int_or_none(try_get(
album, lambda x: x['trackPosition']['volume']))
track_number = int_or_none(try_get(
album, lambda x: x['trackPosition']['index']))
track_info.update({
'album': album.get('title'),
'album_artist': extract_artist(album.get('artists')),
'release_year': int_or_none(year),
'genre': album.get('genre'),
'disc_number': disc_number,
'track_number': track_number,
})
track_artist = extract_artist(track.get('artists'))
@@ -152,7 +192,7 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
IE_DESC = 'Яндекс.Музыка - Альбом'
_VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<id>\d+)/?(\?|$)'
_TEST = {
_TESTS = [{
'url': 'http://music.yandex.ru/album/540508',
'info_dict': {
'id': '540508',
@@ -160,7 +200,15 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
},
'playlist_count': 50,
'skip': 'Travis CI servers blocked by YandexMusic',
}
}, {
'url': 'https://music.yandex.ru/album/3840501',
'info_dict': {
'id': '3840501',
'title': 'Hooverphonic - The Best of Hooverphonic (2016)',
},
'playlist_count': 33,
'skip': 'Travis CI servers blocked by YandexMusic',
}]
def _real_extract(self, url):
album_id = self._match_id(url)
@@ -169,7 +217,7 @@ class YandexMusicAlbumIE(YandexMusicPlaylistBaseIE):
'http://music.yandex.ru/handlers/album.jsx?album=%s' % album_id,
album_id, 'Downloading album JSON')
entries = self._build_playlist(album['volumes'][0])
entries = self._build_playlist([track for volume in album['volumes'] for track in volume])
title = '%s - %s' % (album['artists'][0]['name'], album['title'])
year = album.get('year')
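
Two changes in this file lend themselves to a short sketch: joining a `decomposed` artist entry into one display name, and flattening all album volumes instead of taking only the first. The sample data shapes below are assumptions inferred from the hunks, not confirmed API output, and plain `str` stands in for `compat_str`.

```python
def extract_artist_name(artist):
    """Join an artist's name with its 'decomposed' parts, if any."""
    decomposed = artist.get('decomposed')
    if not isinstance(decomposed, list):
        return artist['name']
    parts = [artist['name']]
    for element in decomposed:
        # Elements are either nested artist dicts or separator strings.
        if isinstance(element, dict) and element.get('name'):
            parts.append(element['name'])
        elif isinstance(element, str):
            parts.append(element)
    return ''.join(parts)


def flatten_volumes(volumes):
    """Concatenate the tracks of every volume (disc) into one list."""
    return [track for volume in volumes for track in volume]


# Assumed shape: a separator string followed by the second artist.
artist = {'name': 'Carlo Ambrosio',
          'decomposed': [' & ', {'name': 'Fabio Di Bari'}]}
display_name = extract_artist_name(artist)
all_tracks = flatten_volumes([['t1', 't2'], ['t3']])
```

With only `album['volumes'][0]`, a multi-disc album would lose every track after the first disc, which is what the added 33-track test guards against.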

@@ -37,7 +37,7 @@ class YourPornIE(InfoExtractor):
self._search_regex(
r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info',
group='data'),
video_id)[video_id]).replace('/cdn/', '/cdn4/')
video_id)[video_id]).replace('/cdn/', '/cdn5/')
title = (self._search_regex(
r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title',

@@ -27,6 +27,7 @@ from ..compat import (
compat_str,
)
from ..utils import (
bool_or_none,
clean_html,
dict_get,
error_to_compat_str,
@@ -116,6 +117,8 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
'f.req': json.dumps(f_req),
'flowName': 'GlifWebSignIn',
'flowEntry': 'ServiceLogin',
# TODO: reverse actual botguard identifier generation algo
'bgRequest': '["identifier",""]',
})
return self._download_json(
url, None, note=note, errnote=errnote,
@@ -368,10 +371,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
(?:www\.)?hooktube\.com/|
(?:www\.)?yourepeat\.com/|
tube\.majestyc\.net/|
# Invidious instances taken from https://github.com/omarroth/invidious/wiki/Invidious-Instances
(?:(?:www|dev)\.)?invidio\.us/|
(?:www\.)?invidiou\.sh/|
(?:www\.)?invidious\.snopyta\.org/|
(?:(?:www|no)\.)?invidiou\.sh/|
(?:(?:www|fi|de)\.)?invidious\.snopyta\.org/|
(?:www\.)?invidious\.kabi\.tk/|
(?:www\.)?invidious\.enkirton\.net/|
(?:www\.)?invidious\.13ad\.de/|
(?:www\.)?invidious\.mastodon\.host/|
(?:www\.)?tube\.poal\.co/|
(?:www\.)?vid\.wxzm\.sx/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls
@@ -1314,6 +1322,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
funcname = self._search_regex(
(r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
# Obsolete patterns
r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\(',
@@ -1887,6 +1896,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if view_count is None and video_details:
view_count = int_or_none(video_details.get('viewCount'))
if is_live is None:
is_live = bool_or_none(video_details.get('isLive'))
# Check for "rental" videos
if 'ypc_video_rental_bar_text' in video_info and 'author' not in video_info:
raise ExtractorError('"rental" videos not supported. See https://github.com/ytdl-org/youtube-dl/issues/359 for more information.', expected=True)
@@ -2420,7 +2432,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
(%(playlist_id)s)
)""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
_TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s'
_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)(?:[^>]+>(?P<title>[^<]+))?'
_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(?:&amp;(?:[^"]*?index=(?P<index>\d+))?(?:[^>]+>(?P<title>[^<]+))?)?'
IE_NAME = 'youtube:playlist'
_TESTS = [{
'url': 'https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
@@ -2443,6 +2455,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': {
'title': '29C3: Not my department',
'id': 'PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
'uploader': 'Christiaan008',
'uploader_id': 'ChRiStIaAn008',
},
'playlist_count': 95,
}, {
@@ -2451,6 +2465,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': {
'title': '[OLD]Team Fortress 2 (Class-based LP)',
'id': 'PLBB231211A4F62143',
'uploader': 'Wickydoo',
'uploader_id': 'Wickydoo',
},
'playlist_mincount': 26,
}, {
@@ -2459,6 +2475,8 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': {
'title': 'Uploads from Cauchemar',
'id': 'UUBABnxM4Ar9ten8Mdjj1j0Q',
'uploader': 'Cauchemar',
'uploader_id': 'Cauchemar89',
},
'playlist_mincount': 799,
}, {
@@ -2476,13 +2494,17 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': {
'title': 'JODA15',
'id': 'PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
'uploader': 'milan',
'uploader_id': 'UCEI1-PVPcYXjB73Hfelbmaw',
}
}, {
'url': 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
'playlist_mincount': 485,
'info_dict': {
'title': '2017 華語最新單曲 (2/24更新)',
'title': '2018 Chinese New Singles (11/6 updated)',
'id': 'PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
'uploader': 'LBK',
'uploader_id': 'sdragonfang',
}
}, {
'note': 'Embedded SWF player',
@@ -2491,13 +2513,16 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': {
'title': 'JODA7',
'id': 'YN5VISEtHet5D4NEvfTd0zcgFk84NqFZ',
}
},
'skip': 'This playlist does not exist',
}, {
'note': 'Buggy playlist: the webpage has a "Load more" button but it doesn\'t have more videos',
'url': 'https://www.youtube.com/playlist?list=UUXw-G3eDE9trcvY2sBMM_aA',
'info_dict': {
'title': 'Uploads from Interstellar Movie',
'id': 'UUXw-G3eDE9trcvY2sBMM_aA',
'uploader': 'Interstellar Movie',
'uploader_id': 'InterstellarMovie1',
},
'playlist_mincount': 21,
}, {
@@ -2522,6 +2547,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'This video is not available.',
'add_ie': [YoutubeIE.ie_key()],
}, {
'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5',
@@ -2533,7 +2559,6 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'uploader_id': 'backuspagemuseum',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
'upload_date': '20161008',
'license': 'Standard YouTube License',
'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
'categories': ['Nonprofits & Activism'],
'tags': list,
@@ -2544,6 +2569,16 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'noplaylist': True,
'skip_download': True,
},
}, {
# https://github.com/ytdl-org/youtube-dl/issues/21844
'url': 'https://www.youtube.com/playlist?list=PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba',
'info_dict': {
'title': 'Data Analysis with Dr Mike Pound',
'id': 'PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba',
'uploader_id': 'Computerphile',
'uploader': 'Computerphile',
},
'playlist_mincount': 11,
}, {
'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
'only_matching': True,
@@ -2710,6 +2745,8 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': {
'id': 'UUKfVa3S1e4PHvxWcwyMMg8w',
'title': 'Uploads from lex will',
'uploader': 'lex will',
'uploader_id': 'UCKfVa3S1e4PHvxWcwyMMg8w',
}
}, {
'note': 'Age restricted channel',
@@ -2719,6 +2756,8 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
'info_dict': {
'id': 'UUs0ifCMCm1icqRbqhUINa0w',
'title': 'Uploads from Deus Ex',
'uploader': 'Deus Ex',
'uploader_id': 'DeusExOfficial',
},
}, {
'url': 'https://invidio.us/channel/UC23qupoDRn9YOAVzeoxjOQA',
@@ -2803,6 +2842,8 @@ class YoutubeUserIE(YoutubeChannelIE):
'info_dict': {
'id': 'UUfX55Sx5hEFjoC3cNs6mCUQ',
'title': 'Uploads from The Linux Foundation',
'uploader': 'The Linux Foundation',
'uploader_id': 'TheLinuxFoundation',
}
}, {
# Only available via https://www.youtube.com/c/12minuteathlete/videos
@@ -2812,6 +2853,8 @@ class YoutubeUserIE(YoutubeChannelIE):
'info_dict': {
'id': 'UUVjM-zV6_opMDx7WYxnjZiQ',
'title': 'Uploads from 12 Minute Athlete',
'uploader': '12 Minute Athlete',
'uploader_id': 'the12minuteathlete',
}
}, {
'url': 'ytuser:phihag',
@@ -2905,7 +2948,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
'playlist_mincount': 4,
'info_dict': {
'id': 'ThirstForScience',
'title': 'Thirst for Science',
'title': 'ThirstForScience',
},
}, {
# with "Load more" button
@@ -2922,6 +2965,7 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
'id': 'UCiU1dHvZObB2iP6xkJ__Icw',
'title': 'Chem Player',
},
'skip': 'Blocked',
}]
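
The effect of relaxing `_VIDEO_RE` in `YoutubePlaylistIE` can be checked in isolation. The sketch below copies the old and new patterns from the hunk and runs them against two hand-written anchors; the HTML snippets and the `match_video` helper are made up for illustration and do not come from a real playlist page.

```python
import re

OLD_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)(?:[^>]+>(?P<title>[^<]+))?'
NEW_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(?:&amp;(?:[^"]*?index=(?P<index>\d+))?(?:[^>]+>(?P<title>[^<]+))?)?'


def match_video(pattern, html):
    """Return the named groups of the first match, or None if nothing matches."""
    mobj = re.search(pattern, html)
    return mobj.groupdict() if mobj else None


# Hypothetical anchors: one with list/index parameters, one without any.
with_index = '<a href="/watch?v=abcdefghijk&amp;list=PLx&amp;index=3" class="t">Title</a>'
bare = '<a href="/watch?v=abcdefghijk">Title</a>'

old_on_bare = match_video(OLD_VIDEO_RE, bare)
new_on_bare = match_video(NEW_VIDEO_RE, bare)
new_on_indexed = match_video(NEW_VIDEO_RE, with_index)
```

The old pattern required `&amp;` and `index=` after the video id, so links without them were dropped; the new one makes that whole tail optional and leaves `index` and `title` as `None` when absent.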

File diff suppressed because it is too large

@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2019.06.21'
__version__ = '2019.07.27'