1
0
mirror of https://github.com/yt-dlp/yt-dlp synced 2025-04-13 18:38:10 -05:00

Compare commits

...

108 Commits

Author SHA1 Message Date
Subrat Lima
74e90dd9b8
[ie/LRTRadio] Add extractor (#12801)
Closes #12745
Authored by: subrat-lima
2025-04-06 23:26:44 +00:00
Snack
1d45e30537
[ie/niconico:live] Fix extractor (#12809)
Closes #12365
Authored by: Snack-X
2025-04-06 23:24:58 +00:00
Frank Aurich
3c1c75ecb8
[ie/kika] Add playlist extractor (#12832)
Closes #3658
Authored by: 1100101
2025-04-06 21:04:24 +02:00
J.Luis
7faa18b83d
[ie/ivoox] Add extractor (#12768)
Authored by: NeonMan, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-04-06 20:48:07 +02:00
doe1080
a473e59233
[utils] url_or_none: Support WebSocket URLs (#12848)
Authored by: doe1080
2025-04-06 20:46:08 +02:00
sepro
45f01de00e
[utils] _yield_json_ld: Make function less fatal (#12855)
Authored by: seproDev
2025-04-06 20:31:00 +02:00
WouterGordts
db6d1f145a
[ie/mixcloud] Refactor extractor (#12830)
Authored by: WouterGordts, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-04-06 19:51:08 +02:00
sepro
a3f2b54c25
[ie/dzen.ru] Rework extractors (#12852)
Closes #5523, Closes #10818, Closes #11385, Closes #11470
Authored by: seproDev
2025-04-06 17:41:48 +02:00
LN Liberda
91832111a1
[ie/TokFMPodcast] Fix formats extraction (#12842)
Authored by: selfisekai
2025-04-06 17:05:43 +02:00
Ben Faerber
425017531f
[ie/parti] Add extractors (#12769)
Closes #11434
Authored by: benfaerber
2025-04-05 22:09:53 +02:00
sepro
58d0c83457
[ie/rumble] Improve format extraction (#12838)
Closes #12837
Authored by: seproDev
2025-04-05 20:29:57 +02:00
sepro
4ebf41309d
[ie/CrowdBunker] Make format extraction non-fatal (#12836)
Authored by: seproDev
2025-04-05 19:49:51 +02:00
CasperMcFadden95
e1847535e2
[ie/RoyaLive] Add extractor (#12817)
Authored by: CasperMcFadden95
2025-04-03 21:02:24 +02:00
sepro
5361a7c6e2
[ie/vk] Fix chapters extraction (#12821)
Fix 05c8023a27dd37c49163c0498bf98e3e3c1cb4b9

Authored by: seproDev
2025-04-03 19:55:36 +02:00
github-actions[bot]
349f36606f Release 2025.03.31
Created by: bashonly

:ci skip all
2025-03-31 21:54:27 +00:00
bashonly
5e457af57f
[cleanup] Misc (#12802)
Authored by: bashonly
2025-03-31 21:38:21 +00:00
DmitryScaletta
61046c3161
[ie/twitch:clips] Extract portrait formats (#12763)
Authored by: DmitryScaletta
2025-03-31 21:21:14 +00:00
bashonly
07f04005e4
[ie/youtube] Add player_js_variant extractor-arg (#12767)
- Always distinguish between different JS variants' code/functions
- Change naming scheme for nsig and sigfuncs in disk cache

Authored by: bashonly
2025-03-31 19:45:48 +00:00
bashonly
e465b078ea
[ie/on24] Support mainEvent URLs (#12800)
Closes #12782
Authored by: bashonly
2025-03-31 19:25:10 +00:00
bashonly
d63696f23a
[ie/MicrosoftLearnEpisode] Extract more formats (#12799)
Closes #12798
Authored by: bashonly
2025-03-31 19:21:44 +00:00
Muhammad Labeeb
bb321cfdc3
[ie/francaisfacile] Add extractor (#12787)
Authored by: mlabeeb03
2025-03-31 19:06:33 +00:00
Miroslav Bendík
5fc521cbd0
[ie/stvr] Rename extractor from RTVS to STVR (#12788)
Authored by: mireq
2025-03-31 19:04:52 +00:00
bashonly
f033d86b96
[ie/mlbtv] Fix radio-only extraction (#12792)
Authored by: bashonly
2025-03-30 23:28:14 +00:00
bashonly
9a1ec1d36e
[ie/generic] Validate response before checking m3u8 live status (#12784)
Closes #12744
Authored by: bashonly
2025-03-30 23:02:59 +00:00
bashonly
2956035912
[ie/sbs] Fix subtitles extraction (#12785)
Closes #12783
Authored by: bashonly
2025-03-30 22:54:55 +00:00
sepro
22e34adbd7
Add --compat-options 2024 (#12789)
Authored by: seproDev
2025-03-31 00:38:46 +02:00
coletdjnz
6a6d97b2cb
[ie/youtube:tab] Fix playlist continuation extraction (#12777)
Fixes https://github.com/yt-dlp/yt-dlp/issues/12759

Authored by: coletdjnz
2025-03-29 11:13:09 +13:00
github-actions[bot]
3ddbebb3c6 Release 2025.03.27
Created by: bashonly

:ci skip all
2025-03-27 23:45:56 +00:00
bashonly
48be862b32
[ie/youtube] Make signature and nsig extraction more robust (#12761)
Authored by: bashonly, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-03-27 22:31:01 +00:00
bashonly
a8b9ff3c2a
[jsinterp] Fix nested attributes and object extraction (#12760)
Authored by: bashonly, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-03-27 22:28:30 +00:00
github-actions[bot]
6eaa574c82 Release 2025.03.26
Created by: bashonly

:ci skip all
2025-03-26 00:04:51 +00:00
sepro
ecee97b4fa
[ie/youtube] Only cache nsig code on successful decoding (#12750)
Authored by: seproDev, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-03-25 23:47:45 +00:00
sepro
a550dfc904
[ie/youtube] Fix signature and nsig extraction for player 4fcd6e4a (#12748)
Closes #12746
Authored by: seproDev
2025-03-25 23:40:58 +00:00
github-actions[bot]
336b33e72f Release 2025.03.25
Created by: bashonly

:ci skip all
2025-03-25 00:07:18 +00:00
sepro
9dde546e7e
[cleanup] Misc (#12694)
Authored by: seproDev
2025-03-25 00:05:02 +00:00
Abdulmohsen
66e0bab814
[ie/TVer] Fix extractor (#12659)
Closes #12643, Closes #12282
Authored by: arabcoders, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-03-25 00:00:22 +00:00
doe1080
801afeac91
[ie/streaks] Add extractor (#12679)
Authored by: doe1080
2025-03-24 23:12:09 +00:00
bashonly
86ab79e1a5
[ie] Fix sorting of HLS audio formats by GROUP-ID (#12714)
Closes #11178
Authored by: bashonly
2025-03-24 22:38:22 +00:00
Subrat Lima
3396eb50dc
[ie/17live:vod] Add extractor (#12723)
Closes #12570
Authored by: subrat-lima
2025-03-24 22:26:45 +00:00
fireattack
5086d4aed6
[ie/generic] Fix MPD base URL parsing (#12718)
Closes #12709
Authored by: fireattack
2025-03-24 22:24:09 +00:00
sepro
9491b44032
[utils] js_to_json: Make function less fatal (#12715)
Authored by: seproDev
2025-03-24 22:28:47 +01:00
doe1080
b7fbb5a0a1
[ie/vrsquare] Add extractors (#12515)
Authored by: doe1080
2025-03-24 22:28:09 +01:00
bashonly
4054a2b623
[ie/youtube] Fix PhantomJS nsig fallback (#12728)
Also fixes the NSigDeno plugin

Closes #12724
Authored by: bashonly
2025-03-24 21:22:25 +00:00
bashonly
b9c979461b
[ie/youtube] Fix signature and nsig extraction for player 363db69b (#12725)
Closes #12724
Authored by: bashonly
2025-03-24 21:18:51 +00:00
bashonly
9d5e6de2e7
[ie/9now.com.au] Fix extractor (#12702)
Closes #12591
Authored by: bashonly
2025-03-23 16:35:46 +00:00
Simon Sawicki
9bf23902ce
[rh:curl_cffi] Support curl_cffi 0.10.x (#12670)
Authored by: Grub4K
2025-03-23 00:15:20 +01:00
sepro
be5af3f9e9
[ie/deezer] Remove extractors (#12704)
Authored by: seproDev
2025-03-22 22:53:20 +01:00
sepro
fe4f14b836
[ie/viki] Remove extractors (#12703)
Closes #2907, Closes #2869
Authored by: seproDev
2025-03-22 22:34:07 +01:00
Simon Sawicki
b872ffec50
[core] Fix attribute error on failed VT init (#12696)
Authored by: Grub4K
2025-03-22 21:03:28 +01:00
bashonly
e2dfccaf80
[ie/chzzk:video] Fix extraction (#12692)
Closes #12487
Authored by: dirkf, bashonly

Co-authored-by: dirkf <fieldhouse@gmx.net>
2025-03-22 16:44:05 +00:00
github-actions[bot]
b4488a9e12 Release 2025.03.21
Created by: bashonly

:ci skip all
2025-03-21 23:49:09 +00:00
Simon Sawicki
f36e4b6e65
[cleanup] Misc (#12526)
Authored by: Grub4K, seproDev, gamer191, dirkf

Co-authored-by: sepro <sepro@sepr0.com>
2025-03-21 23:41:56 +00:00
D Trombett
983095485c
[ie/loco] Add extractor (#12667)
Closes #12496
Authored by: DTrombett
2025-03-21 23:24:13 +00:00
Michaël De Boey
bbada3ec07
[ie/ketnet] Remove extractor (#12628)
Authored by: MichaelDeBoey
2025-03-21 23:19:36 +00:00
Michiel Sikma
8305df0001
[ie/soop] Fix timestamp extraction (#12609)
Closes #12606
Authored by: msikma
2025-03-21 23:16:30 +00:00
bashonly
7223d29569
[ie/mitele] Fix extractor (#12689)
Closes #12655
Authored by: bashonly
2025-03-21 23:14:46 +00:00
bashonly
f5fb2229e6
[ie/BilibiliPlaylist] Fix extractor (#12690)
Closes #12651
Authored by: bashonly
2025-03-21 23:04:58 +00:00
JChris246
89a68c4857
[ie/jamendo] Fix thumbnail extraction (#12622)
Closes #11779
Authored by: JChris246, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-03-21 23:04:34 +00:00
sepro
9b868518a1
[ie/youtube] Fix nsig and signature extraction for player 643afba4 (#12684)
Closes #12677, Closes #12682
Authored by: seproDev, bashonly

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
2025-03-21 20:58:10 +00:00
D Trombett
2ee3a0aff9
[ie/tv8.it] Add live and playlist extractors (#12569)
Closes #12542
Authored by: DTrombett
2025-03-16 23:10:16 +01:00
Arc8ne
01a8be4c23
[ie/Canalsurmas] Add extractor (#12497)
Closes #5516
Authored by: Arc8ne
2025-03-16 23:03:10 +01:00
Refael Ackermann
ebac65aa9e
[ie/NBCStations] Fix extractor (#12534)
Authored by: refack
2025-03-16 21:41:32 +00:00
thedenv
4815dac131
[ie/msn] Rework extractor (#12513)
Closes #3225
Authored by: thedenv, seproDev

Co-authored-by: sepro <sepro@sepr0.com>
2025-03-16 19:54:46 +01:00
Simon Sawicki
95f8df2f79
[networking] Always add unsupported suffix on version mismatch (#12626)
Authored by: Grub4K
2025-03-16 12:45:44 +01:00
coletdjnz
e67d786c7c
[ie/youtube] Warn on DRM formats (#12593)
Authored by: coletdjnz
2025-03-16 10:28:16 +13:00
sepro
d9a53cc1e6
[ie/reddit] Truncate title (#12567)
Authored by: seproDev
2025-03-15 22:16:00 +01:00
sepro
83b119dadb
[ie/tiktok] Truncate title (#12566)
Authored by: seproDev
2025-03-15 22:15:29 +01:00
sepro
06f6de78db
[ie/twitter] Truncate title (#12560)
Authored by: seproDev
2025-03-15 22:15:03 +01:00
sepro
3380febe99
[ie/youtube] Player client maintenance (#12603)
Authored by: seproDev
2025-03-15 21:57:56 +01:00
rysson
be0d819e11
[ie/cda] Fix login support (#12552)
Closes #10306
Authored by: rysson
2025-03-15 21:47:50 +01:00
Michaël De Boey
df9ebeec00
[ie/vrtmax] Rework extractor (#12479)
Closes #7997, Closes #8174, Closes #9375
Authored by: MichaelDeBoey, bergoid, seproDev

Co-authored-by: bergoid <bergoid@users.noreply.github.com>
Co-authored-by: sepro <sepro@sepr0.com>
2025-03-15 21:29:22 +01:00
fireattack
17504f2535
[ie/openrec] Fix _VALID_URL (#12608)
Authored by: fireattack
2025-03-15 17:14:01 +01:00
coletdjnz
4432a9390c
[ie/youtube] Split into package (#12557)
Authored by: coletdjnz
2025-03-13 17:37:33 +13:00
sepro
05c8023a27
[ie/vk] Improve metadata extraction (#12510)
Closes #12509
Authored by: seproDev
2025-03-07 22:14:38 +01:00
bashonly
bd0a668169
[ie/pinterest] Fix extractor (#12538)
Closes #12529
Authored by: mikf

Co-authored-by: =?UTF-8?q?Mike=20F=C3=A4hrmann?= <mike_faehrmann@web.de>
2025-03-05 06:38:23 +00:00
bashonly
b8b4754704
[ie/twitter] Fix syndication token generation (#12537)
Fix 14cd7f3443c6da4d49edaefcc12da9dee86e243e

Authored by: bashonly
2025-03-05 06:22:52 +00:00
u-spec-png
9d70abe4de
[ie/N1] Fix extraction of newer articles (#12514)
Authored by: u-spec-png
2025-03-04 01:51:23 +01:00
sepro
8eb9c1bf3b
[ie/RTP] Rework extractor (#11638)
Closes #4661, Closes #10393, Closes #11244
Authored by: seproDev, vallovic, red-acid, pferreir, somini

Co-authored-by: vallovic <vallovic@gmail.com>
Co-authored-by: red-acid <161967284+red-acid@users.noreply.github.com>
Co-authored-by: Pedro Ferreira <pedro@dete.st>
Co-authored-by: somini <dev@somini.xyz>
2025-03-04 00:46:18 +01:00
fries1234
42b7440963
[ie/tvw] Add extractor (#12271)
Authored by: fries1234
2025-03-03 23:25:30 +01:00
sepro
172d5fcd77
[ie/MagellanTV] Fix extractor (#12505)
Closes #12498
Authored by: seproDev
2025-03-03 22:55:03 +01:00
Simon Sawicki
7d18fed8f1
[networking] Add keep_header_casing extension (#11652)
Authored by: coletdjnz, Grub4K

Co-authored-by: coletdjnz <coletdjnz@protonmail.com>
2025-03-03 00:10:01 +01:00
coletdjnz
79ec2fdff7
[ie/youtube] Warn on missing formats due to SSAP (#12483)
See https://github.com/yt-dlp/yt-dlp/issues/12482

Authored by: coletdjnz
2025-02-28 19:33:31 +13:00
sepro
3042afb5fe
[ie/CultureUnplugged] Extend _VALID_URL (#12486)
Closes #12477
Authored by: seproDev
2025-02-26 19:39:50 +01:00
sepro
ad60137c14
[ie/Dailymotion] Improve embed detection (#12464)
Closes #12453
Authored by: seproDev
2025-02-26 19:36:33 +01:00
4ft35t
0bb3978862
[ie/weibo] Support playlists (#12284)
Closes #12283
Authored by: 4ft35t
2025-02-23 19:16:06 +00:00
XPA
7508e34f20
[ie/niconico] Fix format sorting (#12442)
Authored by: xpadev-net
2025-02-23 19:07:08 +00:00
bashonly
9807181cfb
[ie/lbry] Make m3u8 format extraction non-fatal (#12463)
Closes #12459
Authored by: bashonly
2025-02-23 18:24:48 +00:00
bashonly
7126b47260
[ie/lbry] Raise appropriate error for non-media files (#12462)
Closes #12182
Authored by: bashonly
2025-02-23 17:59:22 +00:00
bashonly
eb1417786a
[ie/gem.cbc.ca] Fix login support (#12414)
Closes #12406
Authored by: bashonly
2025-02-23 09:56:47 +00:00
bashonly
6933f5670c
[ie/playsuisse] Fix login support (#12444)
Closes #12425
Authored by: bashonly
2025-02-23 09:22:51 +00:00
Alexander Seiler
26a502fc72
[ie/azmedien] Fix extractor (#12375)
Authored by: goggle
2025-02-23 09:14:35 +00:00
Ben Faerber
652827d5a0
[ie/softwhiteunderbelly] Add extractor (#12281)
Authored by: benfaerber
2025-02-23 09:11:58 +00:00
Pedro Belo
0e1697232f
[ie/globo] Fix subtitles extraction (#12270)
Authored by: pedro
2025-02-23 08:57:27 +00:00
Kenshin9977
9f77e04c76
Fix external downloader availability when using --ffmpeg-location (#12318)
This fix is only applicable to the CLI option

Authored by: Kenshin9977
2025-02-23 08:50:43 +00:00
Simon Sawicki
c034d65548
Fix lazy extractor state (Fix 4445f37a7a66b248dbd8376c43137e6e441f138e) (#12452)
Authored by: coletdjnz, Grub4K, pukkandan
2025-02-23 09:44:27 +01:00
bashonly
480125560a
[ie/instagram] Improve error handling (#12410)
Closes #5967, Closes #6294, Closes #7328, Closes #8452
Authored by: bashonly
2025-02-23 08:35:22 +00:00
bashonly
a59abe0636
[ie/instagram] Fix extraction of older private posts (#12451)
Authored by: bashonly
2025-02-23 08:31:00 +00:00
Chris Ellsworth
a90641c836
[ie/instagram] Add app_id extractor-arg (#12359)
Authored by: chrisellsworth
2025-02-23 08:16:04 +00:00
fireattack
65c3c58c0a
[ie/instagram:story] Support --no-playlist (#12397)
Closes #12395
Authored by: fireattack
2025-02-23 07:24:21 +00:00
bashonly
99ea297875
[ie/tiktok] Improve error handling (#12445)
Closes #8678
Authored by: bashonly
2025-02-23 06:53:13 +00:00
bashonly
6deeda5c11
[ie/soundcloud] Fix thumbnail extraction (#12447)
Closes #11835, Closes #12435
Authored by: bashonly
2025-02-23 06:20:53 +00:00
Refael Ackermann
7f3006eb0c
[ie/wsj] Support opinion URLs and impersonation (#12431)
Authored by: refack
2025-02-23 00:40:53 +00:00
coletdjnz
4445f37a7a
[core] Load plugins on demand (#11305)
- Adds `--no-plugin-dirs` to disable plugin loading
- `--plugin-dirs` now supports post-processors

Authored by: coletdjnz, Grub4K, pukkandan
2025-02-23 11:00:46 +13:00
sepro
3a1583ca75
[ie/BunnyCdn] Add extractor (#11586)
Also adds BunnyCdnFD

Authored by: seproDev, Grub4K

Co-authored-by: Simon Sawicki <contact@grub4k.xyz>
2025-02-21 22:39:41 +01:00
Simon Sawicki
a3e0c7d3b2
[test] Show all differences for expect_value and expect_dict (#12334)
Authored by: Grub4K
2025-02-21 21:29:07 +01:00
Simon Sawicki
f7a1f2d813
[core] Support emitting ConEmu progress codes (#10649)
Authored by: Grub4K
2025-02-20 20:33:31 +01:00
bashonly
9deed13d7c
[ie/soundcloud] Extract tags (#12420)
Authored by: bashonly
2025-02-20 15:51:08 +00:00
bashonly
c2e6e1d5f7
[ie/niconico:live] Fix thumbnail extraction (#12419)
Closes #12417
Authored by: bashonly
2025-02-20 15:39:06 +00:00
141 changed files with 9242 additions and 6254 deletions

View File

@ -742,3 +742,21 @@ lfavole
mp3butcher
slipinthedove
YoshiTabletopGamer
Arc8ne
benfaerber
chrisellsworth
fries1234
Kenshin9977
MichaelDeBoey
msikma
pedro
pferreir
red-acid
refack
rysson
somini
thedenv
vallovic
arabcoders
mireq
mlabeeb03

View File

@ -4,6 +4,142 @@
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2025.03.31
#### Core changes
- [Add `--compat-options 2024`](https://github.com/yt-dlp/yt-dlp/commit/22e34adbd741e1c7072015debd615dc3fb71c401) ([#12789](https://github.com/yt-dlp/yt-dlp/issues/12789)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- **francaisfacile**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/bb321cfdc3fd4400598ddb12a15862bc2ac8fc10) ([#12787](https://github.com/yt-dlp/yt-dlp/issues/12787)) by [mlabeeb03](https://github.com/mlabeeb03)
- **generic**: [Validate response before checking m3u8 live status](https://github.com/yt-dlp/yt-dlp/commit/9a1ec1d36e172d252714cef712a6d091e0a0c4f2) ([#12784](https://github.com/yt-dlp/yt-dlp/issues/12784)) by [bashonly](https://github.com/bashonly)
- **microsoftlearnepisode**: [Extract more formats](https://github.com/yt-dlp/yt-dlp/commit/d63696f23a341ee36a3237ccb5d5e14b34c2c579) ([#12799](https://github.com/yt-dlp/yt-dlp/issues/12799)) by [bashonly](https://github.com/bashonly)
- **mlbtv**: [Fix radio-only extraction](https://github.com/yt-dlp/yt-dlp/commit/f033d86b96b36f8c5289dd7c3304f42d4d9f6ff4) ([#12792](https://github.com/yt-dlp/yt-dlp/issues/12792)) by [bashonly](https://github.com/bashonly)
- **on24**: [Support `mainEvent` URLs](https://github.com/yt-dlp/yt-dlp/commit/e465b078ead75472fcb7b86f6ccaf2b5d3bc4c21) ([#12800](https://github.com/yt-dlp/yt-dlp/issues/12800)) by [bashonly](https://github.com/bashonly)
- **sbs**: [Fix subtitles extraction](https://github.com/yt-dlp/yt-dlp/commit/29560359120f28adaaac67c86fa8442eb72daa0d) ([#12785](https://github.com/yt-dlp/yt-dlp/issues/12785)) by [bashonly](https://github.com/bashonly)
- **stvr**: [Rename extractor from RTVS to STVR](https://github.com/yt-dlp/yt-dlp/commit/5fc521cbd0ce7b2410d0935369558838728e205d) ([#12788](https://github.com/yt-dlp/yt-dlp/issues/12788)) by [mireq](https://github.com/mireq)
- **twitch**: clips: [Extract portrait formats](https://github.com/yt-dlp/yt-dlp/commit/61046c31612b30c749cbdae934b7fe26abe659d7) ([#12763](https://github.com/yt-dlp/yt-dlp/issues/12763)) by [DmitryScaletta](https://github.com/DmitryScaletta)
- **youtube**
- [Add `player_js_variant` extractor-arg](https://github.com/yt-dlp/yt-dlp/commit/07f04005e40ebdb368920c511e36e98af0077ed3) ([#12767](https://github.com/yt-dlp/yt-dlp/issues/12767)) by [bashonly](https://github.com/bashonly)
- tab: [Fix playlist continuation extraction](https://github.com/yt-dlp/yt-dlp/commit/6a6d97b2cbc78f818de05cc96edcdcfd52caa259) ([#12777](https://github.com/yt-dlp/yt-dlp/issues/12777)) by [coletdjnz](https://github.com/coletdjnz)
#### Misc. changes
- **cleanup**: Miscellaneous: [5e457af](https://github.com/yt-dlp/yt-dlp/commit/5e457af57fae9645b1b8fa0ed689229c8fb9656b) by [bashonly](https://github.com/bashonly)
### 2025.03.27
#### Core changes
- **jsinterp**: [Fix nested attributes and object extraction](https://github.com/yt-dlp/yt-dlp/commit/a8b9ff3c2a0ae25735e580173becc78545b92572) ([#12760](https://github.com/yt-dlp/yt-dlp/issues/12760)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
#### Extractor changes
- **youtube**: [Make signature and nsig extraction more robust](https://github.com/yt-dlp/yt-dlp/commit/48be862b32648bff5b3e553e40fca4dcc6e88b28) ([#12761](https://github.com/yt-dlp/yt-dlp/issues/12761)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
### 2025.03.26
#### Extractor changes
- **youtube**
- [Fix signature and nsig extraction for player `4fcd6e4a`](https://github.com/yt-dlp/yt-dlp/commit/a550dfc904a02843a26369ae50dbb7c0febfb30e) ([#12748](https://github.com/yt-dlp/yt-dlp/issues/12748)) by [seproDev](https://github.com/seproDev)
- [Only cache nsig code on successful decoding](https://github.com/yt-dlp/yt-dlp/commit/ecee97b4fa90d51c48f9154c3a6d5a8ffe46cd5c) ([#12750](https://github.com/yt-dlp/yt-dlp/issues/12750)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
### 2025.03.25
#### Core changes
- [Fix attribute error on failed VT init](https://github.com/yt-dlp/yt-dlp/commit/b872ffec50fd50f790a5a490e006a369a28a3df3) ([#12696](https://github.com/yt-dlp/yt-dlp/issues/12696)) by [Grub4K](https://github.com/Grub4K)
- **utils**: `js_to_json`: [Make function less fatal](https://github.com/yt-dlp/yt-dlp/commit/9491b44032b330e05bd5eaa546187005d1e8538e) ([#12715](https://github.com/yt-dlp/yt-dlp/issues/12715)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- [Fix sorting of HLS audio formats by `GROUP-ID`](https://github.com/yt-dlp/yt-dlp/commit/86ab79e1a5182092321102adf6ca34195803b878) ([#12714](https://github.com/yt-dlp/yt-dlp/issues/12714)) by [bashonly](https://github.com/bashonly)
- **17live**: vod: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/3396eb50dcd245b49c0f4aecd6e80ec914095d16) ([#12723](https://github.com/yt-dlp/yt-dlp/issues/12723)) by [subrat-lima](https://github.com/subrat-lima)
- **9now.com.au**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/9d5e6de2e7a47226d1f72c713ad45c88ba01db68) ([#12702](https://github.com/yt-dlp/yt-dlp/issues/12702)) by [bashonly](https://github.com/bashonly)
- **chzzk**: video: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/e2dfccaf808b406d5bcb7dd04ae9ce420752dd6f) ([#12692](https://github.com/yt-dlp/yt-dlp/issues/12692)) by [bashonly](https://github.com/bashonly), [dirkf](https://github.com/dirkf)
- **deezer**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/be5af3f9e91747768c2b41157851bfbe14c663f7) ([#12704](https://github.com/yt-dlp/yt-dlp/issues/12704)) by [seproDev](https://github.com/seproDev)
- **generic**: [Fix MPD base URL parsing](https://github.com/yt-dlp/yt-dlp/commit/5086d4aed6aeb3908c62f49e2d8f74cc0cb05110) ([#12718](https://github.com/yt-dlp/yt-dlp/issues/12718)) by [fireattack](https://github.com/fireattack)
- **streaks**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/801afeac91f97dc0b58cd39cc7e8c50f619dc4e1) ([#12679](https://github.com/yt-dlp/yt-dlp/issues/12679)) by [doe1080](https://github.com/doe1080)
- **tver**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/66e0bab814e4a52ef3e12d81123ad992a29df50e) ([#12659](https://github.com/yt-dlp/yt-dlp/issues/12659)) by [arabcoders](https://github.com/arabcoders), [bashonly](https://github.com/bashonly)
- **viki**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/fe4f14b8369038e7c58f7de546d76de1ce3a91ce) ([#12703](https://github.com/yt-dlp/yt-dlp/issues/12703)) by [seproDev](https://github.com/seproDev)
- **vrsquare**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/b7fbb5a0a16a8e8d3e29c29e26ebed677d0d6ea3) ([#12515](https://github.com/yt-dlp/yt-dlp/issues/12515)) by [doe1080](https://github.com/doe1080)
- **youtube**
- [Fix PhantomJS nsig fallback](https://github.com/yt-dlp/yt-dlp/commit/4054a2b623bd1e277b49d2e9abc3d112a4b1c7be) ([#12728](https://github.com/yt-dlp/yt-dlp/issues/12728)) by [bashonly](https://github.com/bashonly)
- [Fix signature and nsig extraction for player `363db69b`](https://github.com/yt-dlp/yt-dlp/commit/b9c979461b244713bf42691a5bc02834e2ba4b2c) ([#12725](https://github.com/yt-dlp/yt-dlp/issues/12725)) by [bashonly](https://github.com/bashonly)
#### Networking changes
- **Request Handler**: curl_cffi: [Support `curl_cffi` 0.10.x](https://github.com/yt-dlp/yt-dlp/commit/9bf23902ceb948b9685ce1dab575491571720fc6) ([#12670](https://github.com/yt-dlp/yt-dlp/issues/12670)) by [Grub4K](https://github.com/Grub4K)
#### Misc. changes
- **cleanup**: Miscellaneous: [9dde546](https://github.com/yt-dlp/yt-dlp/commit/9dde546e7ee3e1515d88ee3af08b099351455dc0) by [seproDev](https://github.com/seproDev)
### 2025.03.21
#### Core changes
- [Fix external downloader availability when using `--ffmpeg-location`](https://github.com/yt-dlp/yt-dlp/commit/9f77e04c76e36e1cbbf49bc9eb385fa6ef804b67) ([#12318](https://github.com/yt-dlp/yt-dlp/issues/12318)) by [Kenshin9977](https://github.com/Kenshin9977)
- [Load plugins on demand](https://github.com/yt-dlp/yt-dlp/commit/4445f37a7a66b248dbd8376c43137e6e441f138e) ([#11305](https://github.com/yt-dlp/yt-dlp/issues/11305)) by [coletdjnz](https://github.com/coletdjnz), [Grub4K](https://github.com/Grub4K), [pukkandan](https://github.com/pukkandan) (With fixes in [c034d65](https://github.com/yt-dlp/yt-dlp/commit/c034d655487be668222ef9476a16f374584e49a7))
- [Support emitting ConEmu progress codes](https://github.com/yt-dlp/yt-dlp/commit/f7a1f2d8132967a62b0f6d5665c6d2dde2d42c09) ([#10649](https://github.com/yt-dlp/yt-dlp/issues/10649)) by [Grub4K](https://github.com/Grub4K)
#### Extractor changes
- **azmedien**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/26a502fc727d0e91b2db6bf4a112823bcc672e85) ([#12375](https://github.com/yt-dlp/yt-dlp/issues/12375)) by [goggle](https://github.com/goggle)
- **bilibiliplaylist**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/f5fb2229e66cf59d5bf16065bc041b42a28354a0) ([#12690](https://github.com/yt-dlp/yt-dlp/issues/12690)) by [bashonly](https://github.com/bashonly)
- **bunnycdn**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/3a1583ca75fb523cbad0e5e174387ea7b477d175) ([#11586](https://github.com/yt-dlp/yt-dlp/issues/11586)) by [Grub4K](https://github.com/Grub4K), [seproDev](https://github.com/seproDev)
- **canalsurmas**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/01a8be4c23f186329d85f9c78db34a55f3294ac5) ([#12497](https://github.com/yt-dlp/yt-dlp/issues/12497)) by [Arc8ne](https://github.com/Arc8ne)
- **cda**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/be0d819e1103195043f6743650781f0d4d343f6d) ([#12552](https://github.com/yt-dlp/yt-dlp/issues/12552)) by [rysson](https://github.com/rysson)
- **cultureunplugged**: [Extend `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/3042afb5fe342d3a00de76704cd7de611acc350e) ([#12486](https://github.com/yt-dlp/yt-dlp/issues/12486)) by [seproDev](https://github.com/seproDev)
- **dailymotion**: [Improve embed detection](https://github.com/yt-dlp/yt-dlp/commit/ad60137c141efa5023fbc0ac8579eaefe8b3d8cc) ([#12464](https://github.com/yt-dlp/yt-dlp/issues/12464)) by [seproDev](https://github.com/seproDev)
- **gem.cbc.ca**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/eb1417786a3027b1e7290ec37ef6aaece50ebed0) ([#12414](https://github.com/yt-dlp/yt-dlp/issues/12414)) by [bashonly](https://github.com/bashonly)
- **globo**: [Fix subtitles extraction](https://github.com/yt-dlp/yt-dlp/commit/0e1697232fcbba7551f983fd1ba93bb445cbb08b) ([#12270](https://github.com/yt-dlp/yt-dlp/issues/12270)) by [pedro](https://github.com/pedro)
- **instagram**
- [Add `app_id` extractor-arg](https://github.com/yt-dlp/yt-dlp/commit/a90641c8363fa0c10800b36eb6b01ee22d3a9409) ([#12359](https://github.com/yt-dlp/yt-dlp/issues/12359)) by [chrisellsworth](https://github.com/chrisellsworth)
- [Fix extraction of older private posts](https://github.com/yt-dlp/yt-dlp/commit/a59abe0636dc49b22a67246afe35613571b86f05) ([#12451](https://github.com/yt-dlp/yt-dlp/issues/12451)) by [bashonly](https://github.com/bashonly)
- [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/480125560a3b9972d29ae0da850aba8109e6bd41) ([#12410](https://github.com/yt-dlp/yt-dlp/issues/12410)) by [bashonly](https://github.com/bashonly)
- story: [Support `--no-playlist`](https://github.com/yt-dlp/yt-dlp/commit/65c3c58c0a67463a150920203cec929045c95a24) ([#12397](https://github.com/yt-dlp/yt-dlp/issues/12397)) by [fireattack](https://github.com/fireattack)
- **jamendo**: [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/89a68c4857ddbaf937ff22f12648baaf6b5af840) ([#12622](https://github.com/yt-dlp/yt-dlp/issues/12622)) by [bashonly](https://github.com/bashonly), [JChris246](https://github.com/JChris246)
- **ketnet**: [Remove extractor](https://github.com/yt-dlp/yt-dlp/commit/bbada3ec0779422cde34f1ce3dcf595da463b493) ([#12628](https://github.com/yt-dlp/yt-dlp/issues/12628)) by [MichaelDeBoey](https://github.com/MichaelDeBoey)
- **lbry**
- [Make m3u8 format extraction non-fatal](https://github.com/yt-dlp/yt-dlp/commit/9807181cfbf87bfa732f415c30412bdbd77cbf81) ([#12463](https://github.com/yt-dlp/yt-dlp/issues/12463)) by [bashonly](https://github.com/bashonly)
- [Raise appropriate error for non-media files](https://github.com/yt-dlp/yt-dlp/commit/7126b472601814b7fd8c9de02069e8fff1764891) ([#12462](https://github.com/yt-dlp/yt-dlp/issues/12462)) by [bashonly](https://github.com/bashonly)
- **loco**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/983095485c731240aae27c950cb8c24a50827b56) ([#12667](https://github.com/yt-dlp/yt-dlp/issues/12667)) by [DTrombett](https://github.com/DTrombett)
- **magellantv**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/172d5fcd778bf2605db7647ebc56b29ed18d24ac) ([#12505](https://github.com/yt-dlp/yt-dlp/issues/12505)) by [seproDev](https://github.com/seproDev)
- **mitele**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7223d29569a48a35ad132a508c115973866838d3) ([#12689](https://github.com/yt-dlp/yt-dlp/issues/12689)) by [bashonly](https://github.com/bashonly)
- **msn**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/4815dac131d42c51e12c1d05232db0bbbf607329) ([#12513](https://github.com/yt-dlp/yt-dlp/issues/12513)) by [seproDev](https://github.com/seproDev), [thedenv](https://github.com/thedenv)
- **n1**: [Fix extraction of newer articles](https://github.com/yt-dlp/yt-dlp/commit/9d70abe4de401175cbbaaa36017806f16b2df9af) ([#12514](https://github.com/yt-dlp/yt-dlp/issues/12514)) by [u-spec-png](https://github.com/u-spec-png)
- **nbcstations**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/ebac65aa9e0bf9a97c24d00f7977900d2577364b) ([#12534](https://github.com/yt-dlp/yt-dlp/issues/12534)) by [refack](https://github.com/refack)
- **niconico**
- [Fix format sorting](https://github.com/yt-dlp/yt-dlp/commit/7508e34f203e97389f1d04db92140b13401dd724) ([#12442](https://github.com/yt-dlp/yt-dlp/issues/12442)) by [xpadev-net](https://github.com/xpadev-net)
- live: [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/c2e6e1d5f77f3b720a6266f2869eb750d20e5dc1) ([#12419](https://github.com/yt-dlp/yt-dlp/issues/12419)) by [bashonly](https://github.com/bashonly)
- **openrec**: [Fix `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/17504f253564cfad86244de2b6346d07d2300ca5) ([#12608](https://github.com/yt-dlp/yt-dlp/issues/12608)) by [fireattack](https://github.com/fireattack)
- **pinterest**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/bd0a66816934de70312eea1e71c59c13b401dc3a) ([#12538](https://github.com/yt-dlp/yt-dlp/issues/12538)) by [mikf](https://github.com/mikf)
- **playsuisse**: [Fix login support](https://github.com/yt-dlp/yt-dlp/commit/6933f5670cea9c3e2fb16c1caa1eda54d13122c5) ([#12444](https://github.com/yt-dlp/yt-dlp/issues/12444)) by [bashonly](https://github.com/bashonly)
- **reddit**: [Truncate title](https://github.com/yt-dlp/yt-dlp/commit/d9a53cc1e6fd912daf500ca4f19e9ca88994dbf9) ([#12567](https://github.com/yt-dlp/yt-dlp/issues/12567)) by [seproDev](https://github.com/seproDev)
- **rtp**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/8eb9c1bf3b9908cca22ef043602aa24fb9f352c6) ([#11638](https://github.com/yt-dlp/yt-dlp/issues/11638)) by [pferreir](https://github.com/pferreir), [red-acid](https://github.com/red-acid), [seproDev](https://github.com/seproDev), [somini](https://github.com/somini), [vallovic](https://github.com/vallovic)
- **softwhiteunderbelly**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/652827d5a076c9483c36654ad2cf3fe46219baf4) ([#12281](https://github.com/yt-dlp/yt-dlp/issues/12281)) by [benfaerber](https://github.com/benfaerber)
- **soop**: [Fix timestamp extraction](https://github.com/yt-dlp/yt-dlp/commit/8305df00012ff8138a6ff95279d06b54ac607f63) ([#12609](https://github.com/yt-dlp/yt-dlp/issues/12609)) by [msikma](https://github.com/msikma)
- **soundcloud**
- [Extract tags](https://github.com/yt-dlp/yt-dlp/commit/9deed13d7cce6d3647379e50589c92de89227509) ([#12420](https://github.com/yt-dlp/yt-dlp/issues/12420)) by [bashonly](https://github.com/bashonly)
- [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/6deeda5c11f34f613724fa0627879f0d607ba1b4) ([#12447](https://github.com/yt-dlp/yt-dlp/issues/12447)) by [bashonly](https://github.com/bashonly)
- **tiktok**
- [Improve error handling](https://github.com/yt-dlp/yt-dlp/commit/99ea2978757a431eeb2a265b3395ccbe4ce202cf) ([#12445](https://github.com/yt-dlp/yt-dlp/issues/12445)) by [bashonly](https://github.com/bashonly)
- [Truncate title](https://github.com/yt-dlp/yt-dlp/commit/83b119dadb0f267f1fb66bf7ed74c097349de79e) ([#12566](https://github.com/yt-dlp/yt-dlp/issues/12566)) by [seproDev](https://github.com/seproDev)
- **tv8.it**: [Add live and playlist extractors](https://github.com/yt-dlp/yt-dlp/commit/2ee3a0aff9be2be3bea60640d3d8a0febaf0acb6) ([#12569](https://github.com/yt-dlp/yt-dlp/issues/12569)) by [DTrombett](https://github.com/DTrombett)
- **tvw**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/42b7440963866e31ff84a5b89030d1c596fa2e6e) ([#12271](https://github.com/yt-dlp/yt-dlp/issues/12271)) by [fries1234](https://github.com/fries1234)
- **twitter**
- [Fix syndication token generation](https://github.com/yt-dlp/yt-dlp/commit/b8b47547049f5ebc3dd680fc7de70ed0ca9c0d70) ([#12537](https://github.com/yt-dlp/yt-dlp/issues/12537)) by [bashonly](https://github.com/bashonly)
- [Truncate title](https://github.com/yt-dlp/yt-dlp/commit/06f6de78db2eceeabd062ab1a3023e0ff9d4df53) ([#12560](https://github.com/yt-dlp/yt-dlp/issues/12560)) by [seproDev](https://github.com/seproDev)
- **vk**: [Improve metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/05c8023a27dd37c49163c0498bf98e3e3c1cb4b9) ([#12510](https://github.com/yt-dlp/yt-dlp/issues/12510)) by [seproDev](https://github.com/seproDev)
- **vrtmax**: [Rework extractor](https://github.com/yt-dlp/yt-dlp/commit/df9ebeec00d658693252978d1ffb885e67aa6ab6) ([#12479](https://github.com/yt-dlp/yt-dlp/issues/12479)) by [bergoid](https://github.com/bergoid), [MichaelDeBoey](https://github.com/MichaelDeBoey), [seproDev](https://github.com/seproDev)
- **weibo**: [Support playlists](https://github.com/yt-dlp/yt-dlp/commit/0bb39788626002a8a67e925580227952c563c8b9) ([#12284](https://github.com/yt-dlp/yt-dlp/issues/12284)) by [4ft35t](https://github.com/4ft35t)
- **wsj**: [Support opinion URLs and impersonation](https://github.com/yt-dlp/yt-dlp/commit/7f3006eb0c0659982bb956d71b0bc806bcb0a5f2) ([#12431](https://github.com/yt-dlp/yt-dlp/issues/12431)) by [refack](https://github.com/refack)
- **youtube**
- [Fix nsig and signature extraction for player `643afba4`](https://github.com/yt-dlp/yt-dlp/commit/9b868518a15599f3d7ef5a1c730dda164c30da9b) ([#12684](https://github.com/yt-dlp/yt-dlp/issues/12684)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
- [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/3380febe9984c21c79c3147c1d390a4cf339bc4c) ([#12603](https://github.com/yt-dlp/yt-dlp/issues/12603)) by [seproDev](https://github.com/seproDev)
- [Split into package](https://github.com/yt-dlp/yt-dlp/commit/4432a9390c79253ac830702b226d2e558b636725) ([#12557](https://github.com/yt-dlp/yt-dlp/issues/12557)) by [coletdjnz](https://github.com/coletdjnz)
- [Warn on DRM formats](https://github.com/yt-dlp/yt-dlp/commit/e67d786c7cc87bd449d22e0ddef08306891c1173) ([#12593](https://github.com/yt-dlp/yt-dlp/issues/12593)) by [coletdjnz](https://github.com/coletdjnz)
- [Warn on missing formats due to SSAP](https://github.com/yt-dlp/yt-dlp/commit/79ec2fdff75c8c1bb89b550266849ad4dec48dd3) ([#12483](https://github.com/yt-dlp/yt-dlp/issues/12483)) by [coletdjnz](https://github.com/coletdjnz)
#### Networking changes
- [Add `keep_header_casing` extension](https://github.com/yt-dlp/yt-dlp/commit/7d18fed8f1983fe6de4ddc810dfb2761ba5744ac) ([#11652](https://github.com/yt-dlp/yt-dlp/issues/11652)) by [coletdjnz](https://github.com/coletdjnz), [Grub4K](https://github.com/Grub4K)
- [Always add unsupported suffix on version mismatch](https://github.com/yt-dlp/yt-dlp/commit/95f8df2f796d0048119615200758199aedcd7cf4) ([#12626](https://github.com/yt-dlp/yt-dlp/issues/12626)) by [Grub4K](https://github.com/Grub4K)
#### Misc. changes
- **cleanup**: Miscellaneous: [f36e4b6](https://github.com/yt-dlp/yt-dlp/commit/f36e4b6e65cb8403791aae2f520697115cb88dec) by [dirkf](https://github.com/dirkf), [gamer191](https://github.com/gamer191), [Grub4K](https://github.com/Grub4K), [seproDev](https://github.com/seproDev)
- **test**: [Show all differences for `expect_value` and `expect_dict`](https://github.com/yt-dlp/yt-dlp/commit/a3e0c7d3b267abdf3933b709704a28d43bb46503) ([#12334](https://github.com/yt-dlp/yt-dlp/issues/12334)) by [Grub4K](https://github.com/Grub4K)
### 2025.02.19
#### Core changes

View File

@ -337,10 +337,11 @@ If you fork the project on GitHub, you can run your fork's [build workflow](.git
--plugin-dirs PATH Path to an additional directory to search
for plugins. This option can be used
multiple times to add multiple directories.
Note that this currently only works for
extractor plugins; postprocessor plugins can
only be loaded from the default plugin
directories
Use "default" to search the default plugin
directories (default)
--no-plugin-dirs Clear plugin directories to search,
including defaults and those provided by
previous --plugin-dirs
--flat-playlist Do not extract a playlist's URL result
entries; some entry metadata may be missing
and downloading may be bypassed
@ -1768,7 +1769,7 @@ The following extractors use this feature:
#### youtube
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
* `player_client`: Clients to extract video data from. The currently available clients are `web`, `web_safari`, `web_embedded`, `web_music`, `web_creator`, `mweb`, `ios`, `android`, `android_vr`, `tv` and `tv_embedded`. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as `web_creator`, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
@ -1781,6 +1782,7 @@ The following extractors use this feature:
* `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage`
* `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID)
* `po_token`: Proof of Origin (PO) Token(s) to use. Comma seperated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player=XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request)
* `player_js_variant`: The player javascript variant to use for signature and nsig deciphering. The known variants are: `main`, `tce`, `tv`, `tv_es6`, `phone`, `tablet`. Only `main` is recommended as a possible workaround; the others are for debugging purposes. The default is to use what is prescribed by the site, and can be selected with `actual`
#### youtubetab (YouTube playlists, channels, feeds, etc.)
* `skip`: One or more of `webpage` (skip initial webpage download), `authcheck` (allow the download of playlists requiring authentication when no initial webpage is downloaded. This may cause unwanted behavior, see [#1122](https://github.com/yt-dlp/yt-dlp/pull/1122) for more details)
@ -1811,6 +1813,9 @@ The following extractors use this feature:
* `vcodec`: vcodec to ignore - one or more of `h264`, `h265`, `dvh265`
* `dr`: dynamic range to ignore - one or more of `sdr`, `hdr10`, `dv`
#### instagram
* `app_id`: The value of the `X-IG-App-ID` header used for API requests. Default is the web app ID, `936619743392459`
#### niconicochannelplus
* `max_comments`: Maximum number of comments to extract - default is `120`
@ -1862,6 +1867,9 @@ The following extractors use this feature:
#### sonylivseries
* `sort_order`: Episode sort order for series extraction - one of `asc` (ascending, oldest first) or `desc` (descending, newest first). Default is `asc`
#### tver
* `backend`: Backend API to use for extraction - one of `streaks` (default) or `brightcove` (deprecated)
**Note**: These options may be changed/removed in the future without concern for backward compatibility
<!-- MANPAGE: MOVE "INSTALLATION" SECTION HERE -->
@ -2211,7 +2219,7 @@ Some of yt-dlp's default options are different from that of youtube-dl and youtu
* Live chats (if available) are considered as subtitles. Use `--sub-langs all,-live_chat` to download all subtitles except live chat. You can also use `--compat-options no-live-chat` to prevent any live chat/danmaku from downloading
* YouTube channel URLs download all uploads of the channel. To download only the videos in a specific tab, pass the tab's URL. If the channel does not show the requested tab, an error will be raised. Also, `/live` URLs raise an error if there are no live videos instead of silently downloading the entire channel. You may use `--compat-options no-youtube-channel-redirect` to revert all these redirections
* Unavailable videos are also listed for YouTube playlists. Use `--compat-options no-youtube-unavailable-videos` to remove this
* The upload dates extracted from YouTube are in UTC [when available](https://github.com/yt-dlp/yt-dlp/blob/89e4d86171c7b7c997c77d4714542e0383bf0db0/yt_dlp/extractor/youtube.py#L3898-L3900). Use `--compat-options no-youtube-prefer-utc-upload-date` to prefer the non-UTC upload date.
* The upload dates extracted from YouTube are in UTC.
* If `ffmpeg` is used as the downloader, the downloading and merging of formats happen in a single step when possible. Use `--compat-options no-direct-merge` to revert this
* Thumbnail embedding in `mp4` is done with mutagen if possible. Use `--compat-options embed-thumbnail-atomicparsley` to force the use of AtomicParsley instead
* Some internal metadata such as filenames are removed by default from the infojson. Use `--no-clean-infojson` or `--compat-options no-clean-infojson` to revert this
@ -2230,9 +2238,10 @@ For ease of use, a few more compat options are available:
* `--compat-options all`: Use all compat options (**Do NOT use this!**)
* `--compat-options youtube-dl`: Same as `--compat-options all,-multistreams,-playlist-match-filter,-manifest-filesize-approx,-allow-unsafe-ext,-prefer-vp9-sort`
* `--compat-options youtube-dlc`: Same as `--compat-options all,-no-live-chat,-no-youtube-channel-redirect,-playlist-match-filter,-manifest-filesize-approx,-allow-unsafe-ext,-prefer-vp9-sort`
* `--compat-options 2021`: Same as `--compat-options 2022,no-certifi,filename-sanitization,no-youtube-prefer-utc-upload-date`
* `--compat-options 2021`: Same as `--compat-options 2022,no-certifi,filename-sanitization`
* `--compat-options 2022`: Same as `--compat-options 2023,playlist-match-filter,no-external-downloader-progress,prefer-legacy-http-handler,manifest-filesize-approx`
* `--compat-options 2023`: Same as `--compat-options prefer-vp9-sort`. Use this to enable all future compat options
* `--compat-options 2023`: Same as `--compat-options 2024,prefer-vp9-sort`
* `--compat-options 2024`: Currently does nothing. Use this to enable all future compat options
The following compat options restore vulnerable behavior from before security patches:

View File

@ -10,6 +10,9 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from inspect import getsource
from devscripts.utils import get_filename_args, read_file, write_file
from yt_dlp.extractor import import_extractors
from yt_dlp.extractor.common import InfoExtractor, SearchInfoExtractor
from yt_dlp.globals import extractors
NO_ATTR = object()
STATIC_CLASS_PROPERTIES = [
@ -38,8 +41,7 @@ def main():
lazy_extractors_filename = get_filename_args(default_outfile='yt_dlp/extractor/lazy_extractors.py')
from yt_dlp.extractor.extractors import _ALL_CLASSES
from yt_dlp.extractor.common import InfoExtractor, SearchInfoExtractor
import_extractors()
DummyInfoExtractor = type('InfoExtractor', (InfoExtractor,), {'IE_NAME': NO_ATTR})
module_src = '\n'.join((
@ -47,7 +49,7 @@ def main():
' _module = None',
*extra_ie_code(DummyInfoExtractor),
'\nclass LazyLoadSearchExtractor(LazyLoadExtractor):\n pass\n',
*build_ies(_ALL_CLASSES, (InfoExtractor, SearchInfoExtractor), DummyInfoExtractor),
*build_ies(list(extractors.value.values()), (InfoExtractor, SearchInfoExtractor), DummyInfoExtractor),
))
write_file(lazy_extractors_filename, f'{module_src}\n')
@ -73,7 +75,7 @@ def build_ies(ies, bases, attr_base):
if ie in ies:
names.append(ie.__name__)
yield f'\n_ALL_CLASSES = [{", ".join(names)}]'
yield '\n_CLASS_LOOKUP = {%s}' % ', '.join(f'{name!r}: {name}' for name in names)
def sort_ies(ies, ignored_bases):

View File

@ -55,8 +55,7 @@ default = [
"websockets>=13.0",
]
curl-cffi = [
"curl-cffi==0.5.10; os_name=='nt' and implementation_name=='cpython'",
"curl-cffi>=0.5.10,!=0.6.*,<0.7.2; os_name!='nt' and implementation_name=='cpython'",
"curl-cffi>=0.5.10,!=0.6.*,!=0.7.*,!=0.8.*,!=0.9.*,<0.11; implementation_name=='cpython'",
]
secretstorage = [
"cffi",
@ -76,7 +75,7 @@ dev = [
]
static-analysis = [
"autopep8~=2.0",
"ruff~=0.9.0",
"ruff~=0.11.0",
]
test = [
"pytest~=8.1",
@ -384,9 +383,14 @@ select = [
"W391",
"W504",
]
exclude = "*/extractor/lazy_extractors.py,*venv*,*/test/testdata/sigs/player-*.js,.idea,.vscode"
[tool.pytest.ini_options]
addopts = "-ra -v --strict-markers"
addopts = [
"-ra", # summary: all except passed
"--verbose",
"--strict-markers",
]
markers = [
"download",
]

View File

@ -7,6 +7,7 @@ The only reliable way to check if a site is supported is to try it.
- **17live**
- **17live:clip**
- **17live:vod**
- **1News**: 1news.co.nz article videos
- **1tv**: Первый канал
- **20min**
@ -200,7 +201,7 @@ The only reliable way to check if a site is supported is to try it.
- **blogger.com**
- **Bloomberg**
- **Bluesky**
- **BokeCC**
- **BokeCC**: CC视频
- **BongaCams**
- **Boosty**
- **BostonGlobe**
@ -224,6 +225,7 @@ The only reliable way to check if a site is supported is to try it.
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **Bundesliga**
- **Bundestag**
- **BunnyCdn**
- **BusinessInsider**
- **BuzzFeed**
- **BYUtv**: (**Currently broken**)
@ -242,6 +244,7 @@ The only reliable way to check if a site is supported is to try it.
- **CanalAlpha**
- **canalc2.tv**
- **Canalplus**: mycanal.fr and piwiplus.fr
- **Canalsurmas**
- **CaracolTvPlay**: [*caracoltv-play*](## "netrc machine")
- **CartoonNetwork**
- **cbc.ca**
@ -345,8 +348,6 @@ The only reliable way to check if a site is supported is to try it.
- **daystar:clip**
- **DBTV**
- **DctpTv**
- **DeezerAlbum**
- **DeezerPlaylist**
- **democracynow**
- **DestinationAmerica**
- **DetikEmbed**
@ -471,6 +472,7 @@ The only reliable way to check if a site is supported is to try it.
- **FoxNewsVideo**
- **FoxSports**
- **fptplay**: fptplay.vn
- **FrancaisFacile**
- **FranceCulture**
- **FranceInter**
- **francetv**
@ -609,10 +611,10 @@ The only reliable way to check if a site is supported is to try it.
- **Inc**
- **IndavideoEmbed**
- **InfoQ**
- **Instagram**: [*instagram*](## "netrc machine")
- **instagram:story**: [*instagram*](## "netrc machine")
- **instagram:tag**: [*instagram*](## "netrc machine") Instagram hashtag search URLs
- **instagram:user**: [*instagram*](## "netrc machine") Instagram user profile (**Currently broken**)
- **Instagram**
- **instagram:story**
- **instagram:tag**: Instagram hashtag search URLs
- **instagram:user**: Instagram user profile (**Currently broken**)
- **InstagramIOS**: IOS instagram:// URL
- **Internazionale**
- **InternetVideoArchive**
@ -661,7 +663,6 @@ The only reliable way to check if a site is supported is to try it.
- **KelbyOne**: (**Currently broken**)
- **Kenh14Playlist**
- **Kenh14Video**
- **Ketnet**
- **khanacademy**
- **khanacademy:unit**
- **kick:clips**
@ -733,6 +734,7 @@ The only reliable way to check if a site is supported is to try it.
- **Livestreamfails**
- **Lnk**
- **loc**: Library of Congress
- **Loco**
- **loom**
- **loom:folder**
- **LoveHomePorn**
@ -827,11 +829,11 @@ The only reliable way to check if a site is supported is to try it.
- **MotherlessUploader**
- **Motorsport**: motorsport.com (**Currently broken**)
- **MovieFap**
- **Moviepilot**
- **moviepilot**: Moviepilot trailer
- **MoviewPlay**
- **Moviezine**
- **MovingImage**
- **MSN**: (**Currently broken**)
- **MSN**
- **mtg**: MTG services
- **mtv**
- **mtv.de**: (**Currently broken**)
@ -1250,7 +1252,6 @@ The only reliable way to check if a site is supported is to try it.
- **rtve.es:infantil**: RTVE infantil
- **rtve.es:live**: RTVE.es live streams
- **rtve.es:television**
- **RTVS**
- **rtvslo.si**
- **rtvslo.si:show**
- **RudoVideo**
@ -1305,8 +1306,8 @@ The only reliable way to check if a site is supported is to try it.
- **sejm**
- **Sen**
- **SenalColombiaLive**: (**Currently broken**)
- **SenateGov**
- **SenateISVP**
- **senate.gov**
- **senate.gov:isvp**
- **SendtoNews**: (**Currently broken**)
- **Servus**
- **Sexu**: (**Currently broken**)
@ -1342,6 +1343,7 @@ The only reliable way to check if a site is supported is to try it.
- **Smotrim**
- **SnapchatSpotlight**
- **Snotr**
- **SoftWhiteUnderbelly**: [*softwhiteunderbelly*](## "netrc machine")
- **Sohu**
- **SohuV**
- **SonyLIV**: [*sonyliv*](## "netrc machine")
@ -1398,12 +1400,14 @@ The only reliable way to check if a site is supported is to try it.
- **StoryFire**
- **StoryFireSeries**
- **StoryFireUser**
- **Streaks**
- **Streamable**
- **StreamCZ**
- **StreetVoice**
- **StretchInternet**
- **Stripchat**
- **stv:player**
- **stvr**: Slovak Television and Radio (formerly RTVS)
- **Subsplash**
- **subsplash:playlist**
- **Substack**
@ -1536,6 +1540,8 @@ The only reliable way to check if a site is supported is to try it.
- **tv5unis**
- **tv5unis:video**
- **tv8.it**
- **tv8.it:live**: TV8 Live
- **tv8.it:playlist**: TV8 Playlist
- **TVANouvelles**
- **TVANouvellesArticle**
- **tvaplus**: TVA+
@ -1556,6 +1562,7 @@ The only reliable way to check if a site is supported is to try it.
- **tvp:vod:series**
- **TVPlayer**
- **TVPlayHome**
- **Tvw**
- **Tweakers**
- **TwitCasting**
- **TwitCastingLive**
@ -1637,8 +1644,6 @@ The only reliable way to check if a site is supported is to try it.
- **viewlift**
- **viewlift:embed**
- **Viidea**
- **viki**: [*viki*](## "netrc machine")
- **viki:channel**: [*viki*](## "netrc machine")
- **vimeo**: [*vimeo*](## "netrc machine")
- **vimeo:album**: [*vimeo*](## "netrc machine")
- **vimeo:channel**: [*vimeo*](## "netrc machine")
@ -1676,8 +1681,12 @@ The only reliable way to check if a site is supported is to try it.
- **vpro**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **vqq:series**
- **vqq:video**
- **vrsquare**: VR SQUARE
- **vrsquare:channel**
- **vrsquare:search**
- **vrsquare:section**
- **VRT**: VRT NWS, Flanders News, Flandern Info and Sporza
- **VrtNU**: [*vrtnu*](## "netrc machine") VRT MAX
- **vrtmax**: [*vrtnu*](## "netrc machine") VRT MAX (formerly VRT NU)
- **VTM**: (**Currently broken**)
- **VTV**
- **VTVGo**

View File

@ -101,87 +101,109 @@ def getwebpagetestcases():
md5 = lambda s: hashlib.md5(s.encode()).hexdigest()
def expect_value(self, got, expected, field):
if isinstance(expected, str) and expected.startswith('re:'):
match_str = expected[len('re:'):]
match_rex = re.compile(match_str)
def _iter_differences(got, expected, field):
if isinstance(expected, str):
op, _, val = expected.partition(':')
if op in ('mincount', 'maxcount', 'count'):
if not isinstance(got, (list, dict)):
yield field, f'expected either {list.__name__} or {dict.__name__}, got {type(got).__name__}'
return
self.assertTrue(
isinstance(got, str),
f'Expected a {str.__name__} object, but got {type(got).__name__} for field {field}')
self.assertTrue(
match_rex.match(got),
f'field {field} (value: {got!r}) should match {match_str!r}')
elif isinstance(expected, str) and expected.startswith('startswith:'):
start_str = expected[len('startswith:'):]
self.assertTrue(
isinstance(got, str),
f'Expected a {str.__name__} object, but got {type(got).__name__} for field {field}')
self.assertTrue(
got.startswith(start_str),
f'field {field} (value: {got!r}) should start with {start_str!r}')
elif isinstance(expected, str) and expected.startswith('contains:'):
contains_str = expected[len('contains:'):]
self.assertTrue(
isinstance(got, str),
f'Expected a {str.__name__} object, but got {type(got).__name__} for field {field}')
self.assertTrue(
contains_str in got,
f'field {field} (value: {got!r}) should contain {contains_str!r}')
elif isinstance(expected, type):
self.assertTrue(
isinstance(got, expected),
f'Expected type {expected!r} for field {field}, but got value {got!r} of type {type(got)!r}')
elif isinstance(expected, dict) and isinstance(got, dict):
expect_dict(self, got, expected)
elif isinstance(expected, list) and isinstance(got, list):
self.assertEqual(
len(expected), len(got),
f'Expect a list of length {len(expected)}, but got a list of length {len(got)} for field {field}')
for index, (item_got, item_expected) in enumerate(zip(got, expected)):
type_got = type(item_got)
type_expected = type(item_expected)
self.assertEqual(
type_expected, type_got,
f'Type mismatch for list item at index {index} for field {field}, '
f'expected {type_expected!r}, got {type_got!r}')
expect_value(self, item_got, item_expected, field)
else:
if isinstance(expected, str) and expected.startswith('md5:'):
self.assertTrue(
isinstance(got, str),
f'Expected field {field} to be a unicode object, but got value {got!r} of type {type(got)!r}')
got = 'md5:' + md5(got)
elif isinstance(expected, str) and re.match(r'^(?:min|max)?count:\d+', expected):
self.assertTrue(
isinstance(got, (list, dict)),
f'Expected field {field} to be a list or a dict, but it is of type {type(got).__name__}')
op, _, expected_num = expected.partition(':')
expected_num = int(expected_num)
expected_num = int(val)
got_num = len(got)
if op == 'mincount':
assert_func = assertGreaterEqual
msg_tmpl = 'Expected %d items in field %s, but only got %d'
elif op == 'maxcount':
assert_func = assertLessEqual
msg_tmpl = 'Expected maximum %d items in field %s, but got %d'
elif op == 'count':
assert_func = assertEqual
msg_tmpl = 'Expected exactly %d items in field %s, but got %d'
else:
assert False
assert_func(
self, len(got), expected_num,
msg_tmpl % (expected_num, field, len(got)))
if got_num < expected_num:
yield field, f'expected at least {val} items, got {got_num}'
return
if op == 'maxcount':
if got_num > expected_num:
yield field, f'expected at most {val} items, got {got_num}'
return
assert op == 'count'
if got_num != expected_num:
yield field, f'expected exactly {val} items, got {got_num}'
return
self.assertEqual(
expected, got,
f'Invalid value for field {field}, expected {expected!r}, got {got!r}')
if not isinstance(got, str):
yield field, f'expected {str.__name__}, got {type(got).__name__}'
return
if op == 're':
if not re.match(val, got):
yield field, f'should match {val!r}, got {got!r}'
return
if op == 'startswith':
if not val.startswith(got):
yield field, f'should start with {val!r}, got {got!r}'
return
if op == 'contains':
if not val.startswith(got):
yield field, f'should contain {val!r}, got {got!r}'
return
if op == 'md5':
hash_val = md5(got)
if hash_val != val:
yield field, f'expected hash {val}, got {hash_val}'
return
if got != expected:
yield field, f'expected {expected!r}, got {got!r}'
return
if isinstance(expected, dict) and isinstance(got, dict):
for key, expected_val in expected.items():
if key not in got:
yield field, f'missing key: {key!r}'
continue
field_name = key if field is None else f'{field}.{key}'
yield from _iter_differences(got[key], expected_val, field_name)
return
if isinstance(expected, type):
if not isinstance(got, expected):
yield field, f'expected {expected.__name__}, got {type(got).__name__}'
return
if isinstance(expected, list) and isinstance(got, list):
# TODO: clever diffing algorithm lmao
if len(expected) != len(got):
yield field, f'expected length of {len(expected)}, got {len(got)}'
return
for index, (got_val, expected_val) in enumerate(zip(got, expected)):
field_name = str(index) if field is None else f'{field}.{index}'
yield from _iter_differences(got_val, expected_val, field_name)
return
if got != expected:
yield field, f'expected {expected!r}, got {got!r}'
def _expect_value(message, got, expected, field):
mismatches = list(_iter_differences(got, expected, field))
if not mismatches:
return
fields = [field for field, _ in mismatches if field is not None]
return ''.join((
message, f' ({", ".join(fields)})' if fields else '',
*(f'\n\t{field}: {message}' for field, message in mismatches)))
def expect_value(self, got, expected, field):
if message := _expect_value('values differ', got, expected, field):
self.fail(message)
def expect_dict(self, got_dict, expected_dict):
for info_field, expected in expected_dict.items():
got = got_dict.get(info_field)
expect_value(self, got, expected, info_field)
if message := _expect_value('dictionaries differ', got_dict, expected_dict, None):
self.fail(message)
def sanitize_got_info_dict(got_dict):

View File

@ -638,6 +638,7 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'img_bipbop_adv_example_fmp4',
'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8',
[{
# 60kbps (bitrate not provided in m3u8); sorted as worst because it's grouped with lowest bitrate video track
'format_id': 'aud1-English',
'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/a1/prog_index.m3u8',
'manifest_url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8',
@ -645,15 +646,9 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'ext': 'mp4',
'protocol': 'm3u8_native',
'audio_ext': 'mp4',
'source_preference': 0,
}, {
'format_id': 'aud2-English',
'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/a2/prog_index.m3u8',
'manifest_url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8',
'language': 'en',
'ext': 'mp4',
'protocol': 'm3u8_native',
'audio_ext': 'mp4',
}, {
# 192kbps (bitrate not provided in m3u8)
'format_id': 'aud3-English',
'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/a3/prog_index.m3u8',
'manifest_url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8',
@ -661,6 +656,17 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
'ext': 'mp4',
'protocol': 'm3u8_native',
'audio_ext': 'mp4',
'source_preference': 1,
}, {
# 384kbps (bitrate not provided in m3u8); sorted as best because it's grouped with the highest bitrate video track
'format_id': 'aud2-English',
'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/a2/prog_index.m3u8',
'manifest_url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8',
'language': 'en',
'ext': 'mp4',
'protocol': 'm3u8_native',
'audio_ext': 'mp4',
'source_preference': 2,
}, {
'format_id': '530',
'url': 'https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/v2/prog_index.m3u8',

View File

@ -6,6 +6,8 @@ import sys
import unittest
from unittest.mock import patch
from yt_dlp.globals import all_plugins_loaded
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
@ -1427,6 +1429,12 @@ class TestYoutubeDL(unittest.TestCase):
self.assertFalse(result.get('cookies'), msg='Cookies set in cookies field for wrong domain')
self.assertFalse(ydl.cookiejar.get_cookie_header(fmt['url']), msg='Cookies set in cookiejar for wrong domain')
def test_load_plugins_compat(self):
# Should try to reload plugins if they haven't already been loaded
all_plugins_loaded.value = False
FakeYDL().close()
assert all_plugins_loaded.value
if __name__ == '__main__':
unittest.main()

View File

@ -331,10 +331,6 @@ class TestHTTPConnectProxy:
assert proxy_info['proxy'] == server_address
assert 'Proxy-Authorization' in proxy_info['headers']
@pytest.mark.skip_handler(
'Requests',
'bug in urllib3 causes unclosed socket: https://github.com/urllib3/urllib3/issues/3374',
)
def test_http_connect_bad_auth(self, handler, ctx):
with ctx.http_server(HTTPConnectProxyHandler, username='test', password='test') as server_address:
with handler(verify=False, proxies={ctx.REQUEST_PROTO: f'http://test:bad@{server_address}'}) as rh:

View File

@ -118,6 +118,7 @@ class TestJSInterpreter(unittest.TestCase):
self._test('function f(){var x = 20; x = 30 + 1; return x;}', 31)
self._test('function f(){var x = 20; x += 30 + 1; return x;}', 51)
self._test('function f(){var x = 20; x -= 30 + 1; return x;}', -11)
self._test('function f(){var x = 2; var y = ["a", "b"]; y[x%y["length"]]="z"; return y}', ['z', 'b'])
@unittest.skip('Not implemented')
def test_comments(self):
@ -384,7 +385,7 @@ class TestJSInterpreter(unittest.TestCase):
@unittest.skip('Not implemented')
def test_packed(self):
jsi = JSInterpreter('''function f(p,a,c,k,e,d){while(c--)if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p}''')
self.assertEqual(jsi.call_function('f', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("<q />").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, .3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|')))
self.assertEqual(jsi.call_function('f', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("<q />").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, .3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|'))) # noqa: SIM905
def test_join(self):
test_input = list('test')
@ -403,6 +404,8 @@ class TestJSInterpreter(unittest.TestCase):
test_result = list('test')
tests = [
'function f(a, b){return a.split(b)}',
'function f(a, b){return a["split"](b)}',
'function f(a, b){let x = ["split"]; return a[x[0]](b)}',
'function f(a, b){return String.prototype.split.call(a, b)}',
'function f(a, b){return String.prototype.split.apply(a, [b])}',
]
@ -441,6 +444,9 @@ class TestJSInterpreter(unittest.TestCase):
self._test('function f(){return "012345678".slice(-1, 1)}', '')
self._test('function f(){return "012345678".slice(-3, -1)}', '67')
def test_splice(self):
self._test('function f(){var T = ["0", "1", "2"]; T["splice"](2, 1, "0")[0]; return T }', ['0', '1', '0'])
def test_js_number_to_string(self):
for test, radix, expected in [
(0, None, '0'),
@ -462,6 +468,16 @@ class TestJSInterpreter(unittest.TestCase):
]:
assert js_number_to_string(test, radix) == expected
def test_extract_function(self):
jsi = JSInterpreter('function a(b) { return b + 1; }')
func = jsi.extract_function('a')
self.assertEqual(func([2]), 3)
def test_extract_function_with_global_stack(self):
jsi = JSInterpreter('function c(d) { return d + e + f + g; }')
func = jsi.extract_function('c', {'e': 10}, {'f': 100, 'g': 1000})
self.assertEqual(func([1]), 1111)
if __name__ == '__main__':
unittest.main()

View File

@ -614,7 +614,6 @@ class TestHTTPRequestHandler(TestRequestHandlerBase):
rh, Request(f'http://127.0.0.1:{self.http_port}/source_address')).read().decode()
assert source_address == data
# Not supported by CurlCFFI
@pytest.mark.skip_handler('CurlCFFI', 'not supported by curl-cffi')
def test_gzip_trailing_garbage(self, handler):
with handler() as rh:
@ -720,6 +719,15 @@ class TestHTTPRequestHandler(TestRequestHandlerBase):
rh, Request(
f'http://127.0.0.1:{self.http_port}/headers', proxies={'all': 'http://10.255.255.255'})).close()
@pytest.mark.skip_handlers_if(lambda _, handler: handler not in ['Urllib', 'CurlCFFI'], 'handler does not support keep_header_casing')
def test_keep_header_casing(self, handler):
with handler() as rh:
res = validate_and_send(
rh, Request(
f'http://127.0.0.1:{self.http_port}/headers', headers={'X-test-heaDer': 'test'}, extensions={'keep_header_casing': True})).read().decode()
assert 'X-test-heaDer: test' in res
@pytest.mark.parametrize('handler', ['Urllib', 'Requests', 'CurlCFFI'], indirect=True)
class TestClientCertificate:
@ -1289,6 +1297,7 @@ class TestRequestHandlerValidation:
({'legacy_ssl': False}, False),
({'legacy_ssl': True}, False),
({'legacy_ssl': 'notabool'}, AssertionError),
({'keep_header_casing': True}, UnsupportedRequest),
]),
('Requests', 'http', [
({'cookiejar': 'notacookiejar'}, AssertionError),
@ -1299,6 +1308,9 @@ class TestRequestHandlerValidation:
({'legacy_ssl': False}, False),
({'legacy_ssl': True}, False),
({'legacy_ssl': 'notabool'}, AssertionError),
({'keep_header_casing': False}, False),
({'keep_header_casing': True}, False),
({'keep_header_casing': 'notabool'}, AssertionError),
]),
('CurlCFFI', 'http', [
({'cookiejar': 'notacookiejar'}, AssertionError),

View File

@ -10,22 +10,71 @@ TEST_DATA_DIR = Path(os.path.dirname(os.path.abspath(__file__)), 'testdata')
sys.path.append(str(TEST_DATA_DIR))
importlib.invalidate_caches()
from yt_dlp.utils import Config
from yt_dlp.plugins import PACKAGE_NAME, directories, load_plugins
from yt_dlp.plugins import (
PACKAGE_NAME,
PluginSpec,
directories,
load_plugins,
load_all_plugins,
register_plugin_spec,
)
from yt_dlp.globals import (
extractors,
postprocessors,
plugin_dirs,
plugin_ies,
plugin_pps,
all_plugins_loaded,
plugin_specs,
)
EXTRACTOR_PLUGIN_SPEC = PluginSpec(
module_name='extractor',
suffix='IE',
destination=extractors,
plugin_destination=plugin_ies,
)
POSTPROCESSOR_PLUGIN_SPEC = PluginSpec(
module_name='postprocessor',
suffix='PP',
destination=postprocessors,
plugin_destination=plugin_pps,
)
def reset_plugins():
plugin_ies.value = {}
plugin_pps.value = {}
plugin_dirs.value = ['default']
plugin_specs.value = {}
all_plugins_loaded.value = False
# Clearing override plugins is probably difficult
for module_name in tuple(sys.modules):
for plugin_type in ('extractor', 'postprocessor'):
if module_name.startswith(f'{PACKAGE_NAME}.{plugin_type}.'):
del sys.modules[module_name]
importlib.invalidate_caches()
class TestPlugins(unittest.TestCase):
TEST_PLUGIN_DIR = TEST_DATA_DIR / PACKAGE_NAME
def setUp(self):
reset_plugins()
def tearDown(self):
reset_plugins()
def test_directories_containing_plugins(self):
self.assertIn(self.TEST_PLUGIN_DIR, map(Path, directories()))
def test_extractor_classes(self):
for module_name in tuple(sys.modules):
if module_name.startswith(f'{PACKAGE_NAME}.extractor'):
del sys.modules[module_name]
plugins_ie = load_plugins('extractor', 'IE')
plugins_ie = load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertIn('NormalPluginIE', plugins_ie.keys())
@ -35,17 +84,29 @@ class TestPlugins(unittest.TestCase):
f'{PACKAGE_NAME}.extractor._ignore' in sys.modules,
'loaded module beginning with underscore')
self.assertNotIn('IgnorePluginIE', plugins_ie.keys())
self.assertNotIn('IgnorePluginIE', plugin_ies.value)
# Don't load extractors with underscore prefix
self.assertNotIn('_IgnoreUnderscorePluginIE', plugins_ie.keys())
self.assertNotIn('_IgnoreUnderscorePluginIE', plugin_ies.value)
# Don't load extractors not specified in __all__ (if supplied)
self.assertNotIn('IgnoreNotInAllPluginIE', plugins_ie.keys())
self.assertNotIn('IgnoreNotInAllPluginIE', plugin_ies.value)
self.assertIn('InAllPluginIE', plugins_ie.keys())
self.assertIn('InAllPluginIE', plugin_ies.value)
# Don't load override extractors
self.assertNotIn('OverrideGenericIE', plugins_ie.keys())
self.assertNotIn('OverrideGenericIE', plugin_ies.value)
self.assertNotIn('_UnderscoreOverrideGenericIE', plugins_ie.keys())
self.assertNotIn('_UnderscoreOverrideGenericIE', plugin_ies.value)
def test_postprocessor_classes(self):
plugins_pp = load_plugins('postprocessor', 'PP')
plugins_pp = load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
self.assertIn('NormalPluginPP', plugins_pp.keys())
self.assertIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
self.assertIn('NormalPluginPP', plugin_pps.value)
def test_importing_zipped_module(self):
zip_path = TEST_DATA_DIR / 'zipped_plugins.zip'
@ -58,10 +119,10 @@ class TestPlugins(unittest.TestCase):
package = importlib.import_module(f'{PACKAGE_NAME}.{plugin_type}')
self.assertIn(zip_path / PACKAGE_NAME / plugin_type, map(Path, package.__path__))
plugins_ie = load_plugins('extractor', 'IE')
plugins_ie = load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn('ZippedPluginIE', plugins_ie.keys())
plugins_pp = load_plugins('postprocessor', 'PP')
plugins_pp = load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
self.assertIn('ZippedPluginPP', plugins_pp.keys())
finally:
@ -69,23 +130,116 @@ class TestPlugins(unittest.TestCase):
os.remove(zip_path)
importlib.invalidate_caches() # reset the import caches
def test_plugin_dirs(self):
# Internal plugin dirs hack for CLI --plugin-dirs
# To be replaced with proper system later
custom_plugin_dir = TEST_DATA_DIR / 'plugin_packages'
Config._plugin_dirs = [str(custom_plugin_dir)]
importlib.invalidate_caches() # reset the import caches
def test_reloading_plugins(self):
reload_plugins_path = TEST_DATA_DIR / 'reload_plugins'
load_plugins(EXTRACTOR_PLUGIN_SPEC)
load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
# Remove default folder and add reload_plugin path
sys.path.remove(str(TEST_DATA_DIR))
sys.path.append(str(reload_plugins_path))
importlib.invalidate_caches()
try:
package = importlib.import_module(f'{PACKAGE_NAME}.extractor')
self.assertIn(custom_plugin_dir / 'testpackage' / PACKAGE_NAME / 'extractor', map(Path, package.__path__))
for plugin_type in ('extractor', 'postprocessor'):
package = importlib.import_module(f'{PACKAGE_NAME}.{plugin_type}')
self.assertIn(reload_plugins_path / PACKAGE_NAME / plugin_type, map(Path, package.__path__))
plugins_ie = load_plugins('extractor', 'IE')
self.assertIn('PackagePluginIE', plugins_ie.keys())
plugins_ie = load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn('NormalPluginIE', plugins_ie.keys())
self.assertTrue(
plugins_ie['NormalPluginIE'].REPLACED,
msg='Reloading has not replaced original extractor plugin')
self.assertTrue(
extractors.value['NormalPluginIE'].REPLACED,
msg='Reloading has not replaced original extractor plugin globally')
plugins_pp = load_plugins(POSTPROCESSOR_PLUGIN_SPEC)
self.assertIn('NormalPluginPP', plugins_pp.keys())
self.assertTrue(plugins_pp['NormalPluginPP'].REPLACED,
msg='Reloading has not replaced original postprocessor plugin')
self.assertTrue(
postprocessors.value['NormalPluginPP'].REPLACED,
msg='Reloading has not replaced original postprocessor plugin globally')
finally:
Config._plugin_dirs = []
importlib.invalidate_caches() # reset the import caches
sys.path.remove(str(reload_plugins_path))
sys.path.append(str(TEST_DATA_DIR))
importlib.invalidate_caches()
def test_extractor_override_plugin(self):
load_plugins(EXTRACTOR_PLUGIN_SPEC)
from yt_dlp.extractor.generic import GenericIE
self.assertEqual(GenericIE.TEST_FIELD, 'override')
self.assertEqual(GenericIE.SECONDARY_TEST_FIELD, 'underscore-override')
self.assertEqual(GenericIE.IE_NAME, 'generic+override+underscore-override')
importlib.invalidate_caches()
# test that loading a second time doesn't wrap a second time
load_plugins(EXTRACTOR_PLUGIN_SPEC)
from yt_dlp.extractor.generic import GenericIE
self.assertEqual(GenericIE.IE_NAME, 'generic+override+underscore-override')
def test_load_all_plugin_types(self):
# no plugin specs registered
load_all_plugins()
self.assertNotIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertNotIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
register_plugin_spec(EXTRACTOR_PLUGIN_SPEC)
register_plugin_spec(POSTPROCESSOR_PLUGIN_SPEC)
load_all_plugins()
self.assertTrue(all_plugins_loaded.value)
self.assertIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
def test_no_plugin_dirs(self):
register_plugin_spec(EXTRACTOR_PLUGIN_SPEC)
register_plugin_spec(POSTPROCESSOR_PLUGIN_SPEC)
plugin_dirs.value = []
load_all_plugins()
self.assertNotIn(f'{PACKAGE_NAME}.extractor.normal', sys.modules.keys())
self.assertNotIn(f'{PACKAGE_NAME}.postprocessor.normal', sys.modules.keys())
def test_set_plugin_dirs(self):
custom_plugin_dir = str(TEST_DATA_DIR / 'plugin_packages')
plugin_dirs.value = [custom_plugin_dir]
load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn(f'{PACKAGE_NAME}.extractor.package', sys.modules.keys())
self.assertIn('PackagePluginIE', plugin_ies.value)
def test_invalid_plugin_dir(self):
plugin_dirs.value = ['invalid_dir']
with self.assertRaises(ValueError):
load_plugins(EXTRACTOR_PLUGIN_SPEC)
def test_append_plugin_dirs(self):
custom_plugin_dir = str(TEST_DATA_DIR / 'plugin_packages')
self.assertEqual(plugin_dirs.value, ['default'])
plugin_dirs.value.append(custom_plugin_dir)
self.assertEqual(plugin_dirs.value, ['default', custom_plugin_dir])
load_plugins(EXTRACTOR_PLUGIN_SPEC)
self.assertIn(f'{PACKAGE_NAME}.extractor.package', sys.modules.keys())
self.assertIn('PackagePluginIE', plugin_ies.value)
def test_get_plugin_spec(self):
register_plugin_spec(EXTRACTOR_PLUGIN_SPEC)
register_plugin_spec(POSTPROCESSOR_PLUGIN_SPEC)
self.assertEqual(plugin_specs.value.get('extractor'), EXTRACTOR_PLUGIN_SPEC)
self.assertEqual(plugin_specs.value.get('postprocessor'), POSTPROCESSOR_PLUGIN_SPEC)
self.assertIsNone(plugin_specs.value.get('invalid'))
if __name__ == '__main__':

View File

@ -23,7 +23,6 @@ from yt_dlp.extractor import (
TedTalkIE,
ThePlatformFeedIE,
ThePlatformIE,
VikiIE,
VimeoIE,
WallaIE,
YoutubeIE,
@ -331,20 +330,6 @@ class TestRaiPlaySubtitles(BaseTestSubtitles):
self.assertEqual(md5(subtitles['it']), '4b3264186fbb103508abe5311cfcb9cd')
@is_download_test
@unittest.skip('IE broken - DRM only')
class TestVikiSubtitles(BaseTestSubtitles):
url = 'http://www.viki.com/videos/1060846v-punch-episode-18'
IE = VikiIE
def test_allsubtitles(self):
self.DL.params['writesubtitles'] = True
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(set(subtitles.keys()), {'en'})
self.assertEqual(md5(subtitles['en']), '53cb083a5914b2d84ef1ab67b880d18a')
@is_download_test
class TestThePlatformSubtitles(BaseTestSubtitles):
# from http://www.3playmedia.com/services-features/tools/integrations/theplatform/

View File

@ -3,19 +3,20 @@
# Allow direct execution
import os
import sys
import unittest
import unittest.mock
import warnings
import datetime as dt
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import contextlib
import datetime as dt
import io
import itertools
import json
import pickle
import subprocess
import unittest
import unittest.mock
import warnings
import xml.etree.ElementTree
from yt_dlp.compat import (
@ -218,11 +219,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(sanitize_filename('_BD_eEpuzXw', is_id=True), '_BD_eEpuzXw')
self.assertEqual(sanitize_filename('N0Y__7-UOdI', is_id=True), 'N0Y__7-UOdI')
@unittest.mock.patch('sys.platform', 'win32')
def test_sanitize_path(self):
with unittest.mock.patch('sys.platform', 'win32'):
self._test_sanitize_path()
def _test_sanitize_path(self):
self.assertEqual(sanitize_path('abc'), 'abc')
self.assertEqual(sanitize_path('abc/def'), 'abc\\def')
self.assertEqual(sanitize_path('abc\\def'), 'abc\\def')
@ -253,10 +251,8 @@ class TestUtil(unittest.TestCase):
# Check with nt._path_normpath if available
try:
import nt
nt_path_normpath = getattr(nt, '_path_normpath', None)
except Exception:
from nt import _path_normpath as nt_path_normpath
except ImportError:
nt_path_normpath = None
for test, expected in [
@ -663,6 +659,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(url_or_none('mms://foo.de'), 'mms://foo.de')
self.assertEqual(url_or_none('rtspu://foo.de'), 'rtspu://foo.de')
self.assertEqual(url_or_none('ftps://foo.de'), 'ftps://foo.de')
self.assertEqual(url_or_none('ws://foo.de'), 'ws://foo.de')
self.assertEqual(url_or_none('wss://foo.de'), 'wss://foo.de')
def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None)
@ -1264,6 +1262,7 @@ class TestUtil(unittest.TestCase):
def test_js_to_json_malformed(self):
self.assertEqual(js_to_json('42a1'), '42"a1"')
self.assertEqual(js_to_json('42a-1'), '42"a"-1')
self.assertEqual(js_to_json('{a: `${e("")}`}'), '{"a": "\\"e\\"(\\"\\")"}')
def test_js_to_json_template_literal(self):
self.assertEqual(js_to_json('`Hello ${name}`', {'name': '"world"'}), '"Hello world"')
@ -2087,21 +2086,26 @@ Line 1
headers = HTTPHeaderDict()
headers['ytdl-test'] = b'0'
self.assertEqual(list(headers.items()), [('Ytdl-Test', '0')])
self.assertEqual(list(headers.sensitive().items()), [('ytdl-test', '0')])
headers['ytdl-test'] = 1
self.assertEqual(list(headers.items()), [('Ytdl-Test', '1')])
self.assertEqual(list(headers.sensitive().items()), [('ytdl-test', '1')])
headers['Ytdl-test'] = '2'
self.assertEqual(list(headers.items()), [('Ytdl-Test', '2')])
self.assertEqual(list(headers.sensitive().items()), [('Ytdl-test', '2')])
self.assertTrue('ytDl-Test' in headers)
self.assertEqual(str(headers), str(dict(headers)))
self.assertEqual(repr(headers), str(dict(headers)))
headers.update({'X-dlp': 'data'})
self.assertEqual(set(headers.items()), {('Ytdl-Test', '2'), ('X-Dlp', 'data')})
self.assertEqual(set(headers.sensitive().items()), {('Ytdl-test', '2'), ('X-dlp', 'data')})
self.assertEqual(dict(headers), {'Ytdl-Test': '2', 'X-Dlp': 'data'})
self.assertEqual(len(headers), 2)
self.assertEqual(headers.copy(), headers)
headers2 = HTTPHeaderDict({'X-dlp': 'data3'}, **headers, **{'X-dlp': 'data2'})
headers2 = HTTPHeaderDict({'X-dlp': 'data3'}, headers, **{'X-dlP': 'data2'})
self.assertEqual(set(headers2.items()), {('Ytdl-Test', '2'), ('X-Dlp', 'data2')})
self.assertEqual(set(headers2.sensitive().items()), {('Ytdl-test', '2'), ('X-dlP', 'data2')})
self.assertEqual(len(headers2), 2)
headers2.clear()
self.assertEqual(len(headers2), 0)
@ -2109,16 +2113,23 @@ Line 1
# ensure we prefer latter headers
headers3 = HTTPHeaderDict({'Ytdl-TeSt': 1}, {'Ytdl-test': 2})
self.assertEqual(set(headers3.items()), {('Ytdl-Test', '2')})
self.assertEqual(set(headers3.sensitive().items()), {('Ytdl-test', '2')})
del headers3['ytdl-tesT']
self.assertEqual(dict(headers3), {})
headers4 = HTTPHeaderDict({'ytdl-test': 'data;'})
self.assertEqual(set(headers4.items()), {('Ytdl-Test', 'data;')})
self.assertEqual(set(headers4.sensitive().items()), {('ytdl-test', 'data;')})
# common mistake: strip whitespace from values
# https://github.com/yt-dlp/yt-dlp/issues/8729
headers5 = HTTPHeaderDict({'ytdl-test': ' data; '})
self.assertEqual(set(headers5.items()), {('Ytdl-Test', 'data;')})
self.assertEqual(set(headers5.sensitive().items()), {('ytdl-test', 'data;')})
# test if picklable
headers6 = HTTPHeaderDict(a=1, b=2)
self.assertEqual(pickle.loads(pickle.dumps(headers6)), headers6)
def test_extract_basic_auth(self):
assert extract_basic_auth('http://:foo.bar') == ('http://:foo.bar', None)

View File

@ -44,7 +44,7 @@ def websocket_handler(websocket):
return websocket.send('2')
elif isinstance(message, str):
if message == 'headers':
return websocket.send(json.dumps(dict(websocket.request.headers)))
return websocket.send(json.dumps(dict(websocket.request.headers.raw_items())))
elif message == 'path':
return websocket.send(websocket.request.path)
elif message == 'source_address':
@ -266,18 +266,18 @@ class TestWebsSocketRequestHandlerConformance:
with handler(cookiejar=cookiejar) as rh:
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
ws.close()
with handler() as rh:
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert 'cookie' not in json.loads(ws.recv())
assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
ws.close()
ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': cookiejar}))
ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
ws.close()
@pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets')
@ -287,7 +287,7 @@ class TestWebsSocketRequestHandlerConformance:
ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie', extensions={'cookiejar': YoutubeDLCookieJar()}))
ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': YoutubeDLCookieJar()}))
ws.send('headers')
assert 'cookie' not in json.loads(ws.recv())
assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
ws.close()
@pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets')
@ -298,12 +298,12 @@ class TestWebsSocketRequestHandlerConformance:
ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie'))
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
assert HTTPHeaderDict(json.loads(ws.recv()))['cookie'] == 'test=ytdlp'
ws.close()
cookiejar.clear_session_cookies()
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert 'cookie' not in json.loads(ws.recv())
assert 'cookie' not in HTTPHeaderDict(json.loads(ws.recv()))
ws.close()
def test_source_address(self, handler):
@ -341,6 +341,14 @@ class TestWebsSocketRequestHandlerConformance:
assert headers['test3'] == 'test3'
ws.close()
def test_keep_header_casing(self, handler):
with handler(headers=HTTPHeaderDict({'x-TeSt1': 'test'})) as rh:
ws = ws_validate_and_send(rh, Request(self.ws_base_url, headers={'x-TeSt2': 'test'}, extensions={'keep_header_casing': True}))
ws.send('headers')
headers = json.loads(ws.recv())
assert 'x-TeSt1' in headers
assert 'x-TeSt2' in headers
@pytest.mark.parametrize('client_cert', (
{'client_certificate': os.path.join(MTLS_CERT_DIR, 'clientwithkey.crt')},
{

View File

@ -78,6 +78,61 @@ _SIG_TESTS = [
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xxAj7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJ2OySqa0q',
),
(
'https://www.youtube.com/s/player/643afba4/tv-player-ias.vflset/tv-player-ias.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'AAOAOq0QJ8wRAIgXmPlOPSBkkUs1bYFYlJCfe29xx8j7vgpDL0QwbdV06sCIEzpWqMGkFR20CFOS21Tp-7vj_EMu-m37KtXJoOy1',
),
(
'https://www.youtube.com/s/player/363db69b/player_ias.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpz2ICs6EVdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
),
(
'https://www.youtube.com/s/player/363db69b/player_ias_tce.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpz2ICs6EVdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
),
(
'https://www.youtube.com/s/player/4fcd6e4a/player_ias.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'wAOAOq0QJ8ARAIgXmPlOPSBkkUs1bYFYlJCfe29xx8q7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0',
),
(
'https://www.youtube.com/s/player/4fcd6e4a/player_ias_tce.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'wAOAOq0QJ8ARAIgXmPlOPSBkkUs1bYFYlJCfe29xx8q7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0',
),
(
'https://www.youtube.com/s/player/20830619/player_ias.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'7AOq0QJ8wRAIgXmPlOPSBkkAs1bYFYlJCfe29xx8jOv1pDL0Q2bdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0qaw',
),
(
'https://www.youtube.com/s/player/20830619/player_ias_tce.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'7AOq0QJ8wRAIgXmPlOPSBkkAs1bYFYlJCfe29xx8jOv1pDL0Q2bdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0qaw',
),
(
'https://www.youtube.com/s/player/20830619/player-plasma-ias-phone-en_US.vflset/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'7AOq0QJ8wRAIgXmPlOPSBkkAs1bYFYlJCfe29xx8jOv1pDL0Q2bdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0qaw',
),
(
'https://www.youtube.com/s/player/20830619/player-plasma-ias-tablet-en_US.vflset/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'7AOq0QJ8wRAIgXmPlOPSBkkAs1bYFYlJCfe29xx8jOv1pDL0Q2bdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_EMu-m37KtXJoOySqa0qaw',
),
(
'https://www.youtube.com/s/player/8a8ac953/player_ias_tce.vflset/en_US/base.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'IAOAOq0QJ8wRAAgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_E2u-m37KtXJoOySqa0',
),
(
'https://www.youtube.com/s/player/8a8ac953/tv-player-es6.vflset/tv-player-es6.js',
'2aq0aqSyOoJXtK73m-uME_jv7-pT15gOFC02RFkGMqWpzEICs69VdbwQ0LDp1v7j8xx92efCJlYFYb1sUkkBSPOlPmXgIARw8JQ0qOAOAA',
'IAOAOq0QJ8wRAAgXmPlOPSBkkUs1bYFYlJCfe29xx8j7v1pDL0QwbdV96sCIEzpWqMGkFR20CFOg51Tp-7vj_E2u-m37KtXJoOySqa0',
),
]
_NSIG_TESTS = [
@ -205,6 +260,62 @@ _NSIG_TESTS = [
'https://www.youtube.com/s/player/9c6dfc4a/player_ias.vflset/en_US/base.js',
'jbu7ylIosQHyJyJV', 'uwI0ESiynAmhNg',
),
(
'https://www.youtube.com/s/player/e7567ecf/player_ias_tce.vflset/en_US/base.js',
'Sy4aDGc0VpYRR9ew_', '5UPOT1VhoZxNLQ',
),
(
'https://www.youtube.com/s/player/d50f54ef/player_ias_tce.vflset/en_US/base.js',
'Ha7507LzRmH3Utygtj', 'XFTb2HoeOE5MHg',
),
(
'https://www.youtube.com/s/player/074a8365/player_ias_tce.vflset/en_US/base.js',
'Ha7507LzRmH3Utygtj', 'ufTsrE0IVYrkl8v',
),
(
'https://www.youtube.com/s/player/643afba4/player_ias.vflset/en_US/base.js',
'N5uAlLqm0eg1GyHO', 'dCBQOejdq5s-ww',
),
(
'https://www.youtube.com/s/player/69f581a5/tv-player-ias.vflset/tv-player-ias.js',
'-qIP447rVlTTwaZjY', 'KNcGOksBAvwqQg',
),
(
'https://www.youtube.com/s/player/643afba4/tv-player-ias.vflset/tv-player-ias.js',
'ir9-V6cdbCiyKxhr', '2PL7ZDYAALMfmA',
),
(
'https://www.youtube.com/s/player/363db69b/player_ias.vflset/en_US/base.js',
'eWYu5d5YeY_4LyEDc', 'XJQqf-N7Xra3gg',
),
(
'https://www.youtube.com/s/player/4fcd6e4a/player_ias.vflset/en_US/base.js',
'o_L251jm8yhZkWtBW', 'lXoxI3XvToqn6A',
),
(
'https://www.youtube.com/s/player/4fcd6e4a/player_ias_tce.vflset/en_US/base.js',
'o_L251jm8yhZkWtBW', 'lXoxI3XvToqn6A',
),
(
'https://www.youtube.com/s/player/20830619/tv-player-ias.vflset/tv-player-ias.js',
'ir9-V6cdbCiyKxhr', '9YE85kNjZiS4',
),
(
'https://www.youtube.com/s/player/20830619/player-plasma-ias-phone-en_US.vflset/base.js',
'ir9-V6cdbCiyKxhr', '9YE85kNjZiS4',
),
(
'https://www.youtube.com/s/player/20830619/player-plasma-ias-tablet-en_US.vflset/base.js',
'ir9-V6cdbCiyKxhr', '9YE85kNjZiS4',
),
(
'https://www.youtube.com/s/player/8a8ac953/player_ias_tce.vflset/en_US/base.js',
'MiBYeXx_vRREbiCCmh', 'RtZYMVvmkE0JE',
),
(
'https://www.youtube.com/s/player/8a8ac953/tv-player-es6.vflset/tv-player-es6.js',
'MiBYeXx_vRREbiCCmh', 'RtZYMVvmkE0JE',
),
]
@ -218,6 +329,8 @@ class TestPlayerInfo(unittest.TestCase):
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-en_US.vflset/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-phone-de_DE.vflset/base.js', '64dddad9'),
('https://www.youtube.com/s/player/64dddad9/player-plasma-ias-tablet-en_US.vflset/base.js', '64dddad9'),
('https://www.youtube.com/s/player/e7567ecf/player_ias_tce.vflset/en_US/base.js', 'e7567ecf'),
('https://www.youtube.com/s/player/643afba4/tv-player-ias.vflset/tv-player-ias.js', '643afba4'),
# obsolete
('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
@ -250,46 +363,51 @@ def t_factory(name, sig_func, url_pattern):
def make_tfunc(url, sig_input, expected_sig):
m = url_pattern.match(url)
assert m, f'{url!r} should follow URL format'
test_id = m.group('id')
test_id = re.sub(r'[/.-]', '_', m.group('id') or m.group('compat_id'))
def test_func(self):
basename = f'player-{name}-{test_id}.js'
basename = f'player-{test_id}.js'
fn = os.path.join(self.TESTDATA_DIR, basename)
if not os.path.exists(fn):
urllib.request.urlretrieve(url, fn)
with open(fn, encoding='utf-8') as testf:
jscode = testf.read()
self.assertEqual(sig_func(jscode, sig_input), expected_sig)
self.assertEqual(sig_func(jscode, sig_input, url), expected_sig)
test_func.__name__ = f'test_{name}_js_{test_id}'
setattr(TestSignature, test_func.__name__, test_func)
return make_tfunc
def signature(jscode, sig_input):
func = YoutubeIE(FakeYDL())._parse_sig_js(jscode)
def signature(jscode, sig_input, player_url):
func = YoutubeIE(FakeYDL())._parse_sig_js(jscode, player_url)
src_sig = (
str(string.printable[:sig_input])
if isinstance(sig_input, int) else sig_input)
return func(src_sig)
def n_sig(jscode, sig_input):
def n_sig(jscode, sig_input, player_url):
ie = YoutubeIE(FakeYDL())
funcname = ie._extract_n_function_name(jscode)
funcname = ie._extract_n_function_name(jscode, player_url=player_url)
jsi = JSInterpreter(jscode)
func = jsi.extract_function_from_code(*ie._fixup_n_function_code(*jsi.extract_function_code(funcname)))
func = jsi.extract_function_from_code(*ie._fixup_n_function_code(*jsi.extract_function_code(funcname), jscode, player_url))
return func([sig_input])
make_sig_test = t_factory(
'signature', signature, re.compile(r'.*(?:-|/player/)(?P<id>[a-zA-Z0-9_-]+)(?:/.+\.js|(?:/watch_as3|/html5player)?\.[a-z]+)$'))
'signature', signature,
re.compile(r'''(?x)
.+(?:
/player/(?P<id>[a-zA-Z0-9_/.-]+)|
/html5player-(?:en_US-)?(?P<compat_id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player)?
)\.js$'''))
for test_spec in _SIG_TESTS:
make_sig_test(*test_spec)
make_nsig_test = t_factory(
'nsig', n_sig, re.compile(r'.+/player/(?P<id>[a-zA-Z0-9_-]+)/.+.js$'))
'nsig', n_sig, re.compile(r'.+/player/(?P<id>[a-zA-Z0-9_/.-]+)\.js$'))
for test_spec in _NSIG_TESTS:
make_nsig_test(*test_spec)

View File

@ -2,4 +2,5 @@ from yt_dlp.extractor.common import InfoExtractor
class PackagePluginIE(InfoExtractor):
_VALID_URL = 'package'
pass

View File

@ -0,0 +1,10 @@
from yt_dlp.extractor.common import InfoExtractor
class NormalPluginIE(InfoExtractor):
_VALID_URL = 'normal'
REPLACED = True
class _IgnoreUnderscorePluginIE(InfoExtractor):
pass

View File

@ -0,0 +1,5 @@
from yt_dlp.postprocessor.common import PostProcessor
class NormalPluginPP(PostProcessor):
REPLACED = True

View File

@ -6,6 +6,7 @@ class IgnoreNotInAllPluginIE(InfoExtractor):
class InAllPluginIE(InfoExtractor):
_VALID_URL = 'inallpluginie'
pass

View File

@ -2,8 +2,10 @@ from yt_dlp.extractor.common import InfoExtractor
class NormalPluginIE(InfoExtractor):
pass
_VALID_URL = 'normalpluginie'
REPLACED = False
class _IgnoreUnderscorePluginIE(InfoExtractor):
_VALID_URL = 'ignoreunderscorepluginie'
pass

View File

@ -0,0 +1,5 @@
from yt_dlp.extractor.generic import GenericIE
class OverrideGenericIE(GenericIE, plugin_name='override'):
TEST_FIELD = 'override'

View File

@ -0,0 +1,5 @@
from yt_dlp.extractor.generic import GenericIE
class _UnderscoreOverrideGenericIE(GenericIE, plugin_name='underscore-override'):
SECONDARY_TEST_FIELD = 'underscore-override'

View File

@ -2,4 +2,4 @@ from yt_dlp.postprocessor.common import PostProcessor
class NormalPluginPP(PostProcessor):
pass
REPLACED = False

View File

@ -2,4 +2,5 @@ from yt_dlp.extractor.common import InfoExtractor
class ZippedPluginIE(InfoExtractor):
_VALID_URL = 'zippedpluginie'
pass

View File

@ -30,9 +30,18 @@ from .compat import urllib_req_to_req
from .cookies import CookieLoadError, LenientSimpleCookie, load_cookies
from .downloader import FFmpegFD, get_suitable_downloader, shorten_protocol_name
from .downloader.rtmp import rtmpdump_version
from .extractor import gen_extractor_classes, get_info_extractor
from .extractor import gen_extractor_classes, get_info_extractor, import_extractors
from .extractor.common import UnsupportedURLIE
from .extractor.openload import PhantomJSwrapper
from .globals import (
IN_CLI,
LAZY_EXTRACTORS,
plugin_ies,
plugin_ies_overrides,
plugin_pps,
all_plugins_loaded,
plugin_dirs,
)
from .minicurses import format_text
from .networking import HEADRequest, Request, RequestDirector
from .networking.common import _REQUEST_HANDLERS, _RH_PREFERENCES
@ -44,8 +53,7 @@ from .networking.exceptions import (
network_exceptions,
)
from .networking.impersonate import ImpersonateRequestHandler
from .plugins import directories as plugin_directories
from .postprocessor import _PLUGIN_CLASSES as plugin_pps
from .plugins import directories as plugin_directories, load_all_plugins
from .postprocessor import (
EmbedThumbnailPP,
FFmpegFixupDuplicateMoovPP,
@ -157,7 +165,7 @@ from .utils import (
write_json_file,
write_string,
)
from .utils._utils import _UnsafeExtensionError, _YDLLogger
from .utils._utils import _UnsafeExtensionError, _YDLLogger, _ProgressState
from .utils.networking import (
HTTPHeaderDict,
clean_headers,
@ -642,13 +650,15 @@ class YoutubeDL:
self.cache = Cache(self)
self.__header_cookies = []
# compat for API: load plugins if they have not already
if not all_plugins_loaded.value:
load_all_plugins()
stdout = sys.stderr if self.params.get('logtostderr') else sys.stdout
self._out_files = Namespace(
out=stdout,
error=sys.stderr,
screen=sys.stderr if self.params.get('quiet') else stdout,
console=None if os.name == 'nt' else next(
filter(supports_terminal_sequences, (sys.stderr, sys.stdout)), None),
)
try:
@ -656,6 +666,9 @@ class YoutubeDL:
except Exception as e:
self.write_debug(f'Failed to enable VT mode: {e}')
# hehe "immutable" namespace
self._out_files.console = next(filter(supports_terminal_sequences, (sys.stderr, sys.stdout)), None)
if self.params.get('no_color'):
if self.params.get('color') is not None:
self.params.setdefault('_warnings', []).append(
@ -956,21 +969,22 @@ class YoutubeDL:
self._write_string(f'{self._bidi_workaround(message)}\n', self._out_files.error, only_once=only_once)
def _send_console_code(self, code):
if os.name == 'nt' or not self._out_files.console:
return
if not supports_terminal_sequences(self._out_files.console):
return False
self._write_string(code, self._out_files.console)
return True
def to_console_title(self, message):
if not self.params.get('consoletitle', False):
def to_console_title(self, message=None, progress_state=None, percent=None):
if not self.params.get('consoletitle'):
return
message = remove_terminal_sequences(message)
if os.name == 'nt':
if ctypes.windll.kernel32.GetConsoleWindow():
# c_wchar_p() might not be necessary if `message` is
# already of type unicode()
ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
else:
self._send_console_code(f'\033]0;{message}\007')
if message:
success = self._send_console_code(f'\033]0;{remove_terminal_sequences(message)}\007')
if not success and os.name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
ctypes.windll.kernel32.SetConsoleTitleW(message)
if isinstance(progress_state, _ProgressState):
self._send_console_code(progress_state.get_ansi_escape(percent))
def save_console_title(self):
if not self.params.get('consoletitle') or self.params.get('simulate'):
@ -984,6 +998,7 @@ class YoutubeDL:
def __enter__(self):
self.save_console_title()
self.to_console_title(progress_state=_ProgressState.INDETERMINATE)
return self
def save_cookies(self):
@ -992,6 +1007,7 @@ class YoutubeDL:
def __exit__(self, *args):
self.restore_console_title()
self.to_console_title(progress_state=_ProgressState.HIDDEN)
self.close()
def close(self):
@ -3993,15 +4009,6 @@ class YoutubeDL:
if not self.params.get('verbose'):
return
from . import _IN_CLI # Must be delayed import
# These imports can be slow. So import them only as needed
from .extractor.extractors import _LAZY_LOADER
from .extractor.extractors import (
_PLUGIN_CLASSES as plugin_ies,
_PLUGIN_OVERRIDES as plugin_ie_overrides,
)
def get_encoding(stream):
ret = str(getattr(stream, 'encoding', f'missing ({type(stream).__name__})'))
additional_info = []
@ -4040,17 +4047,18 @@ class YoutubeDL:
_make_label(ORIGIN, CHANNEL.partition('@')[2] or __version__, __version__),
f'[{RELEASE_GIT_HEAD[:9]}]' if RELEASE_GIT_HEAD else '',
'' if source == 'unknown' else f'({source})',
'' if _IN_CLI else 'API' if klass == YoutubeDL else f'API:{self.__module__}.{klass.__qualname__}',
'' if IN_CLI.value else 'API' if klass == YoutubeDL else f'API:{self.__module__}.{klass.__qualname__}',
delim=' '))
if not _IN_CLI:
if not IN_CLI.value:
write_debug(f'params: {self.params}')
if not _LAZY_LOADER:
if os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
write_debug('Lazy loading extractors is forcibly disabled')
else:
write_debug('Lazy loading extractors is disabled')
import_extractors()
lazy_extractors = LAZY_EXTRACTORS.value
if lazy_extractors is None:
write_debug('Lazy loading extractors is disabled')
elif not lazy_extractors:
write_debug('Lazy loading extractors is forcibly disabled')
if self.params['compat_opts']:
write_debug('Compatibility options: {}'.format(', '.join(self.params['compat_opts'])))
@ -4079,24 +4087,27 @@ class YoutubeDL:
write_debug(f'Proxy map: {self.proxies}')
write_debug(f'Request Handlers: {", ".join(rh.RH_NAME for rh in self._request_director.handlers.values())}')
if os.environ.get('YTDLP_NO_PLUGINS'):
write_debug('Plugins are forcibly disabled')
return
for plugin_type, plugins in {'Extractor': plugin_ies, 'Post-Processor': plugin_pps}.items():
display_list = ['{}{}'.format(
klass.__name__, '' if klass.__name__ == name else f' as {name}')
for name, klass in plugins.items()]
for plugin_type, plugins in (('Extractor', plugin_ies), ('Post-Processor', plugin_pps)):
display_list = [
klass.__name__ if klass.__name__ == name else f'{klass.__name__} as {name}'
for name, klass in plugins.value.items()]
if plugin_type == 'Extractor':
display_list.extend(f'{plugins[-1].IE_NAME.partition("+")[2]} ({parent.__name__})'
for parent, plugins in plugin_ie_overrides.items())
for parent, plugins in plugin_ies_overrides.value.items())
if not display_list:
continue
write_debug(f'{plugin_type} Plugins: {", ".join(sorted(display_list))}')
plugin_dirs = plugin_directories()
if plugin_dirs:
write_debug(f'Plugin directories: {plugin_dirs}')
plugin_dirs_msg = 'none'
if not plugin_dirs.value:
plugin_dirs_msg = 'none (disabled)'
else:
found_plugin_directories = plugin_directories()
if found_plugin_directories:
plugin_dirs_msg = ', '.join(found_plugin_directories)
write_debug(f'Plugin directories: {plugin_dirs_msg}')
@functools.cached_property
def proxies(self):
@ -4141,7 +4152,7 @@ class YoutubeDL:
(target, rh.RH_NAME)
for rh in self._request_director.handlers.values()
if isinstance(rh, ImpersonateRequestHandler)
for target in rh.supported_targets
for target in reversed(rh.supported_targets)
]
def _impersonate_target_available(self, target):

View File

@ -19,7 +19,9 @@ from .downloader.external import get_external_downloader
from .extractor import list_extractor_classes
from .extractor.adobepass import MSO_INFO
from .networking.impersonate import ImpersonateTarget
from .globals import IN_CLI, plugin_dirs
from .options import parseOpts
from .plugins import load_all_plugins as _load_all_plugins
from .postprocessor import (
FFmpegExtractAudioPP,
FFmpegMergerPP,
@ -33,7 +35,6 @@ from .postprocessor import (
)
from .update import Updater
from .utils import (
Config,
NO_DEFAULT,
POSTPROCESS_WHEN,
DateRange,
@ -66,8 +67,6 @@ from .utils.networking import std_headers
from .utils._utils import _UnsafeExtensionError
from .YoutubeDL import YoutubeDL
_IN_CLI = False
def _exit(status=0, *args):
for msg in args:
@ -433,6 +432,10 @@ def validate_options(opts):
}
# Other options
opts.plugin_dirs = opts.plugin_dirs
if opts.plugin_dirs is None:
opts.plugin_dirs = ['default']
if opts.playlist_items is not None:
try:
tuple(PlaylistEntries.parse_playlist_items(opts.playlist_items))
@ -973,11 +976,6 @@ def _real_main(argv=None):
parser, opts, all_urls, ydl_opts = parse_options(argv)
# HACK: Set the plugin dirs early on
# TODO(coletdjnz): remove when plugin globals system is implemented
if opts.plugin_dirs is not None:
Config._plugin_dirs = list(map(expand_path, opts.plugin_dirs))
# Dump user agent
if opts.dump_user_agent:
ua = traverse_obj(opts.headers, 'User-Agent', casesense=False, default=std_headers['User-Agent'])
@ -992,6 +990,11 @@ def _real_main(argv=None):
if opts.ffmpeg_location:
FFmpegPostProcessor._ffmpeg_location.set(opts.ffmpeg_location)
# load all plugins into the global lookup
plugin_dirs.value = opts.plugin_dirs
if plugin_dirs.value:
_load_all_plugins()
with YoutubeDL(ydl_opts) as ydl:
pre_process = opts.update_self or opts.rm_cachedir
actual_use = all_urls or opts.load_info_filename
@ -1018,8 +1021,9 @@ def _real_main(argv=None):
# List of simplified targets we know are supported,
# to help users know what dependencies may be required.
(ImpersonateTarget('chrome'), 'curl_cffi'),
(ImpersonateTarget('edge'), 'curl_cffi'),
(ImpersonateTarget('safari'), 'curl_cffi'),
(ImpersonateTarget('firefox'), 'curl_cffi>=0.10'),
(ImpersonateTarget('edge'), 'curl_cffi'),
]
available_targets = ydl._get_available_impersonate_targets()
@ -1035,12 +1039,12 @@ def _real_main(argv=None):
for known_target, known_handler in known_targets:
if not any(
known_target in target and handler == known_handler
known_target in target and known_handler.startswith(handler)
for target, handler in available_targets
):
rows.append([
rows.insert(0, [
ydl._format_out(text, ydl.Styles.SUPPRESS)
for text in make_row(known_target, f'{known_handler} (not available)')
for text in make_row(known_target, f'{known_handler} (unavailable)')
])
ydl.to_screen('[info] Available impersonate targets')
@ -1091,8 +1095,7 @@ def _real_main(argv=None):
def main(argv=None):
global _IN_CLI
_IN_CLI = True
IN_CLI.value = True
try:
_exit(*variadic(_real_main(argv)))
except (CookieLoadError, DownloadError):

View File

@ -83,7 +83,7 @@ def aes_ecb_encrypt(data, key, iv=None):
@returns {int[]} encrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
encrypted_data = []
for i in range(block_count):
@ -103,7 +103,7 @@ def aes_ecb_decrypt(data, key, iv=None):
@returns {int[]} decrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
encrypted_data = []
for i in range(block_count):
@ -134,7 +134,7 @@ def aes_ctr_encrypt(data, key, iv):
@returns {int[]} encrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
counter = iter_vector(iv)
encrypted_data = []
@ -158,7 +158,7 @@ def aes_cbc_decrypt(data, key, iv):
@returns {int[]} decrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
decrypted_data = []
previous_cipher_block = iv
@ -183,7 +183,7 @@ def aes_cbc_encrypt(data, key, iv, *, padding_mode='pkcs7'):
@returns {int[]} encrypted data
"""
expanded_key = key_expansion(key)
block_count = int(ceil(float(len(data)) / BLOCK_SIZE_BYTES))
block_count = ceil(len(data) / BLOCK_SIZE_BYTES)
encrypted_data = []
previous_cipher_block = iv

View File

@ -35,6 +35,7 @@ from .rtmp import RtmpFD
from .rtsp import RtspFD
from .websocket import WebSocketFragmentFD
from .youtube_live_chat import YoutubeLiveChatFD
from .bunnycdn import BunnyCdnFD
PROTOCOL_MAP = {
'rtmp': RtmpFD,
@ -55,6 +56,7 @@ PROTOCOL_MAP = {
'websocket_frag': WebSocketFragmentFD,
'youtube_live_chat': YoutubeLiveChatFD,
'youtube_live_chat_replay': YoutubeLiveChatFD,
'bunnycdn': BunnyCdnFD,
}

View File

@ -0,0 +1,50 @@
import hashlib
import random
import threading
from .common import FileDownloader
from . import HlsFD
from ..networking import Request
from ..networking.exceptions import network_exceptions
class BunnyCdnFD(FileDownloader):
"""
Downloads from BunnyCDN with required pings
Note, this is not a part of public API, and will be removed without notice.
DO NOT USE
"""
def real_download(self, filename, info_dict):
self.to_screen(f'[{self.FD_NAME}] Downloading from BunnyCDN')
fd = HlsFD(self.ydl, self.params)
stop_event = threading.Event()
ping_thread = threading.Thread(target=self.ping_thread, args=(stop_event,), kwargs=info_dict['_bunnycdn_ping_data'])
ping_thread.start()
try:
return fd.real_download(filename, info_dict)
finally:
stop_event.set()
def ping_thread(self, stop_event, url, headers, secret, context_id):
# Site sends ping every 4 seconds, but this throttles the download. Pinging every 2 seconds seems to work.
ping_interval = 2
# Hard coded resolution as it doesn't seem to matter
res = 1080
paused = 'false'
current_time = 0
while not stop_event.wait(ping_interval):
current_time += ping_interval
time = current_time + round(random.random(), 6)
md5_hash = hashlib.md5(f'{secret}_{context_id}_{time}_{paused}_{res}'.encode()).hexdigest()
ping_url = f'{url}?hash={md5_hash}&time={time}&paused={paused}&resolution={res}'
try:
self.ydl.urlopen(Request(ping_url, headers=headers)).read()
except network_exceptions as e:
self.to_screen(f'[{self.FD_NAME}] Ping failed: {e}')

View File

@ -31,6 +31,7 @@ from ..utils import (
timetuple_from_msec,
try_call,
)
from ..utils._utils import _ProgressState
class FileDownloader:
@ -333,7 +334,7 @@ class FileDownloader:
progress_dict), s.get('progress_idx') or 0)
self.to_console_title(self.ydl.evaluate_outtmpl(
progress_template.get('download-title') or 'yt-dlp %(progress._default_template)s',
progress_dict))
progress_dict), _ProgressState.from_dict(s), s.get('_percent'))
def _format_progress(self, *args, **kwargs):
return self.ydl._format_text(
@ -357,6 +358,7 @@ class FileDownloader:
'_speed_str': self.format_speed(speed).strip(),
'_total_bytes_str': _format_bytes('total_bytes'),
'_elapsed_str': self.format_seconds(s.get('elapsed')),
'_percent': 100.0,
'_percent_str': self.format_percent(100),
})
self._report_progress_status(s, join_nonempty(
@ -375,13 +377,15 @@ class FileDownloader:
return
self._progress_delta_time += update_delta
progress = try_call(
lambda: 100 * s['downloaded_bytes'] / s['total_bytes'],
lambda: 100 * s['downloaded_bytes'] / s['total_bytes_estimate'],
lambda: s['downloaded_bytes'] == 0 and 0)
s.update({
'_eta_str': self.format_eta(s.get('eta')).strip(),
'_speed_str': self.format_speed(s.get('speed')),
'_percent_str': self.format_percent(try_call(
lambda: 100 * s['downloaded_bytes'] / s['total_bytes'],
lambda: 100 * s['downloaded_bytes'] / s['total_bytes_estimate'],
lambda: s['downloaded_bytes'] == 0 and 0)),
'_percent': progress,
'_percent_str': self.format_percent(progress),
'_total_bytes_str': _format_bytes('total_bytes'),
'_total_bytes_estimate_str': _format_bytes('total_bytes_estimate'),
'_downloaded_bytes_str': _format_bytes('downloaded_bytes'),

View File

@ -457,8 +457,6 @@ class FFmpegFD(ExternalFD):
@classmethod
def available(cls, path=None):
# TODO: Fix path for ffmpeg
# Fixme: This may be wrong when --ffmpeg-location is used
return FFmpegPostProcessor().available
def on_process_started(self, proc, stdin):

View File

@ -85,6 +85,7 @@ class NiconicoLiveFD(FileDownloader):
'quality': live_quality,
'protocol': 'hls+fmp4',
'latency': live_latency,
'accessRightMethod': 'single_cookie',
'chasePlay': False,
},
'room': {

View File

@ -1,16 +1,25 @@
from ..compat.compat_utils import passthrough_module
from ..globals import extractors as _extractors_context
from ..globals import plugin_ies as _plugin_ies_context
from ..plugins import PluginSpec, register_plugin_spec
passthrough_module(__name__, '.extractors')
del passthrough_module
register_plugin_spec(PluginSpec(
module_name='extractor',
suffix='IE',
destination=_extractors_context,
plugin_destination=_plugin_ies_context,
))
def gen_extractor_classes():
""" Return a list of supported extractors.
The order does matter; the first extractor matched is the one handling the URL.
"""
from .extractors import _ALL_CLASSES
return _ALL_CLASSES
import_extractors()
return list(_extractors_context.value.values())
def gen_extractors():
@ -37,6 +46,9 @@ def list_extractors(age_limit=None):
def get_info_extractor(ie_name):
"""Returns the info extractor class with the given ie_name"""
from . import extractors
import_extractors()
return _extractors_context.value[f'{ie_name}IE']
return getattr(extractors, f'{ie_name}IE')
def import_extractors():
from . import extractors # noqa: F401

View File

@ -312,6 +312,7 @@ from .brilliantpala import (
)
from .bundesliga import BundesligaIE
from .bundestag import BundestagIE
from .bunnycdn import BunnyCdnIE
from .businessinsider import BusinessInsiderIE
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
@ -335,6 +336,7 @@ from .canal1 import Canal1IE
from .canalalpha import CanalAlphaIE
from .canalc2 import Canalc2IE
from .canalplus import CanalplusIE
from .canalsurmas import CanalsurmasIE
from .caracoltv import CaracolTvPlayIE
from .cartoonnetwork import CartoonNetworkIE
from .cbc import (
@ -494,10 +496,6 @@ from .daum import (
from .daystar import DaystarClipIE
from .dbtv import DBTVIE
from .dctp import DctpTvIE
from .deezer import (
DeezerAlbumIE,
DeezerPlaylistIE,
)
from .democracynow import DemocracynowIE
from .detik import DetikEmbedIE
from .deuxm import (
@ -685,6 +683,7 @@ from .foxnews import (
)
from .foxsports import FoxSportsIE
from .fptplay import FptplayIE
from .francaisfacile import FrancaisFacileIE
from .franceinter import FranceInterIE
from .francetv import (
FranceTVIE,
@ -841,6 +840,7 @@ from .icareus import IcareusIE
from .ichinanalive import (
IchinanaLiveClipIE,
IchinanaLiveIE,
IchinanaLiveVODIE,
)
from .idolplus import IdolPlusIE
from .ign import (
@ -903,6 +903,7 @@ from .ivi import (
IviIE,
)
from .ivideon import IvideonIE
from .ivoox import IvooxIE
from .iwara import (
IwaraIE,
IwaraPlaylistIE,
@ -960,7 +961,10 @@ from .kick import (
)
from .kicker import KickerIE
from .kickstarter import KickStarterIE
from .kika import KikaIE
from .kika import (
KikaIE,
KikaPlaylistIE,
)
from .kinja import KinjaEmbedIE
from .kinopoisk import KinoPoiskIE
from .kommunetv import KommunetvIE
@ -1053,6 +1057,7 @@ from .livestream import (
)
from .livestreamfails import LivestreamfailsIE
from .lnk import LnkIE
from .loco import LocoIE
from .loom import (
LoomFolderIE,
LoomIE,
@ -1060,6 +1065,7 @@ from .loom import (
from .lovehomeporn import LoveHomePornIE
from .lrt import (
LRTVODIE,
LRTRadioIE,
LRTStreamIE,
)
from .lsm import (
@ -1492,6 +1498,10 @@ from .paramountplus import (
)
from .parler import ParlerIE
from .parlview import ParlviewIE
from .parti import (
PartiLivestreamIE,
PartiVideoIE,
)
from .patreon import (
PatreonCampaignIE,
PatreonIE,
@ -1738,6 +1748,7 @@ from .roosterteeth import (
RoosterTeethSeriesIE,
)
from .rottentomatoes import RottenTomatoesIE
from .roya import RoyaLiveIE
from .rozhlas import (
MujRozhlasIE,
RozhlasIE,
@ -1881,6 +1892,8 @@ from .skyit import (
SkyItVideoIE,
SkyItVideoLiveIE,
TV8ItIE,
TV8ItLiveIE,
TV8ItPlaylistIE,
)
from .skylinewebcams import SkylineWebcamsIE
from .skynewsarabia import (
@ -1894,6 +1907,7 @@ from .slutload import SlutloadIE
from .smotrim import SmotrimIE
from .snapchat import SnapchatSpotlightIE
from .snotr import SnotrIE
from .softwhiteunderbelly import SoftWhiteUnderbellyIE
from .sohu import (
SohuIE,
SohuVIE,
@ -1983,6 +1997,7 @@ from .storyfire import (
StoryFireSeriesIE,
StoryFireUserIE,
)
from .streaks import StreaksIE
from .streamable import StreamableIE
from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE
@ -2222,6 +2237,7 @@ from .tvplay import (
TVPlayIE,
)
from .tvplayer import TVPlayerIE
from .tvw import TvwIE
from .tweakers import TweakersIE
from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE
@ -2345,10 +2361,6 @@ from .viewlift import (
ViewLiftIE,
)
from .viidea import ViideaIE
from .viki import (
VikiChannelIE,
VikiIE,
)
from .vimeo import (
VHXEmbedIE,
VimeoAlbumIE,
@ -2393,10 +2405,15 @@ from .voxmedia import (
VoxMediaIE,
VoxMediaVolumeIE,
)
from .vrsquare import (
VrSquareChannelIE,
VrSquareIE,
VrSquareSearchIE,
VrSquareSectionIE,
)
from .vrt import (
VRTIE,
DagelijkseKostIE,
KetnetIE,
Radio1BeIE,
VrtNUIE,
)

View File

@ -1,3 +1,4 @@
import datetime as dt
import functools
from .common import InfoExtractor
@ -10,7 +11,7 @@ from ..utils import (
filter_dict,
int_or_none,
orderedSet,
unified_timestamp,
parse_iso8601,
url_or_none,
urlencode_postdata,
urljoin,
@ -87,9 +88,9 @@ class AfreecaTVIE(AfreecaTVBaseIE):
'uploader_id': 'rlantnghks',
'uploader': '페이즈으',
'duration': 10840,
'thumbnail': r're:https?://videoimg\.sooplive\.co/.kr/.+',
'thumbnail': r're:https?://videoimg\.(?:sooplive\.co\.kr|afreecatv\.com)/.+',
'upload_date': '20230108',
'timestamp': 1673218805,
'timestamp': 1673186405,
'title': '젠지 페이즈',
},
'params': {
@ -102,7 +103,7 @@ class AfreecaTVIE(AfreecaTVBaseIE):
'id': '20170411_BE689A0E_190960999_1_2_h',
'ext': 'mp4',
'title': '혼자사는여자집',
'thumbnail': r're:https?://(?:video|st)img\.sooplive\.co\.kr/.+',
'thumbnail': r're:https?://(?:video|st)img\.(?:sooplive\.co\.kr|afreecatv\.com)/.+',
'uploader': '♥이슬이',
'uploader_id': 'dasl8121',
'upload_date': '20170411',
@ -119,7 +120,7 @@ class AfreecaTVIE(AfreecaTVBaseIE):
'id': '20180327_27901457_202289533_1',
'ext': 'mp4',
'title': '[생]빨개요♥ (part 1)',
'thumbnail': r're:https?://(?:video|st)img\.sooplive\.co\.kr/.+',
'thumbnail': r're:https?://(?:video|st)img\.(?:sooplive\.co\.kr|afreecatv\.com)/.+',
'uploader': '[SA]서아',
'uploader_id': 'bjdyrksu',
'upload_date': '20180327',
@ -187,7 +188,7 @@ class AfreecaTVIE(AfreecaTVBaseIE):
'formats': formats,
**traverse_obj(file_element, {
'duration': ('duration', {int_or_none(scale=1000)}),
'timestamp': ('file_start', {unified_timestamp}),
'timestamp': ('file_start', {parse_iso8601(delimiter=' ', timezone=dt.timedelta(hours=9))}),
}),
})
@ -370,7 +371,7 @@ class AfreecaTVLiveIE(AfreecaTVBaseIE):
'title': channel_info.get('TITLE') or station_info.get('station_title'),
'uploader': channel_info.get('BJNICK') or station_info.get('station_name'),
'uploader_id': broadcaster_id,
'timestamp': unified_timestamp(station_info.get('broad_start')),
'timestamp': parse_iso8601(station_info.get('broad_start'), delimiter=' ', timezone=dt.timedelta(hours=9)),
'formats': formats,
'is_live': True,
'http_headers': {'Referer': url},

View File

@ -146,7 +146,7 @@ class TokFMPodcastIE(InfoExtractor):
'url': 'https://audycje.tokfm.pl/podcast/91275,-Systemowy-rasizm-Czy-zamieszki-w-USA-po-morderstwie-w-Minneapolis-doprowadza-do-zmian-w-sluzbach-panstwowych',
'info_dict': {
'id': '91275',
'ext': 'aac',
'ext': 'mp3',
'title': 'md5:a9b15488009065556900169fb8061cce',
'episode': 'md5:a9b15488009065556900169fb8061cce',
'series': 'Analizy',
@ -164,23 +164,20 @@ class TokFMPodcastIE(InfoExtractor):
raise ExtractorError('No such podcast', expected=True)
metadata = metadata[0]
formats = []
for ext in ('aac', 'mp3'):
url_data = self._download_json(
f'https://api.podcast.radioagora.pl/api4/getSongUrl?podcast_id={media_id}&device_id={uuid.uuid4()}&ppre=false&audio={ext}',
media_id, f'Downloading podcast {ext} URL')
# prevents inserting the mp3 (default) multiple times
if 'link_ssl' in url_data and f'.{ext}' in url_data['link_ssl']:
formats.append({
'url': url_data['link_ssl'],
'ext': ext,
'vcodec': 'none',
'acodec': ext,
})
mp3_url = self._download_json(
'https://api.podcast.radioagora.pl/api4/getSongUrl',
media_id, 'Downloading podcast mp3 URL', query={
'podcast_id': media_id,
'device_id': str(uuid.uuid4()),
'ppre': 'false',
'audio': 'mp3',
})['link_ssl']
return {
'id': media_id,
'formats': formats,
'url': mp3_url,
'vcodec': 'none',
'ext': 'mp3',
'title': metadata.get('podcast_name'),
'series': metadata.get('series_name'),
'episode': metadata.get('podcast_name'),

View File

@ -1,7 +1,6 @@
import json
from .common import InfoExtractor
from .kaltura import KalturaIE
from ..utils.traversal import require, traverse_obj
class AZMedienIE(InfoExtractor):
@ -9,15 +8,15 @@ class AZMedienIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.|tv\.)?
(?P<host>
(?:
telezueri\.ch|
telebaern\.tv|
telem1\.ch|
tvo-online\.ch
)/
[^/]+/
[^/?#]+/
(?P<id>
[^/]+-(?P<article_id>\d+)
[^/?#]+-\d+
)
(?:
\#video=
@ -47,19 +46,17 @@ class AZMedienIE(InfoExtractor):
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True,
}]
_API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/a4016f65fe62b81dc6664dd9f4910e4ab40383be'
_PARTNER_ID = '1719221'
def _real_extract(self, url):
host, display_id, article_id, entry_id = self._match_valid_url(url).groups()
display_id, entry_id = self._match_valid_url(url).groups()
if not entry_id:
entry_id = self._download_json(
self._API_TEMPL % (host, host.split('.')[0]), display_id, query={
'variables': json.dumps({
'contextId': 'NewsArticle:' + article_id,
}),
})['data']['context']['mainAsset']['video']['kaltura']['kalturaId']
webpage = self._download_webpage(url, display_id)
data = self._search_json(
r'window\.__APOLLO_STATE__\s*=', webpage, 'video data', display_id)
entry_id = traverse_obj(data, (
lambda _, v: v['__typename'] == 'KalturaData', 'kalturaId', any, {require('kaltura id')}))
return self.url_result(
f'kaltura:{self._PARTNER_ID}:{entry_id}',

View File

@ -86,7 +86,7 @@ class BandlabBaseIE(InfoExtractor):
'webpage_url': (
'id', ({value(url)}, {format_field(template='https://www.bandlab.com/post/%s')}), filter, any),
'url': ('video', 'url', {url_or_none}),
'title': ('caption', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=50)}),
'title': ('caption', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=72)}),
'description': ('caption', {str}),
'thumbnail': ('video', 'picture', 'url', {url_or_none}),
'view_count': ('video', 'counters', 'plays', {int_or_none}),
@ -120,7 +120,7 @@ class BandlabIE(BandlabBaseIE):
'duration': 54.629999999999995,
'title': 'sweet black',
'upload_date': '20231210',
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/',
'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/',
'genres': ['Lofi'],
'uploader': 'ender milze',
'comment_count': int,
@ -142,7 +142,7 @@ class BandlabIE(BandlabBaseIE):
'duration': 54.629999999999995,
'title': 'sweet black',
'upload_date': '20231210',
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/',
'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/fa082beb-b856-4730-9170-a57e4e32cc2c/',
'genres': ['Lofi'],
'uploader': 'ender milze',
'comment_count': int,
@ -158,7 +158,7 @@ class BandlabIE(BandlabBaseIE):
'comment_count': int,
'genres': ['Other'],
'uploader_id': 'user8353034818103753',
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/51b18363-da23-4b9b-a29c-2933a3e561ca/',
'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/51b18363-da23-4b9b-a29c-2933a3e561ca/',
'timestamp': 1709625771,
'track': 'PodcastMaerchen4b',
'duration': 468.14,
@ -178,7 +178,7 @@ class BandlabIE(BandlabBaseIE):
'id': '110343fc-148b-ea11-96d2-0003ffd1fc09',
'ext': 'm4a',
'timestamp': 1588273294,
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/users/b612e533-e4f7-4542-9f50-3fcfd8dd822c/',
'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/users/b612e533-e4f7-4542-9f50-3fcfd8dd822c/',
'description': 'Final Revision.',
'title': 'Replay ( Instrumental)',
'uploader': 'David R Sparks',
@ -200,7 +200,7 @@ class BandlabIE(BandlabBaseIE):
'id': '5cdf9036-3857-ef11-991a-6045bd36e0d9',
'ext': 'mp4',
'duration': 44.705,
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/videos/67c6cef1-cef6-40d3-831e-a55bc1dcb972/',
'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/videos/67c6cef1-cef6-40d3-831e-a55bc1dcb972/',
'comment_count': int,
'title': 'backing vocals',
'uploader_id': 'marliashya',
@ -224,7 +224,7 @@ class BandlabIE(BandlabBaseIE):
'view_count': int,
'track': 'Positronic Meltdown',
'duration': 318.55,
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/songs/87165bc3-5439-496e-b1f7-a9f13b541ff2/',
'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/songs/87165bc3-5439-496e-b1f7-a9f13b541ff2/',
'description': 'Checkout my tracks at AOMX http://aomxsounds.com/',
'uploader_id': 'microfreaks',
'title': 'Positronic Meltdown',
@ -246,7 +246,7 @@ class BandlabIE(BandlabBaseIE):
'comment_count': int,
'uploader': 'Sorakime',
'uploader_id': 'sorakime',
'thumbnail': 'https://bandlabimages.azureedge.net/v1.0/users/572a351a-0f3a-4c6a-ac39-1a5defdeeb1c/',
'thumbnail': 'https://bl-prod-images.azureedge.net/v1.0/users/572a351a-0f3a-4c6a-ac39-1a5defdeeb1c/',
'timestamp': 1691162128,
'upload_date': '20230804',
'media_type': 'track',

View File

@ -1596,16 +1596,16 @@ class BilibiliPlaylistIE(BilibiliSpaceListBaseIE):
webpage = self._download_webpage(url, list_id)
initial_state = self._search_json(r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', list_id)
if traverse_obj(initial_state, ('error', 'code', {int_or_none})) != 200:
error_code = traverse_obj(initial_state, ('error', 'trueCode', {int_or_none}))
error_message = traverse_obj(initial_state, ('error', 'message', {str_or_none}))
error = traverse_obj(initial_state, (('error', 'listError'), all, lambda _, v: v['code'], any))
if error and error['code'] != 200:
error_code = error.get('trueCode')
if error_code == -400 and list_id == 'watchlater':
self.raise_login_required('You need to login to access your watchlater playlist')
elif error_code == -403:
self.raise_login_required('This is a private playlist. You need to login as its owner')
elif error_code == 11010:
raise ExtractorError('Playlist is no longer available', expected=True)
raise ExtractorError(f'Could not access playlist: {error_code} {error_message}')
raise ExtractorError(f'Could not access playlist: {error_code} {error.get("message")}')
query = {
'ps': 20,

View File

@ -53,7 +53,7 @@ class BlueskyIE(InfoExtractor):
'channel_id': 'did:plc:z72i7hdynmk6r22z27h6tvur',
'channel_url': 'https://bsky.app/profile/did:plc:z72i7hdynmk6r22z27h6tvur',
'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
'title': 'Bluesky now has video! Update your app to versi...',
'title': 'Bluesky now has video! Update your app to version 1.91 or refresh on ...',
'alt_title': 'Bluesky video feature announcement',
'description': r're:(?s)Bluesky now has video! .{239}',
'upload_date': '20240911',
@ -172,7 +172,7 @@ class BlueskyIE(InfoExtractor):
'channel_id': 'did:plc:z72i7hdynmk6r22z27h6tvur',
'channel_url': 'https://bsky.app/profile/did:plc:z72i7hdynmk6r22z27h6tvur',
'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
'title': 'Bluesky now has video! Update your app to versi...',
'title': 'Bluesky now has video! Update your app to version 1.91 or refresh on ...',
'alt_title': 'Bluesky video feature announcement',
'description': r're:(?s)Bluesky now has video! .{239}',
'upload_date': '20240911',
@ -191,7 +191,7 @@ class BlueskyIE(InfoExtractor):
'info_dict': {
'id': '3l7rdfxhyds2f',
'ext': 'mp4',
'uploader': 'cinnamon',
'uploader': 'cinnamon 🐇 🏳️‍⚧️',
'uploader_id': 'cinny.bun.how',
'uploader_url': 'https://bsky.app/profile/cinny.bun.how',
'channel_id': 'did:plc:7x6rtuenkuvxq3zsvffp2ide',
@ -255,7 +255,7 @@ class BlueskyIE(InfoExtractor):
'info_dict': {
'id': '3l77u64l7le2e',
'ext': 'mp4',
'title': 'hearing people on twitter say that bluesky isn\'...',
'title': "hearing people on twitter say that bluesky isn't funny yet so post t...",
'like_count': int,
'uploader_id': 'thafnine.net',
'uploader_url': 'https://bsky.app/profile/thafnine.net',
@ -387,7 +387,7 @@ class BlueskyIE(InfoExtractor):
'age_limit': (
'labels', ..., 'val', {lambda x: 18 if x in ('sexual', 'porn', 'graphic-media') else None}, any),
'description': (*record_path, 'text', {str}, filter),
'title': (*record_path, 'text', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=50)}),
'title': (*record_path, 'text', {lambda x: x.replace('\n', ' ')}, {truncate_string(left=72)}),
}),
})
return entries

View File

@ -24,7 +24,7 @@ class BokeCCBaseIE(InfoExtractor):
class BokeCCIE(BokeCCBaseIE):
_IE_DESC = 'CC视频'
IE_DESC = 'CC视频'
_VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_TESTS = [{

View File

@ -0,0 +1,178 @@
import json
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import (
ExtractorError,
extract_attributes,
int_or_none,
parse_qs,
smuggle_url,
unsmuggle_url,
url_or_none,
urlhandle_detect_ext,
)
from ..utils.traversal import find_element, traverse_obj
class BunnyCdnIE(InfoExtractor):
_VALID_URL = r'https?://(?:iframe\.mediadelivery\.net|video\.bunnycdn\.com)/(?:embed|play)/(?P<library_id>\d+)/(?P<id>[\da-f-]+)'
_EMBED_REGEX = [rf'<iframe[^>]+src=[\'"](?P<url>{_VALID_URL}[^\'"]*)[\'"]']
_TESTS = [{
'url': 'https://iframe.mediadelivery.net/embed/113933/e73edec1-e381-4c8b-ae73-717a140e0924',
'info_dict': {
'id': 'e73edec1-e381-4c8b-ae73-717a140e0924',
'ext': 'mp4',
'title': 'mistress morgana (3).mp4',
'description': '',
'timestamp': 1693251673,
'thumbnail': r're:^https?://.*\.b-cdn\.net/e73edec1-e381-4c8b-ae73-717a140e0924/thumbnail\.jpg',
'duration': 7.0,
'upload_date': '20230828',
},
'params': {'skip_download': True},
}, {
'url': 'https://iframe.mediadelivery.net/play/136145/32e34c4b-0d72-437c-9abb-05e67657da34',
'info_dict': {
'id': '32e34c4b-0d72-437c-9abb-05e67657da34',
'ext': 'mp4',
'timestamp': 1691145748,
'thumbnail': r're:^https?://.*\.b-cdn\.net/32e34c4b-0d72-437c-9abb-05e67657da34/thumbnail_9172dc16\.jpg',
'duration': 106.0,
'description': 'md5:981a3e899a5c78352b21ed8b2f1efd81',
'upload_date': '20230804',
'title': 'Sanela ist Teil der #arbeitsmarktkraft',
},
'params': {'skip_download': True},
}, {
# Stream requires activation and pings
'url': 'https://iframe.mediadelivery.net/embed/200867/2e8545ec-509d-4571-b855-4cf0235ccd75',
'info_dict': {
'id': '2e8545ec-509d-4571-b855-4cf0235ccd75',
'ext': 'mp4',
'timestamp': 1708497752,
'title': 'netflix part 1',
'duration': 3959.0,
'description': '',
'upload_date': '20240221',
'thumbnail': r're:^https?://.*\.b-cdn\.net/2e8545ec-509d-4571-b855-4cf0235ccd75/thumbnail\.jpg',
},
'params': {'skip_download': True},
}]
_WEBPAGE_TESTS = [{
# Stream requires Referer
'url': 'https://conword.io/',
'info_dict': {
'id': '3a5d863e-9cd6-447e-b6ef-e289af50b349',
'ext': 'mp4',
'title': 'Conword bei der Stadt Köln und Stadt Dortmund',
'description': '',
'upload_date': '20231031',
'duration': 31.0,
'thumbnail': 'https://video.watchuh.com/3a5d863e-9cd6-447e-b6ef-e289af50b349/thumbnail.jpg',
'timestamp': 1698783879,
},
'params': {'skip_download': True},
}, {
# URL requires token and expires
'url': 'https://www.stockphotos.com/video/moscow-subway-the-train-is-arriving-at-the-park-kultury-station-10017830',
'info_dict': {
'id': '0b02fa20-4e8c-4140-8f87-f64d820a3386',
'ext': 'mp4',
'thumbnail': r're:^https?://.*\.b-cdn\.net/0b02fa20-4e8c-4140-8f87-f64d820a3386/thumbnail\.jpg',
'title': 'Moscow subway. The train is arriving at the Park Kultury station.',
'upload_date': '20240531',
'duration': 18.0,
'timestamp': 1717152269,
'description': '',
},
'params': {'skip_download': True},
}]
@classmethod
def _extract_embed_urls(cls, url, webpage):
for embed_url in super()._extract_embed_urls(url, webpage):
yield smuggle_url(embed_url, {'Referer': url})
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id, library_id = self._match_valid_url(url).group('id', 'library_id')
webpage = self._download_webpage(
f'https://iframe.mediadelivery.net/embed/{library_id}/{video_id}', video_id,
headers=traverse_obj(smuggled_data, {'Referer': 'Referer'}),
query=traverse_obj(parse_qs(url), {'token': 'token', 'expires': 'expires'}))
if html_title := self._html_extract_title(webpage, default=None) == '403':
raise ExtractorError(
'This video is inaccessible. Setting a Referer header '
'might be required to access the video', expected=True)
elif html_title == '404':
raise ExtractorError('This video does not exist', expected=True)
headers = {'Referer': url}
info = traverse_obj(self._parse_html5_media_entries(url, webpage, video_id, _headers=headers), 0) or {}
formats = info.get('formats') or []
subtitles = info.get('subtitles') or {}
original_url = self._search_regex(
r'(?:var|const|let)\s+originalUrl\s*=\s*["\']([^"\']+)["\']', webpage, 'original url', default=None)
if url_or_none(original_url):
urlh = self._request_webpage(
HEADRequest(original_url), video_id=video_id, note='Checking original',
headers=headers, fatal=False, expected_status=(403, 404))
if urlh and urlh.status == 200:
formats.append({
'url': original_url,
'format_id': 'source',
'quality': 1,
'http_headers': headers,
'ext': urlhandle_detect_ext(urlh, default='mp4'),
'filesize': int_or_none(urlh.get_header('Content-Length')),
})
# MediaCage Streams require activation and pings
src_url = self._search_regex(
r'\.setAttribute\([\'"]src[\'"],\s*[\'"]([^\'"]+)[\'"]\)', webpage, 'src url', default=None)
activation_url = self._search_regex(
r'loadUrl\([\'"]([^\'"]+/activate)[\'"]', webpage, 'activation url', default=None)
ping_url = self._search_regex(
r'loadUrl\([\'"]([^\'"]+/ping)[\'"]', webpage, 'ping url', default=None)
secret = traverse_obj(parse_qs(src_url), ('secret', 0))
context_id = traverse_obj(parse_qs(src_url), ('contextId', 0))
ping_data = {}
if src_url and activation_url and ping_url and secret and context_id:
self._download_webpage(
activation_url, video_id, headers=headers, note='Downloading activation data')
fmts, subs = self._extract_m3u8_formats_and_subtitles(
src_url, video_id, 'mp4', headers=headers, m3u8_id='hls', fatal=False)
for fmt in fmts:
fmt.update({
'protocol': 'bunnycdn',
'http_headers': headers,
})
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
ping_data = {
'_bunnycdn_ping_data': {
'url': ping_url,
'headers': headers,
'secret': secret,
'context_id': context_id,
},
}
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(webpage, ({find_element(id='main-video', html=True)}, {extract_attributes}, {
'title': ('data-plyr-config', {json.loads}, 'title', {str}),
'thumbnail': ('data-poster', {url_or_none}),
})),
**ping_data,
**self._search_json_ld(webpage, video_id, fatal=False),
}

View File

@ -0,0 +1,84 @@
import json
import time
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
jwt_decode_hs256,
parse_iso8601,
url_or_none,
variadic,
)
from ..utils.traversal import traverse_obj
class CanalsurmasIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?canalsurmas\.es/videos/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.canalsurmas.es/videos/44006-el-gran-queo-1-lora-del-rio-sevilla-20072014',
'md5': '861f86fdc1221175e15523047d0087ef',
'info_dict': {
'id': '44006',
'ext': 'mp4',
'title': 'Lora del Río (Sevilla)',
'description': 'md5:3d9ee40a9b1b26ed8259e6b71ed27b8b',
'thumbnail': 'https://cdn2.rtva.interactvty.com/content_cards/00f3e8f67b0a4f3b90a4a14618a48b0d.jpg',
'timestamp': 1648123182,
'upload_date': '20220324',
},
}]
_API_BASE = 'https://api-rtva.interactvty.com'
_access_token = None
@staticmethod
def _is_jwt_expired(token):
return jwt_decode_hs256(token)['exp'] - time.time() < 300
def _call_api(self, endpoint, video_id, fields=None):
if not self._access_token or self._is_jwt_expired(self._access_token):
self._access_token = self._download_json(
f'{self._API_BASE}/jwt/token/', None,
'Downloading access token', 'Failed to download access token',
headers={'Content-Type': 'application/json'},
data=json.dumps({
'username': 'canalsur_demo',
'password': 'dsUBXUcI',
}).encode())['access']
return self._download_json(
f'{self._API_BASE}/api/2.0/contents/{endpoint}/{video_id}/', video_id,
f'Downloading {endpoint} API JSON', f'Failed to download {endpoint} API JSON',
headers={'Authorization': f'jwtok {self._access_token}'},
query={'optional_fields': ','.join(variadic(fields))} if fields else None)
def _real_extract(self, url):
video_id = self._match_id(url)
video_info = self._call_api('content', video_id, fields=[
'description', 'image', 'duration', 'created_at', 'tags',
])
stream_info = self._call_api('content_resources', video_id, 'media_url')
formats, subtitles = [], {}
for stream_url in traverse_obj(stream_info, ('results', ..., 'media_url', {url_or_none})):
if determine_ext(stream_url) == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
stream_url, video_id, m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.append({'url': stream_url})
return {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(video_info, {
'title': ('name', {str.strip}),
'description': ('description', {str}),
'thumbnail': ('image', {url_or_none}),
'duration': ('duration', {float_or_none}),
'timestamp': ('created_at', {parse_iso8601}),
'tags': ('tags', ..., {str}),
}),
}

View File

@ -1,17 +1,17 @@
import base64
import functools
import json
import re
import time
import urllib.parse
from .common import InfoExtractor
from ..networking import HEADRequest
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
js_to_json,
jwt_decode_hs256,
mimetype2ext,
orderedSet,
parse_age_limit,
@ -24,6 +24,7 @@ from ..utils import (
update_url,
url_basename,
url_or_none,
urlencode_postdata,
)
from ..utils.traversal import require, traverse_obj, trim_str
@ -608,66 +609,82 @@ class CBCGemIE(CBCGemBaseIE):
'only_matching': True,
}]
_TOKEN_API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
_CLIENT_ID = 'fc05b0ee-3865-4400-a3cc-3da82c330c23'
_refresh_token = None
_access_token = None
_claims_token = None
def _new_claims_token(self, email, password):
data = json.dumps({
'email': email,
'password': password,
}).encode()
headers = {'content-type': 'application/json'}
query = {'apikey': self._TOKEN_API_KEY}
resp = self._download_json('https://api.loginradius.com/identity/v2/auth/login',
None, data=data, headers=headers, query=query)
access_token = resp['access_token']
@functools.cached_property
def _ropc_settings(self):
return self._download_json(
'https://services.radio-canada.ca/ott/catalog/v1/gem/settings', None,
'Downloading site settings', query={'device': 'web'})['identityManagement']['ropc']
query = {
'access_token': access_token,
'apikey': self._TOKEN_API_KEY,
'jwtapp': 'jwt',
}
resp = self._download_json('https://cloud-api.loginradius.com/sso/jwt/api/token',
None, headers=headers, query=query)
sig = resp['signature']
def _is_jwt_expired(self, token):
return jwt_decode_hs256(token)['exp'] - time.time() < 300
data = json.dumps({'jwt': sig}).encode()
headers = {'content-type': 'application/json', 'ott-device-type': 'web'}
resp = self._download_json('https://services.radio-canada.ca/ott/cbc-api/v2/token',
None, data=data, headers=headers, expected_status=426)
cbc_access_token = resp['accessToken']
def _call_oauth_api(self, oauth_data, note='Refreshing access token'):
response = self._download_json(
self._ropc_settings['url'], None, note, data=urlencode_postdata({
'client_id': self._CLIENT_ID,
**oauth_data,
'scope': self._ropc_settings['scopes'],
}))
self._refresh_token = response['refresh_token']
self._access_token = response['access_token']
self.cache.store(self._NETRC_MACHINE, 'token_data', [self._refresh_token, self._access_token])
headers = {'content-type': 'application/json', 'ott-device-type': 'web', 'ott-access-token': cbc_access_token}
resp = self._download_json('https://services.radio-canada.ca/ott/cbc-api/v2/profile',
None, headers=headers, expected_status=426)
return resp['claimsToken']
def _perform_login(self, username, password):
if not self._refresh_token:
self._refresh_token, self._access_token = self.cache.load(
self._NETRC_MACHINE, 'token_data', default=[None, None])
def _get_claims_token_expiry(self):
# Token is a JWT
# JWT is decoded here and 'exp' field is extracted
# It is a Unix timestamp for when the token expires
b64_data = self._claims_token.split('.')[1]
data = base64.urlsafe_b64decode(b64_data + '==')
return json.loads(data)['exp']
def claims_token_expired(self):
exp = self._get_claims_token_expiry()
# It will expire in less than 10 seconds, or has already expired
return exp - time.time() < 10
def claims_token_valid(self):
return self._claims_token is not None and not self.claims_token_expired()
def _get_claims_token(self, email, password):
if not self.claims_token_valid():
self._claims_token = self._new_claims_token(email, password)
self.cache.store(self._NETRC_MACHINE, 'claims_token', self._claims_token)
return self._claims_token
def _real_initialize(self):
if self.claims_token_valid():
if self._refresh_token and self._access_token:
self.write_debug('Using cached refresh token')
if not self._claims_token:
self._claims_token = self.cache.load(self._NETRC_MACHINE, 'claims_token')
return
self._claims_token = self.cache.load(self._NETRC_MACHINE, 'claims_token')
try:
self._call_oauth_api({
'grant_type': 'password',
'username': username,
'password': password,
}, note='Logging in')
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 400:
raise ExtractorError('Invalid username and/or password', expected=True)
raise
def _fetch_access_token(self):
if self._is_jwt_expired(self._access_token):
try:
self._call_oauth_api({
'grant_type': 'refresh_token',
'refresh_token': self._refresh_token,
})
except ExtractorError:
self._refresh_token, self._access_token = None, None
self.cache.store(self._NETRC_MACHINE, 'token_data', [None, None])
self.report_warning('Refresh token has been invalidated; retrying with credentials')
self._perform_login(*self._get_login_info())
return self._access_token
def _fetch_claims_token(self):
if not self._get_login_info()[0]:
return None
if not self._claims_token or self._is_jwt_expired(self._claims_token):
self._claims_token = self._download_json(
'https://services.radio-canada.ca/ott/subscription/v2/gem/Subscriber/profile',
None, 'Downloading claims token', query={'device': 'web'},
headers={'Authorization': f'Bearer {self._fetch_access_token()}'})['claimsToken']
self.cache.store(self._NETRC_MACHINE, 'claims_token', self._claims_token)
else:
self.write_debug('Using cached claims token')
return self._claims_token
def _real_extract(self, url):
video_id, season_number = self._match_valid_url(url).group('id', 'season')
@ -675,14 +692,10 @@ class CBCGemIE(CBCGemBaseIE):
item_info = traverse_obj(video_info, (
'content', ..., 'lineups', ..., 'items',
lambda _, v: v['url'] == video_id, any, {require('item info')}))
media_id = item_info['idMedia']
email, password = self._get_login_info()
if email and password:
claims_token = self._get_claims_token(email, password)
headers = {'x-claims-token': claims_token}
else:
headers = {}
headers = {}
if claims_token := self._fetch_claims_token():
headers['x-claims-token'] = claims_token
m3u8_info = self._download_json(
'https://services.radio-canada.ca/media/validation/v2/',
@ -695,7 +708,7 @@ class CBCGemIE(CBCGemBaseIE):
'tech': 'hls',
'manifestVersion': '2',
'manifestType': 'desktop',
'idMedia': media_id,
'idMedia': item_info['idMedia'],
})
if m3u8_info.get('errorCode') == 1:

View File

@ -121,10 +121,7 @@ class CDAIE(InfoExtractor):
}, **kwargs)
def _perform_login(self, username, password):
app_version = random.choice((
'1.2.88 build 15306',
'1.2.174 build 18469',
))
app_version = '1.2.255 build 21541'
android_version = random.randrange(8, 14)
phone_model = random.choice((
# x-kom.pl top selling Android smartphones, as of 2022-12-26
@ -190,7 +187,7 @@ class CDAIE(InfoExtractor):
meta = self._download_json(
f'{self._BASE_API_URL}/video/{video_id}', video_id, headers=self._API_HEADERS)['video']
uploader = traverse_obj(meta, 'author', 'login')
uploader = traverse_obj(meta, ('author', 'login', {str}))
formats = [{
'url': quality['file'],

View File

@ -21,7 +21,7 @@ class CHZZKLiveIE(InfoExtractor):
'channel': '진짜도현',
'channel_id': 'c68b8ef525fb3d2fa146344d84991753',
'channel_is_verified': False,
'thumbnail': r're:^https?://.*\.jpg$',
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1705510344,
'upload_date': '20240117',
'live_status': 'is_live',
@ -98,7 +98,7 @@ class CHZZKVideoIE(InfoExtractor):
'channel': '침착맨',
'channel_id': 'bb382c2c0cc9fa7c86ab3b037fb5799c',
'channel_is_verified': False,
'thumbnail': r're:^https?://.*\.jpg$',
'thumbnail': r're:https?://.+/.+\.jpg',
'duration': 15577,
'timestamp': 1702970505.417,
'upload_date': '20231219',
@ -115,7 +115,7 @@ class CHZZKVideoIE(InfoExtractor):
'channel': '라디유radiyu',
'channel_id': '68f895c59a1043bc5019b5e08c83a5c5',
'channel_is_verified': False,
'thumbnail': r're:^https?://.*\.jpg$',
'thumbnail': r're:https?://.+/.+\.jpg',
'duration': 95,
'timestamp': 1703102631.722,
'upload_date': '20231220',
@ -131,12 +131,30 @@ class CHZZKVideoIE(InfoExtractor):
'channel': '강지',
'channel_id': 'b5ed5db484d04faf4d150aedd362f34b',
'channel_is_verified': True,
'thumbnail': r're:^https?://.*\.jpg$',
'thumbnail': r're:https?://.+/.+\.jpg',
'duration': 4433,
'timestamp': 1703307460.214,
'upload_date': '20231223',
'view_count': int,
},
}, {
# video_status == 'NONE' but is downloadable
'url': 'https://chzzk.naver.com/video/6325166',
'info_dict': {
'id': '6325166',
'ext': 'mp4',
'title': '와이프 숙제빼주기',
'channel': '이 다',
'channel_id': '0076a519f147ee9fd0959bf02f9571ca',
'channel_is_verified': False,
'view_count': int,
'duration': 28167,
'thumbnail': r're:https?://.+/.+\.jpg',
'timestamp': 1742139216.86,
'upload_date': '20250316',
'live_status': 'was_live',
},
'params': {'skip_download': 'm3u8'},
}]
def _real_extract(self, url):
@ -147,11 +165,7 @@ class CHZZKVideoIE(InfoExtractor):
live_status = 'was_live' if video_meta.get('liveOpenDate') else 'not_live'
video_status = video_meta.get('vodStatus')
if video_status == 'UPLOAD':
playback = self._parse_json(video_meta['liveRewindPlaybackJson'], video_id)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
playback['media'][0]['path'], video_id, 'mp4', m3u8_id='hls')
elif video_status == 'ABR_HLS':
if video_status == 'ABR_HLS':
formats, subtitles = self._extract_mpd_formats_and_subtitles(
f'https://apis.naver.com/neonplayer/vodplay/v1/playback/{video_meta["videoId"]}',
video_id, query={
@ -161,10 +175,17 @@ class CHZZKVideoIE(InfoExtractor):
'cpl': 'en_US',
})
else:
self.raise_no_formats(
f'Unknown video status detected: "{video_status}"', expected=True, video_id=video_id)
formats, subtitles = [], {}
live_status = 'post_live' if live_status == 'was_live' else None
fatal = video_status == 'UPLOAD'
playback = self._parse_json(video_meta['liveRewindPlaybackJson'], video_id, fatal=fatal)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
traverse_obj(playback, ('media', 0, 'path')), video_id, 'mp4', m3u8_id='hls', fatal=fatal)
if formats and video_status != 'UPLOAD':
self.write_debug(f'Video found with status: "{video_status}"')
elif not formats:
self.raise_no_formats(
f'Unknown video status detected: "{video_status}"', expected=True, video_id=video_id)
formats, subtitles = [], {}
live_status = 'post_live' if live_status == 'was_live' else None
return {
'id': video_id,

View File

@ -29,6 +29,7 @@ from ..compat import (
from ..cookies import LenientSimpleCookie
from ..downloader.f4m import get_base_url, remove_encrypted_media
from ..downloader.hls import HlsFD
from ..globals import plugin_ies_overrides
from ..networking import HEADRequest, Request
from ..networking.exceptions import (
HTTPError,
@ -77,6 +78,7 @@ from ..utils import (
parse_iso8601,
parse_m3u8_attributes,
parse_resolution,
qualities,
sanitize_url,
smuggle_url,
str_or_none,
@ -1568,6 +1570,8 @@ class InfoExtractor:
"""Yield all json ld objects in the html"""
if default is not NO_DEFAULT:
fatal = False
if not fatal and not isinstance(html, str):
return
for mobj in re.finditer(JSON_LD_RE, html):
json_ld_item = self._parse_json(
mobj.group('json_ld'), video_id, fatal=fatal,
@ -2176,6 +2180,8 @@ class InfoExtractor:
media_url = media.get('URI')
if media_url:
manifest_url = format_url(media_url)
is_audio = media_type == 'AUDIO'
is_alternate = media.get('DEFAULT') == 'NO' or media.get('AUTOSELECT') == 'NO'
formats.extend({
'format_id': join_nonempty(m3u8_id, group_id, name, idx),
'format_note': name,
@ -2188,7 +2194,11 @@ class InfoExtractor:
'preference': preference,
'quality': quality,
'has_drm': has_drm,
'vcodec': 'none' if media_type == 'AUDIO' else None,
'vcodec': 'none' if is_audio else None,
# Alternate audio formats (e.g. audio description) should be deprioritized
'source_preference': -2 if is_audio and is_alternate else None,
# Save this to assign source_preference based on associated video stream
'_audio_group_id': group_id if is_audio and not is_alternate else None,
} for idx in _extract_m3u8_playlist_indices(manifest_url))
def build_stream_name():
@ -2283,6 +2293,8 @@ class InfoExtractor:
# ignore references to rendition groups and treat them
# as complete formats.
if audio_group_id and codecs and f.get('vcodec') != 'none':
# Save this to determine quality of audio formats that only have a GROUP-ID
f['_audio_group_id'] = audio_group_id
audio_group = groups.get(audio_group_id)
if audio_group and audio_group[0].get('URI'):
# TODO: update acodec for audio only formats with
@ -2305,6 +2317,28 @@ class InfoExtractor:
formats.append(http_f)
last_stream_inf = {}
# Some audio-only formats only have a GROUP-ID without any other quality/bitrate/codec info
# Each audio GROUP-ID corresponds with one or more video formats' AUDIO attribute
# For sorting purposes, set source_preference based on the quality of the video formats they are grouped with
# See https://github.com/yt-dlp/yt-dlp/issues/11178
audio_groups_by_quality = orderedSet(f['_audio_group_id'] for f in sorted(
traverse_obj(formats, lambda _, v: v.get('vcodec') != 'none' and v['_audio_group_id']),
key=lambda x: (x.get('tbr') or 0, x.get('width') or 0)))
audio_quality_map = {
audio_groups_by_quality[0]: 'low',
audio_groups_by_quality[-1]: 'high',
} if len(audio_groups_by_quality) > 1 else None
audio_preference = qualities(audio_groups_by_quality)
for fmt in formats:
audio_group_id = fmt.pop('_audio_group_id', None)
if not audio_quality_map or not audio_group_id or fmt.get('vcodec') != 'none':
continue
# Use source_preference since quality and preference are set by params
fmt['source_preference'] = audio_preference(audio_group_id)
fmt['format_note'] = join_nonempty(
fmt.get('format_note'), audio_quality_map.get(audio_group_id), delim=', ')
return formats, subtitles
def _extract_m3u8_vod_duration(
@ -2934,8 +2968,7 @@ class InfoExtractor:
segment_duration = None
if 'total_number' not in representation_ms_info and 'segment_duration' in representation_ms_info:
segment_duration = float_or_none(representation_ms_info['segment_duration'], representation_ms_info['timescale'])
representation_ms_info['total_number'] = int(math.ceil(
float_or_none(period_duration, segment_duration, default=0)))
representation_ms_info['total_number'] = math.ceil(float_or_none(period_duration, segment_duration, default=0))
representation_ms_info['fragments'] = [{
media_location_key: media_template % {
'Number': segment_number,
@ -3954,14 +3987,18 @@ class InfoExtractor:
def __init_subclass__(cls, *, plugin_name=None, **kwargs):
if plugin_name:
mro = inspect.getmro(cls)
super_class = cls.__wrapped__ = mro[mro.index(cls) + 1]
cls.PLUGIN_NAME, cls.ie_key = plugin_name, super_class.ie_key
cls.IE_NAME = f'{super_class.IE_NAME}+{plugin_name}'
next_mro_class = super_class = mro[mro.index(cls) + 1]
while getattr(super_class, '__wrapped__', None):
super_class = super_class.__wrapped__
setattr(sys.modules[super_class.__module__], super_class.__name__, cls)
_PLUGIN_OVERRIDES[super_class].append(cls)
if not any(override.PLUGIN_NAME == plugin_name for override in plugin_ies_overrides.value[super_class]):
cls.__wrapped__ = next_mro_class
cls.PLUGIN_NAME, cls.ie_key = plugin_name, next_mro_class.ie_key
cls.IE_NAME = f'{next_mro_class.IE_NAME}+{plugin_name}'
setattr(sys.modules[super_class.__module__], super_class.__name__, cls)
plugin_ies_overrides.value[super_class].append(cls)
return super().__init_subclass__(**kwargs)
@ -4017,6 +4054,3 @@ class UnsupportedURLIE(InfoExtractor):
def _real_extract(self, url):
raise UnsupportedError(url)
_PLUGIN_OVERRIDES = collections.defaultdict(list)

View File

@ -5,7 +5,9 @@ from ..utils import (
int_or_none,
try_get,
unified_strdate,
url_or_none,
)
from ..utils.traversal import traverse_obj
class CrowdBunkerIE(InfoExtractor):
@ -44,16 +46,15 @@ class CrowdBunkerIE(InfoExtractor):
'url': sub_url,
})
mpd_url = try_get(video_json, lambda x: x['dashManifest']['url'])
if mpd_url:
fmts, subs = self._extract_mpd_formats_and_subtitles(mpd_url, video_id)
if mpd_url := traverse_obj(video_json, ('dashManifest', 'url', {url_or_none})):
fmts, subs = self._extract_mpd_formats_and_subtitles(mpd_url, video_id, mpd_id='dash', fatal=False)
formats.extend(fmts)
subtitles = self._merge_subtitles(subtitles, subs)
m3u8_url = try_get(video_json, lambda x: x['hlsManifest']['url'])
if m3u8_url:
fmts, subs = self._extract_m3u8_formats_and_subtitles(mpd_url, video_id)
self._merge_subtitles(subs, target=subtitles)
if m3u8_url := traverse_obj(video_json, ('hlsManifest', 'url', {url_or_none})):
fmts, subs = self._extract_m3u8_formats_and_subtitles(m3u8_url, video_id, m3u8_id='hls', fatal=False)
formats.extend(fmts)
subtitles = self._merge_subtitles(subtitles, subs)
self._merge_subtitles(subs, target=subtitles)
thumbnails = [{
'url': image['url'],

View File

@ -3,7 +3,7 @@ from ..utils import int_or_none
class CultureUnpluggedIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cultureunplugged\.com/documentary/watch-online/play/(?P<id>\d+)(?:/(?P<display_id>[^/]+))?'
_VALID_URL = r'https?://(?:www\.)?cultureunplugged\.com/(?:documentary/watch-online/)?play/(?P<id>\d+)(?:/(?P<display_id>[^/#?]+))?'
_TESTS = [{
'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662/The-Next--Best-West',
'md5': 'ac6c093b089f7d05e79934dcb3d228fc',
@ -12,12 +12,25 @@ class CultureUnpluggedIE(InfoExtractor):
'display_id': 'The-Next--Best-West',
'ext': 'mp4',
'title': 'The Next, Best West',
'description': 'md5:0423cd00833dea1519cf014e9d0903b1',
'description': 'md5:770033a3b7c2946a3bcfb7f1c6fb7045',
'thumbnail': r're:^https?://.*\.jpg$',
'creator': 'Coldstream Creative',
'creators': ['Coldstream Creative'],
'duration': 2203,
'view_count': int,
},
}, {
'url': 'https://www.cultureunplugged.com/play/2833/Koi-Sunta-Hai--Journeys-with-Kumar---Kabir--Someone-is-Listening-',
'md5': 'dc2014bc470dfccba389a1c934fa29fa',
'info_dict': {
'id': '2833',
'display_id': 'Koi-Sunta-Hai--Journeys-with-Kumar---Kabir--Someone-is-Listening-',
'ext': 'mp4',
'title': 'Koi Sunta Hai: Journeys with Kumar & Kabir (Someone is Listening)',
'description': 'md5:fa94ac934927c98660362b8285b2cda5',
'view_count': int,
'thumbnail': 'https://s3.amazonaws.com/cdn.cultureunplugged.com/thumbnails_16_9/lg/2833.jpg',
'creators': ['Srishti'],
},
}, {
'url': 'http://www.cultureunplugged.com/documentary/watch-online/play/53662',
'only_matching': True,

View File

@ -100,7 +100,7 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
class DailymotionIE(DailymotionBaseInfoExtractor):
_VALID_URL = r'''(?ix)
https?://
(?:https?:)?//
(?:
dai\.ly/|
(?:
@ -116,7 +116,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
(?P<id>[^/?_&#]+)(?:[\w-]*\?playlist=(?P<playlist_id>x[0-9a-z]+))?
'''
IE_NAME = 'dailymotion'
_EMBED_REGEX = [r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1']
_EMBED_REGEX = [rf'(?ix)<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)["\'](?P<url>{_VALID_URL[5:]})']
_TESTS = [{
'url': 'http://www.dailymotion.com/video/x5kesuj_office-christmas-party-review-jason-bateman-olivia-munn-t-j-miller_news',
'md5': '074b95bdee76b9e3654137aee9c79dfe',
@ -308,6 +308,25 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'description': 'Que lindura',
'tags': [],
},
}, {
# //geo.dailymotion.com/player/xysxq.html?video=k2Y4Mjp7krAF9iCuINM
'url': 'https://lcp.fr/programmes/avant-la-catastrophe-la-naissance-de-la-dictature-nazie-1933-1936-346819',
'info_dict': {
'id': 'k2Y4Mjp7krAF9iCuINM',
'ext': 'mp4',
'title': 'Avant la catastrophe la naissance de la dictature nazie 1933 -1936',
'description': 'md5:7b620d5e26edbe45f27bbddc1c0257c1',
'uploader': 'LCP Assemblée nationale',
'uploader_id': 'xbz33d',
'view_count': int,
'like_count': int,
'age_limit': 0,
'duration': 3220,
'thumbnail': 'https://s1.dmcdn.net/v/Xvumk1djJBUZfjj2a/x1080',
'tags': [],
'timestamp': 1739919947,
'upload_date': '20250218',
},
}]
_GEO_BYPASS = False
_COMMON_MEDIA_FIELDS = '''description

View File

@ -1,142 +0,0 @@
import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
orderedSet,
)
class DeezerBaseInfoExtractor(InfoExtractor):
def get_data(self, url):
if not self.get_param('test'):
self.report_warning('For now, this extractor only supports the 30 second previews. Patches welcome!')
mobj = self._match_valid_url(url)
data_id = mobj.group('id')
webpage = self._download_webpage(url, data_id)
geoblocking_msg = self._html_search_regex(
r'<p class="soon-txt">(.*?)</p>', webpage, 'geoblocking message',
default=None)
if geoblocking_msg is not None:
raise ExtractorError(
f'Deezer said: {geoblocking_msg}', expected=True)
data_json = self._search_regex(
(r'__DZR_APP_STATE__\s*=\s*({.+?})\s*</script>',
r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n'),
webpage, 'data JSON')
data = json.loads(data_json)
return data_id, webpage, data
class DeezerPlaylistIE(DeezerBaseInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?deezer\.com/(../)?playlist/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.deezer.com/playlist/176747451',
'info_dict': {
'id': '176747451',
'title': 'Best!',
'uploader': 'anonymous',
'thumbnail': r're:^https?://(e-)?cdns-images\.dzcdn\.net/images/cover/.*\.jpg$',
},
'playlist_count': 29,
}
def _real_extract(self, url):
playlist_id, webpage, data = self.get_data(url)
playlist_title = data.get('DATA', {}).get('TITLE')
playlist_uploader = data.get('DATA', {}).get('PARENT_USERNAME')
playlist_thumbnail = self._search_regex(
r'<img id="naboo_playlist_image".*?src="([^"]+)"', webpage,
'playlist thumbnail')
entries = []
for s in data.get('SONGS', {}).get('data'):
formats = [{
'format_id': 'preview',
'url': s.get('MEDIA', [{}])[0].get('HREF'),
'preference': -100, # Only the first 30 seconds
'ext': 'mp3',
}]
artists = ', '.join(
orderedSet(a.get('ART_NAME') for a in s.get('ARTISTS')))
entries.append({
'id': s.get('SNG_ID'),
'duration': int_or_none(s.get('DURATION')),
'title': '{} - {}'.format(artists, s.get('SNG_TITLE')),
'uploader': s.get('ART_NAME'),
'uploader_id': s.get('ART_ID'),
'age_limit': 16 if s.get('EXPLICIT_LYRICS') == '1' else 0,
'formats': formats,
})
return {
'_type': 'playlist',
'id': playlist_id,
'title': playlist_title,
'uploader': playlist_uploader,
'thumbnail': playlist_thumbnail,
'entries': entries,
}
class DeezerAlbumIE(DeezerBaseInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?deezer\.com/(../)?album/(?P<id>[0-9]+)'
_TEST = {
'url': 'https://www.deezer.com/fr/album/67505622',
'info_dict': {
'id': '67505622',
'title': 'Last Week',
'uploader': 'Home Brew',
'thumbnail': r're:^https?://(e-)?cdns-images\.dzcdn\.net/images/cover/.*\.jpg$',
},
'playlist_count': 7,
}
def _real_extract(self, url):
album_id, webpage, data = self.get_data(url)
album_title = data.get('DATA', {}).get('ALB_TITLE')
album_uploader = data.get('DATA', {}).get('ART_NAME')
album_thumbnail = self._search_regex(
r'<img id="naboo_album_image".*?src="([^"]+)"', webpage,
'album thumbnail')
entries = []
for s in data.get('SONGS', {}).get('data'):
formats = [{
'format_id': 'preview',
'url': s.get('MEDIA', [{}])[0].get('HREF'),
'preference': -100, # Only the first 30 seconds
'ext': 'mp3',
}]
artists = ', '.join(
orderedSet(a.get('ART_NAME') for a in s.get('ARTISTS')))
entries.append({
'id': s.get('SNG_ID'),
'duration': int_or_none(s.get('DURATION')),
'title': '{} - {}'.format(artists, s.get('SNG_TITLE')),
'uploader': s.get('ART_NAME'),
'uploader_id': s.get('ART_ID'),
'age_limit': 16 if s.get('EXPLICIT_LYRICS') == '1' else 0,
'formats': formats,
'track': s.get('SNG_TITLE'),
'track_number': int_or_none(s.get('TRACK_NUMBER')),
'track_id': s.get('SNG_ID'),
'artist': album_uploader,
'album': album_title,
'album_artist': album_uploader,
})
return {
'_type': 'playlist',
'id': album_id,
'title': album_title,
'uploader': album_uploader,
'thumbnail': album_thumbnail,
'entries': entries,
}

View File

@ -1,28 +1,37 @@
import contextlib
import inspect
import os
from ..plugins import load_plugins
from ..globals import LAZY_EXTRACTORS
from ..globals import extractors as _extractors_context
# NB: Must be before other imports so that plugins can be correctly injected
_PLUGIN_CLASSES = load_plugins('extractor', 'IE')
_CLASS_LOOKUP = None
if os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
LAZY_EXTRACTORS.value = False
else:
try:
from .lazy_extractors import _CLASS_LOOKUP
LAZY_EXTRACTORS.value = True
except ImportError:
LAZY_EXTRACTORS.value = None
_LAZY_LOADER = False
if not os.environ.get('YTDLP_NO_LAZY_EXTRACTORS'):
with contextlib.suppress(ImportError):
from .lazy_extractors import * # noqa: F403
from .lazy_extractors import _ALL_CLASSES
_LAZY_LOADER = True
if not _CLASS_LOOKUP:
from . import _extractors
if not _LAZY_LOADER:
from ._extractors import * # noqa: F403
_ALL_CLASSES = [ # noqa: F811
klass
for name, klass in globals().items()
_CLASS_LOOKUP = {
name: value
for name, value in inspect.getmembers(_extractors)
if name.endswith('IE') and name != 'GenericIE'
]
_ALL_CLASSES.append(GenericIE) # noqa: F405
}
_CLASS_LOOKUP['GenericIE'] = _extractors.GenericIE
globals().update(_PLUGIN_CLASSES)
_ALL_CLASSES[:0] = _PLUGIN_CLASSES.values()
# We want to append to the main lookup
_current = _extractors_context.value
for name, ie in _CLASS_LOOKUP.items():
_current.setdefault(name, ie)
from .common import _PLUGIN_OVERRIDES # noqa: F401
def __getattr__(name):
value = _CLASS_LOOKUP.get(name)
if not value:
raise AttributeError(f'module {__name__} has no attribute {name}')
return value

View File

@ -0,0 +1,87 @@
import urllib.parse
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
float_or_none,
url_or_none,
)
from ..utils.traversal import traverse_obj
class FrancaisFacileIE(InfoExtractor):
_VALID_URL = r'https?://francaisfacile\.rfi\.fr/[a-z]{2}/(?:actualit%C3%A9|podcasts/[^/#?]+)/(?P<id>[^/#?]+)'
_TESTS = [{
'url': 'https://francaisfacile.rfi.fr/fr/actualit%C3%A9/20250305-r%C3%A9concilier-les-jeunes-avec-la-lecture-gr%C3%A2ce-aux-r%C3%A9seaux-sociaux',
'md5': '4f33674cb205744345cc835991100afa',
'info_dict': {
'id': 'WBMZ58952-FLE-FR-20250305',
'display_id': '20250305-réconcilier-les-jeunes-avec-la-lecture-grâce-aux-réseaux-sociaux',
'title': 'Réconcilier les jeunes avec la lecture grâce aux réseaux sociaux',
'url': 'https://aod-fle.akamaized.net/fle/sounds/fr/2025/03/05/6b6af52a-f9ba-11ef-a1f8-005056a97652.mp3',
'ext': 'mp3',
'description': 'md5:b903c63d8585bd59e8cc4d5f80c4272d',
'duration': 103.15,
'timestamp': 1741177984,
'upload_date': '20250305',
},
}, {
'url': 'https://francaisfacile.rfi.fr/fr/actualit%C3%A9/20250307-argentine-le-sac-d-un-alpiniste-retrouv%C3%A9-40-ans-apr%C3%A8s-sa-mort',
'md5': 'b8c3a63652d4ae8e8092dda5700c1cd9',
'info_dict': {
'id': 'WBMZ59102-FLE-FR-20250307',
'display_id': '20250307-argentine-le-sac-d-un-alpiniste-retrouvé-40-ans-après-sa-mort',
'title': 'Argentine: le sac d\'un alpiniste retrouvé 40 ans après sa mort',
'url': 'https://aod-fle.akamaized.net/fle/sounds/fr/2025/03/07/8edf4082-fb46-11ef-8a37-005056bf762b.mp3',
'ext': 'mp3',
'description': 'md5:7fd088fbdf4a943bb68cf82462160dca',
'duration': 117.74,
'timestamp': 1741352789,
'upload_date': '20250307',
},
}, {
'url': 'https://francaisfacile.rfi.fr/fr/podcasts/un-mot-une-histoire/20250317-le-mot-de-david-foenkinos-peut-%C3%AAtre',
'md5': 'db83c2cc2589b4c24571c6b6cf14f5f1',
'info_dict': {
'id': 'WBMZ59441-FLE-FR-20250317',
'display_id': '20250317-le-mot-de-david-foenkinos-peut-être',
'title': 'Le mot de David Foenkinos: «peut-être» - Un mot, une histoire',
'url': 'https://aod-fle.akamaized.net/fle/sounds/fr/2025/03/17/4ca6cbbe-0315-11f0-a85b-005056a97652.mp3',
'ext': 'mp3',
'description': 'md5:3fe35fae035803df696bfa7af2496e49',
'duration': 198.96,
'timestamp': 1742210897,
'upload_date': '20250317',
},
}]
def _real_extract(self, url):
display_id = urllib.parse.unquote(self._match_id(url))
try: # yt-dlp's default user-agents are too old and blocked by the site
webpage = self._download_webpage(url, display_id, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:136.0) Gecko/20100101 Firefox/136.0',
})
except ExtractorError as e:
if not isinstance(e.cause, HTTPError) or e.cause.status != 403:
raise
# Retry with impersonation if hardcoded UA is insufficient
webpage = self._download_webpage(url, display_id, impersonate=True)
data = self._search_json(
r'<script[^>]+\bdata-media-id=[^>]+\btype="application/json"[^>]*>',
webpage, 'audio data', display_id)
return {
'id': data['mediaId'],
'display_id': display_id,
'vcodec': 'none',
'title': self._html_extract_title(webpage),
**self._search_json_ld(webpage, display_id, fatal=False),
**traverse_obj(data, {
'title': ('title', {str}),
'url': ('sources', ..., 'url', {url_or_none}, any),
'duration': ('sources', ..., 'duration', {float_or_none}, any),
}),
}

View File

@ -16,6 +16,7 @@ from ..utils import (
MEDIA_EXTENSIONS,
ExtractorError,
UnsupportedError,
base_url,
determine_ext,
determine_protocol,
dict_get,
@ -2213,10 +2214,21 @@ class GenericIE(InfoExtractor):
if is_live is not None:
info['live_status'] = 'not_live' if is_live == 'false' else 'is_live'
return
headers = m3u8_format.get('http_headers') or info.get('http_headers')
duration = self._extract_m3u8_vod_duration(
m3u8_format['url'], info.get('id'), note='Checking m3u8 live status',
errnote='Failed to download m3u8 media playlist', headers=headers)
headers = m3u8_format.get('http_headers') or info.get('http_headers') or {}
display_id = info.get('id')
urlh = self._request_webpage(
m3u8_format['url'], display_id, 'Checking m3u8 live status', errnote=False,
headers={**headers, 'Accept-Encoding': 'identity'}, fatal=False)
if urlh is False:
return
first_bytes = urlh.read(512)
if not first_bytes.startswith(b'#EXTM3U'):
return
m3u8_doc = self._webpage_read_content(
urlh, urlh.url, display_id, prefix=first_bytes, fatal=False, errnote=False)
if not m3u8_doc:
return
duration = self._parse_m3u8_vod_duration(m3u8_doc, display_id)
if not duration:
info['live_status'] = 'is_live'
info['duration'] = info.get('duration') or duration
@ -2531,7 +2543,7 @@ class GenericIE(InfoExtractor):
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'], info_dict['subtitles'] = self._parse_mpd_formats_and_subtitles(
doc,
mpd_base_url=full_response.url.rpartition('/')[0],
mpd_base_url=base_url(full_response.url),
mpd_url=url)
info_dict['live_status'] = 'is_live' if doc.get('type') == 'dynamic' else None
self._extra_manifest_info(info_dict, url)

View File

@ -1,19 +0,0 @@
from .common import InfoExtractor
from ..utils import (
ExtractorError,
urlencode_postdata,
)
class GigyaBaseIE(InfoExtractor):
def _gigya_login(self, auth_data):
auth_info = self._download_json(
'https://accounts.eu1.gigya.com/accounts.login', None,
note='Logging in', errnote='Unable to log in',
data=urlencode_postdata(auth_data))
error_message = auth_info.get('errorDetails') or auth_info.get('errorMessage')
if error_message:
raise ExtractorError(
f'Unable to login: {error_message}', expected=True)
return auth_info

View File

@ -69,8 +69,13 @@ class GloboIE(InfoExtractor):
'info_dict': {
'id': '8013907',
'ext': 'mp4',
'title': 'Capítulo de 14081989',
'title': 'Capítulo de 14/08/1989',
'episode': 'Episode 1',
'episode_number': 1,
'uploader': 'Tieta',
'uploader_id': '11895',
'duration': 2858.389,
'subtitles': 'count:1',
},
'params': {
'skip_download': True,
@ -82,7 +87,12 @@ class GloboIE(InfoExtractor):
'id': '12824146',
'ext': 'mp4',
'title': 'Acordo de damas',
'episode': 'Episode 1',
'episode_number': 1,
'uploader': 'Rensga Hits!',
'uploader_id': '20481',
'duration': 1953.994,
'season': 'Season 2',
'season_number': 2,
},
'params': {
@ -136,9 +146,10 @@ class GloboIE(InfoExtractor):
else:
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
main_source['url'], video_id, 'mp4', m3u8_id='hls')
self._merge_subtitles(traverse_obj(main_source, ('text', ..., {
'url': ('subtitle', 'srt', 'url', {url_or_none}),
}, all, {subs_list_to_dict(lang='en')})), target=subtitles)
self._merge_subtitles(traverse_obj(main_source, ('text', ..., ('caption', 'subtitle'), {
'url': ('srt', 'url', {url_or_none}),
}, all, {subs_list_to_dict(lang='pt-BR')})), target=subtitles)
return {
'id': video_id,

View File

@ -6,7 +6,7 @@ from ..utils import (
)
class HSEShowBaseInfoExtractor(InfoExtractor):
class HSEShowBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE']
def _extract_redux_data(self, url, video_id):
@ -28,7 +28,7 @@ class HSEShowBaseInfoExtractor(InfoExtractor):
return formats, subtitles
class HSEShowIE(HSEShowBaseInfoExtractor):
class HSEShowIE(HSEShowBaseIE):
_VALID_URL = r'https?://(?:www\.)?hse\.de/dpl/c/tv-shows/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'https://www.hse.de/dpl/c/tv-shows/505350',
@ -64,7 +64,7 @@ class HSEShowIE(HSEShowBaseInfoExtractor):
}
class HSEProductIE(HSEShowBaseInfoExtractor):
class HSEProductIE(HSEShowBaseIE):
_VALID_URL = r'https?://(?:www\.)?hse\.de/dpl/p/product/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'https://www.hse.de/dpl/p/product/408630',

View File

@ -1,5 +1,13 @@
from .common import InfoExtractor
from ..utils import ExtractorError, str_or_none, traverse_obj, unified_strdate
from ..utils import (
ExtractorError,
int_or_none,
str_or_none,
traverse_obj,
unified_strdate,
url_or_none,
)
class IchinanaLiveIE(InfoExtractor):
@ -157,3 +165,51 @@ class IchinanaLiveClipIE(InfoExtractor):
'description': view_data.get('caption'),
'upload_date': unified_strdate(str_or_none(view_data.get('createdAt'))),
}
class IchinanaLiveVODIE(InfoExtractor):
IE_NAME = '17live:vod'
_VALID_URL = r'https?://(?:www\.)?17\.live/ja/vod/[^/?#]+/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://17.live/ja/vod/27323042/2cf84520-e65e-4b22-891e-1d3a00b0f068',
'md5': '3299b930d7457b069639486998a89580',
'info_dict': {
'id': '2cf84520-e65e-4b22-891e-1d3a00b0f068',
'ext': 'mp4',
'title': 'md5:b5f8cbf497d54cc6a60eb3b480182f01',
'uploader': 'md5:29fb12122ab94b5a8495586e7c3085a5',
'uploader_id': '27323042',
'channel': '🌟オールナイトニッポン アーカイブ🌟',
'channel_id': '2b4f85f1-d61e-429d-a901-68d32bdd8645',
'like_count': int,
'view_count': int,
'thumbnail': r're:https?://.+/.+\.(?:jpe?g|png)',
'duration': 549,
'description': 'md5:116f326579700f00eaaf5581aae1192e',
'timestamp': 1741058645,
'upload_date': '20250304',
},
}, {
'url': 'https://17.live/ja/vod/27323042/0de11bac-9bea-40b8-9eab-0239a7d88079',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
json_data = self._download_json(f'https://wap-api.17app.co/api/v1/vods/{video_id}', video_id)
return traverse_obj(json_data, {
'id': ('vodID', {str}),
'title': ('title', {str}),
'formats': ('vodURL', {lambda x: self._extract_m3u8_formats(x, video_id)}),
'uploader': ('userInfo', 'displayName', {str}),
'uploader_id': ('userInfo', 'roomID', {int}, {str_or_none}),
'channel': ('userInfo', 'name', {str}),
'channel_id': ('userInfo', 'userID', {str}),
'like_count': ('likeCount', {int_or_none}),
'view_count': ('viewCount', {int_or_none}),
'thumbnail': ('imageURL', {url_or_none}),
'duration': ('duration', {int_or_none}),
'description': ('description', {str}),
'timestamp': ('createdAt', {int_or_none}),
})

View File

@ -2,12 +2,12 @@ import hashlib
import itertools
import json
import re
import time
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
bug_reports_message,
decode_base_n,
encode_base_n,
filter_dict,
@ -15,12 +15,12 @@ from ..utils import (
format_field,
get_element_by_attribute,
int_or_none,
join_nonempty,
lowercase_escape,
str_or_none,
str_to_int,
traverse_obj,
url_or_none,
urlencode_postdata,
)
_ENCODING_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'
@ -28,63 +28,30 @@ _ENCODING_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz012345678
def _pk_to_id(media_id):
"""Source: https://stackoverflow.com/questions/24437823/getting-instagram-post-url-from-media-id"""
return encode_base_n(int(media_id.split('_')[0]), table=_ENCODING_CHARS)
pk = int(str(media_id).split('_')[0])
return encode_base_n(pk, table=_ENCODING_CHARS)
def _id_to_pk(shortcode):
"""Covert a shortcode to a numeric value"""
return decode_base_n(shortcode[:11], table=_ENCODING_CHARS)
"""Convert a shortcode to a numeric value"""
if len(shortcode) > 28:
shortcode = shortcode[:-28]
return decode_base_n(shortcode, table=_ENCODING_CHARS)
class InstagramBaseIE(InfoExtractor):
_NETRC_MACHINE = 'instagram'
_IS_LOGGED_IN = False
_API_BASE_URL = 'https://i.instagram.com/api/v1'
_LOGIN_URL = 'https://www.instagram.com/accounts/login'
_API_HEADERS = {
'X-IG-App-ID': '936619743392459',
'X-ASBD-ID': '198387',
'X-IG-WWW-Claim': '0',
'Origin': 'https://www.instagram.com',
'Accept': '*/*',
}
def _perform_login(self, username, password):
if self._IS_LOGGED_IN:
return
login_webpage = self._download_webpage(
self._LOGIN_URL, None, note='Downloading login webpage', errnote='Failed to download login webpage')
shared_data = self._parse_json(self._search_regex(
r'window\._sharedData\s*=\s*({.+?});', login_webpage, 'shared data', default='{}'), None)
login = self._download_json(
f'{self._LOGIN_URL}/ajax/', None, note='Logging in', headers={
**self._API_HEADERS,
'X-Requested-With': 'XMLHttpRequest',
'X-CSRFToken': shared_data['config']['csrf_token'],
'X-Instagram-AJAX': shared_data['rollout_hash'],
'Referer': 'https://www.instagram.com/',
}, data=urlencode_postdata({
'enc_password': f'#PWD_INSTAGRAM_BROWSER:0:{int(time.time())}:{password}',
'username': username,
'queryParams': '{}',
'optIntoOneTap': 'false',
'stopDeletionNonce': '',
'trustedDeviceRecords': '{}',
}))
if not login.get('authenticated'):
if login.get('message'):
raise ExtractorError(f'Unable to login: {login["message"]}')
elif login.get('user'):
raise ExtractorError('Unable to login: Sorry, your password was incorrect. Please double-check your password.', expected=True)
elif login.get('user') is False:
raise ExtractorError('Unable to login: The username you entered doesn\'t belong to an account. Please check your username and try again.', expected=True)
raise ExtractorError('Unable to login')
InstagramBaseIE._IS_LOGGED_IN = True
@property
def _api_headers(self):
return {
'X-IG-App-ID': self._configuration_arg('app_id', ['936619743392459'], ie_key=InstagramIE)[0],
'X-ASBD-ID': '198387',
'X-IG-WWW-Claim': '0',
'Origin': 'https://www.instagram.com',
'Accept': '*/*',
}
def _get_count(self, media, kind, *keys):
return traverse_obj(
@ -209,7 +176,7 @@ class InstagramBaseIE(InfoExtractor):
def _get_comments(self, video_id):
comments_info = self._download_json(
f'{self._API_BASE_URL}/media/{_id_to_pk(video_id)}/comments/?can_support_threading=true&permalink_enabled=false', video_id,
fatal=False, errnote='Comments extraction failed', note='Downloading comments info', headers=self._API_HEADERS) or {}
fatal=False, errnote='Comments extraction failed', note='Downloading comments info', headers=self._api_headers) or {}
comment_data = traverse_obj(comments_info, ('edge_media_to_parent_comment', 'edges'), 'comments')
for comment_dict in comment_data or []:
@ -402,14 +369,14 @@ class InstagramIE(InstagramBaseIE):
info = traverse_obj(self._download_json(
f'{self._API_BASE_URL}/media/{_id_to_pk(video_id)}/info/', video_id,
fatal=False, errnote='Video info extraction failed',
note='Downloading video info', headers=self._API_HEADERS), ('items', 0))
note='Downloading video info', headers=self._api_headers), ('items', 0))
if info:
media.update(info)
return self._extract_product(media)
api_check = self._download_json(
f'{self._API_BASE_URL}/web/get_ruling_for_content/?content_type=MEDIA&target_id={_id_to_pk(video_id)}',
video_id, headers=self._API_HEADERS, fatal=False, note='Setting up session', errnote=False) or {}
video_id, headers=self._api_headers, fatal=False, note='Setting up session', errnote=False) or {}
csrf_token = self._get_cookies('https://www.instagram.com').get('csrftoken')
if not csrf_token:
@ -429,7 +396,7 @@ class InstagramIE(InstagramBaseIE):
general_info = self._download_json(
'https://www.instagram.com/graphql/query/', video_id, fatal=False, errnote=False,
headers={
**self._API_HEADERS,
**self._api_headers,
'X-CSRFToken': csrf_token or '',
'X-Requested-With': 'XMLHttpRequest',
'Referer': url,
@ -437,7 +404,6 @@ class InstagramIE(InstagramBaseIE):
'doc_id': '8845758582119845',
'variables': json.dumps(variables, separators=(',', ':')),
})
media.update(traverse_obj(general_info, ('data', 'xdt_shortcode_media')) or {})
if not general_info:
self.report_warning('General metadata extraction failed (some metadata might be missing).', video_id)
@ -466,6 +432,26 @@ class InstagramIE(InstagramBaseIE):
media.update(traverse_obj(
additional_data, ('graphql', 'shortcode_media'), 'shortcode_media', expected_type=dict) or {})
else:
xdt_shortcode_media = traverse_obj(general_info, ('data', 'xdt_shortcode_media', {dict})) or {}
if not xdt_shortcode_media:
error = join_nonempty('title', 'description', delim=': ', from_dict=api_check)
if 'Restricted Video' in error:
self.raise_login_required(error)
elif error:
raise ExtractorError(error, expected=True)
elif len(video_id) > 28:
# It's a private post (video_id == shortcode + 28 extra characters)
# Only raise after getting empty response; sometimes "long"-shortcode posts are public
self.raise_login_required(
'This content is only available for registered users who follow this account')
raise ExtractorError(
'Instagram sent an empty media response. Check if this post is accessible in your '
f'browser without being logged-in. If it is not, then u{self._login_hint()[1:]}. '
'Otherwise, if the post is accessible in browser without being logged-in'
f'{bug_reports_message(before=",")}', expected=True)
media.update(xdt_shortcode_media)
username = traverse_obj(media, ('owner', 'username')) or self._search_regex(
r'"owner"\s*:\s*{\s*"username"\s*:\s*"(.+?)"', webpage, 'username', fatal=False)
@ -485,8 +471,7 @@ class InstagramIE(InstagramBaseIE):
return self.playlist_result(
self._extract_nodes(nodes, True), video_id,
format_field(username, None, 'Post by %s'), description)
video_url = self._og_search_video_url(webpage, secure=False)
raise ExtractorError('There is no video in this post', expected=True)
formats = [{
'url': video_url,
@ -689,7 +674,7 @@ class InstagramTagIE(InstagramPlaylistBaseIE):
class InstagramStoryIE(InstagramBaseIE):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/stories/(?P<user>[^/]+)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?instagram\.com/stories/(?P<user>[^/?#]+)(?:/(?P<id>\d+))?'
IE_NAME = 'instagram:story'
_TESTS = [{
@ -699,25 +684,38 @@ class InstagramStoryIE(InstagramBaseIE):
'title': 'Rare',
},
'playlist_mincount': 50,
}, {
'url': 'https://www.instagram.com/stories/fruits_zipper/3570766765028588805/',
'only_matching': True,
}, {
'url': 'https://www.instagram.com/stories/fruits_zipper',
'only_matching': True,
}]
def _real_extract(self, url):
username, story_id = self._match_valid_url(url).groups()
story_info = self._download_webpage(url, story_id)
user_info = self._search_json(r'"user":', story_info, 'user info', story_id, fatal=False)
username, story_id = self._match_valid_url(url).group('user', 'id')
if username == 'highlights' and not story_id: # story id is only mandatory for highlights
raise ExtractorError('Input URL is missing a highlight ID', expected=True)
display_id = story_id or username
story_info = self._download_webpage(url, display_id)
user_info = self._search_json(r'"user":', story_info, 'user info', display_id, fatal=False)
if not user_info:
self.raise_login_required('This content is unreachable')
user_id = traverse_obj(user_info, 'pk', 'id', expected_type=str)
story_info_url = user_id if username != 'highlights' else f'highlight:{story_id}'
if not story_info_url: # user id is only mandatory for non-highlights
raise ExtractorError('Unable to extract user id')
if username == 'highlights':
story_info_url = f'highlight:{story_id}'
else:
if not user_id: # user id is only mandatory for non-highlights
raise ExtractorError('Unable to extract user id')
story_info_url = user_id
videos = traverse_obj(self._download_json(
f'{self._API_BASE_URL}/feed/reels_media/?reel_ids={story_info_url}',
story_id, errnote=False, fatal=False, headers=self._API_HEADERS), 'reels')
display_id, errnote=False, fatal=False, headers=self._api_headers), 'reels')
if not videos:
self.raise_login_required('You need to log in to access this content')
user_info = traverse_obj(videos, (user_id, 'user', {dict})) or {}
full_name = traverse_obj(videos, (f'highlight:{story_id}', 'user', 'full_name'), (user_id, 'user', 'full_name'))
story_title = traverse_obj(videos, (f'highlight:{story_id}', 'title'))
@ -727,6 +725,7 @@ class InstagramStoryIE(InstagramBaseIE):
highlights = traverse_obj(videos, (f'highlight:{story_id}', 'items'), (user_id, 'items'))
info_data = []
for highlight in highlights:
highlight.setdefault('user', {}).update(user_info)
highlight_data = self._extract_product(highlight)
if highlight_data.get('formats'):
info_data.append({
@ -734,4 +733,7 @@ class InstagramStoryIE(InstagramBaseIE):
'uploader_id': user_id,
**filter_dict(highlight_data),
})
if username != 'highlights' and story_id and not self._yes_playlist(username, story_id):
return traverse_obj(info_data, (lambda _, v: v['id'] == _pk_to_id(story_id), any))
return self.playlist_result(info_data, playlist_id=story_id, playlist_title=story_title)

78
yt_dlp/extractor/ivoox.py Normal file
View File

@ -0,0 +1,78 @@
from .common import InfoExtractor
from ..utils import int_or_none, parse_iso8601, url_or_none, urljoin
from ..utils.traversal import traverse_obj
class IvooxIE(InfoExtractor):
_VALID_URL = (
r'https?://(?:www\.)?ivoox\.com/(?:\w{2}/)?[^/?#]+_rf_(?P<id>[0-9]+)_1\.html',
r'https?://go\.ivoox\.com/rf/(?P<id>[0-9]+)',
)
_TESTS = [{
'url': 'https://www.ivoox.com/dex-08x30-rostros-del-mal-los-asesinos-en-audios-mp3_rf_143594959_1.html',
'md5': '993f712de5b7d552459fc66aa3726885',
'info_dict': {
'id': '143594959',
'ext': 'mp3',
'timestamp': 1742731200,
'channel': 'DIAS EXTRAÑOS con Santiago Camacho',
'title': 'DEx 08x30 Rostros del mal: Los asesinos en serie que aterrorizaron España',
'description': 'md5:eae8b4b9740d0216d3871390b056bb08',
'uploader': 'Santiago Camacho',
'thumbnail': 'https://static-1.ivoox.com/audios/c/d/5/2/cd52f46783fe735000c33a803dce2554_XXL.jpg',
'upload_date': '20250323',
'episode': 'DEx 08x30 Rostros del mal: Los asesinos en serie que aterrorizaron España',
'duration': 11837,
'tags': ['españa', 'asesinos en serie', 'arropiero', 'historia criminal', 'mataviejas'],
},
}, {
'url': 'https://go.ivoox.com/rf/143594959',
'only_matching': True,
}, {
'url': 'https://www.ivoox.com/en/campodelgas-28-03-2025-audios-mp3_rf_144036942_1.html',
'only_matching': True,
}]
def _real_extract(self, url):
media_id = self._match_id(url)
webpage = self._download_webpage(url, media_id, fatal=False)
data = self._search_nuxt_data(
webpage, media_id, fatal=False, traverse=('data', 0, 'data', 'audio'))
direct_download = self._download_json(
f'https://vcore-web.ivoox.com/v1/public/audios/{media_id}/download-url', media_id, fatal=False,
note='Fetching direct download link', headers={'Referer': url})
download_paths = {
*traverse_obj(direct_download, ('data', 'downloadUrl', {str}, filter, all)),
*traverse_obj(data, (('downloadUrl', 'mediaUrl'), {str}, filter)),
}
formats = []
for path in download_paths:
formats.append({
'url': urljoin('https://ivoox.com', path),
'http_headers': {'Referer': url},
})
return {
'id': media_id,
'formats': formats,
'uploader': self._html_search_regex(r'data-prm-author="([^"]+)"', webpage, 'author', default=None),
'timestamp': parse_iso8601(
self._html_search_regex(r'data-prm-pubdate="([^"]+)"', webpage, 'timestamp', default=None)),
'channel': self._html_search_regex(r'data-prm-podname="([^"]+)"', webpage, 'channel', default=None),
'title': self._html_search_regex(r'data-prm-title="([^"]+)"', webpage, 'title', default=None),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'description': self._og_search_description(webpage, default=None),
**self._search_json_ld(webpage, media_id, default={}),
**traverse_obj(data, {
'title': ('title', {str}),
'description': ('description', {str}),
'thumbnail': ('image', {url_or_none}),
'timestamp': ('uploadDate', {parse_iso8601(delimiter=' ')}),
'duration': ('duration', {int_or_none}),
'tags': ('tags', ..., 'name', {str}),
}),
}

View File

@ -2,10 +2,12 @@ import hashlib
import random
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import (
clean_html,
int_or_none,
try_get,
urlhandle_detect_ext,
)
@ -27,7 +29,7 @@ class JamendoIE(InfoExtractor):
'ext': 'flac',
# 'title': 'Maya Filipič - Stories from Emona I',
'title': 'Stories from Emona I',
'artist': 'Maya Filipič',
'artists': ['Maya Filipič'],
'album': 'Between two worlds',
'track': 'Stories from Emona I',
'duration': 210,
@ -93,9 +95,15 @@ class JamendoIE(InfoExtractor):
if not cover_url or cover_url in urls:
continue
urls.append(cover_url)
urlh = self._request_webpage(
HEADRequest(cover_url), track_id, 'Checking thumbnail extension',
errnote=False, fatal=False)
if not urlh:
continue
size = int_or_none(cover_id.lstrip('size'))
thumbnails.append({
'id': cover_id,
'ext': urlhandle_detect_ext(urlh, default='jpg'),
'url': cover_url,
'width': size,
'height': size,

View File

@ -1,3 +1,5 @@
import itertools
from .common import InfoExtractor
from ..utils import (
determine_ext,
@ -124,3 +126,43 @@ class KikaIE(InfoExtractor):
'vbr': ('bitrateVideo', {int_or_none}, {lambda x: None if x == -1 else x}),
}),
}
class KikaPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?kika\.de/[\w-]+/(?P<id>[a-z-]+\d+)'
_TESTS = [{
'url': 'https://www.kika.de/logo/logo-die-welt-und-ich-562',
'info_dict': {
'id': 'logo-die-welt-und-ich-562',
'title': 'logo!',
'description': 'md5:7b9d7f65561b82fa512f2cfb553c397d',
},
'playlist_count': 100,
}]
def _entries(self, playlist_url, playlist_id):
for page in itertools.count(1):
data = self._download_json(playlist_url, playlist_id, note=f'Downloading page {page}')
for item in traverse_obj(data, ('content', lambda _, v: url_or_none(v['api']['url']))):
yield self.url_result(
item['api']['url'], ie=KikaIE,
**traverse_obj(item, {
'id': ('id', {str}),
'title': ('title', {str}),
'duration': ('duration', {int_or_none}),
'timestamp': ('date', {parse_iso8601}),
}))
playlist_url = traverse_obj(data, ('links', 'next', {url_or_none}))
if not playlist_url:
break
def _real_extract(self, url):
playlist_id = self._match_id(url)
brand_data = self._download_json(
f'https://www.kika.de/_next-api/proxy/v1/brands/{playlist_id}', playlist_id)
return self.playlist_result(
self._entries(brand_data['videoSubchannel']['videosPageUrl'], playlist_id),
playlist_id, title=brand_data.get('title'), description=brand_data.get('description'))

View File

@ -26,6 +26,7 @@ class LBRYBaseIE(InfoExtractor):
_CLAIM_ID_REGEX = r'[0-9a-f]{1,40}'
_OPT_CLAIM_ID = f'[^$@:/?#&]+(?:[:#]{_CLAIM_ID_REGEX})?'
_SUPPORTED_STREAM_TYPES = ['video', 'audio']
_UNSUPPORTED_STREAM_TYPES = ['binary']
_PAGE_SIZE = 50
def _call_api_proxy(self, method, display_id, params, resource):
@ -336,12 +337,15 @@ class LBRYIE(LBRYBaseIE):
'vcodec': 'none' if stream_type == 'audio' else None,
})
final_url = None
# HEAD request returns redirect response to m3u8 URL if available
final_url = self._request_webpage(
urlh = self._request_webpage(
HEADRequest(streaming_url), display_id, headers=headers,
note='Downloading streaming redirect url info').url
note='Downloading streaming redirect url info', fatal=False)
if urlh:
final_url = urlh.url
elif result.get('value_type') == 'stream':
elif result.get('value_type') == 'stream' and stream_type not in self._UNSUPPORTED_STREAM_TYPES:
claim_id, is_live = result['signing_channel']['claim_id'], True
live_data = self._download_json(
'https://api.odysee.live/livestream/is_live', claim_id,

87
yt_dlp/extractor/loco.py Normal file
View File

@ -0,0 +1,87 @@
from .common import InfoExtractor
from ..utils import int_or_none, url_or_none
from ..utils.traversal import require, traverse_obj
class LocoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?loco\.com/(?P<type>streamers|stream)/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://loco.com/streamers/teuzinfps',
'info_dict': {
'id': 'teuzinfps',
'ext': 'mp4',
'title': r're:MS BOLADAO, RESENHA & GAMEPLAY ALTO NIVEL',
'description': 'bom e novo',
'uploader_id': 'RLUVE3S9JU',
'channel': 'teuzinfps',
'channel_follower_count': int,
'comment_count': int,
'view_count': int,
'concurrent_view_count': int,
'like_count': int,
'thumbnail': 'https://static.ivory.getloconow.com/default_thumb/743701a9-98ca-41ae-9a8b-70bd5da070ad.jpg',
'tags': ['MMORPG', 'Gameplay'],
'series': 'Tibia',
'timestamp': int,
'modified_timestamp': int,
'live_status': 'is_live',
'upload_date': str,
'modified_date': str,
},
'params': {
'skip_download': 'Livestream',
},
}, {
'url': 'https://loco.com/stream/c64916eb-10fb-46a9-9a19-8c4b7ed064e7',
'md5': '45ebc8a47ee1c2240178757caf8881b5',
'info_dict': {
'id': 'c64916eb-10fb-46a9-9a19-8c4b7ed064e7',
'ext': 'mp4',
'title': 'PAULINHO LOKO NA LOCO!',
'description': 'live on na loco',
'uploader_id': '2MDO7Z1DPM',
'channel': 'paulinholokobr',
'channel_follower_count': int,
'comment_count': int,
'view_count': int,
'concurrent_view_count': int,
'like_count': int,
'duration': 14491,
'thumbnail': 'https://static.ivory.getloconow.com/default_thumb/59b5970b-23c1-4518-9e96-17ce341299fe.jpg',
'tags': ['Gameplay'],
'series': 'GTA 5',
'timestamp': 1740612872,
'modified_timestamp': 1740613037,
'upload_date': '20250226',
'modified_date': '20250226',
},
}]
def _real_extract(self, url):
video_type, video_id = self._match_valid_url(url).group('type', 'id')
webpage = self._download_webpage(url, video_id)
stream = traverse_obj(self._search_nextjs_data(webpage, video_id), (
'props', 'pageProps', ('liveStreamData', 'stream'), {dict}, any, {require('stream info')}))
return {
'formats': self._extract_m3u8_formats(stream['conf']['hls'], video_id),
'id': video_id,
'is_live': video_type == 'streamers',
**traverse_obj(stream, {
'title': ('title', {str}),
'series': ('game_name', {str}),
'uploader_id': ('user_uid', {str}),
'channel': ('alias', {str}),
'description': ('description', {str}),
'concurrent_view_count': ('viewersCurrent', {int_or_none}),
'view_count': ('total_views', {int_or_none}),
'thumbnail': ('thumbnail_url_small', {url_or_none}),
'like_count': ('likes', {int_or_none}),
'tags': ('tags', ..., {str}),
'timestamp': ('started_at', {int_or_none(scale=1000)}),
'modified_timestamp': ('updated_at', {int_or_none(scale=1000)}),
'comment_count': ('comments_count', {int_or_none}),
'channel_follower_count': ('followers_count', {int_or_none}),
'duration': ('duration', {int_or_none}),
}),
}

View File

@ -2,8 +2,11 @@ from .common import InfoExtractor
from ..utils import (
clean_html,
merge_dicts,
str_or_none,
traverse_obj,
unified_timestamp,
url_or_none,
urljoin,
)
@ -80,7 +83,7 @@ class LRTVODIE(LRTBaseIE):
}]
def _real_extract(self, url):
path, video_id = self._match_valid_url(url).groups()
path, video_id = self._match_valid_url(url).group('path', 'id')
webpage = self._download_webpage(url, video_id)
media_url = self._extract_js_var(webpage, 'main_url', path)
@ -106,3 +109,42 @@ class LRTVODIE(LRTBaseIE):
}
return merge_dicts(clean_info, jw_data, json_ld_data)
class LRTRadioIE(LRTBaseIE):
_VALID_URL = r'https?://(?:www\.)?lrt\.lt/radioteka/irasas/(?P<id>\d+)/(?P<path>[^?#/]+)'
_TESTS = [{
# m3u8 download
'url': 'https://www.lrt.lt/radioteka/irasas/2000359728/nemarios-eiles-apie-pragarus-ir-skaistyklas-su-aiste-kiltinaviciute',
'info_dict': {
'id': '2000359728',
'ext': 'm4a',
'title': 'Nemarios eilės: apie pragarus ir skaistyklas su Aiste Kiltinavičiūte',
'description': 'md5:5eee9a0e86a55bf547bd67596204625d',
'timestamp': 1726143120,
'upload_date': '20240912',
'tags': 'count:5',
'thumbnail': r're:https?://.+/.+\.jpe?g',
'categories': ['Daiktiniai įrodymai'],
},
}, {
'url': 'https://www.lrt.lt/radioteka/irasas/2000304654/vakaras-su-knyga-svetlana-aleksijevic-cernobylio-malda-v-dalis?season=%2Fmediateka%2Faudio%2Fvakaras-su-knyga%2F2023',
'only_matching': True,
}]
def _real_extract(self, url):
video_id, path = self._match_valid_url(url).group('id', 'path')
media = self._download_json(
'https://www.lrt.lt/radioteka/api/media', video_id,
query={'url': f'/mediateka/irasas/{video_id}/{path}'})
return traverse_obj(media, {
'id': ('id', {int}, {str_or_none}),
'title': ('title', {str}),
'tags': ('tags', ..., 'name', {str}),
'categories': ('playlist_item', 'category', {str}, filter, all, filter),
'description': ('content', {clean_html}, {str}),
'timestamp': ('date', {lambda x: x.replace('.', '/')}, {unified_timestamp}),
'thumbnail': ('playlist_item', 'image', {urljoin('https://www.lrt.lt')}),
'formats': ('playlist_item', 'file', {lambda x: self._extract_m3u8_formats(x, video_id)}),
})

View File

@ -1,35 +1,36 @@
from .common import InfoExtractor
from ..utils import parse_age_limit, parse_duration, traverse_obj
from ..utils import parse_age_limit, parse_duration, url_or_none
from ..utils.traversal import traverse_obj
class MagellanTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?magellantv\.com/(?:watch|video)/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://www.magellantv.com/watch/my-dads-on-death-row?type=v',
'url': 'https://www.magellantv.com/watch/incas-the-new-story?type=v',
'info_dict': {
'id': 'my-dads-on-death-row',
'id': 'incas-the-new-story',
'ext': 'mp4',
'title': 'My Dad\'s On Death Row',
'description': 'md5:33ba23b9f0651fc4537ed19b1d5b0d7a',
'duration': 3780.0,
'title': 'Incas: The New Story',
'description': 'md5:936c7f6d711c02dfb9db22a067b586fe',
'age_limit': 14,
'tags': ['Justice', 'Reality', 'United States', 'True Crime'],
'duration': 3060.0,
'tags': ['Ancient History', 'Archaeology', 'Anthropology'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.magellantv.com/video/james-bulger-the-new-revelations',
'url': 'https://www.magellantv.com/video/tortured-to-death-murdering-the-nanny',
'info_dict': {
'id': 'james-bulger-the-new-revelations',
'id': 'tortured-to-death-murdering-the-nanny',
'ext': 'mp4',
'title': 'James Bulger: The New Revelations',
'description': 'md5:7b97922038bad1d0fe8d0470d8a189f2',
'title': 'Tortured to Death: Murdering the Nanny',
'description': 'md5:d87033594fa218af2b1a8b49f52511e5',
'age_limit': 14,
'duration': 2640.0,
'age_limit': 0,
'tags': ['Investigation', 'True Crime', 'Justice', 'Europe'],
'tags': ['True Crime', 'Murder'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.magellantv.com/watch/celebration-nation',
'url': 'https://www.magellantv.com/watch/celebration-nation?type=s',
'info_dict': {
'id': 'celebration-nation',
'ext': 'mp4',
@ -43,10 +44,19 @@ class MagellanTVIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data = traverse_obj(self._search_nextjs_data(webpage, video_id), (
'props', 'pageProps', 'reactContext',
(('video', 'detail'), ('series', 'currentEpisode')), {dict}), get_all=False)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(data['jwpVideoUrl'], video_id)
context = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['reactContext']
data = traverse_obj(context, ((('video', 'detail'), ('series', 'currentEpisode')), {dict}, any))
formats, subtitles = [], {}
for m3u8_url in set(traverse_obj(data, ((('manifests', ..., 'hls'), 'jwp_video_url'), {url_or_none}))):
fmts, subs = self._extract_m3u8_formats_and_subtitles(
m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
if not formats and (error := traverse_obj(context, ('errorDetailPage', 'errorMessage', {str}))):
if 'available in your country' in error:
self.raise_geo_restricted(msg=error)
self.raise_no_formats(f'{self.IE_NAME} said: {error}', expected=True)
return {
'id': video_id,

View File

@ -102,11 +102,10 @@ class MedalTVIE(InfoExtractor):
item_id = item_id or '%dp' % height
if item_id not in item_url:
return
width = int(round(aspect_ratio * height))
container.append({
'url': item_url,
id_key: item_id,
'width': width,
'width': round(aspect_ratio * height),
'height': height,
})

View File

@ -4,6 +4,7 @@ from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
parse_resolution,
traverse_obj,
unified_timestamp,
url_basename,
@ -83,8 +84,8 @@ class MicrosoftMediusBaseIE(InfoExtractor):
subtitles.setdefault(sub.pop('tag', 'und'), []).append(sub)
return subtitles
def _extract_ism(self, ism_url, video_id):
formats = self._extract_ism_formats(ism_url, video_id)
def _extract_ism(self, ism_url, video_id, fatal=True):
formats = self._extract_ism_formats(ism_url, video_id, fatal=fatal)
for fmt in formats:
if fmt['language'] != 'eng' and 'English' not in fmt['format_id']:
fmt['language_preference'] = -10
@ -218,9 +219,21 @@ class MicrosoftLearnEpisodeIE(MicrosoftMediusBaseIE):
'description': 'md5:7bbbfb593d21c2cf2babc3715ade6b88',
'timestamp': 1676339547,
'upload_date': '20230214',
'thumbnail': r're:https://learn\.microsoft\.com/video/media/.*\.png',
'thumbnail': r're:https://learn\.microsoft\.com/video/media/.+\.png',
'subtitles': 'count:14',
},
}, {
'url': 'https://learn.microsoft.com/en-gb/shows/on-demand-instructor-led-training-series/az-900-module-1',
'info_dict': {
'id': '4fe10f7c-d83c-463b-ac0e-c30a8195e01b',
'ext': 'mp4',
'title': 'AZ-900 Cloud fundamentals (1 of 6)',
'description': 'md5:3c2212ce865e9142f402c766441bd5c9',
'thumbnail': r're:https://.+/.+\.jpg',
'timestamp': 1706605184,
'upload_date': '20240130',
},
'params': {'format': 'bv[protocol=https]'},
}]
def _real_extract(self, url):
@ -230,9 +243,32 @@ class MicrosoftLearnEpisodeIE(MicrosoftMediusBaseIE):
entry_id = self._html_search_meta('entryId', webpage, 'entryId', fatal=True)
video_info = self._download_json(
f'https://learn.microsoft.com/api/video/public/v1/entries/{entry_id}', video_id)
formats = []
if ism_url := traverse_obj(video_info, ('publicVideo', 'adaptiveVideoUrl', {url_or_none})):
formats.extend(self._extract_ism(ism_url, video_id, fatal=False))
if hls_url := traverse_obj(video_info, ('publicVideo', 'adaptiveVideoHLSUrl', {url_or_none})):
formats.extend(self._extract_m3u8_formats(hls_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
if mpd_url := traverse_obj(video_info, ('publicVideo', 'adaptiveVideoDashUrl', {url_or_none})):
formats.extend(self._extract_mpd_formats(mpd_url, video_id, mpd_id='dash', fatal=False))
for key in ('low', 'medium', 'high'):
if video_url := traverse_obj(video_info, ('publicVideo', f'{key}QualityVideoUrl', {url_or_none})):
formats.append({
'url': video_url,
'format_id': f'video-http-{key}',
'acodec': 'none',
**parse_resolution(video_url),
})
if audio_url := traverse_obj(video_info, ('publicVideo', 'audioUrl', {url_or_none})):
formats.append({
'url': audio_url,
'format_id': 'audio-http',
'vcodec': 'none',
})
return {
'id': entry_id,
'formats': self._extract_ism(video_info['publicVideo']['adaptiveVideoUrl'], video_id),
'formats': formats,
'subtitles': self._sub_to_dict(traverse_obj(video_info, (
'publicVideo', 'captions', lambda _, v: url_or_none(v['url']), {
'tag': ('language', {str}),

View File

@ -1,5 +1,7 @@
from .telecinco import TelecincoBaseIE
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
int_or_none,
parse_iso8601,
)
@ -79,7 +81,17 @@ class MiTeleIE(TelecincoBaseIE):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
try: # yt-dlp's default user-agents are too old and blocked by akamai
webpage = self._download_webpage(url, display_id, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:136.0) Gecko/20100101 Firefox/136.0',
})
except ExtractorError as e:
if not isinstance(e.cause, HTTPError) or e.cause.status != 403:
raise
# Retry with impersonation if hardcoded UA is insufficient to bypass akamai
webpage = self._download_webpage(url, display_id, impersonate=True)
pre_player = self._search_json(
r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=',
webpage, 'Pre Player', display_id)['prePlayer']

View File

@ -10,7 +10,9 @@ from ..utils import (
parse_iso8601,
strip_or_none,
try_get,
url_or_none,
)
from ..utils.traversal import traverse_obj
class MixcloudBaseIE(InfoExtractor):
@ -37,7 +39,7 @@ class MixcloudIE(MixcloudBaseIE):
'ext': 'm4a',
'title': 'Cryptkeeper',
'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
'uploader': 'Daniel Holbach',
'uploader': 'dholbach',
'uploader_id': 'dholbach',
'thumbnail': r're:https?://.*\.jpg',
'view_count': int,
@ -46,10 +48,11 @@ class MixcloudIE(MixcloudBaseIE):
'uploader_url': 'https://www.mixcloud.com/dholbach/',
'artist': 'Submorphics & Chino , Telekinesis, Porter Robinson, Enei, Breakage ft Jess Mills',
'duration': 3723,
'tags': [],
'tags': ['liquid drum and bass', 'drum and bass'],
'comment_count': int,
'repost_count': int,
'like_count': int,
'artists': list,
},
'params': {'skip_download': 'm3u8'},
}, {
@ -67,7 +70,7 @@ class MixcloudIE(MixcloudBaseIE):
'upload_date': '20150203',
'uploader_url': 'https://www.mixcloud.com/gillespeterson/',
'duration': 2992,
'tags': [],
'tags': ['jazz', 'soul', 'world music', 'funk'],
'comment_count': int,
'repost_count': int,
'like_count': int,
@ -149,8 +152,6 @@ class MixcloudIE(MixcloudBaseIE):
elif reason:
raise ExtractorError('Track is restricted', expected=True)
title = cloudcast['name']
stream_info = cloudcast['streamInfo']
formats = []
@ -182,47 +183,39 @@ class MixcloudIE(MixcloudBaseIE):
self.raise_login_required(metadata_available=True)
comments = []
for edge in (try_get(cloudcast, lambda x: x['comments']['edges']) or []):
node = edge.get('node') or {}
for node in traverse_obj(cloudcast, ('comments', 'edges', ..., 'node', {dict})):
text = strip_or_none(node.get('comment'))
if not text:
continue
user = node.get('user') or {}
comments.append({
'author': user.get('displayName'),
'author_id': user.get('username'),
'text': text,
'timestamp': parse_iso8601(node.get('created')),
**traverse_obj(node, {
'author': ('user', 'displayName', {str}),
'author_id': ('user', 'username', {str}),
'timestamp': ('created', {parse_iso8601}),
}),
})
tags = []
for t in cloudcast.get('tags'):
tag = try_get(t, lambda x: x['tag']['name'], str)
if not tag:
tags.append(tag)
get_count = lambda x: int_or_none(try_get(cloudcast, lambda y: y[x]['totalCount']))
owner = cloudcast.get('owner') or {}
return {
'id': track_id,
'title': title,
'formats': formats,
'description': cloudcast.get('description'),
'thumbnail': try_get(cloudcast, lambda x: x['picture']['url'], str),
'uploader': owner.get('displayName'),
'timestamp': parse_iso8601(cloudcast.get('publishDate')),
'uploader_id': owner.get('username'),
'uploader_url': owner.get('url'),
'duration': int_or_none(cloudcast.get('audioLength')),
'view_count': int_or_none(cloudcast.get('plays')),
'like_count': get_count('favorites'),
'repost_count': get_count('reposts'),
'comment_count': get_count('comments'),
'comments': comments,
'tags': tags,
'artist': ', '.join(cloudcast.get('featuringArtistList') or []) or None,
**traverse_obj(cloudcast, {
'title': ('name', {str}),
'description': ('description', {str}),
'thumbnail': ('picture', 'url', {url_or_none}),
'timestamp': ('publishDate', {parse_iso8601}),
'duration': ('audioLength', {int_or_none}),
'uploader': ('owner', 'displayName', {str}),
'uploader_id': ('owner', 'username', {str}),
'uploader_url': ('owner', 'url', {url_or_none}),
'view_count': ('plays', {int_or_none}),
'like_count': ('favorites', 'totalCount', {int_or_none}),
'repost_count': ('reposts', 'totalCount', {int_or_none}),
'comment_count': ('comments', 'totalCount', {int_or_none}),
'tags': ('tags', ..., 'tag', 'name', {str}, filter, all, filter),
'artists': ('featuringArtistList', ..., {str}, filter, all, filter),
}),
}
@ -295,7 +288,7 @@ class MixcloudUserIE(MixcloudPlaylistBaseIE):
'url': 'http://www.mixcloud.com/dholbach/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'title': 'dholbach (uploads)',
'description': 'md5:a3f468a60ac8c3e1f8616380fc469b2b',
},
'playlist_mincount': 36,
@ -303,7 +296,7 @@ class MixcloudUserIE(MixcloudPlaylistBaseIE):
'url': 'http://www.mixcloud.com/dholbach/uploads/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'title': 'dholbach (uploads)',
'description': 'md5:a3f468a60ac8c3e1f8616380fc469b2b',
},
'playlist_mincount': 36,
@ -311,7 +304,7 @@ class MixcloudUserIE(MixcloudPlaylistBaseIE):
'url': 'http://www.mixcloud.com/dholbach/favorites/',
'info_dict': {
'id': 'dholbach_favorites',
'title': 'Daniel Holbach (favorites)',
'title': 'dholbach (favorites)',
'description': 'md5:a3f468a60ac8c3e1f8616380fc469b2b',
},
# 'params': {
@ -337,7 +330,7 @@ class MixcloudUserIE(MixcloudPlaylistBaseIE):
'title': 'First Ear (stream)',
'description': 'we maraud for ears',
},
'playlist_mincount': 269,
'playlist_mincount': 267,
}]
_TITLE_KEY = 'displayName'
@ -361,7 +354,7 @@ class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
'id': 'maxvibes_jazzcat-on-ness-radio',
'title': 'Ness Radio sessions',
},
'playlist_mincount': 59,
'playlist_mincount': 58,
}]
_TITLE_KEY = 'name'
_DESCRIPTION_KEY = 'description'

View File

@ -449,9 +449,7 @@ mutation initPlaybackSession(
if not (m3u8_url and token):
errors = '; '.join(traverse_obj(response, ('errors', ..., 'message', {str})))
if 'not entitled' in errors:
raise ExtractorError(errors, expected=True)
elif errors: # Only warn when 'blacked out' since radio formats are available
if errors: # Only warn when 'blacked out' or 'not entitled'; radio formats may be available
self.report_warning(f'API returned errors for {format_id}: {errors}')
else:
self.report_warning(f'No formats available for {format_id} broadcast; skipping')

View File

@ -3,8 +3,8 @@ from .dailymotion import DailymotionIE
class MoviepilotIE(InfoExtractor):
_IE_NAME = 'moviepilot'
_IE_DESC = 'Moviepilot trailer'
IE_NAME = 'moviepilot'
IE_DESC = 'Moviepilot trailer'
_VALID_URL = r'https?://(?:www\.)?moviepilot\.de/movies/(?P<id>[^/]+)'
_TESTS = [{

View File

@ -1,167 +1,215 @@
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
determine_ext,
int_or_none,
unescapeHTML,
parse_iso8601,
url_or_none,
)
from ..utils.traversal import traverse_obj
class MSNIE(InfoExtractor):
_WORKING = False
_VALID_URL = r'https?://(?:(?:www|preview)\.)?msn\.com/(?:[^/]+/)+(?P<display_id>[^/]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
_VALID_URL = r'https?://(?:(?:www|preview)\.)?msn\.com/(?P<locale>[a-z]{2}-[a-z]{2})/(?:[^/?#]+/)+(?P<display_id>[^/?#]+)/[a-z]{2}-(?P<id>[\da-zA-Z]+)'
_TESTS = [{
'url': 'https://www.msn.com/en-in/money/video/7-ways-to-get-rid-of-chest-congestion/vi-BBPxU6d',
'md5': '087548191d273c5c55d05028f8d2cbcd',
'url': 'https://www.msn.com/en-gb/video/news/president-macron-interrupts-trump-over-ukraine-funding/vi-AA1zMcD7',
'info_dict': {
'id': 'BBPxU6d',
'display_id': '7-ways-to-get-rid-of-chest-congestion',
'id': 'AA1zMcD7',
'ext': 'mp4',
'title': 'Seven ways to get rid of chest congestion',
'description': '7 Ways to Get Rid of Chest Congestion',
'duration': 88,
'uploader': 'Health',
'uploader_id': 'BBPrMqa',
'display_id': 'president-macron-interrupts-trump-over-ukraine-funding',
'title': 'President Macron interrupts Trump over Ukraine funding',
'description': 'md5:5fd3857ac25849e7a56cb25fbe1a2a8b',
'uploader': 'k! News UK',
'uploader_id': 'BB1hz5Rj',
'duration': 59,
'thumbnail': 'https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AA1zMagX.img',
'tags': 'count:14',
'timestamp': 1740510914,
'upload_date': '20250225',
'release_timestamp': 1740513600,
'release_date': '20250225',
'modified_timestamp': 1741413241,
'modified_date': '20250308',
},
}, {
# Article, multiple Dailymotion Embeds
'url': 'https://www.msn.com/en-in/money/sports/hottest-football-wags-greatest-footballers-turned-managers-and-more/ar-BBpc7Nl',
'url': 'https://www.msn.com/en-gb/video/watch/films-success-saved-adam-pearsons-acting-career/vi-AA1znZGE?ocid=hpmsn',
'info_dict': {
'id': 'BBpc7Nl',
'id': 'AA1znZGE',
'ext': 'mp4',
'display_id': 'films-success-saved-adam-pearsons-acting-career',
'title': "Films' success saved Adam Pearson's acting career",
'description': 'md5:98c05f7bd9ab4f9c423400f62f2d3da5',
'uploader': 'Sky News',
'uploader_id': 'AA2eki',
'duration': 52,
'thumbnail': 'https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AA1zo7nU.img',
'timestamp': 1739993965,
'upload_date': '20250219',
'release_timestamp': 1739977753,
'release_date': '20250219',
'modified_timestamp': 1742076259,
'modified_date': '20250315',
},
'playlist_mincount': 4,
}, {
'url': 'http://www.msn.com/en-ae/news/offbeat/meet-the-nine-year-old-self-made-millionaire/ar-BBt6ZKf',
'only_matching': True,
}, {
'url': 'http://www.msn.com/en-ae/video/watch/obama-a-lot-of-people-will-be-disappointed/vi-AAhxUMH',
'only_matching': True,
}, {
# geo restricted
'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/the-first-fart-makes-you-laugh-the-last-fart-makes-you-cry/vp-AAhzIBU',
'only_matching': True,
}, {
'url': 'http://www.msn.com/en-ae/entertainment/bollywood/watch-how-salman-khan-reacted-when-asked-if-he-would-apologize-for-his-raped-woman-comment/vi-AAhvzW6',
'only_matching': True,
}, {
# Vidible(AOL) Embed
'url': 'https://www.msn.com/en-us/money/other/jupiter-is-about-to-come-so-close-you-can-see-its-moons-with-binoculars/vi-AACqsHR',
'only_matching': True,
'url': 'https://www.msn.com/en-us/entertainment/news/rock-frontman-replacements-you-might-not-know-happened/vi-AA1yLVcD',
'info_dict': {
'id': 'AA1yLVcD',
'ext': 'mp4',
'display_id': 'rock-frontman-replacements-you-might-not-know-happened',
'title': 'Rock Frontman Replacements You Might Not Know Happened',
'description': 'md5:451a125496ff0c9f6816055bb1808da9',
'uploader': 'Grunge (Video)',
'uploader_id': 'BB1oveoV',
'duration': 596,
'thumbnail': 'https://img-s-msn-com.akamaized.net/tenant/amp/entityid/AA1yM4OJ.img',
'timestamp': 1739223456,
'upload_date': '20250210',
'release_timestamp': 1739219731,
'release_date': '20250210',
'modified_timestamp': 1741427272,
'modified_date': '20250308',
},
}, {
# Dailymotion Embed
'url': 'https://www.msn.com/es-ve/entretenimiento/watch/winston-salem-paire-refait-des-siennes-en-perdant-sa-raquette-au-service/vp-AAG704L',
'only_matching': True,
'url': 'https://www.msn.com/de-de/nachrichten/other/the-first-descendant-gameplay-trailer-zu-serena-der-neuen-gefl%C3%BCgelten-nachfahrin/vi-AA1B1d06',
'info_dict': {
'id': 'x9g6oli',
'ext': 'mp4',
'title': 'The First Descendant: Gameplay-Trailer zu Serena, der neuen geflügelten Nachfahrin',
'description': '',
'uploader': 'MeinMMO',
'uploader_id': 'x2mvqi4',
'view_count': int,
'like_count': int,
'age_limit': 0,
'duration': 60,
'thumbnail': 'https://s1.dmcdn.net/v/Y3fO61drj56vPB9SS/x1080',
'tags': ['MeinMMO', 'The First Descendant'],
'timestamp': 1742124877,
'upload_date': '20250316',
},
}, {
# YouTube Embed
'url': 'https://www.msn.com/en-in/money/news/meet-vikram-%E2%80%94-chandrayaan-2s-lander/vi-AAGUr0v',
'only_matching': True,
# Youtube Embed
'url': 'https://www.msn.com/en-gb/video/webcontent/web-content/vi-AA1ybFaJ',
'info_dict': {
'id': 'kQSChWu95nE',
'ext': 'mp4',
'title': '7 Daily Habits to Nurture Your Personal Growth',
'description': 'md5:6f233c68341b74dee30c8c121924e827',
'uploader': 'TopThink',
'uploader_id': '@TopThink',
'uploader_url': 'https://www.youtube.com/@TopThink',
'channel': 'TopThink',
'channel_id': 'UCMlGmHokrQRp-RaNO7aq4Uw',
'channel_url': 'https://www.youtube.com/channel/UCMlGmHokrQRp-RaNO7aq4Uw',
'channel_is_verified': True,
'channel_follower_count': int,
'comment_count': int,
'view_count': int,
'like_count': int,
'age_limit': 0,
'duration': 705,
'thumbnail': 'https://i.ytimg.com/vi/kQSChWu95nE/maxresdefault.jpg',
'categories': ['Howto & Style'],
'tags': ['topthink', 'top think', 'personal growth'],
'timestamp': 1722711620,
'upload_date': '20240803',
'playable_in_embed': True,
'availability': 'public',
'live_status': 'not_live',
},
}, {
# NBCSports Embed
'url': 'https://www.msn.com/en-us/money/football_nfl/week-13-preview-redskins-vs-panthers/vi-BBXsCDb',
'only_matching': True,
# Article with social embed
'url': 'https://www.msn.com/en-in/news/techandscience/watch-earth-sets-and-rises-behind-moon-in-breathtaking-blue-ghost-video/ar-AA1zKoAc',
'info_dict': {
'id': 'AA1zKoAc',
'title': 'Watch: Earth sets and rises behind Moon in breathtaking Blue Ghost video',
'description': 'md5:0ad51cfa77e42e7f0c46cf98a619dbbf',
'uploader': 'India Today',
'uploader_id': 'AAyFWG',
'tags': 'count:11',
'timestamp': 1740485034,
'upload_date': '20250225',
'release_timestamp': 1740484875,
'release_date': '20250225',
'modified_timestamp': 1740488561,
'modified_date': '20250225',
},
'playlist_count': 1,
}]
def _real_extract(self, url):
display_id, page_id = self._match_valid_url(url).groups()
locale, display_id, page_id = self._match_valid_url(url).group('locale', 'display_id', 'id')
webpage = self._download_webpage(url, display_id)
json_data = self._download_json(
f'https://assets.msn.com/content/view/v2/Detail/{locale}/{page_id}', page_id)
entries = []
for _, metadata in re.findall(r'data-metadata\s*=\s*(["\'])(?P<data>.+?)\1', webpage):
video = self._parse_json(unescapeHTML(metadata), display_id)
provider_id = video.get('providerId')
player_name = video.get('playerName')
if player_name and provider_id:
entry = None
if player_name == 'AOL':
if provider_id.startswith('http'):
provider_id = self._search_regex(
r'https?://delivery\.vidible\.tv/video/redirect/([0-9a-f]{24})',
provider_id, 'vidible id')
entry = self.url_result(
'aol-video:' + provider_id, 'Aol', provider_id)
elif player_name == 'Dailymotion':
entry = self.url_result(
'https://www.dailymotion.com/video/' + provider_id,
'Dailymotion', provider_id)
elif player_name == 'YouTube':
entry = self.url_result(
provider_id, 'Youtube', provider_id)
elif player_name == 'NBCSports':
entry = self.url_result(
'http://vplayer.nbcsports.com/p/BxmELC/nbcsports_embed/select/media/' + provider_id,
'NBCSportsVPlayer', provider_id)
if entry:
entries.append(entry)
continue
video_id = video['uuid']
title = video['title']
common_metadata = traverse_obj(json_data, {
'title': ('title', {str}),
'description': (('abstract', ('body', {clean_html})), {str}, filter, any),
'timestamp': ('createdDateTime', {parse_iso8601}),
'release_timestamp': ('publishedDateTime', {parse_iso8601}),
'modified_timestamp': ('updatedDateTime', {parse_iso8601}),
'thumbnail': ('thumbnail', 'image', 'url', {url_or_none}),
'duration': ('videoMetadata', 'playTime', {int_or_none}),
'tags': ('keywords', ..., {str}),
'uploader': ('provider', 'name', {str}),
'uploader_id': ('provider', 'id', {str}),
})
page_type = json_data['type']
source_url = traverse_obj(json_data, ('sourceHref', {url_or_none}))
if page_type == 'video':
if traverse_obj(json_data, ('thirdPartyVideoPlayer', 'enabled')) and source_url:
return self.url_result(source_url)
formats = []
for file_ in video.get('videoFiles', []):
format_url = file_.get('url')
if not format_url:
continue
if 'format=m3u8-aapl' in format_url:
# m3u8_native should not be used here until
# https://github.com/ytdl-org/youtube-dl/issues/9913 is fixed
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4',
m3u8_id='hls', fatal=False))
elif 'format=mpd-time-csf' in format_url:
formats.extend(self._extract_mpd_formats(
format_url, display_id, 'dash', fatal=False))
elif '.ism' in format_url:
if format_url.endswith('.ism'):
format_url += '/manifest'
formats.extend(self._extract_ism_formats(
format_url, display_id, 'mss', fatal=False))
else:
format_id = file_.get('formatCode')
formats.append({
'url': format_url,
'ext': 'mp4',
'format_id': format_id,
'width': int_or_none(file_.get('width')),
'height': int_or_none(file_.get('height')),
'vbr': int_or_none(self._search_regex(r'_(\d+)\.mp4', format_url, 'vbr', default=None)),
'quality': 1 if format_id == '1001' else None,
})
subtitles = {}
for file_ in video.get('files', []):
format_url = file_.get('url')
format_code = file_.get('formatCode')
if not format_url or not format_code:
continue
if str(format_code) == '3100':
subtitles.setdefault(file_.get('culture', 'en'), []).append({
'ext': determine_ext(format_url, 'ttml'),
'url': format_url,
})
for file in traverse_obj(json_data, ('videoMetadata', 'externalVideoFiles', lambda _, v: url_or_none(v['url']))):
file_url = file['url']
ext = determine_ext(file_url)
if ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
file_url, page_id, 'mp4', m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
elif ext == 'mpd':
fmts, subs = self._extract_mpd_formats_and_subtitles(
file_url, page_id, mpd_id='dash', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.append(
traverse_obj(file, {
'url': 'url',
'format_id': ('format', {str}),
'filesize': ('fileSize', {int_or_none}),
'height': ('height', {int_or_none}),
'width': ('width', {int_or_none}),
}))
for caption in traverse_obj(json_data, ('videoMetadata', 'closedCaptions', lambda _, v: url_or_none(v['href']))):
lang = caption.get('locale') or 'en-us'
subtitles.setdefault(lang, []).append({
'url': caption['href'],
'ext': 'ttml',
})
entries.append({
'id': video_id,
return {
'id': page_id,
'display_id': display_id,
'title': title,
'description': video.get('description'),
'thumbnail': video.get('headlineImage', {}).get('url'),
'duration': int_or_none(video.get('durationSecs')),
'uploader': video.get('sourceFriendly'),
'uploader_id': video.get('providerId'),
'creator': video.get('creator'),
'subtitles': subtitles,
'formats': formats,
})
'subtitles': subtitles,
**common_metadata,
}
elif page_type == 'webcontent':
if not source_url:
raise ExtractorError('Could not find source URL')
return self.url_result(source_url)
elif page_type == 'article':
entries = []
for embed_url in traverse_obj(json_data, ('socialEmbeds', ..., 'postUrl', {url_or_none})):
entries.append(self.url_result(embed_url))
if not entries:
error = unescapeHTML(self._search_regex(
r'data-error=(["\'])(?P<error>.+?)\1',
webpage, 'error', group='error'))
raise ExtractorError(f'{self.IE_NAME} said: {error}', expected=True)
return self.playlist_result(entries, page_id, **common_metadata)
return self.playlist_result(entries, page_id)
raise ExtractorError(f'Unsupported page type: {page_type}')

View File

@ -4,7 +4,9 @@ from .common import InfoExtractor
from ..utils import (
extract_attributes,
unified_timestamp,
url_or_none,
)
from ..utils.traversal import traverse_obj
class N1InfoAssetIE(InfoExtractor):
@ -35,9 +37,9 @@ class N1InfoIIE(InfoExtractor):
IE_NAME = 'N1Info:article'
_VALID_URL = r'https?://(?:(?:\w+\.)?n1info\.\w+|nova\.rs)/(?:[^/?#]+/){1,2}(?P<id>[^/?#]+)'
_TESTS = [{
# Youtube embedded
# YouTube embedded
'url': 'https://rs.n1info.com/sport-klub/tenis/kako-je-djokovic-propustio-istorijsku-priliku-video/',
'md5': '01ddb6646d0fd9c4c7d990aa77fe1c5a',
'md5': '987ce6fd72acfecc453281e066b87973',
'info_dict': {
'id': 'L5Hd4hQVUpk',
'ext': 'mp4',
@ -45,7 +47,26 @@ class N1InfoIIE(InfoExtractor):
'title': 'Ozmo i USO21, ep. 13: Novak Đoković Danil Medvedev | Ključevi Poraza, Budućnost | SPORT KLUB TENIS',
'description': 'md5:467f330af1effedd2e290f10dc31bb8e',
'uploader': 'Sport Klub',
'uploader_id': 'sportklub',
'uploader_id': '@sportklub',
'uploader_url': 'https://www.youtube.com/@sportklub',
'channel': 'Sport Klub',
'channel_id': 'UChpzBje9Ro6CComXe3BgNaw',
'channel_url': 'https://www.youtube.com/channel/UChpzBje9Ro6CComXe3BgNaw',
'channel_is_verified': True,
'channel_follower_count': int,
'comment_count': int,
'view_count': int,
'like_count': int,
'age_limit': 0,
'duration': 1049,
'thumbnail': 'https://i.ytimg.com/vi/L5Hd4hQVUpk/maxresdefault.jpg',
'chapters': 'count:9',
'categories': ['Sports'],
'tags': 'count:10',
'timestamp': 1631522787,
'playable_in_embed': True,
'availability': 'public',
'live_status': 'not_live',
},
}, {
'url': 'https://rs.n1info.com/vesti/djilas-los-plan-za-metro-nece-resiti-nijedan-saobracajni-problem/',
@ -55,6 +76,7 @@ class N1InfoIIE(InfoExtractor):
'title': 'Đilas: Predlog izgradnje metroa besmislen; SNS odbacuje navode',
'upload_date': '20210924',
'timestamp': 1632481347,
'thumbnail': 'http://n1info.rs/wp-content/themes/ucnewsportal-n1/dist/assets/images/placeholder-image-video.jpg',
},
'params': {
'skip_download': True,
@ -67,6 +89,7 @@ class N1InfoIIE(InfoExtractor):
'title': 'Zadnji dnevi na kopališču Ilirija: “Ilirija ni umrla, ubili so jo”',
'timestamp': 1632567630,
'upload_date': '20210925',
'thumbnail': 'https://n1info.si/wp-content/uploads/2021/09/06/1630945843-tomaz3.png',
},
'params': {
'skip_download': True,
@ -81,6 +104,14 @@ class N1InfoIIE(InfoExtractor):
'upload_date': '20210924',
'timestamp': 1632448649.0,
'uploader': 'YouLotWhatDontStop',
'display_id': 'pu9wbx',
'channel_id': 'serbia',
'comment_count': int,
'like_count': int,
'dislike_count': int,
'age_limit': 0,
'duration': 134,
'thumbnail': 'https://external-preview.redd.it/5nmmawSeGx60miQM3Iq-ueC9oyCLTLjjqX-qqY8uRsc.png?format=pjpg&auto=webp&s=2f973400b04d23f871b608b178e47fc01f9b8f1d',
},
'params': {
'skip_download': True,
@ -93,6 +124,7 @@ class N1InfoIIE(InfoExtractor):
'title': 'Žaklina Tatalović Ani Brnabić: Pričate laži (VIDEO)',
'upload_date': '20211102',
'timestamp': 1635861677,
'thumbnail': 'https://nova.rs/wp-content/uploads/2021/11/02/1635860298-TNJG_Ana_Brnabic_i_Zaklina_Tatalovic_100_dana_Vlade_GP.jpg',
},
}, {
'url': 'https://n1info.rs/vesti/cuta-biti-u-kosovskoj-mitrovici-znaci-da-te-docekaju-eksplozivnim-napravama/',
@ -104,6 +136,16 @@ class N1InfoIIE(InfoExtractor):
'timestamp': 1687290536,
'thumbnail': 'https://cdn.brid.tv/live/partners/26827/snapshot/1332368_th_6492013a8356f_1687290170.jpg',
},
}, {
'url': 'https://n1info.rs/vesti/vuciceva-turneja-po-srbiji-najavljuje-kontrarevoluciju-preti-svom-narodu-vredja-novinare/',
'info_dict': {
'id': '2025974',
'ext': 'mp4',
'title': 'Vučićeva turneja po Srbiji: Najavljuje kontrarevoluciju, preti svom narodu, vređa novinare',
'thumbnail': 'https://cdn-uc.brid.tv/live/partners/26827/snapshot/2025974_fhd_67c4a23280a81_1740939826.jpg',
'timestamp': 1740939936,
'upload_date': '20250302',
},
}, {
'url': 'https://hr.n1info.com/vijesti/pravobraniteljica-o-ubojstvu-u-zagrebu-radi-se-o-doista-nezapamcenoj-situaciji/',
'only_matching': True,
@ -115,11 +157,11 @@ class N1InfoIIE(InfoExtractor):
title = self._html_search_regex(r'<h1[^>]+>(.+?)</h1>', webpage, 'title')
timestamp = unified_timestamp(self._html_search_meta('article:published_time', webpage))
plugin_data = self._html_search_meta('BridPlugin', webpage)
plugin_data = re.findall(r'\$bp\("(?:Brid|TargetVideo)_\d+",\s(.+)\);', webpage)
entries = []
if plugin_data:
site_id = self._html_search_regex(r'site:(\d+)', webpage, 'site id')
for video_data in re.findall(r'\$bp\("Brid_\d+", (.+)\);', webpage):
for video_data in plugin_data:
video_id = self._parse_json(video_data, title)['video']
entries.append({
'id': video_id,
@ -140,7 +182,7 @@ class N1InfoIIE(InfoExtractor):
'url': video_data.get('data-url'),
'id': video_data.get('id'),
'title': title,
'thumbnail': video_data.get('data-thumbnail'),
'thumbnail': traverse_obj(video_data, (('data-thumbnail', 'data-default_thumbnail'), {url_or_none}, any)),
'timestamp': timestamp,
'ie_key': 'N1InfoAsset',
})
@ -152,7 +194,7 @@ class N1InfoIIE(InfoExtractor):
if url.startswith('https://www.youtube.com'):
entries.append(self.url_result(url, ie='Youtube'))
elif url.startswith('https://www.redditmedia.com'):
entries.append(self.url_result(url, ie='RedditR'))
entries.append(self.url_result(url, ie='Reddit'))
return {
'_type': 'playlist',

View File

@ -736,7 +736,7 @@ class NBCStationsIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
nbc_data = self._search_json(
r'<script>\s*var\s+nbc\s*=', webpage, 'NBC JSON data', video_id)
r'(?:<script>\s*var\s+nbc\s*=|Object\.assign\(nbc,)', webpage, 'NBC JSON data', video_id)
pdk_acct = nbc_data.get('pdkAcct') or 'Yh1nAC'
fw_ssid = traverse_obj(nbc_data, ('video', 'fwSSID'))

View File

@ -13,11 +13,13 @@ from ..utils import (
ExtractorError,
OnDemandPagedList,
clean_html,
determine_ext,
float_or_none,
int_or_none,
join_nonempty,
parse_duration,
parse_iso8601,
parse_qs,
parse_resolution,
qualities,
remove_start,
@ -25,7 +27,9 @@ from ..utils import (
traverse_obj,
try_get,
unescapeHTML,
unified_timestamp,
update_url_query,
url_basename,
url_or_none,
urlencode_postdata,
urljoin,
@ -430,6 +434,7 @@ class NiconicoIE(InfoExtractor):
'format_id': ('id', {str}),
'abr': ('bitRate', {float_or_none(scale=1000)}),
'asr': ('samplingRate', {int_or_none}),
'quality': ('qualityLevel', {int_or_none}),
}), get_all=False),
'acodec': 'aac',
}
@ -441,7 +446,9 @@ class NiconicoIE(InfoExtractor):
min_abr = min(traverse_obj(audios, (..., 'bitRate', {float_or_none})), default=0) / 1000
for video_fmt in video_fmts:
video_fmt['tbr'] -= min_abr
video_fmt['format_id'] = f'video-{video_fmt["tbr"]:.0f}'
video_fmt['format_id'] = url_basename(video_fmt['url']).rpartition('.')[0]
video_fmt['quality'] = traverse_obj(videos, (
lambda _, v: v['id'] == video_fmt['format_id'], 'qualityLevel', {int_or_none}, any)) or -1
yield video_fmt
def _real_extract(self, url):
@ -979,6 +986,7 @@ class NiconicoLiveIE(InfoExtractor):
'quality': 'abr',
'protocol': 'hls+fmp4',
'latency': latency,
'accessRightMethod': 'single_cookie',
'chasePlay': False,
},
'room': {
@ -999,6 +1007,7 @@ class NiconicoLiveIE(InfoExtractor):
if data.get('type') == 'stream':
m3u8_url = data['data']['uri']
qualities = data['data']['availableQualities']
cookies = data['data']['cookies']
break
elif data.get('type') == 'disconnect':
self.write_debug(recv)
@ -1033,9 +1042,15 @@ class NiconicoLiveIE(InfoExtractor):
thumbnails.append({
'id': f'{name}_{width}x{height}',
'url': img_url,
'ext': traverse_obj(parse_qs(img_url), ('image', 0, {determine_ext(default_ext='jpg')})),
**res,
})
for cookie in cookies:
self._set_cookie(
cookie['domain'], cookie['name'], cookie['value'],
expire_time=unified_timestamp(cookie['expires']), path=cookie['path'], secure=cookie['secure'])
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', live=True)
for fmt, q in zip(formats, reversed(qualities[1:])):
fmt.update({

View File

@ -1,34 +1,46 @@
import json
import re
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
smuggle_url,
parse_iso8601,
parse_resolution,
str_or_none,
try_get,
unified_strdate,
unified_timestamp,
url_or_none,
)
from ..utils.traversal import require, traverse_obj, value
class NineNowIE(InfoExtractor):
IE_NAME = '9now.com.au'
_VALID_URL = r'https?://(?:www\.)?9now\.com\.au/(?:[^/]+/){2}(?P<id>[^/?#]+)'
_GEO_COUNTRIES = ['AU']
_VALID_URL = r'https?://(?:www\.)?9now\.com\.au/(?:[^/?#]+/){2}(?P<id>(?P<type>clip|episode)-[^/?#]+)'
_GEO_BYPASS = False
_TESTS = [{
# clip
'url': 'https://www.9now.com.au/afl-footy-show/2016/clip-ciql02091000g0hp5oktrnytc',
'md5': '17cf47d63ec9323e562c9957a968b565',
'url': 'https://www.9now.com.au/today/season-2025/clip-cm8hw9h5z00080hquqa5hszq7',
'info_dict': {
'id': '16801',
'id': '6370295582112',
'ext': 'mp4',
'title': 'St. Kilda\'s Joey Montagna on the potential for a player\'s strike',
'description': 'Is a boycott of the NAB Cup "on the table"?',
'title': 'Would Karl Stefanovic be able to land a plane?',
'description': 'The Today host\'s skills are put to the test with the latest simulation tech.',
'uploader_id': '4460760524001',
'upload_date': '20160713',
'timestamp': 1468421266,
'duration': 197.376,
'tags': ['flights', 'technology', 'Karl Stefanovic'],
'season': 'Season 2025',
'season_number': 2025,
'series': 'TODAY',
'timestamp': 1742507988,
'upload_date': '20250320',
'release_timestamp': 1742507983,
'release_date': '20250320',
'thumbnail': r're:https?://.+/1920x0/.+\.jpg',
},
'params': {
'skip_download': 'HLS/DASH fragments and mp4 URLs are geo-restricted; only available in AU',
},
'skip': 'Only available in Australia',
}, {
# episode
'url': 'https://www.9now.com.au/afl-footy-show/2016/episode-19',
@ -41,7 +53,7 @@ class NineNowIE(InfoExtractor):
# episode of series
'url': 'https://www.9now.com.au/lego-masters/season-3/episode-3',
'info_dict': {
'id': '6249614030001',
'id': '6308830406112',
'title': 'Episode 3',
'ext': 'mp4',
'season_number': 3,
@ -50,72 +62,87 @@ class NineNowIE(InfoExtractor):
'uploader_id': '4460760524001',
'timestamp': 1619002200,
'upload_date': '20210421',
'duration': 3574.085,
'thumbnail': r're:https?://.+/1920x0/.+\.jpg',
'tags': ['episode'],
'series': 'Lego Masters',
'season': 'Season 3',
'episode': 'Episode 3',
'release_timestamp': 1619002200,
'release_date': '20210421',
},
'expected_warnings': ['Ignoring subtitle tracks'],
'params': {
'skip_download': True,
'skip_download': 'HLS/DASH fragments and mp4 URLs are geo-restricted; only available in AU',
},
}, {
'url': 'https://www.9now.com.au/married-at-first-sight/season-12/episode-1',
'info_dict': {
'id': '6367798770112',
'ext': 'mp4',
'title': 'Episode 1',
'description': r're:The cultural sensation of Married At First Sight returns with our first weddings! .{90}$',
'uploader_id': '4460760524001',
'duration': 5415.079,
'thumbnail': r're:https?://.+/1920x0/.+\.png',
'tags': ['episode'],
'season': 'Season 12',
'season_number': 12,
'episode': 'Episode 1',
'episode_number': 1,
'series': 'Married at First Sight',
'timestamp': 1737973800,
'upload_date': '20250127',
'release_timestamp': 1737973800,
'release_date': '20250127',
},
'params': {
'skip_download': 'HLS/DASH fragments and mp4 URLs are geo-restricted; only available in AU',
},
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4460760524001/default_default/index.html?videoId=%s'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4460760524001/default_default/index.html?videoId={}'
# XXX: For parsing next.js v15+ data; see also yt_dlp.extractor.francetv and yt_dlp.extractor.goplay
def _find_json(self, s):
return self._search_json(
r'\w+\s*:\s*', s, 'next js data', None, contains_pattern=r'\[(?s:.+)\]', default=None)
def _real_extract(self, url):
display_id = self._match_id(url)
display_id, video_type = self._match_valid_url(url).group('id', 'type')
webpage = self._download_webpage(url, display_id)
page_data = self._parse_json(self._search_regex(
r'window\.__data\s*=\s*({.*?});', webpage,
'page data', default='{}'), display_id, fatal=False)
if not page_data:
page_data = self._parse_json(self._parse_json(self._search_regex(
r'window\.__data\s*=\s*JSON\.parse\s*\(\s*(".+?")\s*\)\s*;',
webpage, 'page data'), display_id), display_id)
for kind in ('episode', 'clip'):
current_key = page_data.get(kind, {}).get(
f'current{kind.capitalize()}Key')
if not current_key:
continue
cache = page_data.get(kind, {}).get(f'{kind}Cache', {})
if not cache:
continue
common_data = {
'episode': (cache.get(current_key) or next(iter(cache.values())))[kind],
'season': (cache.get(current_key) or next(iter(cache.values()))).get('season', None),
}
break
else:
raise ExtractorError('Unable to find video data')
common_data = traverse_obj(
re.findall(r'<script[^>]*>\s*self\.__next_f\.push\(\s*(\[.+?\])\s*\);?\s*</script>', webpage),
(..., {json.loads}, ..., {self._find_json},
lambda _, v: v['payload'][video_type]['slug'] == display_id,
'payload', any, {require('video data')}))
if not self.get_param('allow_unplayable_formats') and try_get(common_data, lambda x: x['episode']['video']['drm'], bool):
if traverse_obj(common_data, (video_type, 'video', 'drm', {bool})):
self.report_drm(display_id)
brightcove_id = try_get(
common_data, lambda x: x['episode']['video']['brightcoveId'], str) or 'ref:{}'.format(common_data['episode']['video']['referenceId'])
video_id = str_or_none(try_get(common_data, lambda x: x['episode']['video']['id'])) or brightcove_id
title = try_get(common_data, lambda x: x['episode']['name'], str)
season_number = try_get(common_data, lambda x: x['season']['seasonNumber'], int)
episode_number = try_get(common_data, lambda x: x['episode']['episodeNumber'], int)
timestamp = unified_timestamp(try_get(common_data, lambda x: x['episode']['airDate'], str))
release_date = unified_strdate(try_get(common_data, lambda x: x['episode']['availability'], str))
thumbnails_data = try_get(common_data, lambda x: x['episode']['image']['sizes'], dict) or {}
thumbnails = [{
'id': thumbnail_id,
'url': thumbnail_url,
'width': int_or_none(thumbnail_id[1:]),
} for thumbnail_id, thumbnail_url in thumbnails_data.items()]
brightcove_id = traverse_obj(common_data, (
video_type, 'video', (
('brightcoveId', {str}),
('referenceId', {str}, {lambda x: f'ref:{x}' if x else None}),
), any, {require('brightcove ID')}))
return {
'_type': 'url_transparent',
'url': smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
{'geo_countries': self._GEO_COUNTRIES}),
'id': video_id,
'title': title,
'description': try_get(common_data, lambda x: x['episode']['description'], str),
'duration': float_or_none(try_get(common_data, lambda x: x['episode']['video']['duration'], float), 1000),
'thumbnails': thumbnails,
'ie_key': 'BrightcoveNew',
'season_number': season_number,
'episode_number': episode_number,
'timestamp': timestamp,
'release_date': release_date,
'ie_key': BrightcoveNewIE.ie_key(),
'url': self.BRIGHTCOVE_URL_TEMPLATE.format(brightcove_id),
**traverse_obj(common_data, {
'id': (video_type, 'video', 'id', {int}, ({str_or_none}, {value(brightcove_id)}), any),
'title': (video_type, 'name', {str}),
'description': (video_type, 'description', {str}),
'duration': (video_type, 'video', 'duration', {float_or_none(scale=1000)}),
'tags': (video_type, 'tags', ..., 'name', {str}, all, filter),
'series': ('tvSeries', 'name', {str}),
'season_number': ('season', 'seasonNumber', {int_or_none}),
'episode_number': ('episode', 'episodeNumber', {int_or_none}),
'timestamp': ('episode', 'airDate', {parse_iso8601}),
'release_timestamp': (video_type, 'availability', {parse_iso8601}),
'thumbnails': (video_type, 'image', 'sizes', {dict.items}, lambda _, v: url_or_none(v[1]), {
'id': 0,
'url': 1,
'width': (1, {parse_resolution}, 'width'),
}),
}),
}

View File

@ -11,12 +11,15 @@ class On24IE(InfoExtractor):
IE_NAME = 'on24'
IE_DESC = 'ON24'
_VALID_URL = r'''(?x)
https?://event\.on24\.com/(?:
wcc/r/(?P<id_1>\d{7})/(?P<key_1>[0-9A-F]{32})|
eventRegistration/(?:console/EventConsoleApollo|EventLobbyServlet\?target=lobby30)
\.jsp\?(?:[^/#?]*&)?eventid=(?P<id_2>\d{7})[^/#?]*&key=(?P<key_2>[0-9A-F]{32})
)'''
_ID_RE = r'(?P<id>\d{7})'
_KEY_RE = r'(?P<key>[0-9A-F]{32})'
_URL_BASE_RE = r'https?://event\.on24\.com'
_URL_QUERY_RE = rf'(?:[^#]*&)?eventid={_ID_RE}&(?:[^#]+&)?key={_KEY_RE}'
_VALID_URL = [
rf'{_URL_BASE_RE}/wcc/r/{_ID_RE}/{_KEY_RE}',
rf'{_URL_BASE_RE}/eventRegistration/console/(?:EventConsoleApollo\.jsp|apollox/mainEvent/?)\?{_URL_QUERY_RE}',
rf'{_URL_BASE_RE}/eventRegistration/EventLobbyServlet/?\?{_URL_QUERY_RE}',
]
_TESTS = [{
'url': 'https://event.on24.com/eventRegistration/console/EventConsoleApollo.jsp?uimode=nextgeneration&eventid=2197467&sessionid=1&key=5DF57BE53237F36A43B478DD36277A84&contenttype=A&eventuserid=305999&playerwidth=1000&playerheight=650&caller=previewLobby&text_language_id=en&format=fhaudio&newConsole=false',
@ -34,12 +37,16 @@ class On24IE(InfoExtractor):
}, {
'url': 'https://event.on24.com/eventRegistration/console/EventConsoleApollo.jsp?&eventid=2639291&sessionid=1&username=&partnerref=&format=fhvideo1&mobile=&flashsupportedmobiledevice=&helpcenter=&key=82829018E813065A122363877975752E&newConsole=true&nxChe=true&newTabCon=true&text_language_id=en&playerwidth=748&playerheight=526&eventuserid=338788762&contenttype=A&mediametricsessionid=384764716&mediametricid=3558192&usercd=369267058&mode=launch',
'only_matching': True,
}, {
'url': 'https://event.on24.com/eventRegistration/EventLobbyServlet?target=reg20.jsp&eventid=3543176&key=BC0F6B968B67C34B50D461D40FDB3E18&groupId=3143628',
'only_matching': True,
}, {
'url': 'https://event.on24.com/eventRegistration/console/apollox/mainEvent?&eventid=4843671&sessionid=1&username=&partnerref=&format=fhvideo1&mobile=&flashsupportedmobiledevice=&helpcenter=&key=4EAC9B5C564CC98FF29E619B06A2F743&newConsole=true&nxChe=true&newTabCon=true&consoleEarEventConsole=false&consoleEarCloudApi=false&text_language_id=en&playerwidth=748&playerheight=526&referrer=https%3A%2F%2Fevent.on24.com%2Finterface%2Fregistration%2Fautoreg%2Findex.html%3Fsessionid%3D1%26eventid%3D4843671%26key%3D4EAC9B5C564CC98FF29E619B06A2F743%26email%3D000a3e42-7952-4dd6-8f8a-34c38ea3cf02%2540platform%26firstname%3Ds%26lastname%3Ds%26deletecookie%3Dtrue%26event_email%3DN%26marketing_email%3DN%26std1%3D0642572014177%26std2%3D0642572014179%26std3%3D550165f7-a44e-4725-9fe6-716f89908c2b%26std4%3D0&eventuserid=745776448&contenttype=A&mediametricsessionid=640613707&mediametricid=6810717&usercd=745776448&mode=launch',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = self._match_valid_url(url)
event_id = mobj.group('id_1') or mobj.group('id_2')
event_key = mobj.group('key_1') or mobj.group('key_2')
event_id, event_key = self._match_valid_url(url).group('id', 'key')
event_data = self._download_json(
'https://event.on24.com/apic/utilApp/EventConsoleCachedServlet',

View File

@ -67,7 +67,7 @@ class OpenRecBaseIE(InfoExtractor):
class OpenRecIE(OpenRecBaseIE):
IE_NAME = 'openrec'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/live/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/live/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.openrec.tv/live/2p8v31qe4zy',
'only_matching': True,
@ -85,7 +85,7 @@ class OpenRecIE(OpenRecBaseIE):
class OpenRecCaptureIE(OpenRecBaseIE):
IE_NAME = 'openrec:capture'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/capture/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/capture/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.openrec.tv/capture/l9nk2x4gn14',
'only_matching': True,
@ -129,7 +129,7 @@ class OpenRecCaptureIE(OpenRecBaseIE):
class OpenRecMovieIE(OpenRecBaseIE):
IE_NAME = 'openrec:movie'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/movie/(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:www\.)?openrec\.tv/movie/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.openrec.tv/movie/nqz5xl5km8v',
'info_dict': {
@ -141,6 +141,9 @@ class OpenRecMovieIE(OpenRecBaseIE):
'uploader_id': 'taiki_to_kazuhiro',
'timestamp': 1638856800,
},
}, {
'url': 'https://www.openrec.tv/movie/2p8vvex548y?playlist_id=98brq96vvsgn2nd',
'only_matching': True,
}]
def _real_extract(self, url):

101
yt_dlp/extractor/parti.py Normal file
View File

@ -0,0 +1,101 @@
from .common import InfoExtractor
from ..utils import UserNotLive, int_or_none, parse_iso8601, url_or_none, urljoin
from ..utils.traversal import traverse_obj
class PartiBaseIE(InfoExtractor):
def _call_api(self, path, video_id, note=None):
return self._download_json(
f'https://api-backend.parti.com/parti_v2/profile/{path}', video_id, note)
class PartiVideoIE(PartiBaseIE):
IE_NAME = 'parti:video'
_VALID_URL = r'https?://(?:www\.)?parti\.com/video/(?P<id>\d+)'
_TESTS = [{
'url': 'https://parti.com/video/66284',
'info_dict': {
'id': '66284',
'ext': 'mp4',
'title': 'NOW LIVE ',
'upload_date': '20250327',
'categories': ['Gaming'],
'thumbnail': 'https://assets.parti.com/351424_eb9e5250-2821-484a-9c5f-ca99aa666c87.png',
'channel': 'ItZTMGG',
'timestamp': 1743044379,
},
'params': {'skip_download': 'm3u8'},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._call_api(f'get_livestream_channel_info/recent/{video_id}', video_id)
return {
'id': video_id,
'formats': self._extract_m3u8_formats(
urljoin('https://watch.parti.com', data['livestream_recording']), video_id, 'mp4'),
**traverse_obj(data, {
'title': ('event_title', {str}),
'channel': ('user_name', {str}),
'thumbnail': ('event_file', {url_or_none}),
'categories': ('category_name', {str}, filter, all),
'timestamp': ('event_start_ts', {int_or_none}),
}),
}
class PartiLivestreamIE(PartiBaseIE):
IE_NAME = 'parti:livestream'
_VALID_URL = r'https?://(?:www\.)?parti\.com/creator/(?P<service>[\w]+)/(?P<id>[\w/-]+)'
_TESTS = [{
'url': 'https://parti.com/creator/parti/Capt_Robs_Adventures',
'info_dict': {
'id': 'Capt_Robs_Adventures',
'ext': 'mp4',
'title': r"re:I'm Live on Parti \d{4}-\d{2}-\d{2} \d{2}:\d{2}",
'view_count': int,
'thumbnail': r're:https://assets\.parti\.com/.+\.png',
'timestamp': 1743879776,
'upload_date': '20250405',
'live_status': 'is_live',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://parti.com/creator/discord/sazboxgaming/0',
'only_matching': True,
}]
def _real_extract(self, url):
service, creator_slug = self._match_valid_url(url).group('service', 'id')
encoded_creator_slug = creator_slug.replace('/', '%23')
creator_id = self._call_api(
f'get_user_by_social_media/{service}/{encoded_creator_slug}',
creator_slug, note='Fetching user ID')
data = self._call_api(
f'get_livestream_channel_info/{creator_id}', creator_id,
note='Fetching user profile feed')['channel_info']
if not traverse_obj(data, ('channel', 'is_live', {bool})):
raise UserNotLive(video_id=creator_id)
channel_info = data['channel']
return {
'id': creator_slug,
'formats': self._extract_m3u8_formats(
channel_info['playback_url'], creator_slug, live=True, query={
'token': channel_info['playback_auth_token'],
'player_version': '1.17.0',
}),
'is_live': True,
**traverse_obj(data, {
'title': ('livestream_event_info', 'event_name', {str}),
'description': ('livestream_event_info', 'event_description', {str}),
'thumbnail': ('livestream_event_info', 'livestream_preview_file', {url_or_none}),
'timestamp': ('stream', 'start_time', {parse_iso8601}),
'view_count': ('stream', 'viewer_count', {int_or_none}),
}),
}

View File

@ -23,9 +23,9 @@ class PinterestBaseIE(InfoExtractor):
def _call_api(self, resource, video_id, options):
return self._download_json(
f'https://www.pinterest.com/resource/{resource}Resource/get/',
video_id, f'Download {resource} JSON metadata', query={
'data': json.dumps({'options': options}),
})['resource_response']
video_id, f'Download {resource} JSON metadata',
query={'data': json.dumps({'options': options})},
headers={'X-Pinterest-PWS-Handler': 'www/[username].js'})['resource_response']
def _extract_video(self, data, extract_formats=True):
video_id = data['id']

View File

@ -1,4 +1,7 @@
import base64
import hashlib
import json
import uuid
from .common import InfoExtractor
from ..utils import (
@ -142,39 +145,73 @@ class PlaySuisseIE(InfoExtractor):
id
url
}'''
_LOGIN_BASE_URL = 'https://login.srgssr.ch/srgssrlogin.onmicrosoft.com'
_LOGIN_PATH = 'B2C_1A__SignInV2'
_CLIENT_ID = '1e33f1bf-8bf3-45e4-bbd9-c9ad934b5fca'
_LOGIN_BASE = 'https://account.srgssr.ch'
_ID_TOKEN = None
def _perform_login(self, username, password):
login_page = self._download_webpage(
'https://www.playsuisse.ch/api/sso/login', None, note='Downloading login page',
query={'x': 'x', 'locale': 'de', 'redirectUrl': 'https://www.playsuisse.ch/'})
settings = self._search_json(r'var\s+SETTINGS\s*=', login_page, 'settings', None)
code_verifier = uuid.uuid4().hex + uuid.uuid4().hex + uuid.uuid4().hex
code_challenge = base64.urlsafe_b64encode(
hashlib.sha256(code_verifier.encode()).digest()).decode().rstrip('=')
csrf_token = settings['csrf']
query = {'tx': settings['transId'], 'p': self._LOGIN_PATH}
request_id = parse_qs(self._request_webpage(
f'{self._LOGIN_BASE}/authz-srv/authz', None, 'Requesting session ID', query={
'client_id': self._CLIENT_ID,
'redirect_uri': 'https://www.playsuisse.ch/auth',
'scope': 'email profile openid offline_access',
'response_type': 'code',
'code_challenge': code_challenge,
'code_challenge_method': 'S256',
'view_type': 'login',
}).url)['requestId'][0]
status = traverse_obj(self._download_json(
f'{self._LOGIN_BASE_URL}/{self._LOGIN_PATH}/SelfAsserted', None, 'Logging in',
query=query, headers={'X-CSRF-TOKEN': csrf_token}, data=urlencode_postdata({
'request_type': 'RESPONSE',
'signInName': username,
'password': password,
}), expected_status=400), ('status', {int_or_none}))
if status == 400:
raise ExtractorError('Invalid username or password', expected=True)
try:
exchange_id = self._download_json(
f'{self._LOGIN_BASE}/verification-srv/v2/authenticate/initiate/password', None,
'Submitting username', headers={'content-type': 'application/json'}, data=json.dumps({
'usage_type': 'INITIAL_AUTHENTICATION',
'request_id': request_id,
'medium_id': 'PASSWORD',
'type': 'password',
'identifier': username,
}).encode())['data']['exchange_id']['exchange_id']
except ExtractorError:
raise ExtractorError('Invalid username', expected=True)
urlh = self._request_webpage(
f'{self._LOGIN_BASE_URL}/{self._LOGIN_PATH}/api/CombinedSigninAndSignup/confirmed',
None, 'Downloading ID token', query={
'rememberMe': 'false',
'csrf_token': csrf_token,
**query,
'diags': '',
})
try:
login_data = self._download_json(
f'{self._LOGIN_BASE}/verification-srv/v2/authenticate/authenticate/password', None,
'Submitting password', headers={'content-type': 'application/json'}, data=json.dumps({
'requestId': request_id,
'exchange_id': exchange_id,
'type': 'password',
'password': password,
}).encode())['data']
except ExtractorError:
raise ExtractorError('Invalid password', expected=True)
authorization_code = parse_qs(self._request_webpage(
f'{self._LOGIN_BASE}/login-srv/verification/login', None, 'Logging in',
data=urlencode_postdata({
'requestId': request_id,
'exchange_id': login_data['exchange_id']['exchange_id'],
'verificationType': 'password',
'sub': login_data['sub'],
'status_id': login_data['status_id'],
'rememberMe': True,
'lat': '',
'lon': '',
})).url)['code'][0]
self._ID_TOKEN = self._download_json(
f'{self._LOGIN_BASE}/proxy/token', None, 'Downloading token', data=b'', query={
'client_id': self._CLIENT_ID,
'redirect_uri': 'https://www.playsuisse.ch/auth',
'code': authorization_code,
'code_verifier': code_verifier,
'grant_type': 'authorization_code',
})['id_token']
self._ID_TOKEN = traverse_obj(parse_qs(urlh.url), ('id_token', 0))
if not self._ID_TOKEN:
raise ExtractorError('Login failed')

View File

@ -22,7 +22,7 @@ from ..utils import (
)
class PolskieRadioBaseExtractor(InfoExtractor):
class PolskieRadioBaseIE(InfoExtractor):
def _extract_webpage_player_entries(self, webpage, playlist_id, base_data):
media_urls = set()
@ -47,7 +47,7 @@ class PolskieRadioBaseExtractor(InfoExtractor):
yield entry
class PolskieRadioLegacyIE(PolskieRadioBaseExtractor):
class PolskieRadioLegacyIE(PolskieRadioBaseIE):
# legacy sites
IE_NAME = 'polskieradio:legacy'
_VALID_URL = r'https?://(?:www\.)?polskieradio(?:24)?\.pl/\d+/\d+/[Aa]rtykul/(?P<id>\d+)'
@ -127,7 +127,7 @@ class PolskieRadioLegacyIE(PolskieRadioBaseExtractor):
return self.playlist_result(entries, playlist_id, title, description)
class PolskieRadioIE(PolskieRadioBaseExtractor):
class PolskieRadioIE(PolskieRadioBaseIE):
# new next.js sites
_VALID_URL = r'https?://(?:[^/]+\.)?(?:polskieradio(?:24)?|radiokierowcow)\.pl/artykul/(?P<id>\d+)'
_TESTS = [{
@ -519,7 +519,7 @@ class PolskieRadioPlayerIE(InfoExtractor):
}
class PolskieRadioPodcastBaseExtractor(InfoExtractor):
class PolskieRadioPodcastBaseIE(InfoExtractor):
_API_BASE = 'https://apipodcasts.polskieradio.pl/api'
def _parse_episode(self, data):
@ -539,7 +539,7 @@ class PolskieRadioPodcastBaseExtractor(InfoExtractor):
}
class PolskieRadioPodcastListIE(PolskieRadioPodcastBaseExtractor):
class PolskieRadioPodcastListIE(PolskieRadioPodcastBaseIE):
IE_NAME = 'polskieradio:podcast:list'
_VALID_URL = r'https?://podcasty\.polskieradio\.pl/podcast/(?P<id>\d+)'
_TESTS = [{
@ -578,7 +578,7 @@ class PolskieRadioPodcastListIE(PolskieRadioPodcastBaseExtractor):
}
class PolskieRadioPodcastIE(PolskieRadioPodcastBaseExtractor):
class PolskieRadioPodcastIE(PolskieRadioPodcastBaseIE):
IE_NAME = 'polskieradio:podcast'
_VALID_URL = r'https?://podcasty\.polskieradio\.pl/track/(?P<id>[a-f\d]{8}(?:-[a-f\d]{4}){4}[a-f\d]{8})'
_TESTS = [{

View File

@ -8,6 +8,7 @@ from ..utils import (
int_or_none,
parse_qs,
traverse_obj,
truncate_string,
try_get,
unescapeHTML,
update_url_query,
@ -26,6 +27,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4',
'display_id': '6rrwyj',
'title': 'That small heart attack.',
'alt_title': 'That small heart attack.',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'thumbnails': 'count:4',
'timestamp': 1501941939,
@ -49,7 +51,8 @@ class RedditIE(InfoExtractor):
'id': 'gyh95hiqc0b11',
'ext': 'mp4',
'display_id': '90bu6w',
'title': 'Heat index was 110 degrees so we offered him a cold drink. He went for a full body soak instead',
'title': 'Heat index was 110 degrees so we offered him a cold drink. He went fo...',
'alt_title': 'Heat index was 110 degrees so we offered him a cold drink. He went for a full body soak instead',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'thumbnails': 'count:7',
'timestamp': 1532051078,
@ -69,7 +72,8 @@ class RedditIE(InfoExtractor):
'id': 'zasobba6wp071',
'ext': 'mp4',
'display_id': 'nip71r',
'title': 'I plan to make more stickers and prints! Check them out on my Etsy! Or get them through my Patreon. Links below.',
'title': 'I plan to make more stickers and prints! Check them out on my Etsy! O...',
'alt_title': 'I plan to make more stickers and prints! Check them out on my Etsy! Or get them through my Patreon. Links below.',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'thumbnails': 'count:5',
'timestamp': 1621709093,
@ -91,7 +95,17 @@ class RedditIE(InfoExtractor):
'playlist_count': 2,
'info_dict': {
'id': 'wzqkxp',
'title': 'md5:72d3d19402aa11eff5bd32fc96369b37',
'title': '[Finale] Kamen Rider Revice Episode 50 "Family to the End, Until the ...',
'alt_title': '[Finale] Kamen Rider Revice Episode 50 "Family to the End, Until the Day We Meet Again" Discussion',
'description': 'md5:5b7deb328062b164b15704c5fd67c335',
'uploader': 'TheTwelveYearOld',
'channel_id': 'KamenRider',
'comment_count': int,
'like_count': int,
'dislike_count': int,
'age_limit': 0,
'timestamp': 1661676059.0,
'upload_date': '20220828',
},
}, {
# crossposted reddit-hosted media
@ -102,6 +116,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4',
'display_id': 'zjjw82',
'title': 'Cringe',
'alt_title': 'Cringe',
'uploader': 'Otaku-senpai69420',
'thumbnail': r're:^https?://.*\.(?:jpg|png)',
'upload_date': '20221212',
@ -122,6 +137,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4',
'display_id': '124pp33',
'title': 'Harmless prank of some old friends',
'alt_title': 'Harmless prank of some old friends',
'uploader': 'Dudezila',
'channel_id': 'ContagiousLaughter',
'duration': 17,
@ -142,6 +158,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4',
'display_id': '12fujy3',
'title': 'Based Hasan?',
'alt_title': 'Based Hasan?',
'uploader': 'KingNigelXLII',
'channel_id': 'GenZedong',
'duration': 16,
@ -161,6 +178,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4',
'display_id': '1cl9h0u',
'title': 'The insurance claim will be interesting',
'alt_title': 'The insurance claim will be interesting',
'uploader': 'darrenpauli',
'channel_id': 'Unexpected',
'duration': 53,
@ -183,6 +201,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4',
'display_id': '1cxwzso',
'title': 'Tottenham [1] - 0 Newcastle United - James Maddison 31\'',
'alt_title': 'Tottenham [1] - 0 Newcastle United - James Maddison 31\'',
'uploader': 'Woodstovia',
'channel_id': 'soccer',
'duration': 30,
@ -206,6 +225,7 @@ class RedditIE(InfoExtractor):
'ext': 'mp4',
'display_id': 'degtjo',
'title': 'When the K hits',
'alt_title': 'When the K hits',
'uploader': '[deleted]',
'channel_id': 'ketamine',
'comment_count': int,
@ -304,14 +324,6 @@ class RedditIE(InfoExtractor):
data = data[0]['data']['children'][0]['data']
video_url = data['url']
over_18 = data.get('over_18')
if over_18 is True:
age_limit = 18
elif over_18 is False:
age_limit = 0
else:
age_limit = None
thumbnails = []
def add_thumbnail(src):
@ -337,15 +349,19 @@ class RedditIE(InfoExtractor):
add_thumbnail(resolution)
info = {
'title': data.get('title'),
'thumbnails': thumbnails,
'timestamp': float_or_none(data.get('created_utc')),
'uploader': data.get('author'),
'channel_id': data.get('subreddit'),
'like_count': int_or_none(data.get('ups')),
'dislike_count': int_or_none(data.get('downs')),
'comment_count': int_or_none(data.get('num_comments')),
'age_limit': age_limit,
'age_limit': {True: 18, False: 0}.get(data.get('over_18')),
**traverse_obj(data, {
'title': ('title', {truncate_string(left=72)}),
'alt_title': ('title', {str}),
'description': ('selftext', {str}, filter),
'timestamp': ('created_utc', {float_or_none}),
'uploader': ('author', {str}),
'channel_id': ('subreddit', {str}),
'like_count': ('ups', {int_or_none}),
'dislike_count': ('downs', {int_or_none}),
'comment_count': ('num_comments', {int_or_none}),
}),
}
parsed_url = urllib.parse.urlparse(video_url)
@ -371,7 +387,7 @@ class RedditIE(InfoExtractor):
**info,
})
if entries:
return self.playlist_result(entries, video_id, info.get('title'))
return self.playlist_result(entries, video_id, **info)
raise ExtractorError('No media found', expected=True)
# Check if media is hosted on reddit:

View File

@ -12,7 +12,7 @@ from ..utils import (
)
class RedGifsBaseInfoExtractor(InfoExtractor):
class RedGifsBaseIE(InfoExtractor):
_FORMATS = {
'gif': 250,
'sd': 480,
@ -113,7 +113,7 @@ class RedGifsBaseInfoExtractor(InfoExtractor):
return page_fetcher(page) if page else OnDemandPagedList(page_fetcher, self._PAGE_SIZE)
class RedGifsIE(RedGifsBaseInfoExtractor):
class RedGifsIE(RedGifsBaseIE):
_VALID_URL = r'https?://(?:(?:www\.)?redgifs\.com/(?:watch|ifr)/|thumbs2\.redgifs\.com/)(?P<id>[^-/?#\.]+)'
_TESTS = [{
'url': 'https://www.redgifs.com/watch/squeakyhelplesswisent',
@ -172,7 +172,7 @@ class RedGifsIE(RedGifsBaseInfoExtractor):
return self._parse_gif_data(video_info['gif'])
class RedGifsSearchIE(RedGifsBaseInfoExtractor):
class RedGifsSearchIE(RedGifsBaseIE):
IE_DESC = 'Redgifs search'
_VALID_URL = r'https?://(?:www\.)?redgifs\.com/browse\?(?P<query>[^#]+)'
_PAGE_SIZE = 80
@ -226,7 +226,7 @@ class RedGifsSearchIE(RedGifsBaseInfoExtractor):
entries, query_str, tags, f'RedGifs search for {tags}, ordered by {order}')
class RedGifsUserIE(RedGifsBaseInfoExtractor):
class RedGifsUserIE(RedGifsBaseIE):
IE_DESC = 'Redgifs user'
_VALID_URL = r'https?://(?:www\.)?redgifs\.com/users/(?P<username>[^/?#]+)(?:\?(?P<query>[^#]+))?'
_PAGE_SIZE = 80

43
yt_dlp/extractor/roya.py Normal file
View File

@ -0,0 +1,43 @@
from .common import InfoExtractor
from ..utils.traversal import traverse_obj
class RoyaLiveIE(InfoExtractor):
_VALID_URL = r'https?://roya\.tv/live-stream/(?P<id>\d+)'
_TESTS = [{
'url': 'https://roya.tv/live-stream/1',
'info_dict': {
'id': '1',
'title': r're:Roya TV \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
'ext': 'mp4',
'live_status': 'is_live',
},
}, {
'url': 'https://roya.tv/live-stream/21',
'info_dict': {
'id': '21',
'title': r're:Roya News \d{4}-\d{2}-\d{2} \d{2}:\d{2}',
'ext': 'mp4',
'live_status': 'is_live',
},
}, {
'url': 'https://roya.tv/live-stream/10000',
'only_matching': True,
}]
def _real_extract(self, url):
media_id = self._match_id(url)
stream_url = self._download_json(
f'https://ticket.roya-tv.com/api/v5/fastchannel/{media_id}', media_id)['data']['secured_url']
title = traverse_obj(
self._download_json('https://backend.roya.tv/api/v01/channels/schedule-pagination', media_id, fatal=False),
('data', 0, 'channel', lambda _, v: str(v['id']) == media_id, 'title', {str}, any))
return {
'id': media_id,
'formats': self._extract_m3u8_formats(stream_url, media_id, 'mp4', m3u8_id='hls', live=True),
'title': title,
'is_live': True,
}

View File

@ -3,12 +3,20 @@ import json
import re
import urllib.parse
from .common import InfoExtractor
from ..utils import js_to_json
from .common import InfoExtractor, Request
from ..utils import (
determine_ext,
int_or_none,
js_to_json,
parse_duration,
parse_iso8601,
url_or_none,
)
from ..utils.traversal import traverse_obj
class RTPIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rtp\.pt/play/(?:(?:estudoemcasa|palco|zigzag)/)?p(?P<program_id>[0-9]+)/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?rtp\.pt/play/(?:[^/#?]+/)?p(?P<program_id>\d+)/(?P<id>e\d+)'
_TESTS = [{
'url': 'http://www.rtp.pt/play/p405/e174042/paixoes-cruzadas',
'md5': 'e736ce0c665e459ddb818546220b4ef8',
@ -16,99 +24,173 @@ class RTPIE(InfoExtractor):
'id': 'e174042',
'ext': 'mp3',
'title': 'Paixões Cruzadas',
'description': 'As paixões musicais de António Cartaxo e António Macedo',
'description': 'md5:af979e58ba0ab73f78435fc943fdb070',
'thumbnail': r're:^https?://.*\.jpg',
'series': 'Paixões Cruzadas',
'duration': 2950.0,
'modified_timestamp': 1553693464,
'modified_date': '20190327',
'timestamp': 1417219200,
'upload_date': '20141129',
},
}, {
'url': 'https://www.rtp.pt/play/zigzag/p13166/e757904/25-curiosidades-25-de-abril',
'md5': '9a81ed53f2b2197cfa7ed455b12f8ade',
'md5': '5b4859940e3adef61247a77dfb76046a',
'info_dict': {
'id': 'e757904',
'ext': 'mp4',
'title': '25 Curiosidades, 25 de Abril',
'description': 'Estudar ou não estudar - Em cada um dos episódios descobrimos uma curiosidade acerca de como era viver em Portugal antes da revolução do 25 de abr',
'title': 'Estudar ou não estudar',
'description': 'md5:3bfd7eb8bebfd5711a08df69c9c14c35',
'thumbnail': r're:^https?://.*\.jpg',
'timestamp': 1711958401,
'duration': 146.0,
'upload_date': '20240401',
'modified_timestamp': 1712242991,
'series': '25 Curiosidades, 25 de Abril',
'episode_number': 2,
'episode': 'Estudar ou não estudar',
'modified_date': '20240404',
},
}, {
'url': 'http://www.rtp.pt/play/p831/a-quimica-das-coisas',
'only_matching': True,
}, {
'url': 'https://www.rtp.pt/play/estudoemcasa/p7776/portugues-1-ano',
'only_matching': True,
}, {
'url': 'https://www.rtp.pt/play/palco/p13785/l7nnon',
'only_matching': True,
# Episode not accessible through API
'url': 'https://www.rtp.pt/play/estudoemcasa/p7776/e500050/portugues-1-ano',
'md5': '57660c0b46db9f22118c52cbd65975e4',
'info_dict': {
'id': 'e500050',
'ext': 'mp4',
'title': 'Português - 1.º ano',
'duration': 1669.0,
'description': 'md5:be68925c81269f8c6886589f25fe83ea',
'upload_date': '20201020',
'timestamp': 1603180799,
'thumbnail': 'https://cdn-images.rtp.pt/EPG/imagens/39482_59449_64850.png?v=3&w=860',
},
}]
_USER_AGENT = 'rtpplay/2.0.66 (pt.rtp.rtpplay; build:2066; iOS 15.8.3) Alamofire/5.9.1'
_AUTH_TOKEN = None
def _fetch_auth_token(self):
if self._AUTH_TOKEN:
return self._AUTH_TOKEN
self._AUTH_TOKEN = traverse_obj(self._download_json(Request(
'https://rtpplayapi.rtp.pt/play/api/2/token-manager',
headers={
'Accept': '*/*',
'rtp-play-auth': 'RTPPLAY_MOBILE_IOS',
'rtp-play-auth-hash': 'fac9c328b2f27e26e03d7f8942d66c05b3e59371e16c2a079f5c83cc801bd3ee',
'rtp-play-auth-timestamp': '2145973229682',
'User-Agent': self._USER_AGENT,
}, extensions={'keep_header_casing': True}), None,
note='Fetching guest auth token', errnote='Could not fetch guest auth token',
fatal=False), ('token', 'token', {str}))
return self._AUTH_TOKEN
@staticmethod
def _cleanup_media_url(url):
if urllib.parse.urlparse(url).netloc == 'streaming-ondemand.rtp.pt':
return None
return url.replace('/drm-fps/', '/hls/').replace('/drm-dash/', '/dash/')
def _extract_formats(self, media_urls, episode_id):
formats = []
subtitles = {}
for media_url in set(traverse_obj(media_urls, (..., {url_or_none}, {self._cleanup_media_url}))):
ext = determine_ext(media_url)
if ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
media_url, episode_id, m3u8_id='hls', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
elif ext == 'mpd':
fmts, subs = self._extract_mpd_formats_and_subtitles(
media_url, episode_id, mpd_id='dash', fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.append({
'url': media_url,
'format_id': 'http',
})
return formats, subtitles
def _extract_from_api(self, program_id, episode_id):
auth_token = self._fetch_auth_token()
if not auth_token:
return
episode_data = traverse_obj(self._download_json(
f'https://www.rtp.pt/play/api/1/get-episode/{program_id}/{episode_id[1:]}', episode_id,
query={'include_assets': 'true', 'include_webparams': 'true'},
headers={
'Accept': '*/*',
'Authorization': f'Bearer {auth_token}',
'User-Agent': self._USER_AGENT,
}, fatal=False), 'result', {dict})
if not episode_data:
return
asset_urls = traverse_obj(episode_data, ('assets', 0, 'asset_url', {dict}))
media_urls = traverse_obj(asset_urls, (
((('hls', 'dash'), 'stream_url'), ('multibitrate', ('url_hls', 'url_dash'))),))
formats, subtitles = self._extract_formats(media_urls, episode_id)
for sub_data in traverse_obj(asset_urls, ('subtitles', 'vtt_list', lambda _, v: url_or_none(v['file']))):
subtitles.setdefault(sub_data.get('code') or 'pt', []).append({
'url': sub_data['file'],
'name': sub_data.get('language'),
})
return {
'id': episode_id,
'formats': formats,
'subtitles': subtitles,
'thumbnail': traverse_obj(episode_data, ('assets', 0, 'asset_thumbnail', {url_or_none})),
**traverse_obj(episode_data, ('episode', {
'title': (('episode_title', 'program_title'), {str}, filter, any),
'alt_title': ('episode_subtitle', {str}, filter),
'description': (('episode_description', 'episode_summary'), {str}, filter, any),
'timestamp': ('episode_air_date', {parse_iso8601(delimiter=' ')}),
'modified_timestamp': ('episode_lastchanged', {parse_iso8601(delimiter=' ')}),
'duration': ('episode_duration_complete', {parse_duration}),
'episode': ('episode_title', {str}, filter),
'episode_number': ('episode_number', {int_or_none}),
'season': ('program_season', {str}, filter),
'series': ('program_title', {str}, filter),
})),
}
_RX_OBFUSCATION = re.compile(r'''(?xs)
atob\s*\(\s*decodeURIComponent\s*\(\s*
(\[[0-9A-Za-z%,'"]*\])
\s*\.\s*join\(\s*(?:""|'')\s*\)\s*\)\s*\)
''')
def __unobfuscate(self, data, *, video_id):
if data.startswith('{'):
data = self._RX_OBFUSCATION.sub(
lambda m: json.dumps(
base64.b64decode(urllib.parse.unquote(
''.join(self._parse_json(m.group(1), video_id)),
)).decode('iso-8859-1')),
data)
return js_to_json(data)
def __unobfuscate(self, data):
return self._RX_OBFUSCATION.sub(
lambda m: json.dumps(
base64.b64decode(urllib.parse.unquote(
''.join(json.loads(m.group(1))),
)).decode('iso-8859-1')),
data)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'twitter:title', webpage, display_name='title', fatal=True)
f, config = self._search_regex(
r'''(?sx)
(?:var\s+f\s*=\s*(?P<f>".*?"|{[^;]+?});\s*)?
var\s+player1\s+=\s+new\s+RTPPlayer\s*\((?P<config>{(?:(?!\*/).)+?})\);(?!\s*\*/)
''', webpage,
'player config', group=('f', 'config'))
config = self._parse_json(
config, video_id,
lambda data: self.__unobfuscate(data, video_id=video_id))
f = config['file'] if not f else self._parse_json(
f, video_id,
lambda data: self.__unobfuscate(data, video_id=video_id))
def _extract_from_html(self, url, episode_id):
webpage = self._download_webpage(url, episode_id)
formats = []
if isinstance(f, dict):
f_hls = f.get('hls')
if f_hls is not None:
formats.extend(self._extract_m3u8_formats(
f_hls, video_id, 'mp4', 'm3u8_native', m3u8_id='hls'))
f_dash = f.get('dash')
if f_dash is not None:
formats.extend(self._extract_mpd_formats(f_dash, video_id, mpd_id='dash'))
else:
formats.append({
'format_id': 'f',
'url': f,
'vcodec': 'none' if config.get('mediaType') == 'audio' else None,
})
subtitles = {}
vtt = config.get('vtt')
if vtt is not None:
for lcode, lname, url in vtt:
subtitles.setdefault(lcode, []).append({
'name': lname,
'url': url,
})
media_urls = traverse_obj(re.findall(r'(?:var\s+f\s*=|RTPPlayer\({[^}]+file:)\s*({[^}]+}|"[^"]+")', webpage), (
-1, (({self.__unobfuscate}, {js_to_json}, {json.loads}, {dict.values}, ...), {json.loads})))
formats, subtitles = self._extract_formats(media_urls, episode_id)
return {
'id': video_id,
'title': title,
'id': episode_id,
'formats': formats,
'description': self._html_search_meta(['description', 'twitter:description'], webpage),
'thumbnail': config.get('poster') or self._og_search_thumbnail(webpage),
'subtitles': subtitles,
'description': self._html_search_meta(['og:description', 'twitter:description'], webpage, default=None),
'thumbnail': self._html_search_meta(['og:image', 'twitter:image'], webpage, default=None),
**self._search_json_ld(webpage, episode_id, default={}),
'title': self._html_search_meta(['og:title', 'twitter:title'], webpage, default=None),
}
def _real_extract(self, url):
program_id, episode_id = self._match_valid_url(url).group('program_id', 'id')
return self._extract_from_api(program_id, episode_id) or self._extract_from_html(url, episode_id)

View File

@ -9,7 +9,9 @@ from ..utils import (
class RTVSIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rtvs\.sk/(?:radio|televizia)/archiv(?:/\d+)?/(?P<id>\d+)/?(?:[#?]|$)'
IE_NAME = 'stvr'
IE_DESC = 'Slovak Television and Radio (formerly RTVS)'
_VALID_URL = r'https?://(?:www\.)?(?:rtvs|stvr)\.sk/(?:radio|televizia)/archiv(?:/\d+)?/(?P<id>\d+)/?(?:[#?]|$)'
_TESTS = [{
# radio archive
'url': 'http://www.rtvs.sk/radio/archiv/11224/414872',
@ -19,7 +21,7 @@ class RTVSIE(InfoExtractor):
'ext': 'mp3',
'title': 'Ostrov pokladov 1 časť.mp3',
'duration': 2854,
'thumbnail': 'https://www.rtvs.sk/media/a501/image/file/2/0000/b1R8.rtvs.jpg',
'thumbnail': 'https://www.stvr.sk/media/a501/image/file/2/0000/rtvs-00009383.png',
'display_id': '135331',
},
}, {
@ -30,7 +32,7 @@ class RTVSIE(InfoExtractor):
'ext': 'mp4',
'title': 'Amaro Džives - Náš deň',
'description': 'Galavečer pri príležitosti Medzinárodného dňa Rómov.',
'thumbnail': 'https://www.rtvs.sk/media/a501/image/file/2/0031/L7Qm.amaro_dzives_png.jpg',
'thumbnail': 'https://www.stvr.sk/media/a501/image/file/2/0031/L7Qm.amaro_dzives_png.jpg',
'timestamp': 1428555900,
'upload_date': '20150409',
'duration': 4986,
@ -47,8 +49,11 @@ class RTVSIE(InfoExtractor):
'display_id': '307655',
'duration': 831,
'upload_date': '20211111',
'thumbnail': 'https://www.rtvs.sk/media/a501/image/file/2/0916/robin.jpg',
'thumbnail': 'https://www.stvr.sk/media/a501/image/file/2/0916/robin.jpg',
},
}, {
'url': 'https://www.stvr.sk/radio/archiv/11224/414872',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -7,7 +7,6 @@ from ..utils import (
ExtractorError,
UnsupportedError,
clean_html,
determine_ext,
extract_attributes,
format_field,
get_element_by_class,
@ -36,7 +35,7 @@ class RumbleEmbedIE(InfoExtractor):
'upload_date': '20191020',
'channel_url': 'https://rumble.com/c/WMAR',
'channel': 'WMAR',
'thumbnail': 'https://sp.rmbl.ws/s8/1/5/M/z/1/5Mz1a.qR4e-small-WMAR-2-News-Latest-Headline.jpg',
'thumbnail': r're:https://.+\.jpg',
'duration': 234,
'uploader': 'WMAR',
'live_status': 'not_live',
@ -52,7 +51,7 @@ class RumbleEmbedIE(InfoExtractor):
'upload_date': '20220217',
'channel_url': 'https://rumble.com/c/CyberTechNews',
'channel': 'CTNews',
'thumbnail': 'https://sp.rmbl.ws/s8/6/7/i/9/h/7i9hd.OvCc.jpg',
'thumbnail': r're:https://.+\.jpg',
'duration': 901,
'uploader': 'CTNews',
'live_status': 'not_live',
@ -114,6 +113,22 @@ class RumbleEmbedIE(InfoExtractor):
'live_status': 'was_live',
},
'params': {'skip_download': True},
}, {
'url': 'https://rumble.com/embed/v6pezdb',
'info_dict': {
'id': 'v6pezdb',
'ext': 'mp4',
'title': '"Es war einmal ein Mädchen" Ein filmisches Zeitzeugnis aus Leningrad 1944',
'uploader': 'RT DE',
'channel': 'RT DE',
'channel_url': 'https://rumble.com/c/RTDE',
'duration': 309,
'thumbnail': 'https://1a-1791.com/video/fww1/dc/s8/1/n/z/2/y/nz2yy.qR4e-small-Es-war-einmal-ein-Mdchen-Ei.jpg',
'timestamp': 1743703500,
'upload_date': '20250403',
'live_status': 'not_live',
},
'params': {'skip_download': True},
}, {
'url': 'https://rumble.com/embed/ufe9n.v5pv5f',
'only_matching': True,
@ -168,40 +183,42 @@ class RumbleEmbedIE(InfoExtractor):
live_status = None
formats = []
for ext, ext_info in (video.get('ua') or {}).items():
if isinstance(ext_info, dict):
for height, video_info in ext_info.items():
for format_type, format_info in (video.get('ua') or {}).items():
if isinstance(format_info, dict):
for height, video_info in format_info.items():
if not traverse_obj(video_info, ('meta', 'h', {int_or_none})):
video_info.setdefault('meta', {})['h'] = height
ext_info = ext_info.values()
format_info = format_info.values()
for video_info in ext_info:
for video_info in format_info:
meta = video_info.get('meta') or {}
if not video_info.get('url'):
continue
if ext == 'hls':
# With default query params returns m3u8 variants which are duplicates, without returns tar files
if format_type == 'tar':
continue
if format_type == 'hls':
if meta.get('live') is True and video.get('live') == 1:
live_status = 'post_live'
formats.extend(self._extract_m3u8_formats(
video_info['url'], video_id,
ext='mp4', m3u8_id='hls', fatal=False, live=live_status == 'is_live'))
continue
timeline = ext == 'timeline'
if timeline:
ext = determine_ext(video_info['url'])
is_timeline = format_type == 'timeline'
is_audio = format_type == 'audio'
formats.append({
'ext': ext,
'acodec': 'none' if timeline else None,
'acodec': 'none' if is_timeline else None,
'vcodec': 'none' if is_audio else None,
'url': video_info['url'],
'format_id': join_nonempty(ext, format_field(meta, 'h', '%sp')),
'format_note': 'Timeline' if timeline else None,
'fps': None if timeline else video.get('fps'),
'format_id': join_nonempty(format_type, format_field(meta, 'h', '%sp')),
'format_note': 'Timeline' if is_timeline else None,
'fps': None if is_timeline or is_audio else video.get('fps'),
**traverse_obj(meta, {
'tbr': 'bitrate',
'filesize': 'size',
'width': 'w',
'height': 'h',
}, expected_type=lambda x: int(x) or None),
'tbr': ('bitrate', {int_or_none}),
'filesize': ('size', {int_or_none}),
'width': ('w', {int_or_none}),
'height': ('h', {int_or_none}),
}),
})
subtitles = {

View File

@ -122,6 +122,15 @@ class SBSIE(InfoExtractor):
if traverse_obj(media, ('partOfSeries', {dict})):
media['epName'] = traverse_obj(media, ('title', {str}))
# Need to set different language for forced subs or else they have priority over full subs
fixed_subtitles = {}
for lang, subs in subtitles.items():
for sub in subs:
fixed_lang = lang
if sub['url'].lower().endswith('_fe.vtt'):
fixed_lang += '-forced'
fixed_subtitles.setdefault(fixed_lang, []).append(sub)
return {
'id': video_id,
**traverse_obj(media, {
@ -151,6 +160,6 @@ class SBSIE(InfoExtractor):
}),
}),
'formats': formats,
'subtitles': subtitles,
'subtitles': fixed_subtitles,
'uploader': 'SBSC',
}

View File

@ -13,7 +13,7 @@ from ..utils.traversal import traverse_obj
class SenateISVPIE(InfoExtractor):
_IE_NAME = 'senate.gov:isvp'
IE_NAME = 'senate.gov:isvp'
_VALID_URL = r'https?://(?:www\.)?senate\.gov/isvp/?\?(?P<qs>.+)'
_EMBED_REGEX = [r"<iframe[^>]+src=['\"](?P<url>https?://www\.senate\.gov/isvp/?\?[^'\"]+)['\"]"]
@ -137,7 +137,7 @@ class SenateISVPIE(InfoExtractor):
class SenateGovIE(InfoExtractor):
_IE_NAME = 'senate.gov'
IE_NAME = 'senate.gov'
_SUBDOMAIN_RE = '|'.join(map(re.escape, (
'agriculture', 'aging', 'appropriations', 'armed-services', 'banking',
'budget', 'commerce', 'energy', 'epw', 'finance', 'foreign', 'help',

View File

@ -2,16 +2,18 @@ import urllib.parse
from .common import InfoExtractor
from ..utils import (
clean_html,
dict_get,
int_or_none,
parse_duration,
unified_timestamp,
url_or_none,
urljoin,
)
from ..utils.traversal import traverse_obj
class SkyItPlayerIE(InfoExtractor):
IE_NAME = 'player.sky.it'
_VALID_URL = r'https?://player\.sky\.it/player/(?:external|social)\.html\?.*?\bid=(?P<id>\d+)'
class SkyItBaseIE(InfoExtractor):
_GEO_BYPASS = False
_DOMAIN = 'sky'
_PLAYER_TMPL = 'https://player.sky.it/player/external.html?id=%s&domain=%s'
@ -33,7 +35,6 @@ class SkyItPlayerIE(InfoExtractor):
SkyItPlayerIE.ie_key(), video_id)
def _parse_video(self, video, video_id):
title = video['title']
is_live = video.get('type') == 'live'
hls_url = video.get(('streaming' if is_live else 'hls') + '_url')
if not hls_url and video.get('geoblock' if is_live else 'geob'):
@ -43,7 +44,7 @@ class SkyItPlayerIE(InfoExtractor):
return {
'id': video_id,
'title': title,
'title': video.get('title'),
'formats': formats,
'thumbnail': dict_get(video, ('video_still', 'video_still_medium', 'thumb')),
'description': video.get('short_desc') or None,
@ -52,6 +53,11 @@ class SkyItPlayerIE(InfoExtractor):
'is_live': is_live,
}
class SkyItPlayerIE(SkyItBaseIE):
IE_NAME = 'player.sky.it'
_VALID_URL = r'https?://player\.sky\.it/player/(?:external|social)\.html\?.*?\bid=(?P<id>\d+)'
def _real_extract(self, url):
video_id = self._match_id(url)
domain = urllib.parse.parse_qs(urllib.parse.urlparse(
@ -67,7 +73,7 @@ class SkyItPlayerIE(InfoExtractor):
return self._parse_video(video, video_id)
class SkyItVideoIE(SkyItPlayerIE): # XXX: Do not subclass from concrete IE
class SkyItVideoIE(SkyItBaseIE):
IE_NAME = 'video.sky.it'
_VALID_URL = r'https?://(?:masterchef|video|xfactor)\.sky\.it(?:/[^/]+)*/video/[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{
@ -96,7 +102,7 @@ class SkyItVideoIE(SkyItPlayerIE): # XXX: Do not subclass from concrete IE
return self._player_url_result(video_id)
class SkyItVideoLiveIE(SkyItPlayerIE): # XXX: Do not subclass from concrete IE
class SkyItVideoLiveIE(SkyItBaseIE):
IE_NAME = 'video.sky.it:live'
_VALID_URL = r'https?://video\.sky\.it/diretta/(?P<id>[^/?&#]+)'
_TEST = {
@ -124,7 +130,7 @@ class SkyItVideoLiveIE(SkyItPlayerIE): # XXX: Do not subclass from concrete IE
return self._parse_video(livestream, asset_id)
class SkyItIE(SkyItPlayerIE): # XXX: Do not subclass from concrete IE
class SkyItIE(SkyItBaseIE):
IE_NAME = 'sky.it'
_VALID_URL = r'https?://(?:sport|tg24)\.sky\.it(?:/[^/]+)*/\d{4}/\d{2}/\d{2}/(?P<id>[^/?&#]+)'
_TESTS = [{
@ -223,3 +229,80 @@ class TV8ItIE(SkyItVideoIE): # XXX: Do not subclass from concrete IE
'params': {'skip_download': 'm3u8'},
}]
_DOMAIN = 'mtv8'
class TV8ItLiveIE(SkyItBaseIE):
IE_NAME = 'tv8.it:live'
IE_DESC = 'TV8 Live'
_VALID_URL = r'https?://(?:www\.)?tv8\.it/streaming'
_TESTS = [{
'url': 'https://tv8.it/streaming',
'info_dict': {
'id': 'tv8',
'ext': 'mp4',
'title': str,
'description': str,
'is_live': True,
'live_status': 'is_live',
},
}]
def _real_extract(self, url):
video_id = 'tv8'
livestream = self._download_json(
'https://apid.sky.it/vdp/v1/getLivestream', video_id,
'Downloading manifest JSON', query={'id': '7'})
metadata = self._download_json('https://tv8.it/api/getStreaming', video_id, fatal=False)
return {
**self._parse_video(livestream, video_id),
**traverse_obj(metadata, ('info', {
'title': ('title', 'text', {str}),
'description': ('description', 'html', {clean_html}),
})),
}
class TV8ItPlaylistIE(InfoExtractor):
IE_NAME = 'tv8.it:playlist'
IE_DESC = 'TV8 Playlist'
_VALID_URL = r'https?://(?:www\.)?tv8\.it/(?!video)[^/#?]+/(?P<id>[^/#?]+)'
_TESTS = [{
'url': 'https://tv8.it/intrattenimento/tv8-gialappas-night',
'playlist_mincount': 32,
'info_dict': {
'id': 'tv8-gialappas-night',
'title': 'Tv8 Gialappa\'s Night',
'description': 'md5:c876039d487d9cf40229b768872718ed',
'thumbnail': r're:https://static\.sky\.it/.+\.(png|jpe?g|webp)',
},
}, {
'url': 'https://tv8.it/sport/uefa-europa-league',
'playlist_mincount': 11,
'info_dict': {
'id': 'uefa-europa-league',
'title': 'UEFA Europa League',
'description': 'md5:9ab1832b7a8b1705b1f590e13a36bc6a',
'thumbnail': r're:https://static\.sky\.it/.+\.(png|jpe?g|webp)',
},
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
data = self._search_nextjs_data(webpage, playlist_id)['props']['pageProps']['data']
entries = [self.url_result(
urljoin('https://tv8.it', card['href']), ie=TV8ItIE,
**traverse_obj(card, {
'description': ('extraData', 'videoDesc', {str}),
'id': ('extraData', 'asset_id', {str}),
'thumbnail': ('image', 'src', {url_or_none}),
'title': ('title', 'typography', 'text', {str}),
}))
for card in traverse_obj(data, ('lastContent', 'cards', lambda _, v: v['href']))]
return self.playlist_result(entries, playlist_id, **traverse_obj(data, ('card', 'desktop', {
'description': ('description', 'html', {clean_html}),
'thumbnail': ('image', 'src', {url_or_none}),
'title': ('title', 'text', {str}),
})))

View File

@ -0,0 +1,87 @@
from .common import InfoExtractor
from .vimeo import VHXEmbedIE
from ..utils import (
ExtractorError,
clean_html,
update_url,
urlencode_postdata,
)
from ..utils.traversal import find_element, traverse_obj
class SoftWhiteUnderbellyIE(InfoExtractor):
_LOGIN_URL = 'https://www.softwhiteunderbelly.com/login'
_NETRC_MACHINE = 'softwhiteunderbelly'
_VALID_URL = r'https?://(?:www\.)?softwhiteunderbelly\.com/videos/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://www.softwhiteunderbelly.com/videos/kenneth-final1',
'note': 'A single Soft White Underbelly Episode',
'md5': '8e79f29ec1f1bda6da2e0b998fcbebb8',
'info_dict': {
'id': '3201266',
'ext': 'mp4',
'display_id': 'kenneth-final1',
'title': 'Appalachian Man interview-Kenneth',
'description': 'Soft White Underbelly interview and portrait of Kenneth, an Appalachian man in Clay County, Kentucky.',
'thumbnail': 'https://vhx.imgix.net/softwhiteunderbelly/assets/249f6db0-2b39-49a4-979b-f8dad4681825.jpg',
'uploader_url': 'https://vimeo.com/user80538407',
'uploader': 'OTT Videos',
'uploader_id': 'user80538407',
'duration': 512,
},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}, {
'url': 'https://www.softwhiteunderbelly.com/videos/tj-2-final-2160p',
'note': 'A single Soft White Underbelly Episode',
'md5': '286bd8851b4824c62afb369e6f307036',
'info_dict': {
'id': '3506029',
'ext': 'mp4',
'display_id': 'tj-2-final-2160p',
'title': 'Fentanyl Addict interview-TJ (follow up)',
'description': 'Soft White Underbelly follow up interview and portrait of TJ, a fentanyl addict on Skid Row.',
'thumbnail': 'https://vhx.imgix.net/softwhiteunderbelly/assets/c883d531-5da0-4faf-a2e2-8eba97e5adfc.jpg',
'duration': 817,
'uploader': 'OTT Videos',
'uploader_url': 'https://vimeo.com/user80538407',
'uploader_id': 'user80538407',
},
'expected_warnings': ['Failed to parse XML: not well-formed'],
}]
def _perform_login(self, username, password):
signin_page = self._download_webpage(self._LOGIN_URL, None, 'Fetching authenticity token')
self._download_webpage(
self._LOGIN_URL, None, 'Logging in',
data=urlencode_postdata({
'email': username,
'password': password,
'authenticity_token': self._html_search_regex(
r'name=["\']authenticity_token["\']\s+value=["\']([^"\']+)', signin_page, 'authenticity_token'),
'utf8': True,
}),
)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
if '<div id="watch-unauthorized"' in webpage:
if self._get_cookies('https://www.softwhiteunderbelly.com').get('_session'):
raise ExtractorError('This account is not subscribed to this content', expected=True)
self.raise_login_required()
embed_url, embed_id = self._html_search_regex(
r'embed_url:\s*["\'](?P<url>https?://embed\.vhx\.tv/videos/(?P<id>\d+)[^"\']*)',
webpage, 'embed url', group=('url', 'id'))
return {
'_type': 'url_transparent',
'ie_key': VHXEmbedIE.ie_key(),
'url': VHXEmbedIE._smuggle_referrer(embed_url, 'https://www.softwhiteunderbelly.com'),
'id': embed_id,
'display_id': display_id,
'title': traverse_obj(webpage, ({find_element(id='watch-info')}, {find_element(cls='video-title')}, {clean_html})),
'description': self._html_search_meta('description', webpage, default=None),
'thumbnail': update_url(self._og_search_thumbnail(webpage) or '', query=None) or None,
}

View File

@ -52,7 +52,8 @@ class SoundcloudBaseIE(InfoExtractor):
_API_VERIFY_AUTH_TOKEN = 'https://api-auth.soundcloud.com/connect/session%s'
_HEADERS = {}
_IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
_IMAGE_REPL_RE = r'-[0-9a-z]+\.(?P<ext>jpg|png)'
_TAGS_RE = re.compile(r'"([^"]+)"|([^ ]+)')
_ARTWORK_MAP = {
'mini': 16,
@ -331,12 +332,14 @@ class SoundcloudBaseIE(InfoExtractor):
thumbnails = []
artwork_url = info.get('artwork_url')
thumbnail = artwork_url or user.get('avatar_url')
if isinstance(thumbnail, str):
if re.search(self._IMAGE_REPL_RE, thumbnail):
if url_or_none(thumbnail):
if mobj := re.search(self._IMAGE_REPL_RE, thumbnail):
for image_id, size in self._ARTWORK_MAP.items():
# Soundcloud serves JPEG regardless of URL's ext *except* for "original" thumb
ext = mobj.group('ext') if image_id == 'original' else 'jpg'
i = {
'id': image_id,
'url': re.sub(self._IMAGE_REPL_RE, f'-{image_id}.jpg', thumbnail),
'url': re.sub(self._IMAGE_REPL_RE, f'-{image_id}.{ext}', thumbnail),
}
if image_id == 'tiny' and not artwork_url:
size = 18
@ -372,6 +375,7 @@ class SoundcloudBaseIE(InfoExtractor):
'comment_count': extract_count('comment'),
'repost_count': extract_count('reposts'),
'genres': traverse_obj(info, ('genre', {str}, filter, all, filter)),
'tags': traverse_obj(info, ('tag_list', {self._TAGS_RE.findall}, ..., ..., filter)),
'artists': traverse_obj(info, ('publisher_metadata', 'artist', {str}, filter, all, filter)),
'formats': formats if not extract_flat else None,
}
@ -425,6 +429,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'repost_count': int,
'thumbnail': 'https://i1.sndcdn.com/artworks-000031955188-rwb18x-original.jpg',
'uploader_url': 'https://soundcloud.com/ethmusic',
'tags': 'count:14',
},
},
# geo-restricted
@ -440,7 +445,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_id': '9615865',
'timestamp': 1337635207,
'upload_date': '20120521',
'duration': 227.155,
'duration': 227.103,
'license': 'all-rights-reserved',
'view_count': int,
'like_count': int,
@ -450,6 +455,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'thumbnail': 'https://i1.sndcdn.com/artworks-v8bFHhXm7Au6-0-original.jpg',
'genres': ['Alternative'],
'artists': ['The Royal Concept'],
'tags': [],
},
},
# private link
@ -475,6 +481,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_url': 'https://soundcloud.com/jaimemf',
'thumbnail': 'https://a1.sndcdn.com/images/default_avatar_large.png',
'genres': ['youtubedl'],
'tags': [],
},
},
# private link (alt format)
@ -500,15 +507,16 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_url': 'https://soundcloud.com/jaimemf',
'thumbnail': 'https://a1.sndcdn.com/images/default_avatar_large.png',
'genres': ['youtubedl'],
'tags': [],
},
},
# downloadable song
{
'url': 'https://soundcloud.com/the80m/the-following',
'md5': '9ffcddb08c87d74fb5808a3c183a1d04',
'md5': 'ecb87d7705d5f53e6c02a63760573c75', # wav: '9ffcddb08c87d74fb5808a3c183a1d04'
'info_dict': {
'id': '343609555',
'ext': 'wav',
'ext': 'opus', # wav original available with auth
'title': 'The Following',
'track': 'The Following',
'description': '',
@ -526,15 +534,18 @@ class SoundcloudIE(SoundcloudBaseIE):
'view_count': int,
'genres': ['Dance & EDM'],
'artists': ['80M'],
'tags': ['80M', 'EDM', 'Dance', 'Music'],
},
'expected_warnings': ['Original download format is only available for registered users'],
},
# private link, downloadable format
# tags with spaces (e.g. "Uplifting Trance", "Ori Uplift")
{
'url': 'https://soundcloud.com/oriuplift/uponly-238-no-talking-wav/s-AyZUd',
'md5': '64a60b16e617d41d0bef032b7f55441e',
'md5': '2e1530d0e9986a833a67cb34fc90ece0', # wav: '64a60b16e617d41d0bef032b7f55441e'
'info_dict': {
'id': '340344461',
'ext': 'wav',
'ext': 'opus', # wav original available with auth
'title': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
'track': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
'description': 'md5:fa20ee0fca76a3d6df8c7e57f3715366',
@ -552,7 +563,9 @@ class SoundcloudIE(SoundcloudBaseIE):
'uploader_url': 'https://soundcloud.com/oriuplift',
'genres': ['Trance'],
'artists': ['Ori Uplift'],
'tags': ['Orchestral', 'Emotional', 'Uplifting Trance', 'Trance', 'Ori Uplift', 'UpOnly'],
},
'expected_warnings': ['Original download format is only available for registered users'],
},
# no album art, use avatar pic for thumbnail
{
@ -577,6 +590,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'repost_count': int,
'uploader_url': 'https://soundcloud.com/garyvee',
'artists': ['MadReal'],
'tags': [],
},
'params': {
'skip_download': True,
@ -604,8 +618,47 @@ class SoundcloudIE(SoundcloudBaseIE):
'repost_count': int,
'genres': ['Piano'],
'uploader_url': 'https://soundcloud.com/giovannisarani',
'tags': 'count:10',
},
},
# .png "original" artwork, 160kbps m4a HLS format
{
'url': 'https://soundcloud.com/skorxh/audio-dealer',
'info_dict': {
'id': '2011421339',
'ext': 'm4a',
'title': 'audio dealer',
'description': '',
'uploader': '$KORCH',
'uploader_id': '150292288',
'uploader_url': 'https://soundcloud.com/skorxh',
'comment_count': int,
'view_count': int,
'like_count': int,
'repost_count': int,
'duration': 213.469,
'tags': [],
'artists': ['$KORXH'],
'track': 'audio dealer',
'timestamp': 1737143201,
'upload_date': '20250117',
'license': 'all-rights-reserved',
'thumbnail': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-original.png',
'thumbnails': [
{'id': 'mini', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-mini.jpg'},
{'id': 'tiny', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-tiny.jpg'},
{'id': 'small', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-small.jpg'},
{'id': 'badge', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-badge.jpg'},
{'id': 't67x67', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-t67x67.jpg'},
{'id': 'large', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-large.jpg'},
{'id': 't300x300', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-t300x300.jpg'},
{'id': 'crop', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-crop.jpg'},
{'id': 't500x500', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-t500x500.jpg'},
{'id': 'original', 'url': 'https://i1.sndcdn.com/artworks-a1wKGMYNreDLTMrT-fGjRiw-original.png'},
],
},
'params': {'skip_download': 'm3u8', 'format': 'hls_aac_160k'},
},
{
# AAC HQ format available (account with active subscription needed)
'url': 'https://soundcloud.com/wandw/the-chainsmokers-ft-daya-dont-let-me-down-ww-remix-1',

View File

@ -1,5 +1,6 @@
from .bunnycdn import BunnyCdnIE
from .common import InfoExtractor
from ..utils import try_get, unified_timestamp
from ..utils import make_archive_id, try_get, unified_timestamp
class SovietsClosetBaseIE(InfoExtractor):
@ -43,7 +44,7 @@ class SovietsClosetIE(SovietsClosetBaseIE):
'url': 'https://sovietscloset.com/video/1337',
'md5': 'bd012b04b261725510ca5383074cdd55',
'info_dict': {
'id': '1337',
'id': '2f0cfbf4-3588-43a9-a7d6-7c9ea3755e67',
'ext': 'mp4',
'title': 'The Witcher #13',
'thumbnail': r're:^https?://.*\.b-cdn\.net/2f0cfbf4-3588-43a9-a7d6-7c9ea3755e67/thumbnail\.jpg$',
@ -55,20 +56,23 @@ class SovietsClosetIE(SovietsClosetBaseIE):
'upload_date': '20170413',
'uploader_id': 'SovietWomble',
'uploader_url': 'https://www.twitch.tv/SovietWomble',
'duration': 7007,
'duration': 7008,
'was_live': True,
'availability': 'public',
'series': 'The Witcher',
'season': 'Misc',
'episode_number': 13,
'episode': 'Episode 13',
'creators': ['SovietWomble'],
'description': '',
'_old_archive_ids': ['sovietscloset 1337'],
},
},
{
'url': 'https://sovietscloset.com/video/1105',
'md5': '89fa928f183893cb65a0b7be846d8a90',
'info_dict': {
'id': '1105',
'id': 'c0e5e76f-3a93-40b4-bf01-12343c2eec5d',
'ext': 'mp4',
'title': 'Arma 3 - Zeus Games #5',
'uploader': 'SovietWomble',
@ -80,39 +84,20 @@ class SovietsClosetIE(SovietsClosetBaseIE):
'upload_date': '20160420',
'uploader_id': 'SovietWomble',
'uploader_url': 'https://www.twitch.tv/SovietWomble',
'duration': 8804,
'duration': 8805,
'was_live': True,
'availability': 'public',
'series': 'Arma 3',
'season': 'Zeus Games',
'episode_number': 5,
'episode': 'Episode 5',
'creators': ['SovietWomble'],
'description': '',
'_old_archive_ids': ['sovietscloset 1105'],
},
},
]
def _extract_bunnycdn_iframe(self, video_id, bunnycdn_id):
iframe = self._download_webpage(
f'https://iframe.mediadelivery.net/embed/5105/{bunnycdn_id}',
video_id, note='Downloading BunnyCDN iframe', headers=self.MEDIADELIVERY_REFERER)
m3u8_url = self._search_regex(r'(https?://.*?\.m3u8)', iframe, 'm3u8 url')
thumbnail_url = self._search_regex(r'(https?://.*?thumbnail\.jpg)', iframe, 'thumbnail url')
m3u8_formats = self._extract_m3u8_formats(m3u8_url, video_id, headers=self.MEDIADELIVERY_REFERER)
if not m3u8_formats:
duration = None
else:
duration = self._extract_m3u8_vod_duration(
m3u8_formats[0]['url'], video_id, headers=self.MEDIADELIVERY_REFERER)
return {
'formats': m3u8_formats,
'thumbnail': thumbnail_url,
'duration': duration,
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
@ -122,13 +107,13 @@ class SovietsClosetIE(SovietsClosetBaseIE):
stream = self.parse_nuxt_jsonp(f'{static_assets_base}/video/{video_id}/payload.js', video_id, 'video')['stream']
return {
return self.url_result(
f'https://iframe.mediadelivery.net/embed/5105/{stream["bunnyId"]}', ie=BunnyCdnIE, url_transparent=True,
**self.video_meta(
video_id=video_id, game_name=stream['game']['name'],
category_name=try_get(stream, lambda x: x['subcategory']['name'], str),
episode_number=stream.get('number'), stream_date=stream.get('date')),
**self._extract_bunnycdn_iframe(video_id, stream['bunnyId']),
}
_old_archive_ids=[make_archive_id(self, video_id)])
class SovietsClosetPlaylistIE(SovietsClosetBaseIE):

236
yt_dlp/extractor/streaks.py Normal file
View File

@ -0,0 +1,236 @@
import json
import urllib.parse
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
filter_dict,
float_or_none,
join_nonempty,
mimetype2ext,
parse_iso8601,
unsmuggle_url,
update_url_query,
url_or_none,
)
from ..utils.traversal import traverse_obj
class StreaksBaseIE(InfoExtractor):
_API_URL_TEMPLATE = 'https://{}.api.streaks.jp/v1/projects/{}/medias/{}{}'
_GEO_BYPASS = False
_GEO_COUNTRIES = ['JP']
def _extract_from_streaks_api(self, project_id, media_id, headers=None, query=None, ssai=False):
try:
response = self._download_json(
self._API_URL_TEMPLATE.format('playback', project_id, media_id, ''),
media_id, 'Downloading STREAKS playback API JSON', headers={
'Accept': 'application/json',
'Origin': 'https://players.streaks.jp',
**self.geo_verification_headers(),
**(headers or {}),
})
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status in {403, 404}:
error = self._parse_json(e.cause.response.read().decode(), media_id, fatal=False)
message = traverse_obj(error, ('message', {str}))
code = traverse_obj(error, ('code', {str}))
if code == 'REQUEST_FAILED':
self.raise_geo_restricted(message, countries=self._GEO_COUNTRIES)
elif code == 'MEDIA_NOT_FOUND':
raise ExtractorError(message, expected=True)
elif code or message:
raise ExtractorError(join_nonempty(code, message, delim=': '))
raise
streaks_id = response['id']
live_status = {
'clip': 'was_live',
'file': 'not_live',
'linear': 'is_live',
'live': 'is_live',
}.get(response.get('type'))
formats, subtitles = [], {}
drm_formats = False
for source in traverse_obj(response, ('sources', lambda _, v: v['src'])):
if source.get('key_systems'):
drm_formats = True
continue
src_url = source['src']
is_live = live_status == 'is_live'
ext = mimetype2ext(source.get('type'))
if ext != 'm3u8':
self.report_warning(f'Unsupported stream type: {ext}')
continue
if is_live and ssai:
session_params = traverse_obj(self._download_json(
self._API_URL_TEMPLATE.format('ssai', project_id, streaks_id, '/ssai/session'),
media_id, 'Downloading session parameters',
headers={'Content-Type': 'application/json', 'Accept': 'application/json'},
data=json.dumps({'id': source['id']}).encode(),
), (0, 'query', {urllib.parse.parse_qs}))
src_url = update_url_query(src_url, session_params)
fmts, subs = self._extract_m3u8_formats_and_subtitles(
src_url, media_id, 'mp4', m3u8_id='hls', fatal=False, live=is_live, query=query)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
if not formats and drm_formats:
self.report_drm(media_id)
self._remove_duplicate_formats(formats)
for subs in traverse_obj(response, (
'tracks', lambda _, v: v['kind'] in ('captions', 'subtitles') and url_or_none(v['src']),
)):
lang = traverse_obj(subs, ('srclang', {str.lower})) or 'ja'
subtitles.setdefault(lang, []).append({'url': subs['src']})
return {
'id': streaks_id,
'display_id': media_id,
'formats': formats,
'live_status': live_status,
'subtitles': subtitles,
'uploader_id': project_id,
**traverse_obj(response, {
'title': ('name', {str}),
'description': ('description', {str}, filter),
'duration': ('duration', {float_or_none}),
'modified_timestamp': ('updated_at', {parse_iso8601}),
'tags': ('tags', ..., {str}),
'thumbnails': (('poster', 'thumbnail'), 'src', {'url': {url_or_none}}),
'timestamp': ('created_at', {parse_iso8601}),
}),
}
class StreaksIE(StreaksBaseIE):
_VALID_URL = [
r'https?://players\.streaks\.jp/(?P<project_id>[\w-]+)/[\da-f]+/index\.html\?(?:[^#]+&)?m=(?P<id>(?:ref:)?[\w-]+)',
r'https?://playback\.api\.streaks\.jp/v1/projects/(?P<project_id>[\w-]+)/medias/(?P<id>(?:ref:)?[\w-]+)',
]
_EMBED_REGEX = [rf'<iframe\s+[^>]*\bsrc\s*=\s*["\'](?P<url>{_VALID_URL[0]})']
_TESTS = [{
'url': 'https://players.streaks.jp/tipness/08155cd19dc14c12bebefb69b92eafcc/index.html?m=dbdf2df35b4d483ebaeeaeb38c594647',
'info_dict': {
'id': 'dbdf2df35b4d483ebaeeaeb38c594647',
'ext': 'mp4',
'title': '3shunenCM_edit.mp4',
'display_id': 'dbdf2df35b4d483ebaeeaeb38c594647',
'duration': 47.533,
'live_status': 'not_live',
'modified_date': '20230726',
'modified_timestamp': 1690356180,
'timestamp': 1690355996,
'upload_date': '20230726',
'uploader_id': 'tipness',
},
}, {
'url': 'https://players.streaks.jp/ktv-web/0298e8964c164ab384c07ef6e08c444b/index.html?m=ref:mycoffeetime_250317',
'info_dict': {
'id': 'dccdc079e3fd41f88b0c8435e2d453ab',
'ext': 'mp4',
'title': 'わたしの珈琲時間_250317',
'display_id': 'ref:mycoffeetime_250317',
'duration': 122.99,
'live_status': 'not_live',
'modified_date': '20250310',
'modified_timestamp': 1741586302,
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1741585839,
'upload_date': '20250310',
'uploader_id': 'ktv-web',
},
}, {
'url': 'https://playback.api.streaks.jp/v1/projects/ktv-web/medias/b5411938e1e5435dac71edf829dd4813',
'info_dict': {
'id': 'b5411938e1e5435dac71edf829dd4813',
'ext': 'mp4',
'title': 'KANTELE_SYUSEi_0630',
'display_id': 'b5411938e1e5435dac71edf829dd4813',
'live_status': 'not_live',
'modified_date': '20250122',
'modified_timestamp': 1737522999,
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1735205137,
'upload_date': '20241226',
'uploader_id': 'ktv-web',
},
}, {
# TVer Olympics: website already down, but api remains accessible
'url': 'https://playback.api.streaks.jp/v1/projects/tver-olympic/medias/ref:sp_240806_1748_dvr',
'info_dict': {
'id': 'c10f7345adb648cf804d7578ab93b2e3',
'ext': 'mp4',
'title': 'サッカー 男子 準決勝_dvr',
'display_id': 'ref:sp_240806_1748_dvr',
'duration': 12960.0,
'live_status': 'was_live',
'modified_date': '20240805',
'modified_timestamp': 1722896263,
'timestamp': 1722777618,
'upload_date': '20240804',
'uploader_id': 'tver-olympic',
},
}, {
# TBS FREE: 24-hour stream
'url': 'https://playback.api.streaks.jp/v1/projects/tbs/medias/ref:simul-02',
'info_dict': {
'id': 'c4e83a7b48f4409a96adacec674b4e22',
'ext': 'mp4',
'title': str,
'display_id': 'ref:simul-02',
'live_status': 'is_live',
'modified_date': '20241031',
'modified_timestamp': 1730339858,
'timestamp': 1705466840,
'upload_date': '20240117',
'uploader_id': 'tbs',
},
}, {
# DRM protected
'url': 'https://players.streaks.jp/sp-jbc/a12d7ee0f40c49d6a0a2bff520639677/index.html?m=5f89c62f37ee4a68be8e6e3b1396c7d8',
'only_matching': True,
}]
_WEBPAGE_TESTS = [{
'url': 'https://event.play.jp/playnext2023/',
'info_dict': {
'id': '2d975178293140dc8074a7fc536a7604',
'ext': 'mp4',
'title': 'PLAY NEXTキームービー本番',
'uploader_id': 'play',
'duration': 17.05,
'thumbnail': r're:https?://.+\.jpg',
'timestamp': 1668387517,
'upload_date': '20221114',
'modified_timestamp': 1739411523,
'modified_date': '20250213',
'live_status': 'not_live',
},
}, {
'url': 'https://wowshop.jp/Page/special/cooking_goods/?bid=wowshop&srsltid=AfmBOor_phUNoPEE_UCPiGGSCMrJE5T2US397smvsbrSdLqUxwON0el4',
'playlist_mincount': 2,
'info_dict': {
'id': '?bid=wowshop&srsltid=AfmBOor_phUNoPEE_UCPiGGSCMrJE5T2US397smvsbrSdLqUxwON0el4',
'title': 'ワンランク上の料理道具でとびきりの“おいしい”を食卓へwowshop',
'description': 'md5:914b5cb8624fc69274c7fb7b2342958f',
'age_limit': 0,
'thumbnail': 'https://wowshop.jp/Page/special/cooking_goods/images/ogp.jpg',
},
}]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
project_id, media_id = self._match_valid_url(url).group('project_id', 'id')
return self._extract_from_streaks_api(
project_id, media_id, headers=filter_dict({
'X-Streaks-Api-Key': smuggled_data.get('api_key'),
}))

View File

@ -191,12 +191,12 @@ class TapTapAppIE(TapTapBaseIE):
}]
class TapTapIntlBase(TapTapBaseIE):
class TapTapIntlBaseIE(TapTapBaseIE):
_X_UA = 'V=1&PN=WebAppIntl2&LANG=zh_TW&VN_CODE=115&VN=0.1.0&LOC=CN&PLT=PC&DS=Android&UID={uuid}&CURR=&DT=PC&OS=Windows&OSV=NT%208.0.0'
_VIDEO_API = 'https://www.taptap.io/webapiv2/video-resource/v1/multi-get'
class TapTapAppIntlIE(TapTapIntlBase):
class TapTapAppIntlIE(TapTapIntlBaseIE):
_VALID_URL = r'https?://www\.taptap\.io/app/(?P<id>\d+)'
_INFO_API = 'https://www.taptap.io/webapiv2/i/app/v5/detail'
_DATA_PATH = 'app'
@ -227,7 +227,7 @@ class TapTapAppIntlIE(TapTapIntlBase):
}]
class TapTapPostIntlIE(TapTapIntlBase):
class TapTapPostIntlIE(TapTapIntlBaseIE):
_VALID_URL = r'https?://www\.taptap\.io/post/(?P<id>\d+)'
_INFO_API = 'https://www.taptap.io/webapiv2/creation/post/v1/detail'
_INFO_QUERY_KEY = 'id_str'

View File

@ -46,7 +46,7 @@ class TelecincoBaseIE(InfoExtractor):
error_code = traverse_obj(
self._webpage_read_content(error.cause.response, caronte['cerbero'], video_id, fatal=False),
({json.loads}, 'code', {int}))
if error_code == 4038:
if error_code in (4038, 40313):
self.raise_geo_restricted(countries=['ES'])
raise

Some files were not shown because too many files have changed in this diff Show More