Compare commits

29 Commits

Author SHA1 Message Date
Ricardo Garcia
c6b311c524 Set version number for release 2010-10-31 11:23:58 +01:00
Ricardo Garcia
79e75f66c8 Remove --best-quality option and add proper support for high definition format 2010-10-31 11:23:58 +01:00
Ricardo Garcia
053e77d6ed Remove old ignore patterns which are no longer needed 2010-10-31 11:23:58 +01:00
Ricardo Garcia
d0a9affb46 Replace setter and getter with simple attribute access 2010-10-31 11:23:58 +01:00
Ricardo Garcia
76800042fd Replace version number while in progress 2010-10-31 11:23:58 +01:00
Ricardo Garcia
7ab2043c9c Bump version number 2010-10-31 11:23:52 +01:00
Ricardo Garcia
3e703dd1cd Remove generator and webpage template, moved to wiki 2010-10-31 11:23:52 +01:00
Ricardo Garcia
cc10940385 Fix very wrong code for setting the language
It turned out that, despite the program working without apparent errors,
the code for setting the language was completely wrong. First, it didn't
run unless some form of authentication was performed. Second, I
mistyped _LANG_URL as _LOGIN_URL, so the language was not being set at
all! Amazing it still worked.
2010-10-31 11:23:48 +01:00
Ricardo Garcia
5121ef2071 Fix wrong indentation 2010-10-31 11:23:48 +01:00
Ricardo Garcia
fd20984889 Bump version number 2010-10-31 11:23:48 +01:00
Ricardo Garcia
111ae3695c Document new -w option 2010-10-31 11:23:48 +01:00
Ricardo Garcia
0beeff4b3e Add the -w or --no-overwrites option 2010-10-31 11:23:48 +01:00
Ricardo Garcia
64a6f26c5d Put Danny Colligan as an author in the script itself 2010-10-31 11:23:48 +01:00
Ricardo Garcia
a9633f1457 Use quote_plus instead of manually replacing spaces by plus signs 2010-10-31 11:23:48 +01:00
Ricardo Garcia
a20e4c2f96 Improve documentation of new features in webpage 2010-10-31 11:23:47 +01:00
Ricardo Garcia
d1536018a8 Include Danny Colligan in credits 2010-10-31 11:23:47 +01:00
Ricardo Garcia
25af2bce3a Include Danny Colligan's YouTube search InfoExtractor 2010-10-31 11:23:47 +01:00
Ricardo Garcia
d1580ed990 Fix NameError 2010-10-31 11:23:45 +01:00
Ricardo Garcia
eb0d2909a8 Document new -a option 2010-10-31 11:23:44 +01:00
Ricardo Garcia
ba72f8a5d1 Bump version and increase Firefox version number 2010-10-31 11:23:44 +01:00
Ricardo Garcia
c6fd0bb806 Add -a (--batch-file) option 2010-10-31 11:23:44 +01:00
Ricardo Garcia
72ac78b8b0 Fix for YouTube internationalization changes 2010-10-31 11:23:44 +01:00
Ricardo Garcia
240b737ebd Bump version number 2010-10-31 11:23:41 +01:00
Ricardo Garcia
27d98b6e25 Fix TypeError in decode() method and unordered playlist URLs 2010-10-31 11:23:41 +01:00
Ricardo Garcia
5487aea5d8 Improve documentation 2010-10-31 11:23:41 +01:00
Ricardo Garcia
9ca4851a00 Bump version number 2010-10-31 11:23:38 +01:00
Ricardo Garcia
1e9daf2a48 Make the YouTube login mechanism work across countries 2010-10-31 11:23:38 +01:00
Ricardo Garcia
d853063955 Bump version number 2010-10-31 11:23:38 +01:00
Ricardo Garcia
2546e7679f Fix metacafe.com and UTF8 output filenames 2010-10-31 11:23:35 +01:00
4 changed files with 168 additions and 279 deletions

@@ -1,4 +1,2 @@
syntax: glob
index.html
youtube-dl-*
.*.swp

@@ -1,15 +0,0 @@
#!/usr/bin/env python
import hashlib
import subprocess
template = file('index.html.in', 'r').read()
version = subprocess.Popen(['./youtube-dl', '--version'], stdout=subprocess.PIPE).communicate()[0].strip()
data = file('youtube-dl', 'rb').read()
md5sum = hashlib.md5(data).hexdigest()
sha1sum = hashlib.sha1(data).hexdigest()
sha256sum = hashlib.sha256(data).hexdigest()
template = template.replace('@PROGRAM_VERSION@', version)
template = template.replace('@PROGRAM_MD5SUM@', md5sum)
template = template.replace('@PROGRAM_SHA1SUM@', sha1sum)
template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
file('index.html', 'w').write(template)
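
Note for readers on Python 3, where the `file()` builtin no longer exists: the substitution step of the removed generator script above could be approximated as follows. This is only a sketch, not code from the repository. The `render_page` helper and its arguments are hypothetical, and the version string is passed in rather than obtained by running `./youtube-dl --version`.

```python
import hashlib


def render_page(template, version, data):
    """Fill the @PROGRAM_*@ placeholders used by the page template.

    Hypothetical helper: `template` is the template text, `version` is the
    program version string, and `data` is the raw bytes of the script.
    """
    substitutions = {
        '@PROGRAM_VERSION@': version,
        '@PROGRAM_MD5SUM@': hashlib.md5(data).hexdigest(),
        '@PROGRAM_SHA1SUM@': hashlib.sha1(data).hexdigest(),
        '@PROGRAM_SHA256SUM@': hashlib.sha256(data).hexdigest(),
    }
    # Replace each placeholder with its computed value.
    for placeholder, value in substitutions.items():
        template = template.replace(placeholder, value)
    return template
```

Reading `index.html.in` and `youtube-dl` from disk and writing `index.html` would wrap this helper with ordinary `open()` calls.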

@@ -1,212 +0,0 @@
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
<title>youtube-dl: Download videos from YouTube.com</title>
<style type="text/css"><!--
body {
font-family: sans-serif;
font-size: small;
}
h1 {
text-align: center;
text-decoration: underline;
color: #006699;
}
h2 {
color: #006699;
}
p {
text-align: justify;
margin-left: 5%;
margin-right: 5%;
}
ul {
margin-left: 5%;
margin-right: 5%;
list-style-type: square;
}
li {
margin-bottom: 0.5ex;
}
.smallnote {
font-size: x-small;
text-align: center;
}
--></style>
</head>
<body>
<h1>youtube-dl: Download videos from YouTube.com</h1>
<p class="smallnote">(and more...)</p>
<h2>What is it?</h2>
<p><em>youtube-dl</em> is a small command-line program to download videos
from YouTube.com. It requires the <a href="http://www.python.org/">Python
interpreter</a>, version 2.4 or later, and it's not platform specific.
It should work in your Unix box, in Windows or in Mac OS X. The latest version
is <strong>@PROGRAM_VERSION@</strong>. It's released to the public domain,
which means you can modify it, redistribute it or use it however you like.</p>
<p>I'll try to keep it updated if YouTube.com changes the way you access
their videos. After all, it's a simple and short program. However, I can't
guarantee anything. If you detect it stops working, check for new versions
and/or inform me about the problem, indicating the program version you
are using. If the program stops working and I can't solve the problem but
you have a solution, I'd like to know it. If that happens and you feel you
can maintain the program yourself, tell me. My contact information is
at <a href="http://freshmeat.net/~rg3/">freshmeat.net</a>.</p>
<p>Thanks for all the feedback received so far. I'm glad people find my
program useful.</p>
<h2>Usage instructions</h2>
<p>In Windows, once you have installed the Python interpreter, save the
program with the <em>.py</em> extension and put it somewhere in the PATH.
Try to follow the
<a href="http://rg03.wordpress.com/youtube-dl-under-windows-xp/">guide to
install youtube-dl under Windows XP</a>.</p>
<p>In Unix, download it, give it execution permission and copy it to one
of the PATH directories (typically, <em>/usr/local/bin</em>).</p>
<p>After that, you should be able to call it from the command line as
<em>youtube-dl</em> or <em>youtube-dl.py</em>. I will use <em>youtube-dl</em>
in the following examples. Usage instructions are easy. Use <em>youtube-dl</em>
followed by a video URL or identifier. Example: <em>youtube-dl
"http://www.youtube.com/watch?v=foobar"</em>. The video will be saved
to the file <em>foobar.flv</em> in that example. As YouTube.com
videos are in Flash Video format, their extension should be <em>flv</em>.
In Linux and other unices, video players using a recent version of
<em>ffmpeg</em> can play them. That includes MPlayer, VLC, etc. Those two
work under Windows and other platforms, but you could also get a
specific FLV player of your taste.</p>
<p>If you try to run the program and you receive an error message containing the
keyword <em>SyntaxError</em> near the end, it means your Python interpreter
is too old.</p>
<h2>More usage tips</h2>
<ul>
<li>You can change the file name of the video using the -o option, like in
<em>youtube-dl -o vid.flv "http://www.youtube.com/watch?v=foobar"</em>.
Read the <em>Output template</em> section for more details on this.</li>
<li>Some videos require an account to be downloaded, mostly because they're
flagged as mature content. You can pass the program a username and password
for a YouTube.com account with the -u and -p options, like <em>youtube-dl
-u myusername -p mypassword "http://www.youtube.com/watch?v=foobar"</em>.</li>
<li>The account data can also be read from the user .netrc file by indicating
the -n or --netrc option. The machine name is <em>youtube</em> in that
case.</li>
<li>The <em>simulate mode</em> (activated with -s or --simulate) can be used
to just get the real video URL and use it with a download manager if you
prefer that option.</li>
<li>The <em>quiet mode</em> (activated with -q or --quiet) can be used to
suppress all output messages. This allows, in systems featuring /dev/stdout
and other similar special files, outputting the video data to standard output
in order to pipe it to another program without interferences.</li>
<li>The program can be told to simply print the final video URL to standard
output using the -g or --get-url option.</li>
<li>In a similar line, the -e or --get-title option tells the program to print
the video title.</li>
<li>The default filename is <em>video_id.flv</em>. But you can also use the
video title in the filename with the -t or --title option, or preserve the
literal title in the filename with the -l or --literal option.</li>
<li>You can make the program append <em>&amp;fmt=something</em> to the URL
by using the -f or --format option. This makes it possible to download high
quality versions of the videos when available.</li>
<li><em>youtube-dl</em> can attempt to download the best quality version of
a video by using the -b or --best-quality option.</li>
<li><em>youtube-dl</em> can attempt to download the mobile quality version of
a video by using the -m or --mobile-version option.</li>
<li>Normally, the program will stop on the first error, but you can tell it
to attempt to download every video with the -i or --ignore-errors option.</li>
<li><em>youtube-dl</em> honors the <em>http_proxy</em> environment variable
if you want to use a proxy. Set it to something like
<em>http://proxy.example.com:8080</em>, and do not leave the <em>http://</em>
prefix out.</li>
<li>You can get the program version by calling it as <em>youtube-dl
-v</em> or <em>youtube-dl --version</em>.</li>
<li>For usage instructions, use <em>youtube-dl -h</em> or <em>youtube-dl
--help.</em></li>
<li>You can cancel the program at any time pressing Ctrl+C. It may print
some error lines saying something about <em>KeyboardInterrupt</em>.
That's ok.</li>
</ul>
<h2>Download it</h2>
<p>Note that if you directly click on these hyperlinks, your web browser will
most likely display the program contents. It's usually better to
right-click on it and choose the appropriate option, normally called <em>Save
Target As</em> or <em>Save Link As</em>, depending on the web browser you
are using.</p>
<p><a href="youtube-dl">@PROGRAM_VERSION@</a></p>
<ul>
<li><strong>MD5</strong>: @PROGRAM_MD5SUM@</li>
<li><strong>SHA1</strong>: @PROGRAM_SHA1SUM@</li>
<li><strong>SHA256</strong>: @PROGRAM_SHA256SUM@</li>
</ul>
<h2>Output template</h2>
<p>The -o option allows users to indicate a template for the output file names.
The basic usage is not to set any template arguments when downloading a single
file, like in <em>youtube-dl -o funny_video.flv 'http://some/video'</em>.
However, it may contain special sequences that will be replaced when
downloading each video. The special sequences have the format
<strong>%(NAME)s</strong>. To clarify, that's a percent symbol followed by a
name in parenthesis, followed by a lowercase S. Allowed names are:</p>
<ul>
<li><em>id</em>: The sequence will be replaced by the video identifier.</li>
<li><em>url</em>: The sequence will be replaced by the video URL.</li>
<li><em>uploader</em>: The sequence will be replaced by the nickname of the
person who uploaded the video.</li>
<li><em>title</em>: The sequence will be replaced by the literal video
title.</li>
<li><em>stitle</em>: The sequence will be replaced by a simplified video
title.</li>
<li><em>ext</em>: The sequence will be replaced by the appropriate
extension.</li>
</ul>
<p>As you may have guessed, the default template is <em>%(id)s.%(ext)s</em>.
When some command line options are used, it's replaced by other templates like
<em>%(title)s-%(id)s.%(ext)s</em>. You can specify your own.</p>
<h2>Authors</h2>
<ul>
<li>Ricardo Garcia Gonzalez: program core, YouTube.com InfoExtractor,
metacafe.com InfoExtractor and YouTube playlist InfoExtractor.</li>
<li>Many other people contributing patches, code, ideas and kind messages. Too
many to be listed here. You know who you are. Thank you very much.</li>
</ul>
<p class="smallnote">Copyright &copy; 2006-2007 Ricardo Garcia Gonzalez</p>
</body>
</html>
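
Note on the "Output template" section of the removed page above: the `%(NAME)s` sequences are ordinary Python `%`-formatting against a dictionary, which matches how the script applies the template (`self.params['outtmpl'] % result`). A minimal sketch, assuming made-up field values and a hypothetical `expand_template` helper:

```python
def expand_template(outtmpl, info):
    """Expand %(NAME)s sequences in outtmpl using the info dictionary."""
    return outtmpl % info


# Illustrative values only; in the script they come from an InfoExtractor.
info = {
    'id': 'foobar',
    'url': 'http://www.youtube.com/watch?v=foobar',
    'uploader': 'someone',
    'title': 'Some Title!',
    'stitle': 'Some_Title',
    'ext': 'flv',
}

# Default template, and a template that includes the simplified title.
default_name = expand_template('%(id)s.%(ext)s', info)
titled_name = expand_template('%(stitle)s-%(id)s.%(ext)s', info)
```

A name that is not a key of the dictionary raises `KeyError`, which the script catches and reports as an invalid output template.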

@@ -1,9 +1,11 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Ricardo Garcia Gonzalez
# Author: Danny Colligan
# License: Public domain code
import htmlentitydefs
import httplib
import locale
import math
import netrc
import os
@@ -17,7 +19,7 @@ import urllib
import urllib2
std_headers = {
'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1',
'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept': 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
'Accept-Language': 'en-us,en;q=0.5',
@@ -69,10 +71,10 @@ class FileDownloader(object):
File downloaders accept a lot of parameters. In order not to saturate
the object constructor with arguments, it receives a dictionary of
options instead. These options are available through the get_params()
method for the InfoExtractors to use. The FileDownloader also registers
itself as the downloader in charge for the InfoExtractors that are
added to it, so this is a "mutual registration".
options instead. These options are available through the params
attribute for the InfoExtractors to use. The FileDownloader also
registers itself as the downloader in charge for the InfoExtractors
that are added to it, so this is a "mutual registration".
Available options:
@@ -87,9 +89,10 @@ class FileDownloader(object):
outtmpl: Template for output names.
ignoreerrors: Do not stop on download errors.
ratelimit: Download speed limit, in bytes/sec.
nooverwrites: Prevent overwriting files.
"""
_params = None
params = None
_ies = []
_pps = []
@@ -97,7 +100,7 @@ class FileDownloader(object):
"""Create a FileDownloader object with the given options."""
self._ies = []
self._pps = []
self.set_params(params)
self.params = params
@staticmethod
def pmkdir(filename):
@@ -141,7 +144,7 @@ class FileDownloader(object):
return '--:--'
return '%02d:%02d' % (eta_mins, eta_secs)
@staticmethod
@staticmethod
def calc_speed(start, now, bytes):
dif = now - start
if bytes == 0 or dif < 0.001: # One millisecond
@@ -171,16 +174,6 @@ class FileDownloader(object):
multiplier = 1024.0 ** 'bkmgtpezy'.index(matchobj.group(2).lower())
return long(round(number * multiplier))
def set_params(self, params):
"""Sets parameters."""
if type(params) != dict:
raise ValueError('params: dictionary expected')
self._params = params
def get_params(self):
"""Get parameters."""
return self._params
def add_info_extractor(self, ie):
"""Add an InfoExtractor object to the end of the list."""
self._ies.append(ie)
@@ -193,7 +186,7 @@ class FileDownloader(object):
def to_stdout(self, message, skip_eol=False):
"""Print message to stdout if not in quiet mode."""
if not self._params.get('quiet', False):
if not self.params.get('quiet', False):
print u'%s%s' % (message, [u'\n', u''][skip_eol]),
sys.stdout.flush()
@@ -203,7 +196,7 @@ class FileDownloader(object):
def fixed_template(self):
"""Checks if the output template is fixed."""
return (re.search(ur'(?u)%\(.+?\)s', self._params['outtmpl']) is None)
return (re.search(ur'(?u)%\(.+?\)s', self.params['outtmpl']) is None)
def trouble(self, message=None):
"""Determine action to take when a download problem appears.
@@ -216,13 +209,13 @@ class FileDownloader(object):
"""
if message is not None:
self.to_stderr(message)
if not self._params.get('ignoreerrors', False):
if not self.params.get('ignoreerrors', False):
raise DownloadError(message)
return 1
def slow_down(self, start_time, byte_counter):
"""Sleep if the download speed is over the rate limit."""
rate_limit = self._params.get('ratelimit', None)
rate_limit = self.params.get('ratelimit', None)
if rate_limit is None or byte_counter == 0:
return
now = time.time()
@@ -250,7 +243,7 @@ class FileDownloader(object):
"""Download a given list of URLs."""
retcode = 0
if len(url_list) > 1 and self.fixed_template():
raise SameFileError(self._params['outtmpl'])
raise SameFileError(self.params['outtmpl'])
for url in url_list:
suitable_found = False
@@ -265,25 +258,28 @@ class FileDownloader(object):
retcode = self.trouble()
if len(results) > 1 and self.fixed_template():
raise SameFileError(self._params['outtmpl'])
raise SameFileError(self.params['outtmpl'])
for result in results:
# Forced printings
if self._params.get('forcetitle', False):
if self.params.get('forcetitle', False):
print result['title']
if self._params.get('forceurl', False):
if self.params.get('forceurl', False):
print result['url']
# Do nothing else if in simulate mode
if self._params.get('simulate', False):
if self.params.get('simulate', False):
continue
try:
filename = self._params['outtmpl'] % result
filename = self.params['outtmpl'] % result
self.report_destination(filename)
except (ValueError, KeyError), err:
retcode = self.trouble('ERROR: invalid output template or system charset: %s' % str(err))
continue
if self.params['nooverwrites'] and os.path.exists(filename):
self.to_stderr('WARNING: file exists: %s; skipping' % filename)
continue
try:
self.pmkdir(filename)
except (OSError, IOError), err:
@@ -411,7 +407,7 @@ class InfoExtractor(object):
def to_stdout(self, message):
"""Print message to stdout if downloader is not in quiet mode."""
if self._downloader is None or not self._downloader.get_params().get('quiet', False):
if self._downloader is None or not self._downloader.params.get('quiet', False):
print message
def to_stderr(self, message):
@@ -430,14 +426,19 @@ class YoutubeIE(InfoExtractor):
"""Information extractor for youtube.com."""
_VALID_URL = r'^((?:http://)?(?:\w+\.)?youtube\.com/(?:(?:v/)|(?:(?:watch(?:\.php)?)?\?(?:.+&)?v=)))?([0-9A-Za-z_-]+)(?(1).+)?$'
_LOGIN_URL = 'http://www.youtube.com/login?next=/'
_AGE_URL = 'http://www.youtube.com/verify_age?next_url=/'
_LANG_URL = r'http://uk.youtube.com/?hl=en&persist_hl=1&gl=US&persist_gl=1&opt_out_ackd=1'
_LOGIN_URL = 'http://www.youtube.com/signup?next=/&gl=US&hl=en'
_AGE_URL = 'http://www.youtube.com/verify_age?next_url=/&gl=US&hl=en'
_NETRC_MACHINE = 'youtube'
@staticmethod
def suitable(url):
return (re.match(YoutubeIE._VALID_URL, url) is not None)
def report_lang(self):
"""Report attempt to set language."""
self.to_stdout(u'[youtube] Setting language')
def report_login(self):
"""Report attempt to log in."""
self.to_stdout(u'[youtube] Logging in')
@@ -464,7 +465,7 @@ class YoutubeIE(InfoExtractor):
username = None
password = None
downloader_params = self._downloader.get_params()
downloader_params = self._downloader.params
# Attempt to use provided username and password or .netrc data
if downloader_params.get('username', None) is not None:
@@ -482,6 +483,15 @@ class YoutubeIE(InfoExtractor):
self.to_stderr(u'WARNING: parsing .netrc: %s' % str(err))
return
# Set language
request = urllib2.Request(self._LANG_URL, None, std_headers)
try:
self.report_lang()
urllib2.urlopen(request).read()
except (urllib2.URLError, httplib.HTTPException, socket.error), err:
self.to_stderr(u'WARNING: unable to set language: %s' % str(err))
return
# No authentication to be performed
if username is None:
return
@@ -529,14 +539,18 @@ class YoutubeIE(InfoExtractor):
# Downloader parameters
format_param = None
if self._downloader is not None:
params = self._downloader.get_params()
params = self._downloader.params
format_param = params.get('format', None)
# Extension
video_extension = {'18': 'mp4', '17': '3gp'}.get(format_param, 'flv')
video_extension = {
'17': '3gp',
'18': 'mp4',
'22': 'mp4',
}.get(format_param, 'flv')
# Normalize URL, including format
normalized_url = 'http://www.youtube.com/watch?v=%s' % video_id
normalized_url = 'http://www.youtube.com/watch?v=%s&gl=US&hl=en' % video_id
if format_param is not None:
normalized_url = '%s&fmt=%s' % (normalized_url, format_param)
request = urllib2.Request(normalized_url, None, std_headers)
@@ -592,7 +606,7 @@ class MetacafeIE(InfoExtractor):
"""Information Extractor for metacafe.com."""
_VALID_URL = r'(?:http://)?(?:www\.)?metacafe\.com/watch/([^/]+)/([^/]+)/.*'
_DISCLAIMER = 'http://www.metacafe.com/disclaimer'
_DISCLAIMER = 'http://www.metacafe.com/family_filter/'
_youtube_ie = None
def __init__(self, youtube_ie, downloader=None):
@@ -631,10 +645,10 @@ class MetacafeIE(InfoExtractor):
# Confirm age
disclaimer_form = {
'allowAdultContent': '1',
'filters': '0',
'submit': "Continue - I'm over 18",
}
request = urllib2.Request('http://www.metacafe.com/watch/', urllib.urlencode(disclaimer_form), std_headers)
request = urllib2.Request('http://www.metacafe.com/', urllib.urlencode(disclaimer_form), std_headers)
try:
self.report_age_confirmation()
disclaimer = urllib2.urlopen(request).read()
@@ -684,7 +698,7 @@ class MetacafeIE(InfoExtractor):
video_url = '%s?__gda__=%s' % (mediaURL, gdaKey)
mobj = re.search(r'(?im)<meta name="title" content="Metacafe - ([^"]+)"', webpage)
mobj = re.search(r'(?im)<title>(.*) - Video</title>', webpage)
if mobj is None:
self.to_stderr(u'ERROR: unable to extract title')
return [None]
@@ -706,11 +720,95 @@ class MetacafeIE(InfoExtractor):
'ext': video_extension.decode('utf-8'),
}]
class YoutubeSearchIE(InfoExtractor):
"""Information Extractor for YouTube search queries."""
_VALID_QUERY = r'ytsearch(\d+|all)?:[\s\S]+'
_TEMPLATE_URL = 'http://www.youtube.com/results?search_query=%s&page=%s&gl=US&hl=en'
_VIDEO_INDICATOR = r'href="/watch\?v=.+?"'
_MORE_PAGES_INDICATOR = r'>Next</a>'
_youtube_ie = None
def __init__(self, youtube_ie, downloader=None):
InfoExtractor.__init__(self, downloader)
self._youtube_ie = youtube_ie
@staticmethod
def suitable(url):
return (re.match(YoutubeSearchIE._VALID_QUERY, url) is not None)
def report_download_page(self, query, pagenum):
"""Report attempt to download playlist page with given number."""
self.to_stdout(u'[youtube] query "%s": Downloading page %s' % (query, pagenum))
def _real_initialize(self):
self._youtube_ie.initialize()
def _real_extract(self, query):
mobj = re.match(self._VALID_QUERY, query)
if mobj is None:
self.to_stderr(u'ERROR: invalid search query "%s"' % query)
return [None]
prefix, query = query.split(':')
prefix = prefix[8:]
if prefix == '':
return self._download_n_results(query, 1)
elif prefix == 'all':
return self._download_n_results(query, -1)
else:
try:
n = int(prefix)
if n <= 0:
self.to_stderr(u'ERROR: invalid download number %s for query "%s"' % (n, query))
return [None]
return self._download_n_results(query, n)
except ValueError: # parsing prefix as int fails
return self._download_n_results(query, 1)
def _download_n_results(self, query, n):
"""Downloads a specified number of results for a query"""
video_ids = []
already_seen = set()
pagenum = 1
while True:
self.report_download_page(query, pagenum)
result_url = self._TEMPLATE_URL % (urllib.quote_plus(query), pagenum)
request = urllib2.Request(result_url, None, std_headers)
try:
page = urllib2.urlopen(request).read()
except (urllib2.URLError, httplib.HTTPException, socket.error), err:
self.to_stderr(u'ERROR: unable to download webpage: %s' % str(err))
return [None]
# Extract video identifiers
for mobj in re.finditer(self._VIDEO_INDICATOR, page):
video_id = page[mobj.span()[0]:mobj.span()[1]].split('=')[2][:-1]
if video_id not in already_seen:
video_ids.append(video_id)
already_seen.add(video_id)
if len(video_ids) == n:
# Specified n videos reached
information = []
for id in video_ids:
information.extend(self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id))
return information
if self._MORE_PAGES_INDICATOR not in page:
information = []
for id in video_ids:
information.extend(self._youtube_ie.extract('http://www.youtube.com/watch?v=%s' % id))
return information
pagenum = pagenum + 1
class YoutubePlaylistIE(InfoExtractor):
"""Information Extractor for YouTube playlists."""
_VALID_URL = r'(?:http://)?(?:\w+\.)?youtube.com/view_play_list\?p=(.+)'
_TEMPLATE_URL = 'http://www.youtube.com/view_play_list?p=%s&page=%s'
_TEMPLATE_URL = 'http://www.youtube.com/view_play_list?p=%s&page=%s&gl=US&hl=en'
_VIDEO_INDICATOR = r'/watch\?v=(.+?)&'
_MORE_PAGES_INDICATOR = r'/view_play_list?p=%s&amp;page=%s'
_youtube_ie = None
@@ -752,10 +850,11 @@ class YoutubePlaylistIE(InfoExtractor):
return [None]
# Extract video identifiers
ids_in_page = set()
ids_in_page = []
for mobj in re.finditer(self._VIDEO_INDICATOR, page):
ids_in_page.add(mobj.group(1))
video_ids.extend(list(ids_in_page))
if mobj.group(1) not in ids_in_page:
ids_in_page.append(mobj.group(1))
video_ids.extend(ids_in_page)
if (self._MORE_PAGES_INDICATOR % (playlist_id, pagenum + 1)) not in page:
break
@@ -790,7 +889,7 @@ class PostProcessor(object):
def to_stdout(self, message):
"""Print message to stdout if downloader is not in quiet mode."""
if self._downloader is None or not self._downloader.get_params().get('quiet', False):
if self._downloader is None or not self._downloader.params.get('quiet', False):
print message
def to_stderr(self, message):
@@ -836,7 +935,7 @@ if __name__ == '__main__':
# Parse command line
parser = optparse.OptionParser(
usage='Usage: %prog [options] url...',
version='2008.08.09',
version='2009.03.28',
conflict_handler='resolve',
)
parser.add_option('-h', '--help',
@@ -865,18 +964,31 @@ if __name__ == '__main__':
action='store_true', dest='gettitle', help='simulate, quiet but print title', default=False)
parser.add_option('-f', '--format',
dest='format', metavar='FMT', help='video format code')
parser.add_option('-b', '--best-quality',
action='store_const', dest='format', help='alias for -f 18', const='18')
parser.add_option('-m', '--mobile-version',
action='store_const', dest='format', help='alias for -f 17', const='17')
parser.add_option('-d', '--high-def',
action='store_const', dest='format', help='alias for -f 22', const='22')
parser.add_option('-i', '--ignore-errors',
action='store_true', dest='ignoreerrors', help='continue on download errors', default=False)
parser.add_option('-r', '--rate-limit',
dest='ratelimit', metavar='L', help='download rate limit (e.g. 50k or 44.6m)')
parser.add_option('-a', '--batch-file',
dest='batchfile', metavar='F', help='file containing URLs to download')
parser.add_option('-w', '--no-overwrites',
action='store_true', dest='nooverwrites', help='do not overwrite files', default=False)
(opts, args) = parser.parse_args()
# Batch file verification
batchurls = []
if opts.batchfile is not None:
try:
batchurls = [line.strip() for line in open(opts.batchfile, 'r')]
except IOError:
sys.exit(u'ERROR: batch file could not be read')
all_urls = batchurls + args
# Conflicting, missing and erroneous options
if len(args) < 1:
if len(all_urls) < 1:
sys.exit(u'ERROR: you must provide at least one URL')
if opts.usenetrc and (opts.username is not None or opts.password is not None):
sys.exit(u'ERROR: using .netrc conflicts with giving username/password')
@@ -898,8 +1010,12 @@ if __name__ == '__main__':
youtube_ie = YoutubeIE()
metacafe_ie = MetacafeIE(youtube_ie)
youtube_pl_ie = YoutubePlaylistIE(youtube_ie)
youtube_search_ie = YoutubeSearchIE(youtube_ie)
# File downloader
charset = locale.getdefaultlocale()[1]
if charset is None:
charset = 'ascii'
fd = FileDownloader({
'usenetrc': opts.usenetrc,
'username': opts.username,
@@ -909,17 +1025,19 @@ if __name__ == '__main__':
'forcetitle': opts.gettitle,
'simulate': (opts.simulate or opts.geturl or opts.gettitle),
'format': opts.format,
'outtmpl': ((opts.outtmpl is not None and opts.outtmpl.decode())
'outtmpl': ((opts.outtmpl is not None and opts.outtmpl.decode(charset))
or (opts.usetitle and u'%(stitle)s-%(id)s.%(ext)s')
or (opts.useliteral and u'%(title)s-%(id)s.%(ext)s')
or u'%(id)s.%(ext)s'),
'ignoreerrors': opts.ignoreerrors,
'ratelimit': opts.ratelimit,
'nooverwrites': opts.nooverwrites,
})
fd.add_info_extractor(youtube_search_ie)
fd.add_info_extractor(youtube_pl_ie)
fd.add_info_extractor(metacafe_ie)
fd.add_info_extractor(youtube_ie)
retcode = fd.download(args)
retcode = fd.download(all_urls)
sys.exit(retcode)
except DownloadError: