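Someone had already written a Python script for downloading subtitles from TED, but his website is blocked by the Great Firewall, and the script fails to parse some TED URLs,
so I downloaded his Python script and modified it slightly.
Thanks to: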
http://tedtalksubtitledownload.appspot.com/
The source is as follows:
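    # Tail of the subtitle-writing function (its beginning is not shown here):
    # write every caption to the output file as one SRT block.
    c = c['captions']
    for linea in c:
        # Sequence number of the caption.
        salida.write("%d\n"%conta)
        conta += 1
        # Start and end timestamps, both shifted by the intro duration.
        salida.write("%s --> %s\n"%(getFormatedTime(timeIntro+linea['startTime']), getFormatedTime(timeIntro+linea['startTime']+linea['duration'])))
        # Caption text, followed by the blank line that ends the block.
        salida.write("%s\n\n"%(linea['content'].encode('utf-8')))
    salida.close()
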
def main(tedurl):
    print("Loading information about TED talk number %s..."%tedurl)
    vidpar = getVideoParameters(tedurl)
    if not vidpar:
        print("There was a problem fetching information about that TED Talk")
        sys.exit(1)
    print("Download all subtitles (write 'all' when prompted) or only one (specify wich)?")
    a = raw_input()
    availables = availableSubs(vidpar['languages'])
    idtalk = vidpar['ti']
    idtalk = int(idtalk[1:3])
    if a == "all":
        # Download every available language.
        for lang in availables:
            downloadSub(idtalk, lang, int(vidpar['introDuration']))
    else:
        # Keep prompting until the user enters an available language code.
        while a not in availables:
            print("We're sorry, the only available languages are:")
            for lang in availables:
                print("\t"+lang)
            a = raw_input()
        downloadSub(idtalk, a, int(vidpar['introDuration']))

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: %s tedurl"%sys.argv[0])
    else:
        main(sys.argv[1])
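
The subtitle-writing loop in the script calls getFormatedTime, whose body is not shown above. For readers who want to see what one SRT block looks like, here is a minimal sketch of that step. It is not the original author's code: the names format_srt_time and captions_to_srt are hypothetical, it is written for Python 3 rather than the Python 2 of the script, and it assumes that startTime, duration and the intro offset are all in milliseconds and that numbering starts at 1.

```python
def format_srt_time(ms):
    """Convert a millisecond offset to an SRT timestamp (HH:MM:SS,mmm).

    Hypothetical stand-in for the script's getFormatedTime; the unit
    (milliseconds) is an assumption.
    """
    seconds, millis = divmod(int(ms), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def captions_to_srt(captions, intro_ms=0):
    """Render caption dicts (startTime, duration, content) as SRT text."""
    blocks = []
    for index, cap in enumerate(captions, start=1):
        # Shift every caption by the intro duration, as the script does.
        start = intro_ms + cap["startTime"]
        end = start + cap["duration"]
        blocks.append("%d\n%s --> %s\n%s\n" % (
            index, format_srt_time(start), format_srt_time(end), cap["content"]))
    # A blank line separates consecutive blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [{"startTime": 0, "duration": 2500, "content": "Hello"}]
    print(captions_to_srt(sample, intro_ms=15000))
```

Building the text in memory first and writing it out once keeps the formatting logic separate from file handling, which makes it easy to check without touching the network.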
To use it, you first need to download the simplejson package from: http://pypi.python.org/pypi/simplejson/
It also works in environments that reach the Internet through an HTTP proxy.
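
How the script picks up the proxy is not shown in this post. As a rough illustration only, this is one way to route requests through an HTTP proxy explicitly with Python 3's standard urllib.request (the script itself is Python 2). The helper name build_proxy_opener and the address proxy.example.com:8080 are placeholders, not values from the script.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder proxy address; replace it with the proxy of your own network.
opener = build_proxy_opener("http://proxy.example.com:8080")

# Installing the opener would make later urllib.request.urlopen() calls use it:
# urllib.request.install_opener(opener)
```

Without an explicit handler, urllib also consults the system proxy settings (for example the http_proxy environment variable), so in many setups no code change is needed at all.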
A concrete usage example:
D:\Document and Setting\test\My Documents\Downloads\TEDTalkSubtitles>TEDTalkSubtitles.py http://www.ted.com/talks/barry_schwartz_on_the_paradox_of_choice.html
Loading information about TED talk number http://www.ted.com/talks/barry_schwartz_on_the_paradox_of_choice.html...
Download all subtitles (write 'all' when prompted) or only one (specify wich)?
chi_hans
Downloading subtitles for language chi_hans
D:\Document and Setting\test\My Documents\Downloads\TEDTalkSubtitles>dir
 ドライブ D のボリューム ラベルは programe です
 ボリューム シリアル番号は 447B-7E2B です
 D:\Document and Setting\test\My Documents\Downloads\TEDTalkSubtitles のディレクトリ
2011/04/15  14:16    <DIR>          .
2011/04/15  14:16    <DIR>          ..
2011/04/15  14:34            31,879 subs_93_chi_hans.srt
2011/04/15  14:16            31,928 subs_93_eng.srt
2011/04/15  14:26             2,639 TEDTalkSubtitles.py
               3 個のファイル              66,446 バイト
               2 個のディレクトリ  13,469,048,832 バイトの空き領域
D:\Document and Setting\test\My Documents\Downloads\TEDTalkSubtitles>
refs:
http://pythonconquerstheuniverse.wordpress.com/category/the-python-debugger/