poppler-utils is a collection of command-line tools built on Poppler, a PDF rendering library based on xpdf-3.0.
Usage:
How do I convert a PDF to text?
To convert the file hp-manual.pdf to hp-manual.txt, enter: $ pdftotext hp-manual.pdf hp-manual.txt
To specify the first page to convert (-f), enter: $ pdftotext -f 5 hp-manual.pdf hp-manual.txt
To specify the last page to convert (-l), enter: $ pdftotext -l 5 hp-manual.pdf hp-manual.txt
To convert a PDF file encrypted with an owner password (-opw), enter: $ pdftotext -opw 'password' hp-manual.pdf hp-manual.txt
To convert a PDF file encrypted with a user password (-upw), enter: $ pdftotext -upw 'password' hp-manual.pdf hp-manual.txt
The -eol option sets the end-of-line convention used for text output; it accepts unix, dos, or mac. For UNIX / Linux systems, enter: $ pdftotext -eol unix hp-manual.pdf hp-manual.txt
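The -f and -l options shown above can be given together to convert a page range. Below is a minimal sketch of that idea as a shell function; the name extract_pages and its argument order are hypothetical choices made for illustration and are not part of poppler-utils.

```shell
# extract_pages FIRST LAST INPUT.pdf OUTPUT.txt
# Hypothetical helper (not part of poppler-utils): converts only
# pages FIRST..LAST of INPUT.pdf by combining pdftotext's -f and -l.
extract_pages() {
    first=$1
    last=$2
    input=$3
    output=$4
    # Reject a reversed range before calling pdftotext.
    if [ "$first" -gt "$last" ]; then
        echo "extract_pages: first page ($first) is after last page ($last)" >&2
        return 1
    fi
    pdftotext -f "$first" -l "$last" "$input" "$output"
}
```

For example, extract_pages 5 10 hp-manual.pdf hp-manual-p5-10.txt would write pages 5 through 10 to hp-manual-p5-10.txt (the output file name here is only an example).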
Similar tools include pdf2dic, pdf2ps, pdftoabw, pdftohtml, pdftoppm, pdftops, pdftotext, and so on. What a powerful package!