
Convert HTML Page To a PDF Using Open Source Tool [ Linux / OS X / Windows ]

December 6, 2017 ⁄ General

Do you need a simple, open source, cross-platform command line tool that converts web pages and HTML to a PDF file? Look no further: try wkhtmltopdf.


From the project home page:

Simple shell utility to convert HTML to PDF using the WebKit rendering engine and Qt. Searching the web, I have found several command line tools that allow you to convert an HTML document to a PDF document; however, they all seem to use their own, rather incomplete rendering engines, resulting in poor quality. Recently Qt 4.4 was released with a WebKit widget (WebKit is the engine of Apple's Safari, which is a fork of KDE's KHTML), and making a good tool became very easy.

Software features

  1. Cross platform.
  2. Open source.
  3. Convert any web pages into PDF documents using webkit.
  4. You can add headers and footers.
  5. TOC generation.
  6. Batch mode conversions.
  7. Can run on Linux server with an XServer (the X11 client libs must be installed).
  8. Can be directly used by PHP or Python via bindings to libwkhtmltox.

A note about Debian / Ubuntu Linux users

You can install wkhtmltopdf using the apt-get command, then create an html2pdf symlink for convenience:
$ sudo apt-get install wkhtmltopdf
$ sudo ln -s /usr/bin/wkhtmltopdf /usr/local/bin/html2pdf


Sample outputs:

[sudo] password for vivek:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  wkhtmltopdf
0 upgraded, 1 newly installed, 0 to remove and 10 not upgraded.
Need to get 116 kB of archives.
After this operation, 303 kB of additional disk space will be used.
Get:1 http://debian.osuosl.org/debian/ squeeze/main wkhtmltopdf amd64 0.9.9-1 [116 kB]
Fetched 116 kB in 2s (49.4 kB/s)
Selecting previously deselected package wkhtmltopdf.
(Reading database ... 274164 files and directories currently installed.)
Unpacking wkhtmltopdf (from .../wkhtmltopdf_0.9.9-1_amd64.deb) ...
Processing triggers for man-db ...
Setting up wkhtmltopdf (0.9.9-1) ...
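
Once the package is installed, a script may want to confirm that the binary is really on the PATH before calling it. The following is a minimal sketch; `require_cmd` is a hypothetical helper name and is not part of wkhtmltopdf.

```shell
#!/bin/sh
# require_cmd (hypothetical helper): returns 0 if the command named in $1
# is found on PATH, otherwise prints a message to stderr and returns 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH" >&2
    return 1
}

# Example (not run here):
#   require_cmd wkhtmltopdf && wkhtmltopdf --version
```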

Download wkhtmltopdf

Visit the project download page to grab wkhtmltopdf for Linux / MS-Windows / Apple Mac OS X. You can also use the wget command as follows:
$ wget http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2


Sample outputs:

Resolving wkhtmltopdf.googlecode.com... 74.125.135.82, 2404:6800:4001:c01::52
Connecting to wkhtmltopdf.googlecode.com|74.125.135.82|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11175276 (11M) [application/octet-stream]
Saving to: `wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2'
 
100%[======================================>] 1,11,75,276  480K/s   in 23s
 
2012-10-04 01:21:43 (477 KB/s) - `wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2' saved [11175276/11175276]
 

Install wkhtmltopdf under Linux

Type the following tar command to extract files:
$ tar xvf wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2


Sample outputs:

wkhtmltopdf-amd64

Install it in your private ~/bin/ directory or in the /usr/local/bin directory:
$ mv wkhtmltopdf-amd64 ~/bin/
$ ln -s ~/bin/wkhtmltopdf-amd64 ~/bin/html2pdf

OR
$ sudo mv wkhtmltopdf-amd64 /usr/local/bin/
$ sudo ln -s /usr/local/bin/wkhtmltopdf-amd64 /usr/local/bin/html2pdf
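
If you install into ~/bin, the html2pdf link only works when that directory is listed in PATH. The sketch below shows one way to check; `path_contains` is a hypothetical helper introduced for illustration.

```shell
#!/bin/sh
# path_contains (hypothetical helper): succeeds if directory $1 is one
# of the colon-separated entries in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not run here):
#   path_contains "$HOME/bin" || echo 'add $HOME/bin to PATH in ~/.profile'
```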

How do I use wkhtmltopdf?

The syntax is as follows:

 
html2pdf http://www.cyberciti.biz/path/to/url.html output.pdf
html2pdf http://www.cyberciti.biz/blog/print/url-slug.html output.pdf
html2pdf -option1  -option2 http://www.cyberciti.biz/blog/print/url-slug.html output.pdf
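
Because the syntax is just "URL in, file out", converting a list of pages can be sketched as a small loop. This assumes the html2pdf symlink created earlier; `url_to_pdf_name` and `convert_all` are hypothetical helper names, and the file naming scheme is only one possible choice.

```shell
#!/bin/sh
# url_to_pdf_name (hypothetical helper): derives an output file name from
# a URL by dropping the scheme and trailing slashes, then replacing every
# character other than letters, digits, dots and hyphens with "_".
url_to_pdf_name() {
    printf '%s\n' "$1" |
        sed -e 's|^[a-zA-Z]*://||' -e 's|/*$||' \
            -e 's|[^A-Za-z0-9.-]|_|g' -e 's|$|.pdf|'
}

# convert_all (hypothetical helper): reads one URL per line from the file
# named in $1 and converts each one with html2pdf.
convert_all() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        # Redirect stdin so the converter cannot consume the URL list.
        html2pdf "$url" "$(url_to_pdf_name "$url")" < /dev/null
    done < "$1"
}

# Example (not run here):
#   convert_all urls.txt
```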
 

Example: Simple html to pdf file

In this example, convert our Bash for loop page to a PDF file:
$ html2pdf http://www.cyberciti.biz/faq/bash-for-loop/print/ /tmp/bash.for.loop.pdf


Sample outputs:

Loading pages (1/6)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done

To verify that the generated file is a PDF, enter:
$ file /tmp/bash.for.loop.pdf


Sample outputs:

/tmp/bash.for.loop.pdf: PDF document, version 1.4
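
In a script, a similar check can be made without the file command by looking at the first bytes of the output, since PDF files start with the `%PDF-` signature. A minimal sketch, with `is_pdf` as a hypothetical helper name:

```shell
#!/bin/sh
# is_pdf (hypothetical helper): succeeds if the file named in $1 exists
# and begins with the five-byte "%PDF-" signature.
is_pdf() {
    [ -f "$1" ] || return 1
    [ "$(dd if="$1" bs=5 count=1 2>/dev/null)" = "%PDF-" ]
}

# Example (not run here):
#   is_pdf /tmp/bash.for.loop.pdf && echo "looks like a PDF"
```

This only inspects the header, so it will not catch a truncated or otherwise damaged file.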

Use the pdfinfo command to print the contents of the 'Info' dictionary (plus some other useful information) from a Portable Document Format (PDF) file:
$ pdfinfo /tmp/bash.for.loop.pdf


Sample outputs:

Title:          Frequently Asked Questions About Linux / UNIX » Bash For Loop Examples » Print
Creator:
Producer:       wkhtmltopdf
CreationDate:   Thu Oct  4 01:29:33 2012
Tagged:         no
Pages:          4
Encrypted:      no
Page size:      595 x 842 pts (A4)
File size:      98792 bytes
Optimized:      no
PDF version:    1.4
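
The "Field: value" layout shown above is easy to parse in a script. The sketch below pulls out the page count; `pdf_pages` is a hypothetical helper that reads pdfinfo-style text on stdin, so it assumes the output format stays as shown.

```shell
#!/bin/sh
# pdf_pages (hypothetical helper): reads pdfinfo-style output on stdin
# and prints the value of the "Pages:" field.
pdf_pages() {
    awk -F': *' '$1 == "Pages" { print $2; exit }'
}

# Example (not run here):
#   pdfinfo /tmp/bash.for.loop.pdf | pdf_pages
```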

Grayscale pdf

The following PDF will be generated in grayscale:
$ html2pdf -g http://www.cyberciti.biz/faq/bash-for-loop/print/ bash.for.loop.pdf

Set orientation to Landscape or Portrait

Use the following syntax:
$ html2pdf -O Landscape http://www.cyberciti.biz/faq/bash-for-loop/print/ bash.for.loop.pdf


Where,

  • -O Landscape|Portrait. The default is Portrait.

How do I set page size?

Use the following syntax:
$ html2pdf -s SIZE http://www.cyberciti.biz/faq/bash-for-loop/print/ bash.for.loop.pdf


Where,

  • -s Size : Set paper size to: A4, Letter, etc. (default A4)
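
The grayscale, orientation and page size options can be combined in one call. The wrapper below is a sketch only: `render_page` and the `DRY_RUN` variable are hypothetical names, and it assumes the html2pdf symlink created earlier.

```shell
#!/bin/sh
# render_page (hypothetical wrapper around html2pdf).
# Usage: render_page URL OUTPUT [ORIENTATION] [SIZE] [gray]
# With DRY_RUN=1 it only prints the command it would run.
render_page() {
    rp_url=$1 rp_out=$2
    rp_orient=${3:-Portrait} rp_size=${4:-A4} rp_gray=${5:-}
    case "$rp_orient" in
        Landscape|Portrait) ;;
        *) echo "orientation must be Landscape or Portrait" >&2; return 2 ;;
    esac
    # Build the option list in the positional parameters.
    set -- -O "$rp_orient" -s "$rp_size"
    if [ "$rp_gray" = gray ]; then
        set -- -g "$@"
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "html2pdf $* $rp_url $rp_out"
    else
        html2pdf "$@" "$rp_url" "$rp_out"
    fi
}

# Example (not run here):
#   render_page http://www.cyberciti.biz/faq/bash-for-loop/print/ loop.pdf Landscape Letter gray
```

Validating the orientation up front keeps a typo from reaching the converter; the page size is passed through unchecked because the list of accepted sizes is longer than the two shown here.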

How do I generate a table of contents?

A table of contents can be added to the document by adding toc to the command line. For example:
$ html2pdf toc http://www.cyberciti.biz/faq/bash-for-loop/print/ bash.for.loop.pdf


Sample outputs:

Fig.01: wkhtmltopdf in action (Linux / Unix HTML to PDF file command line option)


Please note that the table of contents is generated from the heading (H) tags in the input documents.

How do I see all available options?

To see a list of commonly used options, enter:
$ wkhtmltopdf --help

Or, to see all available options (more extensive help detailing less common command switches), run:
$ wkhtmltopdf -H | less

REFERENCES:
Source: http://www.cyberciti.biz/open-source/html-to-pdf-freeware-linux-osx-windows-software/

wkhtmltopdf
To control page breaks, add page-break-inside:avoid; to the style of the div in question. For example:

<style type="text/css">
*{ margin:0px; padding:0px;}
div{ width:800px; min-height:1362px;margin:auto;page-break-inside:avoid;}
</style>
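
To try the rule out, one option is to generate a small local test page and convert it. The sketch below is illustrative only: `make_sample_html`, the file paths and the page content are assumptions, not part of the original note.

```shell
#!/bin/sh
# make_sample_html (hypothetical helper): writes a two-section HTML page
# to the file named in $1, applying page-break-inside:avoid to each div.
make_sample_html() {
    cat > "$1" <<'EOF'
<!DOCTYPE html>
<html>
<head>
<style type="text/css">
div { page-break-inside: avoid; }
</style>
</head>
<body>
<div><h1>Section one</h1><p>First block.</p></div>
<div><h1>Section two</h1><p>Second block.</p></div>
</body>
</html>
EOF
}

# Example (not run here):
#   make_sample_html /tmp/sample.html && wkhtmltopdf /tmp/sample.html /tmp/sample.pdf
```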

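Our project needed a component that converts HTML to PDF and renders the CSS and JS effects used in the HTML.

Within a week I tested several components, including TCPDF, mpdf, html2fpdf, prince and prawn. Prince's conversion quality stood out slightly, but it had some CSS rendering problems and it is commercial software. TCPDF is powerful and supports headers and footers well, but it does not properly handle special JS effects such as code highlighting.

Then, over the weekend, I found wkhtmltopdf.

First, an overview of its main features:

Cross platform.
Open source.
Renders pages with the WebKit engine: converts any web page into a PDF document.
Supports headers and footers.
TOC generation.
Runs from the command line, so other programs can call it easily: batch mode conversions.
Can run on Linux server with an XServer (the X11 client libs must be installed).
Can be used directly from PHP or Python via bindings to libwkhtmltox, which PHP developers will especially like.

The original author covers installation and usage at length, so I will not repeat it here. Installation is simple and takes less than a minute.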

$ sudo apt-get install wkhtmltopdf
$ wkhtmltopdf http://www.baidu.com/index.php baidu.pdf

https://github.com/antialize/wkhtmltopdf

https://github.com/mreiferson/php-wkhtmltox

http://www.cs.au.dk/~jakobt/libwkhtmltox_0.10.0_doc/pagesettings.html
