Think Naruto updates too slowly? Annoyed that those manga sites won't let you download? Take a look at this ^_^
PS: Web-Harvest: http://web-harvest.sourceforge.net
1. Logic file
<config>
<var-def name="num" overwrite="false">1</var-def>
<loop index="i" item="url">
<!-- get list of name -->
<list>
<var-def name="imagelinks">
<call name="download-multipage-list">
<call-param name="pageUrl"><template>http://www.narutom.com/comic/index.html</template></call-param>
<call-param name="nextXPath">//div[@class='pagenav']/a[last()-1]/@href</call-param>
<call-param name="itemXPath">//div[@id='dm_name']/ul/li/a/text()</call-param>
<call-param name="maxloops"><template>${num}</template></call-param>
</call>
</var-def>
</list>
<body>
<empty>
<!-- get ordinal -->
<var-def name="ordinal">
<regexp>
<regexp-pattern>^\D*(\d*)?\D*$</regexp-pattern>
<regexp-source><template>${url}</template></regexp-source>
<regexp-result>
<template>${_1}</template>
</regexp-result>
</regexp>
</var-def>
<!-- output -->
<call name="getComic">
<call-param name="fromNum"><template>${ordinal}</template></call-param>
<call-param name="directory"><template>${url}</template></call-param>
</call>
</empty>
</body>
</loop>
</config>
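To see what the `ordinal` step does on its own, here is a minimal sketch. It assumes the pattern is meant to be `^\D*(\d*)?\D*$` (non-digits, one run of digits, non-digits), and the item text `Naruto 500` is made up for illustration; the real values come from the index page.

```xml
<config>
    <!-- Hypothetical item text; the real values come from the index page. -->
    <var-def name="url">Naruto 500</var-def>

    <!-- Same extraction as in the logic file: capture the single run of digits. -->
    <var-def name="ordinal">
        <regexp>
            <regexp-pattern>^\D*(\d*)?\D*$</regexp-pattern>
            <regexp-source><template>${url}</template></regexp-source>
            <regexp-result>
                <template>${_1}</template>
            </regexp-result>
        </regexp>
    </var-def>
</config>
```

For this input, group 1 captures `500`. Because the pattern is anchored at both ends, a text with two separate runs of digits does not match at all, so `ordinal` would come out empty for such an item.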
2. Function library file
<config>
<!--
@param pageUrl - URL of starting page
@param itemXPath - XPath expression to obtain single item in the list
@param nextXPath - XPath expression to URL for the next page
@param maxloops - maximum number of pages downloaded
@return list of all downloaded items
-->
<function name="download-multipage-list">
<return>
<while condition="${pageUrl.toString().length() != 0}" maxloops="${maxloops}" index="i">
<empty>
<var-def name="content">
<html-to-xml>
<http url="${pageUrl}" charset="gb2312"/>
</html-to-xml>
</var-def>
<var-def name="nextLinkUrl">
<xpath expression="${nextXPath}">
<var name="content"/>
</xpath>
</var-def>
<var-def name="pageUrl">
<!--<template>${sys.fullUrl(pageUrl.toString(), nextLinkUrl.toString())}</template>-->
<template>${nextLinkUrl.toString()}</template>
</var-def>
</empty>
<xpath expression="${itemXPath}">
<var name="content"/>
</xpath>
</while>
</return>
</function>
<!-- naruto -->
<function name="getComic">
<while index="j" condition="${j.toInt() != 20}" >
<var-def name="pageUrl">
<template>http://wt2.narutom.com/d/manhua/naruto/${fromNum}/${j}.png</template>
</var-def>
<file action="write" path='/home/xyzqing/webharvest/naruto/naruto/${directory}/${j}.png' type="binary">
<http url="${pageUrl}"/>
</file>
</while>
</function>
</config>
3. Screenshot of the result
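PS: One small flaw is that a few useless extra images may be downloaded, since getComic requests a fixed number of pages for every chapter. From a technical point of view that is a defect, but for Naruto fans it doesn't affect reading. Also, the special chapters are not fetched, because their IDs are not contiguous. There are only a few of them, so you can modify the example and grab them yourself.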
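One possible way to grab the special chapters is to list their IDs by hand and reuse `getComic`. This is only a sketch under several assumptions: the function library is saved as `functions.xml` (the file name is my guess), the special chapters follow the same image URL pattern as the regular ones, and `SPECIAL_ID_1` / `SPECIAL_ID_2` are placeholders that you must replace with the real IDs.

```xml
<config>
    <!-- Assumed file name for the function library from section 2. -->
    <include path="functions.xml"/>

    <loop item="specialId">
        <list>
            <!-- Placeholders: replace with the real special-chapter IDs. -->
            <template>SPECIAL_ID_1</template>
            <template>SPECIAL_ID_2</template>
        </list>
        <body>
            <empty>
                <!-- Reuse getComic, passing each hand-picked ID instead of a parsed ordinal. -->
                <call name="getComic">
                    <call-param name="fromNum"><template>${specialId}</template></call-param>
                    <call-param name="directory"><template>special-${specialId}</template></call-param>
                </call>
            </empty>
        </body>
    </loop>
</config>
```

The `special-` directory prefix is also just an illustration; any directory name works as long as it does not collide with the regular chapters.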
How to run: download the Web-Harvest jar and run the "logic file" with its built-in UI. You will of course need to set the output path yourself.
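If you would rather not edit the hard-coded path inside the function, one option is to pass the output directory in as a parameter. The fragment below is a sketch, not part of the original files: `getComicTo` and `outputDir` are names I made up, and `/path/to/output` is a placeholder.

```xml
<!-- In the function library: a variant of getComic that takes the output directory. -->
<function name="getComicTo">
    <while index="j" condition="${j.toInt() != 20}">
        <file action="write" path="${outputDir}/${directory}/${j}.png" type="binary">
            <http url="http://wt2.narutom.com/d/manhua/naruto/${fromNum}/${j}.png"/>
        </file>
    </while>
</function>

<!-- In the logic file: call it with the directory you want to write to. -->
<call name="getComicTo">
    <call-param name="outputDir">/path/to/output</call-param>
    <call-param name="fromNum"><template>${ordinal}</template></call-param>
    <call-param name="directory"><template>${url}</template></call-param>
</call>
```

This keeps the path in one place in the logic file, so the function library can be shared without changes.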
Discussion is welcome.