Reposted from: http://youthmemo.com/?p=2082
I recently set up a blog and wanted to move my old CSDN articles over to it. I couldn't find a suitable tool, so I wrote a small program over the weekend.
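What the program can do: move articles from CSDN to WordPress (text content only). What it cannot do: 1. it does not create categories on WordPress, so you have to create them by hand beforehand, keeping them identical to the CSDN categories; 2. it does not carry over article formatting well, so posts need reformatting after the move.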
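Since the categories must already exist on WordPress, it can help to check which CSDN categories are still missing before running the migration. The sketch below is not part of the original program. The class `CategoryCheck` and method `missingCategories` are hypothetical, and the sketch assumes both category lists have already been collected as plain strings (for example, names read off the CSDN blog and names returned by WordPress's `wp.getCategories` XML-RPC method). Names are compared exactly, including case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CategoryCheck {

    /**
     * Returns the CSDN categories that have no exact-name match on the
     * WordPress side, without duplicates and in their original order.
     * (Hypothetical helper, not from the original program.)
     */
    static List<String> missingCategories(List<String> csdnCategories, List<String> wordpressCategories) {
        Set<String> existing = new HashSet<String>(wordpressCategories);
        List<String> missing = new ArrayList<String>();
        for (String name : csdnCategories) {
            // Exact, case-sensitive comparison; skip names already reported.
            if (!existing.contains(name) && !missing.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> csdn = Arrays.asList("Java", "Linux", "Algorithms");
        List<String> wordpress = Arrays.asList("Java", "Uncategorized");
        System.out.println(missingCategories(csdn, wordpress)); // [Linux, Algorithms]
    }
}
```

Each name it reports would then be created by hand in the WordPress admin before any posts are sent.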
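The program has three parts: crawling, parsing, and posting. The crawler downloads the content at a given URL; the parser extracts article links, titles, publish times, and category information from the page; the poster sends the parsed data to WordPress over XML-RPC to create the blog posts.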
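The posting step can be illustrated with the JDK alone. This is a sketch under assumptions, not the author's code: it targets WordPress's `metaWeblog.newPost` XML-RPC method (arguments: blog id, username, password, a content struct, and a publish flag), assumes the scraped publish time looks like `2011-09-04 01:40`, and uses a placeholder blog id of `"1"` that you may need to adjust. The helper names `toDate` and `buildNewPostParams` are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NewPostParams {

    // Assumed format of the publish time scraped from a CSDN article page.
    private static final String CSDN_DATE_PATTERN = "yyyy-MM-dd HH:mm";

    /** Parses the scraped publish-time text into a Date (hypothetical helper). */
    static Date toDate(String text) throws ParseException {
        // SimpleDateFormat is not thread-safe, so create one per call.
        // It interprets the text in the JVM's default time zone.
        SimpleDateFormat format = new SimpleDateFormat(CSDN_DATE_PATTERN);
        format.setLenient(false);
        return format.parse(text.trim());
    }

    /**
     * Builds the argument array for metaWeblog.newPost:
     * (blogid, username, password, struct content, boolean publish).
     */
    static Object[] buildNewPostParams(String user, String pass, String title,
            String body, Date created, List<String> categories) {
        Map<String, Object> content = new HashMap<String, Object>();
        content.put("title", title);
        content.put("description", body);   // post body (HTML)
        content.put("dateCreated", created); // keeps the original CSDN publish time
        // Category names must already exist in WordPress.
        content.put("categories", categories.toArray(new String[0]));
        // "1" is a placeholder blog id (assumption); TRUE publishes immediately.
        return new Object[] { "1", user, pass, content, Boolean.TRUE };
    }

    public static void main(String[] args) throws ParseException {
        Date created = toDate("2011-09-04 01:40");
        Object[] params = buildNewPostParams("admin", "secret", "Hello",
                "<p>Migrated from CSDN</p>", created, Arrays.asList("Java"));
        System.out.println(params.length);                        // 5
        System.out.println(((Map<?, ?>) params[3]).get("title")); // Hello
    }
}
```

With Apache XML-RPC, the resulting array would be handed to `client.execute("metaWeblog.newPost", params)`; the library serializes the `Map` as an XML-RPC struct and the `Date` as `dateTime.iso8601`.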
The program uses the following jar files and versions:
-rw-r--r-- 1 mingyuan mingyuan  46725 2011-09-03 23:05 commons-codec-1.3.jar
-rw-r--r-- 1 mingyuan mingyuan 279781 2011-09-03 23:05 commons-httpclient-3.0.1.jar
-rwxrwxrwx 1 mingyuan mingyuan  52915 2010-05-03 03:39 commons-logging-1.1.jar
-rw-r--r-- 1 mingyuan mingyuan 281579 2011-09-04 01:40 jsoup-1.6.1.jar
-rwxrwxrwx 1 mingyuan mingyuan  34407 2010-05-03 03:39 ws-commons-util-1.0.2.jar
-rwxrwxrwx 1 mingyuan mingyuan  58573 2010-05-03 03:39 xmlrpc-client-3.1.3.jar
-rwxrwxrwx 1 mingyuan mingyuan 109131 2010-05-03 03:39 xmlrpc-common-3.1.3.jar
-rwxrwxrwx 1 mingyuan mingyuan  81555 2010-05-03 03:39 xmlrpc-server-3.1.3.jar
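The code is simple, so I won't walk through it; reading it should be enough. The program's entry point is Mover.main.
First, here is the main class, Mover.java: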
package cn.mingyuan.csdn2wordpress;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.xmlrpc.XmlRpcException;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Crawls CSDN, parses the articles, and migrates them to WordPress.
 *
 * @author mingyuan
 */
public class Mover {
    private int totalPages;
    private XmlRpcClientConfigImpl config;
    private XmlRpcClient client;
    private String baseUrl;
    private String userName;
    private String password;
    private String csdnUserName;

    public Mover(int totalPages, String blogRpcUrl, String csdnUrl, String csdnUserName, String userName,
            String password) {
        this.totalPages = totalPages;
        this.baseUrl = csdnUrl;
        this.csdnUserName = csdnUserName;
        this.userName = userName;
        this.password = password;
        config = new XmlRpcClientConfigImpl();
        try {
            config.setServerURL(new URL(blogRpcUrl));
        } catch (MalformedURLException e) {
            // Fail fast: without a valid XML-RPC endpoint nothing can be posted.
            throw new IllegalArgumentException("Please check the blog XML-RPC URL: " + blogRpcUrl, e);
        }
        client = new XmlRpcClient();
        client.setConfig(config);
    }

    /** Collects article URLs from list pages 1..totalPages of the CSDN blog. */
    private List<String> getlinks() {
        List<String> list = new LinkedList<String>();
        for (int i = 1; i <= totalPages; i++) {
            System.out.println("processing page " + i);
            Downloader downloader = new Downloader();
            String content = downloader.download(baseUrl + "/" + csdnUserName + "/article/list/" + i);
            if (content == null)
                continue;
            Document doc = Jsoup.parse(content);
            // Article titles on a list page sit in elements with the class "link_title".
            Elements titles = doc.select(".link_title");
            for (Element title : titles) {
                Element anchor = title.select("a").first();
                if (anchor == null)
                    continue; // title without a link: skip it instead of throwing NullPointerException
                String link = baseUrl + anchor.attr("href");
                list.add(link);
                System.out.println("get link\t" + link);
            }
            System.out.println("page " + i + " extracted, sleeping 2s");
            try {
                // Pause between list pages to avoid hammering the CSDN server.
                TimeUnit.SECONDS.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag and stop crawling
                break;
            }
        }
        return list;
    }

    public List<CSDNPost> getPosts() {
        List<String> links = getlinks();
        List<CSDNPost> posts = new LinkedList<CSDNPost>();
        for (String link : links) {
            CSDNPost post = getPost(link);
            if (post != null) {
                posts.add(post);
            }
        }
        return posts;
    }

    private CSDNPost getPost(String url) {
        System.out.println("url\t" + url);
        Downloader downloader = new Downloader();
        String html = downloader.download(url);
        if (html ==