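HttpClient usage example: reading the CSDN poll list and parsing it with regular expressions
Result of the processing
The CSDN polls with the most participants, as of January 6, 2009.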
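The title mentions extracting the poll list with regular expressions. Below is a minimal sketch of that step using `java.util.regex`, with the entries sorted by participant count.

The markup it matches (`<a href="...">title</a><span class="count">N</span>`) and the `PollListParser` class are hypothetical, not CSDN's actual page structure. The pattern would need adapting to the real HTML.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PollListParser {

    /** One parsed poll entry: its title and participant count. */
    public static final class Poll {
        public final String title;
        public final int participants;

        Poll(String title, int participants) {
            this.title = title;
            this.participants = participants;
        }

        @Override
        public String toString() {
            return title + " (" + participants + ")";
        }
    }

    // HYPOTHETICAL markup, assumed for illustration only:
    // <a href="...">TITLE</a><span class="count">N</span>
    private static final Pattern ITEM = Pattern.compile(
            "<a href=\"[^\"]*\">([^<]+)</a>\\s*<span class=\"count\">(\\d+)</span>");

    /** Extracts every poll entry and sorts by participant count, highest first. */
    public static List<Poll> parse(String html) {
        List<Poll> polls = new ArrayList<>();
        Matcher m = ITEM.matcher(html);
        while (m.find()) {
            polls.add(new Poll(m.group(1).trim(), Integer.parseInt(m.group(2))));
        }
        polls.sort(Comparator.comparingInt((Poll p) -> p.participants).reversed());
        return polls;
    }

    public static void main(String[] args) {
        String html = "<ul>"
                + "<li><a href=\"/vote/1\">Poll A</a> <span class=\"count\">120</span></li>"
                + "<li><a href=\"/vote/2\">Poll B</a> <span class=\"count\">450</span></li>"
                + "</ul>";
        // prints "Poll B (450)", then "Poll A (120)"
        for (Poll p : parse(html)) {
            System.out.println(p);
        }
    }
}
```

In practice the `html` string would be the page body returned by `EntityUtils.toString(...)` in the program that follows.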
```java
package com.laozizhu.apache.httpclient;

import java.net.Socket;

import org.apache.http.ConnectionReuseStrategy;
import org.apache.http.Header;
import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.HttpVersion;
import org.apache.http.impl.DefaultConnectionReuseStrategy;
import org.apache.http.impl.DefaultHttpClientConnection;
import org.apache.http.message.BasicHttpRequest;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.HttpParams;
import org.apache.http.params.HttpProtocolParams;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.BasicHttpProcessor;
import org.apache.http.protocol.ExecutionContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.protocol.HttpRequestExecutor;
import org.apache.http.protocol.RequestConnControl;
import org.apache.http.protocol.RequestContent;
import org.apache.http.protocol.RequestExpectContinue;
import org.apache.http.protocol.RequestTargetHost;
import org.apache.http.protocol.RequestUserAgent;
import org.apache.http.util.EntityUtils;

/**
 * Example of reading pages with HttpClient.
 * @author 老紫竹 (java2000.net)
 */
public class HttpGet {
    public static void main(String[] args) throws Exception {
        HttpParams params = new BasicHttpParams();
        // HTTP protocol version: 1.1, 1.0 or 0.9
        HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
        // Default charset for content bodies
        HttpProtocolParams.setContentCharset(params, "UTF-8");
        // User-Agent to send. To impersonate a browser, use its UA string instead:
        // IE7:
        //   Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)
        // Firefox 3.0.3:
        //   Mozilla/5.0 (Windows; U; Windows NT 5.2; zh-CN; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3
        HttpProtocolParams.setUserAgent(params, "HttpComponents/1.1");
        HttpProtocolParams.setUseExpectContinue(params, true);

        // Request interceptors that fill in the standard headers
        // (Content-Length/Transfer-Encoding, Host, Connection, User-Agent, Expect)
        BasicHttpProcessor httpproc = new BasicHttpProcessor();
        httpproc.addInterceptor(new RequestContent());
        httpproc.addInterceptor(new RequestTargetHost());
        httpproc.addInterceptor(new RequestConnControl());
        httpproc.addInterceptor(new RequestUserAgent());
        httpproc.addInterceptor(new RequestExpectContinue());

        HttpRequestExecutor httpexecutor = new HttpRequestExecutor();
        HttpContext context = new BasicHttpContext(null);
        HttpHost host = new HttpHost("www.java2000.net", 80);
        DefaultHttpClientConnection conn = new DefaultHttpClientConnection();
        ConnectionReuseStrategy connStrategy = new DefaultConnectionReuseStrategy();
        context.setAttribute(ExecutionContext.HTTP_CONNECTION, conn);
        context.setAttribute(ExecutionContext.HTTP_TARGET_HOST, host);

        try {
            String[] targets = { "/", "/help.jsp" };
            for (int i = 0; i < targets.length; i++) {
                // Open a new socket only if the previous one was closed
                if (!conn.isOpen()) {
                    Socket socket = new Socket(host.getHostName(), host.getPort());
                    conn.bind(socket, params);
                }
                BasicHttpRequest request = new BasicHttpRequest("GET", targets[i]);
                System.out.println(">> Request URI: " + request.getRequestLine().getUri());
                context.setAttribute(ExecutionContext.HTTP_REQUEST, request);
                request.setParams(params);
                // Run the request interceptors, send the request, then run the response interceptors
                httpexecutor.preProcess(request, httpproc, context);
                HttpResponse response = httpexecutor.execute(request, conn, context);
                response.setParams(params);
                httpexecutor.postProcess(response, httpproc, context);
                // Status line
                System.out.println("<< Response: " + response.getStatusLine());
                // Response headers
                Header[] hs = response.getAllHeaders();
                for (Header h : hs) {
                    System.out.println(h.getName() + ":" + h.getValue());
                }
                // Response body
                System.out.println(EntityUtils.toString(response.getEntity()));
                System.out.println("==============");
                // Close unless the server allows the connection to be reused
                if (!connStrategy.keepAlive(response, context)) {
                    conn.close();
                } else {
                    System.out.println("Connection kept alive...");
                }
            }
        } finally {
            conn.close();
        }
    }
}
```
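This code is one of the examples bundled with HttpClient, and there is a lot in it worth borrowing. I modified it slightly so that it also prints the response headers. Here is the output: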
```
>> Request URI: /
<< Response: HTTP/1.1 200 OK
Proxy-Connection:Keep-Alive
Connection:Keep-Alive
Transfer-Encoding:chunked
Via:1.1 GDATAISASERVER
Date:Tue, 06 Jan 2009 05:07:54 GMT
Content-Type:text/html;charset=UTF-8
Server:Apache/2.2.4 (Win32) mod_jk/1.2.26
Set-Cookie:JSESSIONID=AAF2386712151447598F72716A64F847; Path=/
Set-Cookie:JAVA2000_STYLE_ID=1; Domain=www.java2000.net; Expires=Thu, 08-Mar-2012 14:54:33 GMT; Path=/
Vary:Accept-Encoding
Keep-Alive:timeout=5, max=100
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN" dir="ltr">
```
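I'll leave out the rest of the output. Note that keep-alive has a large impact on performance: when the server allows it, both requests share one TCP connection instead of paying for a new connection setup each time.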
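The call `connStrategy.keepAlive(response, context)` in the program decides whether the next request may reuse the socket. Below is a simplified sketch of the HTTP/1.x rules behind that decision. It is not HttpCore's `DefaultConnectionReuseStrategy` itself.

The `KeepAliveSketch` class, its methods and the lower-cased header map are assumptions made for illustration. Note that the map cannot hold repeated headers such as the two `Set-Cookie` lines. The sketch also assumes the response carries a body. It additionally reads the `timeout` parameter of the `Keep-Alive:timeout=5, max=100` header seen in the output.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Simplified illustration of connection-reuse rules; NOT the HttpCore source. */
public class KeepAliveSketch {

    /**
     * Returns true if the connection may be reused after a response.
     * Assumption: the response has a body and header names are lower-cased.
     */
    public static boolean canReuse(boolean http11, Map<String, String> headers) {
        String te = headers.get("transfer-encoding");
        boolean chunked = te != null && te.trim().equalsIgnoreCase("chunked");
        boolean hasLength = headers.containsKey("content-length");
        // A body that is neither length-delimited nor chunked (HTTP/1.1 only)
        // ends when the server closes the connection, so it cannot be reused.
        if (!hasLength && !(chunked && http11)) {
            return false;
        }
        // An explicit directive wins; fall back to Proxy-Connection (seen behind a proxy).
        String conn = headers.get("connection");
        if (conn == null) {
            conn = headers.get("proxy-connection");
        }
        if (conn != null) {
            boolean close = false;
            boolean keepAlive = false;
            for (String token : conn.split(",")) {
                String t = token.trim().toLowerCase(Locale.ROOT);
                close |= t.equals("close");
                keepAlive |= t.equals("keep-alive");
            }
            if (close) {
                return false;
            }
            if (keepAlive) {
                return true;
            }
        }
        // No directive: HTTP/1.1 connections are persistent by default, HTTP/1.0 ones are not.
        return http11;
    }

    /** Returns the "timeout" parameter (seconds) of a Keep-Alive header, or -1 if absent. */
    public static int keepAliveTimeoutSeconds(String keepAliveHeader) {
        if (keepAliveHeader == null) {
            return -1;
        }
        for (String part : keepAliveHeader.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    return Integer.parseInt(kv[1].trim());
                } catch (NumberFormatException e) {
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Headers copied from the response above (names lower-cased)
        Map<String, String> headers = new HashMap<>();
        headers.put("proxy-connection", "Keep-Alive");
        headers.put("connection", "Keep-Alive");
        headers.put("transfer-encoding", "chunked");
        headers.put("keep-alive", "timeout=5, max=100");
        System.out.println("reuse connection: " + canReuse(true, headers)); // true
        System.out.println("idle timeout (s): " + keepAliveTimeoutSeconds(headers.get("keep-alive"))); // 5
    }
}
```

With the headers from this response the connection is reusable: it is HTTP/1.1, the body is chunked and `Connection` says Keep-Alive. The `timeout=5` value means the server is expected to drop an idle connection after about 5 seconds. A client that pauses longer than that between requests would have to reconnect anyway.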