用java做抓取的时候免不了要用到多线程的了,因为要同时抓取多个网站或一条线程抓取一个网站的话实在太慢,而且有时一条线程抓取同一个网站的话也比较浪费CPU资源。要用到多线程的等方面,也就免不了对线程的控制或用到线程池。 我在做我们现在的那一个抓取框架的时候,就曾经用过java.util.concurrent.ExecutorService作为线程池,关于ExecutorService的使用代码大概如下:
java.util.concurrent.Executors类的API提供大量创建连接池的静态方法:
1.固定大小的线程池:
package BackStage; import java.util.concurrent.Executors; import java.util.concurrent.ExecutorService; public class JavaThreadPool { public static void main(String[] args) { // 创建一个可重用固定线程数的线程池 ExecutorService pool = Executors.newFixedThreadPool(2); // 创建实现了Runnable接口对象,Thread对象当然也实现了Runnable接口 Thread t1 = new MyThread(); Thread t2 = new MyThread(); Thread t3 = new MyThread(); Thread t4 = new MyThread(); Thread t5 = new MyThread(); // 将线程放入池中进行执行 pool.execute(t1); pool.execute(t2); pool.execute(t3); pool.execute(t4); pool.execute(t5); // 关闭线程池 pool.shutdown(); } } class MyThread extends Thread { @Override public void run() { System.out.println(Thread.currentThread().getName() + "正在执行。。。"); } }
后来发现ExecutorService的功能没有想像中的那么好,而且最多只是提供一个线程的容器而然,所以后来我用改用了java.lang.ThreadGroup,ThreadGroup有很多优势,最重要的一点就是它可以对线程进行遍历,知道那些线程已经运行完毕,还有那些线程在运行。关于ThreadGroup的使用代码如下:
class MyThread extends Thread {
boolean stopped;
MyThread(ThreadGroup tg, String name) {
super(tg, name);
stopped = false;
}
public void run() {
System.out.println(Thread.currentThread().getName() + " starting.");
try {
for (int i = 1; i < 1000; i++) {
System.out.print(".");
Thread.sleep(250);
synchronized (this) {
if (stopped)
break;
}
}
} catch (Exception exc) {
System.out.println(Thread.currentThread().getName() + " interrupted.");
}
System.out.println(Thread.currentThread().getName() + " exiting.");
}
synchronized void myStop() {
stopped = true;
}
}
public class Main {
public static void main(String args[]) throws Exception {
ThreadGroup tg = new ThreadGroup("My Group");
MyThread thrd = new MyThread(tg, "MyThread #1");
MyThread thrd2 = new MyThread(tg, "MyThread #2");
MyThread thrd3 = new MyThread(tg, "MyThread #3");
thrd.start();
thrd2.start();
thrd3.start();
Thread.sleep(1000);
System.out.println(tg.activeCount() + " threads in thread group.");
Thread thrds[] = new Thread[tg.activeCount()];
tg.enumerate(thrds);
for (Thread t : thrds)
System.out.println(t.getName());
thrd.myStop();
Thread.sleep(1000);
System.out.println(tg.activeCount() + " threads in tg.");
tg.interrupt();
}
}
由以上的代码可以看出:ThreadGroup比ExecutorService多以下几个优势
1.ThreadGroup可以遍历线程,知道那些线程已经运行完毕,那些还在运行
2.可以通过ThreadGroup.activeCount知道有多少线程从而可以控制插入的线程数
这篇文章来自博客园:http://www.cnblogs.com/yy2011/archive/2011/05/05/2037564.html