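It is all in the mailing list. Someone once asked why OpenVPN does not implement multithreading, and backed the question with actual test data. JY answered as follows: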
OpenVPN 2.0 has no multithreading support, this is the only feature present in
1.x which has been removed from 2.0.
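Fine, that states clearly that OpenVPN 2.0 no longer supports multithreading. The earlier 1.x releases did have it, but not for data transfer, that is, not for the data channel. Note that the discussion here is limited to the CPU cost of processing. As I had thought before, in the 1.x days OpenVPN merely set up an encrypted tunnel, and CPU cost arises only when there is data in the tunnel. When data will arrive is unknown, so relying on the kernel's scheduling mechanism is unwise (the kernel's task entry scheduling is based on a series of predictions). The CPU cost can therefore be quantified only for the TLS handshake phase of the control channel (the same holds for the non-SSL case: pre-shared keys and username/password authentication are just weaker than SSL). So OpenVPN used the extra thread only for this negotiation phase. In the data transfer phase OpenVPN uses a single thread and implements its own packet schedule mechanism internally.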
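Note: do not assume OpenVPN is bad just because it does not implement multithreading (that was my own earlier misconception; other people either flame me or do not care about the matter at all). In fact I was won over. Single-threaded OpenVPN keeps that one thread's resource utilization so high that it is admirable. The key is its own packet schedule mechanism. In OpenVPN 2.0 even the separate thread for the control channel negotiation phase was removed. This is what JY had to say: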
The original rationale for having the TLS thread optimization was to improve
latency during the TLS key negotiation which is very CPU intensive. The 1.x
pthread implementation uses pthreads only for this very special case, which
does not improve overall efficiency on multiprocessor machines, but helps to
keep tunnel-forwarding latency down during the TLS negotiation.
I did some testing on 2.0 to determine the worst-case latency caused by the
TLS negotiation in single threaded mode. On a 2GHz x86, the worst-case
latency was about 160 milliseconds for a 2048 bit key and 40 milliseconds for
a 1024 bit key. Even with 100 users hitting a TLS renegotiate once per hour,
the probability that two or more of these 160 millisecond latency periods
would overlap to make a bigger latency is still quite small.
I think these latency numbers are too small to justify the extra level of
complexity entailed by multithreading. Not to mention whole classes of
potential bugs which arise when you attempt to multithread code, and
incompatibilities that exist between multithread implementations on different
OSes. Bottom line is that I don't think multithreading in OpenVPN is worth
the trouble.
Keep in mind that people use multithreading to:
(1) improve latency, or
(2) improve performance on multithreaded machines
OpenVPN 1.x only tried to hit (1).
With OpenVPN 2.0, my decision was basically that (1) didn't justify the
complexification that pthread support would entail and that (2) is satisfied
by different means.
So how do you improve performance on multithreaded machines, to take advantage
of all processors, i.e. if (1) is not worth the effort, then how to
accomplish (2)?
Answer: Run multiple server mode daemons on different ports, and have the
client load balance between them by using multiple "remote" entries in the
client side config. This is actually more efficient than multithreading
because each OpenVPN daemon gets its own private virtual memory address
space, so there is no bus contention from multiple processors over the same
address space, as would occur with a multi-threaded execution model.
Yes, let something outside the process do it!
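As a minimal sketch of JY's Answer, the script below prints config fragments for several server daemons on consecutive ports, plus the matching client-side `remote` entries. The host name `vpn.example.com`, the base port 1194, the `10.8.N.0` subnets and the helper names are my own assumptions for illustration. Giving each daemon its own address pool is also an assumption, made so that the independent processes never hand out the same tunnel IP. `remote-random` is the OpenVPN option that shuffles the `remote` list as a basic load-balancing measure. Certificates, keys and other required options are left out.

```python
def server_config(index, base_port=1194):
    """Config fragment for the index-th daemon (0-based, hypothetical layout)."""
    return "\n".join([
        f"port {base_port + index}",               # one port per daemon
        "proto udp",
        "dev tun",
        f"server 10.8.{index}.0 255.255.255.0",    # assumed: separate pool per daemon
    ])


def client_remote_lines(host, daemons, base_port=1194):
    """Client-side lines: one 'remote' entry per daemon, tried in random order."""
    lines = [f"remote {host} {base_port + i}" for i in range(daemons)]
    lines.append("remote-random")
    return lines


if __name__ == "__main__":
    for i in range(4):
        print(f"# server{i}.conf (fragment)")
        print(server_config(i))
        print()
    print("# client.conf (fragment)")
    print("\n".join(client_remote_lines("vpn.example.com", 4)))
```

Starting one daemon per CPU core is the obvious choice here, but that number is a tuning decision and not something the reply specifies.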
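I think JY's thinking was clear from the very beginning, which is why he separated the data channel from the control channel. This separation makes the single-process, single-thread processing even more compact, and it lets the optimization of the special SSL processing (forgive me, I follow SSL too; unfortunately I follow both, not only SSL but also data transfer) and the optimization of data transfer proceed separately.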
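The cost of doing the TLS negotiation in the same thread can be estimated roughly from the figures in JY's reply (100 users, one renegotiation per hour each, 160 ms worst case for a 2048 bit key). The sketch below is my own back-of-the-envelope model, not JY's calculation. It assumes every user's renegotiation starts at an independent, uniformly random moment within the hour, which real clients need not follow. Under that assumption the thread spends well under 1% of its time on renegotiation, and any given renegotiation collides with another one a bit less than 1% of the time.

```python
def busy_fraction(users, latency_s, period_s):
    """Fraction of wall-clock time the single thread spends on TLS renegotiation."""
    return users * latency_s / period_s


def overlap_probability(users, latency_s, period_s):
    """Chance that one given renegotiation overlaps at least one of the others.

    Assumption: start times are independent and uniform over the period.
    Two windows of length latency_s overlap when their start times differ
    by less than latency_s, so the other start must fall inside a window
    of width 2 * latency_s.
    """
    p_pair = min(1.0, 2 * latency_s / period_s)
    return 1 - (1 - p_pair) ** (users - 1)


if __name__ == "__main__":
    # Worst-case latencies quoted in the reply for a 2 GHz x86.
    for bits, latency in ((2048, 0.160), (1024, 0.040)):
        busy = busy_fraction(100, latency, 3600)
        overlap = overlap_probability(100, latency, 3600)
        print(f"{bits} bit key: busy {busy:.4%}, overlap per renegotiation {overlap:.4%}")
```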
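Stop suspecting OpenVPN of being inefficient. As a single-process, single-threaded program it is very compact, and within that one thread its packet schedule algorithm can fairly be called highly optimized. If you want to optimize it, pay attention to JY's Answer, and to my blog as well. I would not dare to call JY a brother, but the facts show that our thinking is the same.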
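My obsession is that I really did not want multiple OpenVPN instances listening on multiple ports, which is why I built a multithreaded version. Still, one look at my multithreaded version shows that I changed nothing in packet transfer. I only shared the multi_instance list and the IP address pool. Everything I did is peripheral work. I did not modify the OpenVPN source, because I know it is already compact enough, so I only wrapped it from the outside. I modified the protocol slightly, but that part is just a pile of garbage.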
A perfect example of complexity giving way to simplicity.