Problem description

On an M4000 host running Solaris 10, the Resin process (PID 15946 below) had abnormally high CPU usage, reaching about 70%:

-bash-3.00$ ps -ef -o pid,pcpu,args|grep java
 1511  0.1 /usr/java/bin/java -Dwebview.htdocs=/etc/opt/FJSVwvcnf/htdocs/FJSVwvbs -mx128m 
 2135  0.0 /usr/java/bin/java -server -Xmx128m -XX:+BackgroundCompilation -XX:PermSize=32m
15945  0.0 sh -c /svi/jdk150/jdk1.5.0_06/bin/java  -server -Xms512m -Xmx3072m -XX:MaxPe
15946 70.7 /svi/jdk150/jdk1.5.0_06/bin/java -server -Xms512m -Xmx3072m -XX:MaxPermSize=

Troubleshooting

1. First, find the LWPs (lightweight processes) that are using the most CPU

-bash-3.00$ prstat -L -p 15946
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID      
 15946 slview   3336M 3301M sleep   15    0   3:56:27 2.2% java/49
 15946 slview   3336M 3301M sleep    8    0   3:33:17 2.2% java/52
 15946 slview   3336M 3301M sleep   12    0   3:32:20 2.2% java/50
 15946 slview   3336M 3301M sleep   13    0   3:29:43 2.2% java/51
 15946 slview   3336M 3301M sleep   13    0   3:30:54 2.2% java/47
 15946 slview   3336M 3301M sleep   12    0   1:24:19 2.2% java/64
 15946 slview   3336M 3301M sleep   15    0   1:07:55 2.1% java/144
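To rank the LWPs of a longer prstat -L listing automatically, the listing can be parsed. The following is a minimal Java sketch, not part of the original procedure: the class HotLwps is hypothetical, and it assumes exactly the column layout shown above (TIME as h:mm:ss, CPU with a trailing %). Because most rows show the same CPU value, ties are broken by accumulated TIME.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper: ranks the LWPs listed by `prstat -L -p <pid>`.
 * Assumes the columns PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID.
 */
public class HotLwps {

    /** One data row of the prstat listing. */
    static final class Lwp {
        final int id;        // LWP id taken from PROCESS/LWPID
        final double cpu;    // CPU column without the trailing '%'
        final long seconds;  // TIME column converted to seconds

        Lwp(int id, double cpu, long seconds) {
            this.id = id;
            this.cpu = cpu;
            this.seconds = seconds;
        }
    }

    /** Converts a TIME value such as "3:56:27" (h:mm:ss) or "12:05" (mm:ss) to seconds. */
    static long toSeconds(String time) {
        long total = 0;
        for (String part : time.split(":")) {
            total = total * 60 + Long.parseLong(part);
        }
        return total;
    }

    /** Returns LWP ids ordered by CPU% (descending), ties broken by accumulated TIME. */
    static List<Integer> rankLwps(List<String> prstatLines) {
        List<Lwp> lwps = new ArrayList<>();
        for (String line : prstatLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 10) {
                continue; // blank or too-short line
            }
            try {
                int id = Integer.parseInt(f[9].substring(f[9].lastIndexOf('/') + 1));
                double cpu = Double.parseDouble(f[8].replace("%", ""));
                lwps.add(new Lwp(id, cpu, toSeconds(f[7])));
            } catch (NumberFormatException e) {
                // header row or summary line: fields are not numeric, skip it
            }
        }
        lwps.sort((a, b) -> a.cpu != b.cpu
                ? Double.compare(b.cpu, a.cpu)
                : Long.compare(b.seconds, a.seconds));
        List<Integer> ids = new ArrayList<>();
        for (Lwp l : lwps) {
            ids.add(l.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID",
            " 15946 slview   3336M 3301M sleep   15    0   3:56:27 2.2% java/49",
            " 15946 slview   3336M 3301M sleep    8    0   3:33:17 2.2% java/52",
            " 15946 slview   3336M 3301M sleep   12    0   3:32:20 2.2% java/50",
            " 15946 slview   3336M 3301M sleep   13    0   3:29:43 2.2% java/51",
            " 15946 slview   3336M 3301M sleep   13    0   3:30:54 2.2% java/47",
            " 15946 slview   3336M 3301M sleep   12    0   1:24:19 2.2% java/64",
            " 15946 slview   3336M 3301M sleep   15    0   1:07:55 2.1% java/144");
        System.out.println(rankLwps(sample)); // prints [49, 52, 50, 47, 51, 64, 144]
    }
}
```

For the seven rows above, rankLwps returns [49, 52, 50, 47, 51, 64, 144]; in practice the lines would come from saved prstat -L output.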
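
No single LWP in the prstat output above stands out by current CPU usage (several show 2.2%), but LWPs 47 and 49 to 52 have each accumulated roughly 3.5 to 4 hours of CPU time, well ahead of LWPs 64 and 144, so they are the first candidates to trace.

2. Map the LWPs to Java threads

pstack prints a header line for every thread in the process; filtering with grep lwp keeps those headers (plus any frame whose function name contains "lwp"):
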
-bash-3.00$ pstack 15946|grep lwp
-----------------  lwp# 47 / thread# 47  --------------------
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 48 / thread# 48  --------------------
 ff2c5cd0 lwp_cond_wait (1704928, 1704910, 0, 0)
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 49 / thread# 49  --------------------
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 50 / thread# 50  --------------------
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 51 / thread# 51  --------------------
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 52 / thread# 52  --------------------
 ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)
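Each pstack header pairs an LWP id with a thread id. When more than a handful of LWPs need to be looked up, those pairs can be collected into a table. A minimal Java sketch follows; the class LwpThreadMap is hypothetical, and the regular expression assumes the header format shown above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: builds an LWP -> thread id map from the header lines that
 * `pstack <pid>` prints for each thread, e.g. "-----  lwp# 47 / thread# 47  -----".
 */
public class LwpThreadMap {

    private static final Pattern HEADER =
            Pattern.compile("lwp#\\s*(\\d+)\\s*/\\s*thread#\\s*(\\d+)");

    /** Maps each LWP id to its thread id, in the order the headers appear. */
    static Map<Integer, Integer> parse(List<String> pstackLines) {
        Map<Integer, Integer> lwpToThread = new LinkedHashMap<>();
        for (String line : pstackLines) {
            Matcher m = HEADER.matcher(line);
            // Frame lines such as "_lwp_start (...)" contain no "lwp#" and are skipped.
            if (m.find()) {
                lwpToThread.put(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
            }
        }
        return lwpToThread;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "-----------------  lwp# 47 / thread# 47  --------------------",
            " ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)",
            "-----------------  lwp# 48 / thread# 48  --------------------",
            " ff2c5cd0 lwp_cond_wait (1704928, 1704910, 0, 0)",
            " ff2c49fc _lwp_start (0, 0, 0, 0, 0, 0)");
        System.out.println(parse(sample)); // prints {47=47, 48=48}
    }
}
```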
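
In the pstack output above, every lwp# has the same thread# (for example lwp# 50 / thread# 50), so the hot LWPs can be looked up in the Java thread dump directly by that number, as t@47, t@49, t@50, t@51 and t@52.
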
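3. Use jstack <pid> to get the call stacks of the threads

Dump the call stacks of all threads in the process (-m prints mixed-mode stacks, with both Java and native frames):

  $ jstack -m 15946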

Thread t@50: (state = IN_VM)
 - java.lang.AbstractStringBuilder.expandCapacity(int) @bci=28, line=99 (Compiled frame; information may be imprecise)
 - per.xwnmp.flux.report.RptFluxHisQuery.GetFluxData(java.lang.String[], java.util.HashMap, java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String) @bci=480, line=509 (Interpreted frame)
 - per.xwnmp.flux.report.RptFluxHisQuery.GenFluxReport(java.lang.String, java.lang.String[], java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String) @bci=124, line=82 (Interpreted frame)
 - _nos._flux._flux._FluxPerfView_0Excel__jsp._jspService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=930, line=162 (Interpreted frame)
 - com.caucho.jsp.JavaPage.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=9, line=75 (Interpreted frame)
 - com.caucho.jsp.Page.subservice(com.caucho.server.http.CauchoRequest, com.caucho.server.http.CauchoResponse) @bci=214, line=506 (Interpreted frame)
 - com.caucho.server.TcpConnection.run() @bci=73, line=139 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)


Thread t@52: (state = IN_VM)
 - per.xwnmp.flux.report.RptFluxHisQuery.GetFluxData(java.lang.String[], java.util.HashMap, java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String) @bci=435, line=508 (Compiled frame; information may be imprecise)
 - per.xwnmp.flux.report.RptFluxHisQuery.GenFluxReport(java.lang.String, java.lang.String[], java.lang.String, java.lang.String, java.lang.String, java.lang.String, java.lang.String) @bci=124, line=82 (Interpreted frame)
 - _nos._flux._flux._FluxPerfView_0Excel__jsp._jspService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=930, line=162 (Interpreted frame)
 - com.caucho.jsp.JavaPage.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=9, line=75 (Interpreted frame)
 - com.caucho.jsp.Page.subservice(com.caucho.server.http.CauchoRequest, com.caucho.server.http.CauchoResponse) @bci=214, line=506 (Interpreted frame)
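With many threads in the process, the dump is easier to read once it is cut down to the hot threads. Below is a minimal Java sketch that assumes the "Thread t@<id>:" header format shown above and that the t@ number matches the thread# reported by pstack (as it does for t@50 and t@52 here); the class HotStacks is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: keeps only the stacks of selected threads from a thread dump
 * in which each stack starts with a line "Thread t@<id>: (state = ...)".
 */
public class HotStacks {

    private static final Pattern HEADER = Pattern.compile("^Thread t@(\\d+):");

    /** Returns thread id -> frame lines for every thread whose id is in wanted. */
    static Map<Integer, List<String>> select(List<String> dumpLines, Set<Integer> wanted) {
        Map<Integer, List<String>> stacks = new LinkedHashMap<>();
        List<String> current = null; // frames of the thread being kept; null while skipping
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                int id = Integer.parseInt(m.group(1));
                current = wanted.contains(id) ? new ArrayList<>() : null;
                if (current != null) {
                    stacks.put(id, current);
                }
            } else if (current != null && !line.trim().isEmpty()) {
                current.add(line.trim()); // a frame of a wanted thread
            }
        }
        return stacks;
    }

    public static void main(String[] args) {
        // Hot LWP ids from prstat, used directly as t@ numbers (assumption, see lead-in).
        Set<Integer> hot = new HashSet<>(Arrays.asList(47, 49, 50, 51, 52));
        List<String> dump = Arrays.asList(
            "Thread t@50: (state = IN_VM)",
            " - java.lang.AbstractStringBuilder.expandCapacity(int) @bci=28, line=99 (Compiled frame; information may be imprecise)",
            " - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)",
            "",
            "Thread t@52: (state = IN_VM)",
            " - com.caucho.jsp.JavaPage.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=9, line=75 (Interpreted frame)");
        for (Map.Entry<Integer, List<String>> e : select(dump, hot).entrySet()) {
            List<String> frames = e.getValue();
            System.out.println("t@" + e.getKey() + " top frame: "
                    + (frames.isEmpty() ? "(no frames)" : frames.get(0)));
        }
    }
}
```

In practice the dump lines would come from jstack output saved to a file and read with java.nio.file.Files.readAllLines.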