JVM 垃圾回收机制及性能调优案列

现在的位置: 首页 > 综合 > 正文

JVM 垃圾回收机制及性能调优案列

2017年12月06日 ⁄ 综合 ⁄ 共 17277字 ⁄ 字号小中大 ⁄ 评论关闭

文章目录

Snapshots from a "verbose:gc" Log File Generated by Using the Following Switches
Old Generation GC

堆大小搭配要注意
JVM堆内存实际分成好几种，可以详细配置
吞吐量暂停（垃圾回收的时候经常带来程序暂停）选择
并行垃圾回收能大大降低暂停时间，稍微提高一点CPU使用

先看看堆内存种类：
垃圾回收主要是对Young Generation块和Old Generation块内存进行回收，YG用来放新产生的对象，经过几次回收还没回收掉的对象往OG中移动，
对YG进行垃圾回收又叫做MinorGC，对OG垃圾回收叫MajorGC，两块内存回收互不干涉

看案例：
jmap -heap 2343
Attaching to process ID 2343, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 11.0-b16

using thread-local object allocation.
Parallel GC with 8 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 4294967296 (4096.0MB)
   NewSize          = 2686976 (2.5625MB)
   MaxNewSize       = -65536 (-0.0625MB)
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2                   （YG，OG 大小比为1：2）
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 268435456 (256.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 1260060672 (1201.6875MB)
   used     = 64868288 (61.86322021484375MB)
   free     = 1195192384 (1139.8242797851562MB)
   5.148028935546367% used
From Space:
   capacity = 85524480 (81.5625MB)
   used     = 59457648 (56.70323181152344MB)
   free     = 26066832 (24.859268188476562MB)
   69.52120375359195% used
To Space:
   capacity = 85852160 (81.875MB)
   used     = 0 (0.0MB)
   free     = 85852160 (81.875MB)
   0.0% used
~~~~~~~~~~~~~~~~~~~~~~~~~~这三块为上面所说的YG大小和使用情况
PS Old Generation
   capacity = 2291138560 (2185.0MB)
   used     = 1747845928 (1666.8757705688477MB)
   free     = 543292632 (518.1242294311523MB)
   76.28722062099989% used
~~~~~~~~~~~~~~~~~~~~~~~~~~OG大小和使用情况
PS Perm Generation
   capacity = 108265472 (103.25MB)
   used     = 107650712 (102.6637191772461MB)
   free     = 614760 (0.5862808227539062MB)
   99.43217353728436% used

这台机器简单说YG内存1G，OG内存2G，总内存4G
在这样的配置下，GC运行情况：
jstat -gcutil -h5 2343 4s 100
S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
79.82   0.00 75.34 78.55 99.44   7646 1221.668   398 2052.993 3274.661
0.00 79.52   0.62 78.63 99.44   7647 1221.782   398 2052.993 3274.775 这里发生了一次YG GC，也就是MinorGC，耗时0.12s
0.00 79.52 28.95 78.63 99.44   7647 1221.782   398 2052.993 3274.775
0.00 79.52 46.34 78.63 99.44   7647 1221.782   398 2052.993 3274.775

同时可以看到总共进行了398次Major GC 总耗时2052.993 所以每次Major GC时间为：2052.993/398=5.16秒
这是个很严重的问题，进行Major GC的时候程序会暂停，无法响应，居然会暂停5秒多，这谁都无法接受吧
同样Minor GC进行了7647次，总用时1221.782 平均时间为0.16秒，算是可以接受

再来看看修改配置后：
jmap -heap 14103
Attaching to process ID 14103, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 11.0-b16

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 4294967296 (4096.0MB)
   NewSize          = 536870912 (512.0MB)
   MaxNewSize       = 536870912 (512.0MB)
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 4             YG：OG          1：4
   SurvivorRatio    = 8
   PermSize         = 268435456 (256.0MB)
   MaxPermSize      = 268435456 (256.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 483196928 (460.8125MB)
   used     = 428284392 (408.4438247680664MB)
   free     = 54912536 (52.368675231933594MB)
   88.63557841162434% used
Eden Space:
   capacity = 429522944 (409.625MB)
   used     = 404788608 (386.0364990234375MB)
   free     = 24734336 (23.5885009765625MB)
   94.24144010337199% used
From Space:
   capacity = 53673984 (51.1875MB)
   used     = 23495784 (22.407325744628906MB)
   free     = 30178200 (28.780174255371094MB)
   43.77499534970238% used
To Space:
   capacity = 53673984 (51.1875MB)
   used     = 0 (0.0MB)
   free     = 53673984 (51.1875MB)
   0.0% used
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~YG 大小和使用状态
concurrent mark-sweep generation:
   capacity = 3758096384 (3584.0MB)
   used     = 1680041600 (1602.2125244140625MB)
   free     = 2078054784 (1981.7874755859375MB)
   44.70459052494594% used
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~OG 大小和使用状态
Perm Generation:
   capacity = 268435456 (256.0MB)
   used     = 128012184 (122.0819320678711MB)
   free     = 140423272 (133.9180679321289MB)
   47.688254714012146% used

在这个配置下，GC运行情况：
jstat -gcutil -h5 14103 4s 100
S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
47.49   0.00 64.82 46.08 47.69 20822 2058.631    68   22.734 2081.365
0.00 37.91 38.57 46.13 47.69 20823 2058.691    68   22.734 2081.425 这里发生了一次YG GC，也就是MinorGC，耗时0.06s
46.69   0.00 15.19 46.18 47.69 20824 2058.776    68   22.734 2081.510
46.69   0.00 74.59 46.18 47.69 20824 2058.776    68   22.734 2081.510
0.00 40.29 19.95 46.24 47.69 20825 2058.848    68   22.734 2081.582

MajorGC平均时间：22.734/68=0.334秒（上面是5秒多吧）
MinorGC平均时间：2058.691/20823=0.099秒（比上面略少）

===============================================================

These two new collectors introduced in J2SE1.4.1 are:

parallel collector
The parallel collector is implemented in the young generation. It is multi-threaded and stop-the-world. This collector enables garbage collection to occur on multiple threads for better performance on multiprocessor machines. Even though it suspends all "mutators" (application threads), it is able to complete the given amount of garbage collection work much more quickly by leveraging all available CPUs on the system. This reduces the GC pauses in the young generation significantly. The parallel collector thus enables the applications to scale to larger number of CPUs as well as larger memory.
concurrent mark-sweep (cms) collector
The concurrent mark-sweep collector (CMS) is implemented in the old generation. The CMS collector executes "mostly concurrently" with the application, hence, is sometimes referred to as the "mostly-concurrent garbage collector". It trades the utilization of processing power that would otherwise be available to the application for shorter garbage collection pause times. The CMS collection is split into four phases:
- Initial mark
- Concurrent marking
- Remark
- Concurrent sweeping
The "initial mark" and and "remark" phases are stop-the-world phases, in which the CMS collector has to "suspend" the mutators. During initial mark, it records all objects directly reachable from the "roots" of the system. During the "concurrent marking" phase, mutators are resumed and a concurrent marking phase is initiated. During the "remark" phase, the mutators are once again suspended to complete the final marking, and finally, during the "concurrent sweeping" phase, mutators are resumed and all unmarked objects are deallocated, while concurrently sweeping over the heap. The initial mark and remark pauses are quite minimal. For an old generation size of 1G, they may be ~ 200 ms or less. The concurrent sweeping phase, when the garbage is collected, may still take as much time as the mark-compact collector; however, this pause is hidden as the mutators are not suspended. The "mostly concurrent" nature of the CMS collector enables the JVM to scale to larger heaps and CPUs, thus addressing both the latency and throughput issues arising from the default mark-compact stop-the-world collector.

Table 1 highlights the features of the different collectors available in J2SE1.4.1.

Young Generation Collectors	Old Generation Collectors
Copying collector	Mark-compact collector
Default Stop-the-world Single threaded All J2SEs	Default Stop-the-world Single threaded All J2SEs
Parallel collector	Concurrent mark-sweep collector
Stop-the-world Multi-threaded J2SE1.4.1+	Mostly-concurrent Single threaded J2SE1.4.1+

Table 1. Summary of the features of different garbage collectors

Figures 1, 2 and 3 are graphical illustrations of the different garbage collectors. The green arrows represent a multi-threaded application running on a multi-CPU box. The red arrows represent the GC threads. The length of the GC thread roughly represents the length of the GC pause.

Figure 1. Young generation collectors.

Figure 2. Old generation collectors. Garbage collection with the Parallel Collector and Concurrent mark-sweep collector enabled together. Best Case scenario for high throughput and low latency.

Figure 3. Garbage Collection with the two new collectors enabled together.

As illustrated by the diagrams above, use of the parallel collector in the young generation and the concurrent mark-sweep collector in the old generation together can help reduce pause times and GC sequential overhead. These two collectors thus help the applications to scale to larger number of processors, and to larger memory.

GC配置详细解释，和GCLog日志解析：

Snapshots from a "verbose:gc" Log File Generated by Using the Following Switches

java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx512m -Xms512m -XX:NewSize=24m -XX:MaxNewSize=24m -XX:SurvivorRatio=2 <app-name>

Young Generation GC

311.649: [GC 311.65: [ParNew Desired survivor size 4194304 bytes, new threshold 3 (max 31)
- age 1: 1848472 bytes, 1848472 total
- age 2: 1796200 bytes, 3644672 total
- age 3: 1795664 bytes, 5440336 total
: 21647K->5312K(24576K), 0.1333032 secs] 377334K->362736K(516096K), 0.1334940 secs]

The GC snapshot above describes a young generation GC and gives the following information:

Total JVM heap at this GC was 516096K and total heap size of the young generation was 24576K.

The old generation heap at the time of this GC was 516096K - 24576K = 491520K.

Describes the "aging distribution" of the objects allocated by the application in the young generation at the end of this GC cycle. This GC cycle set the new threshold=3 meaning that objects will age up to 3 before the next GC cycle takes place.

The Desired survivor size of 4194304 bytes was determined by SurvivorRatio=2, which sets the ratio of Eden/Survivor space=2. With a young generation heap set to 24MB, this would mean 24MB = 2 * Survivor space + Survivor space + Survivor space, or Survivor space=8MB. With a default value of TargetSurvivorRatio=50, desired survivor size was set to 8 * .5 = 4MB

Time stamp of the GC relative to the start of the application run was 311.649 seconds

Total GC pause for this collection in the young generation was 0.1334940 seconds.

The size of the used young generation heap before this GC was 21647K

The size of the used young generation heap after this GC was 5312K

377334K - 362736K = 14598K of data was garbage collected in the young generation.

21647K - 5312K - 14598K = 1737K was promoted into the old generation.

The total size of the used heap was 377334K before this GC took place.

The total size of the used heap was 362736K after this GC took place.

The size of used heap in old generation before this GC was 377334K - 21647K = 355687K

The size of used heap in old generation after this GC was 362736K - 5312K = 357424K

The total size of objects promoted into the old generation during this GC was 357424K - 355687K = 1737K

The legend ParNew indicates that the parallel collector was used in the young generation. The use of default copying collector is indicated by a legend DefNew, instead of ParNew.

The convention used in the verbose:gc log format is that if there's something being reported that is specific to a generation, it's preceded by the name of that generation.

[GC [gen1 info1] [gen2 info2] info]
info is general to GC and includes information for the young and old generation combined, info<n> is specific to gen<n>. So, attention needs to be paid to the nesting and matching of square brackets.

Old Generation GC

CMS Collector

The snapshots below indicate the use of the concurrent mark-sweep collector in the old generation. As mentioned earlier in the paper, this phase is split into four phases.

513.474: [GC [1 CMS-initial-mark: 335432K(491520K)] 340897K(516096K), 0.0482491secs]

This is the initial mark phase which is stop-the-world and took 0.048 seconds.
The time relative to the start of the application was 513.474 seconds.
335432K was the size of the old generation used heap.
491520K was the size of the total used heap including the young generation used heap.
340897K was the total size of the old generation heap.
516096K was the total size of the heap including the young generation.
This phase does not recycle any memory.
The prefix legend "[GC" represents the stop-the-world phase.

513.523: [CMS-concurrent-mark-start]
514.337: [CMS-concurrent-mark: 0.814/0.814 secs]
514.337: [CMS-concurrent-preclean-start]
514.36: [CMS-concurrent-preclean: 0.023/0.023 secs]

This is the concurrent mark phase which took less than a second, (0.814 + 0.023) seconds, however, the application runs concurrently with the GC during this time. Again, this phase does not collect any garbage.

514.361: [GC 514.361: [dirty card accumulation, 0.0072366 secs]
514.368: [dirty card rescan, 0.0037990 secs]
514.372: [remark from roots, 0.1471209 secs]
514.519: [weak refs processing, 0.0043200 secs] [1 CMS-remark: 335432K(491520K)] 352841K(516096K), 0.1629795 secs]

This is the stop-the-world remark phase which took about 0.162 secs.
335432K was the size of used heap in the old generation.
491520K was the total heap size of the old generation.
352841K was the total size of the used heap including the young generation.
516096K was the total size of the heap including the young generation.
No memory is recycled during this phase.
The legend "[GC", as mentioned earlier, represents the stop-the-world phase.

514.525: [CMS-concurrent-sweep-start]
517.692: [CMS-concurrent-sweep: 2.905/3.167 secs]
517.693: [CMS-concurrent-reset-start]
517.766: [CMS-concurrent-reset: 0.073/0.073 secs]

This is the concurrent sweep phase which took about 3 seconds, however, application threads can run concurrently with the GC thread during this phase on a multi-processor system. Among the four CMS phases, this is the only phase when the heap is swept and collected.

Default Mark-Compact Collector

If instead of the CMS collector, the default mark-compact collector is used in the old generation, the old GC snapshot would look as:

719.2: [GC 719.2: [DefNew: 20607K->20607K(24576K), 0.0000341 secs]719.2: [Tenured: 471847K->92010K(491520K), 2.6654172 secs] 492454K->92010K(516096K), 2.6658030 secs]

719.2 seconds was the time relative to the start of the application at which this GC took place.
The legend DefNew indicates the use of default copying collector in the young generation.
The legend "[GC" represents a stop-the-world GC triggered by the JVM. For an old generation GC, requested by the application through the system call, System.gc(), the above snapshot would be prefixed by the legend "Full GC"
The total heap size of the young generation was 24576K.
A young generation collection (only) was initially attempted. It was found that that collection couldn't be done because the old generation could not guarantee that it would absorb all potentially live data. As a result, only the young collection did not happen, and the young generation reported that no memory was reclaimed (notice that it all finished rather quickly, in a fraction of a millisecond). That's because nothing much happened beyond determining that the young generation collection couldn't happen:
[DefNew: 20607K->20607K(24576K), 0.0000341 secs]

It was then decided that a "tenured" generation collection needed to happen, (obviously), because that generation was too full to absorb promotions from the young generation, so a full mark-compact collection was done.

The legend Tenured indicates a full mark-compact GC in the old generation. The "old generation" is sometimes referred to as "Tenured generation"
471847K was the size of used heap in the old generation before GC.
92010K was the size of used heap in the old generation after the GC.
491520K was the total heap size of the old generation at the time of this GC.
492454K was the total used heap, for young and old generations combined, before GC.
92010K was the total used heap, for young and old generations combined, after GC.
Total garbage collected during this GC was 492454K - 92010K = 399837K
516096K was the size of the total heap for the JVM.
2.6658030 seconds was the total time for which the application was suspended as a result of this GC.

Update 2009-12-10
JVM GC注意点：
CMS Old Generation是没有压缩的回收方式，所以可能导致内存碎片；内存不足/或内存快爆满时会导致Stop All World 解决：-XX:+UseCMSCompactAtFullCollection -XX:CMSInitiatingOccupancyFraction=80（设置每次回收时，压缩内存碎片，并且在年老代达到80%使用率时马上进行回收）；
    在系统空闲时可能会有较长时间的Full GC
    记得：-XX:+DisableExplicitGC? 关闭System.gc()，否则会导致Full GC
    CMS回收器已经比较理想，能够将1.5G的堆FullGC时间降到平均每次50ms左右
    如果Old Generation对象不多（比如占用空间<200M）使用CMS反而效率不高

JVM GC 相关总结
问题，总内存大小6G，Old分配2G结果FullGC时间太长，大约2S，YoungGC 0.1S

解决：1 缩小Old块大小如Old 1.5G Young 4.5G 效果FullGC变成了0.5S(但是在访问高峰期效果很不好，每次只能回收不到10%的空间，所以需要频繁FGC)，YoungGC时间不变0.1S
2 增大Old块大小如Old 3 G Young 3 G 效果FullGC变成了1.3S，YoungGC时间不变0.1S

原因：每次YoungGC时存活的不到200M，也就是时间都用在复制这200M的对象身上，所以YoungGC总大小对暂停时间没有影响，只和每次YGC时，存活下来的对象的大小有关，存活的越少越快
FullGC时，如果Old越大需要时间越长，中间需要整理；太小的话GC效率太低，每次回收回来的空间有限，需要频繁FGC。

Update 2009-12-14
发现FullGC的时候Rescan使用了大量时间：
3036.816: [GC[YG occupancy: 550898 K (2949120 K)]3036.816: [Rescan (parallel) , 0.1198670 secs]3036.936: [weak refs processing, 0.0168150 secs]3036.953: [class unloading, 0.0566650 secs]3037.010: [scrub symbol & string tables, 0.0117000 secs] [1 CMS-remark: 2211101K(3145728K)] 2761999K(6094848K), 0.2663660 secs] [Times: user=0.80 sys=0.04, real=0.26 secs]

4467.799: [GC[YG occupancy: 1686813 K (2949120 K)]4467.799: [Rescan (parallel) , 1.2868060 secs]4469.086: [weak refs processing, 0.0168640 secs]4469.103: [class unloading, 0.0521170 secs]4469.155: [scrub symbol & string tables, 0.0118140 secs] [1 CMS-remark: 2205795K(3145728K)] 3892609K(6094848K), 1.4293820 secs] [Times: user=1.60 sys=0.57, real=1.43 secs]

5862.375: [GC[YG occupancy: 1673859 K (2949120 K)]5862.375: [Rescan (parallel) , 1.2085060 secs]5863.584: [weak refs processing, 0.0157410 secs]5863.600: [class unloading, 0.0535530 secs]5863.653: [scrub symbol & string tables, 0.0116760 secs] [1 CMS-remark: 2204312K(3145728K)] 3878171K(6094848K), 1.3540610 secs] [Times: user=1.54 sys=0.42, real=1.36 secs]

主要原因是这个时候Eden块使用比较多，550M的时候只需要0.12秒，1.6G就需要1.2秒了
解决：增加参数：-XX:+CMSScavengeBeforeRemark 指定在Remark（也就是Rescan）前强行进行一次MinorGC
这样FullGC的暂停时间由1.5M降到0.35秒

还尝试过使用参数：CollectGen0First 在FullGC前强行进行一次MinorGC，但是效果不好，主要是因为在我们的系统中从这个过程到Remark大约需要5-10秒，这段时间Eden空间开支需要差不多1.5G，所以基本没什么效果
Remark开始：
1520.719: [GC[YG occupancy: 2485901 K (2949120 K)]1520.720: [GC 1520.720: [ParNew
Desired survivor size 100663296 bytes, new threshold 15 (max 15)
- age   1:   20274872 bytes,   20274872 total
- age   2:   15799848 bytes,   36074720 total
- age   3:   11308728 bytes,   47383448 total
- age   4:    7179272 bytes,   54562720 total
- age   5:    7728992 bytes,   62291712 total
- age   6:    6570360 bytes,   68862072 total
- age   7:    9366104 bytes,   78228176 total
- age   8:    6087376 bytes,   84315552 total
- age   9:    8942088 bytes,   93257640 total
- age 10:    6186592 bytes,   99444232 total
: 2485901K->152025K(2949120K), 0.1270290 secs] 4058777K->1735681K(6094848K), 0.1273970 secs强行插入的MinorGC] [Times: user=0.44 sys=0.02, real=0.13 secs]
1520.847: [Rescan (parallel) , 0.0388200 secs时间明显减少]1520.886: [weak refs processing, 0.0148430 secs]1520.901: [class unloading, 0.0561410 secs]1520.957: [scrub symbol & string tables, 0.0133140 secs] [1 CMS-remark: 1583655K(3145728K)] 1735681K(6094848K), 0.3050640 secs] [Times: user=0.80 sys=0.03, real=0.30 secs]

/usr/local/resin/bin/httpd.sh start -Xms1024m -Xmx4048m -Xss1000k -Djava.awt.headless=true -XX:PermSize=64M -XX:MaxPermSize=256M -XX:ErrorFile=/data/mqq_data/log/jvm_error.log -XX:NewSize=256m -XX:MaxNewSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime