GC Tuning Strategies for Java Performance

This document discusses garbage collection and memory profiling in Java. It provides an overview of common garbage collection tools and options in the Java Virtual Machine. The key topics covered include generational garbage collection, object lifecycles, metrics for analyzing garbage collection performance, and tips for tuning heap sizes and other JVM options to optimize garbage collection.

Garbage Collection &

Memory Profiling

Jeff Taylor
Sun Microsystems

1
Agenda

• Case for GC and Tuning
• Object Lifecycle
• Generational Collection
• Garbage Collectors
• JVM GC Observability Tools

2
Tools
• Verbose GC
• PrintGCStats
• jps, jstack, jinfo
• jconsole
• jstat
• VisualVM
• GCHisto
• ps
• mdb
• pmap
• sar
• JVM Ergonomics
• VisualGC
• hprof

3
Sun JVM Options
• Standard options
> All platforms
• -X options
> Not all platforms
• -XX options
> Not all platforms
> May need additional privileges to use

4
A Multitude of Options
JRE Version    Number of Options    Options Added    Options Removed
1.4            159                  ---              ---
1.4.1          224                  70               5
1.4.2          260                  44               8
5              343                  98               10
6              427                  102              35

5
On the Shoulders of Giants
• Material in the presentation is
based on previous work,
including:
• JavaOne Online Technical
Sessions
> TS-4887
> TS-6500

6
Objects Need Storage Space
• Age-old problems
> How to allocate space efficiently
> How to reclaim unused space (garbage) 
efficiently and reliably
• C (malloc and free)
• C++ (new and delete)
• Java (new and Garbage Collection)
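To make the contrast concrete, here is a minimal illustrative Java sketch (not from the original slides): the buffer is allocated with new, and there is no matching free or delete; the collector reclaims it once it is unreachable.

// Illustrative sketch only: Java allocation has no matching free/delete.
class AllocationSketch {
    public static void main(String[] args) {
        byte[] buffer = new byte[1024 * 1024]; // allocated (typically in eden)
        buffer = null;                         // unreachable: now eligible for collection
        // A later (minor) GC reclaims the space; the program never frees it explicitly.
    }
}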

7
Why is GC Necessary?
• Alternative to manual deallocation
• Hard to debug errors in storage deallocation
• Deallocation management may lead to a tight 
binding between supposedly independent 
modules
• Manual memory management breaks 
encapsulation
• Message passing leads to dynamic execution 
paths
• Garbage collection works

8
Why is GC Tuning Necessary?
• GC is not “one size fits all”
• Best GC characteristics depend on 
requirements of the application and 
deployment
> Some applications need best throughput
> Other applications need low pause time
> Variations in deployment size, deployment 
hardware, and shared resources

9
GC Tuning is an Art
• Unfortunately, we can't give you a flawless 
recipe or a flowchart that will apply to all 
your GC tuning scenarios
• GC tuning involves a lot of common 
pattern recognition
• This pattern recognition requires 
experience

10
Object Realities
• Generational hypothesis
• Most objects are very short lived
> 80-98% of all newly allocated objects
die within a few million instructions
> 80-98% of all newly allocated objects
die before another megabyte has been
allocated
• This heavily influences the choice of
GC algorithms

11
Why Generational?
• Most Java applications
> Conform to the weak generational hypothesis
> Really benefit from generational GC
> Performance­wise, generational GC is hard to 
beat in most cases
• All GCs in the HotSpot JVM are 
generational

12
Generational Garbage Collectors
• Driven by the weak generational hypothesis
• Split the heap into “generations”
> Usually two: young generation / old generation
• Concentrate collection effort on the young generation
> Good payoff (a lot of space reclaimed) for your collection 
effort
> Lower GC overhead
> Most pauses are short
• Reduced allocation rate into the old generation
> Young generation acts as a “filter”

13
GC Vocabulary: Object Lifecycle
[Diagram: new() allocates into Eden (Young); incremental (minor) GCs copy live objects between Survivor 1 and Survivor 2; repeatedly surviving objects are tenured into the Tenured (Old) generation, which is reclaimed by Full GCs; objects may be released (collected) at any stage.]
14
Metrics for Collection
• Heap population (aka live set)
> How much of your heap is alive
• Allocation rate
> How fast you allocate
• Mutation rate
> How fast your program updates references in memory
• Heap shape
> The shape of the live object graph
> * Hard to quantify as a metric...
• Object lifetime
> How long objects live
• Cycle time
> How long it takes the collector to free up memory
• Marking time
> How long it takes the collector to find all live objects
• Sweep time
> How long it takes to locate dead objects
> * Relevant for Mark-Sweep
• Compaction time
> How long it takes to free up memory by relocating objects
> * Relevant for Mark-Compact
15
Jconsole
[Slides 16–20: JConsole screenshots; images not reproduced in this text version]
Which JVM options should be used
for large scale applications?
• Answer: It depends
> Hardware
> Application
> Usage patterns 
• One of the fundamental questions that 
needs to be answered by every 
administrator is: “Is memory being used 
efficiently?”
• Solaris has a significant advantage for 32­
bit Java
21
Heap Sizing Trade-Offs

• Generally, the larger the heap space, the better
> For both young and old generation
> Larger space: less frequent GCs, lower GC 
overhead, objects more likely to become garbage
> Smaller space: faster GCs (not always! see later)
• Sometimes max heap size is dictated by available 
memory and/or max space the JVM can address
> You have to find a good balance between young 
and old generation size

22
Sizing Heap Spaces
• -Xmx<size> : max heap size
> young generation + old generation
• -Xms<size> : initial heap size
> young generation + old generation
• -Xmn<size> : young generation size
• Applications with emphasis on 
performance tend to set -Xms and -Xmx to 
the same value
• When -Xms != -Xmx, heap growth or 
shrinking requires a Full GC
23
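As a concrete (hypothetical) example of combining these flags, with sizes that are placeholders rather than recommendations:

java -Xms2g -Xmx2g -Xmn512m MyApp

Here the initial and maximum heap are set equal (2 GB) to avoid grow/shrink Full GCs, and -Xmn fixes the young generation at 512 MB of that heap.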
Should -Xms == -Xmx?
• Set -Xms to what you think would be your 
desired heap size
> It's expensive to grow the heap
• If memory allows, set -Xmx to something 
larger than -Xms “just in case”
> Maybe the application is hit with more load
> Maybe the DB gets larger over time
• In most cases, it's better to do a Full 
GC and grow the heap than to get an OOM 
and crash
24
Sizing Heap Spaces (ii)

• -XX:PermSize=<size> : permanent generation initial 
size
• -XX:MaxPermSize=<size> : permanent generation 
max size
• Applications with emphasis on performance almost 
always set -XX:PermSize and -XX:MaxPermSize to the 
same value
> Growing or shrinking the permanent generation 
requires a Full GC too
• Unfortunately, the permanent generation occupancy 
is hard to predict
25
Priority 1: Eden Heap
• When a well-written Java program performs poorly, 
there are 3 typical causes. 
> The Java heap size is too small, causing an 
excessive amount of garbage collection.
– CPU test (SPARC vs. Intel/AMD vs. CMT)
> A Java heap size that is so big that portions are 
paged to virtual memory.
– Out of RAM 
> GC pauses can be too long with large 64-bit Java heaps
– Long pauses 
• Eden heap sizing is critical

26
Young Generation Sizing
• Eden size determines
> The frequency of minor GCs
> Which objects will be reclaimed at age 0
• Increasing the size of the Eden will not 
always affect minor GC times
> Remember: minor GC times are proportional to 
the amount of objects they copy (i.e., the live 
objects), not the young generation size

27
Sizing Heap Spaces
• -XX:NewSize=<size> : initial young 
generation size
• -XX:MaxNewSize=<size> : max young 
generation size
• -XX:NewRatio=<ratio> : young 
generation to old generation ratio
• Applications with emphasis on 
performance tend to use -Xmn to size the 
young generation since it combines the use 
of -XX:NewSize and -XX:MaxNewSize
28
Priority 2: Other critical parameters
• Survivor Ratio
• Collector Algorithm
> Concurrent Mark Sweep
> ParNewGC
> ParallelGC
• Ergonomics
> Auto tuning in Java 1.5+
> Good for sizing a single JRE, but needs burn-in.
> Use the final values.

29
A Reasonable Goal
• Try to keep garbage collection at 5% or 
less of the JVM’s CPU time. 
• If you can’t accomplish this by adjusting 
the JVM parameters, consider purchasing 
additional RAM. 
• The information presented here is 
intended to help you measure how close 
you are to this goal.

30
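As a rough worked example (the numbers are illustrative, not measured): if jstat reports a cumulative GCT of 120 s after 4,000 s of JVM uptime, GC is using about 120 / 4000 = 3% of elapsed time, which meets the goal; 300 s of GC over the same period (7.5%) would not.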
Technique 1: Minimize Full GC's
• Old generation collections use more resources
> Long pauses
> Uses more CPU cycles 
• 1. Increase the size of the Old Generation space
> Full GC's occur when the Old Generation is 
nearly full (while respecting the “new 
generation guarantee”)
• 2. Minimize the rate at which objects are tenured
> My goal as a tuner is to stop objects from being 
unnecessarily tenured.

31
Object Lifecycle
[Diagram: the object lifecycle from slide 14, annotated for Technique 1 — Full GCs of the Tenured (Old) generation are expensive, so increase the old generation size and minimize promotions.]
32
Technique 2: Slow tenuring process
• Why objects are promoted:
• Survive a certain number of young 
generation garbage collections. 
> Therefore, increasing the time interval between 
young generation collections implies that an 
object will be older before being tenured. 
• Survivor space too small to contain all of 
the objects which survive a young 
generation collection, in which case the 
survivors spill into the old generation. 
> Therefore, a large survivor space is a good 
thing.
33
Spilling & Pumping
[Diagram: the object lifecycle, annotated for Technique 2 — slow the tenuring process (“pumping” objects between Survivor 1 and Survivor 2); if there is no space in the survivor spaces, objects “spill” directly into the Tenured (Old) generation.]
34
Technique 3: Keep objects in Young
• The rate at which new Java objects are 
created cannot be controlled by the 
administrator
> Function of the application, usage pattern and 
the algorithms implemented by those 
programmers 
• The interval between young generation 
collections is simply the size of Eden 
divided by the creation rate (worked example below). 
> Increasing the size of Eden makes the time 
interval between new generation collections 
longer.
35
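An illustrative calculation (the numbers are assumed, not measured): with a 1,024 MB Eden and an allocation rate of 100 MB/s, a minor GC occurs roughly every 1024 / 100 ≈ 10 seconds; doubling Eden to 2,048 MB stretches the interval to about 20 seconds, giving short-lived objects twice as long to die before they are ever considered for promotion.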
Keep objects in Young
[Diagram: the object lifecycle, annotated for Technique 3 — the interval between incremental (minor) GCs is simply the size of Eden divided by the creation rate; double the size of Eden and you halve the rate of minor GCs.]
36
VisualGC – Old is growing, spilling
[VisualGC screenshot, annotated: survivor space full, old generation growing, no tenuring — survivors are spilling straight into the old generation.]
37
VisualGC – Well behaved GC
[VisualGC screenshot, annotated: gradual slope in the old generation, survivor space not full, old generation stable, 10 tenuring generations in use.]
38
VisualGC – Xms default (too small)
[VisualGC screenshot, annotated: Eden is very small; the greyed grid is unmapped RAM.]
39
Tenuring
• -XX:SurvivorRatio=3
• -XX:TargetSurvivorRatio=<percent>, e.g., 50
> How much of the survivor space should be filled
– Typically leave extra space to deal with “spikes”
• -XX:InitialTenuringThreshold=<threshold>
• -XX:MaxTenuringThreshold=<threshold>
• -XX:+AlwaysTenure
> Never keep any objects in the survivor spaces
• -XX:+NeverTenure
> Very bad idea!

40
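A hedged example of combining the tenuring flags with the young generation sizing from earlier slides (the values are placeholders to adapt, not recommendations):

java -Xms2g -Xmx2g -Xmn512m -XX:SurvivorRatio=6 -XX:TargetSurvivorRatio=50 -XX:MaxTenuringThreshold=8 -XX:+PrintTenuringDistribution MyApp

-XX:+PrintTenuringDistribution (next slides) then shows whether the chosen survivor size and threshold actually keep objects out of the old generation.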
Tenuring Threshold Trade-Offs

• Try to retain as many objects as possible in 
the survivor spaces so that they can be 
reclaimed in the young generation
> Less promotion into the old generation
> Less frequent old GCs
• But also, try not to unnecessarily copy very 
long-lived objects between the survivors
> Unnecessary overhead on minor GCs
• Not always easy to find the perfect balance
> Generally: better to copy more than to promote 
more
41
Tenuring Distribution
• Monitor tenuring distribution with
-XX:+PrintTenuringDistribution
Desired survivor size 6684672 bytes, new threshold 8 (max 8)
- age   1:    2315488 bytes,    2315488 total
- age   2:      19528 bytes,    2335016 total
- age   3:         96 bytes,    2335112 total
- age   4:         32 bytes,    2335144 total

• Young generation seems well tuned here
> We can even decrease the survivor space size

42
Tenuring Distribution (ii)
Desired survivor size 3342336 bytes, new threshold 1 (max 6)
- age   1:    3956928 bytes,    3956928 total 

• Survivor space too small!
> Increase survivor space and/or eden size

43
Tenuring Distribution (iii)
Desired survivor size 3342336 bytes, new threshold 6 (max 6)
- age   1:    2483440 bytes,    2483440 total
- age   2:     501240 bytes,    2984680 total
- age   3:      50016 bytes,    3034696 total
- age   4:      49088 bytes,    3083784 total
- age   5:      48616 bytes,    3132400 total
- age   6:      50128 bytes,    3182528 total 

• Might be able to do better
> Either increase max tenuring threshold
> Or even set max tenuring threshold to 2
– If ages > 6 still have around 50K of surviving bytes

44
VisualVM
[Slides 45–49: VisualVM screenshots; images not reproduced in this text version]
Jstat: Time in GC & Generation Sizes
• VisualGC is pretty, but:
> It is interactive only
– You can't use it for historical analysis
> You can't put the results into a spreadsheet
• Instead use jstat:
# pgrep java | xargs -n 1 /usr/jdk/jdk1.5.0_06/bin/jstat -gc

    S0C   S1C       S0U     S1U     EC         EU       OC       OU      PC      PU     YGC   YGCT   FGC   FGCT   GCT
273024.0 273024.0    0.0  1751.0 1092352.0  882496.2 2048000.0 805586.6 65536.0 31095.8 115   82.588  2  39.730 122.318
273024.0 273024.0 1745.9     0.0 1092352.0 1070228.3 2048000.0 923693.3 65536.0 30294.0 138   97.911  4  90.186 188.097 
273024.0 273024.0    0.0 60328.1 1092352.0  892815.4 2048000.0 659197.6 98304.0 62288.3 361  387.152  2  29.291 416.443

50
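To build a history that can go into a spreadsheet, jstat can also be given a sampling interval and count and its output redirected to a file; for example (the pid lookup, interval and count are illustrative):

# sample GC counters every 10 seconds, 360 times (one hour)
jstat -gc `pgrep -n java` 10000 360 > gc_samples.txt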
Jstat – survivor spaces

# jstat -gc

   S0C   S1C       S0U     S1U     
273024.0 273024.0    0.0  1751.0 
273024.0 273024.0 1745.9     0.0 
273024.0 273024.0    0.0 60328.1

   S0C   S1C       S0U     S1U     EC         EU       OC       OU      PC      PU     YGC   YGCT   FGC   FGCT   GCT
273024.0 273024.0    0.0  1751.0 1092352.0  882496.2 2048000.0 805586.6 65536.0 31095.8 115   82.588  2  39.730 122.318
273024.0 273024.0 1745.9     0.0 1092352.0 1070228.3 2048000.0 923693.3 65536.0 30294.0 138   97.911  4  90.186 188.097
273024.0 273024.0    0.0 60328.1 1092352.0  892815.4 2048000.0 659197.6 98304.0 62288.3 361  387.152  2  29.291 416.443

51
Jstat – Eden and Old

# jstat -gc

    EC         EU       OC       OU   
1092352.0  882496.2 2048000.0 805586.6 
1092352.0 1070228.3 2048000.0 923693.3 
1092352.0  892815.4 2048000.0 659197.6 
    S0C   S1C       S0U     S1U     EC         EU       OC       OU      PC      PU     YGC   YGCT   FGC   FGCT   GCT
273024.0 273024.0    0.0  1751.0 1092352.0  882496.2 2048000.0 805586.6 65536.0 31095.8 115   82.588  2  39.730 122.318
273024.0 273024.0 1745.9     0.0 1092352.0 1070228.3 2048000.0 923693.3 65536.0 30294.0 138   97.911  4  90.186 188.097
273024.0 273024.0    0.0 60328.1 1092352.0  892815.4 2048000.0 659197.6 98304.0 62288.3 361  387.152  2  29.291 416.443

52
Jstat – Garbage Collection Times

# jstat -gc

YGC   YGCT   FGC   FGCT   GCT
115   82.588  2  39.730 122.318
138   97.911  4  90.186 188.097
361  387.152  2  29.291 416.443
    S0C   S1C       S0U     S1U     EC         EU       OC       OU      PC      PU     YGC   YGCT   FGC   FGCT   GCT
273024.0 273024.0    0.0  1751.0 1092352.0  882496.2 2048000.0 805586.6 65536.0 31095.8 115   82.588  2  39.730 122.318
273024.0 273024.0 1745.9     0.0 1092352.0 1070228.3 2048000.0 923693.3 65536.0 30294.0 138   97.911  4  90.186 188.097
273024.0 273024.0    0.0 60328.1 1092352.0  892815.4 2048000.0 659197.6 98304.0 62288.3 361  387.152  2  29.291 416.443

53
Limitations of jstat
• Jstat is sample based. 
> “a finite part of a statistical population whose 
properties are studied to gain information 
about the whole”
• Details are smoothed over. 
• Jstat is not a good tool to answer questions 
such as 
> “Were the user performance complaints that 
came in after everyone returned from lunch 
due to excessive garbage collection at that 
particular time?” Or, 
> “How much space is available after the garbage 
collection completes?”
54
Monitoring the GC

• Online
> VisualVM: http://visualvm.dev.java.net/
> VisualGC: http://java.sun.com/performance/jvmstat/
– VisualGC is also available as a VisualVM plug-in
– Can monitor multiple JVMs within the same tool
• Offline
> GC Logging
> PrintGCStats
> GChisto
55
GC Logging in Production

• Don't be afraid to enable GC logging in 
production
> Very helpful when diagnosing production issues
• Extremely low / non­existent overhead
> Maybe some large files in your file system. :­)
> We are surprised that customers are still afraid 
to enable it
• Real customer quote:
> “If someone doesn't enable GC logging in 
production, I shoot them!”
56
Most Important GC Logging
Parameters
• You need at least:
> -XX:+PrintGCTimeStamps
– Add -XX:+PrintGCDateStamps if you must
> -XX:+PrintGCDetails
– Preferred over -verbosegc as it's more detailed
• Also useful:
> -Xloggc:<file>
> Separates GC logging output from application 
output

57
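Putting the logging flags together on one command line (the log path and application name are examples):

java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/myapp/gc.log MyApp

The resulting log file can then be summarized with PrintGCStats or visualized with GChisto (following slides).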
PrintGCStats
• Summarizes GC logs
• Downloadable script from
> http://java.sun.com/developer/technicalArticles/Programming/turbo/PrintGCStats.zip
• Usage
> PrintGCStats -v cpus=<num> <gc log file>
– Where <num> is the number of CPUs on the 
machine where the GC log was obtained

58
PrintGCStats Parallel GC
what count total mean max stddev
gen0t(s) 193 11.470 0.05943 0.687 0.0633
gen1t(s) 1 7.350 7.34973 7.350 0.0000
GC(s) 194 18.819 0.09701 7.350 0.5272
alloc(MB) 193 11244.609 58.26222 100.875 18.8519
promo(MB) 193 807.236 4.18257 96.426 9.9291
used0(MB) 193 16018.930 82.99964 114.375 17.4899
used1(MB) 1 635.896 635.89648 635.896 0.0000
used(MB) 194 91802.213 473.20728 736.490 87.8376
commit0(MB) 193 17854.188 92.50874 114.500 9.8209
commit1(MB) 193 123520.000 640.00000 640.000 0.0000
commit(MB) 193 141374.188 732.50874 754.500 9.8209
alloc/elapsed_time = 11244.609 MB / 77.237 s = 145.586 MB/s
alloc/tot_cpu_time = 11244.609 MB / 1235.792 s = 9.099 MB/s
alloc/mut_cpu_time = 11244.609 MB / 934.682 s = 12.030 MB/s
promo/elapsed_time = 807.236 MB / 77.237 s = 10.451 MB/s
promo/gc0_time = 807.236 MB / 11.470 s = 70.380 MB/s
gc_seq_load = 301.110 s / 1235.792 s = 24.366%
gc_conc_load = 0.000 s / 1235.792 s = 0.000%
gc_tot_load = 301.110 s / 1235.792 s = 24.366%

59
PrintGCStats CMS

what count total mean max stddev


gen0(s) 110 24.381 0.22164 1.751 0.2038
gen0t(s) 110 24.397 0.22179 1.751 0.2038
cmsIM(s) 3 0.285 0.09494 0.108 0.0112
cmsRM(s) 3 0.092 0.03074 0.032 0.0015
GC(s) 113 24.774 0.21924 1.751 0.2013
cmsCM(s) 3 2.459 0.81967 0.835 0.0146
cmsCP(s) 6 0.971 0.16183 0.191 0.0272
cmsCS(s) 3 14.620 4.87333 4.916 0.0638
cmsCR(s) 3 0.036 0.01200 0.016 0.0035
alloc(MB) 110 11275.000 102.50000 102.500 0.0000
promo(MB) 110 1322.718 12.02471 104.608 11.8770
used0(MB) 110 12664.750 115.13409 115.250 1.2157
used(MB) 110 56546.542 514.05947 640.625 91.5858
commit0(MB) 110 12677.500 115.25000 115.250 0.0000
commit1(MB) 110 70400.000 640.00000 640.000 0.0000
commit(MB) 110 83077.500 755.25000 755.250 0.0000
alloc/elapsed_time = 11275.000 MB / 83.621 s = 134.835 MB/s
alloc/tot_cpu_time = 11275.000 MB / 1337.936 s = 8.427 MB/s
alloc/mut_cpu_time = 11275.000 MB / 923.472 s = 12.209 MB/s
promo/elapsed_time = 1322.718 MB / 83.621 s = 15.818 MB/s
promo/gc0_time = 1322.718 MB / 24.397 s = 54.217 MB/s
gc_seq_load = 396.378 s / 1337.936 s = 29.626%
gc_conc_load = 18.086 s / 1337.936 s = 1.352%
gc_tot_load = 414.464 s / 1337.936 s = 30.978%

60
GChisto
• Graphical GC log visualizer
• Under development
> Currently, can only show pause times
• Open source at
> http://gchisto.dev.java.net/

61
GCHisto

63
How big can you make the heap?
• 1 GB is not the limit of the Java heap!
• 3 potential limits to the size of the Java Heap: 
> Physical memory
– Paging
– Add RAM or reduce usage
> Virtual memory
– Will not start
– malloc() failure.
– Add swap.
> Process address space.
– Core dumps
– Unstable JRE exits.
– Debugging: pmap or mdb
64
Address Space Consumption

Three consumers of address space:

• Java heap: fixed address range

• Native heap
  -> malloc()
  -> sockets, windows, gzip, javac and JNI code

• Thread count and stack size (-Xss)

Ensure a 200 MB safety zone above the native heap.

Consider 64-bit Java
65
Threads & Address Space


Figure out why threads are created!

Is it one per session?

Is it a fixed server thread count?

Estimate maximum thread count. 

Win32/Linux 32 will have a smaller thread limit.

Solaris 32 == 4GB

Win32/Linux32 ==  2GB (even on a 64bit OS)

Java 32 on Win64/Linux64 has 2GB limit 

Consider 64 bit Java

66
SPECjbb2005: Out of Box Performance
[Bar chart: SPECjbb2005 scores, normalized to IBM SDK 5.0 32-bit on Linux]
67
Pmap & Address Space
pmap -x `pgrep -n java`
 java -server -Xms2600m -Xmx2600m -Xss1024k -XX:NewSize=1400
 Address        Kbytes     RSS    Anon  Locked Mode   Mapped File
..........
0x00400000  241664  241664  241664       ­ rwx­­    [ heap ]                
0x38000000   64608    9352       ­             ­ r­x­­  libociei.so 
...........

What is the address space for the Heap 
            0x38000000 – 0x00400000 =  0x37C00000 =  892 MB
What heap is in use 241664 KB =  236 MB.

Space for BRK() to map == 892 – 236 =  656 MB before SEGV.
Do a pmap on core if you have one.
Expert tip: preload libumem.so
Use mdb core ; ::umastat
We were able to pin down a memory leak with umem debugging.
68
Using multiple JVM's

• The services of multiple 32­bit JVM's can 
be used by most applications
> WebLogic
> Sun Java Web Server
> Oracle Application Server 
– (Hint: use Solaris Containers)
> Websphere 
– (Hint: use Solaris Containers)  

69
Java Heap & Paging
• The heap sizes are too big if memory pressure is 
causing excessive virtual memory activity. 
• Indicator: scan rate (“sr”) from “sar -g” or vmstat -p. 
• The scan rate should be at or close to zero. 
• If the scanner kicks in for a short time but returns to 
zero, virtual memory pressure is not having a 
significant impact on your performance. 
• If the system is always scanning, you need to kill 
non­critical processes, reduce the size of your Java 
heap, or add more RAM to the system.

70
Working Set, RSS and VSZ
• If you want to increase the size of your heaps, you 
will need to determine how much RAM is available. 
• It is important to differentiate between a process’s 
working set, resident set size and virtual size. 
> The working set is the set of memory addresses that 
a program will need to use in the near future. 
> RSS, the resident set size of a process, is the portion 
of a process’s address space that is currently in RAM. 
> VSZ, the virtual size, is the full size of the process’s 
memory: pages that are currently in RAM, pages 
that the operating system has paged out, and 
addresses that have been allocated but not yet 
been mapped. 
71
Resident Set Size and Virtual Size
• View the resident set size and virtual size:
> ps -e -orss,vsz,args | grep java | sort -n

• Add the sizes of all of your processes’ RSS:
> ps -e -orss,vsz,args | awk '{printf( "%d+",
$1)}END{print 0}' | bc
> 12354976
– Will always be less than size of RAM.
• Add the sizes of all of your processes’ VSZ
> ps -e -orss,vsz,args | awk '{printf( "%d+",
$2)}END{print 0}'| bc
> 19383912
– You must have this amount of Swap.
72
Finding memory allocation with mdb
mdb -k
> ::memstat
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 89123 696 4%
Anon 589676 4606 28%
Exec and libs 3973 31 0%
Page cache 196119 1532 9%
Free (cachelist) 141024 1101 7%
Free (freelist) 1067444 8339 51%

Total 2087359 16307


Physical 2054319 16049

73
Heap Conclusion
• Administrator needs to observe and minimize both 
Java garbage collection and virtual memory pressure. 
> Java garbage collection should take no more than 
5% of the JVM’s CPU cycles. 
• The virtual memory scan rate should remain at zero 
most of the time. 
> If the administrator can not accomplish both goals 
on a given server, RAM needs to be added to the 
server. 
> Use a 32­bit JVM if possible
• Solaris allows you to go further with a 32­bit JVM

74
Garbage collection &
Memory Profiling

Jeff Taylor

[email protected]
75
Agenda

• Case for GC and Tuning
• Object Lifecycle
• Generational Collection
• Garbage Collectors
• JVM GC Observability Tools

76
Your Dream GC
• You would really like a GC that has
> Low GC overhead
> Low GC pause times, and
> Good space efficiency
• Unfortunately, you'll have to pick two (any 
two!)

77
New Generation Collectors
• "Serial" GC is a stop­the­world, young 
generation, copying collector which uses a 
single GC thread.
• "Parallel Scavenge" is a stop-the-world, 
young generation, copying collector which 
uses multiple GC threads.
• "ParNew" is a stop­the­world, young 
generation, copying collector which uses 
multiple GC threads. "ParNew" does the 
synchronization needed so that it can run 
during the concurrent phases of CMS.
78
Old Generation Collectors
• “Serial Old" is a stop­the­world, old 
generation, mark­sweep­compact collector 
that uses a single GC thread.
• "CMS" is a mostly concurrent, old 
generation, low­pause collector.
• "Parallel Old" is an old generation, 
compacting collector that uses multiple GC 
threads.

79
GC Design Choices
• Serial v. Parallel
• Concurrent v. Stop­the­world
• Compacting v. non­compacting v. copying

80
Serial Collector
• Both young and old generation collections 
performed serially
• Mark­sweep­compact algorithm for old and 
permanent generations
• Suited to most desktop applications
• Standard option on non­server class 
machines
> -XX:+UseSerialGC

81
Mark Sweep Compact: Old Generation
[Diagram: old generation before and after collection — dead objects (marked x) are swept away and the remaining live objects are compacted together.]

82
Parallel Collector
• Also known as throughput collector
• Most machines today have 
> multiple cores/CPUs
> Large amounts of memory
• Compare to machines when Java launched
> Default heap size is 64Mb

83
Parallel Collector: Young Generation

• Parallel copy collector
> Still stop­the­world
• Allocates as many threads as CPUs
> Algorithm optimized to minimize contention
• Maximize work throughput
> Work stealing
• Potential locality of reference issue
> Each thread has separate destination in 
tenured space
84
Parallel Copy Collector

• -XX:+UseParNewGC
> Default copy collector will be used on single 
CPU machines
> Only required on pre-Java SE 5 VMs

• -XX:ParallelGCThreads=n
> Default is number of CPUs
> Reduce for multiple application machines
> Can be used to force the parallel copy collector 
to be used on a single CPU machine

85
Parallel Scavenge: Old Generation
• Mark­sweep­compact
• Order of objects is maintained
> No locality of reference issues
• Requires multiple passes
> Mark live data
> Compute new location and move data
> Update all pointers
• Default collector on server class machines 
> Java SE 5 onwards
• -XX:+UseParallelGC
86
Parallel Compacting Collector
• Introduced in Java SE 5 update 6
> -XX:+UseParallelOldGC
• Same parallel copy collector for young generation
• Three phase, sliding compaction algorithm
> marking, summary, compaction
• Not all phases are currently parallel
• Can be good for UltraSPARC T1
> CMS is single threaded
> CMS cannot keep up with mutator threads
> Use parallel old, assuming pause times are 
acceptable
87
STW Parallel GC Threads
• The number of parallel GC threads is
controlled by -XX:ParallelGCThreads=<num>
• Default value assumes only one JVM per
system
• Set the parallel GC thread number according
to:
> Number of JVMs deployed on the system /
processor set / zone
> CPU chip architecture
– Multiple hardware threads per chip core, i.e.,
UltraSPARC T1 / T2

88
Parallel GC Tuning Advice
• Tune the young generation as described so far
• Try to avoid / decrease the frequency of major
GCs
• We know of customers who use the Parallel
GC in low-pause environments
> Avoid Full GCs by avoiding / minimizing
promotion
> Maximize heap size
> If the old generation is getting full, redirect load to
another machine while the Full GC is happening
– Mechanism should be there to deal with failures
89
Parallel GC Ergonomics
• The Parallel GC has ergonomics
> i.e., auto-tuning
• Ergonomics help in improving out-of-the-box
GC performance
• To get maximum performance, most customers
we know do manual tuning

90
JVM Ergonomics
• Ergonomics enables the following:
> Throughput garbage collector and Adaptive Sizing
– -XX:+UseParallelGC
– -XX:+UseAdaptiveSizePolicy
> Initial heap size of 1/64 of physical memory up to 
1 Gbyte
> Maximum heap size of 1/4 of physical memory up 
to 1 Gb
> Server runtime compiler (-server)
• To enable server ergonomics on 32-bit Windows, use 
the following flags:
> -server -Xmx1g -XX:+UseParallelGC
> Varying the heap size
91
Using JVM Ergonomics
• Maximum pause time goal
> -XX:MaxGCPauseMillis=n
> This is a hint, not a guarantee
> GC will adjust parameters to try and meet goal
> Can adversely affect application throughput
• Throughput goal
> -XX:GCTimeRatio=n
> GC time : application time = 1 / (1 + n)
> e.g. -XX:GCTimeRatio=19  (5% of time in GC)
• Footprint goal
> Only considered if first two goals are met
92
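An illustrative combination of the ergonomics goals (the values are placeholders, not recommendations):

java -server -Xmx2g -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:GCTimeRatio=19 MyApp

-XX:GCTimeRatio=19 asks for at most 1 / (1 + 19) = 5% of time in GC; if the pause-time and throughput goals conflict, the collector tries to satisfy the pause-time goal first.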
Heap Tuning Beyond Ergonomics
• Increase heap size
> Ergonomics chooses up to 1GB, some 
applications need more memory for high 
performance
> -Xms3g -Xmx3g
• Increase the size of the young generation
> Generally: ¼ to ½ the overall heap size
> Sizing above ½ the overall heap size is 
supported
> Only makes sense with throughput collector

93
Concurrent Mark Sweep Collector
• Low-pause or low-latency collector
• Parallel copy collector for young generation
• Old generation phases (application threads continue to run during the concurrent phases):
> Stop-the-world initial mark phase
> Concurrent mark phase
> Concurrent pre-clean phase
> Stop-the-world re-mark phase
> Concurrent sweep phase
> Concurrent reset phase

94
Concurrent Mark Sweep Collector
• -XX:+UseConcMarkSweepGC
• Concurrent marking phase parallel in JDK 
6
> -XX:ParallelCMSThreads=n
> Default is ¼ of available CPUs
• Scheduling of collection handled by GC
> Based on statistics in JVM
> Or occupancy level of tenured generation
> -XX:CMSInitiatingOccupancyFraction

95
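A hedged example of enabling CMS together with an explicit initiating threshold (the sizes and the 70% threshold are placeholders, not recommendations):

java -Xms1g -Xmx1g -Xmn256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly MyApp

The UseCMSInitiatingOccupancyOnly flag (covered later) tells CMS to always honor the explicit threshold instead of its own statistics.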
Incremental CMS

[Diagram: on a 2-CPU machine a full concurrent mark can cut application throughput to 50%; incremental mode interleaves the marking work with application work.]

96
Incremental CMS

• -XX:+CMSIncrementalMode (off)
• -XX:CMSIncrementalDutyCycle=n% (50)
• -XX:CMSIncrementalDutyCycleMin=n% (10)
• -XX:+CMSIncrementalPacing (on)
• DutyCycle of 10 and DutyCycleMin of 0 
can help certain applications

97
CMS Tuning Advice
• Tune the young generation as described so 
far
• Need to be even more careful about 
avoiding premature promotion
> Originally we were using an +AlwaysTenure 
policy
> We have since changed our mind :-)
• Promotion in CMS is expensive (free lists)
• The more often promotion / reclamation 
happens, the more likely fragmentation 
will set in
98
CMS Tuning Advice (ii)
• We know customers who tune their 
applications to do mostly minor GCs, even 
with CMS
> CMS is used as a “safety net”, when the 
application's load exceeds what has been 
provisioned for
> Schedule Full GCs at non­critical times (say, 
late at night) to “tidy up” the heap and 
minimize fragmentation

99
Fragmentation
• Two types
> External fragmentation
– No free chunk is large enough to satisfy an 
allocation
> Internal fragmentation
– Allocator rounds up allocation requests
– Free space wasted due to this rounding up
• Related: dark matter
> Free chunks too small to allocate

100
Fragmentation (ii)
• The bad news: you can never eliminate it!
> It has been proven
• The good news: you can decrease its 
likelihood
> Decrease promotion into the CMS old 
generation
> Be careful when coding
– Large objects of various sizes are the main cause
• But, when is the heap fragmented anyway?

101
Concurrent CMS GC Threads
• Number of parallel CMS threads is 
controlled by
-XX:ParallelCMSThreads=<num>
> Available in post-6 JVMs
• Trade-off
> CMS cycle duration vs.
> Concurrent overhead during a CMS cycle

102
Permanent Generation and CMS
• To date, classes will not be unloaded by 
default from the permanent generation 
when using CMS
> Both -XX:+CMSClassUnloadingEnabled and 
-XX:+PermGenSweepingEnabled need to be 
set to enable class unloading in CMS
> The 2nd switch is not needed in post 6u4 JVMs

103
Setting CMS Initiating Threshold
• Again, a tricky trade­off!
• Starting a CMS cycle too early
> Frequent CMS cycles
> High concurrent overhead
• Starting a CMS cycle too late
> Chance of an evacuation failure / Full GC
• Initiating heap occupancy should be 
(much) higher than the application's steady-
state live size
• Otherwise, CMS will constantly do CMS 
cycles
104
Common CMS Scenarios

• Applications that promote non­trivial 
amounts of objects to the old generation
> Old generation grows at a non­trivial rate
> Very frequent CMS cycles
> CMS cycles need to start relatively early
• Applications that promote very few or even 
no objects to the old generation
> Old generation grows very slowly, if at all
> Very infrequent CMS cycles
> CMS cycles can start quite late
105
Initiating CMS Cycles
• CMS will try to automatically find the best 
initiating occupancy
> It first does a CMS cycle early to collect stats
> Then, it tries to start cycles as late as possible, 
but early enough not to run out of heap before 
the cycle completes
> It keeps collecting stats and adjusting when to 
start cycles
> Sometimes, the second cycle starts too late

106
Initiating CMS Cycles (ii)
• -XX:CMSInitiatingOccupancyFraction=<percent>
> Occupancy percentage of CMS old generation 
that triggers a CMS cycle
• -XX:+UseCMSInitiatingOccupancyOnly
> Don't use the ergonomic initiating occupancy

107
Initiating CMS Cycles (iii)
• -XX:CMSInitiatingPermOccupancyFraction=<percent>
> Occupancy percentage of permanent 
generation that triggers a CMS cycle
> Class unloading must be enabled

108
CMS Cycle Initiation Example

• Cycle started too early:


[ParNew 390868K->296358K(773376K), 0.1882258 secs]
[CMS-initial-mark 298458K(773376K), 0.0847541 secs]
[ParNew 401318K->306863K(773376K), 0.1933159 secs]
[CMS-concurrent-mark: 0.787/0.981 secs]
[CMS-concurrent-preclean: 0.149/0.152 secs]
[CMS-concurrent-abortable-preclean: 0.105/0.183 secs]
[CMS-remark 374049K(773376K), 0.0353394 secs]
[ParNew 407285K->312829K(773376K), 0.1969370 secs]
[ParNew 405554K->311100K(773376K), 0.1922082 secs]
[ParNew 404913K->310361K(773376K), 0.1909849 secs]
[ParNew 406005K->311878K(773376K), 0.2012884 secs]
[CMS-concurrent-sweep: 2.179/2.963 secs]
[CMS-concurrent-reset: 0.010/0.010 secs]
[ParNew 387767K->292925K(773376K), 0.1843175 secs]
[CMS-initial-mark 295026K(773376K), 0.0865858 secs]
[ParNew 397885K->303822K(773376K), 0.1995878 secs]

109
CMS Cycle Initiation Example (ii)

• Cycle started too late:


[ParNew 742993K->648506K(773376K), 0.1688876 secs]
[ParNew 753466K->659042K(773376K), 0.1695921 secs]
[CMS-initial-mark 661142K(773376K), 0.0861029 secs]
[Full GC 645986K->234335K(655360K), 8.9112629 secs]
[ParNew 339295K->247490K(773376K), 0.0230993 secs]
[ParNew 352450K->259959K(773376K), 0.1933945 secs]

110
CMS Cycle Initiation Example (iii)

• This is better:
[ParNew 640710K->546360K(773376K), 0.1839508 secs]
[CMS-initial-mark 548460K(773376K), 0.0883685 secs]
[ParNew 651320K->556690K(773376K), 0.2052309 secs]
[CMS-concurrent-mark: 0.832/1.038 secs]
[CMS-concurrent-preclean: 0.146/0.151 secs]
[CMS-concurrent-abortable-preclean: 0.181/0.181 secs]
[CMS-remark 623877K(773376K), 0.0328863 secs]
[ParNew 655656K->561336K(773376K), 0.2088224 secs]
[ParNew 648882K->554390K(773376K), 0.2053158 secs]
...
[ParNew 489586K->395012K(773376K), 0.2050494 secs]
[ParNew 463096K->368901K(773376K), 0.2137257 secs]
[CMS-concurrent-sweep: 4.873/6.745 secs]
[CMS-concurrent-reset: 0.010/0.010 secs]
[ParNew 445124K->350518K(773376K), 0.1800791 secs]
[ParNew 455478K->361141K(773376K), 0.1849950 secs]

111
Start CMS Cycles Explicitly
• If relying on explicit GCs and want them to 
be concurrent, use:
> -XX:+ExplicitGCInvokesConcurrent
– Requires a post-6 JVM
> -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
– Requires a post-6u4 JVM
• Useful when wanting to cause references / 
finalizers to be processed

112
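A minimal sketch (hypothetical class name) of a deliberately scheduled explicit GC that becomes a concurrent cycle when the JVM is started with the flags above:

// Run with: java -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent TidyUp
class TidyUp {
    public static void main(String[] args) {
        // ... application work would run here ...
        System.gc(); // with ExplicitGCInvokesConcurrent this starts a concurrent cycle,
                     // not a stop-the-world Full GC, while still processing references/finalizers
    }
}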
Consider Serial GC Too
• For small heaps, pause time requirements 
may be achieved using the Serial GC
> Consider the Serial GC for heaps up to 128MB 
to 256MB
> The Serial GC is easier to tune than CMS
> What is learned from tuning the Serial GC is 
useful for the initial CMS tuning

113
Collector Summary
• Many factors affect Java performance
> Application code
> App server settings (Java EE only)
> JVM settings
• Understanding the JVM is essential to 
improving performance
• You MUST profile your application!
• If possible, always upgrade to the latest 
version of the JVM
114
Garbage collection &
Memory Profiling

Jeff Taylor

[email protected]
115
AGENDA – Misc:
• jps, jinfo, jstack, jmap
• hprof
• Large Page Sizes
• Thread local allocation
• NUMA
• Tiered Compilation
• Sun Java Real-Time System

116
jps
• $ jps
• 7875 JConsole
• 20374 Jps
• 16287
• 7746 Main
• 20218 Main
• $ jps -l
• 7875 sun.tools.jconsole.JConsole
• 16287
• 20510 sun.tools.jps.Jps
• 7746 org.netbeans.Main
• 20218 twoday.Main
117
jmap -histo
• $ jmap -histo 20218 | head -10

• num #instances #bytes class name
• ----------------------------------------------
• 1: 25579 522962504 [I
• 2: 7698 929584 <methodKlass>
• 3: 7698 884104 <constMethodKlass>
• 4: 24916 797312
twoday.HeapExampleApp$Big

118
jinfo
• $ jinfo -flags 20218
• Attaching to process ID 20218, please wait...
• Debugger attached successfully.
• Server compiler detected.
• JVM version is 14.0-b16

• -Xms1g -Xmx1g -Xmn500m -XX:SurvivorRatio=3
-XX:TargetSurvivorRatio=90 -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xloggc:/home/user/Desktop/SoftwareAG/HeapExam
pleApp/log/20090907_191929/GC_log.txt -XX:
+PrintTenuringDistribution

119
jstack
• "main" prio=10 tid=0x0000000040113000 nid=0x4efb waiting on
condition [0x0000000041dd8000]
• java.lang.Thread.State: TIMED_WAITING (sleeping)
• at java.lang.Thread.sleep(Native Method)
• at twoday.HeapExampleApp.infiniteLoop(HeapExampleApp.java:83)
• at twoday.Main.main(Main.java:20)


• "Concurrent Mark-Sweep GC Thread" prio=10
tid=0x000000004016e000 nid=0x4efe runnable

120
jmap -heap
• $ jmap -heap 20218
• Attaching to process ID 20218, please wait...
• Debugger attached successfully.
• Server compiler detected.
• JVM version is 14.0-b16

• using parallel threads in the new generation.
• using thread-local object allocation.
• Concurrent Mark-Sweep GC

• Heap Configuration:
• MaxHeapSize = 1073741824 (1024.0MB)

121
HPROF
• Command line tool
• Supplied with JDK
• CPU usage
• Heap allocation statistics
• Monitor contention profiles
• Report complete heap dumps
• States of monitors and threads

122
HPROF
• $ java -agentlib:hprof=help


• HPROF: Heap and CPU Profiling Agent (JVMTI Demonstration
Code)


• hprof usage: java -agentlib:hprof=[help]|
[<option>=<value>, ...]


• Option Name and Value Description Default
• --------------------- ----------- -------
• heap=dump|sites|all heap profiling all
• cpu=samples|times|old CPU usage off
• monitor=y|n monitor contention n

123
HPROF Example

• $ java -server
-agentlib:hprof=heap=sites
-Xms1g -Xmx1g ...
• On exit:
> Dumping allocation sites ... done.
> Creates java.hprof.txt
> percent live alloc'ed stack class
> rank self accum bytes objs bytes objs trace name
> 1 99.60% 99.60% 307477984 14996 601710384 29346 301321 int[]
> 2 0.16% 99.76% 479776 14993 939072 29346 301320
twoday.HeapExampleApp$Big
> 3 0.03% 99.78% 80024 1 80024 1 301261 java.lang.Object[]
>

124
125
Large Page Sizes
• -XX:+UseLargePages
> Cross platform (only on by default for Solaris)
> Improves utilization of TLB
> Kernel support in Linux 2.6 and Windows 2003 
server
> 8m SPARC
> 4m x86
> 2m x86_64
> 256m supported for UltraSPARC T1
• -XX:LargePageSizeInBytes=n
> Set to 2m for AMD Opteron systems
126
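For example (platform-dependent; the 2m page size follows the AMD Opteron hint above, and the application name is a placeholder):

java -XX:+UseLargePages -XX:LargePageSizeInBytes=2m -Xmx2g MyApp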
Multi-processors/cores and Eden Allocation
• Problem: multiple threads creating objects
> All trying to access eden simultaneously
> Multiple CPU machine: contention
• Solution: Thread local allocation
> -XX:+UseTLAB

127
Thread Local Allocation
[Diagram: Eden space divided into per-thread local allocation buffers (T1, T2, T3), each with its own allocation pointer; a TLAB can be resized (T2 shown resized).]
-XX:TLABSize=<size-in-bytes>
-XX:+ResizeTLAB
128
NUMA
• Non­Uniform Memory Access
> Applicable to most SPARC, Opteron, more 
recently Intel platforms
• -XX:+UseNUMA
• Splits the young generation into partitions
> Each partition “belongs” to a CPU
• Allocates new objects into the partition 
that belongs to the allocating CPU
• Big win for some applications
129
Random Things
• Consider disabling explicit GC
> -XX:+DisableExplicitGC
• Increase size of permanent generation
> If lots of classes loaded at start
> Can improve startup time
> e.g. NetBeans sets permanent heap size to 
20MB
• -XX:+AggressiveOpts
> Go-fast meta-option (can change between 
releases)

130
Tiered Compilation
• New in JRE 6
• HotSpot has two compilers, -client & -server
• JVM starts with client compiler
> Fast warmup
• Switches to server compiler
> Better optimisation
• -XX:+TieredCompilation

131
Sun Java Real-Time System
• The Real-Time Specification for Java
(RTSJ) – JSR-001
• RTSJ provides an API set, semantic JVM
enhancements, RTGC, and JVM-to-OS
layer modifications to satisfy real-time
requirements for Java application
development

132
Inside the Java Real-Time System
- Real-Time Garbage Collector
• Based on Roger Henriksson's PhD Thesis, 
Lund University, Sweden
• Operating principles: 
> GC threads run at lower priority than critical 
real-time threads
> Real-time threads are unaffected by GC activity
> Non-critical (non-real-time) threads pay GC 
“tax”
• Pro: Very deterministic, with little to no GC-borne 
latency
• Con: Non-real-time threads bear GC burden with 
loss of overall throughput
133
Resources
• java.sun.com/performance/reference
• visualvm.dev.java.net
• http://www.j2ee.me/developer/technicalArti
cles/Programming/HPROF.html

134
Why Solaris is a better OS
• What is the biggest -Xmx you can start with? 
• 32-bit Java on Solaris can use 3.6 GB for
> Java Heap 3.2 GB max
> Threads
> Native Heap
• Windows 32-bit address space is limited to 
2.0 GB, less mappings 
> Java Heap 1.3 GB max
• Linux 32-bit address space is limited to 2.0 
GB, less mappings 
> Java Heap 1.6 GB max
135
Garbage collection &
Memory Profiling

Jeff Taylor

[email protected]
136
