Most Java performance problems in production are not slow algorithms — they are memory and garbage-collection behavior: pauses that spike latency, heaps sized wrong for the container, allocation rates that keep the collector busy. This deep dive is a practical guide to JVM tuning for production services: choosing a garbage collector, sizing the heap (especially in containers), reading GC logs, and profiling with the tools that actually find the problem.
Size the heap with -XX:MaxRAMPercentage, not a hardcoded -Xmx, and leave headroom for non-heap memory. Turn on GC logging in production. Switch to ZGC only when you need consistently sub-millisecond pauses on large heaps. Find the real bottleneck with JFR and async-profiler before changing flags.
The JVM allocates objects on the heap and reclaims unreachable ones automatically. The key insight behind every modern collector is the generational hypothesis: most objects die young. So the heap is split into a young generation (where new objects live and most are collected cheaply in fast “minor” GCs) and an old generation (for objects that survive long enough to be promoted, collected by costlier “major” GCs). Tuning is largely about keeping short-lived garbage in the young gen and not over-promoting.
Outside the heap sits significant non-heap memory: metaspace (class metadata), thread stacks, code cache, and direct/native buffers (used heavily by Netty, NIO, and gRPC). Forgetting this is a common cause of containers getting OOM-killed even though the heap “looks fine.”
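To make the heap versus non-heap split concrete, here is a minimal sketch that reads both from the standard `java.lang.management` MXBeans. The class name `MemoryFootprint` and its helper methods are made up for illustration. These beans cover JVM-managed pools and NIO buffer pools only; thread stacks and other native allocations are not included, so treat the numbers as a lower bound on real RSS.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports heap and non-heap usage of the current JVM.
final class MemoryFootprint {

    /** Bytes used by JVM-managed non-heap pools (metaspace, code cache, and similar). */
    static long nonHeapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getNonHeapMemoryUsage().getUsed();
    }

    /** Bytes held by NIO buffer pools such as "direct" and "mapped". */
    static long bufferPoolUsedBytes() {
        long total = 0;
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            long used = pool.getMemoryUsed(); // -1 when the JVM cannot estimate it
            if (used > 0) {
                total += used;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        System.out.printf("heap used:         %d MiB%n", heap.getUsed() >> 20);
        System.out.printf("non-heap used:     %d MiB%n", nonHeapUsedBytes() >> 20);
        System.out.printf("buffer pools used: %d KiB%n", bufferPoolUsedBytes() >> 10);
        // Each live thread also reserves a native stack that none of the figures above include.
        System.out.printf("live threads:      %d%n", threads);
    }
}
```

Comparing these figures against the container limit under real traffic gives a rough idea of how much headroom the non-heap side actually needs.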
| Collector | Optimizes for | Use when |
|---|---|---|
| G1 (default) | Balance of throughput and pause time | Almost all services — start here |
| ZGC | Ultra-low, predictable pauses (sub-ms), large heaps | Latency-critical services, big heaps (tens of GB+) |
| Parallel | Raw throughput, pauses don’t matter | Batch jobs / data pipelines |
| Serial | Tiny footprint, single thread | Small CLIs, constrained containers |
G1 has been the default since Java 9 and is the right answer for the vast majority of microservices — it targets a pause goal (default ~200ms) and usually meets it without hand-tuning. Reach for ZGC when tail latency is the product (trading some throughput and extra memory for pauses that stay sub-millisecond even on huge heaps). Use Parallel for throughput-bound batch work where a longer pause is irrelevant.
# G1 is default; to choose explicitly:
-XX:+UseG1GC
# Low-pause, large heap (-XX:+ZGenerational applies to JDK 21-22; generational mode is the default from JDK 23):
-XX:+UseZGC -XX:+ZGenerational
# Throughput batch:
-XX:+UseParallelGC
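A quick way to confirm which collector a running service actually picked up is to ask the JVM itself. The sketch below lists the registered collector beans; the class name `ActiveCollectors` is hypothetical, and the exact bean names vary by collector and JDK version (with G1 they include, for example, "G1 Young Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the garbage collector beans registered in this JVM.
final class ActiveCollectors {

    /** Names of the collector beans, as reported by the running JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run with different flags (for example -XX:+UseParallelGC) to see the names change.
        System.out.println(names());
    }
}
```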
On modern JDKs (11+, and much improved since) the JVM is container-aware: it reads the cgroup memory limit rather than the host’s total RAM. The mistake is still hardcoding -Xmx to a number that doesn’t track the container limit. Prefer percentage flags so the heap scales when you resize the pod:
# Let the heap use 75% of the container memory limit;
# the rest is headroom for metaspace, threads, direct buffers.
-XX:MaxRAMPercentage=75.0
The headroom matters: if you give the heap 100% of a 1 GB container, the first thread stack or direct buffer allocation pushes total RSS over the limit and the kernel OOM-kills the pod — which looks like a crash, not a memory problem, until you read the exit code (137). A common starting split is ~75% heap, 25% everything else, then verify with real traffic.
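The split described above is simple arithmetic, sketched here for a 1 GiB limit. `HeapSizing` and its methods are hypothetical helpers, not a JVM API, and the JVM's own computed maximum can differ slightly because of alignment, so compare against `Runtime.getRuntime().maxMemory()` in the real container.

```java
// Hypothetical helper: back-of-the-envelope heap and headroom sizing.
final class HeapSizing {

    /** Approximate max heap for a given container limit and MaxRAMPercentage value. */
    static long heapBytes(long containerLimitBytes, double maxRamPercentage) {
        if (containerLimitBytes <= 0 || maxRamPercentage <= 0 || maxRamPercentage > 100) {
            throw new IllegalArgumentException("limit must be > 0 and percentage in (0, 100]");
        }
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    /** What is left for metaspace, thread stacks, code cache, and direct buffers. */
    static long headroomBytes(long containerLimitBytes, double maxRamPercentage) {
        return containerLimitBytes - heapBytes(containerLimitBytes, maxRamPercentage);
    }

    public static void main(String[] args) {
        long limit = 1L << 30; // assumed 1 GiB container limit
        System.out.printf("heap at 75%%:     %d MiB%n", heapBytes(limit, 75.0) >> 20);
        System.out.printf("headroom at 75%%: %d MiB%n", headroomBytes(limit, 75.0) >> 20);
        // What this JVM actually resolved as its maximum heap:
        System.out.printf("actual max heap: %d MiB%n", Runtime.getRuntime().maxMemory() >> 20);
    }
}
```

At 75% of 1 GiB the heap gets 768 MiB and everything else shares 256 MiB, which is tight for a service with many threads or heavy direct-buffer use.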
You cannot tune what you cannot see. Enable unified GC logging in production — it is cheap and invaluable when latency spikes:
-Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=5,filesize=10m
What to look for: pause durations (are they within your latency budget?), frequency (frequent young GCs mean a high allocation rate), promotion (lots of objects surviving to old gen suggests the young gen is too small or objects live too long), and full GCs (with G1 these should be rare — frequent full GCs signal heap pressure or a leak). Tools like GCeasy or JDK Mission Control turn these logs into readable charts.
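Collection frequency and cumulative collection time are also available in-process, which can be handy for a metrics endpoint alongside the log file. The sketch below sums them across collectors and derives a rough GC overhead percentage. `GcOverhead` is a hypothetical name, and `getCollectionTime()` is an approximate elapsed time: for concurrent collectors it is not the same thing as pause time, so the logs remain the source of truth for pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: rough GC overhead since JVM start.
final class GcOverhead {

    /** Total number of collections across all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time across all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when undefined
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Share of uptime spent collecting, as a percentage. */
    static double overheadPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("collections: %d, gc time: %d ms, overhead: %.2f%%%n",
                totalGcCount(), totalGcMillis(), overheadPercent(totalGcMillis(), uptime));
    }
}
```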
Before you touch a single flag, profile — most “GC problems” are actually allocation problems in application code (an unbounded cache, a per-request object explosion, string churn). The modern toolkit:
… .jfr file you open in JDK Mission Control.
… (jmap / -XX:+HeapDumpOnOutOfMemoryError) analyzed in Eclipse MAT for leak hunting: find the dominator tree and the GC roots holding memory.
# Start a 60s flight recording on a running JVM
jcmd <pid> JFR.start duration=60s filename=rec.jfr
# Allocation flame graph with async-profiler (3.x releases ship the launcher as bin/asprof instead of profiler.sh)
./profiler.sh -d 30 -e alloc -f alloc.html <pid>
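When profiling points at an unbounded cache, one of the allocation problems mentioned earlier, the usual fix is to cap it. The following is a minimal sketch of a size-bounded LRU cache built on `LinkedHashMap`; `BoundedCache` is a hypothetical name, the class is not thread-safe, and a production service would more likely use a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example: a small LRU cache that never grows past maxEntries.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: iteration order is least-recently-used first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least-recently-used entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a" so "b" becomes the eldest entry
        cache.put("c", 3); // exceeds the cap, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

Because the map can never hold more than `maxEntries` values, it cannot slowly fill the old generation the way an unbounded map does.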
… ThreadLocal never cleared. Find it with a heap dump.
… MaxRAMPercentage or cap direct memory / thread count.
Java runs interpreted at first and the JIT compiler optimizes hot paths over time, so a freshly started JVM is slower until it warms up. This matters for autoscaling and serverless, where new instances briefly underperform. Options: GraalVM native images eliminate warmup and JVM startup almost entirely (great for functions/fast-scaling services); AppCDS (application class-data sharing) and CRaC (Coordinated Restore at Checkpoint) cut startup time while keeping the JVM. For autoscaling, account for warmup so a scale-up event doesn’t briefly serve slow responses.
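One way to account for warmup is to exercise the hot path before the instance reports itself ready. The sketch below shows the idea only: `Warmup`, `warmUp`, and `isReady` are hypothetical names, the iteration count is arbitrary, and a real service would replay representative requests and wire `isReady()` into its readiness probe.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntUnaryOperator;

// Hypothetical example: gate readiness on a warmup pass over the hot path.
final class Warmup {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    /**
     * Runs the hot path repeatedly so the JIT has a chance to compile it,
     * then flips the readiness flag. The sum is returned so the calls are
     * not dead code the compiler could drop.
     */
    long warmUp(IntUnaryOperator hotPath, int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += hotPath.applyAsInt(i);
        }
        ready.set(true);
        return sink;
    }

    /** What a readiness endpoint would report. */
    boolean isReady() {
        return ready.get();
    }

    public static void main(String[] args) {
        Warmup warmup = new Warmup();
        System.out.println("ready before warmup: " + warmup.isReady());
        warmup.warmUp(i -> Integer.toString(i).hashCode(), 20_000);
        System.out.println("ready after warmup:  " + warmup.isReady());
    }
}
```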
… MaxRAMPercentage; keep G1.
JVM tuning is a measurement discipline, not a list of magic flags. Keep G1 unless data says otherwise, size the heap from the container limit with headroom for non-heap memory, always run with GC logging on, and use JFR and async-profiler to find the real bottleneck, which is usually allocation in your own code. The teams that run Java fast in production are the ones who profile before they tune.
Which garbage collector should I use for a Java microservice?
G1 (the default since Java 9) is the right choice for most services, balancing throughput and pause times. Use ZGC when you need consistently very low pauses on large heaps; use the Parallel collector for batch jobs that care about raw throughput over latency.
How do I set JVM heap size in a container?
On modern JDKs the JVM is container-aware and sizes the heap from the container memory limit. Prefer percentage flags like -XX:MaxRAMPercentage=75 over a fixed -Xmx so the heap scales with the limit, and always leave headroom for non-heap (metaspace, threads, direct buffers).