JVM Performance Tuning for Production Java: GC, Heap & Profiling

Most Java performance problems in production are not slow algorithms — they are memory and garbage-collection behavior: pauses that spike latency, heaps sized wrong for the container, allocation rates that keep the collector busy. This deep dive is a practical guide to JVM tuning for production services: choosing a garbage collector, sizing the heap (especially in containers), reading GC logs, and profiling with the tools that actually find the problem.

TL;DR: Default to G1 and don’t tune blindly — measure first. Size the heap from the container limit with -XX:MaxRAMPercentage, not a hardcoded -Xmx, and leave headroom for non-heap memory. Turn on GC logging in production. Switch to ZGC only when you need consistently sub-millisecond pauses on large heaps. Find the real bottleneck with JFR and async-profiler before changing flags.

How the JVM manages memory

The JVM allocates objects on the heap and reclaims unreachable ones automatically. The key insight behind every modern collector is the generational hypothesis: most objects die young. So the heap is split into a young generation (where new objects live and most are collected cheaply in fast “minor” GCs) and an old generation (for objects that survive long enough to be promoted, collected by costlier “major” GCs). Tuning is largely about keeping short-lived garbage in the young gen and not over-promoting.

Outside the heap sits significant non-heap memory: metaspace (class metadata), thread stacks, code cache, and direct/native buffers (used heavily by Netty, NIO, and gRPC). Forgetting this is one of the most common reasons containers get OOM-killed even though the heap “looks fine.”
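To make the heap versus non-heap split concrete, here is a minimal sketch that lists the memory pools and NIO buffer pools a running JVM reports through the standard java.lang.management API. The class name MemoryAreas is made up for illustration, and the pool names printed depend on the collector and JDK version in use.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class MemoryAreas {
    /** One line per JVM memory pool, prefixed with HEAP or NON_HEAP. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format("%-8s %-34s used=%d KB",
                    pool.getType().name(), pool.getName(), usage.getUsed() / 1024));
        }
        return lines;
    }

    /** One line per NIO buffer pool, which is where direct buffers show up. */
    static List<String> describeBufferPools() {
        List<String> lines = new ArrayList<>();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            lines.add(String.format("BUFFER   %-34s used=%d KB",
                    pool.getName(), pool.getMemoryUsed() / 1024));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
        describeBufferPools().forEach(System.out::println);
    }
}
```

Thread stacks are not covered by these beans, so the totals printed here will still be lower than the process RSS that the container runtime sees.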

Choosing a garbage collector

Collector	Optimizes for	Use when
G1 (default)	Balance of throughput and pause time	Almost all services — start here
ZGC	Ultra-low, predictable pauses (sub-ms), large heaps	Latency-critical services, big heaps (tens of GB+)
Parallel	Raw throughput, pauses don’t matter	Batch jobs / data pipelines
Serial	Tiny footprint, single thread	Small CLIs, constrained containers

G1 has been the default since Java 9 and is the right answer for the vast majority of microservices — it targets a pause goal (default ~200ms) and usually meets it without hand-tuning. Reach for ZGC when tail latency is the product (trading some throughput and extra memory for pauses that stay sub-millisecond even on huge heaps). Use Parallel for throughput-bound batch work where a longer pause is irrelevant.

# G1 is default; to choose explicitly:
-XX:+UseG1GC
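# Low-pause, large heap. -XX:+ZGenerational selects generational ZGC on JDK 21 and 22;
# from JDK 23 generational mode is the default and the flag can be dropped.
-XX:+UseZGC -XX:+ZGenerational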
# Throughput batch:
-XX:+UseParallelGC
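A flag in a start script is easy to lose behind a wrapper or a base image, so it can help to confirm which collector the process actually runs. The sketch below is one way to check, using the standard GarbageCollectorMXBean API; the class name ActiveCollectors is an assumption for illustration. With G1 the reported names typically start with "G1", but the exact strings vary by JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class ActiveCollectors {
    /** Names of the collectors registered in this JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start.
            System.out.printf("%s: collections=%d, totalTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```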

Sizing the heap — especially in containers

On modern JDKs (JDK 10 and later, backported to 8u191, and much improved since) the JVM is container-aware: it reads the cgroup memory limit rather than the host’s total RAM. The remaining mistake is hardcoding -Xmx to a number that doesn’t track the container limit; an explicit -Xmx also takes precedence over the percentage flags. Prefer percentage flags so the heap scales when you resize the pod:

# Let the heap use 75% of the container memory limit;
# the rest is headroom for metaspace, threads, direct buffers.
-XX:MaxRAMPercentage=75.0

The headroom matters: if you give the heap 100% of a 1 GB container, the first thread stack or direct buffer allocation pushes total RSS over the limit and the kernel OOM-kills the pod — which looks like a crash, not a memory problem, until you read the exit code (137). A common starting split is ~75% heap, 25% everything else, then verify with real traffic.
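The arithmetic behind that split is simple enough to sketch. The example below computes the heap ceiling and the remaining headroom for a given container limit and percentage, then prints what the current JVM reports as its own maximum. HeapBudget is a hypothetical class, and the real JVM aligns the heap size, so Runtime.maxMemory() can differ slightly from the computed figure.

```java
// Hypothetical helper class, not part of any library.
class HeapBudget {
    static final long MIB = 1024L * 1024L;

    /** Heap ceiling implied by a container limit and a MaxRAMPercentage value. */
    static long maxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    /** What is left for metaspace, thread stacks, code cache, and direct buffers. */
    static long headroomBytes(long containerLimitBytes, double maxRamPercentage) {
        return containerLimitBytes - maxHeapBytes(containerLimitBytes, maxRamPercentage);
    }

    public static void main(String[] args) {
        long limit = 1024 * MIB; // a 1 GB container, as in the text
        System.out.println("heap ceiling MiB: " + maxHeapBytes(limit, 75.0) / MIB);   // 768
        System.out.println("headroom MiB:     " + headroomBytes(limit, 75.0) / MIB);  // 256
        // What this JVM actually chose; compare it with the container limit.
        System.out.println("Runtime.maxMemory MiB: " + Runtime.getRuntime().maxMemory() / MIB);
    }
}
```

Running the same class inside the container is a quick check that the percentage flag took effect and that no stray -Xmx overrides it.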

Reading GC logs

You cannot tune what you cannot see. Enable unified GC logging in production — it is cheap and invaluable when latency spikes:

-Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=5,filesize=10m

What to look for: pause durations (are they within your latency budget?), frequency (frequent young GCs mean a high allocation rate), promotion (lots of objects surviving to old gen suggests the young gen is too small or objects live too long), and full GCs (with G1 these should be rare — frequent full GCs signal heap pressure or a leak). Tools like GCeasy or JDK Mission Control turn these logs into readable charts.
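For a quick check without a charting tool, pause lines can be pulled out of the log with a regular expression. The sketch below assumes the common G1 summary shape at info level ("Pause <kind> ... <before>M-><after>M(<total>M) <time>ms"); the sample lines in main are illustrative rather than captured from a real service, and other collectors or log levels may need a different pattern. GcPauseScan is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class GcPauseScan {
    // Assumed line shape: "... Pause Young (...) 512M->128M(1024M) 8.250ms"
    private static final Pattern PAUSE = Pattern.compile(
            "Pause (\\w+).* (\\d+)M->(\\d+)M\\((\\d+)M\\) (\\d+\\.\\d+)ms\\s*$");

    static final class Pause {
        final String kind;
        final long beforeMb;
        final long afterMb;
        final double millis;

        Pause(String kind, long beforeMb, long afterMb, double millis) {
            this.kind = kind;
            this.beforeMb = beforeMb;
            this.afterMb = afterMb;
            this.millis = millis;
        }
    }

    /** Keeps only lines that match the pause summary shape; everything else is skipped. */
    static List<Pause> parse(List<String> lines) {
        List<Pause> pauses = new ArrayList<>();
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                pauses.add(new Pause(m.group(1), Long.parseLong(m.group(2)),
                        Long.parseLong(m.group(3)), Double.parseDouble(m.group(5))));
            }
        }
        return pauses;
    }

    /** Number of pauses longer than the given latency budget. */
    static long countOver(List<Pause> pauses, double budgetMs) {
        return pauses.stream().filter(p -> p.millis > budgetMs).count();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "[12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(1024M) 8.250ms",
                "[13.222s][info][gc] GC(8) Concurrent Mark Cycle 35.112ms",
                "[24.223s][info][gc] GC(9) Pause Full (G1 Compaction Pause) 1010M->900M(1024M) 850.000ms");
        List<Pause> pauses = parse(sample);
        for (Pause p : pauses) {
            System.out.printf("%s: %dM -> %dM in %.3f ms%n", p.kind, p.beforeMb, p.afterMb, p.millis);
        }
        System.out.println("pauses over 200 ms: " + countOver(pauses, 200.0));
    }
}
```

In the sample, the full GC frees little memory (1010M to 900M) and takes far longer than the young pause, which is the heap-pressure pattern described above.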

Finding the real problem: profiling

Before you touch a single flag, profile — most “GC problems” are actually allocation problems in application code (an unbounded cache, a per-request object explosion, string churn). The modern toolkit:

JDK Flight Recorder (JFR): built into the JVM and designed for overhead low enough to leave on in production. It records allocation, GC, lock, I/O, and method-profiling events to a .jfr file that you open in JDK Mission Control.
async-profiler: a low-overhead sampling profiler that produces flame graphs for CPU and allocation, with accurate native stacks. The go-to for “where is the time/memory going.”
Heap dumps (jmap or -XX:+HeapDumpOnOutOfMemoryError), analyzed in Eclipse MAT for leak hunting: find the dominator tree and the GC roots holding memory.

# Start a 60s flight recording on a running JVM
jcmd <pid> JFR.start duration=60s filename=rec.jfr

# Allocation flame graph with async-profiler
# (profiler.sh is the 2.x launcher; 3.x releases ship bin/asprof, which takes the same options)
./profiler.sh -d 30 -e alloc -f alloc.html <pid>
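JFR can also be driven from inside the application through the jdk.jfr API, which is handy for recording around one suspicious code path. The sketch below records GC events while a workload runs, dumps them to a temporary file, and counts them. JfrSketch and the toy workload are assumptions for illustration; a production recording would more likely start from the built-in "default" or "profile" configuration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper class, not part of any library.
class JfrSketch {
    /** Records GC events while the workload runs and returns the .jfr file. */
    static Path recordWhile(Runnable workload) {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.GarbageCollection");
            recording.start();
            workload.run();
            recording.stop();
            Path out = Files.createTempFile("sketch", ".jfr");
            recording.dump(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts events of one type in a recording file. */
    static long countEvents(Path file, String eventName) {
        try {
            long count = 0;
            for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                if (event.getEventType().getName().equals(eventName)) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = recordWhile(() -> {
            long total = 0;
            for (int i = 0; i < 200_000; i++) {
                total += new byte[1024].length; // short-lived garbage
            }
            System.gc(); // a request, not a guarantee; ignored under -XX:+DisableExplicitGC
            System.out.println("allocated bytes (approx): " + total);
        });
        System.out.println("recording: " + file);
        System.out.println("GC events: " + countEvents(file, "jdk.GarbageCollection"));
    }
}
```

The resulting file opens in JDK Mission Control like any recording started with jcmd.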

Common culprits and fixes

High allocation rate → frequent young GCs. Fix the code (reuse buffers, avoid per-request allocations, stream instead of materializing) before enlarging the young gen.
Memory leak → old gen grows until full GCs thrash, then OOM. Usually an unbounded collection, a cache without eviction, or a ThreadLocal never cleared. Find it with a heap dump.
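Container OOM-kill (exit 137) with a healthy heap → non-heap memory (direct buffers, threads, metaspace) exceeded the limit. Lower MaxRAMPercentage, cap direct memory with -XX:MaxDirectMemorySize, or reduce the thread count. Native Memory Tracking (-XX:NativeMemoryTracking=summary, read with jcmd <pid> VM.native_memory summary) shows where the non-heap memory went.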
Long pauses on a big heap → switch G1 → ZGC, or reduce live-set size.
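The "cache without eviction" leak has a small standard-library fix worth sketching: a LinkedHashMap in access order that drops its least recently used entry once it passes a size limit. BoundedCache is a hypothetical name, and this version is not thread-safe, so a shared cache would need external synchronization or a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache; not thread-safe.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, so iteration starts at the LRU entry
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a" so that "b" becomes the eldest
        cache.put("c", 3); // evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

Because the map can never hold more than maxEntries entries, its contribution to the old generation stays bounded no matter how long the service runs.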

Startup, warmup, and the JIT

Java starts out interpreted, and the JIT compiler optimizes hot paths over time, so a freshly started JVM is slower until it warms up. That matters for autoscaling and serverless, where new instances briefly underperform. Options: GraalVM native images compile ahead of time, which removes JIT warmup and most of the startup cost (great for functions and fast-scaling services); AppCDS (application class-data sharing) and CRaC (Coordinated Restore at Checkpoint) cut startup time while keeping the JVM. For autoscaling, account for warmup so that a scale-up event doesn’t briefly serve slow responses.
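One way to account for warmup is to exercise the hot path before the instance reports itself ready. The sketch below shows the shape of that idea only: WarmupGate is a hypothetical class, the iteration count is a placeholder to be chosen by measurement, and wiring isReady() into a readiness probe depends on the framework in use.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical readiness gate, not part of any framework.
class WarmupGate {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Runs the hot path repeatedly so the JIT can compile it, then marks the instance ready. */
    void warmUp(Runnable hotPath, int iterations) {
        for (int i = 0; i < iterations; i++) {
            hotPath.run();
        }
        ready.set(true);
    }

    /** Intended to back a readiness probe; false until warm-up has finished. */
    boolean isReady() {
        return ready.get();
    }

    public static void main(String[] args) {
        WarmupGate gate = new WarmupGate();
        long[] sink = new long[1]; // keeps the work observable so it is not optimized away
        // 20_000 is an arbitrary placeholder; measure to find a useful count.
        gate.warmUp(() -> sink[0] += Long.toString(System.nanoTime()).hashCode(), 20_000);
        System.out.println("ready=" + gate.isReady());
    }
}
```

The warm-up calls should resemble real requests; exercising an unrepresentative path compiles code the service will not use.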

A sane tuning workflow

Set a clear goal (p99 latency budget, throughput target, memory ceiling).
Enable GC logging and JFR in production.
Right-size the container and use MaxRAMPercentage; keep G1.
Measure under realistic load; read the GC log and a flame graph.
Fix application allocation/leaks first; change collectors/flags only with evidence.
Change one thing at a time and re-measure.

Takeaways

JVM tuning is a measurement discipline, not a list of magic flags. Keep G1 unless data says otherwise, size the heap from the container limit with headroom for non-heap memory, always run with GC logging on, and use JFR and async-profiler to find the real bottleneck — which is usually allocation in your own code. The teams that run Java fast in production are the ones who profile before they tune.

Frequently asked questions

Which garbage collector should I use for a Java microservice?
G1 (the default since Java 9) is the right choice for most services, balancing throughput and pause times. Use ZGC when you need consistently very low pauses on large heaps; use the Parallel collector for batch jobs that care about raw throughput over latency.

How do I set JVM heap size in a container?
On modern JDKs the JVM is container-aware and sizes the heap from the container memory limit. Prefer percentage flags like -XX:MaxRAMPercentage=75 over a fixed -Xmx so the heap scales with the limit, and always leave headroom for non-heap memory (metaspace, threads, direct buffers).
