Contention & Performance

Hot Counter Under Contention

  • Hot counters
  • Contention
  • Counters
  • Java
  • Intermediate

Production symptoms

  • Correct results but poor throughput
  • Counter becomes hotspot
  • Performance degrades under concurrency

Failure scenario

Code

Java example
class Metrics {
    private long requests;

    synchronized void incrementRequests() {
        requests++;
    }

    synchronized long requests() {
        return requests;
    }
}

Prod Symptoms

A high-volume telemetry counter is correct, but the counter update becomes one of the hottest shared locations in the service.

Key signal: Correctness is not the issue. A single serialized counter can still be a scalability hotspot.

  • The final count is right
  • Throughput drops as thread count rises
  • Many threads spend time waiting to update one counter
  • The issue appears in metrics, request accounting, or per-endpoint telemetry paths
  • Thread dumps alone may understate the problem because each wait is short but frequent

Run Locally

  • Both counters produce the same final count
  • The synchronized counter often takes much longer with multiple threads
  • LongAdder spreads contention across cells and sums them when read
  • Exact timing depends on CPU, JVM, and background load

Inspect hints

  • Timing comparison is more useful than one thread dump
  • Profilers may show time around monitor enter/exit or contended synchronization
  • In production, watch whether a metric update sits on a very hot request path
Run
javac HotCounterContentionDemo.java
java HotCounterContentionDemo
HotCounterContentionDemo.java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

public class HotCounterContentionDemo {
    private static final int THREADS =
            Math.max(4, Runtime.getRuntime().availableProcessors());
    private static final int INCREMENTS_PER_THREAD = 1_000_000;

    private static long synchronizedCounter;

    public static void main(String[] args) throws Exception {
        long syncMillis = runSynchronizedCounter();
        long adderMillis = runLongAdderCounter();

        System.out.println("threads = " + THREADS);
        System.out.println("synchronized ms = " + syncMillis);
        System.out.println("LongAdder ms    = " + adderMillis);
    }

    private static long runSynchronizedCounter() throws Exception {
        synchronizedCounter = 0;
        long start = System.nanoTime();
        runInParallel(() -> {
            for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
                incrementSynchronized();
            }
        });
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        System.out.println("synchronized count = " + synchronizedCounter);
        return elapsed;
    }

    private static synchronized void incrementSynchronized() {
        synchronizedCounter++;
    }

    private static long runLongAdderCounter() throws Exception {
        LongAdder counter = new LongAdder();
        long start = System.nanoTime();
        runInParallel(() -> {
            for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
                counter.increment();
            }
        });
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        System.out.println("LongAdder count = " + counter.sum());
        return elapsed;
    }

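    // Runs the same task on THREADS workers and waits for all of them to finish.
    private static void runInParallel(Runnable task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < THREADS; i++) {
                futures.add(pool.submit(task));
            }
            for (Future<?> future : futures) {
                future.get(); // rethrows any task failure as ExecutionException
            }
        } finally {
            pool.shutdown(); // always release the worker threads so the JVM can exit
        }
    }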
}

Note: This demo compares a synchronized counter with LongAdder under the same increment load. It has no JIT warm-up and times a single run of each counter, so treat the timings as a shape, not as a benchmark.

Diagnosis and fix

Explanation

A synchronized counter gives correct results by serializing every increment through one monitor.

Key signal: Match the counter type to the workload. An exact value used for coordination and a scalable metric update call for different tools.

  • Every thread must acquire the same monitor for every increment
  • Under heavy concurrency, the monitor becomes a shared bottleneck
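  • AtomicLong removes the monitor, but every thread still updates the same memory location atomically, so that location can remain a hot spot under heavy contention
  • LongAdder improves high-contention throughput by spreading updates across internal cells
  • The tradeoff is that sum() is not an atomic snapshot: updates that land while the cells are being summed may be missed, which is usually fine for telemetry but not for every correctness invariant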
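
As a rough illustration of the last three points, the sketch below runs the same increment load against an AtomicLong and a LongAdder. The class and method names (CounterChoiceSketch, runWorkers, countWithAtomicLong, countWithLongAdder) are made up for this example. Both totals are exact here only because they are read after every writer has been joined; a sum() taken while updates are still in flight may not include them.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

class CounterChoiceSketch {
    // Hypothetical helper: starts `threads` workers running `work` and waits for all of them.
    static void runWorkers(int threads, Runnable work) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(work);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    static long countWithAtomicLong(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();
        runWorkers(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                counter.incrementAndGet(); // every thread updates one shared location
            }
        });
        return counter.get(); // a single, exact current value
    }

    static long countWithLongAdder(int threads, int perThread) {
        LongAdder counter = new LongAdder();
        runWorkers(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment(); // under contention, updates can go to separate cells
            }
        });
        return counter.sum(); // exact here because all writers have finished
    }

    public static void main(String[] args) {
        System.out.println("AtomicLong = " + countWithAtomicLong(4, 100_000));
        System.out.println("LongAdder  = " + countWithLongAdder(4, 100_000));
    }
}
```

The two methods differ only in the counter type, which makes it easy to wrap each call in a timer and compare them on the real hardware.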

How to Diagnose

This is mostly a performance diagnosis. Compare implementations and measure the hot path.

  • Check whether the counter is updated by many threads at high frequency
  • Compare throughput with synchronized, AtomicLong, and LongAdder when appropriate
  • Use a profiler or Java Flight Recorder for real services
  • Look for short but frequent contention rather than one long BLOCKED state
  • Confirm the counter is telemetry-style before switching to LongAdder
  • Keep AtomicLong or another exact protocol when reads must coordinate decisions such as limits, ownership, or state transitions
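
One inexpensive way to look for short but frequent contention, sketched here under assumptions, is to read each thread's cumulative blocked count from the standard ThreadMXBean instead of relying on a single thread dump. The class name BlockedCountSketch and its run method are hypothetical. The count only covers cases where a thread actually blocked to enter a monitor, so treat it as a rough hint rather than a measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicLong;

class BlockedCountSketch {
    private static final Object LOCK = new Object();
    private static long counter;

    // Returns {final count, total blocked count across workers} for the given load.
    static long[] run(int threads, int perThread) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        AtomicLong totalBlocked = new AtomicLong();
        synchronized (LOCK) {
            counter = 0;
        }
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    synchronized (LOCK) {
                        counter++;
                    }
                }
                // Read this thread's own statistics while it is still alive;
                // getThreadInfo returns null for threads that have terminated.
                ThreadInfo info = mx.getThreadInfo(Thread.currentThread().getId());
                if (info != null) {
                    totalBlocked.addAndGet(info.getBlockedCount());
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (LOCK) {
            return new long[] {counter, totalBlocked.get()};
        }
    }

    public static void main(String[] args) {
        long[] result = run(8, 1_000_000);
        System.out.println("count         = " + result[0]);
        System.out.println("blocked count = " + result[1]);
    }
}
```

A blocked count that grows with the thread count while each individual wait stays short matches the pattern described above. Java Flight Recorder remains the better tool for a real service.
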
Optional thread inspection
jps
jcmd <pid> Thread.print
Example timing shape
synchronized count = 8000000
LongAdder count = 8000000
threads = 8
synchronized ms = 620
LongAdder ms    = 95

Note: The exact numbers are machine-dependent. The important signal is correct counts with a large throughput gap under contention.

How to Fix

  • Use LongAdder for high-contention metric-style counters
  • Use AtomicLong for lower contention or when a single linearizable value matters
  • Avoid synchronizing the whole metric path when only the counter is shared
  • Do not use LongAdder for check-then-act decisions that require an exact current value
  • Benchmark the real workload before and after the change
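
For per-endpoint telemetry, a common pattern is one LongAdder per key in a ConcurrentHashMap, so that no lock wraps the whole metric path. The sketch below assumes string endpoint names; EndpointMetricsSketch, record, and requests are hypothetical names for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class EndpointMetricsSketch {
    private final ConcurrentHashMap<String, LongAdder> requestsByEndpoint =
            new ConcurrentHashMap<>();

    // Creates the adder on first use, then increments it without a shared lock.
    void record(String endpoint) {
        requestsByEndpoint.computeIfAbsent(endpoint, key -> new LongAdder()).increment();
    }

    // Telemetry-style read: may miss updates that are in flight during the call.
    long requests(String endpoint) {
        LongAdder adder = requestsByEndpoint.get(endpoint);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        EndpointMetricsSketch metrics = new EndpointMetricsSketch();
        metrics.record("/orders");
        metrics.record("/orders");
        metrics.record("/health");
        System.out.println("/orders = " + metrics.requests("/orders"));
        System.out.println("/health = " + metrics.requests("/health"));
    }
}
```
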
LongAdderCounterFixed.java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

public class LongAdderCounterFixed {
    private static final int THREADS =
            Math.max(4, Runtime.getRuntime().availableProcessors());
    private static final int INCREMENTS_PER_THREAD = 1_000_000;

    private static final LongAdder requests = new LongAdder();

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        try {
            List<Future<?>> futures = new ArrayList<>();

            for (int i = 0; i < THREADS; i++) {
                futures.add(pool.submit(() -> {
                    for (int j = 0; j < INCREMENTS_PER_THREAD; j++) {
                        requests.increment();
                    }
                }));
            }

            // Wait for every writer so sum() runs with no concurrent updates.
            for (Future<?> future : futures) {
                future.get();
            }

            long expected = (long) THREADS * INCREMENTS_PER_THREAD;
            System.out.println("expected = " + expected);
            System.out.println("actual   = " + requests.sum());
        } finally {
            pool.shutdown(); // release worker threads even if a task failed
        }
    }
}

Note: LongAdder is a strong default for high-volume counters used for metrics and telemetry, not for every counter-shaped invariant.
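
To make the contrast concrete, here is a minimal sketch of a counter-shaped invariant that needs an exact value: a limit that must never be exceeded. It uses an AtomicLong compare-and-set loop so the check and the update happen as one atomic step. ExactLimitSketch and its methods are hypothetical names, not part of the demos above.

```java
import java.util.concurrent.atomic.AtomicLong;

class ExactLimitSketch {
    private final long limit;
    private final AtomicLong claimed = new AtomicLong();

    ExactLimitSketch(long limit) {
        this.limit = limit;
    }

    // Returns true if a slot was claimed; never lets the claimed count exceed the limit.
    boolean tryAcquire() {
        while (true) {
            long current = claimed.get();
            if (current >= limit) {
                return false; // limit reached at the moment of the check
            }
            if (claimed.compareAndSet(current, current + 1)) {
                return true; // check and update succeeded atomically
            }
            // Another thread changed the value first; re-read and retry.
        }
    }

    void release() {
        claimed.decrementAndGet();
    }

    long inUse() {
        return claimed.get();
    }

    public static void main(String[] args) {
        ExactLimitSketch limiter = new ExactLimitSketch(2);
        System.out.println(limiter.tryAcquire());
        System.out.println(limiter.tryAcquire());
        System.out.println(limiter.tryAcquire());
    }
}
```

A LongAdder cannot express this: reading sum() and then calling increment() are two separate steps, so several threads could pass the check together and overshoot the limit.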