CS Fundamentals

Garbage Collection Explained: How Languages Manage Memory

Most modern languages free you from manual memory management, but the garbage collector doing that work has costs — pauses, throughput overhead, and tuning parameters that determine whether your service handles load smoothly or stutters unpredictably.

Published June 25, 2026

In C and C++, the programmer explicitly allocates memory with malloc or new and releases it with free or delete. Forget to free and you leak memory. Free too early and you have a dangling pointer. Free twice and you corrupt the heap. Garbage collection replaces this manual discipline with an automatic system that tracks which memory is still reachable from the program and reclaims the rest.

The trade-off is not free. A garbage collector has to do work your program is not doing, and some of that work requires stopping your program temporarily. Understanding how collectors work helps you write code that works with the GC rather than against it, and helps you diagnose the latency spikes that GC pauses cause in production.

What makes memory "garbage"?

Memory is garbage when no part of the running program can reach it through any chain of references. Reachability starts from roots: local variables on the stack, global variables, and CPU registers. If you can walk from a root through a sequence of pointer dereferences and reach an object, that object is live. If you cannot, it is unreachable and safe to collect.

This is a stronger condition than "nobody is using it." A memory leak in a garbage-collected language is almost always an object that is technically reachable — held in a list, a cache, or an event listener — but will never actually be used again. The GC cannot collect it because it follows references, not intent.

Mark-and-sweep

The foundational algorithm is mark-and-sweep. The collector runs in two phases. In the mark phase, it traverses all reachable objects starting from the roots and marks each one as live. In the sweep phase, it scans the entire heap and reclaims any object that was not marked.

# Conceptual walk-through (not real code)
def mark_and_sweep(roots, heap):
    live = set()

    # Mark phase: traverse reachable objects
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj not in live:
            live.add(obj)
            worklist.extend(obj.references())

    # Sweep phase: reclaim unreachable objects
    for obj in heap:
        if obj not in live:
            heap.free(obj)

The classic problem with naive mark-and-sweep is fragmentation: after many collections, free memory is scattered in small chunks between live objects. Allocating a large object may fail even when total free memory is sufficient. Modern collectors address this with compaction, moving live objects together to consolidate free space — at the cost of updating every pointer that referred to a moved object.

Stop-the-world pauses

The simplest way to run a mark phase is to stop all application threads first. With all threads paused, no references change during the traversal and the collector sees a consistent snapshot. This is a stop-the-world (STW) pause. For small heaps the pause is imperceptible. For a JVM heap of 16 GB, a full GC pause can last several seconds — enough to trip load-balancer health checks, timeout waiting database connections, and cause cascading failures.

Modern collectors use various techniques to reduce or eliminate STW pauses:

Concurrent marking: run the mark phase alongside application threads, using a write barrier to track any references that change during marking.
Incremental collection: break the work into short slices interleaved with application execution.
Parallel collection: use multiple GC threads to do the work faster (shortens pauses but does not eliminate them).

The JVM's G1 and ZGC collectors, the Go runtime's concurrent GC, and .NET's background GC all combine these techniques to keep pauses under a few milliseconds even on large heaps.

Reference counting

An alternative approach is reference counting: each object stores a count of how many references point to it. When a reference is created, the count increments. When a reference drops out of scope or is reassigned, the count decrements. When the count reaches zero, the object is immediately freed.

# Python's CPython implementation uses reference counting
import sys

a = [1, 2, 3]
print(sys.getrefcount(a))   # 2 (one for `a`, one for the getrefcount argument)

b = a
print(sys.getrefcount(a))   # 3

del b
print(sys.getrefcount(a))   # 2 again

Reference counting has attractive properties: memory is freed immediately when it becomes unreachable, pauses are distributed across the program rather than concentrated in a single GC cycle, and the runtime overhead is predictable. CPython uses it as its primary memory management strategy.

The fundamental weakness is cyclic references. If object A holds a reference to B, and B holds a reference back to A, both counts stay at one even when the rest of the program can no longer reach either. CPython supplements its reference counting with a cycle-detecting collector that specifically hunts for these isolated reference cycles.

Rust avoids garbage collection entirely: its ownership and borrowing system proves at compile time that memory is freed exactly once at the right time. The runtime cost is zero; the cost is a stricter type system that requires upfront thought about ownership.

Generational garbage collection

The generational hypothesis, backed by empirical data across many programs, states that most objects die young. A string built to format a log message, a request object created per HTTP call, a temporary list used in a sort — these are allocated, used briefly, and then become garbage. A small number of objects (database connections, caches, module-level constants) survive for the lifetime of the program.

Generational collectors exploit this by dividing the heap into generations, typically a small young (or nursery) generation and one or more old generations. New objects are allocated in the young generation. Collection of the young generation is fast because it is small, and most objects there are garbage anyway — you collect a lot of garbage for a small scan. Objects that survive several collections are promoted to the old generation, which is collected less frequently.

The JVM's generational model is explicit in its collector names. The young generation uses a stop-the-world copying collector (very fast for small spaces). The old generation uses a more sophisticated algorithm that tolerates larger pauses. G1 divides both generations into equal-sized regions and collects whichever regions have the most garbage first (hence "Garbage First").

Writing GC-friendly code

Most of the time you should not think about the GC and trust that it handles allocation correctly. But a few patterns can cause measurable problems at scale.

Allocation rate matters more than live set size. A GC that collects a 100 MB young generation 50 times per second is doing more work than one that collects a 2 GB old generation once per minute. If your code allocates millions of small, short-lived objects in hot paths, try to reuse them with object pools or preallocate slices with known capacity.

# Wasteful: building a new list for each call
def process_items(items):
    result = []          # new allocation every call
    for item in items:
        result.append(transform(item))
    return result

# Better in tight loops: reuse a preallocated buffer
result_buffer = []
def process_items_fast(items, buf):
    buf.clear()
    for item in items:
        buf.append(transform(item))
    return buf

Memory leaks in GC languages are usually reference retention. Common patterns: storing objects in a module-level dict or list without a removal path; event listeners or callbacks that hold a reference to a large object; caches without a size bound or eviction policy. Use weak references (weakref in Python, WeakHashMap in Java) where you want the GC to collect an object even if a cache still holds a reference to it.

Understand your runtime's GC tuning options. For the JVM, the key levers are heap size (-Xmx), young generation size (-Xmn), and collector choice (-XX:+UseG1GC, -XX:+UseZGC). Go exposes a GOGC environment variable that controls how aggressively it collects relative to live heap size. Defaults are sensible starting points but not optimal for all workloads — a latency-sensitive service usually wants smaller, more frequent collections over large, infrequent ones.

When GC is not the answer

For systems where deterministic, sub-millisecond latency is a hard requirement — real-time audio processing, high-frequency trading, embedded systems — GC pauses are often unacceptable regardless of tuning. These domains use C, C++, or Rust, where memory management is explicit and predictable. The trade-off is programmer responsibility: all the bugs that GC prevents (leaks, use-after-free, double-free) become possible again and must be prevented through discipline, tooling (Valgrind, AddressSanitizer), and language features (Rust's borrow checker).

For the vast majority of application development — web services, data pipelines, command-line tools — a modern GC adds a few percent overhead and removes an entire category of bugs. The pauses become relevant only at scale, and even then, choosing the right collector and tuning a handful of parameters usually brings them under control.