CS Fundamentals

Common Data Structures Explained: Arrays, Linked Lists, Hash Tables, and Trees

Four data structures underpin the vast majority of production code: arrays, linked lists, hash tables, and binary trees. Understanding their internal layout, time complexities, and failure modes tells you which one to reach for without having to guess.

Published June 28, 2026

Choosing the wrong data structure does not usually cause a crash — it causes a performance cliff that only appears at scale. An O(n) lookup that runs in 0.1 ms on a list of 100 items runs in 10 seconds on a list of 100,000. The cost of the wrong choice compounds silently until it does not.

Arrays

An array stores elements in contiguous memory. Because the base address and element size are both known, the runtime can jump directly to any position with a single calculation: address = base + index * element_size. This makes random access O(1) — the fastest possible.

# Python list is a dynamic array under the hood
items = [10, 20, 30, 40]

# O(1): direct index access
value = items[2]       # 30

# O(1) amortized: append to end (occasionally triggers a resize and copy)
items.append(50)

# O(n): insert at position 0 shifts every element right
items.insert(0, 5)

# O(n): search by value requires scanning
found = 30 in items

The weakness of arrays is insertion and deletion in the middle. Inserting at index 0 requires shifting every subsequent element one position to the right, which costs O(n) time. If your workload involves many insertions at arbitrary positions, an array is the wrong choice.

Dynamic arrays (Python list, Java ArrayList, C++ vector) handle growth automatically by allocating a new, larger buffer and copying when the array fills. Appends are O(1) amortized because copies happen infrequently enough that their cost averages out across many appends.

Linked Lists

A linked list stores elements in nodes scattered through memory. Each node holds a value and a pointer to the next node. There is no index math — to reach position n, you must follow n pointers from the head.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def prepend(self, value):
        # O(1): create a new head, point it at the old head
        node = Node(value)
        node.next = self.head
        self.head = node

    def find(self, value):
        # O(n): must walk the chain
        current = self.head
        while current:
            if current.value == value:
                return current
            current = current.next
        return None

Linked lists make insertion and deletion at a known position O(1) — just update two pointers. The cost is that reaching that position in the first place requires O(n) traversal. They also carry a constant overhead per element (the pointer) and have poor cache locality because nodes are scattered in memory, which matters for performance on modern CPUs.

In practice, linked lists are most useful as the internal building block of other structures (queues, stacks, LRU caches). For most application-level code, a dynamic array outperforms a linked list even for operations that theoretically favor linked lists, because arrays are cache-friendly.

Hash Tables

A hash table stores key-value pairs by mapping each key to a bucket index using a hash function. A well-distributed hash function gives O(1) average-case lookup, insertion, and deletion — independent of the number of stored items.

# Python dict is a hash table
cache = {}

# O(1) average: compute hash, find bucket, store
cache["user:42"] = {"name": "Alice", "role": "admin"}

# O(1) average: same hash computation to retrieve
user = cache.get("user:42")

# O(1) average: same to remove
del cache["user:42"]

# Collision example: two keys can hash to the same bucket
# Python dicts handle this with open addressing (probing)
# The load factor (filled / total capacity) is kept below ~0.7
# by resizing before it degrades to O(n)

Performance degrades to O(n) in the worst case if many keys collide in the same bucket, though a good hash function makes this rare with real data. The other practical concern is load factor: as a hash table fills, collisions increase. Most implementations resize automatically when the load factor exceeds a threshold, which costs O(n) at that moment but keeps amortized cost at O(1).

Hash tables do not preserve insertion order in most languages (Python dicts do, as of 3.7, but this is an implementation detail). They also cannot efficiently answer range queries ("give me all keys between A and B") because hashing deliberately destroys the ordering of keys. For ordered operations, use a tree.

Binary Trees and BSTs

A binary tree is a linked structure where each node has at most two children (left and right). A binary search tree (BST) adds an ordering invariant: every key in a node's left subtree is smaller, and every key in the right subtree is larger. This invariant allows O(log n) search in a balanced tree.

class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(node, key):
    # O(log n) for a balanced tree
    if node is None:
        return BSTNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def search(node, key):
    if node is None or node.key == key:
        return node
    if key < node.key:
        return search(node.left, key)
    return search(node.right, key)

An unbalanced BST degenerates into a linked list when keys are inserted in sorted order (every new node becomes a right child, producing a chain). Self-balancing variants — AVL trees, red-black trees — automatically restructure after insertions and deletions to maintain O(log n) height. Most standard library ordered maps (Java TreeMap, C++ std::map, Python's sortedcontainers) use a red-black tree or B-tree internally.

Trees excel at range queries: an in-order traversal visits keys in sorted order, so "find all keys between 100 and 200" is O(log n + k) where k is the number of results. Databases lean on B-trees (a generalization allowing many children per node) for exactly this reason.

Choosing the right structure

The decision is driven by the dominant operation in your workload:

Indexed access by position: array.
Frequent insertion or removal at known positions: linked list, but verify that cache effects do not erase the theoretical win.
Lookup, insert, or delete by key: hash table. This is the correct default for associative lookups.
Ordered traversal or range queries: BST or B-tree (or a sorted array if the data is mostly static).

In most application code, "should I use an array or a linked list?" is answered by "array" — cache friendliness dominates. "Should I use an array or a hash table for key lookup?" is answered by "hash table" — O(1) vs O(n) dominates. The tree becomes relevant when you need both O(log n) lookup and ordered enumeration, which is exactly the situation databases face constantly.