Algorithms & Complexity

Binary Search Explained: Why O(log n) Beats Linear Search and How to Get It Right

Binary search is one of the most elegant algorithms in computer science: it solves the search problem in logarithmic time by exploiting the structure of a sorted array. It is also notorious for hiding subtle bugs that only surface on edge cases. This guide covers the theory, two correct implementations, every common mistake, and the variants you will actually use in production.

Published June 30, 2026

The Precondition: Sorted Input

Binary search requires the array to be sorted. This is not a suggestion; it is a hard precondition. If the array is unsorted, binary search will return incorrect results silently — it will not crash or raise an exception. The algorithm works by inferring the direction to search based on the comparison at the midpoint. That inference is only valid when elements are in order.

The sorting step costs O(n log n). So if you need to search an unsorted collection exactly once, a linear scan at O(n) is faster overall. Binary search pays off when you search the same sorted dataset many times: sort once, then each subsequent search is O(log n). This is why database indexes, which are pre-sorted B-tree structures, make repeated lookups fast. See the discussion in database indexes explained for how this extends to disk-based search structures.

Why O(log n)?

Each comparison eliminates half the remaining candidates. Starting with n elements: after one comparison you have at most n/2 candidates; after two, n/4; after k comparisons, n/2^k. The search ends when n/2^k reaches 1, which happens when k = log2(n). Searching one billion elements takes at most 30 comparisons. A linear search of one billion elements takes up to one billion comparisons. That is the complexity gap in practice, not just in theory.

Iterative Implementation

def binary_search(arr: list, target: int) -> int:
    """
    Returns the index of target in arr, or -1 if not found.
    arr must be sorted in ascending order.
    """
    left, right = 0, len(arr) - 1

    while left <= right:
        # Overflow-safe midpoint: avoid (left + right) // 2
        mid = left + (right - left) // 2

        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1

    return -1

# Examples
nums = [1, 3, 5, 7, 9, 11, 13, 15]
print(binary_search(nums, 7))    # 3
print(binary_search(nums, 6))    # -1
print(binary_search(nums, 1))    # 0  (left boundary)
print(binary_search(nums, 15))   # 7  (right boundary)

Recursive Implementation

def binary_search_recursive(arr: list, target: int,
                             left: int = 0, right: int = None) -> int:
    if right is None:
        right = len(arr) - 1

    if left > right:
        return -1

    mid = left + (right - left) // 2

    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        return binary_search_recursive(arr, target, mid + 1, right)
    else:
        return binary_search_recursive(arr, target, left, mid - 1)

Both implementations are O(log n) time and the iterative version is O(1) space. The recursive version uses O(log n) stack space due to call frames. For very large arrays in environments with limited stack size, prefer the iterative approach. The logic is identical; the recursion just expresses it differently.

Common Bugs: FAQ

Q: Why use mid = left + (right - left) // 2 instead of (left + right) // 2?

In languages with fixed-width integers (C, Java, C++), left + right can overflow when both values are large. If left = 1_000_000_000 and right = 1_500_000_000, their sum exceeds the 32-bit signed integer limit and wraps to a negative number, producing a garbage index. The expression left + (right - left) // 2 never overflows because right - left is always non-negative and smaller than either operand. Python's integers are arbitrary-precision, so this bug does not bite there — but the habit is worth keeping, especially when porting code.

Q: Should the loop condition be left <= right or left < right?

left <= right is the standard condition for searching for an exact match. Using left < right will fail to check the last remaining element when the search space narrows to a single index. You would miss the target when it sits at the final position. For the insertion point variants (bisect_left / bisect_right), the loop runs while left < right — but that is a different algorithm with different invariants.

Q: My binary search returns the wrong index when there are duplicate values. What is happening?

The standard binary search finds an occurrence of the target, not necessarily the first or last. With duplicates, it may land on any one of them depending on the distribution. If you need the first occurrence or the last occurrence, you need a variant. See the bisect section below.

Q: What happens if I forget to update left or right after finding a non-match?

If you write left = mid instead of left = mid + 1, the loop can get stuck in an infinite cycle when left == right - 1. Always move past the midpoint: left = mid + 1 and right = mid - 1. This ensures the search space strictly shrinks each iteration.

Variants: bisect_left, First Occurrence, Last Occurrence

import bisect

nums = [1, 3, 5, 5, 5, 7, 9]

# Find insertion point for 5 (leftmost position where 5 can go)
print(bisect.bisect_left(nums, 5))   # 2

# Find insertion point for 5 on the right side
print(bisect.bisect_right(nums, 5))  # 5

# Manual: find index of first occurrence
def first_occurrence(arr: list, target: int) -> int:
    idx = bisect.bisect_left(arr, target)
    if idx < len(arr) and arr[idx] == target:
        return idx
    return -1

# Manual: find index of last occurrence
def last_occurrence(arr: list, target: int) -> int:
    idx = bisect.bisect_right(arr, target) - 1
    if idx >= 0 and arr[idx] == target:
        return idx
    return -1

print(first_occurrence(nums, 5))   # 2
print(last_occurrence(nums, 5))    # 4

In C++, the standard library provides std::lower_bound (equivalent to bisect_left) and std::upper_bound (bisect_right). Both operate on any sorted range via iterators, not just arrays.

When NOT to Use Binary Search

Unsorted data. You must sort first (O(n log n)), which makes a single search O(n log n) total — worse than linear O(n).
Very small arrays (n < 20). Linear search has better cache behavior and no branching overhead. The crossover point varies by hardware, but binary search's advantage only becomes meaningful at moderate sizes.
Hash-based lookups. If you need O(1) average-case lookup and do not care about order, a hash set or hash map beats binary search. Binary search's advantage is that it also supports range queries (find all elements between x and y), predecessor/successor queries, and floor/ceiling lookups — operations hash tables cannot perform efficiently.

Understanding binary search also deepens your grasp of Big O notation and recursive thinking more broadly. The halving intuition appears in merge sort, quickselect, segment trees, and binary lifting — it is a pattern worth internalizing.