CPython GIL: The Global Interpreter Lock¶
Overview¶
The Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple native threads from executing Python bytecodes simultaneously. It is one of the most discussed — and misunderstood — aspects of CPython.
The GIL is not a language feature. It is an implementation detail of CPython. Other implementations like Jython, IronPython, and PyPy-STM do not have it.
Thread 1 ──── acquire GIL ──── run bytecode ──── release GIL ────▶
Thread 2 ──────────────────── waiting ──────── acquire GIL ──── run ──▶
Thread 3 ──────────────────────────────────────── waiting ──────────▶
1. Why Does the GIL Exist?¶
The Memory Management Problem¶
CPython uses reference counting for garbage collection. Every Python object stores a count of how many references point to it:
// Simplified from CPython: Include/object.h
typedef struct _object {
    Py_ssize_t ob_refcnt;    // reference count
    PyTypeObject *ob_type;   // pointer to the object's type
} PyObject;
When two threads simultaneously increment or decrement ob_refcnt, you get a race condition:
Thread 1: load ob_refcnt (= 1)
Thread 2: load ob_refcnt (= 1)
Thread 1: decrement → 0 → FREE OBJECT
Thread 2: decrement → 0 → FREE OBJECT (double-free! crash!)
The GIL prevents this by ensuring only one thread mutates object state at a time.
Why Not Fine-Grained Locking?¶
The obvious alternative — putting a lock on every object — was attempted and abandoned:
- Performance overhead: every attribute access, list append, or dict lookup would require lock/unlock
- Deadlock complexity: acquiring multiple object locks in consistent order is extremely hard
- Single-threaded regression: programs that don't use threads at all got slower
Guido van Rossum chose the GIL as a pragmatic tradeoff to make CPython easy to integrate with C extensions and correct by default.
2. How the GIL Works Internally¶
The Lock Itself¶
In modern CPython (3.2+), the GIL is implemented using a condition variable + mutex pair, not a simple mutex:
// Simplified from Python/ceval_gil.c
struct _gil_runtime_state {
    unsigned long interval;        // check interval in microseconds (default: 5000)
    _Py_atomic_int locked;         // 0 = free, 1 = held
    unsigned long switch_number;   // incremented on every GIL transfer
    PyCOND_T cond;                 // condition variable to wake waiting threads
    PyMUTEX_T mutex;               // protects the fields above
    _Py_atomic_int eval_breaker;   // signal to drop GIL
};
The Check Interval¶
CPython periodically signals the running thread to drop the GIL so other threads get a chance. This is controlled by sys.getswitchinterval():
import sys
print(sys.getswitchinterval()) # 0.005 (5 milliseconds, default)
sys.setswitchinterval(0.001) # check every 1ms (more context switches)
Internally, the running thread checks a flag (eval_breaker in modern CPython) and drops the GIL when another thread has requested it — every N bytecode instructions before 3.2 (sys.setcheckinterval), or after the 5 ms switch interval elapses in 3.2+ (wall-clock time).
GIL Acquisition Flow¶
flowchart TD
A[Thread wants to run Python] --> B{GIL locked?}
B -- No --> C[Acquire GIL, set locked=1]
B -- Yes --> D[Request drop: set eval_breaker]
D --> E[Wait on condition variable]
F[Running thread sees eval_breaker] --> G[Has another thread requested?]
G -- No --> H[Continue running]
G -- Yes --> I[Release GIL, signal cond var]
I --> J[Other thread wakes, acquires GIL]
C --> K[Execute bytecode]
K --> L{eval_breaker set?}
L -- Yes --> G
L -- No --> K
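The handoff protocol in the flowchart can be modeled in pure Python with a condition variable + flag pair. This is a toy sketch for intuition only — the real implementation lives in Python/ceval_gil.c and is considerably more involved:

```python
import threading

class ToyGIL:
    """Toy lock mirroring the GIL's condition-variable handoff (not the real thing)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._cond = threading.Condition(self._mutex)
        self._locked = False
        self._switch_number = 0          # incremented on every handoff

    def acquire(self):
        with self._cond:
            while self._locked:          # another thread holds the "GIL"
                self._cond.wait()        # sleep until signalled
            self._locked = True

    def release(self):
        with self._cond:
            self._locked = False
            self._switch_number += 1
            self._cond.notify()          # wake one waiting thread

gil = ToyGIL()

def worker(name, results):
    for _ in range(3):
        gil.acquire()
        results.append(name)             # "run bytecode" while holding the lock
        gil.release()

results = []
threads = [threading.Thread(target=worker, args=(n, results)) for n in "AB"]
for t in threads: t.start()
for t in threads: t.join()
print(results)  # interleaving varies run to run
```

Just like the real GIL, nothing forces fairness here: the releasing thread may reacquire before a woken waiter gets scheduled, which is exactly the starvation problem the 3.2 rewrite addressed.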
Dropping the GIL Around I/O¶
The GIL is released during blocking I/O and other long-running C operations so other threads can run Python:
// Example: how socket.recv() releases the GIL
// Modules/socketmodule.c (simplified)
Py_BEGIN_ALLOW_THREADS // <-- release GIL
n = recv(s->sock_fd, buf, len, flags);
Py_END_ALLOW_THREADS // <-- reacquire GIL
These macros are fundamental to writing GIL-aware C extensions:
| Macro | Effect |
|---|---|
| Py_BEGIN_ALLOW_THREADS | Save thread state, release GIL |
| Py_END_ALLOW_THREADS | Reacquire GIL, restore thread state |
| Py_BLOCK_THREADS | Reacquire GIL (inside an allow block) |
| Py_UNBLOCK_THREADS | Release GIL (inside an allow block) |
3. What the GIL Means for You¶
CPU-Bound Code: GIL Hurts¶
If you use threading for CPU-intensive work, you will not get parallelism — threads take turns:
import threading, time

COUNT = 50_000_000

def countdown(n):
    while n > 0:
        n -= 1

# Single-threaded
start = time.time()
countdown(COUNT)
print(f"Single: {time.time() - start:.2f}s")    # ~2.0s

# Two threads — NOT 2x faster, roughly the same or slower
t1 = threading.Thread(target=countdown, args=(COUNT // 2,))
t2 = threading.Thread(target=countdown, args=(COUNT // 2,))
start = time.time()
t1.start(); t2.start()
t1.join(); t2.join()
print(f"Threaded: {time.time() - start:.2f}s")  # ~2.1s (overhead!)
I/O-Bound Code: GIL Doesn't Matter¶
Because the GIL is released during I/O, threading works well for network/disk bound work:
import threading, urllib.request, time

urls = ["https://httpbin.org/delay/1"] * 5

def fetch(url):
    urllib.request.urlopen(url)

# Sequential: ~5 seconds
# Threaded: ~1 second (all requests in parallel)
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
start = time.time()
for t in threads: t.start()
for t in threads: t.join()
print(f"Elapsed: {time.time() - start:.2f}s")
The Right Tool for Each Job¶
| Workload | Best Tool | Why |
|---|---|---|
| I/O-bound (network, disk) | threading or asyncio | GIL released during I/O |
| CPU-bound computation | multiprocessing | Separate processes, each with its own GIL |
| CPU-bound with NumPy | threading | NumPy releases the GIL in its C routines |
| Mixed I/O + CPU | concurrent.futures | Unified API for both thread and process pools |
4. Working Around the GIL¶
multiprocessing: True Parallelism¶
Each process has its own GIL and memory space:
from multiprocessing import Pool

def heavy_compute(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":    # required on platforms using the spawn start method
    with Pool(processes=4) as pool:
        results = pool.map(heavy_compute, [10_000_000] * 4)
Overhead: inter-process communication (IPC) via serialization (pickle). Not free — don't use for tiny tasks.
concurrent.futures: Unified Interface¶
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# I/O-bound: use threads
with ThreadPoolExecutor(max_workers=8) as exe:
    futures = [exe.submit(fetch_url, url) for url in urls]
    results = [f.result() for f in futures]

# CPU-bound: use processes
with ProcessPoolExecutor(max_workers=4) as exe:
    futures = [exe.submit(crunch_numbers, data) for data in chunks]
    results = [f.result() for f in futures]
NumPy, SciPy, Pandas: GIL Released in C¶
Well-written C extensions release the GIL around their hot loops. This means threading does parallelize NumPy operations:
import numpy as np
import threading

arr = np.random.rand(10_000_000)

def compute(a):
    return np.sqrt(a).sum()   # GIL released inside NumPy's C code

t1 = threading.Thread(target=compute, args=(arr,))
t2 = threading.Thread(target=compute, args=(arr,))
t1.start(); t2.start()
t1.join(); t2.join()
# Actually parallel!
ctypes / cffi: Manual GIL Release¶
When calling C code via ctypes, functions loaded through ctypes.CDLL release the GIL around each foreign call automatically; use ctypes.PyDLL instead when the C function calls back into Python and must keep the GIL held:
import ctypes
import ctypes.util

# CDLL: the GIL is dropped for the duration of each foreign call
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
length = libc.strlen(b"hello")   # other Python threads can run meanwhile
Cython: nogil Blocks¶
Cython lets you explicitly mark code regions as GIL-free:
# mymodule.pyx
from cython.parallel import prange

def parallel_sum(double[:] arr):
    cdef double total = 0.0
    cdef Py_ssize_t i
    with nogil:                          # release the GIL here
        for i in prange(arr.shape[0]):   # OpenMP parallel loop; += becomes a reduction
            total += arr[i]              # note: len() needs the GIL, shape[0] does not
    return total
5. asyncio and the GIL¶
asyncio is single-threaded cooperative concurrency — it avoids the GIL problem entirely by never creating multiple threads. Everything runs in one thread on an event loop:
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()   # yields control, no GIL issue

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)

asyncio.run(main())
When await suspends a coroutine, the event loop runs another — no thread switching, no GIL contention.
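The same pattern can be demonstrated without third-party packages, using asyncio.sleep as a stand-in for network latency — five "requests" complete in roughly the time of one:

```python
import asyncio
import time

async def fake_fetch(i):
    await asyncio.sleep(0.1)   # stand-in for a network call; yields to the event loop
    return i

async def main():
    start = time.perf_counter()
    # All five coroutines are in flight concurrently on one thread
    results = await asyncio.gather(*(fake_fetch(i) for i in range(5)))
    print(results)                                # [0, 1, 2, 3, 4]
    print(f"{time.perf_counter() - start:.2f}s")  # ~0.10s, not 0.50s
    return results

results = asyncio.run(main())
```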
6. The GIL and C Extensions¶
Writing C extensions that behave correctly under the GIL requires understanding thread state.
Thread State¶
Every OS thread that runs Python has a PyThreadState struct:
// Simplified from Include/cpython/pystate.h (pre-3.11 layout)
typedef struct _ts {
    struct _ts *next;
    PyInterpreterState *interp;
    struct _frame *frame;     // current execution frame
    int recursion_depth;
    // ... many more fields
} PyThreadState;
The GIL is tied to the current thread state. When you call Py_BEGIN_ALLOW_THREADS, CPython saves the current PyThreadState* and releases the GIL. On Py_END_ALLOW_THREADS it reacquires the GIL and restores state.
Safe Patterns in C Extensions¶
// Pattern 1: release GIL around blocking C call
static PyObject *
my_blocking_function(PyObject *self, PyObject *args)
{
    int result;
    Py_BEGIN_ALLOW_THREADS
    result = slow_c_library_call();   // may block for seconds
    Py_END_ALLOW_THREADS
    return PyLong_FromLong(result);
}
// Pattern 2: callback from C that needs to call Python
// Must hold the GIL!
void c_callback(void *data) {
    PyGILState_STATE gstate = PyGILState_Ensure();   // acquire GIL + thread state
    PyObject *result = PyObject_CallFunction(callback, "O", (PyObject *)data);
    Py_XDECREF(result);
    PyGILState_Release(gstate);                      // release GIL
}
PyGILState_Ensure / PyGILState_Release¶
Used when a non-Python thread (e.g. a C library callback) needs to call Python:
// Thread created by a C library, not by Python
void worker_thread(void) {
    // This thread has no PyThreadState yet
    PyGILState_STATE state = PyGILState_Ensure();
    // Now we hold the GIL and have a valid thread state
    PyRun_SimpleString("print('hello from C thread')");
    PyGILState_Release(state);
}
7. Diagnosing GIL Contention¶
sys.monitoring / sys.settrace (Basic)¶
import sys

call_count = 0

def trace(frame, event, arg):
    # settrace cannot observe GIL handoffs directly; counting function
    # calls is only a crude proxy for which threads are making progress
    global call_count
    if event == 'call':
        call_count += 1
    return trace

sys.settrace(trace)
gilknocker and gil_load (Third-Party)¶
from gilknocker import KnockKnock

# gilknocker samples how often the GIL is held; API sketch only — names
# have shifted between versions, so check the project's docs
knocker = KnockKnock(interval_micros=1000)
knocker.start()
do_your_work()
knocker.stop()
print(f"GIL contention: {knocker.contention_metric:.1%}")
# e.g. "GIL contention: 73.4%"
py-spy (Production Profiling)¶
pip install py-spy
py-spy top --pid 12345 # live view
py-spy record -o profile.svg -- python myscript.py
Identifying Bottlenecks¶
import cProfile, pstats

with cProfile.Profile() as pr:
    run_multithreaded_code()

stats = pstats.Stats(pr)
stats.sort_stats('cumulative')
stats.print_stats(20)
Look for functions with high tottime in CPU-bound code — those are holding the GIL. If wait or acquire shows up prominently, you have contention.
8. GIL Changes Across Python Versions¶
| Version | Change |
|---|---|
| ≤ 3.1 | GIL dropped every N bytecode instructions (sys.setcheckinterval(), default 100) |
| 3.2 | New GIL: time-based (5ms), condition variable, fairer between threads (Antoine Pitrou) |
| 3.9–3.11 | Groundwork: global runtime state incrementally moved per-interpreter, paving the way for PEP 684 |
| 3.12 | Per-interpreter GIL (PEP 684) — each subinterpreter gets its own GIL |
| 3.13 | Free-threaded CPython (PEP 703, experimental) — GIL optional via python3.13t |
PEP 703: Free-Threaded Python (3.13+)¶
The most significant GIL change in CPython's history. You can now install a GIL-free build:
# Install free-threaded Python
pyenv install 3.13t
# Or with python.org installer, check "free-threaded" option
python3.13t -c "import sys; print(sys._is_gil_enabled())" # False
Whether the GIL is enabled is decided at interpreter startup; there is no supported API to re-enable it from running code. Use the PYTHON_GIL environment variable or the -X gil option when launching:
# Force the GIL back on in a free-threaded build
# (for compatibility with non-thread-safe C extensions)
PYTHON_GIL=1 python3.13t script.py
python3.13t -X gil=1 script.py
Tradeoff: Without the GIL, single-threaded code runs ~40% slower in 3.13t due to atomic reference counting overhead. This gap is expected to close in future versions.
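To detect the build variant at runtime, a small sketch — the sysconfig check works on any recent CPython, while sys._is_gil_enabled() only exists on 3.13+, so it is guarded:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print("free-threaded build:", free_threaded_build)

# Only present on 3.13+; on a free-threaded build it can still return True
# if the GIL was forced back on (PYTHON_GIL=1 or an incompatible extension)
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently enabled:", sys._is_gil_enabled())
```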
9. PEP 684: Per-Interpreter GIL (3.12+)¶
PEP 684 allows running multiple Python interpreters in the same process, each with its own GIL:
import _xxsubinterpreters as interpreters  # low-level, private API (CPython 3.12; renamed _interpreters in 3.13)

# Create a sub-interpreter — in 3.12 it gets its own GIL (PEP 684)
interp_id = interpreters.create()

# Run code in it — truly parallel with the main interpreter!
interpreters.run_string(interp_id, """
import math
result = math.factorial(10000)
""")

interpreters.destroy(interp_id)
PEP 734 proposes a higher-level interpreters module (shipped as concurrent.interpreters in CPython 3.14) with queues for safe inter-interpreter communication. The sketch below follows the PEP's draft API; exact names may differ in the released module:
# Draft PEP 734 API — illustrative, not final
import interpreters

interp = interpreters.create()
queue = interpreters.create_queue()
interp.prepare_main(q=queue)   # bind the queue in the sub-interpreter's __main__
interp.exec("""
result = sum(i * i for i in range(1_000_000))
q.put(result)
""")
output = queue.get()
Mental Model¶
flowchart TB
subgraph Process["Python Process"]
GIL(["🔒 GIL"])
subgraph Threads["Threads"]
T1["Thread 1\n▶ running bytecode"]
T2["Thread 2\n⏸ waiting"]
T3["Thread 3\n⏸ waiting"]
end
T1 -->|holds| GIL
T2 -.->|wants| GIL
T3 -.->|wants| GIL
end
note["EXCEPT: C extensions that call\nPy_BEGIN_ALLOW_THREADS\ncan run concurrently"]
Process --- note
style T1 fill:#4caf50,color:#fff
style T2 fill:#ef9a9a,color:#333
style T3 fill:#ef9a9a,color:#333
style GIL fill:#ff9800,color:#fff,stroke:#e65100
style note fill:#e3f2fd,color:#333,stroke:#90caf9
Key rules to remember:
- CPU-bound Python code → multiprocessing, not threading
- I/O-bound code → threading or asyncio both work fine
- NumPy/SciPy → threading works because the C code releases the GIL
- C extensions → use Py_BEGIN/END_ALLOW_THREADS around blocking calls
- Python 3.13t → experimental free-threaded build, for true parallelism without multiprocessing
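The first two rules can be wrapped in a small dispatch helper. A sketch — run_parallel and its signature are illustrative, not a stdlib API:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_parallel(fn, items, *, cpu_bound, max_workers=4):
    """Run fn over items with processes (CPU-bound) or threads (I/O-bound)."""
    # CPU-bound work needs separate processes to sidestep the GIL;
    # I/O-bound work can share one process because the GIL is dropped during I/O.
    # Note: with cpu_bound=True, fn must be picklable (defined at module level).
    executor_cls = ProcessPoolExecutor if cpu_bound else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as exe:
        return list(exe.map(fn, items))

# I/O-style usage: threads suffice
lengths = run_parallel(len, ["alpha", "beta", "gamma"], cpu_bound=False)
print(lengths)  # [5, 4, 5]
```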
Further Reading¶
- Python docs: The Global Interpreter Lock
- PEP 703 — Making the GIL Optional
- PEP 684 — A Per-Interpreter GIL
- PEP 734 — Multiple Interpreters in the Stdlib
- CPython source: Python/ceval_gil.c, Python/pystate.c