CPython GIL: The Global Interpreter Lock¶
Overview¶
The Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple native threads from executing Python bytecodes simultaneously. It is one of the most discussed — and misunderstood — aspects of CPython.
The GIL is not a language feature. It is an implementation detail of CPython. Other implementations like Jython, IronPython, and PyPy-STM do not have it.
Thread 1 ──── acquire GIL ──── run bytecode ──── release GIL ────▶
Thread 2 ──────────────────── waiting ──────── acquire GIL ──── run ──▶
Thread 3 ──────────────────────────────────────── waiting ──────────▶
1. Why Does the GIL Exist?¶
The Memory Management Problem¶
CPython uses reference counting for garbage collection. Every Python object stores a count of how many references point to it:
// Simplified from CPython: Include/object.h
typedef struct _object {
    Py_ssize_t ob_refcnt;    // reference count
    PyTypeObject *ob_type;   // pointer to the object's type
} PyObject;
When two threads simultaneously increment or decrement ob_refcnt, you get a race condition:
Thread 1: load ob_refcnt (= 1)
Thread 2: load ob_refcnt (= 1)
Thread 1: decrement → 0 → FREE OBJECT
Thread 2: decrement → 0 → FREE OBJECT (double-free! crash!)
The GIL prevents this by ensuring only one thread mutates object state at a time.
Why Not Fine-Grained Locking?¶
The obvious alternative — putting a lock on every object — was attempted and abandoned:
- Performance overhead: every attribute access, list append, or dict lookup would require lock/unlock
- Deadlock complexity: acquiring multiple object locks in consistent order is extremely hard
- Single-threaded regression: programs that don't use threads at all got slower
Guido van Rossum chose the GIL as a pragmatic tradeoff to make CPython easy to integrate with C extensions and correct by default.
2. How the GIL Works Internally¶
The Lock Itself¶
In modern CPython (3.2+), the GIL is implemented using a condition variable + mutex pair, not a simple mutex:
// Simplified from Python/ceval_gil.c
struct _gil_runtime_state {
    unsigned long interval;        // check interval in microseconds (default: 5000)
    _Py_atomic_int locked;         // 0 = free, 1 = held
    unsigned long switch_number;   // incremented on every GIL transfer
    PyCOND_T cond;                 // condition variable to wake waiting threads
    PyMUTEX_T mutex;               // protects the fields above
    _Py_atomic_int eval_breaker;   // signal to drop GIL
};
The Check Interval¶
CPython periodically signals the running thread to drop the GIL so other threads get a chance. This is controlled by sys.getswitchinterval():
import sys
print(sys.getswitchinterval()) # 0.005 (5 milliseconds, default)
sys.setswitchinterval(0.001) # check every 1ms (more context switches)
Internally, the running thread checks a flag (eval_breaker in modern CPython) and drops the GIL when another thread has requested it — every N bytecode instructions before 3.2 (sys.setcheckinterval), or after the 5 ms switch interval elapses in 3.2+ (wall-clock time).
GIL Acquisition Flow¶
flowchart TD
A[Thread wants to run Python] --> B{GIL locked?}
B -- No --> C[Acquire GIL, set locked=1]
B -- Yes --> D[Request drop: set eval_breaker]
D --> E[Wait on condition variable]
F[Running thread sees eval_breaker] --> G[Has another thread requested?]
G -- No --> H[Continue running]
G -- Yes --> I[Release GIL, signal cond var]
I --> J[Other thread wakes, acquires GIL]
C --> K[Execute bytecode]
K --> L{eval_breaker set?}
L -- Yes --> G
L -- No --> K
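The handoff protocol in the flowchart can be modeled in pure Python with a condition variable + flag pair. This is a toy sketch for intuition only — the real implementation lives in Python/ceval_gil.c and is considerably more involved:

```python
import threading

class ToyGIL:
    """Toy lock mirroring the GIL's condition-variable handoff (not the real thing)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._cond = threading.Condition(self._mutex)
        self._locked = False
        self._switch_number = 0          # incremented on every handoff

    def acquire(self):
        with self._cond:
            while self._locked:          # another thread holds the "GIL"
                self._cond.wait()        # sleep until signalled
            self._locked = True

    def release(self):
        with self._cond:
            self._locked = False
            self._switch_number += 1
            self._cond.notify()          # wake one waiting thread

gil = ToyGIL()

def worker(name, results):
    for _ in range(3):
        gil.acquire()
        results.append(name)             # "run bytecode" while holding the lock
        gil.release()

results = []
threads = [threading.Thread(target=worker, args=(n, results)) for n in "AB"]
for t in threads: t.start()
for t in threads: t.join()
print(results)  # interleaving varies run to run
```

Just like the real GIL, nothing forces fairness here: the releasing thread may reacquire before a woken waiter gets scheduled, which is exactly the starvation problem the 3.2 rewrite addressed.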
Dropping the GIL Around I/O¶
The GIL is released during blocking I/O and other long-running C operations so other threads can run Python:
// Example: how socket.recv() releases the GIL
// Modules/socketmodule.c (simplified)
Py_BEGIN_ALLOW_THREADS // <-- release GIL
n = recv(s->sock_fd, buf, len, flags);
Py_END_ALLOW_THREADS // <-- reacquire GIL
These macros are fundamental to writing GIL-aware C extensions:
| Macro | Effect |
|---|---|
| Py_BEGIN_ALLOW_THREADS | Save thread state, release GIL |
| Py_END_ALLOW_THREADS | Reacquire GIL, restore thread state |
| Py_BLOCK_THREADS | Reacquire GIL (inside an allow block) |
| Py_UNBLOCK_THREADS | Release GIL (inside an allow block) |
3. What the GIL Means for You¶
CPU-Bound Code: GIL Hurts¶
If you use threading for CPU-intensive work, you will not get parallelism — threads take turns:
import threading, time

COUNT = 50_000_000

def countdown(n):
    while n > 0:
        n -= 1

# Single-threaded
start = time.time()
countdown(COUNT)
print(f"Single: {time.time() - start:.2f}s")    # ~2.0s

# Two threads — NOT 2x faster, roughly the same or slower
t1 = threading.Thread(target=countdown, args=(COUNT // 2,))
t2 = threading.Thread(target=countdown, args=(COUNT // 2,))
start = time.time()
t1.start(); t2.start()
t1.join(); t2.join()
print(f"Threaded: {time.time() - start:.2f}s")  # ~2.1s (overhead!)
I/O-Bound Code: GIL Doesn't Matter¶
Because the GIL is released during I/O, threading works well for network/disk bound work:
import threading, urllib.request, time

urls = ["https://httpbin.org/delay/1"] * 5

def fetch(url):
    urllib.request.urlopen(url)

# Sequential: ~5 seconds
# Threaded: ~1 second (all requests in parallel)
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
start = time.time()
for t in threads: t.start()
for t in threads: t.join()
print(f"Elapsed: {time.time() - start:.2f}s")
The Right Tool for Each Job¶
| Workload | Best Tool | Why |
|---|---|---|
| I/O-bound (network, disk) | threading or asyncio | GIL released during I/O |
| CPU-bound computation | multiprocessing | Separate processes, each with its own GIL |
| CPU-bound with NumPy | threading | NumPy releases the GIL in its C routines |
| Mixed I/O + CPU | concurrent.futures | Unified API for both thread and process pools |
4. Working Around the GIL¶
multiprocessing: True Parallelism¶
Each process has its own GIL and memory space:
from multiprocessing import Pool

def heavy_compute(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":    # required on platforms using the spawn start method
    with Pool(processes=4) as pool:
        results = pool.map(heavy_compute, [10_000_000] * 4)
Overhead: inter-process communication (IPC) via serialization (pickle). Not free — don't use for tiny tasks.
concurrent.futures: Unified Interface¶
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# I/O-bound: use threads
with ThreadPoolExecutor(max_workers=8) as exe:
    futures = [exe.submit(fetch_url, url) for url in urls]
    results = [f.result() for f in futures]

# CPU-bound: use processes
with ProcessPoolExecutor(max_workers=4) as exe:
    futures = [exe.submit(crunch_numbers, data) for data in chunks]
    results = [f.result() for f in futures]
NumPy, SciPy, Pandas: GIL Released in C¶
Well-written C extensions release the GIL around their hot loops. This means threading does parallelize NumPy operations:
import numpy as np
import threading

arr = np.random.rand(10_000_000)

def compute(a):
    return np.sqrt(a).sum()   # GIL released inside NumPy's C code

t1 = threading.Thread(target=compute, args=(arr,))
t2 = threading.Thread(target=compute, args=(arr,))
t1.start(); t2.start()
t1.join(); t2.join()
# Actually parallel!
ctypes / cffi: Manual GIL Release¶
When calling C code via ctypes, functions loaded through ctypes.CDLL release the GIL around each foreign call automatically; use ctypes.PyDLL instead when the C function calls back into Python and must keep the GIL held:
import ctypes
import ctypes.util

# CDLL: the GIL is dropped for the duration of each foreign call
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
length = libc.strlen(b"hello")   # other Python threads can run meanwhile
Cython: nogil Blocks¶
Cython lets you explicitly mark code regions as GIL-free:
# mymodule.pyx
from cython.parallel import prange

def parallel_sum(double[:] arr):
    cdef double total = 0.0
    cdef Py_ssize_t i
    with nogil:                          # release the GIL here
        for i in prange(arr.shape[0]):   # OpenMP parallel loop; += becomes a reduction
            total += arr[i]              # note: len() needs the GIL, shape[0] does not
    return total
5. asyncio and the GIL¶
asyncio is single-threaded cooperative concurrency — it avoids the GIL problem entirely by never creating multiple threads. Everything runs in one thread on an event loop:
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()   # yields control, no GIL issue

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)

asyncio.run(main())
When await suspends a coroutine, the event loop runs another — no thread switching, no GIL contention.
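The same pattern can be demonstrated without third-party packages, using asyncio.sleep as a stand-in for network latency — five "requests" complete in roughly the time of one:

```python
import asyncio
import time

async def fake_fetch(i):
    await asyncio.sleep(0.1)   # stand-in for a network call; yields to the event loop
    return i

async def main():
    start = time.perf_counter()
    # All five coroutines are in flight concurrently on one thread
    results = await asyncio.gather(*(fake_fetch(i) for i in range(5)))
    print(results)                                # [0, 1, 2, 3, 4]
    print(f"{time.perf_counter() - start:.2f}s")  # ~0.10s, not 0.50s
    return results

results = asyncio.run(main())
```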
6. The GIL and C Extensions¶
Writing C extensions that behave correctly under the GIL requires understanding thread state.
Thread State¶
Every OS thread that runs Python has a PyThreadState struct:
// Simplified from Include/cpython/pystate.h (pre-3.11 layout)
typedef struct _ts {
    struct _ts *next;
    PyInterpreterState *interp;
    struct _frame *frame;     // current execution frame
    int recursion_depth;
    // ... many more fields
} PyThreadState;
The GIL is tied to the current thread state. When you call Py_BEGIN_ALLOW_THREADS, CPython saves the current PyThreadState* and releases the GIL. On Py_END_ALLOW_THREADS it reacquires the GIL and restores state.
Safe Patterns in C Extensions¶
// Pattern 1: release GIL around blocking C call
static PyObject *
my_blocking_function(PyObject *self, PyObject *args)
{
    int result;
    Py_BEGIN_ALLOW_THREADS
    result = slow_c_library_call();   // may block for seconds
    Py_END_ALLOW_THREADS
    return PyLong_FromLong(result);
}
// Pattern 2: callback from C that needs to call Python
// Must hold the GIL!
void c_callback(void *data) {
    PyGILState_STATE gstate = PyGILState_Ensure();   // acquire GIL + thread state
    PyObject *result = PyObject_CallFunction(callback, "O", (PyObject *)data);
    Py_XDECREF(result);
    PyGILState_Release(gstate);                      // release GIL
}
PyGILState_Ensure / PyGILState_Release¶
Used when a non-Python thread (e.g. a C library callback) needs to call Python:
// Thread created by a C library, not by Python
void worker_thread(void) {
    // This thread has no PyThreadState yet
    PyGILState_STATE state = PyGILState_Ensure();
    // Now we hold the GIL and have a valid thread state
    PyRun_SimpleString("print('hello from C thread')");
    PyGILState_Release(state);
}
7. Diagnosing GIL Contention¶
sys.monitoring / sys.settrace (Basic)¶
import sys

call_count = 0

def trace(frame, event, arg):
    # settrace cannot observe GIL handoffs directly; counting function
    # calls is only a crude proxy for which threads are making progress
    global call_count
    if event == 'call':
        call_count += 1
    return trace

sys.settrace(trace)
gilknocker and gil_load (Third-Party)¶
from gilknocker import KnockKnock

# gilknocker samples how often the GIL is held; API sketch only — names
# have shifted between versions, so check the project's docs
knocker = KnockKnock(interval_micros=1000)
knocker.start()
do_your_work()
knocker.stop()
print(f"GIL contention: {knocker.contention_metric:.1%}")
# e.g. "GIL contention: 73.4%"
py-spy (Production Profiling)¶
pip install py-spy
py-spy top --pid 12345 # live view
py-spy record -o profile.svg -- python myscript.py
Identifying Bottlenecks¶
import cProfile, pstats

with cProfile.Profile() as pr:
    run_multithreaded_code()

stats = pstats.Stats(pr)
stats.sort_stats('cumulative')
stats.print_stats(20)
Look for functions with high tottime in CPU-bound code — those are holding the GIL. If wait or acquire shows up prominently, you have contention.
8. GIL Changes Across Python Versions¶
| Version | Change |
|---|---|
| ≤ 3.1 | GIL dropped every N bytecode instructions (sys.setcheckinterval(), default 100) |
| 3.2 | New GIL: time-based (5ms), condition variable, fairer between threads (Antoine Pitrou) |
| 3.9–3.11 | Groundwork: global runtime state incrementally moved per-interpreter, paving the way for PEP 684 |
| 3.12 | Per-interpreter GIL (PEP 684) — each subinterpreter gets its own GIL |
| 3.13 | Free-threaded CPython (PEP 703, experimental) — GIL optional via python3.13t |
PEP 703: Free-Threaded Python (3.13+)¶
The most significant GIL change in CPython's history. You can now install a GIL-free build:
# Install free-threaded Python
pyenv install 3.13t
# Or with python.org installer, check "free-threaded" option
python3.13t -c "import sys; print(sys._is_gil_enabled())" # False
Whether the GIL is enabled is decided at interpreter startup; there is no supported API to re-enable it from running code. Use the PYTHON_GIL environment variable or the -X gil option when launching:
# Force the GIL back on in a free-threaded build
# (for compatibility with non-thread-safe C extensions)
PYTHON_GIL=1 python3.13t script.py
python3.13t -X gil=1 script.py
Tradeoff: Without the GIL, single-threaded code runs ~40% slower in 3.13t due to atomic reference counting overhead. This gap is expected to close in future versions.
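To detect the build variant at runtime, a small sketch — the sysconfig check works on any recent CPython, while sys._is_gil_enabled() only exists on 3.13+, so it is guarded:

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print("free-threaded build:", free_threaded_build)

# Only present on 3.13+; on a free-threaded build it can still return True
# if the GIL was forced back on (PYTHON_GIL=1 or an incompatible extension)
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently enabled:", sys._is_gil_enabled())
```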
9. PEP 684: Per-Interpreter GIL (3.12+)¶
PEP 684 allows running multiple Python interpreters in the same process, each with its own GIL:
import _xxsubinterpreters as interpreters  # low-level, private API (CPython 3.12; renamed _interpreters in 3.13)

# Create a sub-interpreter — in 3.12 it gets its own GIL (PEP 684)
interp_id = interpreters.create()

# Run code in it — truly parallel with the main interpreter!
interpreters.run_string(interp_id, """
import math
result = math.factorial(10000)
""")

interpreters.destroy(interp_id)
PEP 734 proposes a higher-level interpreters module (shipped as concurrent.interpreters in CPython 3.14) with queues for safe inter-interpreter communication. The sketch below follows the PEP's draft API; exact names may differ in the released module:
# Draft PEP 734 API — illustrative, not final
import interpreters

interp = interpreters.create()
queue = interpreters.create_queue()
interp.prepare_main(q=queue)   # bind the queue in the sub-interpreter's __main__
interp.exec("""
result = sum(i * i for i in range(1_000_000))
q.put(result)
""")
output = queue.get()
Mental Model¶
flowchart TB
subgraph Process["Python Process"]
GIL(["🔒 GIL"])
subgraph Threads["Threads"]
T1["Thread 1\n▶ running bytecode"]
T2["Thread 2\n⏸ waiting"]
T3["Thread 3\n⏸ waiting"]
end
T1 -->|holds| GIL
T2 -.->|wants| GIL
T3 -.->|wants| GIL
end
note["EXCEPT: C extensions that call\nPy_BEGIN_ALLOW_THREADS\ncan run concurrently"]
Process --- note
style T1 fill:#4caf50,color:#fff
style T2 fill:#ef9a9a,color:#333
style T3 fill:#ef9a9a,color:#333
style GIL fill:#ff9800,color:#fff,stroke:#e65100
style note fill:#e3f2fd,color:#333,stroke:#90caf9
Key rules to remember:
- CPU-bound Python code → multiprocessing, not threading
- I/O-bound code → threading or asyncio both work fine
- NumPy/SciPy → threading works because the C code releases the GIL
- C extensions → use Py_BEGIN/END_ALLOW_THREADS around blocking calls
- Python 3.13t → experimental free-threaded build, for true parallelism without multiprocessing
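The first two rules can be wrapped in a small dispatch helper. A sketch — run_parallel and its signature are illustrative, not a stdlib API:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_parallel(fn, items, *, cpu_bound, max_workers=4):
    """Run fn over items with processes (CPU-bound) or threads (I/O-bound)."""
    # CPU-bound work needs separate processes to sidestep the GIL;
    # I/O-bound work can share one process because the GIL is dropped during I/O.
    # Note: with cpu_bound=True, fn must be picklable (defined at module level).
    executor_cls = ProcessPoolExecutor if cpu_bound else ThreadPoolExecutor
    with executor_cls(max_workers=max_workers) as exe:
        return list(exe.map(fn, items))

# I/O-style usage: threads suffice
lengths = run_parallel(len, ["alpha", "beta", "gamma"], cpu_bound=False)
print(lengths)  # [5, 4, 5]
```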
Further Reading¶
- Python docs: The Global Interpreter Lock
- PEP 703 — Making the GIL Optional
- PEP 684 — A Per-Interpreter GIL
- PEP 734 — Multiple Interpreters in the Stdlib
- CPython source: Python/ceval_gil.c, Python/pystate.c