09 - Stack and Procedures
The stack is the backbone of function calls. Understanding how CALL, RET, and stack frames work lets you write reusable functions, interact with C libraries, and debug crashes.
Stack Fundamentals
The stack is a LIFO (Last In, First Out) data structure in memory.
- Grows downward (from high to low addresses)
- RSP always points to the top (lowest address currently in use)
- Must be 16-byte aligned before any CALL instruction (System V ABI)
block-beta
columns 1
A["argv / env (set up by OS)"]
B["... (previous frames)"]
C["top item ← RSP"]
D["(free)"]
style A fill:#4a4a6a,color:#fff,stroke:#888
style B fill:#4a4a6a,color:#fff,stroke:#888
style C fill:#2d6a4f,color:#fff,stroke:#52b788
style D fill:#1e1e2e,color:#888,stroke:#444,stroke-dasharray:4
Stack grows downward: high addresses are at the top, and RSP points to the current top item.
PUSH / POP
In 64-bit mode, PUSH and POP take 64-bit operands (or 16-bit ones, e.g. push ax, written pushw in AT&T syntax). There is no 32-bit form: push eax does not assemble in 64-bit mode.
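The following is a minimal sketch of what PUSH and POP do to RSP, with a rough equivalent of each instruction in the comments. It assumes Linux x86-64 and NASM; the file name push_pop.asm and the build command are illustrative assumptions, not part of the original page.

```nasm
; push_pop.asm (hypothetical file name) - Linux x86-64, NASM syntax
; Assumed build: nasm -f elf64 push_pop.asm -o push_pop.o && ld push_pop.o -o push_pop
section .text
global _start

_start:
    mov  eax, 5
    mov  ebx, 7

    push rax            ; roughly: sub rsp, 8  then  mov [rsp], rax
    push rbx            ; RSP is now 16 bytes lower than at entry

    pop  rdi            ; roughly: mov rdi, [rsp]  then  add rsp, 8   (rdi = 7)
    pop  rsi            ; last in, first out                          (rsi = 5)

    sub  rdi, rsi       ; 7 - 5 = 2
    mov  eax, 60        ; sys_exit
    syscall             ; exit status 2
```

Unlike the sub/add pair shown in the comments, PUSH and POP leave the flags untouched, so the equivalence is only approximate.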
CALL and RET
CALL
Internally:
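As a rough sketch, CALL can be pictured as a push of the return address followed by a jump. The program below does it once with a real CALL and once by hand. The labels answer and .back are hypothetical, and a Linux x86-64 target assembled with NASM is assumed.

```nasm
; Linux x86-64, NASM syntax (illustrative sketch)
section .text
global _start

answer:
    mov  eax, 42
    ret

_start:
    ; Real CALL: pushes the address of the next instruction, then jumps.
    call answer

    ; Hand-written equivalent (for illustration only):
    lea  rcx, [rel .back]   ; the address CALL would have pushed
    push rcx                ; RSP -= 8, [RSP] = return address
    jmp  answer             ; transfer control; answer's RET lands on .back
.back:
    mov  edi, eax           ; exit status = 42
    mov  eax, 60            ; sys_exit
    syscall
```

The hand-written form is architecturally equivalent, but it typically defeats the CPU's return-address prediction, so real code should use CALL.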
RET
Internally:
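RET can likewise be pictured as a pop of the return address followed by an indirect jump. In this sketch the function get_seven (a hypothetical name) returns by hand instead of using RET; Linux x86-64 and NASM are assumed.

```nasm
; Linux x86-64, NASM syntax (illustrative sketch)
section .text
global _start

get_seven:
    mov  eax, 7
    ; Hand-written equivalent of RET. The real RET needs no scratch
    ; register; RCX is used here only to make the two steps visible.
    pop  rcx            ; rcx = [rsp] (return address), then rsp += 8
    jmp  rcx            ; continue in the caller

_start:
    call get_seven      ; returns with eax = 7
    mov  edi, eax       ; exit status = 7
    mov  eax, 60        ; sys_exit
    syscall
```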
On entry to a called function, before any push or stack allocation, the return address into the caller sits at [RSP].
Stack Frames
A stack frame is the region of the stack dedicated to one function call. It holds:
- The saved return address (placed by CALL)
- The saved RBP of the caller
- Local variables
- Saved callee-saved registers
Standard Prologue / Epilogue
my_function:
; --- Prologue ---
push rbp ; save caller's frame pointer
mov rbp, rsp ; set our frame base
sub rsp, 32 ; allocate space for local variables
; (must keep RSP 16-byte aligned)
; --- Function body ---
mov qword [rbp - 8], rdi ; store 1st argument as local
mov qword [rbp - 16], rsi ; store 2nd argument as local
; ... computation ...
mov rax, 42 ; return value in RAX
; --- Epilogue ---
mov rsp, rbp ; deallocate locals
pop rbp ; restore caller's frame pointer
ret ; return to caller
Stack Layout During my_function
Address Contents
─────────────────────────────────────────
[rbp + 24] second stack argument (if any)
[rbp + 16] first stack argument (if any)
[rbp + 8] return address ← CALL pushed this
[rbp + 0] saved RBP ← our push rbp
(rbp points here after: push rbp; mov rbp, rsp)
[rbp - 8] local variable 1
[rbp - 16] local variable 2
[rbp - 24] local variable 3
[rbp - 32] padding (for alignment) ← RSP (top of stack)
Callee-Saved Registers
If your function uses any of RBX, RBP, or R12–R15, it must save them on entry and restore them before returning:
my_function:
push rbp
mov rbp, rsp
push rbx ; save callee-saved registers used
push r12
push r13
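; ... use rbx, r12, r13 freely ...
; (four pushes so far, so RSP is 8 bytes off 16-byte alignment here;
; wrap the body in sub rsp, 8 / add rsp, 8 if it makes any calls)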
pop r13 ; restore in reverse order
pop r12
pop rbx
pop rbp
ret
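Always pop in the reverse order of the pushes. A mismatched order restores the wrong values into the registers; an unbalanced count leaves RSP pointing at the wrong slot, so RET jumps to a garbage address and the program crashes.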
The Red Zone (Linux x86-64)
The System V ABI defines a 128-byte red zone below RSP that signal and interrupt handlers are guaranteed not to clobber. Leaf functions (no calls) can use it without adjusting RSP:
; Leaf function using red zone (no push, no sub rsp)
leaf_func:
mov [rsp - 8], rax ; use red zone for temporaries
mov [rsp - 16], rbx
; ... compute ...
ret
This is an optimization, not a requirement. Non-leaf functions must still allocate stack space properly, because every CALL they make pushes a return address into the area just below RSP.
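A runnable variant of the leaf-function idea might look like the sketch below. The function leaf_diff is a hypothetical example that parks both arguments in the red zone and reads them back; it assumes a user-space Linux x86-64 program assembled with NASM.

```nasm
; Linux x86-64, NASM syntax (illustrative sketch)
section .text
global _start

; leaf_diff(a, b): returns a - b, using only the red zone for temporaries
leaf_diff:
    mov  [rsp - 8], rdi     ; temporary 1, no sub rsp needed
    mov  [rsp - 16], rsi    ; temporary 2
    mov  rax, [rsp - 8]
    sub  rax, [rsp - 16]    ; rax = a - b
    ret                     ; RSP was never changed, so nothing to undo

_start:
    mov  edi, 9
    mov  esi, 4
    call leaf_diff          ; rax = 5
    mov  edi, eax           ; exit status = 5
    mov  eax, 60            ; sys_exit
    syscall
```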
Recursive Functions
Example: compute n! (factorial)
; rdi = n → returns n! in rax
factorial:
push rbp
mov rbp, rsp
push rbx ; save rbx (callee-saved)
sub rsp, 8 ; pad so RSP is 16-byte aligned at the recursive CALL
mov rbx, rdi ; save n
cmp rdi, 1
jle .base_case
dec rdi
call factorial ; rax = (n-1)!
imul rax, rbx ; rax = n * (n-1)!
jmp .done
.base_case:
mov rax, 1 ; 0! = 1! = 1
.done:
add rsp, 8 ; drop the alignment padding
pop rbx
pop rbp
ret
Usage:
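One way to exercise the routine is a small standalone program that calls it and returns the result as the exit status. This is a sketch, assuming Linux x86-64 and NASM; the file name fact_demo.asm is hypothetical, and the function is repeated (with alignment padding around the recursive call) so the listing assembles on its own.

```nasm
; fact_demo.asm (hypothetical file name) - Linux x86-64, NASM syntax
section .text
global _start

; rdi = n  ->  rax = n!
factorial:
    push rbp
    mov  rbp, rsp
    push rbx                ; rbx is callee-saved
    sub  rsp, 8             ; keep RSP 16-byte aligned at the recursive CALL
    mov  rbx, rdi           ; remember n across the call
    cmp  rdi, 1
    jle  .base_case
    dec  rdi
    call factorial          ; rax = (n-1)!
    imul rax, rbx           ; rax = n * (n-1)!
    jmp  .done
.base_case:
    mov  eax, 1             ; 0! = 1! = 1
.done:
    add  rsp, 8
    pop  rbx
    pop  rbp
    ret

_start:
    mov  edi, 5
    call factorial          ; rax = 120
    mov  edi, eax           ; exit status = 120 (fits in 0-255)
    mov  eax, 60            ; sys_exit
    syscall
```

An exit status only carries 8 bits, so inputs above 5 would need a different way of reporting the result.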
Stack-Passed Arguments (7th+ arguments)
For functions with more than six integer or pointer arguments, the extras are pushed onto the stack right-to-left by the caller:
; Calling: func(a, b, c, d, e, f, g, h)
; First 6 go in registers; 7th (g) and 8th (h) go on stack
push h ; pushed first (rightmost)
push g
; now set registers:
mov r9, f
mov r8, e
mov rcx, d
mov rdx, c
mov rsi, b
mov rdi, a
call func
add rsp, 16 ; caller cleans up the 2 pushed args (16 bytes)
Inside func, after the standard prologue (push rbp; mov rbp, rsp), the stack arguments are at:
- [rbp + 16] → g (7th arg)
- [rbp + 24] → h (8th arg)
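The offsets above can be checked with a small program. The sketch below defines a hypothetical eight-argument function, sum8, that adds its six register arguments to the two stack arguments; Linux x86-64 and NASM are assumed.

```nasm
; Linux x86-64, NASM syntax (illustrative sketch)
section .text
global _start

; sum8(a, b, c, d, e, f, g, h): returns the sum of all eight arguments
sum8:
    push rbp
    mov  rbp, rsp
    mov  rax, rdi           ; a
    add  rax, rsi           ; b
    add  rax, rdx           ; c
    add  rax, rcx           ; d
    add  rax, r8            ; e
    add  rax, r9            ; f
    add  rax, [rbp + 16]    ; g: 7th argument, just above the return address
    add  rax, [rbp + 24]    ; h: 8th argument
    pop  rbp
    ret

_start:
    push 8                  ; h, pushed first (rightmost)
    push 7                  ; g
    mov  r9d, 6             ; f
    mov  r8d, 5             ; e
    mov  ecx, 4             ; d
    mov  edx, 3             ; c
    mov  esi, 2             ; b
    mov  edi, 1             ; a
    call sum8               ; rax = 1+2+...+8 = 36
    add  rsp, 16            ; caller removes the two stack arguments
    mov  edi, eax           ; exit status = 36
    mov  eax, 60            ; sys_exit
    syscall
```

Two 8-byte pushes keep RSP 16-byte aligned at the CALL; with an odd number of stack arguments the caller would also need 8 bytes of padding.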
Frame Pointer Omission
Optimizing compilers often omit the frame pointer (GCC and Clang: -fomit-frame-pointer), which eliminates the push rbp; mov rbp, rsp prologue:
; Without frame pointer:
my_func:
sub rsp, 40 ; allocate locals directly
mov [rsp + 0], rdi
mov [rsp + 8], rsi
; ... body ...
add rsp, 40
ret
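Locals are accessed via RSP offsets instead of RBP. This saves a few instructions per call and frees RBP as a general-purpose register, but it makes debugging harder: there is no frame chain to walk, so backtraces depend on separate unwind information.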
Complete Example: Two-Function Program
; add_and_double.asm
; int64_t add_vals(int64_t a, int64_t b) { return a + b; }
; int64_t double_it(int64_t x) { return x * 2; }
; _start: exit with double_it(add_vals(3, 7)) = 20
section .text
global _start
add_vals:
push rbp
mov rbp, rsp
mov rax, rdi
add rax, rsi ; rax = a + b
pop rbp
ret
double_it:
push rbp
mov rbp, rsp
lea rax, [rdi + rdi] ; rax = x * 2
pop rbp
ret
_start:
; result = add_vals(3, 7)
mov rdi, 3
mov rsi, 7
call add_vals ; rax = 10
; result = double_it(result)
mov rdi, rax
call double_it ; rax = 20
; exit(result)
mov rdi, rax ; exit code = 20
mov rax, 60
syscall
Assemble with nasm -f elf64, link the object file with ld, and run the resulting binary; echo $? should then print 20.
Common Stack Bugs
| Bug | Symptom | Cause |
|---|---|---|
| Stack misalignment | SIGSEGV in called C function | RSP not 16-byte aligned at CALL |
| Missing prologue/epilogue | RBP corrupted after return | Unbalanced PUSH/POP |
| Wrong callee-save | Random corruption in caller | Modified RBX/R12-R15 without saving |
| RET to wrong address | SIGSEGV / wrong behavior | Stack overwritten (buffer overflow) |
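Check Alignment
Before calling any C function (library code may use aligned SSE instructions such as movaps on stack data), make sure RSP is 16-byte aligned. The syscall instruction itself does not depend on stack alignment.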
; After CALL, RSP is misaligned by 8 (return address was pushed).
; A further sub makes it aligned again:
sub rsp, 8 ; align to 16 bytes
call some_c_func
add rsp, 8
Or use an odd number of 8-byte PUSHes in the prologue (counting push rbp) to restore alignment.
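The 8-byte offset at function entry can be observed directly. In this sketch the hypothetical function entry_misalignment returns RSP modulo 16 as seen on entry. It assumes Linux x86-64 and NASM, and relies on the System V guarantee that RSP is 16-byte aligned at process entry.

```nasm
; Linux x86-64, NASM syntax (illustrative sketch)
section .text
global _start

; Returns rsp & 15 as seen at function entry (before any push or sub).
entry_misalignment:
    mov  rax, rsp
    and  eax, 15            ; keep the low 4 bits; upper bits of RAX are zeroed
    ret

_start:
    ; RSP is 16-byte aligned here, so the CALL below is ABI-conformant.
    call entry_misalignment ; the pushed return address shifts RSP by 8
    mov  edi, eax           ; exit status = 8
    mov  eax, 60            ; sys_exit
    syscall
```

A callee therefore needs to move RSP by a further 8 bytes (modulo 16) before making its own calls, which is what the sub rsp, 8 or an odd number of pushes achieves.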
Key Takeaways
- CALL pushes the return address, then jumps; RET pops the return address and jumps back
- Standard prologue: push rbp; mov rbp, rsp; sub rsp, N
- Standard epilogue: mov rsp, rbp; pop rbp; ret
- Callee saves: RBX, RBP, R12–R15
- Caller saves: RAX, RCX, RDX, RSI, RDI, R8–R11
- RSP must be 16-byte aligned before any CALL instruction