
09 — Stack and Procedures

The stack is the backbone of function calls. Understanding how CALL, RET, and stack frames work lets you write reusable functions, interact with C libraries, and debug crashes.


Stack Fundamentals

The stack is a LIFO (Last In, First Out) data structure in memory.

  • Grows downward (from high to low addresses)
  • RSP always points to the top (lowest address currently in use)
  • Must be 16-byte aligned before any CALL instruction (System V ABI)
Address           Contents
─────────────────────────────────────────
high addresses    argv / env  (set up by OS)
                  ...  (previous frames)
RSP →             top item
low addresses     (free)

Stack grows downward — high addresses at top, RSP points to the current top item.

PUSH / POP

push rax     ; RSP -= 8 ; [RSP] = rax
pop  rbx     ; rbx = [RSP] ; RSP += 8

In 64-bit mode, PUSH and POP accept only 64-bit or 16-bit register operands (for example push rax or push ax). 32-bit forms such as push eax cannot be encoded.
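The PUSH/POP semantics above can be sketched with a small routine that exchanges two registers through the stack. The label swap_demo is hypothetical and the routine is only illustrative (xchg does the same job in one instruction); it uses RAX and RCX because both are caller-saved.

```nasm
; swap_demo: exchange RAX and RCX through the stack (illustration only).
swap_demo:
    push rax        ; RSP -= 8 ; [RSP] = old RAX
    push rcx        ; RSP -= 8 ; [RSP] = old RCX
    pop  rax        ; RAX = old RCX ; RSP += 8
    pop  rcx        ; RCX = old RAX ; RSP += 8
    ret             ; pushes and pops balance, so [RSP] is the return address again
```

Because the values come off in the opposite order to the one they went on, the two registers end up swapped, which is the LIFO behavior in miniature.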


CALL and RET

CALL

call label    ; PUSH RIP (return address), then JMP label

Internally:

; call foo  is equivalent to:
sub rsp, 8
mov [rsp], rip_of_next_instruction
jmp foo

RET

ret    ; POP RIP (jump to return address)

Internally:

; ret behaves like this pseudo-code (RIP cannot be a mov operand):
pop  tmp         ; tmp = [rsp] ; rsp += 8
jmp  tmp

The caller's return address sits at [RSP] when inside a called function (before any push or stack allocation).
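As a small sketch of that last point, a function can read its own return address straight from [RSP] on entry. The labels get_return_address and caller are hypothetical names chosen for this example.

```nasm
; get_return_address: returns in RAX the address of the instruction
; that follows the CALL which invoked it.
get_return_address:
    mov  rax, [rsp]             ; nothing pushed yet, so [RSP] is the return address
    ret

caller:
    sub  rsp, 8                 ; keep RSP 16-byte aligned at the CALL
    call get_return_address
.after_call:                    ; RAX now holds the address of this label
    add  rsp, 8
    ret
```

Once the callee executes a push or a sub rsp, the return address is no longer at [RSP] but at a correspondingly larger offset.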


Stack Frames

A stack frame is the region of the stack dedicated to one function call. It holds:

  • The saved return address (placed by CALL)
  • The saved RBP of the caller
  • Local variables
  • Saved callee-saved registers

Standard Prologue / Epilogue

my_function:
    ; --- Prologue ---
    push rbp           ; save caller's frame pointer
    mov  rbp, rsp      ; set our frame base

    sub  rsp, 32       ; allocate space for local variables
                       ; (must keep RSP 16-byte aligned)

    ; --- Function body ---
    mov  qword [rbp - 8],  rdi    ; store 1st argument as local
    mov  qword [rbp - 16], rsi    ; store 2nd argument as local

    ; ... computation ...

    mov  rax, 42       ; return value in RAX

    ; --- Epilogue ---
    mov  rsp, rbp      ; deallocate locals
    pop  rbp           ; restore caller's frame pointer
    ret                ; return to caller

Stack Layout During my_function

Address         Contents
─────────────────────────────────────────
[rbp + 24]    second stack argument (if any)
[rbp + 16]    first  stack argument (if any)
[rbp +  8]    saved return address   ← CALL pushed this
[rbp +  0]    saved RBP              ← our push rbp
              (rbp points here after: push rbp; mov rbp, rsp)
[rbp -  8]    local variable 1
[rbp - 16]    local variable 2
[rbp - 24]    local variable 3
[rbp - 32]    padding (for alignment)   ← RSP after sub rsp, 32

Callee-Saved Registers

If your function uses RBX, RBP, or R12-R15, it must save them on entry and restore them before returning:

my_function:
    push rbp
    mov  rbp, rsp
    push rbx       ; save callee-saved registers used
    push r12
    push r13

    ; ... use rbx, r12, r13 freely ...

    pop  r13       ; restore in reverse order
    pop  r12
    pop  rbx
    pop  rbp
    ret

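Always pop in the reverse order of the pushes. A mismatch restores the wrong values or leaves RSP pointing at something other than the return address, and the program crashes.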
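When a function both saves callee-saved registers and allocates locals, the order of the epilogue matters. The following is a minimal sketch under that assumption; the label with_locals and its body (it simply returns rdi + rsi) are hypothetical.

```nasm
; with_locals(rdi, rsi) -> rax = rdi + rsi
with_locals:
    push rbp
    mov  rbp, rsp
    push rbx                ; saved at [rbp - 8]
    push r12                ; saved at [rbp - 16]
    sub  rsp, 16            ; locals at [rbp - 24] and [rbp - 32]

    mov  rbx, rdi           ; callee-saved registers are now free to use
    mov  r12, rsi
    mov  [rbp - 24], rbx    ; local 1
    mov  [rbp - 32], r12    ; local 2
    lea  rax, [rbx + r12]   ; return value

    add  rsp, 16            ; release the locals first (RSP -> saved r12)
    pop  r12                ; then restore in reverse order of the pushes
    pop  rbx
    pop  rbp
    ret
```

Using mov rsp, rbp as the first epilogue instruction here would still return correctly, but it would skip the two pops and leave RBX and R12 clobbered for the caller.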


The Red Zone (Linux x86-64)

The System V ABI reserves a 128-byte red zone just below RSP (addresses RSP-128 through RSP-1) that signal and interrupt handlers will not overwrite. Leaf functions (no calls) can use it for temporaries without adjusting RSP:

; Leaf function using red zone (no push, no sub rsp)
leaf_func:
    mov  [rsp - 8],  rax    ; use red zone for temporaries
    mov  [rsp - 16], rbx
    ; ... compute ...
    ret

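This is an optimization, not a requirement. Non-leaf functions must still allocate stack space properly, because every CALL pushes a return address into that area and the callee is free to overwrite the rest.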


Recursive Functions

Example: compute n! (factorial)

; rdi = n  →  returns n! in rax (fits in 64 bits for n <= 20)
factorial:
    push rbp
    mov  rbp, rsp
    push rbx                 ; save rbx (callee-saved)
    sub  rsp, 8              ; pad so RSP is 16-byte aligned at the CALL below

    mov  rbx, rdi            ; save n

    cmp  rdi, 1
    jle  .base_case

    dec  rdi
    call factorial           ; rax = (n-1)!
    imul rax, rbx            ; rax = n * (n-1)!
    jmp  .done

.base_case:
    mov  rax, 1              ; 0! = 1! = 1

.done:
    add  rsp, 8              ; drop the alignment padding
    pop  rbx
    pop  rbp
    ret

Usage:

mov  rdi, 6
call factorial    ; rax = 720


Stack-Passed Arguments (7th+ arguments)

For functions with more than 6 arguments, extras are pushed onto the stack right-to-left by the caller:

; Calling: func(a, b, c, d, e, f, g, h)
; First 6 go in registers; 7th (g) and 8th (h) go on stack

push h           ; pushed first (rightmost)
push g
; now set registers:
mov  r9,  f
mov  r8,  e
mov  rcx, d
mov  rdx, c
mov  rsi, b
mov  rdi, a
call func

add  rsp, 16     ; caller cleans up the 2 pushed args (16 bytes)

Inside func, once the standard prologue (push rbp; mov rbp, rsp) has run, the stack arguments are at:

  • [rbp + 16] → g (7th arg)
  • [rbp + 24] → h (8th arg)
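The callee side of this convention can be sketched as follows. The function sum8 is a hypothetical example that adds all eight integer arguments, and the calling fragment assumes RSP is 16-byte aligned before the two pushes.

```nasm
; sum8(a, b, c, d, e, f, g, h) -> rax = a + b + c + d + e + f + g + h
sum8:
    push rbp
    mov  rbp, rsp
    mov  rax, rdi           ; a
    add  rax, rsi           ; b
    add  rax, rdx           ; c
    add  rax, rcx           ; d
    add  rax, r8            ; e
    add  rax, r9            ; f
    add  rax, [rbp + 16]    ; g: 7th argument, first one on the stack
    add  rax, [rbp + 24]    ; h: 8th argument
    pop  rbp
    ret

; Calling sum8(1, 2, 3, 4, 5, 6, 7, 8):
    push 8                  ; h, pushed first (rightmost)
    push 7                  ; g
    mov  r9,  6
    mov  r8,  5
    mov  rcx, 4
    mov  rdx, 3
    mov  rsi, 2
    mov  rdi, 1
    call sum8               ; rax = 36
    add  rsp, 16            ; caller removes the two stack arguments
```

The offsets skip 16 bytes because [rbp] holds the saved RBP and [rbp + 8] holds the return address.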


Frame Pointer Omission

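Optimizing compilers usually omit the frame pointer (GCC and Clang option -fomit-frame-pointer, which is typically on by default when optimizing for x86-64). This eliminates the push rbp; mov rbp, rsp prologue: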

; Without frame pointer:
my_func:
    sub  rsp, 40            ; allocate locals directly
    mov  [rsp + 0],  rdi
    mov  [rsp + 8],  rsi
    ; ... body ...
    add  rsp, 40
    ret

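Locals are accessed via RSP offsets instead of RBP. This saves a few instructions per call and frees RBP as a general-purpose register, but it makes debugging harder: there is no chain of saved RBP values to walk, so debuggers and profilers must rely on unwind information instead.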
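One consequence is that every offset now depends on how much the function has subtracted from RSP. The sketch below assumes a hypothetical function last_two that receives eight integer arguments and returns the sum of the two passed on the stack.

```nasm
; last_two(a, b, c, d, e, f, g, h) -> rax = g + h, without a frame pointer
last_two:
    sub  rsp, 24            ; 24 bytes of locals; RSP is 16-byte aligned again
    mov  rax, [rsp + 32]    ; g: 24 (locals) + 8 (return address)
    add  rax, [rsp + 40]    ; h
    mov  [rsp], rax         ; scratch local, only to show RSP-relative access
    add  rsp, 24
    ret
```

If the function later pushed a register or allocated more space, both argument offsets would have to change with it, which is the bookkeeping a frame pointer avoids.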


Complete Example: Two-Function Program

; add_and_double.asm
; int64_t add_vals(int64_t a, int64_t b) { return a + b; }
; int64_t double_it(int64_t x) { return x * 2; }
; _start: exit with double_it(add_vals(3, 7)) = 20

section .text
global _start

add_vals:
    push rbp
    mov  rbp, rsp
    mov  rax, rdi
    add  rax, rsi    ; rax = a + b
    pop  rbp
    ret

double_it:
    push rbp
    mov  rbp, rsp
    lea  rax, [rdi + rdi]   ; rax = x * 2
    pop  rbp
    ret

_start:
    ; result = add_vals(3, 7)
    mov  rdi, 3
    mov  rsi, 7
    call add_vals       ; rax = 10

    ; result = double_it(result)
    mov  rdi, rax
    call double_it      ; rax = 20

    ; exit(result)
    mov  rdi, rax       ; exit code = 20
    mov  rax, 60
    syscall

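Assemble, link, and run (for example nasm -f elf64 add_and_double.asm -o add_and_double.o, then ld add_and_double.o -o add_and_double, then ./add_and_double). Afterwards echo $? should print 20.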


Common Stack Bugs

Bug                         Symptom                         Cause
──────────────────────────────────────────────────────────────────────────────────────────────
Stack misalignment          SIGSEGV in called C function    RSP not 16-byte aligned at CALL
Missing prologue/epilogue   RBP corrupted after return      Unbalanced PUSH/POP
Wrong callee-save           Random corruption in caller     Modified RBX/R12-R15 without saving
RET to wrong address        SIGSEGV / wrong behavior        Stack overwritten (buffer overflow)

Check Alignment

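Before calling any C function, make sure RSP is a multiple of 16: compiled code may use aligned SSE instructions such as movaps on its stack data. A raw syscall instruction has no such requirement.

; On entry to a function, RSP is misaligned by 8 (CALL pushed the return address).
; Subtracting another 8 makes it 16-byte aligned again:
sub rsp, 8      ; align to 16 bytes
call some_c_func
add rsp, 8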

Or use an odd number of PUSHes in the prologue to maintain alignment.
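The odd-number-of-pushes approach can be sketched with a call into the C library. This example assumes the object file is linked against libc as a non-PIE executable (for instance nasm -f elf64 hello.asm followed by gcc -no-pie hello.o); the file name and the msg label are hypothetical, while puts is the standard C function.

```nasm
extern puts

section .rodata
msg:    db "hello", 0

section .text
global main

main:
    push rbx                ; one push: RSP goes from 8 mod 16 to 0 mod 16
    lea  rdi, [rel msg]     ; 1st argument: pointer to the string
    call puts               ; RSP is 16-byte aligned at this CALL
    xor  eax, eax           ; return 0 from main
    pop  rbx
    ret
```

The saved RBX doubles as alignment padding here, so no separate sub rsp, 8 is needed.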


Key Takeaways

  • CALL pushes return address, then jumps; RET pops return address and jumps back
  • Standard prologue: push rbp; mov rbp, rsp; sub rsp, N
  • Standard epilogue: mov rsp, rbp; pop rbp; ret
  • Callee saves: RBX, RBP, R12-R15
  • Caller saves: RAX, RCX, RDX, RSI, RDI, R8-R11
  • RSP must be 16-byte aligned before any CALL instruction
