13 — Calling Conventions

A calling convention is a contract between caller and callee: how arguments are passed, where the return value goes, which registers must be preserved, and how the stack is managed. Getting this right is essential for interoperability between assembly, C, and other languages.


Why Calling Conventions Matter

Without a convention:

  • A caller puts arguments in registers A, B, C
  • A callee reads from registers D, E, F
  • Nothing works

With a convention, assembly code can call C functions (printf, malloc), C code can call assembly functions, and shared libraries can be written in any language.


System V AMD64 ABI (Linux, macOS)

This is the standard calling convention on x86-64 UNIX-like systems, including Linux, macOS, and the BSDs.

Integer/Pointer Arguments

Argument Register
1st RDI
2nd RSI
3rd RDX
4th RCX
5th R8
6th R9
7th+ Stack (right to left)
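The table above can be restated as a small lookup. The sketch below is illustrative only: the name `sysv_int_arg_location` is hypothetical and not part of any ABI or library, and it assumes every argument is an integer or pointer (no floating-point or struct arguments mixed in).

```c
#include <stddef.h>

/* Hypothetical helper: where the n-th (1-based) integer or pointer
 * argument is passed under the System V AMD64 ABI.
 * Assumes only integer/pointer arguments are being counted. */
const char *sysv_int_arg_location(int n)
{
    static const char *const regs[] = { "RDI", "RSI", "RDX", "RCX", "R8", "R9" };

    if (n < 1)
        return NULL;            /* argument positions start at 1 */
    if (n <= 6)
        return regs[n - 1];     /* the first six travel in registers */
    return "stack";             /* the seventh and later go on the stack */
}
```

For example, `sysv_int_arg_location(4)` yields `"RCX"` and `sysv_int_arg_location(7)` yields `"stack"`.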

Floating-Point Arguments

FP Argument Register
1st XMM0
2nd XMM1
... ...
8th XMM7

Return Values

Type Register
Integer/pointer (≤64-bit) RAX
Large integer (128-bit) RDX:RAX
Float/double XMM0
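Struct ≤128 bits RAX/RDX and/or XMM0/XMM1, depending on field types (RDX:RAX for all-integer fields)
Struct >128 bits Caller allocates, passes pointer in RDI; callee returns that pointer in RAX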
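To make the struct rows concrete, here is a minimal C sketch with one small and one large struct. The type and function names are hypothetical, and the register notes in the comments describe what a System V AMD64 compiler is expected to do for these all-integer layouts; inspecting the generated code with objdump -d is the way to confirm it on a given toolchain.

```c
/* 128 bits of integer fields: expected to come back in RAX (lo) and RDX (hi). */
struct pair {
    long lo;
    long hi;
};

/* 192 bits: too large for registers, so the caller allocates the storage and
 * passes its address as a hidden first argument in RDI. The visible
 * arguments then shift to RSI, RDX, RCX. */
struct triple {
    long a;
    long b;
    long c;
};

struct pair make_pair(long lo, long hi)
{
    struct pair p = { lo, hi };
    return p;                   /* returned by value in two registers */
}

struct triple make_triple(long a, long b, long c)
{
    struct triple t = { a, b, c };
    return t;                   /* written through the hidden pointer */
}
```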

Register Preservation

Class Registers
Caller-saved (volatile) RAX, RCX, RDX, RSI, RDI, R8–R11, XMM0–XMM15 (including all YMM/ZMM upper halves)
Callee-saved (non-volatile) RBX, RBP, RSP, R12–R15

Unlike the Microsoft x64 convention, System V treats every XMM register as caller-saved.

Caller-saved: the calling function must save these before calling if it needs them after. Callee-saved: the called function must save and restore these if it uses them.

Stack Alignment

  • RSP must be 16-byte aligned at the point of the CALL instruction
  • Since CALL pushes 8 bytes (return address), RSP inside a function is offset by 8 from 16-byte alignment
  • After the prologue (push rbp), RSP is 16-byte aligned again
; RSP alignment check:
; Before call: RSP % 16 == 0
;   CALL pushes 8 bytes → RSP % 16 == 8
;   push rbp pushes 8 bytes → RSP % 16 == 0  ✓
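The alignment arithmetic above generalizes to any prologue. The following sketch computes how much extra padding a function needs so that RSP is 16-byte aligned at its next CALL; the helper name `sysv_stack_padding` and its parameters are hypothetical, chosen only for illustration.

```c
/* Hypothetical helper: extra bytes to subtract from RSP so that RSP is
 * 16-byte aligned before the next CALL.
 *   pushes: number of 8-byte registers pushed in the prologue
 *   locals: bytes reserved for local variables
 * Assumes the function was entered through CALL, so RSP % 16 == 8 on entry. */
unsigned sysv_stack_padding(unsigned pushes, unsigned locals)
{
    unsigned used = 8 + 8 * pushes + locals;   /* return address + pushes + locals */
    return (16 - used % 16) % 16;              /* distance to the next multiple of 16 */
}
```

With only `push rbp` (one push, no locals) the result is 0, matching the check above. With two pushes and no locals the result is 8, which is why an extra `sub rsp, 8` often follows a second push.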

Microsoft x64 ABI (Windows)

Windows uses a different convention — be aware when writing cross-platform code.

Key Differences from System V

Feature System V AMD64 Microsoft x64
Arg 1 RDI RCX
Arg 2 RSI RDX
Arg 3 RDX R8
Arg 4 RCX R9
Arg 5 R8 Stack
Arg 6 R9 Stack
Arg 7+ Stack Stack
Callee-saved RBX, RBP, R12–R15 RBX, RBP, RDI, RSI, R12–R15, XMM6–XMM15
Shadow space None 32 bytes always reserved by caller
Stack alignment 16 bytes at CALL 16 bytes at CALL

Shadow Space (Windows)

The caller must always allocate 32 bytes of shadow space on the stack before a call, even if fewer than 4 arguments are used. The callee may store its register arguments there for debugging.

; Windows: calling a 4-argument function
sub  rsp, 40        ; 32 bytes shadow + 8 bytes for alignment
mov  rcx, arg1
mov  rdx, arg2
mov  r8,  arg3
mov  r9,  arg4
call some_function
add  rsp, 40
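The 40 in the example above is 32 bytes of shadow space plus 8 bytes of alignment padding. A rough sketch of that calculation for an arbitrary argument count follows; `win64_call_reservation` is a hypothetical name, and the sketch assumes each argument occupies one 8-byte slot and that the caller has pushed nothing since it was entered (so RSP % 16 == 8 at that point).

```c
/* Hypothetical helper: bytes to subtract from RSP before a call under the
 * Microsoft x64 convention.
 * Assumes one 8-byte slot per argument and RSP % 16 == 8 beforehand. */
unsigned win64_call_reservation(unsigned nargs)
{
    unsigned slots = nargs < 4 ? 4 : nargs;   /* shadow space always covers 4 slots */
    unsigned bytes = slots * 8;

    /* Pad so that 8 (the return address already on the stack) + bytes
     * is a multiple of 16. */
    if ((bytes + 8) % 16 != 0)
        bytes += 8;
    return bytes;
}
```

Calls with zero to five arguments all reserve 40 bytes under these assumptions; a six-argument call reserves 56.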

Calling C Functions from Assembly

Example: Call printf

; call_printf.asm
; Requires linking with C library: gcc call_printf.o -o call_printf -no-pie

extern printf       ; declare printf as external symbol

section .data
    fmt  db "Value: %ld", 10, 0   ; format string (null-terminated)

section .text
global main         ; use main instead of _start when linking with glibc

main:
    push rbp
    mov  rbp, rsp
    sub  rsp, 16         ; reserve 16 bytes; RSP is already 16-byte aligned after push rbp and stays aligned

    ; printf(fmt, value)
    lea  rdi, [fmt]      ; arg1: format string
    mov  rsi, 42         ; arg2: integer value
    xor  eax, eax        ; AL = 0: no XMM registers used (variadic ABI)
    call printf

    ; return 0
    xor  eax, eax
    leave                ; mov rsp, rbp; pop rbp
    ret

Build:

nasm -f elf64 call_printf.asm -o call_printf.o
gcc call_printf.o -o call_printf -no-pie
./call_printf
# Value: 42

Important for variadic functions (printf, scanf): AL must hold an upper bound on the number of XMM registers used to pass arguments. It is 0 when no floating-point arguments are passed.
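For contrast with the integer-only printf call, here is a small C sketch of a variadic call that does pass a floating-point value. The function name `format_sample` is hypothetical. On System V AMD64 a compiler is expected to place 2.5 in XMM0 and set AL to 1 before the call; the comments state that expectation rather than something the C code itself can observe.

```c
#include <stdio.h>

/* Hypothetical example: one integer and one double passed to a variadic
 * function. Expected System V argument placement:
 *   buf -> RDI, size -> RSI, format -> RDX, 42L -> RCX, 2.5 -> XMM0
 * and AL = 1, because one XMM register carries an argument. */
int format_sample(char *buf, size_t size)
{
    return snprintf(buf, size, "%ld %.1f", 42L, 2.5);
}
```

An assembly caller making the same call would use `mov eax, 1` in place of `xor eax, eax`.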

Example: Call malloc and free

extern malloc
extern free

section .text
global main

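main:
    push rbp
    mov  rbp, rsp
    push r12             ; r12 is callee-saved: preserve it for our caller
    sub  rsp, 8          ; keep RSP 16-byte aligned after the extra push

    ; void *p = malloc(64)
    mov  rdi, 64
    call malloc          ; rax = pointer (or NULL on failure)
    test rax, rax
    jz   .oom

    mov  r12, rax        ; keep pointer in a register that survives calls

    ; Use the memory ...
    ; (mov qword [mem], imm32 sign-extends, so load the constant first)
    mov  eax, 0xDEADBEEF ; writing EAX zero-extends into RAX
    mov  [r12], rax      ; store 0x00000000DEADBEEF

    ; free(p)
    mov  rdi, r12
    call free

    xor  eax, eax        ; return 0
    jmp  .done

.oom:
    mov  eax, 1          ; return 1

.done:
    add  rsp, 8
    pop  r12             ; restore callee-saved register
    pop  rbp
    ret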
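For comparison, the assembly above corresponds roughly to the following C. The name `alloc_demo` is hypothetical (the assembly version is `main`), and the sketch only mirrors the control flow: allocate, check for NULL, store one 64-bit value, free.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical C counterpart of the malloc/free assembly example. */
int alloc_demo(void)
{
    uint64_t *p = malloc(64);   /* size goes in RDI, pointer comes back in RAX */
    if (p == NULL)
        return 1;               /* mirrors the .oom path */

    *p = 0xDEADBEEF;            /* one 64-bit store at the start of the block */

    free(p);                    /* pointer goes in RDI */
    return 0;
}
```

In C the compiler decides where `p` lives across the call to `free`; in hand-written assembly that choice, and preserving any callee-saved register used for it, is the programmer's job.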

Writing C-Callable Assembly Functions

Your assembly functions need to follow the ABI so C code can call them.

C header

// mymath.h
long square(long x);
long max_of_three(long a, long b, long c);

Assembly implementation

; mymath.asm

section .text
global square       ; export symbol
global max_of_three

; long square(long x)  →  rdi = x
square:
    imul rdi, rdi   ; rdi = x * x
    mov  rax, rdi   ; return value
    ret

; long max_of_three(long a, long b, long c)  →  rdi, rsi, rdx
max_of_three:
    mov  rax, rdi       ; rax = a (tentative max)

    cmp  rsi, rax
    cmovg rax, rsi      ; if b > rax, rax = b

    cmp  rdx, rax
    cmovg rax, rdx      ; if c > rax, rax = c

    ret

C caller

// main.c
#include <stdio.h>
#include "mymath.h"

int main(void) {
    printf("%ld\n", square(7));           // 49
    printf("%ld\n", max_of_three(3, 9, 5)); // 9
    return 0;
}

Build:

nasm -f elf64 mymath.asm -o mymath.o
gcc main.c mymath.o -o program
./program
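One way to gain confidence in hand-written routines like these is to compare them against plain C reference versions. The sketch below shows such references; the `_ref` names are hypothetical additions, not part of mymath.h.

```c
/* Hypothetical reference implementations for checking the assembly versions.
 * Note: signed overflow is undefined in C, whereas imul simply wraps, so
 * compare results only for inputs where x * x fits in a long. */
long square_ref(long x)
{
    return x * x;
}

long max_of_three_ref(long a, long b, long c)
{
    long max = a;               /* tentative maximum, like rax in the assembly */
    if (b > max)
        max = b;
    if (c > max)
        max = c;
    return max;
}
```

A test program could link mymath.o and check, for a range of inputs, that `square(x) == square_ref(x)` and that `max_of_three(a, b, c) == max_of_three_ref(a, b, c)`.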


Stack Frame Conventions in Detail

Caller's frame
═══════════════════════════════════════
[rbp + 24]   2nd stack argument (if any)  ← 8th overall arg
[rbp + 16]   1st stack argument (if any)  ← 7th overall arg
[rbp +  8]   return address              ← CALL pushed this
[rbp +  0]   caller's RBP               ← push rbp
[rbp -  8]   local variable 1
[rbp - 16]   local variable 2
[rbp - 24]   saved RBX (if used)
[rbp - 32]   saved R12 (if used)
RSP →        (aligned to 16 bytes)
═══════════════════════════════════════

Accessing Stack Arguments (7th+)

my_7arg_func:           ; args: rdi, rsi, rdx, rcx, r8, r9, [stack]
    push rbp
    mov  rbp, rsp

    ; 7th argument is at [rbp + 16]
    mov  rax, [rbp + 16]

    pop  rbp
    ret
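From the C side, a function like this is declared with seven parameters, and the compiler places the seventh on the stack for the caller. The sketch below is a C reference with the same behavior as the assembly (it returns its seventh argument); the name `my_7arg_func_ref` is hypothetical.

```c
/* Hypothetical C reference for my_7arg_func: returns the seventh argument.
 * Under System V AMD64, a..f arrive in RDI, RSI, RDX, RCX, R8, R9 and
 * g arrives on the stack. */
long my_7arg_func_ref(long a, long b, long c, long d, long e, long f, long g)
{
    (void)a; (void)b; (void)c;  /* unused: silence compiler warnings */
    (void)d; (void)e; (void)f;
    return g;
}
```

The assembly version would be declared to C callers with the same parameter list: `long my_7arg_func(long, long, long, long, long, long, long);`.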

Checking ABI Compliance with GDB

(gdb) break my_function
(gdb) run
(gdb) info registers rdi rsi rdx rcx r8 r9
(gdb) x/4gx $rsp    # inspect stack at entry

Use objdump -d or GDB's disassemble command to verify prologue/epilogue correctness.


Key Takeaways

Topic System V AMD64
Integer args RDI, RSI, RDX, RCX, R8, R9
Float args XMM0–XMM7
Return RAX (int), XMM0 (float)
Caller-saved RAX, RCX, RDX, RSI, RDI, R8–R11, XMM0–XMM15
Callee-saved RBX, RBP, R12–R15
Stack alignment 16 bytes at CALL site
Variadic AL Set AL = count of XMM args used
