01: Introduction to Assembly Language
What is Assembly Language?
Assembly language is a low-level programming language that has a near one-to-one correspondence with a processor's machine code instructions. Each assembly statement maps directly to one (or occasionally a few) machine instructions that the CPU executes.
```text
High Level:    a = b + c;

Assembly:      mov rax, [b]
               add rax, [c]
               mov [a], rax

Machine Code:  48 8B 05 ...  48 03 05 ...  48 89 05 ...
```
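To try this mapping yourself, here is a minimal sketch of a complete program that computes a = b + c. It assumes Linux x86-64 with NASM and ld installed. The values 40 and 2 and the file name add.asm are illustrative choices, not part of the example above. The `default rel` line makes the memory operands RIP-relative, which is the addressing form the 48 8B 05 ... bytes correspond to.

```nasm
; add.asm: a = b + c as a complete Linux x86-64 program (sketch).
; Build: nasm -f elf64 add.asm -o add.o && ld add.o -o add
default rel                 ; [b], [c], [a] become RIP-relative operands

section .data
b:  dq 40                   ; 64-bit variable b (illustrative value)
c:  dq 2                    ; 64-bit variable c (illustrative value)

section .bss
a:  resq 1                  ; one uninitialized 64-bit slot for a

section .text
global _start               ; entry point that ld looks for
_start:
    mov rax, [b]            ; load b from memory into rax
    add rax, [c]            ; rax = rax + c
    mov [a], rax            ; store the sum into a

    mov rdi, [a]            ; exit status = a (only the low 8 bits are kept)
    mov eax, 60             ; Linux x86-64 syscall number for exit
    syscall
```

Because the sum is handed to the exit system call, running `./add; echo $?` should print 42.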
Unlike C or Python, assembly gives you direct control over every register, every byte of memory, and every instruction the CPU executes.
Why Learn Assembly?
Assembly is rarely used to write entire programs today, but it is indispensable for:
| Use Case | Why Assembly Matters |
|---|---|
| Reverse engineering | Shipped binaries rarely include source code, so you read the disassembly |
| Malware analysis | Understanding shellcode and exploits |
| Performance-critical code | SIMD/vectorization, tight loops |
| Embedded systems | Bare-metal microcontrollers with no OS |
| Compiler development | Writing code generators requires knowing the target ISA |
| OS/kernel development | Boot loaders, interrupt handlers, context switches |
| Security research | Buffer overflows, ROP chains, exploitation techniques |
| Deep debugging | When a crash happens in code without source or debug symbols, the disassembly is all you have |
Even if you never write assembly yourself, reading it is a critical skill for any systems programmer.
A Brief History
| Year | Event |
|---|---|
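| 1940s | ENIAC (1945) is programmed by setting switches and plugging cables; the first stored-program machines (1948-1949) are programmed in raw binary machine code |
| 1949 | Cambridge's EDSAC translates programs written with mnemonic letter codes into binary as it loads them, an early form of assembly language |
| 1964 | IBM System/360: one standardized ISA across an entire product family |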
| 1978 | Intel 8086 — the origin of x86; 16-bit architecture |
| 1985 | Intel 80386 — x86 extended to 32-bit (IA-32) |
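| 2003 | AMD ships the first AMD64 (x86-64) processor, the Opteron: a 64-bit extension that remains backward compatible with 32-bit x86 |
| 2004 | Intel adopts the same extension as EM64T (later renamed Intel 64); x86-64 becomes the dominant desktop/server ISA |
Modern x86-64 processors still power on in 16-bit real mode and can execute code written for the 8086 there, an extraordinary degree of backward compatibility.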
Instruction Set Architectures (ISA)
An ISA defines the contract between software and hardware: which instructions exist, how they're encoded, how registers are named, and how memory is accessed.
Common ISAs:
| ISA | Used In |
|---|---|
| x86-64 | Desktops, laptops, servers (Intel/AMD) |
| ARM64 (AArch64) | Mobile (iOS/Android), Apple Silicon, embedded |
| RISC-V | Academic, embedded, growing open-source ecosystem |
| MIPS | Routers, older gaming consoles, CS education |
| z/Architecture | IBM mainframes |
This series focuses on x86-64. The concepts (registers, stack, calling conventions) transfer to other ISAs with minor differences.
CISC vs. RISC
x86 is a CISC (Complex Instruction Set Computer) architecture.
| CISC (x86) | RISC (ARM, RISC-V) |
|---|---|
| Many complex instructions | Fewer, simpler instructions |
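| Variable instruction length (1 to 15 bytes) | Fixed instruction length (4 bytes in AArch64 and base RISC-V; RISC-V's optional C extension adds 2-byte forms) |
| Arithmetic instructions can take memory operands directly | Load/store architecture: only loads and stores touch memory; arithmetic works on registers |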
| Decades of legacy instructions | Cleaner, more orthogonal design |
Modern x86 CPUs internally translate CISC instructions into simpler micro-ops, giving RISC-like execution with CISC compatibility.
Assembly Syntax Flavors
There are two main syntax dialects for x86 assembly:
Intel Syntax (used in this series)
```nasm
mov rax, 42         ; destination first, then source
add rax, rbx        ; rax = rax + rbx
mov [rsp], rax      ; brackets = memory dereference
```
AT&T Syntax (used by GCC/GAS)
```gas
movq $42, %rax      # source first, then destination
addq %rbx, %rax     # rax = rax + rbx
movq %rax, (%rsp)   # parentheses = memory dereference
```
Key differences:
- AT&T reverses operand order (src, dst vs Intel's dst, src)
- AT&T prefixes registers with % and immediates with $
- AT&T appends size suffixes (b, w, l, q) to mnemonics
This series uses Intel syntax with NASM. GDB defaults to AT&T syntax; run `set disassembly-flavor intel` to switch.
Assemblers
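Since Intel syntax has no size suffixes, NASM takes the operand size from a register operand when there is one. When an instruction involves only memory and an immediate, you state the size with a keyword such as byte, word, dword, or qword; writing `mov [buf], 0` makes NASM report that the operation size is not specified. The sketch below assumes Linux x86-64; the variable `buf` and the file name sizes.asm are hypothetical, and the AT&T equivalent of each line is given in its comment.

```nasm
; sizes.asm: NASM size keywords vs. AT&T size suffixes (sketch, Linux x86-64).
; Build: nasm -f elf64 sizes.asm -o sizes.o && ld sizes.o -o sizes
default rel

section .bss
buf:    resq 1                  ; hypothetical 8-byte scratch variable

section .text
global _start
_start:
    mov qword [buf], 0          ; AT&T: movq $0, buf(%rip)       8-byte store
    mov dword [buf], 40         ; AT&T: movl $40, buf(%rip)      4-byte store
    add byte [buf], 2           ; AT&T: addb $2, buf(%rip)       1-byte add
    movzx edi, byte [buf]       ; AT&T: movzbl buf(%rip), %edi   zero-extend one byte
    mov eax, 60                 ; register operand fixes the size: no keyword needed
    syscall                     ; exit(edi): the status is 42
```

The program exits with status 42, confirming that each store and the add touched only the number of bytes its size keyword names.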
An assembler converts human-readable assembly text into binary machine code.
| Assembler | Syntax | Platform / notes |
|---|---|---|
| NASM | Intel | Cross-platform; used in this series |
| GAS (GNU as) | AT&T (default; Intel via `.intel_syntax`) | Bundled with the GNU binutils/GCC toolchain |
| MASM | Intel | Windows / Microsoft |
| YASM | Intel (NASM dialect) / AT&T (GAS dialect) | Cross-platform; NASM-compatible rewrite |
| FASM | Intel | Self-hosting (written in itself), no dependencies |
The Assembly Workflow
```text
source.asm
    │
    ▼  nasm -f elf64 source.asm -o source.o
source.o   (ELF object file: machine code + symbol table)
    │
    ▼  ld source.o -o program               (entry point: _start)
    │  or: gcc -no-pie source.o -o program  (entry point: main)
program    (ELF executable: linked, ready to run)
    │
    ▼  ./program
```
For multi-file projects, each .asm file is assembled to its own .o file, and the linker then combines all the .o files into one executable: the same separate-compilation model used for C.
What You Will Build in This Series
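The workflow can be exercised end to end with a minimal sketch like the one below. It prints one line using the Linux write and exit system calls, so no C runtime is involved. It assumes Linux x86-64 with NASM and ld installed; the file name hello.asm and the message text are illustrative. Because the entry point is `_start` rather than `main`, it is linked with ld rather than gcc.

```nasm
; hello.asm: minimal program for the workflow (Linux x86-64, no C runtime).
; nasm -f elf64 hello.asm -o hello.o
; ld hello.o -o hello
; ./hello
default rel

section .rodata
msg:    db "Hello, assembly!", 10   ; the text followed by a newline (ASCII 10)
msglen  equ $ - msg                 ; byte length, computed at assembly time

section .text
global _start                       ; export the entry point ld looks for
_start:
    mov eax, 1                      ; syscall 1 = write
    mov edi, 1                      ; fd 1 = stdout
    lea rsi, [msg]                  ; address of the string (RIP-relative)
    mov edx, msglen                 ; number of bytes to write
    syscall

    mov eax, 60                     ; syscall 60 = exit
    xor edi, edi                    ; exit status 0
    syscall
```

Running `./hello` should print `Hello, assembly!` followed by a newline and exit with status 0.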
By the end of this tutorial series you will be able to:
- Write x86-64 assembly programs from scratch
- Use system calls to perform I/O without a C runtime
- Implement functions with correct stack frames
- Use SIMD instructions to process data in parallel
- Mix assembly with C code using proper calling conventions
- Read and understand compiler-generated disassembly
- Apply optimization techniques at the instruction level