01 — Introduction to Assembly Language

What is Assembly Language?

Assembly language is a low-level programming language that has a near one-to-one correspondence with a processor's machine code instructions. Each assembly statement maps directly to one (or occasionally a few) machine instructions that the CPU executes.

High Level:   a = b + c;
Assembly:     mov rax, [rel b]   ; load b (RIP-relative address)
              add rax, [rel c]   ; rax = b + c
              mov [rel a], rax   ; store the result in a
Machine Code: 48 8B 05 ...  48 03 05 ...  48 89 05 ...

Unlike C or Python, assembly gives you direct control over every register, every byte of memory, and every instruction the CPU runs.
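
To make the mapping above concrete, here is a minimal sketch of the same three instructions wrapped in a complete program. It assumes Linux on x86-64 and NASM; the labels a, b, and c, the values 40 and 2, and the file name add.asm are illustrative choices, not part of the original example.

```nasm
; add.asm: a = b + c as a complete Linux x86-64 program (illustrative sketch).
default rel                 ; make [b], [c], [a] RIP-relative

section .data
b:      dq 40               ; 64-bit initialized variable (assumed value)
c:      dq 2                ; 64-bit initialized variable (assumed value)

section .bss
a:      resq 1              ; one uninitialized 64-bit slot for the result

section .text
global _start
_start:
        mov     rax, [b]    ; load b into a register
        add     rax, [c]    ; rax = b + c (second operand read from memory)
        mov     [a], rax    ; store the sum into a

        mov     rdi, [a]    ; pass a as the exit status
        mov     eax, 60     ; Linux syscall number for exit
        syscall
```

Assembled with nasm -f elf64 add.asm -o add.o and linked with ld add.o -o add, the program should exit with status 42, which the shell shows via echo $? after running ./add.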


Why Learn Assembly?

Assembly is rarely used to write entire programs today, but it is indispensable for:

| Use Case | Why Assembly Matters |
|---|---|
| Reverse engineering | Executables have no source code; you read disassembly |
| Malware analysis | Understanding shellcode and exploits |
| Performance-critical code | SIMD/vectorization, tight loops |
| Embedded systems | Bare-metal microcontrollers with no OS |
| Compiler development | Writing code generators requires knowing the target ISA |
| OS/kernel development | Boot loaders, interrupt handlers, context switches |
| Security research | Buffer overflows, ROP chains, exploitation techniques |
| Deep debugging | When a crash has no source, you read the disassembly |

Even if you never write assembly yourself, reading it is a critical skill for any systems programmer.


A Brief History

| Year | Event |
|---|---|
| 1940s | Early computers such as ENIAC are programmed with switches, cables, and raw machine code |
| 1947–1949 | First assembly languages appear in Britain (Kathleen Booth's work at Birkbeck; EDSAC's "initial orders" at Cambridge), giving opcodes mnemonic names |
| 1964 | IBM System/360: a standardized ISA across a product family |
| 1978 | Intel 8086: the origin of x86; 16-bit architecture |
| 1985 | Intel 80386: x86 extended to 32-bit (IA-32) |
| 2003 | AMD introduces AMD64 (x86-64): 64-bit extension, backward compatible |
| 2004 | Intel adopts AMD64 (first branded EM64T, later Intel 64); this is now the dominant desktop/server ISA |

Modern 64-bit x86 processors still start up in 16-bit real mode and can run code written for the 8086 there, an extraordinary degree of backward compatibility.


Instruction Set Architectures (ISA)

An ISA defines the contract between software and hardware: which instructions exist, how they're encoded, how registers are named, and how memory is accessed.

Common ISAs:

| ISA | Used In |
|---|---|
| x86-64 | Desktops, laptops, servers (Intel/AMD) |
| ARM64 (AArch64) | Mobile (iOS/Android), Apple Silicon, embedded |
| RISC-V | Academic, embedded, growing open-source ecosystem |
| MIPS | Routers, older gaming consoles, CS education |
| z/Architecture | IBM mainframes |

This series focuses on x86-64. The core concepts (registers, the stack, calling conventions) carry over to other ISAs, though register names, instructions, and ABI details differ.


CISC vs. RISC

x86 is a CISC (Complex Instruction Set Computer) architecture.

| CISC (x86) | RISC (ARM, RISC-V) |
|---|---|
| Many complex instructions | Fewer, simpler instructions |
| Variable instruction length (1–15 bytes) | Fixed instruction length (typically 4 bytes; Thumb and RISC-V's C extension add 2-byte forms) |
| Instructions can access memory directly | Load/store only: arithmetic operates on registers |
| Decades of legacy instructions | Cleaner, more orthogonal design |

Modern x86 CPUs internally translate CISC instructions into simpler micro-ops, giving RISC-like execution with CISC compatibility.
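
The memory-access difference is easiest to see side by side. The sketch below, assuming Linux on x86-64 and NASM, updates the same variable twice: once with a single x86 instruction that takes a memory operand, and once as the load/operate/store sequence a load/store ISA would require. The label counter and the values are illustrative assumptions.

```nasm
; loadstore.asm: memory operand vs. load/store sequence (illustrative sketch).
default rel

section .data
counter: dq 10                          ; assumed starting value

section .text
global _start
_start:
        ; CISC style: one instruction reads memory, adds, and writes back.
        add     qword [counter], 5      ; counter = 15

        ; Load/store style: the same update split into three simpler steps.
        mov     rax, [counter]          ; load
        add     rax, 5                  ; operate on a register only
        mov     [counter], rax          ; store; counter = 20

        mov     rdi, [counter]          ; exit status = 20
        mov     eax, 60                 ; Linux syscall number for exit
        syscall
```

Both forms are valid x86-64; the single-instruction version is simply not expressible on a pure load/store machine.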


Assembly Syntax Flavors

There are two main syntax dialects for x86 assembly:

Intel Syntax (used in this series)

mov rax, 42        ; destination first, then source
add rax, rbx       ; rax = rax + rbx
mov [rsp], rax     ; brackets = memory dereference

AT&T Syntax (used by GCC/GAS)

movq $42, %rax     # source first, then destination
addq %rbx, %rax    # rax = rax + rbx
movq %rax, (%rsp)  # parentheses = memory dereference

Key differences:

  • AT&T reverses operand order (src, dst vs Intel's dst, src)
  • AT&T prefixes registers with % and immediates with $
  • AT&T appends size suffixes (b, w, l, q) to mnemonics

This series uses Intel syntax with NASM. GDB defaults to AT&T; use set disassembly-flavor intel to switch.
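
Where AT&T encodes the operand size in a mnemonic suffix, NASM's Intel syntax uses a size keyword on the memory operand whenever no register fixes the width. The following is a minimal sketch assuming Linux on x86-64 and NASM; the buffer name buf and the stored values are illustrative, and the AT&T forms appear only in comments for comparison.

```nasm
; sizes.asm: operand sizes in Intel/NASM syntax (illustrative sketch).
default rel

section .bss
buf:    resb 16                         ; 16 bytes of scratch space

section .text
global _start
_start:
        mov     byte  [buf], 1          ; AT&T: movb $1, buf(%rip)
        mov     word  [buf+2], 2        ; AT&T: movw $2, buf+2(%rip)
        mov     dword [buf+4], 3        ; AT&T: movl $3, buf+4(%rip)
        mov     qword [buf+8], 4        ; AT&T: movq $4, buf+8(%rip)

        movzx   edi, byte [buf]         ; keyword names the source width
        add     edi, [buf+4]            ; no keyword needed: edi implies 32 bits
        mov     eax, 60                 ; Linux syscall number for exit
        syscall                         ; exit status = 1 + 3 = 4
```

Leaving the keyword off a store such as mov [buf], 1 makes NASM reject the line, because neither operand tells it how many bytes to write.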


Assemblers

An assembler converts human-readable assembly text into binary machine code.

| Assembler | Syntax | Platform / Notes |
|---|---|---|
| NASM | Intel | Cross-platform, our choice |
| GAS (GNU as) | AT&T (default) | Bundled with GCC toolchain |
| MASM | Intel | Windows / Microsoft |
| YASM | Intel/AT&T | NASM-compatible rewrite; also accepts GAS syntax |
| FASM | Intel | Self-hosting, no dependencies |

The Assembly Workflow

source.asm
    ▼  nasm -f elf64 source.asm -o source.o
source.o  (ELF object file — machine code + symbol table)
    ▼  ld source.o -o program   (or gcc source.o -o program when the code defines main and uses libc)
program   (ELF executable — linked, ready to run)
    ▼  ./program

For multi-file projects, each .asm is assembled to .o, then all .o files are linked together — identical to C compilation.
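
As a concrete file to push through those steps, here is a minimal sketch of a program that prints one line using Linux system calls directly. It assumes Linux on x86-64 and NASM; the file name hello.asm and the message text are illustrative.

```nasm
; hello.asm: minimal program for trying the assemble/link/run workflow.
;
;   nasm -f elf64 hello.asm -o hello.o
;   ld hello.o -o hello
;   ./hello

default rel

section .rodata
msg:    db "Hello, assembly!", 10       ; 10 = newline
msglen  equ $ - msg                     ; length computed at assembly time

section .text
global _start                           ; entry symbol that ld looks for
_start:
        mov     eax, 1                  ; syscall: write
        mov     edi, 1                  ; fd 1 = stdout
        lea     rsi, [msg]              ; address of the message
        mov     edx, msglen             ; number of bytes to write
        syscall

        mov     eax, 60                 ; syscall: exit
        xor     edi, edi                ; status 0
        syscall
```

Because this file defines _start itself, it is linked with ld. Linking it with gcc would clash with the _start that the C runtime already provides, so a gcc-linked program would define main instead.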


What You Will Build in This Series

By the end of this tutorial series you will be able to:

  • Write x86-64 assembly programs from scratch
  • Use system calls to perform I/O without a C runtime
  • Implement functions with correct stack frames
  • Use SIMD instructions to process data in parallel
  • Mix assembly with C code using proper calling conventions
  • Read and understand compiler-generated disassembly
  • Apply optimization techniques at the instruction level

Next: 02 — Computer Architecture