02 — Computer Architecture
Understanding how a CPU works is essential before writing a single line of assembly. Assembly is the thinnest layer above the hardware: each instruction corresponds almost one-to-one to a machine instruction the CPU executes.
The Von Neumann Architecture
Modern computers follow the Von Neumann model: a single shared memory holds both instructions and data, connected to a CPU that fetches and executes instructions sequentially.
```mermaid
graph TD
    subgraph CPU
        CU[Control Unit]
        ALU[ALU\nArithmetic & Logic Unit]
        REG[Registers]
    end
    CU --> REG
    ALU --> REG
    REG -->|Bus| RAM[Memory - RAM]
```
The CPU Components
Control Unit (CU)
- Fetches instructions from memory
- Decodes them into control signals
- Sequences the execution steps
- Does NOT perform computation itself
Arithmetic Logic Unit (ALU)
- Performs arithmetic: ADD, SUB, MUL, DIV
- Performs logic: AND, OR, XOR, NOT
- Performs comparisons: sets flags based on results
- Operates on values from registers
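
The "sets flags based on results" bullet can be illustrated with a minimal Python sketch that models a 64-bit compare as a subtraction whose result is discarded and whose flags are kept. The function name `cmp_flags` and the dictionary layout are assumptions made for this example; the flag meanings (ZF, SF, CF, OF) follow the usual x86 definitions.

```python
MASK64 = (1 << 64) - 1  # keep values within 64 bits, like a 64-bit register


def cmp_flags(a: int, b: int) -> dict:
    """Model x86 `CMP a, b`: compute a - b, discard it, return the flags."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    return {
        "ZF": int(result == 0),  # zero flag: the operands were equal
        "SF": result >> 63,      # sign flag: top bit of the result
        "CF": int(a < b),        # carry flag: an unsigned borrow occurred
        # overflow flag: the signed result does not fit in 64 bits
        "OF": (((a ^ b) & (a ^ result)) >> 63) & 1,
    }


print(cmp_flags(5, 5))  # equal operands set ZF
print(cmp_flags(3, 7))  # 3 is below 7 as unsigned values, so CF is set
```

Conditional jumps read these flags afterwards, which is why a compare is normally followed directly by a jump instruction.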
Registers
- Ultra-fast storage directly inside the CPU
- x86-64 has 16 general-purpose 64-bit registers
- Access time: effectively 0 extra cycles (operands are read as part of executing the instruction)
- Compare: L1 cache ~4 cycles, RAM ~200 cycles
The Memory Hierarchy
| Level | Latency | Size | Location |
|---|---|---|---|
| Registers | ~0 cycles | ~1 KB | Inside CPU |
| L1 Cache | ~4 cycles | 32–64 KB | Per core |
| L2 Cache | ~12 cycles | 256 KB–1 MB | Per core |
| L3 Cache | ~40 cycles | 4–64 MB | Shared |
| RAM (DRAM) | ~200 cycles | GBs | On motherboard |
| SSD (NVMe) | ~100 µs | TBs | Storage |
| HDD | ~10 ms | TBs | Storage |
Assembly programmers exploit this hierarchy by keeping frequently used values in registers and minimizing memory accesses.
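To get a feel for why this matters, the following Python sketch computes a weighted average access latency from the approximate cycle counts in the table. The function name `average_latency` and the access fractions in the usage line are hypothetical values chosen for illustration, not measurements.

```python
# Approximate latencies in cycles, taken from the table above
LATENCY_CYCLES = {"L1": 4, "L2": 12, "L3": 40, "RAM": 200}


def average_latency(served_fraction: dict) -> float:
    """Weighted average latency, given the fraction of accesses served per level."""
    total = sum(served_fraction.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(LATENCY_CYCLES[level] * frac
               for level, frac in served_fraction.items())


# Hypothetical workload: most accesses hit L1, a few go all the way to RAM
mostly_cached = average_latency({"L1": 0.90, "L2": 0.06, "L3": 0.03, "RAM": 0.01})
always_ram = average_latency({"RAM": 1.0})
print(mostly_cached, always_ram)
```

Under these assumed fractions, even 1% of accesses reaching RAM contributes a noticeable share of the average, which is the motivation for keeping hot values in registers.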
The Fetch-Decode-Execute Cycle
Every instruction goes through this cycle:
1. FETCH — Read next instruction from memory at address in RIP
2. DECODE — Determine what operation and operands are needed
3. EXECUTE — ALU or other unit performs the operation
4. WRITEBACK — Store result to register or memory
(repeat)
Modern CPUs execute multiple stages simultaneously via pipelining, running fetch for instruction N+1 while executing instruction N.
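The cycle above can be sketched as a toy stored-program machine in Python. This is an illustration only: the instruction names (`LOAD`, `ADD`, `STORE`, `HALT`), the two registers, and the tuple encoding are invented for this example and do not correspond to x86 encodings. Instructions and data share one memory list, as in the Von Neumann model.

```python
def run(memory: list) -> dict:
    """Run a toy stored-program machine until HALT; return the registers."""
    regs = {"A": 0, "B": 0}
    ip = 0  # instruction pointer: index of the next instruction in memory
    while True:
        instr = memory[ip]       # FETCH the instruction at ip
        ip += 1                  # advance to the following instruction
        op, *args = instr        # DECODE into operation and operands
        if op == "LOAD":         # EXECUTE, then WRITEBACK to a register
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "ADD":
            dst, src = args
            regs[dst] += regs[src]
        elif op == "STORE":      # WRITEBACK to memory
            reg, addr = args
            memory[addr] = regs[reg]
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode: {op}")


program = [
    ("LOAD", "A", 5),   # A = memory[5]
    ("LOAD", "B", 6),   # B = memory[6]
    ("ADD", "A", "B"),  # A = A + B
    ("STORE", "A", 7),  # memory[7] = A
    ("HALT",),
    2,                  # data at address 5
    3,                  # data at address 6
    0,                  # result is stored at address 7
]
regs = run(program)
print(regs, program[7])
```

The sketch runs one instruction at a time; a pipelined CPU overlaps these stages across instructions but produces the same architectural result.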
The Instruction Pointer (RIP)
RIP, the 64-bit instruction pointer register, always holds the address of the next instruction to execute.
- After each instruction, RIP advances by the instruction's byte length
- JMP, CALL, RET change RIP non-sequentially
- You cannot directly MOV an arbitrary value into RIP (use JMP/CALL)
In 32-bit mode, the same register is called EIP; in 16-bit mode, IP.
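As a rough illustration of RIP advancing by each instruction's byte length, the Python sketch below walks over a few standard x86-64 encodings and records the address of each instruction and the RIP value that follows it. The start address `0x401000` and the helper name `trace_rip` are assumptions for this example; only straight-line code is modeled, so jumps, calls, and returns are not followed.

```python
# (mnemonic, machine-code bytes) pairs using standard x86-64 encodings
CODE = [
    ("mov eax, 1",   bytes.fromhex("b801000000")),  # opcode B8 + 32-bit immediate
    ("add eax, ebx", bytes.fromhex("01d8")),        # opcode 01 + ModRM byte
    ("nop",          bytes.fromhex("90")),
    ("ret",          bytes.fromhex("c3")),
]


def trace_rip(code, start: int) -> list:
    """Return (address, next_rip, mnemonic) tuples for straight-line code."""
    rip = start
    rows = []
    for mnemonic, encoding in code:
        next_rip = rip + len(encoding)  # RIP advances by the instruction length
        rows.append((rip, next_rip, mnemonic))
        rip = next_rip
    return rows


rows = trace_rip(CODE, 0x401000)  # hypothetical load address
for address, next_rip, mnemonic in rows:
    print(f"{address:#x}: {mnemonic:<14} next RIP = {next_rip:#x}")
```

Because instruction lengths vary, the step between consecutive RIP values is not constant, unlike on fixed-width instruction sets.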
Memory Layout of a Process
When the OS loads your program, it creates a process with this virtual address space layout:
```mermaid
block-beta
    columns 1
    H["🔼 High Addresses"]:1
    stack["Stack\ngrows ↓ — local vars, return addresses"]
    gap["... (large gap — virtual memory) ..."]:1
    heap["Heap\ngrows ↑ — dynamic allocation (malloc)"]
    bss[".bss\nuninitialized globals (zeroed by OS)"]
    data[".data\ninitialized global/static variables"]
    text[".text\nexecutable code (read-only)"]
    headers["(headers)"]
    L["🔽 Low Addresses — 0x400000 typical for ELF"]:1
    style H fill:#1e1e2e,color:#cdd6f4,stroke:none
    style L fill:#1e1e2e,color:#cdd6f4,stroke:none
    style gap fill:#313244,color:#6c7086,stroke:#45475a,stroke-dasharray:4
    style stack fill:#45475a,color:#cba6f7,stroke:#7f849c
    style heap fill:#45475a,color:#89dceb,stroke:#7f849c
    style bss fill:#313244,color:#a6e3a1,stroke:#7f849c
    style data fill:#313244,color:#f9e2af,stroke:#7f849c
    style text fill:#313244,color:#f38ba8,stroke:#7f849c
    style headers fill:#181825,color:#6c7086,stroke:#45475a
```
You will declare the .text, .data, and .bss sections explicitly in your assembly programs; the stack and heap are set up and managed at run time.
Buses
The CPU communicates with RAM and peripherals via buses:
| Bus | Function |
|---|---|
| Address Bus | CPU sends the memory address it wants to access |
| Data Bus | Data transferred between CPU and memory |
| Control Bus | Signals for read/write, interrupts, clock |
Most x86-64 CPUs implement 48-bit virtual addresses (256 TiB addressable; CPUs with 5-level paging extend this to 57 bits), though installed physical RAM is far smaller.
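The address-space figure is plain arithmetic: n address bits can name 2^n distinct bytes. The Python sketch below checks that for a few address widths. The helper names `addressable_bytes` and `human_size` are made up for this example, and sizes are reported in binary units (1 TiB = 2^40 bytes), which is what the 256 TB figure for 48 bits refers to.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses expressible in `address_bits` bits."""
    return 1 << address_bits


def human_size(num_bytes: int) -> str:
    """Format a byte count in binary units (truncates non-exact values)."""
    unit = 0
    while num_bytes >= 1024 and unit < len(UNITS) - 1:
        num_bytes //= 1024
        unit += 1
    return f"{num_bytes} {UNITS[unit]}"


for bits in (16, 32, 48, 64):
    print(bits, "bits ->", human_size(addressable_bytes(bits)))
```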
Endianness
x86 is little-endian: multi-byte values are stored with the least significant byte at the lowest address.
```text
Value:   0x12345678 (32-bit integer)

Address: 0x100  0x101  0x102  0x103
Data:    0x78   0x56   0x34   0x12
         LSB                  MSB
```
This matters when reading memory dumps, network packets, and binary file formats.
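The byte order shown above can be reproduced with Python's standard `struct` module, where `<` selects little-endian and `>` selects big-endian packing. This is a small sketch for checking the layout by hand; the variable names are arbitrary.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)  # 32-bit unsigned, little-endian (x86 order)
big = struct.pack(">I", value)     # same value, big-endian (network byte order)

print(little.hex(" "))  # bytes in increasing address order, x86 layout
print(big.hex(" "))     # bytes in increasing address order, big-endian layout

# Reading the little-endian bytes back recovers the original value
assert int.from_bytes(little, "little") == value
```

Interpreting the same four bytes with the wrong byte order yields a different number, which is a common source of errors when parsing packet captures or binary files.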
Interrupts and Exceptions
The CPU can be interrupted mid-execution:
| Type | Cause | Example |
|---|---|---|
| Hardware interrupt | External device signal | Keyboard press, timer tick |
| Software interrupt | INT instruction | Linux int 0x80 syscall (legacy) |
| Exception | CPU detects an error | Division by zero, page fault, invalid opcode |
The OS registers an Interrupt Descriptor Table (IDT) mapping interrupt numbers to handler functions. System calls in x86-64 use the syscall instruction (faster than int 0x80).
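Conceptually, the IDT is a lookup table from interrupt number to handler. The Python sketch below models only that dispatch step; the handler functions and the `raise_interrupt` helper are hypothetical stand-ins. The vector numbers used (0 for divide error, 6 for invalid opcode, 14 for page fault, 0x80 for the legacy Linux system call) are the standard x86 and Linux assignments. A real IDT holds gate descriptors with handler addresses and privilege bits, which this toy model omits.

```python
def divide_error():
    return "#DE: divide error"


def invalid_opcode():
    return "#UD: invalid opcode"


def page_fault():
    return "#PF: page fault"


def legacy_syscall():
    return "int 0x80: legacy Linux system call"


# Toy "IDT": interrupt vector number -> handler function
IDT = {
    0: divide_error,
    6: invalid_opcode,
    14: page_fault,
    0x80: legacy_syscall,
}


def raise_interrupt(vector: int) -> str:
    """Look up the handler for `vector` and run it, as the CPU does via the IDT."""
    handler = IDT.get(vector)
    if handler is None:
        raise LookupError(f"no handler installed for vector {vector}")
    return handler()


print(raise_interrupt(14))
print(raise_interrupt(0x80))
```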
Privilege Levels (Rings)
x86 has 4 privilege levels, called rings:
Ring 0 — Kernel mode (full hardware access)
Ring 1 — (rarely used)
Ring 2 — (rarely used)
Ring 3 — User mode (restricted, your programs run here)
- User-space programs run in Ring 3
- Kernel code runs in Ring 0
- System calls transition from Ring 3 → Ring 0 → Ring 3
- Writing to hardware ports or installing interrupt handlers normally requires Ring 0
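
The ring transition described above can be modeled with a small Python sketch. Everything here is a simplified assumption for illustration: `ToyCPU`, `GeneralProtectionFault`, and `kernel_write_port` are invented names, and the check is reduced to "port I/O is allowed only in Ring 0", whereas real hardware also consults additional I/O permission settings.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the x86 general protection exception (#GP)."""


class ToyCPU:
    def __init__(self):
        self.ring = 3  # user programs start in Ring 3

    def out_port(self, port: int, value: int) -> str:
        """Privileged operation: allowed only while running in Ring 0."""
        if self.ring != 0:
            raise GeneralProtectionFault("port I/O attempted outside Ring 0")
        return f"wrote {value:#x} to port {port:#x}"

    def syscall(self, kernel_handler, *args):
        """Enter Ring 0, run the kernel handler, then return to Ring 3."""
        self.ring = 0
        try:
            return kernel_handler(self, *args)
        finally:
            self.ring = 3


def kernel_write_port(cpu: ToyCPU, port: int, value: int) -> str:
    # Runs in Ring 0 on behalf of the user program
    return cpu.out_port(port, value)


cpu = ToyCPU()
print(cpu.syscall(kernel_write_port, 0x3F8, 0x41))  # permitted via the kernel
```

Calling `cpu.out_port(...)` directly from Ring 3 raises the fault instead, which mirrors why user programs must go through system calls for hardware access.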
Key Takeaways
- The CPU fetches, decodes, and executes instructions one at a time (logically)
- Registers are fastest; a RAM access costs ~200 cycles, so keep hot data in registers
- Your program lives in virtual memory split into .text, .data, .bss, stack, and heap
- x86 is little-endian
- User programs (Ring 3) request OS services via the syscall instruction