
02 — Computer Architecture

Understanding how a CPU works is essential before writing a single line of assembly. Assembly is not a high-level abstraction: each assembly instruction corresponds almost one-to-one to a machine instruction that the hardware executes.


The Von Neumann Architecture

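Most general-purpose computers follow the Von Neumann model. A single shared memory holds both instructions and data, and the CPU fetches instructions from it and executes them one after another. Internally, modern x86 cores keep separate L1 instruction and data caches, but programs still see one unified memory.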

graph TD
    subgraph CPU
        CU[Control Unit]
        ALU[ALU\nArithmetic & Logic Unit]
        REG[Registers]
    end

    CU --> REG
    ALU --> REG
    REG -->|Bus| RAM[Memory - RAM]

The CPU Components

Control Unit (CU)

  • Fetches instructions from memory
  • Decodes them into control signals
  • Sequences the execution steps
  • Does NOT perform computation itself

Arithmetic Logic Unit (ALU)

  • Performs arithmetic: ADD, SUB, MUL, DIV
  • Performs logic: AND, OR, XOR, NOT
  • Performs comparisons: sets flags based on results
  • Operates on values from registers (on x86, one operand of most ALU instructions may instead come from memory)
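To make "sets flags based on results" concrete, here is a minimal C model of four status flags left by a 64-bit CMP a, b. CMP computes a - b, discards the result, and keeps only the flags. The names `Flags` and `cmp_flags` are hypothetical and exist only for this sketch. It ignores the parity and auxiliary-carry flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for four x86 status flags. */
typedef struct { bool zf, sf, cf, of; } Flags;

/* Models the flags CMP a, b would set for 64-bit operands.
   The subtraction wraps modulo 2^64, as it does in hardware. */
static Flags cmp_flags(uint64_t a, uint64_t b) {
    uint64_t r = a - b;
    Flags f;
    f.zf = (r == 0);                            /* ZF: result is zero, so a == b */
    f.sf = (r >> 63) & 1;                       /* SF: copy of the result's top bit */
    f.cf = a < b;                               /* CF: unsigned borrow occurred */
    f.of = (((a ^ b) & (a ^ r)) >> 63) & 1;     /* OF: signed result out of range */
    return f;
}
```

Conditional jumps read these flags. For example, JB (unsigned "below") tests CF, while JL (signed "less") tests whether SF differs from OF.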

Registers

  • Ultra-fast storage directly inside the CPU
  • x86-64 has 16 general-purpose 64-bit registers
  • Access time: ~0 cycles (already in the CPU)
  • Compare: L1 cache ~4 cycles, RAM ~200 cycles

The Memory Hierarchy

Level        Latency (approx.)  Size          Location
Registers    ~0 cycles          ~1 KB         Inside CPU
L1 cache     ~4 cycles          32–64 KB      Per core
L2 cache     ~12 cycles         256 KB–1 MB   Per core
L3 cache     ~40 cycles         4–64 MB       Shared across cores
RAM (DRAM)   ~200 cycles        GBs           On motherboard
SSD (NVMe)   ~100 µs            TBs           Storage
HDD          ~10 ms             TBs           Storage

Assembly programmers exploit this hierarchy by keeping frequently used values in registers and minimizing memory accesses.
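The effect of this hierarchy is visible even from C. The sketch below sums the same 4 MiB array twice. The first pass walks it in row order, so consecutive accesses are adjacent in memory. The second walks it in column order, so consecutive accesses are 4096 bytes apart. The array size and function names are illustrative choices, not taken from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define N 1024
static int32_t grid[N][N];  /* 4 MiB: larger than typical L1 and L2 caches */

static void fill_grid(void) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            grid[i][j] = (int32_t)(i + j);
}

/* Row order: neighbouring accesses are 4 bytes apart,
   so each cache line brought in from memory is fully used. */
static int64_t sum_row_order(void) {
    int64_t s = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += grid[i][j];
    return s;
}

/* Column order: neighbouring accesses are N * 4 = 4096 bytes apart,
   so almost every access lands on a different cache line. */
static int64_t sum_column_order(void) {
    int64_t s = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += grid[i][j];
    return s;
}
```

Both functions return the same total. If you time them (for example with `clock()` from `<time.h>`), the row-order loop is typically noticeably faster. The exact ratio depends on the CPU, the array size, and compiler optimizations.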


The Fetch-Decode-Execute Cycle

Every instruction goes through this cycle:

1. FETCH    — Read next instruction from memory at address in RIP
2. DECODE   — Determine what operation and operands are needed
3. EXECUTE  — ALU or other unit performs the operation
4. WRITEBACK — Store result to register or memory
(repeat)

Modern CPUs overlap these stages across consecutive instructions, a technique called pipelining. While instruction N executes, instruction N+1 is already being decoded and instruction N+2 fetched.
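The four-step cycle listed above can be sketched as a tiny interpreter, ignoring pipelining. The instruction set here is hypothetical and invented for illustration; it is not x86. Its memory array also mirrors the Von Neumann model, because code and data share the same `mem`. The names `Machine`, `run`, and the `OP_*` opcodes are made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical 4-instruction ISA (NOT x86). Every instruction except
   HALT is 3 bytes: opcode, operand x, operand y. */
enum { OP_HALT = 0x00, OP_LOAD = 0x01, OP_ADD = 0x02, OP_STORE = 0x03 };

typedef struct {
    uint8_t mem[256];  /* one memory holds both code and data */
    uint8_t reg[4];    /* tiny register file */
    uint8_t pc;        /* program counter, playing the role of RIP */
} Machine;

/* Runs until HALT. Returns the number of instructions executed,
   or -1 if an invalid opcode is fetched. */
static int run(Machine *m) {
    int executed = 0;
    for (;;) {
        /* FETCH: read the opcode at pc */
        uint8_t op = m->mem[m->pc];
        if (op == OP_HALT)
            return executed;
        uint8_t x = m->mem[(uint8_t)(m->pc + 1)];
        uint8_t y = m->mem[(uint8_t)(m->pc + 2)];
        m->pc = (uint8_t)(m->pc + 3);  /* pc now points at the next instruction */

        /* DECODE the opcode, then EXECUTE and WRITEBACK */
        switch (op) {
        case OP_LOAD:   /* reg[x] = mem[y] */
            m->reg[x & 3] = m->mem[y];
            break;
        case OP_ADD:    /* reg[x] = reg[x] + reg[y] */
            m->reg[x & 3] = (uint8_t)(m->reg[x & 3] + m->reg[y & 3]);
            break;
        case OP_STORE:  /* mem[y] = reg[x] */
            m->mem[y] = m->reg[x & 3];
            break;
        default:
            return -1;
        }
        executed++;
    }
}
```

As a usage example, load the bytes `01 00 80  01 01 81  02 00 01  03 00 82  00` at address 0, with data values 2 and 3 at addresses 0x80 and 0x81. This program adds the two values and stores 5 at 0x82. Because program and data share one array, a STORE aimed at an address below 12 would overwrite the program's own instructions.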


The Instruction Pointer (RIP)

RIP, the 64-bit instruction pointer, always holds the address of the next instruction to execute.

  • After a non-branching instruction, RIP advances by that instruction's length in bytes (x86 instructions are variable-length, from 1 to 15 bytes)
  • JMP, CALL, RET change RIP non-sequentially
  • You cannot directly MOV an arbitrary value into RIP (use JMP/CALL)

In 32-bit mode the register is called EIP; in 16-bit mode, IP.
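To see RIP advancing by byte length, take the three-instruction Linux x86-64 exit(0) sequence below. The byte encodings are in the comments, and the code models how RIP moves through it in straight-line execution. The start address 0x401000 and the names `Insn`, `exit_seq`, and `trace_rip` are hypothetical choices for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* One instruction: its assembly text and its encoded length in bytes. */
typedef struct { const char *asm_text; uint8_t len; } Insn;

/* Linux x86-64 exit(0): syscall number 60 in EAX, status 0 in EDI. */
static const Insn exit_seq[] = {
    { "mov eax, 60",  5 },  /* B8 3C 00 00 00 */
    { "xor edi, edi", 2 },  /* 31 FF          */
    { "syscall",      2 },  /* 0F 05          */
};

/* For straight-line code starting at `start`, stores in next_rip[i]
   the value RIP holds while instruction i executes (the address of
   the instruction that follows it). */
static void trace_rip(uint64_t start, const Insn *code, size_t n, uint64_t *next_rip) {
    uint64_t rip = start;
    for (size_t i = 0; i < n; i++) {
        rip += code[i].len;
        next_rip[i] = rip;
    }
}
```

Starting at 0x401000, the sketch yields 0x401005, 0x401007, and 0x401009. A JMP, CALL, or RET would replace the simple addition with a new target address.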


Memory Layout of a Process

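When the OS loads your program, it creates a process whose virtual address space is laid out roughly like this. This is typical of Linux x86-64; exact addresses vary, and address-space layout randomization shifts most regions.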

block-beta
  columns 1
  H["🔼 High Addresses"]:1
  stack["Stack\ngrows ↓ — local vars, return addresses"]
  gap["... (large gap — virtual memory) ..."]:1
  heap["Heap\ngrows ↑ — dynamic allocation (malloc)"]
  bss[".bss\nuninitialized globals (zeroed by OS)"]
  data[".data\ninitialized global/static variables"]
  text[".text\nexecutable code (read-only)"]
  headers["(headers)"]
  L["🔽 Low Addresses: 0x400000 typical for non-PIE ELF"]:1

  style H fill:#1e1e2e,color:#cdd6f4,stroke:none
  style L fill:#1e1e2e,color:#cdd6f4,stroke:none
  style gap fill:#313244,color:#6c7086,stroke:#45475a,stroke-dasharray:4
  style stack fill:#45475a,color:#cba6f7,stroke:#7f849c
  style heap fill:#45475a,color:#89dceb,stroke:#7f849c
  style bss fill:#313244,color:#a6e3a1,stroke:#7f849c
  style data fill:#313244,color:#f9e2af,stroke:#7f849c
  style text fill:#313244,color:#f38ba8,stroke:#7f849c
  style headers fill:#181825,color:#6c7086,stroke:#45475a

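You will declare the .text, .data, and .bss sections explicitly in your assembly programs. The stack is set up by the OS when the process starts, and the heap grows on demand at run time.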
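You can observe this layout from a running program. The C sketch below collects one address from each region. An initialized global usually lands in .data, a zero-initialized global in .bss, a function in .text, a small `malloc` block on the heap, and a local variable on the stack. Placement is up to the compiler, linker, and allocator, so treat the results as typical rather than guaranteed. The names `Layout`, `marker`, and `sample_layout` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 42;  /* typically placed in .data */
int zeroed_global;            /* typically placed in .bss (zero-filled at load) */

typedef struct { uintptr_t text, data, bss, heap, stack; } Layout;

static int marker(void) { return 0; }  /* its address lies in .text */

static Layout sample_layout(void) {
    int local = 0;                     /* lives in this function's stack frame */
    void *block = malloc(16);          /* small requests usually come from the heap */
    Layout l = {
        /* Function-to-integer casts are implementation-defined; fine on GCC/Clang. */
        .text  = (uintptr_t)&marker,
        .data  = (uintptr_t)&initialized_global,
        .bss   = (uintptr_t)&zeroed_global,
        .heap  = (uintptr_t)block,
        .stack = (uintptr_t)&local,
    };
    free(block);                       /* only the address is kept, never dereferenced */
    return l;
}
```

Printing the fields in hex on Linux typically shows .text, .data, and .bss close together at low addresses. The heap sits just above them, and the stack is near the top of user space.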


Buses

The CPU communicates with RAM and peripherals via buses:

Bus          Function
Address bus  CPU sends the memory address it wants to access
Data bus     Carries data between the CPU and memory
Control bus  Carries read/write, interrupt, and clock signals

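With 4-level paging, x86-64 CPUs implement 48-bit virtual addresses, so 2^48 bytes (256 TiB) are addressable. CPUs that support 5-level paging extend this to 57 bits. Physical RAM is far smaller, and the physical address width is a separate, CPU-specific limit.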
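Pointers are still stored as 64-bit values, so the CPU requires them to be canonical. With 48-bit addressing, bits 63 through 47 must all be equal, and using a non-canonical address raises a general-protection fault. The helper below sketches that rule, assuming 4-level paging. Under 5-level paging the boundary moves to bit 56. The name `is_canonical48` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Canonical under 48-bit addressing: bits 63..47 (17 bits) are
   either all zeros (lower half) or all ones (upper half). */
static bool is_canonical48(uint64_t addr) {
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1FFFF;
}
```

This split gives user space the lower half (up to 0x00007FFFFFFFFFFF) and leaves the upper half (from 0xFFFF800000000000) for the kernel. The range between them cannot be addressed.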


Endianness

x86 is little-endian: multi-byte values are stored with the least significant byte at the lowest address.

Value: 0x12345678  (32-bit integer)

Address:  0x100  0x101  0x102  0x103
Data:      0x78   0x56   0x34   0x12
           LSB                   MSB

This matters when reading memory dumps, network packets, and binary file formats.
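The C sketch below reproduces the diagram above. `host_bytes` copies a value's bytes exactly as the machine stores them. `store_le32` builds little-endian order explicitly with shifts, which is the portable way to write binary formats. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bytes of `value` as the host stores them, lowest address first. */
static void host_bytes(uint32_t value, uint8_t out[4]) {
    memcpy(out, &value, sizeof value);
}

/* Little-endian encoding built with shifts: same result on any host. */
static void store_le32(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)(value);        /* least significant byte first */
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);  /* most significant byte last */
}

/* True when the lowest-addressed byte of 1 is the 1 itself. */
static bool host_is_little_endian(void) {
    uint8_t b[4];
    host_bytes(1u, b);
    return b[0] == 1;
}
```

On an x86-64 machine, `host_bytes(0x12345678, b)` yields 0x78, 0x56, 0x34, 0x12, matching `store_le32`.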


Interrupts and Exceptions

Normal sequential execution can be interrupted, transferring control to a handler routine:

Type                Cause                   Example
Hardware interrupt  External device signal  Keyboard press, timer tick
Software interrupt  INT instruction         Linux int 0x80 syscall (legacy)
Exception           CPU detects an error    Division by zero, page fault, invalid opcode

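The OS builds an Interrupt Descriptor Table (IDT) that maps each of the 256 interrupt vectors to a handler routine, then loads the table's location into the CPU with the LIDT instruction. System calls on x86-64 use the dedicated syscall instruction instead. It jumps straight to a kernel entry point stored in a model-specific register rather than going through the IDT, which makes it faster than int 0x80.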
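For a concrete picture of the syscall instruction, the sketch below issues Linux's write system call directly with GCC/Clang inline assembly. It assumes the x86-64 Linux convention: the syscall number goes in RAX (1 is write), the arguments go in RDI, RSI, and RDX, and the result comes back in RAX. On other platforms it falls back to the libc `write` wrapper so it still compiles. The name `raw_write` is hypothetical.

```c
#include <stddef.h>
#include <unistd.h>

/* Calls write(2) via the syscall instruction on x86-64 Linux.
   Returns bytes written, or a negative value on error. */
static long raw_write(int fd, const void *buf, size_t len) {
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)             /* RAX: return value, or -errno on failure */
        : "a"(1L),              /* RAX: syscall number 1 = write */
          "D"((long)fd),        /* RDI: file descriptor */
          "S"(buf),             /* RSI: buffer address */
          "d"(len)              /* RDX: byte count */
        : "rcx", "r11", "memory");  /* syscall overwrites RCX and R11 */
    return ret;
#else
    return (long)write(fd, buf, len);  /* portable fallback */
#endif
}
```

Calling `raw_write(1, "hi\n", 3)` prints "hi" to standard output and returns 3. In the inline-assembly path, errors come back as a negative errno rather than -1.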


Privilege Levels (Rings)

x86 has 4 privilege levels, called rings:

Ring 0 — Kernel mode    (full hardware access)
Ring 1 — (rarely used)
Ring 2 — (rarely used)
Ring 3 — User mode      (restricted, your programs run here)
  • User-space programs run in Ring 3
  • Kernel code runs in Ring 0
  • System calls transition from Ring 3 → Ring 0 → Ring 3
  • Writing to hardware ports or installing interrupt handlers requires Ring 0
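The current privilege level (CPL) is held in the low two bits of the CS segment register, and reading CS is not a privileged operation. The sketch below, assuming GCC/Clang inline assembly on x86, reads CS and extracts the ring. In Linux user space it should report 3. The name `current_ring` is hypothetical, and the function returns -1 on non-x86 targets.

```c
/* Returns the current ring (0-3) on x86, or -1 elsewhere. */
static int current_ring(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned short cs;
    __asm__ volatile ("mov %%cs, %0" : "=r"(cs));  /* copy the CS selector */
    return cs & 3;                                 /* bits 1..0 hold the CPL */
#else
    return -1;
#endif
}
```

On x86-64 Linux, user-mode CS is typically 0x33, whose low two bits give ring 3. The same read inside the kernel would yield 0.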

Key Takeaways

  • The CPU fetches, decodes, and executes instructions one at a time (logically)
  • Registers are fastest; RAM is ~200x slower — keep hot data in registers
  • Your program lives in virtual memory split into .text, .data, .bss, stack, and heap
  • x86 is little-endian
  • User programs (Ring 3) request OS services via the syscall instruction

Next: 03 — Number Systems