12 — Arrays and Strings

Arrays are contiguous blocks of memory. Strings are arrays of bytes (characters), typically null-terminated. x86 has dedicated string instructions that, combined with a REP prefix, process a whole block of memory in a single hardware-driven loop.


Arrays in Memory

An array is just a label pointing to the first element, with elements stored contiguously.

section .data
    int_arr   dq 10, 20, 30, 40, 50   ; 5 × 8 bytes = 40 bytes
    byte_arr  db  1,  2,  3,  4,  5   ; 5 × 1 byte  = 5  bytes
    word_arr  dw 100, 200, 300         ; 3 × 2 bytes = 6  bytes

Accessing Elements

For a dq array (8-byte elements, stride = 8):

lea  rsi, [int_arr]        ; rsi = base address
mov  rax, [rsi]            ; arr[0] = 10
mov  rax, [rsi + 8]        ; arr[1] = 20
mov  rax, [rsi + 16]       ; arr[2] = 30

; With index register:
mov  rcx, 3
mov  rax, [rsi + rcx*8]    ; arr[3] = 40
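
The address arithmetic behind [rsi + rcx*8] can be spelled out in C as a cross-check. This is only a sketch: element_offset and load_qword are names invented for this illustration, and int_arr simply mirrors the dq array above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors `int_arr dq 10, 20, 30, 40, 50`: five 8-byte elements. */
const int64_t int_arr[5] = {10, 20, 30, 40, 50};

/* Byte offset of element `index`: the `rcx*8` part of [rsi + rcx*8]. */
size_t element_offset(size_t index, size_t stride) {
    return index * stride;
}

/* Load a qword element by explicit byte arithmetic on the base address. */
int64_t load_qword(const int64_t *base, size_t index) {
    assert(base != NULL);
    const unsigned char *addr =
        (const unsigned char *)base + element_offset(index, sizeof(int64_t));
    int64_t value;
    memcpy(&value, addr, sizeof value);   /* mov rax, [rsi + rcx*8] */
    return value;
}
```

Here load_qword(int_arr, 3) reads the 8 bytes at offset 24 from the base, which hold the value 40, the same element the indexed mov fetches.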

Modifying Elements

mov  qword [rsi + 8], 99   ; arr[1] = 99

Iterating — Pointer Style

    lea  rsi, [int_arr]
    mov  rcx, 5           ; 5 elements
.loop:
    mov  rax, [rsi]       ; process element
    add  rsi, 8           ; advance pointer by element size
    dec  rcx
    jnz  .loop

Iterating — Index Style

    xor  rcx, rcx         ; i = 0
.loop:
    cmp  rcx, 5
    jge  .done
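    ; Absolute address + index: this form needs a non-PIE link. In position-independent
    ; code, load the base with lea rsi, [rel int_arr] and use [rsi + rcx*8] instead.
    mov  rax, [int_arr + rcx*8]   ; arr[i]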
    inc  rcx
    jmp  .loop
.done:
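
The two loop shapes differ in where the exit test sits, which matters for empty arrays. The C sketch below mirrors both, using summation as a stand-in for "process element". The function names and the sample array are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as int_arr in the assembly listing. */
const int64_t sample[5] = {10, 20, 30, 40, 50};

/* Pointer style: check-at-the-bottom loop, like dec rcx / jnz .loop.
   As in the assembly, it must not be entered with count == 0. */
int64_t sum_pointer_style(const int64_t *p, size_t count) {
    assert(count > 0);
    int64_t sum = 0;
    do {
        sum += *p;      /* mov rax, [rsi] (then "process" it) */
        p++;            /* add rsi, 8: one element forward */
        count--;        /* dec rcx */
    } while (count != 0);
    return sum;
}

/* Index style: check-at-the-top loop, like cmp rcx, 5 / jge .done.
   This form handles an empty array without a special case. */
int64_t sum_index_style(const int64_t *arr, size_t count) {
    int64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += arr[i];  /* mov rax, [int_arr + rcx*8] */
    }
    return sum;
}
```

Both return 150 for the five sample elements. Only the index style is safe with a count of zero: the pointer-style assembly loop would decrement RCX past zero and keep running.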

Strings (Null-Terminated)

In C and assembly, strings are byte arrays ending with 0x00 (the null terminator).

section .data
    greeting  db "Hello, World!", 0    ; 14 bytes including null
    empty_str db 0                     ; empty string

Compute String Length

; strlen: rdi = pointer to string → rax = length
strlen:
    xor  rax, rax             ; rax = length counter
.loop:
    cmp  byte [rdi + rax], 0  ; is current char null?
    je   .done
    inc  rax
    jmp  .loop
.done:
    ret

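REPNE SCASB (see below) does the same job in fewer instructions, but it is not necessarily faster: unlike REP MOVS and REP STOS, the scanning instructions generally do not get fast-string microcode optimizations, and optimized strlen implementations typically use SIMD instead.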


x86 String Instructions

x86 has dedicated instructions that operate on memory blocks. They use implicit registers:

Register         Role
RSI              Source pointer
RDI              Destination pointer
RCX              Count (for REP prefix)
AL/AX/EAX/RAX    Value for SCAS/STOS

Instructions auto-increment (or decrement) RSI/RDI after each operation. Direction:

  • DF=0 (default): increment (forward scan)
  • DF=1 (std instruction): decrement (backward scan)

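The System V AMD64 ABI guarantees DF=0 on function entry and requires it to be 0 again on return, so cld is only strictly needed after a std or in code that cannot rely on the ABI. When in doubt, clear it explicitly: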

cld    ; clear Direction Flag — RSI/RDI increment forward

MOVS — Move String (memory copy)

movsb    ; [RDI] = [RSI]; RSI++; RDI++   (1 byte)
movsw    ; [RDI] = [RSI]; RSI+=2; RDI+=2 (2 bytes)
movsd    ; 4 bytes
movsq    ; 8 bytes

STOS — Store String (fill memory)

; fill: [RDI] = AL; RDI++  (repeat with REP)
stosb    ; store AL, advance RDI by 1
stosw    ; store AX, advance RDI by 2
stosd    ; store EAX, advance RDI by 4
stosq    ; store RAX, advance RDI by 8

SCAS — Scan String (search memory)

; compare AL with [RDI]; RDI++; set flags
scasb    ; compare AL with byte at RDI, then RDI++
scasw    ; 2-byte compare

CMPS — Compare String

; compare [RSI] with [RDI]; advance both
cmpsb    ; compare byte at RSI with byte at RDI; RSI++; RDI++

REP Prefix — Repeat String Operations

REP repeats the following string instruction RCX times, decrementing RCX each iteration.

rep movsb    ; copy RCX bytes from [RSI] to [RDI]
rep stosb    ; fill RCX bytes at [RDI] with AL
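
To make the implicit register updates concrete, the following C sketch models rep movsb under both settings of the Direction Flag. It illustrates the architectural behavior, not how the CPU implements it: memory is a plain byte array, the registers are offsets into it, and StringRegs and rep_movsb_model are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers are modeled as offsets into `mem`, so stepping "below zero"
   is plain unsigned wraparound rather than an invalid pointer. */
typedef struct {
    size_t rsi;   /* source offset */
    size_t rdi;   /* destination offset */
    size_t rcx;   /* remaining count */
} StringRegs;

/* Model of `rep movsb`: copy one byte, step both offsets by +1 (df == 0)
   or -1 (df == 1), decrement the count, repeat until the count is zero. */
StringRegs rep_movsb_model(uint8_t *mem, StringRegs r, int df) {
    assert(df == 0 || df == 1);
    while (r.rcx != 0) {
        mem[r.rdi] = mem[r.rsi];
        if (df) {
            r.rsi--;
            r.rdi--;
        } else {
            r.rsi++;
            r.rdi++;
        }
        r.rcx--;
    }
    return r;   /* final register state, as left behind by the instruction */
}
```

Shifting the first four bytes of "abcde" one position to the right shows why direction matters for overlapping buffers. Copying forward (rsi=0, rdi=1) smears the first byte and yields "aaaaa". Starting at the last byte with DF=1 (rsi=3, rdi=4) yields "aabcd".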

Variants:

Prefix           Stop Condition
rep              RCX = 0
repe / repz      RCX = 0 OR ZF = 0
repne / repnz    RCX = 0 OR ZF = 1
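
The two-part stop condition is easy to misread, so here is a hedged C model of repe cmpsb with DF=0. RepResult and repe_cmpsb_model are hypothetical names for this sketch, and zf_in stands for whatever ZF held before the instruction ran.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the caller can observe afterwards: remaining count and ZF. */
typedef struct {
    size_t rcx;
    int zf;
} RepResult;

/* Model of `repe cmpsb` (DF = 0): compare bytes while they are equal
   and the count has not run out. */
RepResult repe_cmpsb_model(const uint8_t *rsi, const uint8_t *rdi,
                           size_t rcx, int zf_in) {
    assert(rsi != NULL && rdi != NULL);
    RepResult r = { rcx, zf_in };   /* rcx == 0: nothing runs, flags untouched */
    while (r.rcx != 0) {
        r.zf = (*rsi == *rdi);      /* cmpsb sets ZF from [rsi] - [rdi] */
        rsi++;
        rdi++;
        r.rcx--;                    /* the count drops even on a mismatch */
        if (!r.zf) {
            break;                  /* repe stops at the first difference */
        }
    }
    return r;
}
```

Comparing "abcX" with "abcY" for 4 bytes ends with RCX = 0 and ZF = 0. The count alone cannot distinguish "all equal" from "last byte differed", so test ZF (je / jne) after the instruction rather than RCX.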

memcpy Using REP MOVSQ

; memcpy(dst=rdi, src=rsi, n_bytes=rdx) → rax = dst
; Copies rdx bytes from rsi to rdi
; Assumes DF = 0 and no overlap (alignment is not required, but helps speed)

memcpy:
    mov  rax, rdi        ; return dst, as C's memcpy does
    mov  rcx, rdx
    shr  rcx, 3          ; rcx = rdx / 8 (number of qwords)
    rep  movsq           ; copy qwords

    mov  rcx, rdx
    and  rcx, 7          ; rcx = rdx % 8 (remaining bytes)
    rep  movsb           ; copy remaining bytes

    ret
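
The split into whole qwords plus a byte tail can be checked against a small C model. This is a sketch under the same no-overlap assumption. The name memcpy_qword_tail is invented here, and the inner memcpy calls are only the portable C idiom for an unaligned 8-byte load and store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes as n >> 3 eight-byte moves followed by n & 7 single-byte
   moves. Returns dst, like C's memcpy. Buffers must not overlap. */
void *memcpy_qword_tail(void *dst, const void *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t qwords = n >> 3;                  /* shr rcx, 3 */
    size_t tail = n & 7;                     /* and rcx, 7 */

    for (size_t i = 0; i < qwords; i++) {    /* rep movsq */
        uint64_t q;
        memcpy(&q, s, sizeof q);             /* unaligned-safe 8-byte load */
        memcpy(d, &q, sizeof q);             /* unaligned-safe 8-byte store */
        s += 8;
        d += 8;
    }
    for (size_t i = 0; i < tail; i++) {      /* rep movsb */
        *d++ = *s++;
    }
    return dst;
}
```

For n = 20 this performs two 8-byte moves and four 1-byte moves. For n below 8 the first loop does not run at all, just as rep movsq with RCX = 0 is a no-op.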

Compiler-Quality memcpy

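For high performance, use multiple stores per iteration and handle alignment. For small copies, inline unrolled moves. On modern CPUs, rep movsb is itself heavily optimized: processors that advertise ERMSB (Enhanced REP MOVSB/STOSB, Intel Ivy Bridge and later) run it from microcode that moves data in wide internal chunks, so it is often competitive with hand-written SIMD loops for medium and large copies, though its startup cost can make it slower for very short ones.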


memset Using REP STOSB

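; memset(buf=rdi, value=rsi, n_bytes=rdx) → rax = buf
memset:
    mov  r8,  rdi        ; save buf: rep stosb advances rdi
    mov  al,  sil        ; al = value to fill
    mov  rcx, rdx        ; count
    rep  stosb           ; fill RCX bytes at RDI with AL
    mov  rax, r8         ; return buf, as C's memset does
    ret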

strlen Using REPNE SCASB

; strlen(str=rdi) → rax = length
strlen:
    xor  rcx, rcx
    not  rcx             ; rcx = 0xFFFFFFFFFFFFFFFF (max length)
    xor  al,  al         ; looking for byte 0
    cld
    repne scasb          ; scan until [RDI] == AL (0), decrement RCX
    ; after: RCX was decremented for each byte including the null
    not  rcx             ; rcx = count of bytes scanned
    dec  rcx             ; subtract 1 for the null terminator
    mov  rax, rcx
    ret
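
The not / dec arithmetic at the end is the least obvious part, so the C sketch below tracks RCX step by step. It is a model for checking the reasoning, with the hypothetical name strlen_scasb_model. It assumes the string is properly null-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the REPNE SCASB strlen, tracking RCX explicitly. */
uint64_t strlen_scasb_model(const char *rdi) {
    assert(rdi != NULL);
    uint64_t rcx = ~UINT64_C(0);   /* xor rcx, rcx / not rcx */
    const char al = 0;             /* xor al, al */

    while (rcx != 0) {             /* repne scasb */
        char c = *rdi++;           /* compare with [RDI], then advance RDI */
        rcx--;                     /* one decrement per byte scanned */
        if (c == al) {
            break;                 /* ZF = 1 ends the scan */
        }
    }
    /* After n characters plus the terminator:
       rcx = (2^64 - 1) - (n + 1), so ~rcx = n + 1. */
    rcx = ~rcx;                    /* not rcx: bytes scanned */
    rcx -= 1;                      /* dec rcx: drop the terminator */
    return rcx;                    /* mov rax, rcx */
}
```

For "Hello, World!" the scan touches 14 bytes, so ~rcx is 14 and the result is 13. For an empty string it touches only the terminator and returns 0.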

strcmp — String Comparison

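; strcmp(s1=rdi, s2=rsi) → rax: 0 if equal, <0 if s1<s2, >0 if s1>s2
; Bytes are compared as unsigned values, matching C's strcmp.
strcmp:
.loop:
    mov  al,  [rdi]
    mov  cl,  [rsi]      ; cl rather than bl: rbx is callee-saved
    cmp  al,  cl
    jne  .differ
    test al,  al         ; check for null terminator
    jz   .equal
    inc  rdi
    inc  rsi
    jmp  .loop

.differ:
    movzx eax, al        ; zero-extend both bytes (also clears upper rax)
    movzx ecx, cl
    sub  rax, rcx        ; return difference (-255..255)
    ret

.equal:
    xor  eax, eax        ; return 0
    ret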
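
For cross-checking results, a reference version in C is handy. C specifies that strcmp compares characters as unsigned char, which decides the ordering of bytes at 0x80 and above. The sketch below follows that rule. strcmp_ref is a name chosen for this example, not a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Reference byte-wise comparison. Bytes are compared as unsigned char,
   so 0x80 sorts after 0x7F rather than before it. */
int strcmp_ref(const char *s1, const char *s2) {
    assert(s1 != NULL && s2 != NULL);
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    while (*a == *b && *a != 0) {   /* stop at first difference or at the null */
        a++;
        b++;
    }
    return (int)*a - (int)*b;       /* 0, or the signed byte difference */
}
```

A quick sanity test for any assembly version is to compare "\x80" against "\x7f": the result should be positive. An implementation that sign-extends the bytes before subtracting gets this case backwards.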

String to Integer (atoi)

; atoi: rdi = pointer to null-terminated decimal string
; returns rax = integer value (unsigned, no error checking)
atoi:
    xor  rax, rax        ; result = 0
.loop:
    movzx rcx, byte [rdi]
    test  cl,  cl
    jz    .done          ; null terminator
    sub   cl,  '0'       ; convert ASCII digit to value
    cmp   cl,  9
    ja    .done          ; not a digit (chars below '0' wrap to > 9 as well)
    imul  rax, rax, 10
    add   rax, rcx
    inc   rdi
    jmp   .loop
.done:
    ret
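
The single unsigned comparison after sub cl, '0' rejects characters on both sides of the digit range. The C sketch below isolates that trick. atoi_model is a made-up name, and like the assembly it does no overflow or sign handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse leading decimal digits; stops at the first non-digit. */
uint64_t atoi_model(const char *s) {
    assert(s != NULL);
    uint64_t result = 0;
    for (;;) {
        /* sub cl, '0': for anything below '0' (including the null
           terminator) the 8-bit result wraps to a value above 9. */
        uint8_t digit = (uint8_t)(*s - '0');
        if (digit > 9) {                 /* cmp cl, 9 / ja .done */
            break;
        }
        result = result * 10 + digit;    /* imul rax, rax, 10 / add rax, rcx */
        s++;                             /* inc rdi */
    }
    return result;
}
```

In this model the explicit null check is not needed: the terminator becomes 0 - '0' = 208 as an unsigned byte, which fails the same range test as '/' or ':'.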

Integer to String (itoa, decimal)

; itoa: rax = unsigned integer, rdi = output buffer (min 21 bytes)
; Writes the decimal string right-aligned in the buffer, null-terminated at [rdi + 20].
; Returns rax = length (digits only), rsi = pointer to the first digit.
itoa:
    lea  rsi, [rdi + 20]  ; last byte of the 21-byte buffer
    mov  byte [rsi], 0    ; null terminator
    dec  rsi

    test rax, rax
    jnz  .convert
    mov  byte [rsi], '0'  ; special case: 0 prints as "0"
    dec  rsi
    jmp  .done

.convert:
    mov  rcx, 10
.loop:
    xor  rdx, rdx         ; div divides rdx:rax, so clear the high half
    div  rcx              ; rax = quotient, rdx = remainder
    add  dl,  '0'         ; remainder 0..9 to ASCII digit
    mov  [rsi], dl
    dec  rsi
    test rax, rax
    jnz  .loop

.done:
    inc  rsi              ; rsi now points to the first digit
    lea  rax, [rdi + 20]
    sub  rax, rsi         ; rax = number of digits written
    ret
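
Digits come out of the division least-significant first, which is why the buffer is filled from the end. A C sketch of the same idea follows. utoa_backwards is a hypothetical helper for this illustration that returns the start of the digits instead of moving them to the front of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the decimal form of `value` right-aligned in buf[0..20] and
   returns a pointer to the first digit; *len receives the digit count.
   21 bytes suffice: a 64-bit value has at most 20 digits, plus the null. */
char *utoa_backwards(uint64_t value, char buf[21], size_t *len) {
    assert(buf != NULL && len != NULL);
    char *p = buf + 20;
    *p = '\0';                              /* terminator goes in first */
    do {
        *--p = (char)('0' + value % 10);    /* remainder is the next digit */
        value /= 10;                        /* quotient feeds the next round */
    } while (value != 0);                   /* do-while also covers value == 0 */
    *len = (size_t)(buf + 20 - p);
    return p;
}
```

For 12345 the returned pointer is buf + 15 and the length is 5. For the largest 64-bit value the digits fill the buffer and the pointer equals buf.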

Key Takeaways

  • Array elements are accessed via [base + index * stride]; the hardware scale factor can be 1, 2, 4, or 8
  • String instructions (MOVS, STOS, SCAS, CMPS) use RSI, RDI, and RCX implicitly
  • REP repeats a string instruction RCX times
  • REPNE SCASB finds a byte in memory (strlen, strchr pattern)
  • REP MOVSB/MOVSQ — fast bulk copy (optimized by hardware)
  • REP STOSB — fast memory fill
  • String operations move forward only when DF = 0; use CLD whenever DF might have been set

Next: 13 — Calling Conventions