12 — Arrays and Strings¶
Arrays are contiguous blocks of memory. Strings are arrays of bytes (characters), typically null-terminated. x86 has dedicated string instructions that, combined with a REP prefix, process a whole block of memory in a single instruction.
Arrays in Memory¶
An array is just a label pointing to the first element, with elements stored contiguously.
section .data
int_arr dq 10, 20, 30, 40, 50 ; 5 × 8 bytes = 40 bytes
byte_arr db 1, 2, 3, 4, 5 ; 5 × 1 byte = 5 bytes
word_arr dw 100, 200, 300 ; 3 × 2 bytes = 6 bytes
Accessing Elements¶
For a dq array (8-byte elements, stride = 8):
lea rsi, [int_arr] ; rsi = base address
mov rax, [rsi] ; arr[0] = 10
mov rax, [rsi + 8] ; arr[1] = 20
mov rax, [rsi + 16] ; arr[2] = 30
; With index register:
mov rcx, 3
mov rax, [rsi + rcx*8] ; arr[3] = 40
Modifying Elements¶
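Writing an element uses the same addressing modes as reading, with the memory operand as the destination. A minimal sketch, assuming the int_arr and byte_arr labels from the data section above; the stored values are arbitrary and chosen only for illustration:

```nasm
    lea rsi, [int_arr]          ; rsi = base address
    mov qword [rsi + 8], 99     ; arr[1] = 99 (size keyword required for immediate-to-memory)
    mov rcx, 3
    mov rax, [rsi + rcx*8]      ; load arr[3] (40)
    add rax, 1
    mov [rsi + rcx*8], rax      ; store it back: arr[3] = 41
    inc qword [rsi + 16]        ; read-modify-write in place: arr[2] = 31
    lea rdi, [byte_arr]
    mov byte [rdi + 2], 7       ; byte_arr[2] = 7 (stride = 1)
```

When the source is a register, the operand size is implied by the register; when it is an immediate, NASM needs an explicit byte, word, dword, or qword.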
Iterating — Pointer Style¶
lea rsi, [int_arr]
mov rcx, 5 ; 5 elements
.loop:
mov rax, [rsi] ; process element
add rsi, 8 ; advance pointer by element size
dec rcx
jnz .loop
Iterating — Index Style¶
xor rcx, rcx ; i = 0
.loop:
cmp rcx, 5
jge .done
mov rax, [int_arr + rcx*8] ; arr[i] (absolute address: needs a non-PIE link; otherwise lea the base into a register first)
inc rcx
jmp .loop
.done:
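The loops above leave "process element" open. As one possible body, here is a sketch that sums a qword array using the index style. The function name sum_qwords and its argument order are assumptions for illustration, following the System V convention used elsewhere on this page:

```nasm
; sum_qwords(arr=rdi, count=rsi) -> rax = sum of count 8-byte elements
sum_qwords:
    xor eax, eax                ; sum = 0 (writing eax zeroes all of rax)
    xor ecx, ecx                ; i = 0
.loop:
    cmp rcx, rsi
    jae .done                   ; unsigned compare: stop when i >= count
    add rax, [rdi + rcx*8]      ; sum += arr[i]
    inc rcx
    jmp .loop
.done:
    ret
```

Called with rdi pointing at int_arr and rsi = 5, this would return 150 in rax.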
Strings (Null-Terminated)¶
By C convention, which most assembly code follows, strings are byte arrays ending with 0x00 (the null terminator).
section .data
greeting db "Hello, World!", 0 ; 14 bytes including null
empty_str db 0 ; empty string
Compute String Length¶
; strlen: rdi = pointer to string → rax = length
strlen:
xor rax, rax ; rax = length counter
.loop:
cmp byte [rdi + rax], 0 ; is current char null?
je .done
inc rax
jmp .loop
.done:
ret
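The REPNE SCASB version (see below) is more compact. It is not necessarily faster: SCAS typically does not benefit from the fast-string microcode that REP MOVS and REP STOS get, and optimized library strlen implementations use SIMD instead.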
x86 String Instructions¶
x86 has dedicated instructions that operate on memory blocks. They use implicit registers:
| Register | Role |
|---|---|
| RSI | Source pointer |
| RDI | Destination pointer |
| RCX | Count (for REP prefix) |
| AL/AX/EAX/RAX | Value for SCAS/STOS |
Instructions auto-increment (or decrement) RSI/RDI after each operation. Direction:
- DF=0 (default): increment — forward scan
- DF=1 (std instruction): decrement — backward scan
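Use cld to ensure forward direction before a forward string operation. The System V and Windows x64 calling conventions require DF=0 on function entry and return, so cld is mainly needed after code that has executed std, but it is cheap insurance.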
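A short sketch of both flag settings, assuming RSI and RDI already point into valid buffers. It uses movsb, which is described in the next section:

```nasm
    cld             ; DF = 0: string instructions move forward
    movsb           ; copy one byte, then RSI++ and RDI++

    std             ; DF = 1: string instructions move backward
    movsb           ; copy one byte, then RSI-- and RDI--
    cld             ; restore DF = 0 before returning or calling other code
```

Pairing every std with a later cld keeps the rest of the program, including library code, working under the DF=0 assumption.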
MOVS — Move String (memory copy)¶
movsb ; [RDI] = [RSI]; RSI++; RDI++ (1 byte)
movsw ; [RDI] = [RSI]; RSI+=2; RDI+=2 (2 bytes)
movsd ; 4 bytes
movsq ; 8 bytes
STOS — Store String (fill memory)¶
; fill: [RDI] = AL; RDI++ (repeat with REP)
stosb ; store AL, advance RDI by 1
stosw ; store AX, advance RDI by 2
stosd ; store EAX, advance RDI by 4
stosq ; store RAX, advance RDI by 8
SCAS — Scan String (search memory)¶
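; compare AL with [RDI]; RDI++; set flags
scasb ; compare AL with byte at RDI, then RDI++
scasw ; compare AX with word at RDI, then RDI += 2
scasd ; compare EAX with dword at RDI, then RDI += 4
scasq ; compare RAX with qword at RDI, then RDI += 8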
CMPS — Compare String¶
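CMPS compares the element at [RSI] with the element at [RDI], sets the flags as cmp would, and advances both pointers. A minimal sketch of the four forms, followed by a hypothetical mem_equal helper (the name and signature are assumptions, not part of the original text) that uses the repe prefix described in the next section:

```nasm
; compare [RSI] with [RDI]; set flags; advance RSI and RDI
    cmpsb           ; 1-byte compare, pointers += 1
    cmpsw           ; 2-byte compare, pointers += 2
    cmpsd           ; 4-byte compare, pointers += 4
    cmpsq           ; 8-byte compare, pointers += 8

; mem_equal(a=rdi, b=rsi, n_bytes=rdx) -> eax = 1 if the blocks match, else 0
mem_equal:
    mov rcx, rdx    ; byte count
    cld
    cmp rcx, rcx    ; force ZF = 1 so a zero-length compare counts as equal
    repe cmpsb      ; stop at the first mismatch or when RCX reaches 0
    sete al         ; ZF = 1 means every compared byte matched
    movzx eax, al
    ret
```

The cmp rcx, rcx line matters because repe executes no iterations when RCX is 0 and leaves the flags untouched.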
REP Prefix — Repeat String Operations¶
REP repeats the following string instruction RCX times, decrementing RCX each iteration.
Variants:
| Prefix | Stop Condition |
|---|---|
| rep | RCX = 0 |
| repe / repz | RCX = 0 OR ZF = 0 |
| repne / repnz | RCX = 0 OR ZF = 1 |
memcpy Using REP MOVSQ¶
; memcpy(dst=rdi, src=rsi, n_bytes=rdx)
; Copies rdx bytes from rsi to rdi
; Assumes no overlap and DF = 0; alignment is not required, but aligned buffers copy faster
memcpy:
mov rcx, rdx
shr rcx, 3 ; rcx = rdx / 8 (number of qwords)
rep movsq ; copy qwords
mov rcx, rdx
and rcx, 7 ; rcx = rdx % 8 (remaining bytes)
rep movsb ; copy remaining bytes
ret
Compiler-Quality memcpy¶
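For high performance, production memcpy implementations handle alignment, use several wide loads and stores per iteration, and inline a few unrolled moves for small copies. On CPUs that advertise Enhanced REP MOVSB (ERMSB, Intel Ivy Bridge and later), rep movsb is itself microcoded to copy in wide chunks, so a plain rep movsb can be competitive for medium and large copies.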
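The simplest variant that relies on the hardware doing the work is a bare rep movsb. A minimal sketch, assuming no overlap and DF = 0; the name memcpy_movsb is hypothetical, and returning the destination in rax mirrors C's memcpy:

```nasm
; memcpy_movsb(dst=rdi, src=rsi, n_bytes=rdx) -> rax = dst
memcpy_movsb:
    mov rax, rdi    ; return value: original destination pointer
    mov rcx, rdx    ; byte count
    rep movsb       ; copy RCX bytes from [RSI] to [RDI]
    ret
```

Whether this beats the qword version above depends on the CPU and the copy size, so it is worth measuring rather than assuming.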
memset Using REP STOSB¶
; memset(buf=rdi, value=rsi, n_bytes=rdx)
memset:
mov al, sil ; al = value to fill
mov rcx, rdx ; count
rep stosb ; fill RCX bytes at RDI with AL
ret
strlen Using REPNE SCASB¶
; strlen(str=rdi) → rax = length
strlen:
xor rcx, rcx
not rcx ; rcx = 0xFFFFFFFFFFFFFFFF (max length)
xor al, al ; looking for byte 0
cld
repne scasb ; scan until [RDI] == AL (0), decrement RCX
; after: RCX was decremented for each byte including the null
not rcx ; rcx = count of bytes scanned
dec rcx ; subtract 1 for the null terminator
mov rax, rcx
ret
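The same REPNE SCASB pattern searches for any byte value when the length is known, which is the idea behind memchr. A sketch under the System V convention; the name find_byte and its signature are assumptions for illustration:

```nasm
; find_byte(buf=rdi, value=sil, n_bytes=rdx) -> rax = pointer to first match, or 0
find_byte:
    mov al, sil             ; byte to search for
    mov rcx, rdx            ; maximum number of bytes to scan
    test rcx, rcx
    jz .not_found           ; nothing to scan
    cld
    repne scasb             ; stop at the first byte equal to AL, or when RCX hits 0
    jne .not_found          ; ZF = 0: ran out of bytes without a match
    lea rax, [rdi - 1]      ; RDI has already advanced past the matching byte
    ret
.not_found:
    xor eax, eax            ; return 0 (null pointer)
    ret
```

The result is decided by ZF rather than RCX, because a match on the very last byte also leaves RCX at 0.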
strcmp — String Comparison¶
; strcmp(s1=rdi, s2=rsi) → rax: 0 if equal, <0 if s1<s2, >0 if s1>s2
strcmp:
.loop:
mov al, [rdi]
mov cl, [rsi] ; use cl, not bl: rbx is callee-saved in the System V ABI
cmp al, cl
jne .differ
test al, al ; check for null terminator
jz .equal
inc rdi
inc rsi
jmp .loop
.differ:
movzx eax, al ; zero-extend: C strcmp compares bytes as unsigned char
movzx ecx, cl
sub rax, rcx ; return difference
ret
.equal:
xor eax, eax ; return 0
ret
String to Integer (atoi)¶
; atoi: rdi = pointer to null-terminated decimal string
; returns rax = integer value (unsigned, no error checking)
atoi:
xor rax, rax ; result = 0
.loop:
movzx rcx, byte [rdi]
test cl, cl
jz .done ; null terminator
sub cl, '0' ; convert ASCII digit to value
cmp cl, 9
ja .done ; not a digit: unsigned compare catches both > '9' and < '0' (wraps around)
imul rax, rax, 10
add rax, rcx
inc rdi
jmp .loop
.done:
ret
Integer to String (itoa, decimal)¶
; itoa: rax = unsigned integer, rdi = output buffer (min 21 bytes)
; Writes the decimal digits right-aligned in the buffer, null-terminated at [rdi + 20].
; Returns rsi = pointer to first digit, rax = length (excluding the null).
itoa:
push rbp
mov rbp, rsp
lea rsi, [rdi + 20] ; write from end of buffer
mov byte [rsi], 0 ; null terminator
dec rsi
test rax, rax
jnz .convert
mov byte [rsi], '0'
dec rsi
jmp .done
.convert:
mov rcx, 10
.loop:
xor rdx, rdx ; clear high half of the dividend rdx:rax
div rcx ; rax = quotient, rdx = remainder
add dl, '0'
mov [rsi], dl
dec rsi
test rax, rax
jnz .loop
.done:
inc rsi ; rsi now points to first digit
lea rax, [rdi + 20]
sub rax, rsi ; length = address of null - address of first digit
; the string starts at rsi, not at rdi; copy it down to rdi if the caller needs that
pop rbp
ret
Key Takeaways¶
- Elements accessed via [base + index * stride]
- String instructions (MOVS, STOS, SCAS, CMPS) use RSI, RDI, and RCX implicitly
- REP repeats a string instruction RCX times
- REPNE SCASB finds a byte in memory (strlen, strchr pattern)
- REP MOVSB/MOVSQ: fast bulk copy (optimized by hardware)
- REP STOSB: fast memory fill
- Always CLD before forward string operations