15 — Inline Assembly¶

Inline assembly lets you embed assembly instructions directly inside C or C++ code. This gives you low-level control without writing a separate .asm file — ideal for performance-critical sections, hardware intrinsics, or accessing instructions unavailable in C.

GCC Extended Inline Assembly Syntax¶

GCC uses the AT&T/GAS syntax for inline assembly by default. The general form:

asm volatile (
    "assembly template"         // instruction(s), one per line
    : output operands           // [optional]
    : input operands            // [optional]
    : clobbers                  // [optional]
);

volatile tells the compiler: do not optimize away or reorder this block
Output operands: C variables that will be written by the asm
Input operands: C variables that provide values to the asm
Clobbers: registers/memory that the asm modifies (so compiler doesn't assume they're unchanged)

Basic Syntax¶

Inline NOP¶

asm("nop");

Move instruction¶

int x = 5;
asm("movl %0, %%eax" : : "r"(x));
// %0 refers to first operand, %% escapes to a single %

Operand Constraints¶

Operands use constraint strings to tell GCC where to place variables.

Common Constraints¶

Constraint	Meaning
`r`	Any general-purpose register
`m`	Memory operand
`i`	Immediate integer constant
`n`	Immediate integer (known at compile time)
`a`	RAX/EAX specifically
`b`	RBX/EBX
`c`	RCX/ECX
`d`	RDX/EDX
`S`	RSI/ESI
`D`	RDI/EDI
`0`–`9`	Same location as Nth operand

Modifiers on Output Constraints¶

Modifier	Meaning
`=`	Write-only (value overwritten)
`+`	Read-write (value read and modified)
`&`	Early clobber (written before inputs are consumed)

Practical Examples¶

Add two integers¶

int a = 10, b = 20, result;
asm ("addl %2, %1"      // %1 += %2  (AT&T: src, dst)
     : "=r"(result)     // output: result in any register
     : "0"(a), "r"(b)   // inputs: %0=a (same reg as result), %1=b
);
// result = 30

Swap two variables¶

int x = 1, y = 2;
asm ("xchgl %0, %1"
     : "=r"(x), "=r"(y)
     : "0"(x), "1"(y)
);
// x = 2, y = 1

Read the timestamp counter (RDTSC)¶

uint64_t read_tsc(void) {
    uint32_t lo, hi;
    asm volatile ("rdtsc"
                  : "=a"(lo), "=d"(hi));  // rdtsc: result in EDX:EAX
    return ((uint64_t)hi << 32) | lo;
}

CPUID¶

void cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
           uint32_t *ecx, uint32_t *edx) {
    asm volatile ("cpuid"
                  : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
                  : "a"(leaf));
}

Bit scan (find lowest set bit)¶

int lowest_set_bit(unsigned int x) {
    int pos;
    asm ("bsfl %1, %0"   // BSF: bit scan forward
         : "=r"(pos)
         : "r"(x));
    return pos;
}

Clobber List¶

Tell GCC which registers your asm destroys (other than the explicit operands):

asm volatile (
    "push %%rbx\n\t"
    "mov $42, %%rbx\n\t"
    "pop %%rbx"
    :
    :
    : "rbx", "memory"   // rbx is clobbered; memory may be modified
);

Special clobbers:

Clobber	Meaning
`"memory"`	Asm may read or write any memory — compiler must reload all cached values
`"cc"`	Asm modifies the flags register (RFLAGS)
`"rax"`, `"rbx"`, ...	Specific registers clobbered

Use "memory" whenever your asm stores to arbitrary memory (not a named output).

Intel Syntax in GCC Inline Asm¶

GCC defaults to AT&T syntax. To use Intel syntax:

__asm__ (
    ".intel_syntax noprefix\n\t"
    "mov rax, 42\n\t"
    ".att_syntax prefix\n\t"
    ::: "rax"
);

Or compile with -masm=intel:

gcc -masm=intel source.c -o program

Clang Extended ASM¶

Clang supports the same GCC inline asm syntax. No changes needed for basic usage.

MSVC Inline Assembly (Windows, 32-bit only)¶

MSVC has its own inline asm syntax using __asm:

// MSVC — 32-bit only (not supported in 64-bit MSVC!)
int x = 5;
__asm {
    mov eax, x
    add eax, 10
    mov x, eax
}
// x = 15

MSVC dropped inline asm support for 64-bit targets. Use intrinsics or separate .asm files instead.

Compiler Intrinsics (Preferred Alternative)¶

For SIMD and special instructions, intrinsics are safer than inline asm — they look like function calls but map directly to instructions, and the compiler handles register allocation.

#include <immintrin.h>    // SSE/AVX intrinsics

// Add 4 floats in parallel
__m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
__m128 b = _mm_set_ps(8.0f, 7.0f, 6.0f, 5.0f);
__m128 c = _mm_add_ps(a, b);  // {12, 10, 8, 6}

Header	Extension
`<mmintrin.h>`	MMX
`<xmmintrin.h>`	SSE
`<emmintrin.h>`	SSE2
`<pmmintrin.h>`	SSE3
`<smmintrin.h>`	SSE4.1
`<nmmintrin.h>`	SSE4.2
`<immintrin.h>`	AVX, AVX2, AVX-512

Common intrinsics:

// Load / store
__m256 _mm256_load_ps(float const *p);     // aligned
__m256 _mm256_loadu_ps(float const *p);    // unaligned
void   _mm256_store_ps(float *p, __m256 a);

// Arithmetic (8 floats in parallel)
__m256 _mm256_add_ps(__m256 a, __m256 b);
__m256 _mm256_mul_ps(__m256 a, __m256 b);
__m256 _mm256_sqrt_ps(__m256 a);

// Comparison
__m256 _mm256_cmp_ps(__m256 a, __m256 b, int imm8);  // returns mask

// Horizontal
__m128 _mm_hadd_ps(__m128 a, __m128 b);

When to Use Inline Assembly vs. Intrinsics¶

Situation	Use
SIMD/vectorization	Intrinsics — compiler handles allocation
Specific CPU instructions (CPUID, RDTSC)	Inline asm or intrinsics
Atomic operations	Use `<stdatomic.h>` or `<atomic>`
Syscalls from C	glibc wrappers (or inline asm as last resort)
Full control, no C overhead	Separate `.asm` file

Complete Example: Fast Absolute Value (SIMD)¶

#include <immintrin.h>
#include <stdio.h>

// Compute absolute value of 8 floats using AVX
void abs_vec8(float *dst, const float *src) {
    __m256 v    = _mm256_loadu_ps(src);
    __m256 mask = _mm256_set1_ps(-0.0f);  // sign bit mask
    __m256 result = _mm256_andnot_ps(mask, v);  // clear sign bit
    _mm256_storeu_ps(dst, result);
}

int main(void) {
    float src[] = {-1.0f, 2.0f, -3.0f, 4.0f, -5.0f, 6.0f, -7.0f, 8.0f};
    float dst[8];
    abs_vec8(dst, src);
    for (int i = 0; i < 8; i++) printf("%.1f ", dst[i]);
    // 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0
}

Build:

gcc -O2 -mavx2 abs_vec.c -o abs_vec

Key Takeaways¶

GCC inline asm format: asm("instructions" : outputs : inputs : clobbers)
Operands use constraint strings (r, m, a, b, etc.) to specify placement
Use volatile to prevent the compiler from removing or reordering your asm
Add "memory" clobber when asm reads/writes arbitrary memory
"cc" clobber when asm modifies flags
Prefer intrinsics over inline asm for SIMD — safer and more portable
MSVC does not support 64-bit inline asm; use intrinsics or .asm files