15 — Inline Assembly

Inline assembly lets you embed assembly instructions directly inside C or C++ code. This gives you low-level control without writing a separate .asm file, which is useful for performance-critical sections, direct hardware access, or instructions that have no C equivalent.


GCC Extended Inline Assembly Syntax

GCC uses the AT&T/GAS syntax for inline assembly by default. The general form:

asm volatile (
    "assembly template"         // instruction(s), one per line
    : output operands           // [optional]
    : input operands            // [optional]
    : clobbers                  // [optional]
);
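  • volatile tells the compiler not to delete the block when its outputs look unused and not to merge or hoist it; ordering against surrounding memory accesses additionally needs the "memory" clobber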
  • Output operands: C variables that will be written by the asm
  • Input operands: C variables that provide values to the asm
  • Clobbers: registers/memory that the asm modifies (so compiler doesn't assume they're unchanged)

Basic Syntax

Inline NOP

asm("nop");

Move instruction

int x = 5;
asm("movl %0, %%eax" : : "r"(x) : "eax");
// %0 refers to the first operand; %% escapes to a single %
// eax is listed as a clobber because the template overwrites it

Operand Constraints

Operands use constraint strings to tell GCC where to place variables.

Common Constraints

Constraint   Meaning
r            Any general-purpose register
m            Memory operand
i            Immediate integer constant
n            Immediate integer (known at compile time)
a            RAX/EAX specifically
b            RBX/EBX
c            RCX/ECX
d            RDX/EDX
S            RSI/ESI
D            RDI/EDI
0-9          Same location as the Nth operand

Modifiers on Output Constraints

Modifier   Meaning
=          Write-only (value overwritten)
+          Read-write (value read and modified)
&          Early clobber (written before inputs are consumed)

Practical Examples

Add two integers

int a = 10, b = 20, result;
asm ("addl %2, %0"      // %0 += %2  (AT&T: src, dst)
     : "=r"(result)     // %0: output, result in any register
     : "0"(a), "r"(b)   // %1: a, tied to the same register as %0; %2: b
     : "cc"             // addl modifies the flags
);
// result = 30

Swap two variables

int x = 1, y = 2;
asm ("xchgl %0, %1"
     : "=r"(x), "=r"(y)
     : "0"(x), "1"(y)
);
// x = 2, y = 1

Read the timestamp counter (RDTSC)

uint64_t read_tsc(void) {
    uint32_t lo, hi;
    asm volatile ("rdtsc"
                  : "=a"(lo), "=d"(hi));  // rdtsc: result in EDX:EAX
    return ((uint64_t)hi << 32) | lo;
}

CPUID

void cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
           uint32_t *ecx, uint32_t *edx) {
    asm volatile ("cpuid"
                  : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
                  : "a"(leaf));
}

Bit scan (find lowest set bit)

int lowest_set_bit(unsigned int x) {
    int pos;
    asm ("bsfl %1, %0"   // BSF: bit scan forward
         : "=r"(pos)
         : "r"(x));
    return pos;
}

Clobber List

Tell GCC which registers your asm destroys (other than the explicit operands):

asm volatile (
    "mov $42, %%rbx"
    :
    :
    : "rbx"   // rbx is overwritten, so GCC keeps no live value in it across the asm
);

Special clobbers:

Clobber             Meaning
"memory"            Asm may read or write any memory; compiler must reload all cached values
"cc"                Asm modifies the flags register (RFLAGS)
"rax", "rbx", ...   Specific registers clobbered

Use "memory" whenever your asm reads or writes memory that is not listed as an operand.


Intel Syntax in GCC Inline Asm

GCC defaults to AT&T syntax. To use Intel syntax:

__asm__ (
    ".intel_syntax noprefix\n\t"
    "mov rax, 42\n\t"
    ".att_syntax prefix\n\t"
    ::: "rax"
);

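Or compile with -masm=intel, which switches the whole translation unit (including the operands GCC substitutes into templates) to Intel syntax: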

gcc -masm=intel source.c -o program


Clang Extended ASM

Clang supports the same GCC inline asm syntax. No changes needed for basic usage.


MSVC Inline Assembly (Windows, 32-bit only)

MSVC has its own inline asm syntax using __asm:

// MSVC — 32-bit only (not supported in 64-bit MSVC!)
int x = 5;
__asm {
    mov eax, x
    add eax, 10
    mov x, eax
}
// x = 15

MSVC does not support inline asm for 64-bit (x64) or ARM targets. Use intrinsics or separate .asm files instead.
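
As a sketch of the intrinsic route, the timestamp counter can be read without any inline asm through __rdtsc, which MSVC exposes in <intrin.h> and GCC/Clang in <x86intrin.h>. The wrapper name read_tsc_intrinsic is hypothetical.

```c
#include <stdint.h>
#ifdef _MSC_VER
#include <intrin.h>      /* MSVC */
#else
#include <x86intrin.h>   /* GCC and Clang */
#endif

/* Hypothetical wrapper: same result as an RDTSC asm block, but portable
   across the three compilers on x86 and x86-64. */
static inline uint64_t read_tsc_intrinsic(void) {
    return (uint64_t)__rdtsc();
}
```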


Compiler Intrinsics (Preferred Alternative)

For SIMD and special instructions, intrinsics are safer than inline asm — they look like function calls but map directly to instructions, and the compiler handles register allocation.

#include <immintrin.h>    // SSE/AVX intrinsics

// Add 4 floats in parallel
__m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
__m128 b = _mm_set_ps(8.0f, 7.0f, 6.0f, 5.0f);
__m128 c = _mm_add_ps(a, b);  // lanes 3..0: {12, 10, 8, 6}

Header          Extension
<mmintrin.h>    MMX
<xmmintrin.h>   SSE
<emmintrin.h>   SSE2
<pmmintrin.h>   SSE3
<smmintrin.h>   SSE4.1
<nmmintrin.h>   SSE4.2
<immintrin.h>   AVX, AVX2, AVX-512

Common intrinsics:

// Load / store
__m256 _mm256_load_ps(float const *p);     // aligned
__m256 _mm256_loadu_ps(float const *p);    // unaligned
void   _mm256_store_ps(float *p, __m256 a);

// Arithmetic (8 floats in parallel)
__m256 _mm256_add_ps(__m256 a, __m256 b);
__m256 _mm256_mul_ps(__m256 a, __m256 b);
__m256 _mm256_sqrt_ps(__m256 a);

// Comparison
__m256 _mm256_cmp_ps(__m256 a, __m256 b, int imm8);  // returns mask

// Horizontal
__m128 _mm_hadd_ps(__m128 a, __m128 b);
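
To show the load, arithmetic, store pattern in a loop, here is a minimal sketch that computes dst[i] = s * x[i] + y[i]. It deliberately uses the 128-bit SSE counterparts of the intrinsics listed above (_mm_loadu_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps) so that it builds on x86-64 without extra flags; the function name scale_add is hypothetical.

```c
#include <stddef.h>
#include <xmmintrin.h>   /* SSE */

/* Hypothetical helper: dst[i] = s * x[i] + y[i] for i in [0, n). */
static void scale_add(float *dst, const float *x, const float *y,
                      float s, size_t n) {
    __m128 vs = _mm_set1_ps(s);          /* broadcast s into all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {         /* 4 floats per iteration */
        __m128 vx = _mm_loadu_ps(x + i); /* unaligned loads */
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_mul_ps(vs, vx), vy));
    }
    for (; i < n; i++)                   /* scalar tail for the leftovers */
        dst[i] = s * x[i] + y[i];
}
```

The AVX version has the same shape with the _mm256_ variants, a step of 8, and -mavx on the command line.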

When to Use Inline Assembly vs. Intrinsics

Situation                                  Use
SIMD/vectorization                         Intrinsics (compiler handles allocation)
Specific CPU instructions (CPUID, RDTSC)   Inline asm or intrinsics
Atomic operations                          Use <stdatomic.h> or <atomic>
Syscalls from C                            glibc wrappers (or inline asm as last resort)
Full control, no C overhead                Separate .asm file
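
For the atomic-operations row, a short sketch of what the standard-library route looks like in C11. The counter hit_count and the function record_hit are hypothetical names; no inline asm is involved, and the compiler chooses the locked instruction itself.

```c
#include <stdatomic.h>

static atomic_int hit_count;   /* static storage, so it starts at zero */

/* Hypothetical helper: atomically increment the counter and return the new value. */
static int record_hit(void) {
    /* atomic_fetch_add returns the value held before the addition */
    return atomic_fetch_add(&hit_count, 1) + 1;
}
```

Unlike a hand-written lock-prefixed template, this version also compiles unchanged on non-x86 targets.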

Complete Example: Fast Absolute Value (SIMD)

#include <immintrin.h>
#include <stdio.h>

// Compute absolute value of 8 floats using AVX
void abs_vec8(float *dst, const float *src) {
    __m256 v    = _mm256_loadu_ps(src);
    __m256 mask = _mm256_set1_ps(-0.0f);  // sign bit mask
    __m256 result = _mm256_andnot_ps(mask, v);  // clear sign bit
    _mm256_storeu_ps(dst, result);
}

int main(void) {
    float src[] = {-1.0f, 2.0f, -3.0f, 4.0f, -5.0f, 6.0f, -7.0f, 8.0f};
    float dst[8];
    abs_vec8(dst, src);
    for (int i = 0; i < 8; i++) printf("%.1f ", dst[i]);
    // 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0
}

Build:

gcc -O2 -mavx2 abs_vec.c -o abs_vec


Key Takeaways

  • GCC inline asm format: asm("instructions" : outputs : inputs : clobbers)
  • Operands use constraint strings (r, m, a, b, etc.) to specify placement
  • Use volatile to keep the compiler from deleting or merging your asm; it is not a full reordering barrier
  • Add "memory" clobber when asm reads/writes arbitrary memory
  • "cc" clobber when asm modifies flags
  • Prefer intrinsics over inline asm for SIMD — safer and more portable
  • MSVC does not support 64-bit inline asm; use intrinsics or .asm files
