Writing a CHIP-8 emulator in C with SDL2
chip8 is a CHIP-8 emulator written in C, using SDL2 for rendering and input. CHIP-8 is the classic “hello world” of emulator development — a tiny 1970s virtual machine designed to make game programming approachable on 8-bit microcomputers. The whole console fits in your head: 4KB of memory, 16 registers, a 64×32 monochrome display, and just 35 instructions.
That small surface area is exactly why it’s such a good project. You get to build a real fetch-decode-execute CPU loop, do bit-level sprite blitting with collision detection, and wire it all up to a hardware-accelerated window — without drowning in the complexity of a real console.
Features
- All 35 opcodes implemented with a complete fetch-decode-execute cycle.
- 64×32 monochrome display rendered through SDL2 with hardware-accelerated scaling (10× by default, into a resizable window).
- Hex keypad input mapped onto the keyboard in the traditional CHIP-8 layout.
- Delay and sound timers ticking at 60 Hz.
- Bundled ROMs (IBM Logo, Zero Demo, Particle) with an interactive selection prompt on startup.
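The 60 Hz timers are the one piece of timing that CHIP-8 programs rely on, so it can help to decouple them from how fast instructions execute. Below is a minimal sketch of one way to do that, not the project's `chip8_Timer`: the `Timers` struct and `timers_update` helper are hypothetical names, and the caller is assumed to pass in the elapsed time in microseconds from whatever clock it uses.

```c
#include <stdint.h>

/* Hypothetical helper, not the project's chip8_Timer: decrements both
   timers at 60 Hz regardless of how often the main loop calls it. */
#define TIMER_PERIOD_US (1000000u / 60u)   /* 16666 us per tick */

typedef struct {
    uint8_t  delay;   /* delay timer, counts down to 0 */
    uint8_t  sound;   /* sound timer, counts down to 0 */
    uint32_t acc_us;  /* elapsed time not yet converted into ticks */
} Timers;

/* Add elapsed_us to the accumulator and consume whole 60 Hz ticks.
   Returns the number of ticks applied. */
static unsigned timers_update(Timers *t, uint32_t elapsed_us)
{
    unsigned ticks = 0;
    t->acc_us += elapsed_us;
    while (t->acc_us >= TIMER_PERIOD_US) {
        t->acc_us -= TIMER_PERIOD_US;
        if (t->delay > 0) t->delay--;
        if (t->sound > 0) t->sound--;
        ticks++;
    }
    return ticks;
}
```

Because leftover time stays in the accumulator, the timers keep an average rate of 60 Hz even when the loop period is not a multiple of the tick length.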
The whole machine is one struct
Unlike a real console, the entire CHIP-8 state is small enough to read at a glance. Here’s the complete virtual machine:

```c
typedef struct {
    unsigned short  opcode;      // Current opcode
    unsigned char*  memory;      // 4K memory [4096]
    unsigned char*  V;           // 16 8-bit registers [16]
    unsigned short  I;           // Index register
    unsigned short  pc;          // Program counter
    unsigned char*  gfx;         // Graphics buffer [64 * 32]
    unsigned char   delayTimer;  // Delay timer
    unsigned char   soundTimer;  // Sound timer
    unsigned short* stack;       // Stack [16]
    unsigned short  sp;          // Stack pointer
    unsigned char*  key;         // Keypad [16]
    unsigned char   drawFlag;    // Set when the screen needs a redraw
} Chip8;
```

The main loop is just as honest: execute one instruction, poll input, redraw if needed, tick the timers, and sleep a little to slow the emulated CPU down to a reasonable speed:

```c
while(true)
{
    chip8_FetchDecodeExecute(&chip8);
    chip8_Input(&chip8);
    chip8_RenderLoop(&chip8);
    chip8_Timer(&chip8);

    usleep(10000); // Slow down execution
}
```

Decoding an opcode is bit surgery
Every CHIP-8 instruction is two bytes, stored big-endian. The first job is to reassemble those two bytes into a 16-bit opcode, then slice it into the bit-fields that different instructions care about: a 12-bit address (nnn), two register indices (x, y), an 8-bit constant (kk), and a 4-bit nibble (n).
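To make those fields concrete, here is a small sketch that decodes a single opcode in isolation. The `Fields` struct and the `decode` function are illustrative names for this example only, not part of the emulator.

```c
#include <stdint.h>

/* Illustrative only: the bit-fields of one 16-bit CHIP-8 opcode. */
typedef struct {
    uint16_t nnn;  /* lowest 12 bits: address */
    uint8_t  x;    /* low nibble of the high byte: register index */
    uint8_t  y;    /* high nibble of the low byte: register index */
    uint8_t  kk;   /* low byte: 8-bit constant */
    uint8_t  n;    /* lowest nibble */
} Fields;

/* hi is the byte at pc, lo is the byte at pc + 1. */
static Fields decode(uint8_t hi, uint8_t lo)
{
    uint16_t opcode = (uint16_t)((hi << 8) | lo);  /* big-endian: high byte first */
    Fields f;
    f.nnn = opcode & 0x0FFF;
    f.x   = (opcode & 0x0F00) >> 8;
    f.y   = (opcode & 0x00F0) >> 4;
    f.kk  = opcode & 0x00FF;
    f.n   = opcode & 0x000F;
    return f;
}
```

For the bytes 0xD1 0x25 (a Dxyn draw), this yields nnn = 0x125, x = 1, y = 2, kk = 0x25 and n = 5. All five fields are always computed; each instruction simply uses the ones it needs, which for a draw are x, y and n.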

```c
void chip8_FetchDecodeExecute(Chip8* chip8)
{
    // Fetch: combine two bytes into one big-endian 16-bit opcode
    chip8->opcode = (chip8->memory[chip8->pc] << 8) | chip8->memory[chip8->pc + 1];

    // Decode: carve the opcode into its bit-fields
    unsigned short nnn = chip8->opcode & 0x0FFF;        // 12-bit address
    unsigned char  x   = (chip8->opcode & 0x0F00) >> 8; // first register index
    unsigned char  y   = (chip8->opcode & 0x00F0) >> 4; // second register index
    unsigned char  kk  = chip8->opcode & 0x00FF;        // 8-bit constant
    unsigned char  n   = chip8->opcode & 0x000F;        // 4-bit nibble

    // Execute: dispatch on the high nibble, then on the variant
    switch(chip8->opcode & 0xF000)
    {
        case 0x0000:
            switch(chip8->opcode & 0x00FF)
            {
                case 0x00E0: chip8_cls(chip8); break; // CLS - clear the display
                case 0x00EE: chip8_ret(chip8); break; // RET - return from subroutine
            }
            break;
        // ... 1nnn, 2nnn, 3xkk, and the rest ...
    }

    chip8->pc += 2; // Instructions are 2 bytes, so always advance by two
}
```

The dispatch is a nested switch: the outer one keys off the top nibble (0xF000), and where a family of instructions shares that nibble (like the 0x8xy_ arithmetic group), an inner switch picks the exact variant. Advancing pc by two at the end is what makes the program counter walk through memory one instruction at a time.
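As an illustration of such an inner switch, here is a sketch of part of the 0x8xy_ group working on a bare register array. The function name `exec_8xy` and its signature are assumptions made for this example, not the emulator's handlers. The flag behaviour follows the commonly documented CHIP-8 semantics, where VF receives the carry for 8xy4 and NOT-borrow for 8xy5.

```c
#include <stdint.h>

/* Sketch: execute a subset of the 8xy_ group on registers V[0..15].
   Returns 1 if the low nibble n was handled, 0 otherwise. */
static int exec_8xy(uint8_t V[16], uint8_t x, uint8_t y, uint8_t n)
{
    switch (n) {
    case 0x0: V[x]  = V[y]; return 1;             /* LD  Vx, Vy */
    case 0x1: V[x] |= V[y]; return 1;             /* OR  Vx, Vy */
    case 0x2: V[x] &= V[y]; return 1;             /* AND Vx, Vy */
    case 0x3: V[x] ^= V[y]; return 1;             /* XOR Vx, Vy */
    case 0x4: {                                   /* ADD Vx, Vy */
        uint16_t sum = (uint16_t)(V[x] + V[y]);
        V[x]   = (uint8_t)sum;                    /* keep the low 8 bits */
        V[0xF] = sum > 0xFF;                      /* carry out */
        return 1;
    }
    case 0x5: {                                   /* SUB Vx, Vy */
        uint8_t no_borrow = V[x] >= V[y];         /* computed before Vx changes */
        V[x]   = (uint8_t)(V[x] - V[y]);
        V[0xF] = no_borrow;
        return 1;
    }
    default:
        return 0;                                 /* shifts and SUBN omitted here */
    }
}
```

With V1 = 200 and V2 = 100, the 8xy4 case leaves V1 = 44 and VF = 1, because 300 does not fit in eight bits. Writing VF last means the flag wins even if x happens to be 0xF.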
Drawing sprites: XOR and the collision flag
The most interesting opcode is Dxyn — draw a sprite. CHIP-8 sprites are 8 pixels wide and n rows tall, and they’re drawn by XOR-ing their bits onto the framebuffer. That XOR is the clever bit: drawing the same sprite twice erases it, which is how the original games did flicker-y animation.
The XOR also gives you free collision detection. If drawing a sprite ever flips a pixel that was already on back to off, something overlapped — so the emulator sets register VF to 1. Games read that flag to know when, say, a missile hit a wall.

```c
void chip8_drw_vx_vy_nibble(Chip8* chip8, unsigned char x, unsigned char y, unsigned char n)
{
    unsigned short px = chip8->V[x] % 64; // wrap X onto the screen
    unsigned short py = chip8->V[y] % 32; // wrap Y onto the screen
    unsigned short height = n;
    unsigned char pixel;

    chip8->V[0xF] = 0; // clear the collision flag
    for(int yline = 0; yline < height; yline++)
    {
        if(py + yline >= 32) break; // clip at the bottom edge

        pixel = chip8->memory[chip8->I + yline]; // one sprite row = one byte
        for(int xline = 0; xline < 8; xline++)
        {
            if(px + xline >= 64) break; // clip at the right edge

            if((pixel & (0x80 >> xline)) != 0) // test each bit, MSB first
            {
                unsigned int index = px + xline + ((py + yline) * 64);
                if(chip8->gfx[index] == 1) // pixel was already on?
                    chip8->V[0xF] = 1;     // -> collision
                chip8->gfx[index] ^= 1;    // XOR the pixel onto the screen
            }
        }
    }

    chip8->drawFlag = 1; // mark the frame dirty
}
```

The 0x80 >> xline mask is how each of the eight bits in a sprite row gets tested left-to-right (0x80 is 10000000 in binary, walked one bit right each iteration).
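The erase-on-redraw and collision behaviour is easy to check in isolation. The following is a stripped-down sketch rather than the emulator's function: `draw_row` is a hypothetical helper that XORs a single sprite byte onto a bare 64×32 buffer and returns what would go into VF.

```c
#include <stdint.h>

#define W 64
#define H 32

/* Hypothetical helper: XOR one 8-pixel sprite row onto gfx at (px, py).
   Returns 1 if any lit pixel was switched off (a collision), else 0.
   Pixels past the right edge are clipped; the caller must keep py < H. */
static int draw_row(uint8_t gfx[W * H], unsigned px, unsigned py, uint8_t row)
{
    int collision = 0;
    for (unsigned bit = 0; bit < 8 && px + bit < W; bit++) {
        if (row & (0x80 >> bit)) {              /* MSB is the leftmost pixel */
            unsigned index = px + bit + py * W;
            if (gfx[index])
                collision = 1;                  /* pixel was already on */
            gfx[index] ^= 1;
        }
    }
    return collision;
}
```

Drawing 0xF0 at (0, 0) into an empty buffer lights four pixels and returns 0. Drawing it again at the same position returns 1 and leaves the buffer empty, which is the draw-twice-to-erase trick described above.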
The font is just bytes that look like numbers
CHIP-8 ships its own built-in font for the hex digits 0–F, and the trick is delightfully literal: each character is five bytes, and if you read those bytes in binary, the 1 bits are the pixels.

```c
unsigned char const CHIP8_FONTSET[80] = {
    0xF0, 0x90, 0x90, 0x90, 0xF0, // 0
    0x20, 0x60, 0x20, 0x20, 0x70, // 1
    0xF0, 0x10, 0xF0, 0x80, 0xF0, // 2
    // ... through F
};
```

Take the digit 0: 0xF0, 0x90, 0x90, 0x90, 0xF0. Write out the top four bits of each byte and the shape pops right out:

```
1111   0xF0
1001   0x90
1001   0x90
1001   0x90
1111   0xF0
```

A hollow rectangle: a zero. Every glyph in the font is built this way, which means “rendering text” is just running these bytes through the exact same sprite-drawing path as everything else.
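The same reading can be done in code. This sketch turns one font byte into a row of text; `glyph_row` and `GLYPH_ZERO` are illustrative names, and only the top four bits are inspected because the built-in glyphs are four pixels wide.

```c
#include <stdint.h>

/* The five rows of the digit 0, copied from the font table. */
static const uint8_t GLYPH_ZERO[5] = { 0xF0, 0x90, 0x90, 0x90, 0xF0 };

/* Illustrative helper: write the top four bits of a font byte into out
   as '#' (pixel on) or '.' (pixel off), followed by a terminating NUL.
   out must have room for 5 chars. */
static void glyph_row(uint8_t byte, char out[5])
{
    for (int bit = 0; bit < 4; bit++)
        out[bit] = (byte & (0x80 >> bit)) ? '#' : '.';
    out[4] = '\0';
}
```

Calling `glyph_row` on each byte of `GLYPH_ZERO` in turn gives `####`, `#..#`, `#..#`, `#..#` and `####`.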
Only redraw when something changed
SDL2 handles the window, the GPU-backed texture, and the scaling (SDL_RenderSetLogicalSize lets the emulator pretend the screen is 64×32 while SDL stretches it to fill the window). The render path itself leans on that drawFlag from the draw opcode — there’s no point pushing pixels to the GPU on frames where nothing moved:

```c
// Takes the machine by pointer so it matches the chip8_RenderLoop(&chip8) call in the main loop
void chip8_RenderLoop(Chip8* chip8)
{
    if(chip8->drawFlag)
    {
        uint32_t pixels[SCREEN_WIDTH * SCREEN_HEIGHT];
        for(int i = 0; i < SCREEN_WIDTH * SCREEN_HEIGHT; i++)
            pixels[i] = chip8->gfx[i] ? 0x0000FF00 : 0x00000000; // green on / black off

        // The last argument is the pitch: the number of bytes in one row of pixels
        SDL_UpdateTexture(canvas, NULL, pixels, SCREEN_WIDTH * sizeof(uint32_t));
        SDL_RenderClear(renderer);
        SDL_RenderCopy(renderer, canvas, NULL, NULL);
        SDL_RenderPresent(renderer);

        chip8->drawFlag = 0; // until the next DRW marks the frame dirty again
    }
}
```

It’s a classic dirty-flag optimization: the framebuffer is a 1-byte-per-pixel monochrome buffer, and it only gets expanded into 32-bit pixels and shipped to the texture when a DRW instruction actually touched it.
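The dirty-flag pattern itself does not depend on SDL. In the sketch below, the `Screen` struct and `expand_if_dirty` function are hypothetical stand-ins so the skip logic can be exercised without a window. The caller would perform the texture upload and present only when the function returns 1. The colour constant is copied from the render code above; how it is interpreted depends on the texture's pixel format, which is assumed here to put green in that byte.

```c
#include <stdint.h>

#define SCREEN_WIDTH  64
#define SCREEN_HEIGHT 32

/* Hypothetical stand-in for the emulator's display state. */
typedef struct {
    uint8_t gfx[SCREEN_WIDTH * SCREEN_HEIGHT];  /* 1 byte per pixel, 0 or 1 */
    uint8_t drawFlag;                           /* set by the draw opcode */
} Screen;

/* Expand the framebuffer into 32-bit pixels, but only when the frame is
   dirty. Returns 1 if pixels was rewritten (upload needed), 0 if skipped. */
static int expand_if_dirty(Screen *s, uint32_t pixels[SCREEN_WIDTH * SCREEN_HEIGHT])
{
    if (!s->drawFlag)
        return 0;                               /* nothing changed: skip the work */

    for (int i = 0; i < SCREEN_WIDTH * SCREEN_HEIGHT; i++)
        pixels[i] = s->gfx[i] ? 0x0000FF00u : 0x00000000u;

    s->drawFlag = 0;                            /* clean until the next draw */
    return 1;
}
```

Between two draws the function returns 0 on every call, so the per-frame cost drops to a single flag test.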
Tech Stack
- Language: C
- Graphics & input: SDL2 (hardware-accelerated renderer, streaming texture, logical scaling)
- Build: CMake (SDL2 vendored as a git submodule — no system install needed)
- Display: 64×32 monochrome, 10× scale into a resizable window
- CPU: all 35 opcodes, 60 Hz delay/sound timers