An introduction
Zero Board Computer
Working printf, fopen, and a real filesystem
on any CPU — real, emulated, or yet to be designed.
On day one. Before any hardware exists.
Thanks for coming. I want to make you a promise up front, and then spend the
next half hour earning it. The promise is this: a working printf, a real
fopen, an actual filesystem — on any processor you can name. Silicon on
your bench, a core inside an emulator, or a chip nobody has designed yet. And
you get all of it on day one, before you've written a single device driver.
The way that works is almost suspiciously simple: the processor talks to the
outside world through nothing but ordinary memory reads and writes. That's the
whole trick, and it's why I can say — and I'll keep coming back to this
line all talk — if the CPU can load and store, this works. Let's start
with the name, because the name tells you exactly what the thing is.
First, the name
Why “Zero Board Computer”?
A Single Board Computer — Raspberry Pi, BeagleBone — is a whole computer on one circuit board.
A Zero Board Computer needs zero boards.
So — why “Zero Board Computer”? The name is a play on a term
you already know. A Single Board Computer — a Raspberry Pi, a BeagleBone
— is a complete, working computer on one circuit board. That was the
whole selling point: take what used to be a backplane full of cards and
collapse it down to a single board. Now just follow that trend one more step.
If a single board is the goal, what's past it? Zero boards. That's the name.
It's what you get when you keep stripping support hardware away until there's
nothing left to strip. Which raises an obvious question — if there's no
board, then where do the peripherals actually live? The disk, the console, the
clock? Hold that thought for one slide.
The board is nowhere
The peripherals aren't on a board; they're borrowed from a host.
The only “hardware” the guest sees is a tiny memory window.
It's the working computer you have before any board exists.
A console, a filesystem, a clock — and nothing to bring up.
Here's the answer. The peripherals aren't on a board at all — they're
borrowed from a host machine. Your laptop. The emulator. The workstation
driving your prototype. The guest doesn't own a disk; it borrows the host's
disk. It doesn't own a terminal; it borrows the host's terminal. From the
guest's side there is no peripheral hardware — just a small window of
memory it reads and writes, and everything real happens on the far side of
that window. So what you end up with is a fully usable computer —
storage, a console, a clock — with zero boards of support hardware for
you to design, fabricate, or debug. And that matters most precisely when you
don't have any of that hardware yet. Which is the situation I want to talk
about next, because every one of you has been stuck in it.
The six-week tax
Every new chip, board, or prototype starts the same way:
Can't debug firmware until the serial port works.
Can't test the filesystem until the flash driver works.
Can't validate the timer until you can read the clock back.
Every new chip, every new board, every prototype starts in the same hole.
It's a chicken-and-egg problem. You want to debug your firmware — but the
only window you have into it is the serial port, and the serial-port driver is
itself unproven firmware you can't see into yet. You want to test your storage
— but that needs the flash driver, which needs the very debugging you
don't have. You want to trust your timer — but you can't even confirm
it's counting until you've built a way to read the clock back out, and that's
its own little project. Notice the pattern: nobody's product is “bring up
a UART.” That work ships nothing. It's pure scaffolding. And it's where
the first month of every hardware project quietly disappears.
The first six weeks go to fighting your way to a working printf.
Not the product. Not the silicon. Just trying to see what your code is doing — redone by every team, on every project, for decades.
And it all collapses into one blunt sentence: the first six weeks go to
fighting your way to a working printf. “A working printf” is really
shorthand for the first moment the machine talks back — the first time
you can see what your code is doing. Until then you're flying completely blind.
And none of that six weeks is the product. It isn't validating your silicon,
it isn't exercising your design, it isn't testing your algorithm. Worse, it
doesn't get cheaper — it isn't a problem somebody solves once. It gets
re-paid, in full, by every team, on every chip, for decades. So the honest
question is: why are we all still paying this? What if printf, and fopen, and a
clock were simply there, on the first power-on? To explain how, I have to
introduce one old idea.
What is semihosting?
Borrow the host's I/O instead of building your own.
The program on the target doesn't open files or print characters itself —
it asks a host machine (your laptop, the emulator, a debugger)
to do it on its behalf.
That idea is semihosting. The definition is simple: instead of the target
doing its own input and output, it borrows the host's. The program on the chip
doesn't open the file itself, doesn't push characters to a screen itself
— it asks a host machine, one that already has a real filesystem and a
real console and a real clock, to do it on its behalf. The name tells you the
shape of it: “semi,” as in half a host. The target is only half a
computer. It runs the code, but it hands the I/O up to a full machine that
does the actual work. Concretely, that covers exactly what the bottom of a C
library needs: open, close, read, write, and seek a file; read and write the
console; ask the time; exit with a status. And I want to be clear — this
is not new or exotic. Semihosting is decades old and completely mainstream.
ZBC didn't invent it. ZBC fixes how it's delivered.
ARM did this in 1993
It works — but it assumes:
only ARM chips,
only with a hardware debug probe attached,
only via ARM-specific trap instructions a debugger intercepts.
To see what needed fixing, look at the canonical version. ARM defined
semihosting in the early nineties, and it's genuinely good — it's still
carrying printf for Cortex-M developers today. But it makes three assumptions.
First, it only works on ARM chips, because the mechanism is defined in terms
of ARM's instruction set — it simply doesn't exist for a 6502 or a
RISC-V core. Second, it needs a hardware debug probe physically attached,
because the request gets caught by a debugger over JTAG. No probe, no
semihosting. And third, the way the chip signals the host is a special trap
instruction — a deliberate breakpoint that halts the CPU so the debugger
can step in. Now, in 1993, every one of those was free. Every target was ARM,
every engineer had a probe on the bench, and silicon came before simulation.
The design didn't get worse over thirty years. The world changed around it
— and every one of those three assumptions broke.
All three constraints come from how the target signals the host: trap instructions.
Replace the trap with a 32-byte memory-mapped device.
If the CPU can read and write memory — and every CPU can, by definition — it works.
So here's the insight that frees it. Look at where all three of ARM's
constraints come from. They don't come from the I/O services — those are
fine. They all come from one design choice: the signaling mechanism. The trap
instruction is what drags in the architecture dependence, the debugger, and
the privileged mode. So replace the trap. Instead of a special instruction,
put a small device on the bus — thirty-two bytes of registers, sitting
in the memory map like a UART or a clock chip. The chip signals the host by
writing to memory instead of by executing a magic instruction. And why is that
the right primitive? Because reading and writing memory is the one thing every
CPU can do, by definition. An eight-bit micro, a sixty-four-bit server chip, a
processor that won't tape out until next year — there is no CPU that runs
code but can't do a load and a store. That's how an idea that was ARM-only
becomes genuinely universal. This is the hinge of the whole talk.
What that buys you
No special instructions to define.
No debug probe to attach.
No privileged execution mode required.
And look at what falls out of that one substitution — it's the exact
mirror of ARM's three constraints. There's nothing architecture-specific to
define, so the same device works for every CPU family, and porting to a new
architecture is zero protocol work. There's no debug probe, because the host
isn't a debugger catching a trap — it's a peripheral answering a memory
access, so it works in a plain software simulator with nothing attached, and
on real hardware with an empty JTAG header. And there's no privileged mode,
because loads and stores are just ordinary instructions — no supervisor
state, no exception handler to make a request. The mental model to hold onto is
this: the host is a device on the bus, not a debugger interrupting the CPU. The
processor stays in control the entire time. Which is exactly why a single
request is so simple — let me show you one.
The doorbell handshake
Guest builds a small message in its own RAM.
Writes the buffer address into RIFF_PTR.
Writes any value to DOORBELL — the signal.
Host does the real I/O, writes the response back into the same buffer.
Host sets RESPONSE_READY; guest polls, then reads its answer.
This is the entire protocol — five steps, and then it repeats. First, the
guest builds a small message in its own RAM. That detail matters: the guest
owns the buffer, the device never allocates anything, and that's how the client
library can promise it never touches the heap. Second, it writes that buffer's
address into the RIFF_PTR register, so the host knows where to look. Third, it
writes any value at all to the DOORBELL register — that single write is
the “go” signal; the value doesn't matter, the act of writing does.
Fourth, the host reads the request out of guest memory, performs the real file
or console operation, and writes the answer back into that same buffer. And
fifth, the host sets a “response ready” bit in the status register;
the guest, which has been polling that bit, sees it and reads its result. The
thing to notice is that the host is completely passive. It does nothing until
the doorbell rings, and it never seizes the CPU. Compare that to ARM, where the
trap halts the processor. Here, the CPU is always driving.
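Here's roughly what the guest side of those five steps looks like in C. Treat it as a sketch under stated assumptions, not the project's client code: the function names are made up for illustration, the register offsets are the ones from the device's register table, and the bit position of RESPONSE_READY and the byte order used to store the address are my assumptions. It also leans on C99's uintptr_t for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets within the 32-byte device window. */
#define REG_RIFF_PTR  0x08
#define REG_DOORBELL  0x18
#define REG_STATUS    0x19

/* ASSUMPTION: the bit position of RESPONSE_READY is not given in the talk. */
#define STATUS_RESPONSE_READY 0x02

/* Steps 2 and 3: publish the buffer address, then ring the doorbell. */
void demo_submit(volatile unsigned char *dev, void *msg)
{
    uintptr_t addr = (uintptr_t)msg;
    size_t i;

    assert(sizeof addr <= 16); /* RIFF_PTR is 16 bytes wide */

    /* ASSUMPTION: low byte first; unused upper bytes are cleared. */
    for (i = 0; i < 16; i++) {
        if (i < sizeof addr) {
            dev[REG_RIFF_PTR + i] = (unsigned char)((addr >> (8 * i)) & 0xFF);
        } else {
            dev[REG_RIFF_PTR + i] = 0;
        }
    }

    dev[REG_DOORBELL] = 1; /* the value is irrelevant; the write is the signal */
}

/* Step 5, non-blocking form: has the host set RESPONSE_READY yet? */
int demo_response_ready(const volatile unsigned char *dev)
{
    return (dev[REG_STATUS] & STATUS_RESPONSE_READY) != 0;
}

/* Step 5, blocking form: spin until the host has answered. */
void demo_wait(const volatile unsigned char *dev)
{
    while (!demo_response_ready(dev)) {
        /* busy-wait; the guest CPU keeps running the whole time */
    }
}
```

Step one (building the message) and step four (the host's work) happen on either side of these calls. Nothing here allocates, and nothing here needs a privileged instruction.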
The whole device: 32 bytes
Offset  Register    Meaning
0x00    SIGNATURE   ASCII "SEMIHOST": how guests find it
0x08    RIFF_PTR    16 bytes: address of the message buffer
0x18    DOORBELL    write any value → trigger
0x19    STATUS      bitmask: TIMER / RESPONSE_READY / PROTO_ERROR
0x1A    ERROR_CODE  protocol errors, without touching guest RAM
And here is that device in full. This table is the entire hardware surface
— thirty-two bytes. That's the whole thing a chip designer or an emulator
author has to build. Let me walk it. SIGNATURE is eight ASCII bytes spelling
“SEMIHOST” — it's how a guest discovers the device: read this
address, see the magic string, and you know ZBC is present. Remember that one;
it comes back later. RIFF_PTR is where the message lives, and it's a full
sixteen bytes wide on purpose — that's room for a 128-bit pointer. A 6502
just writes the low two bytes and ignores the rest. The register is sized for
the widest CPU imaginable, not for the machine the implementer happens to own.
DOORBELL is the trigger we just saw. STATUS is a bitmask, not one flag: a timer
bit, a response-ready bit, and a protocol-error bit. And ERROR_CODE is a small
safety feature — when a request is so broken the host can't even parse
it, the host reports that here, in its own register, instead of scribbling into
the guest's memory. A malformed request can't corrupt the guest. So that's the
hardware. Now — what's actually inside that buffer?
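If you prefer to see the table as a data structure, here is one way to write it down in C. This is a sketch: the struct and function names are illustrative rather than the project's headers, and I'm assuming the five bytes after ERROR_CODE are unused padding that rounds the window up to thirty-two bytes.

```c
#include <assert.h>
#include <stddef.h>

/* The 32-byte register window, one member per row of the table. */
struct zbc_regs {
    unsigned char signature[8];  /* 0x00: ASCII "SEMIHOST" */
    unsigned char riff_ptr[16];  /* 0x08: address of the message buffer */
    unsigned char doorbell;      /* 0x18: write any value to trigger */
    unsigned char status;        /* 0x19: bitmask */
    unsigned char error_code;    /* 0x1A: protocol error */
    unsigned char reserved[5];   /* 0x1B..0x1F: ASSUMED unused padding */
};

/* C90-style compile-time check: the array size goes negative if the
   layout is not exactly 32 bytes. */
typedef char zbc_regs_must_be_32_bytes[(sizeof(struct zbc_regs) == 32) ? 1 : -1];

/* Discovery: returns 1 if the window carries the signature, else 0. */
int zbc_present(const volatile struct zbc_regs *r)
{
    static const char magic[8] = {'S', 'E', 'M', 'I', 'H', 'O', 'S', 'T'};
    size_t i;

    assert(r != NULL);
    for (i = 0; i < sizeof magic; i++) {
        if (r->signature[i] != (unsigned char)magic[i]) {
            return 0;
        }
    }
    return 1;
}
```

Every member is a byte or an array of bytes, so the compiler has no reason to insert padding and the offsets fall out exactly as in the table.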
The message is RIFF
A tagged container — the same family as WAV and AVI files.
RIFF .... SEMI <- container, form type "SEMI"
CNFG int_size ptr_size endian <- guest declares its architecture
CALL opcode = 0x05 (SYS_WRITE) <- the request
PARM fd = 1
DATA "Hello, world!\n"
PARM length = 14
Host overwrites CALL with a RETN (result + errno), in place.
The message is RIFF — the same tagged container format behind WAV and AVI
files. I want to justify that choice for a second, because reusing a boring old
format is the point. RIFF is just a stream of chunks: each one has a
four-character tag, a length, and a payload. It's self-describing and trivially
extensible — add a new chunk type and old parsers simply skip what they
don't recognize. There's no clever new binary format to get subtly wrong. Read
the example top to bottom. The outer RIFF container is tagged “SEMI,”
for semihosting. The first chunk is always CNFG — that's the guest
announcing who it is: my integer size, my pointer size, my byte order.
Everything after gets interpreted through that declaration, and it's the setup
for the most important idea in this talk, two slides from now. Then CALL
carries the opcode — here, five, which is write. Inside it: a parameter
for the file descriptor, a data chunk with the actual bytes, and a parameter
for the length. When the host finishes, it overwrites that CALL chunk with a
RETN chunk — the result and an error number — right there in the
same buffer. Request in, response out, same memory, no allocation anywhere.
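To show how little machinery a chunk format needs, here is a sketch of a RIFF writer in C that works entirely inside a caller-supplied buffer. The container rules (four-character tag, little-endian 32-bit length, pad byte after odd-length payloads) are standard RIFF. The helper names are mine, and I'm deliberately not reproducing the internal layout of the CNFG, CALL, and PARM payloads, which the ZBC spec defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store a 32-bit value little-endian, as RIFF headers require. */
void put_le32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Start a container: "RIFF", a size that is patched later, the form type.
   Returns the offset where the first chunk goes, or 0 if it does not fit. */
size_t riff_begin(unsigned char *buf, size_t cap, const char form[4])
{
    if (cap < 12) {
        return 0;
    }
    memcpy(buf, "RIFF", 4);
    put_le32(buf + 4, 4);
    memcpy(buf + 8, form, 4);
    return 12;
}

/* Append one chunk at offset pos: tag, length, payload, pad byte if odd.
   Returns the new offset, or 0 if the chunk does not fit. */
size_t riff_put_chunk(unsigned char *buf, size_t cap, size_t pos,
                      const char tag[4], const void *data, size_t len)
{
    size_t padded = len + (len & 1);

    if (pos > cap || cap - pos < 8 + padded) {
        return 0;
    }
    memcpy(buf + pos, tag, 4);
    put_le32(buf + pos + 4, (unsigned long)len);
    memcpy(buf + pos + 8, data, len);
    if (len & 1) {
        buf[pos + 8 + len] = 0;
    }
    return pos + 8 + padded;
}

/* Patch the container size: everything after the 8-byte RIFF header. */
void riff_end(unsigned char *buf, size_t end)
{
    assert(end >= 12);
    put_le32(buf + 4, (unsigned long)(end - 8));
}
```

A guest would call riff_begin with "SEMI", append its chunks, and call riff_end before ringing the doorbell. The buffer belongs to the caller throughout, which is what keeps the heap out of the picture.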
One subtlety: endianness
RIFF headers
Always little-endian — that's RIFF's own rule.
Payload values
The guest's declared endianness, from CNFG.
The container is fixed. The contents speak the guest's language.
One subtlety is worth thirty seconds, because it's exactly where a
cross-architecture protocol usually goes wrong: endianness. There are two
different rules here, on purpose. The RIFF frame itself — the chunk tags
and the length fields — is always little-endian. That's RIFF's own rule,
inherited from WAV and AVI, and ZBC keeps it unchanged, so the structural bytes
look the same no matter who's talking. But the values inside — the actual
integers and pointers, the return values — use the guest's own declared
endianness, the one it announced in CNFG. A big-endian 68000 encodes its file
descriptor big-endian; a little-endian 6502 encodes its own little-endian. The
picture to hold is: the envelope is written in one fixed language so any host
can route it, but the letter inside is written in the guest's native language.
A big-endian guest byteswaps the envelope through shared helpers — it
never has to reformat its own data. Container fixed, contents native. That's
the whole rule.
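In code, the two rules come down to two small writers. This is a sketch with names of my own choosing: one function for the envelope, which ignores the guest's byte order entirely, and one for the letter, which takes the size and byte order the guest declared.

```c
#include <assert.h>
#include <stddef.h>

/* Envelope: a chunk length field is always 4 bytes, little-endian. */
void put_header_len(unsigned char *p, unsigned long len)
{
    p[0] = (unsigned char)(len & 0xFF);
    p[1] = (unsigned char)((len >> 8) & 0xFF);
    p[2] = (unsigned char)((len >> 16) & 0xFF);
    p[3] = (unsigned char)((len >> 24) & 0xFF);
}

/* Letter: a payload value occupies `size` bytes in the byte order the
   guest declared (big_endian is 0 or 1). */
void put_guest_value(unsigned char *p, unsigned long v, size_t size,
                     int big_endian)
{
    size_t i;

    assert(size >= 1 && size <= sizeof v);
    for (i = 0; i < size; i++) {
        unsigned char byte = (unsigned char)((v >> (8 * i)) & 0xFF);
        if (big_endian) {
            p[size - 1 - i] = byte; /* most significant byte first */
        } else {
            p[i] = byte;            /* least significant byte first */
        }
    }
}
```

So a 6502 would write file descriptor 1 as the two bytes 01 00, a big-endian guest with four-byte integers would write 00 00 00 01, and both would write the chunk length around it the same way.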
The thesis
Width neutrality is the product
The guest declares its int size, pointer size, and byte order.
The host honors what the guest says.
If you remember one slide from today, make it this one. Everything else —
the device, the RIFF format — is mechanism. This is the point. The
protocol never assumes a width. It doesn't bake in thirty-two bits, or
sixty-four, or any number at all. Instead the guest declares its parameters
— integer size, pointer size, byte order — and the host simply
honors whatever the guest said. Width is negotiated, never assumed. And that's
the reason this is a protocol and not just a library. A library you could fork
and tweak. But declared widths mean a guest written today and a host written by
someone else, years later, in another language, still interoperate —
because the width travels on the wire, from the guest, not from anyone's source
code. The project states this as a hard rule: any part of the system that
assumes a specific bit width is treated as a defect to be removed. The only
legitimate sources of a width are the wire format's own fields and the guest's
declaration. Never the implementer's habits. Never “thirty-two bits is
plenty.” Let me make that concrete.
Same protocol, different sizes
8-bit 6502
int = 2 · ptr = 2 · little-endian
64-bit, big-endian
int = 4 · ptr = 8 · big-endian
They speak the same wire protocol. They just fill in different fields.
Take two machines that could hardly be more different. On the left, a 6502
— a 1975 eight-bit chip with sixteen-bit pointers, little-endian. Its
CNFG says: integer two bytes, pointer two bytes, little-endian. On the right, a
modern sixty-four-bit core that happens to be big-endian. Its CNFG says:
integer four bytes, pointer eight bytes, big-endian. And notice there that the
integer and the pointer are different sizes — the protocol handles that
fine, because every parameter knows whether it's an integer or a pointer and is
sized accordingly. Here's the payoff: those two machines — forty-some
years apart, a factor of eight in width, opposite byte orders — speak the
identical wire protocol. Not a small-chip dialect and a big-chip dialect. The
same protocol. They just write different numbers into the same CNFG fields, and
every host reads those fields and adapts. And this isn't hypothetical —
the project runs live on-target tests at both ends of that range, so the
interoperability is continuously proven, not just promised. And taking width
seriously buys you something you might not expect.
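Seen from the host, "honoring what the guest says" can be sketched like this. The struct and its field names are illustrative stand-ins for whatever CNFG actually carries, and the sketch decodes into a 64-bit host integer, so it only handles values up to eight bytes. A host that wanted to accept the full sixteen-byte pointer range would need a wider representation. Signed values are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the guest declared in CNFG. Field names are illustrative. */
struct guest_config {
    unsigned char int_size;   /* bytes per integer */
    unsigned char ptr_size;   /* bytes per pointer */
    unsigned char big_endian; /* 0 = little-endian, 1 = big-endian */
};

/* Decode `size` payload bytes in the given byte order into the widest
   integer this sketch supports, so nothing is narrowed on the way in. */
uint64_t read_guest_value(const unsigned char *p, size_t size, int big_endian)
{
    uint64_t v = 0;
    size_t i;

    assert(size <= sizeof v);
    for (i = 0; i < size; i++) {
        /* Always consume the most significant remaining byte next. */
        size_t idx = big_endian ? i : size - 1 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}

/* Integers and pointers are sized separately, as the guest declared. */
uint64_t read_guest_int(const struct guest_config *c, const unsigned char *p)
{
    return read_guest_value(p, c->int_size, c->big_endian);
}

uint64_t read_guest_ptr(const struct guest_config *c, const unsigned char *p)
{
    return read_guest_value(p, c->ptr_size, c->big_endian);
}
```

The two machines on the slide become two initializers, {2, 2, 0} and {4, 8, 1}, feeding the same functions. No width appears anywhere except in the declaration.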
Why it matters, concretely
A 16-bit guest can stream sequentially through a file larger than its own address space.
That's a core use case — not an edge case.
Here's the surprise. Take a sixteen-bit machine. Its entire address space is
sixty-four kilobytes — it physically cannot form a pointer to byte one
hundred thousand. And yet it can open a multi-gigabyte file on the host and
stream sequentially all the way through it. How? Because the file offset isn't
a guest pointer — it's a value in the protocol, and the protocol carries
it at full sixty-four-bit width no matter how small the guest is. The guest
reads a chunk into its little buffer, processes it, asks for the next chunk at
the next offset. It never has to address the whole file — only to name
positions in it, and sixty-four-bit positions fit on the wire even when they
don't fit in the CPU. This is exactly why narrowing is forbidden. If some
implementer had stored that offset in a thirty-two-bit integer because
“who has files that big,” they'd have silently capped what every
guest can do — and the tiny guest, the one that needs streaming the most,
is the first one to break. So a small chip walking a huge file isn't a clever
edge case someone allowed. It's a core use case the design protects on purpose.
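Here is how a small guest might carry that sixty-four-bit position even when its compiler has no sixty-four-bit integer type. It's a sketch, and the type and function names are mine: the offset lives in eight bytes, and advancing it is byte-wise addition with carry, using arithmetic that fits in a sixteen-bit int. I've stored it least significant byte first, which is what a little-endian guest would put on the wire.

```c
#include <assert.h>

/* A 64-bit file offset held as eight bytes, least significant first. */
struct offset64 {
    unsigned char b[8];
};

/* Advance the offset by n bytes (n at most 65535), propagating carries.
   Every intermediate value fits in 16 bits. */
void offset64_add(struct offset64 *o, unsigned int n)
{
    unsigned int carry = 0;
    int i;

    assert(n <= 0xFFFFu);
    for (i = 0; i < 8; i++) {
        unsigned int add = 0;
        unsigned int sum;

        if (i == 0) {
            add = n & 0xFFu;        /* low byte of the chunk length */
        } else if (i == 1) {
            add = (n >> 8) & 0xFFu; /* high byte of the chunk length */
        }
        sum = o->b[i] + add + carry; /* at most 255 + 255 + 1 */
        o->b[i] = (unsigned char)(sum & 0xFFu);
        carry = sum >> 8;
    }
}
```

Read 512 bytes, process them, call offset64_add(&pos, 512), ask for the next chunk. After two hundred rounds the position is already past anything a sixteen-bit pointer could name, and the loop doesn't care.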
Standing on ARM's shoulders
Op Name
0x01 SYS_OPEN
0x05 SYS_WRITE
0x06 SYS_READ
0x0A SYS_SEEK
0x18 SYS_EXIT
ZBC keeps ARM semihosting's operation numbers.
picolibc and newlib ports work nearly unchanged.
Only the transport changed: traps → memory. The vocabulary is the same.
Now, earlier I made ARM's trap the villain. But ARM got something else exactly
right, and ZBC keeps it: the vocabulary. ARM defined a numbering for the
operations — open is one, write is five, read is six, and so on. That
numbering is excellent, and ZBC uses it unchanged. And that's a strategic
choice, not a lazy one. The bottom layer of real C libraries — picolibc,
newlib — already knows how to speak these exact call numbers. So by
keeping them, ZBC lets that battle-tested library code work nearly unchanged.
You're not rewriting libc; you swap only the handful of instructions at the
very bottom that actually deliver the request. So the one-line summary is: only
the transport changed — traps became memory accesses — and the
vocabulary stayed the same. You get the universality of the new transport and
decades of accumulated library work, for free. And keep that opcode
compatibility in mind, because it pays off again in a few slides in a way you
won't see coming.
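To picture that bottom layer, here is a sketch of a write stub in C. The opcode values are the ones on the slide. Everything else is illustrative: the deliver_fn signature is a hypothetical stand-in for "whatever gets the request to a host," demo_deliver is a fake host included only so the example runs, and I'm assuming ARM's convention for SYS_WRITE, where the result is the number of bytes that were not written.

```c
#include <assert.h>
#include <stddef.h>

/* Operation numbers shared with ARM semihosting (from the slide). */
enum {
    SYS_OPEN  = 0x01,
    SYS_WRITE = 0x05,
    SYS_READ  = 0x06,
    SYS_SEEK  = 0x0A,
    SYS_EXIT  = 0x18
};

/* HYPOTHETICAL: the one transport-specific piece. It carries an opcode
   and its arguments to a host and returns the host's result. */
typedef long (*deliver_fn)(int opcode, long fd, void *buf, size_t len);

/* A libc-style write stub. Returns bytes written, or -1 on failure.
   ASSUMPTION: the result of SYS_WRITE is the count of bytes NOT written. */
long stub_write(deliver_fn deliver, int fd, const void *buf, size_t len)
{
    long not_written;

    assert(deliver != 0);
    not_written = deliver(SYS_WRITE, fd, (void *)buf, len);
    if (not_written < 0 || (size_t)not_written > len) {
        return -1;
    }
    return (long)(len - (size_t)not_written);
}

/* Stand-in host for demonstration: records the opcode, accepts all bytes. */
int demo_last_opcode;

long demo_deliver(int opcode, long fd, void *buf, size_t len)
{
    (void)fd;
    (void)buf;
    (void)len;
    demo_last_opcode = opcode;
    return 0; /* zero bytes left unwritten */
}
```

The stub never learns whether deliver is a trap or a memory write. Swapping the transport means swapping that one function pointer.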
It's not one wire
One binary, many hosts
The invariant is the guest-facing API, not the wire underneath it.
Build once against zbc_call() — the same binary runs across very different machines, each with its own transport.
Okay — change of gear. Everything so far has been one device, one wire:
the RIFF doorbell peripheral. Now I want to widen the picture. The RIFF device
is still the canonical wire, but it turns out to be just one of several ways a
guest can reach a host. And here's the key idea: the thing that stays constant
isn't the wire — it's the guest-facing API. Your firmware calls the same
functions regardless. Underneath that call sits a swappable
“transport” that decides how the request actually gets to a host.
So the same compiled binary — not recompiled, the same bytes — can
run on machines that have nothing in common, because at startup the library
detects which transport is available and uses it. If you've ever written a
program that opens “a file” without caring whether it's a local
disk or a network share or a pipe — same call, different plumbing,
chosen at runtime — that's exactly the move here. So what are the
transports?
Three ways to reach the host
RIFF device — the 32-byte doorbell peripheral.
MAME, a patched QEMU, real silicon.
virtio — the guest carries drivers.
Stock QEMU, unmodified. (next slides)
native trap — ARM-style semihosting traps.
Build-time option, where the target supports it.
There are three, and each one is at home somewhere different. The first is the
RIFF device — the doorbell peripheral we just spent the whole talk on.
That's what you get on MAME, on a QEMU that's been patched to include the
device, and on real silicon. Whenever a native ZBC device exists, it wins. The
second is virtio. When a host has no ZBC device at all, the guest can instead
carry standard virtio drivers and talk to the host's existing virtio devices
— and that is what makes plain, unmodified QEMU work, which is the next
two slides. The third is native trap — on architectures that already have
trap-based semihosting, a build option just uses it, bringing us full circle
back to ARM. And one sign this is real engineering: the transports compose. A
composite can send console traffic over one and file traffic over another, with
a base layer underneath that safely reports “not implemented.” Now,
with multiple wires you'd reasonably worry they behave differently — so
there's an equivalence test suite. The same operation script runs over every
transport, and the results, the error numbers, the file-descriptor behavior all
have to match. It's the client-side twin of the host conformance test.
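One plausible shape for that composition in C is a table of function pointers. To be clear, this is my sketch of the idea, not the library's actual types: struct transport, the op_class split, the error value, and the two demo transports are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_NOT_IMPLEMENTED (-1) /* placeholder error value */

enum op_class { OP_CONSOLE, OP_FILE };

/* A transport is a name plus one entry point. */
struct transport {
    const char *name;
    int (*call)(enum op_class cls, int opcode);
};

/* Base layer: every request fails cleanly instead of hanging. */
int base_call(enum op_class cls, int opcode)
{
    (void)cls;
    (void)opcode;
    return ERR_NOT_IMPLEMENTED;
}

const struct transport base_transport = { "base", base_call };

/* Composite: console traffic over one transport, file traffic over
   another; a missing slot falls through to the base layer. */
struct composite {
    const struct transport *console;
    const struct transport *file;
};

int composite_call(const struct composite *c, enum op_class cls, int opcode)
{
    const struct transport *t;

    assert(c != NULL);
    t = (cls == OP_CONSOLE) ? c->console : c->file;
    if (t == NULL) {
        t = &base_transport;
    }
    return t->call(cls, opcode);
}

/* Two stand-in transports with recognizable results, for demonstration. */
int demo_console_call(enum op_class cls, int opcode)
{
    (void)cls;
    (void)opcode;
    return 100;
}

int demo_file_call(enum op_class cls, int opcode)
{
    (void)cls;
    (void)opcode;
    return 200;
}

const struct transport demo_console = { "demo-console", demo_console_call };
const struct transport demo_files = { "demo-files", demo_file_call };
```

An equivalence suite in this picture is just the same list of calls run against each transport in turn, with the results compared.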
Runs on stock QEMU — today
No ZBC device, no QEMU patch. The guest brings the drivers:
virtio-9p
File operations, via QEMU's built-in 9P file server. No external daemon.
virtio-console
Console I/O, on the same virtqueue core.
-fsdev local,id=fs0,path=$SHARE,security_model=none \
-device virtio-9p-device,fsdev=fs0,mount_tag=zbc
And this is the headline of the second half: it runs on the QEMU you already
have installed. No patched fork, no custom build, no kernel. Stock upstream
QEMU. Here's how. Stock QEMU has no ZBC device — so instead of the host
carrying the device, the guest carries small standard drivers and talks to
devices QEMU already ships. For files, that's virtio-9p. 9P is the Plan 9
filesystem protocol, and QEMU has a built-in 9P server, so the guest gets real
access to a host directory with no external daemon running. For the console,
virtio-console, riding the same underlying machinery. And the reason it's 9P
specifically: 9P is a plain request-reply protocol, one reply per request,
which maps almost perfectly onto ZBC's own synchronous call model — so the
translation layer stays tiny. Look at the bottom of the slide — that's
the entire host-side setup. One line to point at a directory, one line to
expose it as a device. You can paste those two lines and be running in about
two minutes.
If it speaks virtio, it works
Any machine exposing modern virtio-mmio — the library scans for it at startup.
QEMU machine  Arch                  virtio-mmio window
virt          ARM / AArch64         0x0a000000
virt          RISC-V (rv32 & rv64)  0x10001000
microvm       x86                   0xfeb00000
Now let me say that more broadly, because I said “QEMU” but the real
claim is bigger: if a machine exposes a standard virtio device over
memory-mapped I/O, ZBC's guest drivers can use it. virtio is the de-facto
standard for paravirtual devices — so this is less a QEMU integration than
a virtio integration that QEMU happens to provide. The table is just “the
major architectures, covered”: ARM and AArch64 on the virt machine,
thirty-two and sixty-four-bit RISC-V on virt, and x86 on microvm. The hex
column is simply where each machine parks its virtio register window. And
discovery ties right back to that SIGNATURE register from earlier. At startup
the library probes in order: first it reads the device base and looks for the
“SEMIHOST” signature; if that's missing, it scans the virtio window
for a live console or filesystem device; and if it finds nothing at all, it
falls back to a transport where every call fails immediately and cleanly. It
never hangs, and it never guesses — the whole thing is signature-driven,
which is exactly what that register was for. One quick note in case you're
wondering why x86 says microvm and not the usual pc: virtio over PCI would drag
in bus enumeration, exactly the per-platform complexity this project refuses to
grow, so microvm — which exposes virtio over memory — is the
supported x86 machine.
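That probe order is short enough to sketch in C. The virtio-mmio details here are from the virtio specification: the magic value 0x74726976 at offset 0, version 2 for modern devices at offset 4, and the device ID at offset 8, where 3 is a console and 9 is the 9P transport. The function names, the enum, and the way slots are passed in are my own illustration, not the library's startup code.

```c
#include <assert.h>
#include <stddef.h>

enum transport_kind { TRANSPORT_RIFF, TRANSPORT_VIRTIO, TRANSPORT_NONE };

/* virtio-mmio register offsets and values, per the virtio specification. */
#define VIRTIO_MMIO_MAGIC_VALUE 0x000
#define VIRTIO_MMIO_VERSION     0x004
#define VIRTIO_MMIO_DEVICE_ID   0x008
#define VIRTIO_MAGIC            0x74726976UL /* the bytes "virt" */
#define VIRTIO_VERSION_MODERN   2UL
#define VIRTIO_ID_CONSOLE       3UL
#define VIRTIO_ID_9P            9UL

/* virtio-mmio registers are 32-bit little-endian. */
unsigned long read_le32(const volatile unsigned char *p)
{
    return (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* Step 1: does the ZBC window start with "SEMIHOST"? */
int has_zbc_signature(const volatile unsigned char *dev)
{
    static const char magic[8] = {'S', 'E', 'M', 'I', 'H', 'O', 'S', 'T'};
    size_t i;

    for (i = 0; i < sizeof magic; i++) {
        if (dev[i] != (unsigned char)magic[i]) {
            return 0;
        }
    }
    return 1;
}

/* Step 2: is this slot a modern virtio console or 9P device? */
int virtio_slot_usable(const volatile unsigned char *slot)
{
    unsigned long id;

    if (read_le32(slot + VIRTIO_MMIO_MAGIC_VALUE) != VIRTIO_MAGIC) {
        return 0;
    }
    if (read_le32(slot + VIRTIO_MMIO_VERSION) != VIRTIO_VERSION_MODERN) {
        return 0;
    }
    id = read_le32(slot + VIRTIO_MMIO_DEVICE_ID);
    return id == VIRTIO_ID_CONSOLE || id == VIRTIO_ID_9P;
}

/* Probe in order: RIFF device, then virtio, then the always-fail fallback. */
enum transport_kind probe_transport(const volatile unsigned char *zbc_dev,
                                    const volatile unsigned char *virtio_base,
                                    size_t slot_count, size_t slot_stride)
{
    size_t i;

    assert(slot_count == 0 || slot_stride >= 12);
    if (zbc_dev != NULL && has_zbc_signature(zbc_dev)) {
        return TRANSPORT_RIFF;
    }
    for (i = 0; virtio_base != NULL && i < slot_count; i++) {
        if (virtio_slot_usable(virtio_base + i * slot_stride)) {
            return TRANSPORT_VIRTIO;
        }
    }
    return TRANSPORT_NONE; /* step 3: every call fails immediately */
}
```

Every branch is decided by reading a known value at a known address, so the probe ends in a bounded number of reads whatever it finds.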
Or QEMU's own trap semihosting
On ARM, AArch64, RISC-V, MIPS, m68k, and Xtensa, a build-time shim routes everything through QEMU's -semihosting.
Covers the entire operation set with zero guest driver code.
~10 instructions per architecture — opcode + pointer in registers, then the trap.
Build-time only: a trap on a non-semihosting host faults, so it's never auto-probed.
And here's the full-circle moment I promised. We opened by treating ARM's trap
as the villain. But because ZBC kept ARM's opcode numbers, QEMU's own built-in
trap semihosting just becomes another transport we can plug in for free. The
very thing we replaced, we can also reuse wherever it's already there. It
applies to the architectures QEMU implements traps for — ARM, AArch64,
RISC-V, MIPS, m68k, Xtensa. On those, you start QEMU with one flag, and a tiny
per-architecture shim — about ten instructions — drops the opcode
and a pointer into the right registers, executes the trap, and reads the result
back. Its big advantage is coverage: it handles the entire operation set, files
and console and time and exit, with zero guest driver code and nothing to
configure beyond that flag. The one honest caveat: unlike the others, this
transport can't be safely auto-detected. To test whether trap semihosting is
present, you'd have to execute the trap — and if it isn't present, that
raises a CPU exception. Probing for it could crash you. So it's a deliberate
build-time choice, never part of the runtime probe.
This is not a sketch
Two reference hosts, proven equal
C90 (zero heap, statically verified) and C++17 — byte-for-byte conformance-tested against each other on every CI run.
200+ machines in MAME
Drop a binary into an emulated 6502, Z80, or 68000 and it reads files off your laptop. No driver work at all.
Let me switch from pitch to evidence, because you've heard the idea — now
I want to show it's not vaporware. There are two reference host libraries: one
in C90, one in C++17, written independently. And a conformance test feeds both
the same requests and checks that their responses are identical, byte for byte,
on every CI run. That's how you know the specification is real and unambiguous
— two separate implementations come out provably interchangeable. The C
one also allocates zero heap, statically verified, so it fits on an eight-bit
micro with a few kilobytes of RAM. And here's the detail that tends to make
people sit up: MAME already ships ZBC machines for over two hundred vintage
systems — home computers, arcade boards, classic minis. You can take a
cross-compiled binary, drop it into an emulated 6502 or Z80 or 68000, and that
1979 machine reads files off your 2026 laptop. No serial driver, no storage
driver, no filesystem code — none of the bring-up tax from the start of
this talk. And it's tested at both ends of the spectrum at once.
Tested at both extremes
One test binary, two emulators, many transports
The same program runs on MAME's 6502 & i386 (RIFF) and QEMU's RISC-V, ARM, AArch64 & x86 microvm (virtio) — live, every push.
Hardened
Continuous RIFF-parser fuzzing, ASan/UBSan, optional seccomp sandbox, directory-jail backends.
This is the slide that ties the whole talk together. One test program, written
once against the client API, runs unmodified across all of this: MAME's
eight-bit 6502 and thirty-two-bit i386 over the RIFF transport, and QEMU's
RISC-V, ARM, AArch64, and x86 microvm over the virtio transport. One binary,
two emulators, multiple transports, eight-bit through sixty-four-bit — and
it runs live on every push, not as a one-time demo. That matrix is the thesis
made executable: width neutrality and transport independence stop being slogans
the moment the same program keeps passing across that whole spread
automatically. Now I should address the obvious worry — security.
Semihosting is powerful by definition; guest code is reaching into host files.
So the project takes that seriously. The RIFF parser, which is the part that
eats untrusted input, is continuously fuzzed. The sanitizers gate every push.
There's an optional seccomp sandbox that strips the host process down to just
the syscalls semihosting needs. And the file backend can be jailed to a single
directory with path-traversal blocked. You pick a backend and a policy to match
your threat model — from “trusted test code, full access” all
the way to “untrusted guest, locked box.”
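To give a feel for the path-traversal part, here is a minimal sketch in C of a purely lexical check. It is my illustration, not the project's backend: it rejects absolute paths and any ".." component, and that is all it does. A real jail also has to deal with symbolic links, for example by resolving the final path and confirming it still sits under the jail directory.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a guest-supplied path is relative and has no ".."
   component, 0 otherwise. Lexical only: symlinks are NOT handled. */
int path_stays_in_jail(const char *path)
{
    const char *p = path;

    assert(path != NULL);
    if (*p == '/') {
        return 0; /* absolute paths escape the jail by definition */
    }
    while (*p != '\0') {
        const char *end = p;

        while (*end != '\0' && *end != '/') {
            end++; /* find the end of this path component */
        }
        if (end - p == 2 && p[0] == '.' && p[1] == '.') {
            return 0; /* a ".." component climbs out of the directory */
        }
        p = (*end == '/') ? end + 1 : end;
    }
    return 1;
}
```

Note that it compares whole components, so a file named "a..b" passes while "a/../b" does not.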
Is this for you?
Bringing up a new board
Add 32 bytes of decode to your bus. Your firmware has printf, fopen, time() before the board is back from the factory.
Hobbyist with an old chip
Soldered a 6502 to perfboard? Give it a real filesystem this afternoon — the same binary runs on the physical board later.
So — is this for you? Let me get specific, because there are a few
different people in this room. If you're bringing up a new board, you're the
one dreading the first month. You don't have to write the serial driver, or the
flash driver, or the throwaway logging hack you always end up building. Add
thirty-two bytes of address decode to your bus — or just link the host
library into your emulator — and your firmware has printf and fopen and
time of day before the physical board is even back from fabrication. Silicon
stops blocking software. And if you're a hobbyist — you soldered a 6502 or
a Z80 onto perfboard and you want it to do something real today — run it
in MAME, give it a real filesystem this afternoon, and here's the part that
matters: when your physical board is ready, you point the client library at the
address you wired in, and the same binary just runs. There's no “emulator
version” and “hardware version.” One binary, emulator first,
hardware later, no divergence. But there's a third audience where the payoff is
measured in calendar quarters.
Especially: a new chip program
Firmware starts day one against the simulator. Silicon arrival becomes an integration milestone, not a starting gun.
A quarter of calendar time recovered. Nothing proprietary introduced.
And this is the one for whoever owns the schedule. On a new-chip program,
firmware is normally blocked on silicon for the first quarter. The team waits
for tape-out before they can really begin — that's a quarter of idle
calendar time baked right into the plan. ZBC changes that. Firmware development
starts on day one, against the simulator, with full file access and a real
console. Silicon arrival stops being the starting gun for firmware and becomes
just an integration milestone — the same firmware, the same tests, the
same logs you've already been running, now on real parts. And the reason that's
low-risk is the most important technical point here: the entire interface is
thirty-two bytes of memory-mapped I/O. The surface is so small that the
simulator and the manufactured chip can't meaningfully diverge across it.
You're not building an elaborate model that drifts from reality — you're
matching a thirty-two-byte contract. So the business case is simple: about a
quarter of calendar time recovered, and nothing proprietary introduced. It's
MIT-licensed, the client is tiny C90, there's no vendor SDK and nothing to sign.
The wire format is the contract
One spec file is the single source of truth, byte for byte.
Future hosts (an FPGA core, a vendor's emulator) conform to the same format.
A binary written today still runs twenty years from now against any conforming host.
Semihosting stops being a per-platform reinvention.
Let me zoom out to the deepest point, because it's the one that outlasts
everything else I've shown you. The C library, the C++ library, MAME, QEMU
— all of those are implementations. The actual product is the contract: a
single specification file that defines the wire format, byte for byte. The
libraries are reference implementations of that spec, not the other way around.
And that ordering is what makes this durable. Because the authority lives in a
wire format and not in any one codebase, anyone can write a new conforming host
— Rust bindings, an FPGA core burned into a prototype, a chip vendor's
clean-room emulator — and a guest written today just works against it.
Implementations come and go; the contract is the fixed point. Which lets me
make a claim I actually believe: a guest binary you compile today, against this
spec, will still run twenty years from now against whatever conforming host
exists then. That's the line between a tool and infrastructure. And it's the
real fix for the bring-up tax — the reason bring-up has been a perennial
cost is that semihosting kept getting reinvented per platform. Pin it to one
stable, width-neutral contract, and that reinvention stops.
Try it this afternoon
Vintage chip, in MAME
mame -listfull zbc*
mame zbcz80 -quik hello.bin
Modern target, in stock QEMU
qemu-system-riscv32 -machine virt \
-device virtio-9p-device,...
Docs & spec: johnwbyrd.github.io/zbc · Wiki: zeroboardcomputer.com
License: MIT · client is C90, never calls malloc
If it can load and store, it can run ZBC.
So let me leave you with something you can actually do, because the barrier here
really is one afternoon. There are two on-ramps. If you have MAME, list the ZBC
machines with the first command there, pick one, and quickload a binary
— you'll have a Z80 or a 6502 doing real file I/O in minutes. If you have
QEMU, boot the virt or microvm machine with a virtio-9p device pointed at a
directory — same client API, different transport. And to clear away the
usual objections: it's MIT-licensed, about as permissive as it gets; the client
is small C90 that never calls malloc; you don't sign anything and you don't take
on a vendor SDK. The spec lives on the docs site, and the tutorials and
compatibility tables live on the wiki. And I'll close where I started, because
it's still the whole story in one line: if it can load and store, it can run
ZBC. Thank you — I'd love to take your questions.