Dan Freeman | | 👾

Wrong Way 'Round


This past week I've been working on implementing the full CPU instruction set for Gumball. As I thought about how to model the system for the CPU to operate on, I had an immediate sense of what I expected one of the foundational abstractions would look like.

As I tried to build on that abstraction, though, I kept feeling like there was an impedance mismatch somewhere. Instead of making things easier, it only muddied the water. After some time to let the problem percolate in the back of my mind, I finally realized that even though the abstraction was the right one, I wasn't applying it in a way that was buying me anything.

One Abstraction to Rule Them All

The Game Boy uses memory-mapped I/O for interfacing between the CPU and the rest of the system (the LCD and audio controllers, the game cartridge, etc.), so interactions with those elements mostly just look like reading and writing bytes at particular addresses in memory.

My knee-jerk reaction to that as a software developer was roughly: "perfect, so all the system components are just things that can have bytes read and written," followed by hammering out a type definition[1] to capture that notion while it was fresh in my mind:

type DataSource = {
  readByte(address: number): number;
  writeByte(address: number, value: number): void;
};

From there, I went through each of the placeholder system entities I had blocked out and framed them as DataSources. Then I set about building something to wire them all together and present them as a single coherent whole to the CPU:

class AddressBus {
  private cartridge: DataSource;
  private ram: DataSource;
  // ...

  public read(address: number): number {
    if (address < 0x8000) {
      return this.cartridge.readByte(address);
    } else if (address >= 0xc000 && address < 0xe000) {
      return this.ram.readByte(address);
    } else {
      // ...
    }
  }
}

That set of conditionals wound up getting pretty complex pretty quickly, though, since chunks of the address space that ultimately mapped to the same device weren't necessarily contiguous with one another. This meant there was a constant tension between growing the complexity of the AddressBus to account for the way the regions were spread out versus encoding knowledge in individual components about where in global memory they sat.

In one pretty representative quirk of the hardware, for instance, the system's working RAM is reflected in two locations in the address space (minus a few bytes at the top), meaning the address bus needed to take this into account when mapping from either of those regions:

} else if (address >= 0xc000 && address < 0xfe00) {
  let actualAddress = address;
  if (address >= 0xe000) {
    // Echo RAM
    actualAddress -= 0x2000;
  }

  return this.ram.readByte(actualAddress);
} else {

The alternative would have been baking that knowledge into the data source for the RAM instead of having it be a straightforward mapping onto a buffer in memory. Trying to keep each data source simple seemed like a good idea, but the further I went, the more odd cases I realized I was going to have to account for. The game cartridge in particular is capable of all kinds of shenanigans.

Left at Albuquerque

The Game Boy maps 32KB of its address space to the cartridge ROM, but 32KB isn't a whole lot of space to fit a game in. Because of that, the lower 16KB of that space always maps to the first bank of program data on the cartridge, but the upper 16KB is essentially a window into up to 8MB of ROM that can be shifted around by writing to addresses in this nominally read-only region.
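To make the windowing concrete, here is a rough sketch of the address arithmetic only. The names `romOffset`, `BANK_SIZE`, and `selectedBank` are hypothetical, and the sketch deliberately says nothing about how a given cartridge decides which bank is currently selected, since that varies with the cartridge hardware.

```typescript
// Size of one ROM bank: 16KB.
const BANK_SIZE = 0x4000;

// Hypothetical helper: translate a CPU address in the 32KB cartridge ROM
// region (0x0000-0x7fff) into an offset within the full ROM image.
function romOffset(address: number, selectedBank: number): number {
  if (address < 0 || address >= 2 * BANK_SIZE) {
    throw new RangeError("address is outside the cartridge ROM region");
  }
  if (address < BANK_SIZE) {
    // Lower 16KB: always the first bank, so the address maps straight through.
    return address;
  }
  // Upper 16KB: a window onto whichever bank is currently selected.
  return selectedBank * BANK_SIZE + (address - BANK_SIZE);
}
```

With bank 1 selected the upper window simply continues on from the fixed bank, so `romOffset(0x4000, 1)` is `0x4000`, while selecting bank 2 moves the same CPU address to offset `0x8000` in the ROM image.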

Similarly, because the Game Boy's onboard resources are so limited, there's also an 8KB region of the address space where a cartridge can supply its own external devices. Most frequently this is used to provide additional working RAM (which is often actually multiple 8KB banks like the upper ROM region), but there are also cartridges that bring additional hardware like an accelerometer or an IR communication port.

Juggling the mapping for different memory addresses between the AddressBus dispatch logic and the internals of the cartridge implementation would definitely be possible, but it was about at this point that I decided that something was off. The CPU -> AddressBus -> DataSources architecture was definitely workable, but it felt like it wasn't doing me any favors—it just smeared the complexity around instead of doing anything to tame and encapsulate it.

Turtles All the Way Down

I took a couple of days away from actively working on the problem and let it simmer during less thought-intensive activities like going for a run or sorting through my daily mountain of emails[2]. Slowly I started to form a picture of where I'd made a misstep: the DataSource abstraction was what I needed, but the way I'd employed it was almost exactly backwards in two key ways:

  • Entities like the Cartridge aren't data sources themselves, but they're composed of individual data sources whose behaviors can be defined locally, like the fixed lower ROM bank and the movable upper one.
  • The AddressBus doesn't need to have baked-in knowledge and be a special-purpose adapter for the CPU; it's a tool for composing data sources that is itself a DataSource implementation.

With this shift in mindset, the pieces all began to fit together much more easily. Instead of having the CPU interact with a flat collection of data sources by means of the address bus, it would interact directly with a single data source composed from other data sources, which were themselves composed from more primitive data sources, and so on down.

The details of that composition become irrelevant to the CPU and can be swapped out freely in different scenarios, such as in testing, or (hypothetically) emulating the Game Boy Color, which has a slightly different hardware setup.

That long sequence of hard-coded mapping conditionals went away, and the address bus instead just accepted an array of configuration entries describing how to present other data sources:

new AddressBus([
  { offset: 0x0000, length: 0x4000, data: cartridge.rom0 },
  { offset: 0x4000, length: 0x4000, data: cartridge.romx },
  { offset: 0xa000, length: 0x2000, data: cartridge.xram },
  { offset: 0xc000, length: 0x2000, data: ram },
  { offset: 0xe000, length: 0x1e00, data: ram },
  // ...
])

With this, the "echo RAM" region of the address space no longer had to be a special case; it was just the same data source (a simple buffer(0x2000) source) presented in two different locations. Regions with more complexity, like the multi-bank ROM window, could similarly be implemented as behaviors on top of simpler data sources, easy to build and test in isolation and then compose together.
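For illustration, a composing address bus along these lines might look something like the sketch below. It is not Gumball's actual implementation: it assumes each mapped source receives addresses relative to its mapping's `offset`, that unmapped reads return 0xff, and that `buffer` is a plain in-memory byte array. The `Mapping` type and `resolve` method are hypothetical names.

```typescript
type DataSource = {
  readByte(address: number): number;
  writeByte(address: number, value: number): void;
};

type Mapping = { offset: number; length: number; data: DataSource };

// Assumed shape of the simple buffer source: a fixed-size block of bytes.
function buffer(size: number): DataSource {
  const bytes = new Uint8Array(size);
  return {
    // Reads past the end of the buffer yield 0xff in this sketch.
    readByte: (address) => bytes[address] ?? 0xff,
    writeByte: (address, value) => {
      bytes[address] = value;
    },
  };
}

// The bus is itself a DataSource, so buses can be nested inside other buses.
class AddressBus implements DataSource {
  private readonly mappings: Mapping[];

  constructor(mappings: Mapping[]) {
    this.mappings = mappings;
  }

  // Find the first mapping containing the address and translate the address
  // so it is relative to the start of that mapping (an assumption here).
  private resolve(address: number): { data: DataSource; local: number } | undefined {
    for (const mapping of this.mappings) {
      if (address >= mapping.offset && address < mapping.offset + mapping.length) {
        return { data: mapping.data, local: address - mapping.offset };
      }
    }
    return undefined;
  }

  readByte(address: number): number {
    const hit = this.resolve(address);
    // Assumption: reads from unmapped addresses return 0xff.
    return hit ? hit.data.readByte(hit.local) : 0xff;
  }

  writeByte(address: number, value: number): void {
    const hit = this.resolve(address);
    if (hit) {
      hit.data.writeByte(hit.local, value);
    }
  }
}

// The same buffer presented at two locations gives echo RAM for free.
const ram = buffer(0x2000);
const bus = new AddressBus([
  { offset: 0xc000, length: 0x2000, data: ram },
  { offset: 0xe000, length: 0x1e00, data: ram },
]);
bus.writeByte(0xc123, 0x42);
```

After that write, reading `0xe123` through the bus reaches the same underlying byte, with no echo-specific logic anywhere in the bus or in the buffer.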

The CPU itself also became massively easier to test, as it no longer required constructing a full address bus to use, but instead could operate against any data source. This opened up some very nice options for testing instructions in an automated way by seeding the address space with different values, executing the instruction and recording the side effects, and then asserting on that recorded information.
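As a sketch of what that style of test could look like, the example below seeds a data source, runs one operation against it, and inspects the recorded accesses. Everything here is hypothetical: `recordingSource` is an invented helper, and `incrementAtHL` is only a stand-in loosely modelled on an "increment the byte at HL" instruction that ignores CPU flags entirely. It is not the emulator's real CPU interface.

```typescript
type DataSource = {
  readByte(address: number): number;
  writeByte(address: number, value: number): void;
};

type Access = { kind: "read" | "write"; address: number; value: number };

// Hypothetical test helper: a 64KB data source that logs every access.
function recordingSource(seed: Array<[number, number]> = []) {
  const bytes = new Uint8Array(0x10000);
  for (const [address, value] of seed) {
    bytes[address] = value;
  }
  const log: Access[] = [];
  const source: DataSource = {
    readByte(address) {
      const value = bytes[address] ?? 0xff;
      log.push({ kind: "read", address, value });
      return value;
    },
    writeByte(address, value) {
      bytes[address] = value;
      log.push({ kind: "write", address, value });
    },
  };
  return { source, log };
}

// Stand-in for a single instruction: increment the byte at the address
// formed by the H and L registers, wrapping at 8 bits. Flags are ignored.
function incrementAtHL(registers: { h: number; l: number }, memory: DataSource): void {
  const address = (registers.h << 8) | registers.l;
  const value = memory.readByte(address);
  memory.writeByte(address, (value + 1) & 0xff);
}

// Seed one byte, execute the operation, then examine the side effects.
const recorded = recordingSource([[0xc000, 0xff]]);
incrementAtHL({ h: 0xc0, l: 0x00 }, recorded.source);
```

At this point `recorded.log` holds one read of 0xff from 0xc000 followed by one write of 0x00 to the same address, which is the information a test would assert on.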

Now What?

I think it was my conviction that I had found the right tool that made it take as long as it did for me to realize I was using that tool incorrectly, but in the end I'm pretty happy with where I landed.

I have the full instruction set for the Game Boy CPU implemented and passing the test suite, though it's entirely possible my expectations around some of those instructions are wrong in both the implementation and the corresponding test 🙃

The only way to find out is going to be to keep building out other parts of the emulator! While all the components in the system might be addressable by the CPU now, a lot of those addresses basically amount to /dev/null at the moment, so the next step is to start giving them meaning.


[1] The return type for readByte and second argument to writeByte didn't actually end up being number, but that's a topic for a whole separate post. ⤴️

[2] 90% of the notification emails I get don't actually need any attention, but the tricky part is making sure I catch the 10% that do. ⤴️