
What's in a Name?


This past week I had one of those moments where, in the middle of idle conversation, something suddenly became clear to me that I hadn't even realized I was missing. There was some ongoing chatter about structural typing in the Ember Discord's #e-typescript channel, and folks were discussing the ups and downs of being able to provide more information than your types require.

Somewhere in there it clicked for me that this was exactly the same as the problem I alluded to in my recent Gumball post that led to me not using the number type in the emulator. I'd already been planning to write about that since I thought it was interesting, but making that connection got me thinking things through a bit more thoroughly, and I ended up writing more than I'd expected on the topic.

Take a Number

Working on a Game Boy emulator has resurfaced a lot of dusty old facts and fears that I'd managed to put out of my mind as soon as I knew I'd never see them on an exam again. One of the most fundamental truths that my brain knows but my heart tries not to dwell on is this: computing pretty much boils down to shuffling numbers from place to place. Code and data are both ultimately just streams of undifferentiated bits that have to be handled very, very carefully or else they lose all meaning.

Systems-oriented programming languages like Go, Rust and C/C++ help developers out on this front by offering numeric representations that correspond to varying sizes and interpretations of the chunks of bits they're moving around. So, naturally, I reached for a language that has none of that and instead just offers up a single[1] representation for all your possible arithmetic needs: number.

Counting the number of milliseconds since the dawn of (computational) time? Sounds like a job for number! Working out the sin or cos of an angle to make the geometry for your snazzy CSS effects fit together? Here, have a number. Twiddling bits to roll your own sketchy crypto routines? number's got you covered.

Modern JavaScript engines are super smart, and they'll often notice patterns in how you're using numeric values in specific contexts and optimize the representation accordingly. On paper, though, every JS number is a 64-bit IEEE 754 floating-point value, with all the good and not-so-good things that entails. TypeScript accordingly offers a single type to encompass all numeric values, called—you guessed it—number.
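As a quick aside to illustrate (nothing emulator-specific here), that floating-point representation means exact integers run out at 2^53, and fractional arithmetic comes with the usual binary rounding surprises:

console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991 (2^53 - 1)
console.log(2 ** 53 === 2 ** 53 + 1); // true: precision is exhausted
console.log(0.1 + 0.2 === 0.3);       // false: classic binary rounding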

The emulation in Gumball involves manipulating lots of numbers with very specific ranges of possible values. If, for example, I have a variable with the value 255 and I add 1 to it, the right answer depends on the kind of value I'm working with.

If it's a 16-bit value, then nothing unusual happens and the result is 256, just as it would be in grade school math. If it's an 8-bit value, though, then 255 is the largest number it can hold and the addition rolls back around to make the result 0. And if the value was meant to only represent a single on-or-off bit, then the fact that I somehow got 255 in there in the first place indicates a bug somewhere else in the system, and the answer is meaningless anyway.
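To make that concrete, here's how those three interpretations diverge; the explicit bitmasks below are my illustration, not code from Gumball:

const value = 255;
const sum = value + 1;

sum & 0xffff; // 256: a 16-bit value has plenty of room
sum & 0xff;   // 0: an 8-bit value wraps back around
sum & 0b1;    // 0: but 255 should never have fit in a single bit anyway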

To the JS runtime, though, everything is just a number, and 255 + 1 is always 256. Since TypeScript's broad goal is to reflect the realities of that runtime behavior, it also draws no distinction, but there are tools we can use to get the typechecker on our side. To get there, though, we need a quick diversion on how certain pieces of the type system work.

Inside and Out

In programming, type systems can broadly be broken down into two categories: nominal and structural. This breakdown refers to how the system determines whether one type is equivalent to (or a subtype of) another. In other words, "if I have a value of type X, how do I know if I can assign it to a variable of type Y?"

In Name Only

Java is an example of one well-known language with a nominal type system:

interface MyInterface {
  public String getMessage();
}

class MyClass {
  public String getMessage() {
    return "hello";
  }
}

If I try to pass an instance of MyClass to a method expecting an argument of type MyInterface, the Java compiler will reject that code. In order for it to compile, I would need to explicitly add implements MyInterface to the declaration of MyClass, at which point the compiler would happily allow me to use a MyClass instance as a MyInterface value.

This is the crux of nominal typing: the identity of a type is bound up in its name, and the developer has to declare their intent to conform their code to a particular type for the compiler to consider it to be of that type.

Don’t Judge a Book by Its Cover

TypeScript, on the other hand, is a salient example of a language with a structural type system:

interface MyInterface {
  getMessage(): string;
}

class MyClass {
  getMessage(): string {
    return 'hello';
  }
}

If I try to pass an instance of MyClass to a method expecting an argument of type MyInterface, the TypeScript compiler will check the members of the MyInterface type, see that the MyClass type also has those members, and happily accept the code. It doesn't matter if MyClass also has a hundred other methods and fields unrelated to MyInterface; as long as it meets the minimum requirements of the type, it's acceptable.

I can even use an anonymous value like { getMessage: () => 'hi' } anywhere that code is expecting a value of type MyInterface, and the compiler will give it a thumbs up.
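Here's a minimal sketch of that in action (the greet function is my own illustration):

function greet(source: MyInterface): void {
  console.log(source.getMessage());
}

greet(new MyClass());              // ✅ same shape, no declaration needed
greet({ getMessage: () => 'hi' }); // ✅ anonymous value, same shape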

This is the crux of structural typing: it doesn't matter what your types are called or whether they were implemented with a particular protocol in mind. As long as one type fits the minimal shape of another, it can be used as that type.

Chocolate and Peanut Butter

Neither nominal nor structural typing is obviously better than the other. They each have pros and cons and are well suited to particular contexts. And they're not mutually exclusive, either! Scala, for instance, largely encourages developers to design for nominal typing[2], but also has affordances for declaring methods that accept any value that has certain methods.

TypeScript, as we said, has a structural system. This means that even if we declare separate names for a type, they'll still be the same in the end:

type Person = { name: string; age: number };
type Dog = { name: string; age: number };

let clifford: Dog = { name: 'Clifford', age: 56 };
let dan: Person = clifford; // ✅

Despite the Dog and Person types having different names, the code above typechecks when I assign a value of type Dog to a variable of type Person because the two types are structurally equivalent.

However, TypeScript also has literal types: 1 and 'foo', for example, are very specific types that are subtypes of number and string respectively. This gives developers the tools they need to brand types, which refers to a family of related patterns for sneaking nominality into the structural system.
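As a quick illustration of literal types (my own example, not from Gumball):

const one: 1 = 1;      // ✅ the literal type 1 admits only the value 1
const n: number = one; // ✅ 1 is a subtype of number
const two: 1 = 2;      // ❌ Type '2' is not assignable to type '1'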

One very straightforward way to do this is to have your types include a field describing what they are:

type Person = { tag: 'person'; name: string; age: number };
type Dog = { tag: 'dog'; name: string; age: number };

let clifford: Dog = { tag: 'dog', name: 'Clifford', age: 56 };
let dan: Person = clifford; // ❌

This code now gives a type error when I try to assign a Dog to a Person, because they're no longer compatible types. They both have a tag field, but the type of that field is different on each: the type "dog" isn't assignable to the type "person".

A Numbers Game

As I mentioned earlier, when you're dealing with things at a low enough level, code and data both just look like bytes. And in fact, the two are tightly interleaved in the stream of bits the Game Boy CPU interprets: some instructions stand on their own, but others take 8- or 16-bit values as operands, and that data is located immediately following the byte for the instruction itself.

Consider a CPU instruction that expects to consume the byte immediately following it as some kind of input. Suppose in implementing that instruction, though, that I accidentally write consumeWordAtPC() instead of consumeByteAtPC()—there are hundreds of these instructions, after all, and typos happen. This mistake causes the CPU to slurp up not only the expected byte of data, but also the byte that was supposed to be the next instruction executed!

Since everything is just a number, that word gets passed around as-is for a while, possibly getting truncated down to 8 bits, and is eventually either stored or acted on. Either way, what was the next instruction is going to be skipped, and hundreds more may execute before it's apparent that something is off in the emulator state. Debugging this scenario is a nightmare, because it's impossible to pinpoint exactly what went wrong when.

If only there were a way to tell the compiler "hey, this is a number, but there's also something special about it I can't express to you." If only there were systems where every type acted that way that we could somehow draw inspiration from...

declare const Unique: unique symbol;
type Unique<T, Tag> = T & { [Unique]: Tag };

export type Byte = Unique<number, 'byte'> | 0 | 1 | 0xff;
export type Word = Unique<number, 'word'> | 0 | 1 | 0xffff;

The two types Byte and Word above are each numbers, but with a little something extra. We declare that they also have a secret additional key[3] describing what size of value they're certified to belong to. We also specify certain literal types that pre-qualify for convenience (0, 1 and the max value for the range), but otherwise it's impossible to assign a plain number to one of these types.

Instead, we use helper functions that accept arbitrary numbers as input, truncate them to the appropriate size, and then cast the result to one of our special types.

export function byte(value: number): Byte {
  return (value & 0xff) as Byte;
}

export function word(value: number): Word {
  return (value & 0xffff) as Word;
}
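Here's roughly how those helpers behave in use; the add function below is a hypothetical example, not Gumball's actual API:

const a = byte(0xff);  // 255, certified as a Byte
const b = byte(a + 1); // 0: arithmetic widens back to number, so we re-wrap

// const c: Byte = a + 1; // ❌ a plain number isn't assignable to Byte

function add(left: Byte, right: Byte): Byte {
  return byte(left + right); // truncation happens exactly once, right here
}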

In Gumball this system keeps me from having to waste time at runtime truncating values to the correct length at every function boundary, since the type signature tells me whether or not I need to. I only ever have to think locally when I'm performing operations that might overflow out of range, and the compiler happily stops me if I try to return an unchecked number when I've promised a Byte.

This solution also statically prevents the operand-decoding bug I described above. If consumeWordAtPC returns a Word but the function I'm passing the value to expects a Byte, that's a type error! Switching over to this system caught two places where I had made mistakes like that, and several others where I had forgotten to truncate results after performing bitwise operations on values. I can't even imagine how hard and frustrating it would have been to track down those bugs at runtime.
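In sketch form, here's the scenario that's now a compile-time error (these signatures paraphrase the idea; they're not Gumball's exact code):

declare function consumeByteAtPC(): Byte;
declare function consumeWordAtPC(): Word;

declare function loadImmediate8(operand: Byte): void;

loadImmediate8(consumeByteAtPC()); // ✅
loadImmediate8(consumeWordAtPC()); // ❌ Word isn't assignable to Byte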

Nominally Mainstream

These patterns for adding dashes of nominality to TypeScript code are popular enough that the TS team is looking at possible ways of streamlining them in the language. @weswigham has two open proposals up right now: one that introduces unique types and one that introduces tag types.

The unique type approach is the "truly nominal" approach, and it almost precisely fits the way I described nominality here: it adds a keyword unique that you put in front of a type to declare to the compiler that the type in question is somehow special. With this proposal, the numeric types from earlier would look something like:

type Byte = unique number | 0 | 1 | 0xff;
type Word = unique number | 0 | 1 | 0xffff;

The tag approach is less of a departure from TypeScript's structural roots. Instead of adding "real" nominal types, it instead formalizes the tagging pattern we used above. It adds a tag keyword that you put in front of a type to tell the compiler that that extra information is only for type differentiation purposes, and should never be considered when doing things like populating autosuggest.

type Byte = (number & tag { byte: void }) | 0 | 1 | 0xff;
type Word = (number & tag { word: void }) | 0 | 1 | 0xffff;

It also suggests adding a type Tag to the standard library that would allow writing e.g. Tag<'byte'> as shorthand for tag { byte: void }.

I think my personal leaning at this point is toward the unique proposal, but I unknowingly used "unique" and "tag" terminology in my own implementation, so clearly the language of both proposals resonates with me.

I'd be thrilled to see either one land in the language—it's always nice to have an ad-hoc pattern you're using be formalized. In the meantime, though, what I've got meets my needs, so I'm happy to wait and see how the two proposals evolve and move on to thinking about the next steps for Gumball: cutting-edge pixel graphics on a 4-tone 160x144 display 🤓


  1. There's also a proposal to introduce bigint as a second numeric primitive in JS, but even once that's finalized it won't change the story here when it comes to dealing with smaller numbers.
  2. Given that method invocation on structurally-typed values requires reflection at runtime, it makes sense that developers are steered toward nominality. It's interesting that structural types are nevertheless a first class language feature, though.
  3. Using a symbol rather than a regular string key like 'tag' as we did above prevents the key from showing up in places like autosuggest, since the only way to access it would be to first import the symbol value.