In my post about the mental model of data, I laid out the idea that data (as we commonly talk about it) is really composed of two sides: categories and instances. In that definition, I claimed that categories arise from consensually noticing patterns in the world, and encapsulating them with words.

In this post, I’m going to clarify what I mean by “category” a bit. Categories aren’t just words or labels; they are predicates of arbitrary complexity.

Take A Seat

Let’s examine a relatively simple category: chairs. (You know, the things you sit on.)

Chairs are so common and omnipresent that most of us gloss over the complexity of deciding what is and isn’t included in the category. If you sat someone down and had them explain what a chair is, the conversation might go something like this:

Me: “What’s a chair?”
You: “A chair is a piece of furniture.”
Me: “So are tables. Can you be more specific?”
You: “Sure; a chair is a piece of furniture … that you sit on.”
Me: “So are couches. Can you be more specific?”

If you persisted in playing my dumb game, we’d eventually limn the edges of what 99.99% of people would agree on as the boundaries of “chair-ness”. That might look something like the description on Wikipedia, which continues:

Its primary features are two pieces of a durable material, attached as back and seat to one another at a 90° or slightly greater angle, with usually the four corners of the horizontal seat attached in turn to four legs—or other parts of the seat's underside attached to three legs or to a shaft about which a four-arm turnstile on rollers can turn—strong enough to support the weight of a person who sits on the seat (usually wide and broad enough to hold the lower body from the buttocks almost to the knees) and leans against the vertical back (usually high and wide enough to support the back to the shoulder blades). The legs are typically high enough for the seated person's thighs and knees to form a 90° or lesser angle.

While it’s long-winded, this description does a pretty good job of separating the chairs from the non-chairs, in a way most people would agree with. It would include, for example, most of the entries on this list of the world’s most uncomfortable chair designs, while excluding things like couches, tree stumps, etc.

That still leaves plenty to argue about, of course. If it doesn’t have a back, is it still a chair? Does a kneeling chair merit its name, even though it doesn’t fit this description? My point isn’t to quibble about what is or isn’t a chair; it’s to step back and make a broader point, which is that the act of creating and refining these “boundary conditions” is the act of categorization.

Predicates

Pretend for a moment that the word “category” doesn’t exist. In its place, we could use the word “predicate”, as in, “an expression that evaluates to either true or false”. [1]

Equating a category with a predicate (or, really, a set of predicates) gives you instructions for where to draw the category’s boundaries: a description that clearly determines whether any given instance is part of the category or not.

So when I say, for example, that a chair is “strong enough to support the weight of a person who sits on the seat”, I’m ruling out chairs made of aluminum foil or tissue paper. I could relax my predicate and remove this, and that would mean more instances fall into my category (perhaps including gag chairs, or props on movie sets that wouldn’t actually hold up a person’s weight). Conversely, I could also tighten my predicate and say that a chair must have armrests as well, which would mean that fewer instances would fall into my category (bye bye, stools).
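Here is a minimal sketch of that relaxing and tightening in Python. Everything in it is an assumption made up for illustration: the `Thing` attributes, the three candidate predicates, and the tiny list of things. The baseline predicate is also far cruder than the Wikipedia description. The point is only that adding a condition shrinks the category and removing one grows it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Thing:
    # Hypothetical attributes, chosen only to make the example work.
    name: str
    is_furniture: bool
    for_sitting: bool
    supports_person: bool
    has_armrests: bool


Predicate = Callable[[Thing], bool]


def all_of(*parts: Predicate) -> Predicate:
    """Build one predicate that holds only when every part holds."""
    return lambda thing: all(part(thing) for part in parts)


def is_furniture(t: Thing) -> bool:
    return t.is_furniture


def is_for_sitting(t: Thing) -> bool:
    return t.for_sitting


def supports_person(t: Thing) -> bool:
    return t.supports_person


def has_armrests(t: Thing) -> bool:
    return t.has_armrests


# Three candidate "chair" predicates (all simplified stand-ins).
chair = all_of(is_furniture, is_for_sitting, supports_person)
relaxed_chair = all_of(is_furniture, is_for_sitting)  # drops the strength test
strict_chair = all_of(chair, has_armrests)            # adds an armrest test

THINGS = [
    Thing("dining chair", True, True, True, False),
    Thing("armchair", True, True, True, True),
    Thing("stool", True, True, True, False),
    Thing("movie prop chair", True, True, False, False),
    Thing("table", True, False, True, False),
]


def members(predicate: Predicate) -> list[str]:
    """Names of the things that fall inside the category."""
    return [t.name for t in THINGS if predicate(t)]


print(members(chair))          # ['dining chair', 'armchair', 'stool']
print(members(relaxed_chair))  # the same three, plus 'movie prop chair'
print(members(strict_chair))   # ['armchair']
```

Nothing about the things changed between the three calls; only the predicate did.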

Agreement

So what’s the right predicate for the word “chair”? That’s the trick: there isn’t one! There are many contenders, and anyone is free to use words however they want to. You and I could happily use the word “chair” to apply to different sets of connotations, based on different lists of predicates. The word we use for a concept (like “chair”) is just a label, a name affixed to a complex network of ideas for brevity (because nobody has time to say “Could you grab another … piece of furniture used primarily for sitting that’s composed of two pieces of a durable material, attached as back and seat to one another at a 90° or slightly greater angle … ?”).

For most concepts, most of the time, a rough concordance is just fine. Nearly all the physical objects we interact with fall under the same category labels for most people, and life goes on. This happens naturally as part of our journey to adulthood; as a child, you repeatedly hear people using words to point reliably to some objects and not others, and you internalize your own list of predicates that mostly matches everyone else’s. (“Ah, that’s not a chair, it’s a couch!”)

There are always edge cases, of course, which is what makes communication between humans so interesting. If I said, “Where’s my wallet?” and you said “Next to the chair”, and you were referring to an exercise ball, I might look around in vain and eventually get annoyed at you (“That’s no chair, it’s an exercise ball!”).
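The wallet mix-up can be sketched as two people carrying slightly different predicates under the same label. The attributes and both predicates below are invented for illustration; the interesting part is the helper that lists where the two predicates disagree.

```python
# Toy instances described by made-up attributes.
things = {
    "dining chair": {"for_sitting": True, "has_legs": True},
    "exercise ball": {"for_sitting": True, "has_legs": False},
    "bookshelf": {"for_sitting": False, "has_legs": False},
}


def my_chair(attrs):
    """My (hypothetical) predicate: you sit on it, and it has legs."""
    return attrs["for_sitting"] and attrs["has_legs"]


def your_chair(attrs):
    """Your (hypothetical) predicate: anything you sit on counts."""
    return attrs["for_sitting"]


def disagreements(p, q, universe):
    """Names of the instances that one predicate accepts and the other rejects."""
    return [name for name, attrs in universe.items() if p(attrs) != q(attrs)]


print(disagreements(my_chair, your_chair, things))  # ['exercise ball']
```

We agree on the dining chair and the bookshelf, so most conversations go fine. The exercise ball is the edge case where the shared label hides two different predicates.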

In categories, the closest you can get to “right” is popular: that is, shared in common across a large population of human brains and/or software systems.

One Instance, Many Categories

When you define categories as predicates, it also becomes obvious that any one thing (i.e. instance) in our world doesn’t belong solely to a single category, because there are likely to be other predicates that include it.

For example, if we agree on a definition of “object” that’s something like “any clump of atoms that stays together with high statistical reliability when forces in a certain range act on it (like, say, pushing it with my hand)”, then that means that all (or at least most of) the chairs in the world fall into this category as well, as “objects”.

We could also say that a “product” is defined as “any object that’s mass-produced according to a schematic and intended for purchase by a person or organization”. Accordingly, nearly all of the chairs in the world are also products.
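As a rough sketch, you can treat each label as a key that points to a predicate, then ask which predicates a given instance satisfies. The attribute names and the three simplified predicates here are assumptions for the example, loosely following the “object” and “product” definitions above.

```python
# Each label maps to a predicate over a dict of (hypothetical) attributes.
CATEGORIES = {
    "object": lambda t: t["holds_together"],
    "product": lambda t: t["holds_together"] and t["mass_produced"] and t["for_sale"],
    "chair": lambda t: t["holds_together"] and t["for_sitting"] and t["has_back"],
}


def categories_of(thing):
    """Every label whose predicate evaluates to True for this instance."""
    return [label for label, predicate in CATEGORIES.items() if predicate(thing)]


factory_chair = {
    "holds_together": True, "mass_produced": True, "for_sale": True,
    "for_sitting": True, "has_back": True,
}
hand_carved_chair = {
    "holds_together": True, "mass_produced": False, "for_sale": False,
    "for_sitting": True, "has_back": True,
}
puff_of_smoke = {
    "holds_together": False, "mass_produced": False, "for_sale": False,
    "for_sitting": False, "has_back": False,
}

print(categories_of(factory_chair))      # ['object', 'product', 'chair']
print(categories_of(hand_carved_chair))  # ['object', 'chair']
print(categories_of(puff_of_smoke))      # []
```

The instance never “has” a category stored on it. Membership falls out of running the predicates, so adding a new predicate to the dictionary can put old instances into a new category without touching them.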

Categories are also generative; if tomorrow, we all agreed that “chmair” is a word for 5-legged chairs, then some set of chairs would also now be chmairs. This sounds dumb, but remember that categories can be arbitrarily complex. In one way of looking at it, the entirety of Moby Dick sketches out a category called “Captain Ahab” (to which no actual human being belongs, but that’s a separate point).

What’s a “proper name” (like Taylor Swift)? It’s merely a label for a category that has a predicate that evaluates to one single instance!
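One way to picture this is to look at how many instances a predicate picks out. The sketch below sticks with chairs rather than people: the chair records, their `id` values, and the three predicates are all invented for the example. A “chmair” predicate matches several instances, a proper-name-style predicate matches exactly one, and an Ahab-style predicate matches none.

```python
def extension(predicate, universe):
    """All instances in the universe that the predicate is true of."""
    return [x for x in universe if predicate(x)]


# A tiny made-up universe of chairs.
CHAIRS = [
    {"id": "c1", "legs": 4},
    {"id": "c2", "legs": 5},
    {"id": "c3", "legs": 3},
    {"id": "c4", "legs": 5},
]


def is_chmair(chair):
    """The freshly coined category: 5-legged chairs."""
    return chair["legs"] == 5


def is_that_one_chair(chair):
    """Works like a proper name: true of a single instance."""
    return chair["id"] == "c3"


def is_hundred_legged(chair):
    """A perfectly good predicate that nothing in this universe satisfies."""
    return chair["legs"] == 100


print(len(extension(is_chmair, CHAIRS)))          # 2
print(len(extension(is_that_one_chair, CHAIRS)))  # 1
print(len(extension(is_hundred_legged, CHAIRS)))  # 0
```

All three are the same kind of thing (a function from an instance to true or false); they differ only in how many instances happen to satisfy them.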

Bee tee dubs, this view implies that something I said in my original article about the mental model of data isn’t technically correct. I said:

There are lots more instances than there are concepts, obviously; “hydrogen atom” is a single concept, but there are vast numbers of actual hydrogen atoms.

But actually, the reverse is true if you think about categories this way. Every hydrogen atom in the universe could potentially be the subject of a category of its own, if anyone actually made that category explicit. Beyond that, any grouping of atoms can be a category too: I could say “all the hydrogen atoms to the left of this point are one category, and all the hydrogen atoms to the right of this point are another category”. So, paradoxically, the infinity of potential categories is much more vast than the set of instances.
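A small counting sketch makes the size difference concrete. It assumes we count a category by the set of instances it picks out, so every distinct subset of the instances is one potential category; the atom names are placeholders. With n instances there are 2^n subsets, which outruns n very quickly.

```python
from itertools import combinations


def potential_categories(instances):
    """Every subset of the instances; some predicate could pick out each one."""
    items = list(instances)
    return [
        set(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def membership_predicate(subset):
    """Turn a subset back into a predicate that is true exactly on its members."""
    return lambda instance: instance in subset


atoms = ["h1", "h2", "h3"]  # placeholder names for three hydrogen atoms
categories = potential_categories(atoms)

print(len(atoms), len(categories))              # 3 8
print(len(potential_categories(range(10))))     # 1024

# Any subset can be used as a category via its membership predicate.
left_of_the_point = membership_predicate({"h1", "h2"})
print(left_of_the_point("h1"), left_of_the_point("h3"))  # True False
```

This counts only categories that differ in which instances they include. Counting distinct ways of wording a predicate would make the gap even larger.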

Conclusion

I’m making the claim that every category is secretly a predicate, and that labels tend to obscure this fact.

We don't all share the exact same list of predicates for all our category labels, and this is (partly) why communication is a challenge. It’s also why building software systems is a difficult business (for example, getting everyone in a company to agree on a list of predicates for what constitutes an “employee”).


[1] - I'm using the sense of the word "predicate" that's common in the world of logic and computer science, based on a Latin root, praedicare, meaning "to proclaim, assert". This is related to, but not identical with, the sense from grammar, meaning "a description of or action related to a subject".