Defining The Indefinable

“It all comes, I believe, from trying to define the indefinable.” - Charles Darwin

If you see a pair of birds, and you declare that one is a Golden Naped Finch, and the other is a Golden Winged Grosbeak Finch, then you’ve made a declaration about the essence of the two birds. Each goes in its own conceptual box, labeled with a name; they are different kinds of things.

This humble act–categorization–is something we do thousands of times a day. And mostly, it's entirely uncontroversial; nobody mistakes dogs for cats (with the possible exception of my 3-year-old, who sometimes has a very capricious take on dog / cat categorization).

It turns out that when you dig a bit deeper, the categorization of things is more subtle than it appears. There are always different ways to define categories, and the choice you make has everything to do with what you're trying to accomplish.

A Taxonomist's Eye

When I call something a Golden Naped Finch (or “Pyrrhoplectes epauletta” if you prefer the Latin), I’m making a claim about its classification under a well-known formal taxonomy: the common international system of Linnaean biological taxonomy. This system is well-established, and the modern version is largely subscribed to by millions of experts around the world. So, if I make a claim that disagrees with that consensus–say, I call this same bird a Spectacled Finch–then I am simply wrong, by the rules of that system.

Categorizations like this are what you might call formal; that is, they’re tightly bounded by explicit agreements, and intended to be prescriptive. Unlike biology, though, most domains don’t have a formal system; there's no consortium of experts to help us agree on how to classify all the genres of heavy metal (though if there were, it might resolve one of my favorite lame edit wars on Wikipedia: whether Limp Bizkit is “nu metal/rapcore” or “rapcore/nu metal”). For most subjects under the sun, the association of category labels with concepts is a rough, fuzzy, collaborative, metaphor-driven process.

You can group things in many different ways, of course. I could categorize birds entirely by the region they live in, or their weight, or their coloration, or any number of other attributes. But those seem ancillary; the categories we're talking about here are more essential; they're answering the question, "What kind of thing is this, really?"

This seems, at first, like a very simple and sensible question, because every thing in the world is, deep down, a single kind of thing. Right? I mean, sure, categories can be members of more general categories–a Finch is a bird, which is an animal, which is a life form. But, at least in my own subjective experience, most of us have an expectation that each object of our awareness is really "of some determined type". And further, we harbor a reasonable hope that when we box things up–that is, when we connect names and concepts–that our categories will conform directly to reality in some way. If different people attempt to categorize the same things, we'd still hold out hope of coming up with a single categorization, because ... well, that's what these things really are. Right?

Unfortunately, this hope is not as reasonable as it seems.

Dissension In The Ranks

Let me pause here to mention that I am not a trained biologist. I’m a software engineer, and mainly writing as someone who wants to understand the way concepts are represented in formal systems like computers (more on that here). So take anything I say about biology with a grain of salt, since it’s really just based on the reading of a layperson[1]. (And I’m always willing to be corrected. :)

As a formal system, biological taxonomy gives us a nice example to look at through a critical lens. Does our traditional system of species names really “conform to reality” according to our best understanding of science today? Does it make internal sense, providing good explanations for why two animals should be in the same box, rather than different boxes? Can it explain why I can’t get a hybrid between two damn nearly identical-looking finches, but I could be the proud owner of a Sheppug?

If you drop down the Wikipedian rabbit hole of the history of biological taxonomy, you’ll get a quick tour of all the ways in which we’ve tried to approach this effort of critter-boxing over the centuries. Early versions, especially those prior to Darwin’s theory of evolution, were all based on easily detected surface features, like how living things look, or what medicinal use they could be put to, or whether they can make babies together (in exactly the way that horses and seals can’t).

In modern times, however, we have achieved a broad consensus on a couple of important ideas that change how we categorize living things. First, Darwin's notion of evolution through natural selection; and second, the grounding of evolution in the specific biochemical mechanism of genetics. The essential pattern of life is DNA, and so the "tree of life" is really a "tree of DNA patterns"–patterns that all evolved from common descent[2].

So, our two finch species have DNA patterns that are different; they're relatively close to each other in the grand scheme of things (say, 99.99% identical, for the sake of argument), but not the same. All the individual Golden Napers have much more similar DNA to each other (with, say, 0.0001% difference between individuals of the species) than they do to any individual Golden Winger. From a mathematical perspective, the two species are clear “lumps” of DNA patterns. And both species, being types of finches, have much more similar DNA patterns to each other than to chickens, or lizards, or broccoli.
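For the programmers in the audience, here's a rough sketch of what I mean by "lumps". It's a toy, not real bioinformatics: the 20-letter sequences, the bird names, and the 10% threshold are all made up for illustration, and the grouping rule (two individuals share a lump if a chain of close-enough pairs connects them) is just one simple choice among many.

```python
from itertools import combinations


def fraction_different(a: str, b: str) -> float:
    """Fraction of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def lumps(sequences: dict[str, str], threshold: float) -> list[set[str]]:
    """Group individuals whose sequences differ by less than `threshold`.

    Uses a small union-find, so membership is transitive: if A is close
    to B and B is close to C, all three land in the same lump.
    """
    parent = {name: name for name in sequences}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(sequences, 2):
        if fraction_different(sequences[a], sequences[b]) < threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in sequences:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical toy "genomes"; real ones are vastly longer.
birds = {
    "naper_1": "ACGTACGTACGTACGTACGT",
    "naper_2": "ACGTACGTACGTACGTACGA",
    "winger_1": "ACGTTTGTACCCACGTAGGT",
    "winger_2": "ACGTTTGTACCCACGTAGGA",
}

for group in lumps(birds, threshold=0.1):
    print(sorted(group))
```

With a tight threshold the four birds fall into two lumps; loosen it enough (try 0.5) and they merge into one. Where you draw the line is a choice, which is rather the point.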

Of course, the conceptual question isn’t just about the “leaf nodes” of the tree of life–the species (by whatever criteria you define them). It’s also about the interior nodes, the groupings of similar species (and groupings of those, and so forth). The traditional Linnaean system of taxonomy, in the form it's usually taught today, asserts a fixed number of “ranks”–the increasingly general classifications you probably learned in grade school, going up from species to genus, family, order, class, phylum, kingdom, and domain. (Did you learn "Dear King Philip Came Over For Good Soup" or some other such mnemonic in school?)

It's a very tidy system ... but nature knows no such rules. Members of a population all differ at a genetic level, and this variation provides grist for the mill of gradual evolution by natural selection. But when some condition, like geographical separation, allows portions of a population to evolve independently for long enough, then procreation is no longer possible between the two groups, either because the genome has changed too much, or because other factors (like behavior or physiology) prevent it. This is called "speciation" or "cladogenesis", and it's not beholden to any rules about there being exactly eight discrete ranks (order, class, etc); you can get trees of arbitrary depth in branching.

The field of phylogenetics seeks to account for this, by boxing up living beings based on common ancestors for species, in a great big hierarchy that could have any number of branches or "clades". The emerging naming standard in this field, the PhyloCode, takes this newer, more sophisticated approach and removes the limits of a pre-existing cadre of ranks. (It’s interesting, if not surprising, that the more simpleminded old system of fixed ranks is still what you learn in school; one has to start somewhere, I suppose.)
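To make the contrast with fixed ranks concrete, here's a minimal sketch of a clade as a data structure. A rank-based system is effectively a record with a fixed set of fields; a clade tree is a recursive structure that can nest as deeply as the branching history requires. The class name `Clade`, its methods, and the tree contents are my own placeholders for illustration, not a real phylogeny and not anything defined by the PhyloCode.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clade:
    """A named branch of the tree; it may contain any number of sub-clades."""

    name: str
    children: list[Clade] = field(default_factory=list)

    def depth(self) -> int:
        """Number of levels from this clade down to its deepest leaf."""
        return 1 + max((child.depth() for child in self.children), default=0)

    def lineage(self, target: str) -> list[str] | None:
        """Path of clade names from here down to `target`, or None."""
        if self.name == target:
            return [self.name]
        for child in self.children:
            path = child.lineage(target)
            if path is not None:
                return [self.name] + path
        return None


# Placeholder tree: the shape is invented, only the idea matters.
tree = Clade("birds", [
    Clade("chickens"),
    Clade("finches", [
        Clade("unnamed_clade", [
            Clade("golden_naped_finch"),
            Clade("golden_winged_grosbeak_finch"),
        ]),
    ]),
])

print(tree.depth())                        # 4
print(tree.lineage("golden_naped_finch"))
# ['birds', 'finches', 'unnamed_clade', 'golden_naped_finch']
print(tree.lineage("chickens"))            # ['birds', 'chickens']
```

Notice that the two lineages have different lengths. Nothing forces every species to sit at the same depth, which is exactly what a fixed set of ranks can't express.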

This is a better and more nuanced approach to groupings. That said, it still leaves a lot of room for quibbling about the best methods of discovering the right boxes for species. On the one hand, followers of cladistics (from Greek κλάδος, for "branch") will assert that this is mainly a mathematical operation; if you take the DNA of several species and work backwards, you can impute that there must have been a common point in "DNA-space" that preceded all of them before they diverged, even if no evidence for such a creature exists (today or in the fossil record).

On the other hand, the paleo-biologists (in the tradition of Stephen Jay Gould) have focussed on the anatomical features of the fossil record, and say it's not a legal move to just impute creatures that you haven't actually found. There’s much debate about which approach is more likely to give us the correct picture of what actually happened over millions of years, a debate that rages in part because hard evidence–i.e. fossils that have survived for us to look at–is hard to come by.

Personally, my (completely uneducated, and data-friendly) money is on cladistics. Direct analysis of genes gives us much more conclusive evidence than does morphology (looking at the shapes and functions of body parts) because it's much more information-dense; a genome can run to billions of base pairs. When you get a pattern match at that level of detail, the likelihood of it being by chance is low–much lower than the likelihood of, say, two creatures developing things that look and work like eyes independently.
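Here's the back-of-the-envelope arithmetic behind that intuition, under a deliberately crude assumption: each position is one of four bases, chosen uniformly and independently. Real genomes are nothing like that random, so treat this as an illustration of how quickly the odds collapse, not as an actual estimate. The function name is mine.

```python
import math


def log10_chance_of_exact_match(n_bases: int, alphabet_size: int = 4) -> float:
    """log10 of the probability that two random sequences match exactly.

    Assumes every position is drawn uniformly and independently, so the
    probability is (1 / alphabet_size) ** n_bases. We return the log
    because the raw number underflows for any realistic length.
    """
    return -n_bases * math.log10(alphabet_size)


for n in (10, 100, 1000):
    print(n, round(log10_chance_of_exact_match(n), 1))
# 10 -6.0
# 100 -60.2
# 1000 -602.1
```

Even a 100-base stretch matching by pure chance comes out at roughly one in 10^60 under this toy model.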

This is a vast oversimplification (of a domain in which I’m not an expert, remember). But the point I’m trying to make is that you can have different (and potentially incompatible) views of the same reality, and those views might show up as disagreement over the “same” concept. Sometimes one view is normatively better than another, and sometimes they’re just different. It's a failing (and / or feature?) of human language that we end up using the same words for such different things.

Beware Species Nihilism

When you zoom out over evolutionary time, what you see is a much less static picture than even this more nuanced taxonomy supposes. If you lived for, say, a billion years, then you'd have very little use for such a system; in all its vast complexity, it’s still just a snapshot in time, a few minutes of Sagan’s cosmic calendar. It’d be like coming up with individual names for each second’s worth of stock prices on the New York Stock Exchange; possible, but pointless. It’s only our perspective in time, with regard to this relatively slowly changing picture, that makes it worth the effort to even name the little boxes.

The gradual and continuous nature of evolution and cladogenesis isn't a matter of universal agreement. Darwin was originally surprised (and challenged) by the fact that the fossil record doesn't show a whole lot of slow, gradual evolution between species; new ones tend to pop into view, all at once. To account for these discontinuities, debates have raged (and continue to rage) over ideas like punctuated equilibrium. But regardless of your stance on how gradual it is, there's broad agreement that new species do in fact undergo gradual evolution from similar forms, into forms that can no longer cross-breed. At any given moment, there are some species that have been genetically stable for a long time, and others that might be in the midst of a transformation even as we speak (ring species, anyone?).

If the lesson you take away is that species is a worthless concept–or worse, that it doesn’t exist at all–then you’re in dangerous territory. Species is clearly a strong pattern in reality, regardless of which perspective you look at it from: interbreeding, origination, genetics, appearance, behavior, etc. In our current real world, there are many thousands of birds that look and act exactly like Golden Naped Finches, and many thousands of Golden Winged Grosbeak Finches too. But there are exactly zero Golden-Naped-And-Winged-Grosbeak Finches (at least, that anyone has ever spotted). Nature isn’t a smooth continuity of every arrangement of features; it’s lumpy.

Jerry Coyne recently made this point (in his usual forceful style) about philosopher Henry Taylor’s oversimplified conclusion in the article What Is A Species?. Claiming that speciation is “a complete mystery” is a ridiculous position; just because a concept has multiple variations or interpretations, that doesn’t make it impenetrable to human reason (as Bill O’Reilly might put it: “birds go in, Golden Naped Finches come out; you can’t explain that!”). Species is absolutely a real, meaningful, and rigorous concept, even if the set of possible interpretations is broader and more nuanced than most people realize. The fact that it's fluid over long time scales doesn't take away from the fact that stable species do exist during that process, and follow rules.

To be a little more charitable, I do think Taylor is attempting to point to a broader philosophical question that’s worth considering. If we take any categorization to be an ontological primitive, an immutable given of nature like Planck’s constant, then we’re probably over-essentializing. As a physicist like Sean Carroll might say, "reality" is just the quantum wave function, and everything else is just fuzzy patterns on top. Species may be a concept that's fuzzier than most people think, and for sure we haven't said the final word on the optimal way to measure and study it. But that's not a license to declare the death of the concept of species.

Don’t Essentialize Categories

What this all points to, in my mind, is that any category you want to come up with is ultimately use-dependent. The simple interpretation of “species” that most of us carry around–“concrete, immutable boxes arranged in a taxonomy”–works well for some cases but not others.

For example: a book on birdwatching has exactly one use for the word “species”: a well-defined categorization of which entities deserve their own headings in the book’s table of contents. If you were making a database for bird watchers, species might be the table and each row would be one species (see also, the Mental Model of Data).
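As a sketch of what that birdwatcher's database might look like, here's a tiny in-memory SQLite version. The table and column names, the `record_sighting` helper, and the dates are all hypothetical. The thing to notice is that every sighting must point at exactly one species row; anything that doesn't fit an existing box is simply rejected.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a toy birdwatching database with one row per species."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.execute(
        "CREATE TABLE species (common_name TEXT PRIMARY KEY, latin_name TEXT)"
    )
    conn.execute(
        "CREATE TABLE sighting ("
        " id INTEGER PRIMARY KEY,"
        " species TEXT NOT NULL REFERENCES species(common_name),"
        " seen_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO species VALUES (?, ?)",
        [
            ("Golden Naped Finch", "Pyrrhoplectes epauletta"),
            # Latin name left empty: this post doesn't give one.
            ("Golden Winged Grosbeak Finch", None),
        ],
    )
    return conn


def record_sighting(conn: sqlite3.Connection, species: str, seen_on: str) -> bool:
    """Try to log a sighting; return False if the species has no row."""
    try:
        conn.execute(
            "INSERT INTO sighting (species, seen_on) VALUES (?, ?)",
            (species, seen_on),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_db()
print(record_sighting(conn, "Golden Naped Finch", "2020-01-01"))  # True
print(record_sighting(conn, "Golden-Naped-And-Winged-Grosbeak Finch", "2020-01-02"))  # False
```

For a field guide, that rigidity is a feature: the schema encodes the assumption that the boxes are fixed and every bird belongs in one of them.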

Conversely, a book on biological history and cladogenesis, like Coyne’s massive Speciation[3], goes to great lengths to introduce us to a vastly more nuanced use of the word, and attempts to ground the stability of species in good explanations. If you’re in this realm, then attempting to use a database where each row is one discrete species could be extremely limiting; you need something more like a semantic graph of DNA patterns as your "species" representation.

The underlying point here is this: it's not a bad impulse to attempt rigor in how we categorize things, but reality is usually trickier than we expect. Ultimately, there are patterns with similarities, and sometimes it's useful to pin those patterns to a name, for a specific purpose. Be aware, be explicit, and be willing to coexist with other interpretations, even ones that seem incongruent.

In a future post, I hope to talk about how this same idea works its way into other problems of category.

[1] Mayr, Ernst. What Evolution Is. New York: Basic Books, 2001. (Amazon). This is a terrific book for anyone looking for a basic grounding in modern evolutionary theory.

[2] Modulo those tricksy bacteria that can actually just swap genes rather than following standard evolutionary paths, which is yet another fascinating wrinkle in our attempts to categorize things that I won't go into any further detail on here.

[3] Coyne, Jerry A., and H. Allen Orr. Speciation. Sunderland, Mass: Sinauer Associates, 2004. I haven't actually, you know, read this book, owing to the fact that it's a couple hundred bucks and I'm not a biologist. But by all accounts, it's great.
