A Whole New World
Well hello! It’s been, uh … (checks watch) … four years since my last post! That’s not very timely! In my defense, I was … busy doing stuff.
Some very heavy shit has gone down since 2021, though, hasn't it? This space I write about here—the intersection of concepts and computers—feels like it’s been reinvented overnight. Generative pre-trained transformers, large language models, vector embeddings, agents? It’s a lot! Personally, my job is being transformed fundamentally, and if yours hasn’t yet, it probably will soon. (For a foray into what I do, and how I think about using AI tools as part of my job, might I recommend this recent podcast I was interviewed on?)
So I’m firing up the old blogging machinery again. In part that’s because I still have lots to say on topics like categories, attributes, and the mental model of data. In fact, I have a LOT left to say, including dozens of half-finished drafts.
But now, suddenly, I also feel like there are a whole bunch of new doors to open, and new ways to answer and frame these same questions. So that’s what I’m going to do!
To start that off, here’s a little catch-me-up on why this class of AI models deserves (IMO) serious consideration in a forum like this. You probably don’t need convincing of that if you’re reading this at all, but here it is anyway, in layperson terms. :)
The basic idea of a pre-trained language model is not all that complicated to understand. You start with a vast amount of text written by people, and then you train a very large neural network on the task of guessing the right next word at every point in that text.
Intuitively, you might imagine that this amounts to just memorizing the whole thing, right? But it can’t just do that, because the input is many, many orders of magnitude larger than the resulting model. So it has to compress: it has to be able to guess those words without memorizing them exactly[1]. It has to abstract some “essence” out of that text, at scale.
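If it helps to see the shape of that in code, here’s a toy sketch of the “guess the next word” objective. Everything about it is simplified and made up by me for illustration (a tiny stand-in model, random token ids instead of real text), and unlike a real transformer it only looks at one word at a time, but the core move is the same: score the model on how well it predicts each next token, then nudge its weights to do better.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64   # made-up sizes; real models are vastly larger

# A deliberately tiny stand-in for the "very large neural network".
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)

# Pretend this is a sentence, as a sequence of token ids.
tokens = torch.randint(0, vocab_size, (1, 32))
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict token t+1 from token t

logits = model(inputs)                            # a score for every word in the vocabulary
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()   # nudge the weights so the right next word becomes more likely
```

Repeat that over (roughly) everything humanity has ever written, and the compression described above is what falls out.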
To do this well, it not only needs to get very good at language, but it also needs to form representations of lots of concepts in the world itself (because people talk about real stuff, at least sometimes). Consider, for example, these two sentences:
“the glass ball hit the metal table and it broke”
“the metal ball hit the glass table and it broke”
To correctly identify the referent of the word “it” in each sentence, you need more than just grammar; you need to know the properties of glass and metal, and to reason about how they would interact in this situation in order to get the right answer (which is, obviously, that the metal always breaks the glass, rather than the other way around).
This kind of world-model building is deeply resistant to top-down logical approaches (“hey, let’s make a big table of every material in the world and whether it would break every other material!”). But it arises naturally, in the course of simply trying to guess the next word for every word ever spoken by humanity (give or take), when that prediction engine is backed by an efficient architecture for guessing and refining, and run at inconceivable scale.
The trick of LLMs was a kind of figure-ground reversal: instead of having the model fill in the next word of existing text, we start it off with a few words of our own (a “prompt”) and let it loose predicting a whole string of words that follow, one after the other. And that (plus some other tricks) gives you ChatGPT.
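To make that loop concrete, here’s a rough sketch of it using GPT-2 through the Hugging Face transformers library. The model and library are just my choices for illustration (this is not how ChatGPT itself works under the hood, and real systems sample from the predictions rather than always grabbing the single most likely word), but the shape of the loop is the point: predict the next token, append it, and go again.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The metal ball hit the glass table and"
ids = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(12):
    with torch.no_grad():
        logits = model(ids).logits       # a score for every word, at every position
    next_id = logits[0, -1].argmax()     # greedily take the most likely next word
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(ids[0]))
```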
When I first saw this in action back in late 2022, I had the same mental model as most other folks: yes, this is cool, but it’s … limited. It’s not actually doing anything like what we would call “thinking” or “reasoning,” and it doesn’t have any direct knowledge of the world, so these models get lots of stuff hilariously wrong, they hallucinate, and it’ll be years before they’re really useful for anything other than a party trick.
But that attitude didn’t last long. It was quickly clear that what had actually happened was much more important. These models took a sizable slice of the collective intellectual output of humanity, and turned it into a tool. The AI didn’t create this knowledge, of course; human civilization did (as Andrej Karpathy succinctly put it, the model is a big zip file of the internet).
But to some approximation, the AI converted that knowledge into a predictive engine, one that can perform even fairly sophisticated intellectual work: coding, mathematics, and complex semantic tasks like summarization and evaluation.
This was already kinda true with early models like GPT-3, but the past three years have made it clear that scaling increases not just quantity but quality: new emergent capabilities keep showing up, just by virtue of making these models bigger.
What will this mean for us humans, then? I like the perspective that Tomas Pueyo gives here, which is that at a minimum, these tools are going to dramatically multiply the productivity of individual humans who do intellectual work, in the same way that tractors changed farm work. I’m seeing that in my own work! And changing the efficiency of knowledge work is going to change the landscape of work itself, much as industrialization changed farming and manufacturing. And it’s happening faaaaast.
Now, to be clear, not everyone welcomes these new-found AI superpowers. Some people (like Gary Marcus and Melanie Mitchell) are simply skeptical of the claims, and suggest that this is all just party tricks, not “real intelligence” (whatever that is). Others are concerned about the ethical implications (bias, disinformation), the legal implications (copyright), the economic implications (jobs), or the humanist implications (how will we make sense of anything?). Some (like Eliezer Yudkowsky) even think this spells literal doom for humanity, in short order.
Like everyone else, I don’t have any real idea about how this will play out. But I do have a pretty strong perspective that it means that the relationship between concepts and computers has entered a new era. And that's where we'll take things next.
1: At least, not most of the time. Certainly in some cases, it does amount to something like memorization; if the model is trying to complete a sentence like “in 1983, the Prime Minister of India was …”, then “Indira Gandhi” is the only correct answer. But that’s the exception, not the rule.