On Collaborating with an AI, Writing, and Innovation


Heads up: this week’s post is a bit different, in that it’s not built primarily from academic papers. We’ll be back to the normal format in 2021 with a post about learning-by-doing and learning curves. Happy Holidays!

GPT-3 is OpenAI's new generative language model. It’s built by feeding a gigantic neural network hundreds of gigabytes of text, pulled from books, Wikipedia, and the internet at large. The neural network identifies statistical patterns in text, which are translated into the structure and weights of the neural network. The upshot is that you end up with a predictive text generator. Prompt GPT-3 with a bit of text and it will complete the text based on its underlying model of statistical regularities in text. The results range from spooky good to funny bad.

GPT-3 could end up having a big impact on innovation and research, but that’s not what this post is about. Instead, I’m going to talk about how writing stories with GPT-3 is kind of a neat metaphor for innovation in general.

Let's look at an example. The Eye of Thuban is a short, incomplete science fiction-fantasy story about a woman named Vega who lives on an alien world, longs to be a space pilot, and encounters a mysterious artifact that seems to give her powers. It's a joint effort between GPT-3 and the human Arram Sabeti, and while it's not particularly good, it is coherent and recognizably a science-fiction story.

Sabeti generated the story by prompting GPT-3 with the following:

This novel is a science fiction thriller that can be thought of as a strange mix of the fantastic and whimsical worlds of Hayo Miyazaki and Ian M. Banks Culture novels. They’re set in a post singularity world where humanity and its descendants span thousands of worlds, and sentient super intelligent ships with billions of people living on them wander the galaxy.

Chapter 1.

Starting with that, GPT-3 wrote a few sentences of story by predicting what kind of text would be most likely to continue on from this prompt. Sabeti accepted or rejected these sentences based on his own judgment: was it coherent? interesting? If not, he would prompt GPT-3 to try again. If he liked it, Sabeti would add the GPT-3 generated text to what he had and use the story so far as the next prompt for GPT-3. It would generate a few more sentences, based on what came before, which Sabeti would accept or reject. Repeat, until you have a story.
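The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_generate` and `toy_accept` are hypothetical stand-ins (a random sentence picker and a crude rule) for GPT-3 and for Sabeti's judgment, and the sample sentences are invented for the sketch.

```python
import random

def collaborate(prompt, generate, accept, passages_wanted, max_attempts=1000):
    """Grow a story by sampling continuations and keeping only accepted ones."""
    story, kept, attempts = prompt, 0, 0
    while kept < passages_wanted and attempts < max_attempts:
        attempts += 1
        candidate = generate(story)           # model proposes text given the story so far
        if accept(story, candidate):          # curator's judgment call
            story = story + " " + candidate   # accepted text joins the next prompt
            kept += 1
        # a rejected candidate is discarded and the model simply tries again
    return story, kept, attempts

# Hypothetical stand-ins, for illustration only.
SENTENCES = [
    "Vega watched the ships rise.",
    "She wanted to fly one.",
    "The artifact hummed in her hand.",
    "Zxqv blorp gibberish.",
]
rng = random.Random(0)

def toy_generate(story):
    # Stand-in for the language model: ignores the story, picks a sentence at random.
    return rng.choice(SENTENCES)

def toy_accept(story, candidate):
    # Stand-in for taste: reject nonsense and reject exact repeats.
    return "gibberish" not in candidate and candidate not in story

story, kept, attempts = collaborate(
    "Chapter 1.", toy_generate, toy_accept, passages_wanted=3
)
print(story)
print(f"kept {kept} passages in {attempts} attempts")
```

The point of the sketch is the shape of the loop: the generator never sees the curator's reasons, only the growing story, yet the curator's choices steer everything that comes after.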

Who "wrote" the story that emerged? After the initial prompt, GPT-3 wrote all the text. But Sabeti played a crucial role in selecting text that he liked most, in the process shaping future prompts and pulling the story in one direction or another.

More mysteriously, we have very little understanding of what GPT-3 was "thinking" or "drawing on" when it composed its contributions. We don't know what words or phrases in the prompts caught GPT-3's attention, and where the patterns it used to generate text originated in its training data. To Sabeti, it's a black box. Indeed, even to the team at OpenAI that created it, GPT-3 is largely a black box. The way neural nets encode statistical patterns in data is useful, but hard to translate into the kind of explanations that people understand, at least at present.

So, in essence, through a process Sabeti doesn't understand, text is generated. Sabeti evaluates it critically, and then chooses whether to leave it or try again. At some point, he is satisfied with the results and a story is published. This isn't writing as we know it. 

The Mystery of Writing

Or is it? In fact, the kind of creation described above is remarkably similar to how some writers describe their process. George Saunders writes:

we often discuss art this way: the artist had something he “wanted to express”, and then he just, you know … expressed it. We buy into some version of the intentional fallacy: the notion that art is about having a clear-cut intention and then confidently executing same...

The actual process, in my experience, is much more mysterious and more of a pain in the ass to discuss truthfully.

Authors often do not understand exactly where their own ideas come from. Stephen King, in his memoir On Writing, writes:

Good story ideas seem to come quite literally from nowhere, sailing at you right out of the empty sky. Two previously unrelated ideas come together and make something new under the sun. Your job isn't to find these ideas but to recognize them when they show up.

In the same memoir, King describes his process of organic and intuitive writing. He advises writers to come up with a scenario and discover what happens to the characters, rather than setting up the plot in advance and maneuvering them through it. After finishing the first draft, he tells writers to reread it to discover what they were "really" writing about. Rather than sculpture, King likens the whole process to unearthing a fossil. The story is discovered, rather than planned.

King and Saunders don't know where they are going; but they still get somewhere interesting. How? Part of the answer is taste. They may not be able to create a satisfying plot twist or turn of phrase on command, but when they see one, they recognize it. Notice how in the previous quote King says:

Your job isn't to find these ideas but to recognize them when they show up.

If you are capable of creating work and then evaluating it, you are capable of using an evolutionary algorithm to write great fiction. Saunders is most explicit about how this works:

My method is: I imagine a meter mounted in my forehead, with “P” on this side (“Positive”) and “N” on this side (“Negative”). I try to read what I’ve written uninflectedly, the way a first-time reader might (“without hope and without despair”). Where’s the needle? Accept the result without whining. Then edit, so as to move the needle into the “P” zone. Enact a repetitive, obsessive, iterative application of preference: watch the needle, adjust the prose, watch the needle, adjust the prose (rinse, lather, repeat), through (sometimes) hundreds of drafts. Like a cruise ship slowly turning, the story will start to alter course via those thousands of incremental adjustments.
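Saunders's needle-watching routine has the shape of a very simple evolutionary search: make a small change, check the meter, keep the change if the needle did not drop. The sketch below is a toy under loud assumptions. Real taste is not a comparison against a known target, so the `needle` function (which counts characters matching a hypothetical `ideal` phrase) is only a stand-in for the writer's judgment, and `revise` is a random one-character tweak, not how anyone edits.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "

def needle(draft, ideal):
    """Stand-in for the P/N meter: how many characters already 'feel right'."""
    return sum(a == b for a, b in zip(draft, ideal))

def revise(draft, rng):
    """One small, blind adjustment: replace a single character at random."""
    i = rng.randrange(len(draft))
    return draft[:i] + rng.choice(ALPHABET) + draft[i + 1:]

def iterate_drafts(draft, ideal, rng, rounds):
    """Keep a revision only when the needle does not move toward 'N'."""
    best = needle(draft, ideal)
    for _ in range(rounds):
        candidate = revise(draft, rng)
        score = needle(candidate, ideal)
        if score >= best:                 # selection and retention
            draft, best = candidate, score
    return draft, best

ideal = "the story will start to alter course"   # hypothetical target, for the toy only
start = "x" * len(ideal)
final, score = iterate_drafts(start, ideal, random.Random(0), rounds=2000)
print(final, score, "of", len(ideal))
```

No single revision knows where the draft is headed; the direction comes entirely from which revisions are kept.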

More generally, in Old Masters and Young Geniuses: The Two Lifecycles of Artistic Creativity, David Galenson describes two broad approaches to artistic creation: conceptual and experimental. The experimental creator is unsure of where they are going and proceeds by creating, evaluating the results, tweaking them, and repeating over and over again. Galenson provides a wealth of anecdotes supporting this creative style for many famous writers: Charles Dickens, Virginia Woolf, Mark Twain, etc.

Just as the blind forces of evolution are capable of creating highly complex creatures via a process of mutation, selection, and retention, this process of myopic, evolutionary writing is capable of generating work that surprises its own authors. They didn't know they would end up here, but they recognized it was a good place to be, once they did. Again, Saunders:

The interesting thing, in my experience, is that the result of this laborious and slightly obsessive process is a story that is better than I am in “real life” – funnier, kinder, less full of crap, more empathetic, with a clearer sense of virtue, both wiser and more entertaining.

This method of writing sounds remarkably like collaborative writing with GPT-3 to me. Through a lifetime of reading and writing, great writers develop their own intuitive subconscious map of the regularities in good writing. Like GPT-3, they have a kind of prompt - where they are starting from - and like GPT-3, they have some kind of internalized model of what writing looks like, but for which they might struggle to explain the exact sources and reasons. Then, once their intuition or subconscious tosses out some text, they can evaluate whether it’s any good. If so, they keep it and “add it” to the prompt. If not, they try again.

Indeed, François Chollet makes this analogy pretty explicitly:

Not so different after all

That said, there are many differences too. The extent to which human brains are “like” digital neural networks is not clear, and it may be that the differences are really important. But for the purposes of this essay, that’s not a big problem. What matters is that both processes involve the generation of text via methods that are hard to describe but effective, and then a second step of conscious evaluation of this output.

A more important difference is that the unedited text generated by Saunders or King might be much better than the text generated by GPT-3. Certainly The Eye of Thuban is not a particularly compelling story. In writing The Eye of Thuban, Sabeti notes that sometimes GPT-3 adds text that is logically inconsistent with what has come earlier - we can assume King and Saunders usually don’t do that.

The quality of the text GPT-3 generates isn’t actually so important though, at least for the end result. Instead, the quality of the unedited text generated by Saunders, King, or GPT-3 mostly has an effect on the amount of time that must be spent curating the text. Indeed, if one were willing to put in a lot of time (like, longer than the lifespan of the universe) monkeys banging on keyboards could eventually replicate Shakespeare. GPT-3 is much, much better than that. It’s also clearly an improvement over what’s come before. The pseudonymous Gwern has extensively played with GPT-3 and its predecessor GPT-2, writing:

With GPT-2-117M poetry, I’d typically read through a few hundred samples to get a good one… But for GPT-3, once the prompt is dialed in, the ratio appears to have dropped to closer to 1:5—maybe even as low as 1:3! I frequently find myself shrugging at the first completion I generate, “not bad!” (Certainly, the quality of GPT-3’s average prompted poem appears to exceed that of almost all teenage poets.) I would have to read GPT-2 outputs for months and probably surreptitiously edit samples together to get a dataset of samples like this page.
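A bit of arithmetic makes the time cost concrete. If each sample is independently acceptable with probability p, the number of samples a curator reads before finding a keeper follows a geometric distribution with mean 1/p. The sketch below is a back-of-the-envelope illustration, not a measurement: reading "1:5" as one good sample in five, treating "a few hundred" as 300, and modeling the monkey as hitting one of 27 keys (26 letters plus space) uniformly at random are all assumptions.

```python
from fractions import Fraction

def expected_samples_per_keeper(p_good):
    """Mean of a geometric distribution: draws needed until the first acceptable sample."""
    if not 0 < p_good <= 1:
        raise ValueError("p_good must be in (0, 1]")
    return 1 / p_good

def random_typing_probability(target, alphabet_size):
    """Chance that uniformly random keystrokes reproduce `target` exactly."""
    return Fraction(1, alphabet_size) ** len(target)

# Assumed acceptance rates, loosely based on the quote above.
gpt3_rate = Fraction(1, 5)       # "closer to 1:5"
gpt2_rate = Fraction(1, 300)     # "a few hundred samples", taken here as 300

# Hypothetical monkey with a 27-key keyboard typing one short phrase.
monkey_rate = random_typing_probability("to be or not to be", 27)

print("GPT-3 :", expected_samples_per_keeper(gpt3_rate))
print("GPT-2 :", expected_samples_per_keeper(gpt2_rate))
print("monkey:", expected_samples_per_keeper(monkey_rate))
```

Every generator here reaches a keeper eventually; what changes is how many rejects the curator must wade through first.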

Sabeti claims to have written The Eye of Thuban in the course of a few hours, whereas Saunders describes potentially hundreds of iterations on each draft. It may well be that collaborative writing with GPT-3 could come up with something really good, if one were willing to put in a lot more time.

What GPT-3 does is reduce the time spent iterating to good work, by exploiting regularities in language to avoid wasting the curator's time selecting obviously bad passages.

Saunders and King do the same thing - a lifetime of reading, writing, and thinking critically about what they read and write has led them to internalize “good writing” such that the raw text they come up with is not bad, even before they selectively curate their own writing. Stretching the analogy, we might view the texts Saunders and King read as the training data they use to make their inner models of good writing. Unlike in Sabeti's collaboration with GPT-3, some of the curation probably occurs in a writer's head, rather than through the process of actually writing text and evaluating it (think of a writer mentally searching through a series of adjectives to find just the right one before putting pen to paper). To the extent the uncurated text is already good, this speeds up their iterative process.

There is one respect, however, in which time may not be sufficient to offset any weaknesses in GPT-3’s generated text. It may be that there are certain turns of phrase and sentences that GPT-3 will never produce, simply because they lie so much at odds with its internalized model of the regularities in writing. If this is true, then no amount of time would suffice to generate good writing from GPT-3. In this regard, GPT-3 could potentially be worse than monkeys pounding on keyboards, since they are at least capable of generating any text. The tradeoff seems unavoidable; when you exploit regularities in language to weed out certain passages and save the collaborator time, you also might weed out good passages that do not exhibit these regularities. We just hope these cases are rare.
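The tradeoff can be shown with a deliberately tiny toy. Assume each token is drawn independently from a fixed distribution (a strong simplification, and not how GPT-3 works). A "monkey" spreads probability evenly over every token, while a hypothetical "model" concentrates it on the tokens it has learned are typical. The token names and probabilities below are invented for the illustration.

```python
from fractions import Fraction

def sequence_probability(target, token_probs):
    """Probability of producing `target` when tokens are drawn independently."""
    p = Fraction(1)
    for token in target:
        # A token outside the generator's support has probability zero.
        p *= token_probs.get(token, Fraction(0))
    return p

# Monkey: uniform over four tokens, so every sequence is possible.
monkey = {t: Fraction(1, 4) for t in ["a", "b", "c", "d"]}

# Hypothetical model: has learned that only "a" and "b" are typical.
model = {"a": Fraction(1, 2), "b": Fraction(1, 2)}

typical = ["a", "b"]       # fits the learned regularities
unusual = ["a", "c"]       # good, perhaps, but at odds with them

for name, seq in [("typical", typical), ("unusual", unusual)]:
    print(name, "monkey:", sequence_probability(seq, monkey),
          "model:", sequence_probability(seq, model))
```

In this toy the model is four times likelier than the monkey to produce the typical passage, and infinitely less likely to produce the unusual one: no amount of resampling recovers a passage with probability zero.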

Beyond Writing

What we have then, in both collaborative writing with GPT-3 and a certain method of professional writing, is an iterative process of generation and evaluation. Good writing does not emerge fully formed - instead, many texts are generated, and good ones are retained. If we were to generate our texts randomly, we could still generate great writing - but it would take a very long time, because most random text is gibberish. GPT-2 and GPT-3 greatly reduce the time necessary to generate great writing by restricting the generation of texts to those more likely to be good (they are coherent, and obey certain regularities in language). Great writers do both sides of the process - they have good subconscious models of writing, so that their raw text is reasonably good, and they have great taste, so they can prune their output and so direct their writing towards interesting ends.

Collaborative writing with GPT-3 is more than an analogy for good writing though; it’s an analogy for innovation in general.

Let’s pause briefly to consider what “innovation” is. To me, innovation is the emergence of interesting, reproducible novelty. Novel, because it must be something that has not been done before. Reproducible, because the innovation cannot be a one-time miracle, but a new class of thing which can serve (in principle) as a blueprint for more of its kind. And interesting, because otherwise, who cares? The challenge with innovation is that most novel things are not interesting. How to find the few that are?

In practice, what we do is use some kind of simplified model of reality to guide our efforts, so that we don’t just try things at random. That “model” could literally be a scientific model of the physical world; these are, after all, ways of representing observed regularities in data. But they could also be much more prosaic: rules of thumb, analogies to other examples, or expert intuition (built up from long study of relevant precedents). When these models are good, they allow us to develop ideas and technologies that would take an eternity to arrive at by evolution or random chance. When they are bad, they restrict us and prevent us from trying things that would have worked if we could only take off our blinders.

In the analogy of collaborative writing with GPT-3 to innovation, our models of the world are analogous to GPT-3, but it is the world itself that is analogous to the human collaborator. The world is messy and complex, and even the best models may miss things. It is only when innovations are brought into reality - whether as clinical trials, prototypes, product launches, or startups - that we see if they are, in fact, “interesting.” Those that are, are retained. Like GPT-3’s text in The Eye of Thuban, they get added to the “prompt” and further work builds on the ideas and innovations that have been retained. For the rest, we “go back to the drawing board” (our simplified model of the world) and try again.

Next week, “How useful are learning curves, really?”