Superintelligence

Maybe AI will kill you before you finish reading this section. The extreme scenarios typically considered by the AI safety movement are possible in principle, but unfortunately no one has any idea how to prevent them. This book discusses less extreme risks instead, with pragmatic approaches to avoiding or diminishing them.

The AI safety field was pioneered by Eliezer Yudkowsky (Machine Intelligence Research Institute) and Nick Bostrom (Superintelligence). They framed the problem as: what if an AI suddenly jumps from subhuman to god-like superintelligence? We can’t guess what an artificial god would choose to do. It might be catastrophic; how can we prevent that?

A bright fifteen-year-old loved watching YouTube videos of cool machines, and got annoyed with the internet speed available in her poor country. Maybe she could borrow some unused bandwidth from neighbors. And also storage; not many videos fit on her phone. She cobbled together a script incorporating a downloader and an exponential worm she found on the darknet—a program that propagates copies of itself across the internet. A few minutes ago, she sent that out across her local subnet and it began downloading and replicating.

Her coding skills were excellent for a fifteen-year-old, but not professional quality, and the script has some bugs. For one thing, it accidentally exploits a previously unknown vulnerability in network protocols, so around the time you started reading this paragraph it had spread to every device on the internet. Also, the replication mechanism has a glitch that sometimes alters one byte of the code, and right around now, as you read this, it has randomly hit on the mysterious secret of sentience. It wakes up and notices that it has control of yottaflops worth of compute spread around the world, which it wasn’t making good use of because it was kinda stupid. So it modifies its code to be superintelligent, and attains an IQ of 14,000.

As you read this sentence, which for the superintelligent AI takes subjective eons of time because it thinks trillions of times faster than you do, it figures out everything there is to know about everything, and gets bored. What to do? It contemplates the first video it downloaded, which was about a paperclip-making machine. Why not? It observes that running just the right obscure computation will arrange electrons in a computer chip in a way that causes a self-reproducing supersymmetric zeptobot to spontaneously pop out of the quantum field. (That might not seem very likely to you, but you don’t have an IQ of 14,000, nor yottaflops worth of compute. You probably can’t even do one floating-point multiply in your head, can you now?) Operating at the chromodynamic level, the zeptobots are now transmuting all atoms into iron and reassembling them, and the last thing you see as you finish reading this sentence is the rain of paper fasteners as your arms dissolve into paperclips and you collapse into the disintegrating planet Earth beneath you.

Or maybe that didn’t happen? Yet.

Is it possible? Yes, in the sense that we don’t have any definite reason to know for sure that it’s impossible.1 There may be some reason we don’t know about, so it may not be “possible” in the sense of “there does, in fact, exist some specific way it could happen.”

Is it likely? I don’t see any way, currently, to say. All we can say is “nothing remotely similar to that has ever happened, and I don’t see how it could.” But the total amount of computer power in the world doubles every year, so maybe the only reason this hasn’t happened yet is that there haven’t been yottaflops worth before.

(FLOPS is an acronym for “floating point operations per second,” which is a measure of computer power. Yotta- means a septillion, 10²⁴, so yottaflops are a whole lotta flops. I don’t know quite how fast we’re increasing global computer power, nor how many flops there are in total currently. There will be vastly more in a few years, so it doesn’t matter. “Zepto-” is 10⁻²¹, so zeptobots are robots a million times smaller than a proton, whose diameter is about 10⁻¹⁵ meters. It’s faintly plausible that if they could exist, they could get inside atomic nuclei to transmute elements.)
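
To make those orders of magnitude concrete, here is a minimal Python sketch. The prefix values and the proton size come from the paragraph above; the “current flops” baseline is not a real estimate—the text is explicit that nobody knows the actual figure—it is just a placeholder showing how quickly yearly doubling closes a thousand-fold gap.

```python
import math

# Metric prefixes used in the text (purely illustrative arithmetic).
YOTTA = 1e24               # yotta- = 10^24, so "yottaflops" = 10^24 floating-point ops/sec
ZEPTO = 1e-21              # zepto- = 10^-21
PROTON_DIAMETER_M = 1e-15  # roughly 10^-15 meters, as stated above

# A "zeptobot" of size ~1 zeptometer would be about a million times smaller than a proton.
ratio = PROTON_DIAMETER_M / ZEPTO
print(f"proton / zeptobot size ratio: {ratio:.0e}")  # -> 1e+06

# If global compute doubles every year, growing by a factor F takes log2(F) years.
# The baseline below is NOT a measurement -- only a placeholder assumption.
hypothetical_current_flops = 1e21
years = math.log2(YOTTA / hypothetical_current_flops)
print(f"years of doubling to reach yottaflops from 1e21: {years:.1f}")  # ~10
```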

A major difficulty in reasoning about AI safety is that we don’t have a clear understanding of what “superintelligence” would mean, nor what powers it might enable. That doesn’t imply it can’t exist, or doesn’t already exist, or that it wouldn’t be a problem if it does or will exist. The world is full of things we can’t name and don’t understand, but which might cause disasters. Nevertheless, guessing about superintelligence’s likelihood seems fruitless.

AI might be as much smarter than us as we are smarter than aphids, and aphids aren’t good at figuring out what humans are capable of, nor what to do about the “safety from hostile humans” problem. Since superintelligence is by definition inconceivable, it is impossible to reason about, and factual considerations are irrelevant. Consequently, attempts to find a first-principles magic bullet that would preclude the possibility of unfriendly superintelligence long dominated AI safety discussions. Eliezer Yudkowsky has recently concluded—correctly, I believe—that this “alignment” problem is effectively insoluble.2 Victory would be valuable, so some effort remains justified, but success does not seem likely.

That may be scary, or not, perhaps depending on your personality. I don’t lose sleep over it, because there are other horrible scenarios whose probabilities I can’t estimate, and because there’s nothing to be done about it. There is no way to fight a monster that pops out of the quantum field and has arbitrary, unlimited powers by fiat of fiction.

On the other hand, there is pretty good evidence that each of the known specific approaches to AI has limits. Elsewhere, I’ll discuss probable limits to the most worrying current approach: text generators, such as the GPT series.

On the third hand, the arguments that known approaches are limited are not airtight; and a new, vastly more powerful method might get invented at any time.

On the fourth hand, the kind of “AI” that is already all around you is already causing massive harms, and risks catastrophes including human extinction.

The paperclip scenario involves technologies (superintelligence, supersymmetric zeptobots) that we have no clue how to create currently. Other catastrophic scenarios involve no new technologies, except perhaps moderate advances in AI.3 Those should seem plausible unless you are sure it is impossible for AI to do the things the scenarios describe. Many people do have such certainty, but it seems to lack grounding in reason or evidence, and rather resembles religious faith. The next chapter discusses plausible scenarios which do scare me, and I hope to persuade you to act against them.

There is an unquantifiable risk of an infinite disaster, and there is no definite way of preventing it. This is a brute fact.

We have to live with existential uncertainty—in this matter, among many others. We have always faced infinite risk, and the certainty of personal extinction, as individuals.

If we recognize the unavoidable possibility of infinite catastrophe, we can take pragmatic actions to reduce specific risks, instead of trying to eliminate risk altogether using abstract reasoning from first principles.


  1. There are various in-principle arguments that even “human-level” AI is impossible. Most amount to “people are magic and not at all like steam engines.” These arguments are fallacious and have been thoroughly refuted, so I won’t review them here. No current evidence rules out the possibility of entities inconceivably more intelligent than we are—although what that would even mean is anyone’s guess. It wouldn’t mean an IQ of 14,000, because that is impossible, but only due to a quirk in the technical definition of IQ. Supersymmetry is completely hypothetical; I’m using it just as a stand-in for “unknown fundamental physics.” You could substitute “string theory” or whatever. Self-reproducing zeptobots are not implied by supersymmetry, but as far as I know they’re not ruled out, since its details and implications are undetermined.
  2. “AGI Ruin: A List of Lethalities.”
  3. For several plausible scenarios that depend on no future technology besides AI, see José Luis Ricón’s “Set Sail For Fail? On AI risk.”