Scary AI when?

The AI safety field often treats “when will it happen?!” as its central question. That is futile: we don’t have a coherent description of what “it” is, much less how “it” would come about. Fortunately, a prediction wouldn’t be useful anyway. An AI apocalypse is possible, so we should try to avert it.

We have no meaningful way of estimating the probability or timeline for AI catastrophes. We don’t adequately understand even the operation of existing systems, nor their effects, and we cannot make predictions about technologies that don’t exist yet. We cannot address the range of possibilities with a uniform first-principles solution (“alignment”). We cannot rule out extreme scenarios of sudden domination or destruction by an incomprehensible, omnipotent superintelligence, but we also can’t do anything about that.

Scary AI is imagined as the output of a research process that has been highly unpredictable. (In both directions: despite confident predictions, there was hardly any progress in AI for two decades starting around 1990; and there’s been dramatic, unexpected progress during the past few years.) The research community has often had a paradoxical anti-scientific attitude, actively resisting creating the sorts of understanding that would make prediction more reliable.1

Some in the AI safety field believe that research already under way will soon produce Scary AI. That might happen either as an emergent property of existing technologies, or as an added engineered feature that is a small extension of current algorithms. In that case, they say, it may happen in as little as three years (and then we will all die horribly). Other AI safety researchers believe that Scary AI will require qualitatively different algorithms, which haven’t been invented yet, so it will be at least a decade (possibly several) before we all die horribly. I have no opinion about this, since it’s speculation about unknown unknowns.

Since there’s no explanation of what would count as Scary AI, predicting “when it will happen” is foolish.

If “it” is an AI “waking up” into mindness, we have no clue what that would even mean, much less how it might happen, so trying to guess when seems pointless.

If “it” is a machine that is better than humans at all tasks, we’d need to know which tasks are most difficult for machines. Ideas about that have been consistently wrong. Usually people expect that the tasks most difficult for people will also be most difficult for machines, despite extensive experience showing the opposite. For example, humans are terrible at systematic rationality, so problems which demand it are frequently posed as challenges for AI. Computers are extremely good at rationality—they are mathematical logic made flesh!—so solving puzzles and playing board games are trivial for AI. Household chores (making breakfast and cleaning up after it) are far out of reach for AI now. Are those the most difficult challenges? We have no idea.

If “it” is a sudden acceleration of science and technology, it’s probably more useful to investigate what that would involve concretely, before asking when or how AI might be involved. (I discuss that later in this book.)

A typical prediction is that “it” will most likely happen in twenty or thirty years, or at any rate probably not for a decade, but almost certainly within a century.

This seems intuitively reasonable to me; and I think my intuition is worthless. No one’s estimate seems to have any basis in technical specifics;2 we’re all just saying “seems reasonable I guess.” We might think “this is a very hard technical problem, but not inherently impossible; and once a very hard technical problem is identified, it usually gets solved within a century—often only a few decades, or maybe it takes about a decade if we’re lucky.” But there are exceptions. Also, it’s not clear that categorizing it as “a very hard technical problem” is accurate. It might not be very hard, if approached differently. It might not be a technical problem at all: we don’t have a technical definition of what Scary AI is, and some people think there are metaphysical problems prior to any technical ones.

Nick Bostrom is more cynical:

Two decades is a sweet spot for prognosticators of radical change: near enough to be attention-grabbing and relevant, yet far enough to make it possible to suppose that a string of breakthroughs, currently only vaguely imaginable, might by then have occurred… Twenty years may also be close to the typical duration remaining of a forecaster’s career, bounding the reputational risk of a bold prediction.3

AI safety organizations also want to convince everyone that AI safety is important. Stressing that an apocalypse is likely in your personal lifetime is necessary to get the message across. “Apocalypse this year” is possible, but most people won’t buy that;4 and “by 2100” sounds sufficiently remote that the public will ignore it.

In any case, if accurately predicting the timing of an AI apocalypse is impossible, then what does it matter when it is most probable? What would or should or could we do differently if we knew the time of maximum danger was 2025 or 2030 or 2040 or 2070? Why not do whatever it is now?5

Wanting to quantify uncertainty doesn’t make the attempt feasible or useful. AI risk is in the domain of Knightian uncertainty: of unknown unknowns. In such domains, probabilistic reasoning is unhelpful.6 The best approaches are to seek understanding through cautious exploration toward possible unknown unknowns; to prepare against known, unquantified but plausible risks; to avoid dramatic actions that could make a risky situation still more turbulent and confusing; and to act tentatively on such concrete understanding of possible improvements as one does have.

We should, therefore, work to better understand the possible consequences of existing and near-future technologies. We can also take pragmatic measures to mitigate their predictable risks. The middle chapter of this book suggests many feasible approaches. Those may also protect against unforeseen, probably more distant AI scenarios. As a fortuitous side benefit, the same actions protect against hostile humans. Further, we can slow or block any risky development until we’re reasonably confident it is a good idea.

  1. I discuss the AI research community’s reluctance to understand its own creations in the “Backpropaganda” chapter of Gradient Dissent.
  2. Trying to guess when computers will get as many flops as the human brain may be an exception, but I think it’s inherently irrelevant, and also attempts don’t narrow down the timing much. I discuss this “biological anchors” approach in my “Reviews of some major AI safety reports,” betterwithout.ai/AI-safety-reviews.
  3. Nick Bostrom, Superintelligence, 2014.
  4. A deadly global pandemic caused by a bat virus is a tired made-for-TV science fiction movie plot, and therefore can’t happen this year.
  5. In principle, if we expect near-term doom, a Hail Mary pass—looking for a low probability magic bullet—makes sense. If there’s more time, we might prioritize approaches that are unlikely to pay off in the next decade, but which have higher probability of success in the long run. But as Scott Alexander says, “It’s not like there’s some vast set of promising 30-year research programs and some other set of promising 5-year research programs that have to be triaged against each other.” That’s in his “Biological Anchors: A Trick That Might Or Might Not Work,” Astral Codex Ten, Feb 23, 2022.
  6. See Part One of In the Cells of the Eggplant on ways probabilistic reasoning can be misleading; particularly “The probability of green cheese” on reasoning about unique events and unbounded unknowns.