Run-time task-relevant algorithmic understanding

The type of scientific and engineering understanding most relevant to AI safety is run-time, task-relevant, and algorithmic. Such understanding can lead to more reliable, safer systems. Unfortunately, gaining it has been neglected in AI research, so currently we have little.

It’s common to explain that current AI systems are “neural networks,” consisting of arithmetical “units” that are trained with “machine learning.” This explanation is true as far as it goes, but it applies uniformly to extremely different AI systems, so it can’t give much insight into what specific ones can or can’t do, or why.

We need a different kind of explanation: one that can tell us how and when a system is likely to fail, and what to do about it.

Run time is the operation of an AI system when deployed; it is what users see, and what can affect the world.1 It contrasts with training time, which is the construction of the system using backprop. In current practice, these are entirely disjoint; a network is trained once, frozen, and does not change once put into use. Training time behavior is important for the people creating a system, but irrelevant for everyone else.
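To make the contrast concrete, here is a minimal sketch in PyTorch, assuming a tiny network and random stand-in data (both purely illustrative): training time is the backprop loop that adjusts the weights; run time is everything after the weights are frozen.

```python
import torch
import torch.nn as nn

# --- Training time: the system is constructed using backprop ---
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Random stand-in for a training dataset (illustrative only)
training_batches = [
    (torch.randn(32, 784), torch.randint(0, 10, (32,))) for _ in range(10)
]

model.train()
for inputs, labels in training_batches:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()               # backprop adjusts the weights
    optimizer.step()

# --- Run time: the network is frozen; only forward passes happen ---
model.eval()
for p in model.parameters():
    p.requires_grad_(False)       # the weights no longer change

with torch.no_grad():
    new_input = torch.randn(1, 784)          # stand-in for a user's input
    prediction = model(new_input).argmax()   # this is all a user ever sees
```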

We could distinguish three levels of analysis for any computational system.2 The black box level describes what a system does, without explaining why or how. It considers only the input-output behavior. The algorithmic level explains why and how, abstractly. A full algorithmic analysis specifies the transformations that lead from each input to each output. The mechanistic level explains the specific machinery that accomplishes the algorithm.

One run-time black box explanation of a chatbot is “a program that accepts human-language inputs and produces often-appropriate human-language outputs.” One run-time algorithmic explanation is that it alternates matrix multiplications with applications of nonlinear functions. One run-time mechanistic explanation is that the matrices are distributed to GPU memory and the multiplications are performed in parallel. These explanations are necessary for people building AI systems, but irrelevant for understanding them from a user or societal perspective.
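For concreteness, here is a minimal NumPy sketch of what that algorithmic-level description means: matrix multiplications alternating with a nonlinear function (ReLU here). The layer sizes and random weights are illustrative assumptions, not the architecture of any actual chatbot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weight matrices; in a real system these come from training.
weights = [rng.standard_normal((64, 32)),
           rng.standard_normal((32, 16)),
           rng.standard_normal((16, 8))]

def forward(x, weights):
    """One forward pass: matrix multiplication, then a nonlinearity, repeated."""
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)   # matrix multiply, then ReLU
    return x @ weights[-1]           # final matrix multiply

output = forward(rng.standard_normal(64), weights)
```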

Explanations of these kinds are nevertheless commonly given to the interested public, although for that audience they are quite useless. They give no insight into what the system is likely to do, right or wrong. They don’t explain why the outputs are “often appropriate,” what sorts of inputs are likely to produce inappropriate outputs, or what those outputs are likely to be instead.

It’s also common to explain chatbots as “predicting the next word,” which is true, but useless. Lacking the “why,” it completely fails to explain what the chatbot actually does (which is not—from the user’s point of view—predicting the next word). Lacking the “how,” it says nothing about chatbots’ capabilities and limits.
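For readers who want to see what “predicting the next word” amounts to mechanically, here is a minimal greedy-decoding sketch using the Hugging Face transformers library and the small public gpt2 model. The prompt, the ten-token limit, and the always-pick-the-most-likely-token rule are illustrative simplifications; deployed chatbots use larger models and more elaborate sampling.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

tokens = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):                       # generate ten more tokens
        logits = model(tokens).logits         # a score for every possible next token
        next_token = logits[0, -1].argmax()   # take the single most likely one
        tokens = torch.cat([tokens, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(tokens[0]))
```

Note that nothing in this loop says what the outputs will be like, or when they will go wrong; that is exactly the “why” and “how” the explanation omits.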

We need, instead, task-relevant run-time algorithmic understanding. That means understanding in terms relevant to wanted and unwanted behavior in the situation of use. It means insight into what a system can or can’t do, and why. That can suggest both useful applications and risks.

Unfortunately, we don’t have much task-relevant run-time algorithmic understanding, because little effort has been made to gain it.

Run-time task-relevant algorithmic investigations are uncommon in AI research. The field rewards system performance, not understanding. It may take a huge amount of work to figure out what algorithms a network uses, because those emerge from training on a particular dataset, and weren’t engineered in.

This is not an adequate excuse, in my opinion. Lack of attention to task-relevant run-time algorithms leads to AI systems that don’t work well, because no one understands them. Then they can make inscrutable mistakes that may cause serious harms.

With task-relevant run-time algorithmic understanding, we should be able to anticipate what sorts of tasks a system can and can’t perform reliably, how and when it is likely to fail, and what to do about that.

We can get this kind of understanding with science; with reverse engineering; and with synergies between the two.

Science, in this case, proceeds by formulating task-relevant algorithmic hypotheses and devising and running experimental tests for them. We create hypotheses via analysis of the task dynamics, knowledge of general backprop behavior, understanding of human psychology and neuroscience, and informal observations of existing systems. I discuss an example in “Classifying images,” later in this chapter.
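Here is a minimal sketch of that hypothesize-and-test loop. To keep it self-contained, the “classifier” is a deliberately trivial stand-in, and the hypothesis (that the system’s decision depends only on overall brightness, not spatial arrangement) is an illustrative assumption, not the “Classifying images” example.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify(image):
    """The black box under study; here a toy stand-in that returns a label."""
    return "bright" if image.mean() > 0.5 else "dark"

# Hypothesis: the label depends only on overall brightness.
# Predicted consequence: shuffling the pixels (which preserves brightness)
# should never change the label.
test_images = [rng.random((8, 8)) for _ in range(100)]
agreements = sum(
    classify(img) == classify(rng.permutation(img.ravel()).reshape(8, 8))
    for img in test_images
)
print(f"Label unchanged on {agreements}/100 shuffled probes")
# A high agreement rate supports the hypothesis; counterexamples would refute it.
```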

Due to the complexity and inherent randomness of backprop networks, testing algorithmic hypotheses by examining input/output behavior may often be infeasible. Analyzing them at the mechanistic level first may yield algorithmic-level insights. Sufficient mechanistic understanding (an explanation of how specific parts of the system contribute to its overall functioning) can reveal algorithms directly.
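As a taste of what mechanistic-level probing can look like in practice, here is a minimal sketch, assuming a PyTorch model: it records the run-time activity of one internal layer, so that how that part contributes to the overall behavior can be examined directly. The toy network and random input are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))
model.eval()

captured = {}

def record(module, inputs, output):
    # Save what this layer computed during the forward pass
    captured["hidden"] = output.detach()

hook = model[1].register_forward_hook(record)   # watch the hidden layer

with torch.no_grad():
    model(torch.randn(1, 16))

hook.remove()
print(captured["hidden"])   # which internal units fired, and how strongly
```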

  1. Run time is also referred to with other terms, such as “inference time” and “feedforward computation.” Usage doesn’t seem to have standardized yet. There’s a good analogy to compilation time vs. run time in conventional software, though.
  2. This idea is originally due to the pioneering computational neuroscientist David Marr in 1976, and remains influential in cognitive science. I’m using different terms for the three levels than he did, but they’re conceptually similar or identical.