Comments on “Superintelligence”

n=0 -> n=1

Nick Cammarata 2023-02-14

WRT superintelligence, you say, “All we can say is ‘nothing remotely similar to that has ever happened, …’.” But isn’t humanity a pretty clear-cut example of that? A huge interlinking ecosystem of animals completely dominated in power, in a blink (on an evolutionary timescale), by a new species that’s smarter and has the power to end life on Earth if it really wanted to.

"Nothing remotely similar"

David Chapman 2023-02-15

Interesting point—I hadn’t thought of it that way. Thanks! I guess the question would be whether it’s similar in a way that’s predictive. I suppose we can’t know (at this point).

Unusual statement of the paperclip maximizer problem

Stephen Prior 2023-02-16

I’m not a fan of the paperclip maximizer as a demonstration of AI risk; it’s unrealistic enough that it can be dismissed as fantasy. A more realistic example now would be a bitcoin maximizer. The way I’ve usually seen it used, though, is as a demonstration of alignment risk: that even a goal as boring as creating paperclips can be disastrous if it drives a sufficiently powerful, uncontrolled, and poorly aligned AI system. Having the system arise spontaneously and decide on its own to create paperclips sort of misses this point. Are you making a different point here that I’m missing?

risks versus facts

Brian Slesinsky 2023-02-16

Haven’t read the rest yet, but I’ll post my first impression before moving on: there’s a sense of urgency that, so far, isn’t justified and seems to be based on bare assertions and philosophical trickery. It would be better to acknowledge that you haven’t shown anything yet.

What kind of trickery? For example:

There is an unquantifiable risk of an infinite disaster that cannot be prevented. It is unquantifiable, and there is no definite way of preventing it. This is a brute fact.

In this case, whatever your reasoning might be, such an assertion doesn’t seem to be about a concrete fact as we normally use the term? In what sense does an “unquantifiable risk” exist in the real world? This is more like a prediction or scenario than a thing that actually exists.

Predictions and scenarios, however dire or justified, aren’t facts, though they might be justified using a factual basis.

I’m generally of the opinion, though, that it’s not given to us to know the future, which is a sort of similar belief. If we cannot know the future, then we can’t quantify the risks. And we can’t prevent everything. So that’s sort of the same, but the sentiment is different. Not knowing the future is normal. It would be weird to think you could quantify every risk and prevent every disaster. We just do what we can for the risks we know about.

paperclips and other risks

David Chapman 2023-02-19

Stephen — Yes, this is a bit different from the usual paperclip scenario. The book covers the “correct motivation is hard” point later instead. Maybe it’s confusing to recycle the paperclip-making example. The point here is just that “foom” scenarios can’t be ruled out, but they’re also not something we can do anything about. (As far as we know.)

Brian — Not sure I understand your point? Mine is the same as yours:

Not knowing the future is normal. It would be weird to think you could quantify every risk and prevent every disaster. We just do what we can for the risks we know about.

I put it as:

There is an unquantifiable risk of an infinite disaster that cannot be prevented. It is unquantifiable, and there is no definite way of preventing it. This is a brute fact.

Can you point out how these are different?

(Maybe it’s a matter of audience. This was originally written to point out to AI safety people that their mainstream program is probably hopeless, because it was trying to quantify the risks and to definitely prevent them.)

Let me get back to you

Brian Slesinsky 2023-02-19

Uh, I think it’s less about object-level differences and more about feel. I’m not sure which emotions you’re going for. Like, do you want your reader to have patience and remain calm while you explain some conceptual issues at length (which seems more like your usual writing style), or to feel impatient about an urgent problem? I generally prefer patience, but it does seem like an urgent problem?

To exaggerate: “millions of people could die! But first, let me clarify a few things.”

But I’m thinking maybe I should wait until I read the whole thing before commenting further.

Eliezer does not think alignment is “insoluble”

Robert 2023-02-21

David,

You say:

Eliezer Yudkowsky has recently concluded—correctly I believe—that this “alignment” problem is effectively insoluble.

Eliezer does not think the alignment problem is insoluble. In the very post you cite to this effect, he literally says:

None of this is about anything being impossible in principle. The metaphor I usually use is that if a textbook from one hundred years in the future fell into our hands, containing all of the simple ideas that actually work robustly in practice, we could probably build an aligned superintelligence in six months.

Separately, these bits seem confusingly phrased to me:

Is it likely? I don’t see any way, currently, to say. All we can say is “nothing remotely similar to that has ever happened, and I don’t see how it could.”

I’m not sure if this is referring to the exact scenario described above (which does seem very unlikely to me) or to any instantiation of artificial superhuman intelligence (which seems very likely to me). “This has never happened before, and is therefore impossible to predict” is a weird kind of reference-class tennis. In real life, nothing has ever happened before in the exact same way. We are always reasoning based on a collection of heuristics, models, and priors. We have already achieved superhuman performance across multiple domains. We have good reasons to believe that humans are not especially good at scientific reasoning (and therefore scientific research); that domain is not where the optimization pressure of evolution was applied. It would not be a crazy leap of deductive reasoning to say “this seems like it ought to be possible by default, absent some convincing argument to the contrary”. No such argument has been presented, to my knowledge. (As you note, many people have presented many obviously bad arguments.)

Since superintelligence is inconceivable by definition, it is impossible to reason about, and factual considerations are irrelevant.

Superintelligence is not impossible to reason about. It is impossible to predict how it will accomplish any specific goal it has, of course, but this is not the only possible kind of reasoning we can do about it. An easy analogy is chess: if you start a game against a state-of-the-art chess AI, you can predict with very high confidence that you will lose. (This is a kind of reasoning you can perform about such a system.) You can’t predict what sequence of moves it will make without playing the game out. (You can try to poke the internals of the system to figure out what move it would make in response to any given board position, but while that might let you win a game within a human lifespan, it becomes much less helpful in any domain less narrowly constrained than a board game.)

Possibilities

David Chapman 2023-02-25

Hi Robert, thanks for the comment, and sorry to take a while to reply.

Eliezer does not think the alignment problem is insoluble.

I said “effectively insoluble,” which seems consistent with what he says. (That we cannot, in practice, solve the problem before it is too late.) Maybe you can suggest language that you would find more accurate, but not too long?

this seems like it ought to be possible

Indeed. I already said that it is possible, just above. You seem to have somehow read me as saying it isn’t, or that it’s very unlikely, whereas I said that it’s possible but we can’t assign a probability.

It is impossible to predict how it will accomplish any specific goal it has, of course, but this is not the only possible kind of reasoning we can do about it.

I’m not sure what other sort of reasoning you propose. We can’t know what its goals will be, and we can’t know how it would achieve its goals; what can we know?

From the chess analogy, maybe it’s “it can do anything whatsoever.” In this discussion, I grant that for the sake of the argument, and then suggest that no further reasoning is possible.

Wording

Robert 2023-02-25

Hey David,

To me, the phrasing “effectively insoluble” seems to imply some fundamental impossibility, rather than a prediction of low likelihood contingent on the current state of the world.

I might describe that as “very unlikely to be solved in time”, which is slightly longer but, I think, will leave the average reader with a more accurate impression of Eliezer’s thoughts on this question.

You seem to have somehow read me as saying it isn’t, or that it’s very unlikely, whereas I said that it’s possible but we can’t assign a probability.

Possibly I misunderstood the quoted section here:

All we can say is “nothing remotely similar to that has ever happened, and I don’t see how it could.”

I interpreted “I don’t see how it could” to refer to the scenario you describe above (or maybe the broader notion of ASI).

I’m not sure what other sort of reasoning you propose. We can’t know what its goals will be, and we can’t know how it would achieve its goals; what can we know?

I’m not sure whether this falls within the kind of reasoning you’d find interesting, but, e.g., there are compelling arguments for ASI having convergent instrumental goals (or see the Arbital page), and for its being coherent relative to humans, if not immediately then probably not long after it has secured local control over its environment. These do make pretty specific predictions that rule out many kinds of behavior, and so feel like useful forms of reasoning to me.

Bikeshedding

David Chapman 2023-02-26

Hi Robert,

These points seem like bikeshedding hypothetical misunderstandings of what I wrote. However, I will take your suggestions into account when I revise this page before freezing the text for paperback publication (if I do that). Thanks!
