Comments on “Superintelligence”
Unusual statement of the paperclip maximizer problem
I’m not a fan of the paperclip maximizer as a demonstration of AI risk; it’s unrealistic enough that it can be dismissed as fantasy. A more realistic example now would be a bitcoin maximizer. How I’ve usually seen it used, though, is as a demonstration of risks around alignment: that even a goal as boring as creating paperclips can be disastrous if it drives a sufficiently powerful, uncontrolled, and poorly aligned AI system. Having the system arise spontaneously and decide on its own to create paperclips sort of misses this point. Are you making a different point here that I’m missing?
risks versus facts
Haven’t read the rest yet, but I’ll post my first impression before moving on: there’s a sense of urgency that, so far, isn’t justified and seems to be based on bare assertions and philosophical trickery. It would be better to acknowledge that you haven’t shown anything yet.
What kind of trickery? For example:
There is an unquantifiable risk of an infinite disaster that cannot be prevented. It is unquantifiable, and there is no definite way of preventing it. This is a brute fact.
In this case, whatever your reasoning might be, such an assertion doesn’t seem to be about a concrete fact as we normally use the term? In what sense does an “unquantifiable risk” exist in the real world? This is more like a prediction or scenario than a thing that actually exists.
Predictions and scenarios, however dire or justified, aren’t facts, though they might be justified using a factual basis.
I’m generally of the opinion, though, that it’s not given to us to know the future, which is a sort of similar belief. If we cannot know the future then we can’t quantify the risks. And we can’t prevent everything. So that’s sort of the same, but the sentiment is different. Not knowing the future is normal. It would be weird to think you could quantify every risk and prevent every disaster. We just do what we can for the risks we know about.
Let me get back to you
Uh, I think it’s less about object-level differences and more about feel. I’m not sure which emotions you’re going for. Like, do you want your reader to have patience and remain calm while you explain some conceptual issues at length (which seems like more your usual writing style) or feel impatient about an urgent problem? I generally prefer patience, but it does seem like an urgent problem?
To exaggerate: “millions of people could die! But first, let me clarify a few things.”
But I’m thinking maybe I should wait until I read the whole thing before commenting further.
Eliezer does not think Alignment is "insoluble"
David,
You say:
Eliezer Yudkowsky has recently concluded—correctly I believe—that this “alignment” problem is effectively insoluble.
Eliezer does not think the alignment problem is insoluble. In the very post you cite to this effect, he literally says:
None of this is about anything being impossible in principle. The metaphor I usually use is that if a textbook from one hundred years in the future fell into our hands, containing all of the simple ideas that actually work robustly in practice, we could probably build an aligned superintelligence in six months.
Separately, these bits seem confusingly phrased to me:
Is it likely? I don’t see any way, currently, to say. All we can say is “nothing remotely similar to that has ever happened, and I don’t see how it could.”
I’m not sure if this is referring to the exact scenario described above (which does seem very unlikely to me) or to any instantiation of artificial superhuman intelligence (which seems very likely to me). “This has never happened before, and is therefore impossible to predict” is a weird kind of reference-class tennis. In real life, nothing has ever happened before in the exact same way. We are always reasoning based on a collection of heuristics, models, and priors. We have already achieved superhuman performance across multiple domains. We have good reasons to believe that humans are not especially good at scientific reasoning (and therefore scientific research); that domain is not where the optimization pressure of evolution was applied. It would not be a crazy leap of deductive reasoning to say “this seems like it ought to be possible by default, absent some convincing argument to the contrary”. No such argument has been presented, to my knowledge. (As you note, many people have presented many obviously bad arguments.)
Since superintelligence is inconceivable by definition, it is impossible to reason about, and factual considerations are irrelevant.
Superintelligence is not impossible to reason about. It is impossible to predict how it will accomplish any specific goal it has, of course, but this is not the only possible kind of reasoning we can do about it. An easy analogy is chess: if you start a game against a state-of-the-art chess AI, you can predict with very high confidence that you will lose. (This is a kind of reasoning you can perform about such a system.) You can’t predict what sequence of moves it will make without playing the game out. (You can try to poke the internals of the system to figure out what move it would make in response to any given board position, but while you might manage to win a game that way within a human lifespan, this becomes much less helpful in any domain less narrowly constrained than a board game.)
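To make the chess analogy concrete, here is a minimal sketch, assuming the python-chess library and a local Stockfish binary (both my own additions, not anything from the page or the comment above): you can ask the engine what it would play from any position you hand it, but to learn its later choices you have to play the game out move by move.

```python
# Minimal sketch of "poking the internals": query a strong engine for its
# move in a given position. Assumes python-chess is installed and a
# Stockfish binary named "stockfish" is on the PATH (both assumptions).
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("stockfish")
board = chess.Board()  # the starting position

# Ask the engine what it would play from this position.
result = engine.play(board, chess.engine.Limit(time=0.1))
print("Engine's move from the starting position:", result.move)

# To learn its reply to any later position, you must reach that position
# first, i.e. actually play the game out move by move.
board.push(result.move)

engine.quit()
```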
Wording
Hey David,
To me, the phrasing “effectively insoluble” seems to imply some fundamental impossibility, rather than a prediction of low likelihood, which is contingent on the current state of the world.
I might describe that as “very unlikely to be solved in time”, which is slightly longer but I think will leave the average reader with a more accurate impression of Eliezer’s thoughts on this question.
You seem to have somehow read me as saying it isn’t, or that it’s very unlikely, whereas I said that it’s possible but we can’t assign a probability.
Possibly I misunderstood the quoted section here:
All we can say is “nothing remotely similar to that has ever happened, and I don’t see how it could.”
I interpreted “I don’t see how it could” to refer to the scenario you describe above (or maybe the broader notion of ASI).
I’m not sure what other sort of reasoning you propose. We can’t know what its goals will be, and we can’t know how it would achieve its goals; what can we know?
I’m not sure if this just isn’t in the reference class of the kind of reasoning you’d find interesting, but e.g. there are compelling arguments for ASI having convergent instrumental goals (or see the Arbital page), and for being coherent relative to humans, if not immediately then probably not too long after it’s secured local control over its environment. These do make pretty specific predictions that rule out many kinds of behavior, and so feel like useful forms of reasoning to me.
n=0 -> n=1
WRT superintelligence you say, “All we can say is ‘nothing remotely similar to that has ever happened, …’”. But isn’t humanity a pretty clear-cut example of that? A huge interlinking ecosystem of animals completely dominated in power in a blink (on an evolutionary timescale) by a new species that’s smarter, one that has the power to end life on Earth if it really wanted to.