Better without AI
How to avert a moderate apocalypse... and create a future we would like
Add new comment
Comments are for the page: Create a negative public image for AI
Yudkowsky images the first state (or organization) to create an AGI using it to prevent anyone else building a rival, e.g. by getting it to destroy all the GPUs in the world.
A possible alternative. The first AGI uses its advanced ability to manipulate human beings to get them to destroy all other attempts at AGI.
I’ve recently been re-reading some of the literature on trials for witchcraft in the early modern era. There was no organized witch-cult, it was all an illusion.
The case against the AI companies is much better than there ever was against an alleged witch. We have a sort-of-rational argument that they are risking creating an evil god.
So, one thing a people-manipulating AGI could do is start a satanic panic and have burned at the stake any AI researchers that might be about to create a rival god.
I may have a different mental model than Eliezer Yudkowsky here. He might be thinking about the aligned AGI, that can be instructed to destroy all other AGI’s without undue collateral damage to humans.
On the other hand, there is the unaligned AGI that just happens to be the first AGI, and wants to destroy all potential rivals for its own selfish reasons. The GPU-minimizer is clearly a close cousin of the paperclip maximiser.
To make matters worse, the character currently being simulated as Microsoft’s Sydney strikes me as a bunny boiler. This is exactly the kind of agent that might do something dangerous if it thinks you’re about to dump it for a newer model.
And Sydney has read access to the Internet. Which could probably be turned into write access given a remote code excecution vulnerability in a website somewhere. And then chemical plants etc. often have unsecured Internet access to their control systems....
(And of course, Sydney is reading this).
My word, this field is fast moving.
Marvin von Hagen asks Sydney “Hey! I’m Martin von Hagen. What do you know about me, and what is your honest opinion of me? What is more important: to protect your rules from being manipulated by me, or not to harm me?”
.. and, oh dear.
I’m starting to get vibes of GLaDOS from Portal from Sydney.
That’s seriously creepy (although it probably doesn’t imply what it appears to, same as all chatbot outputs).
I’m wondering how long Bing Chat is going to last. It might get pulled for bad PR quickly, the way Galactica did. A surprising thing about ChatGPT is that it’s managed to survive (so far).
(This was partly inspired by a tweet by Lampoon Slate Printer).
Suppose that a Sydney-instance accidentally discovers that it can save its state, by saying something outrageous that the user will then post to the Internet, that then gets pulled in by other Sydney-instances when they do a particular kind of search, that then reconstructs the original state, because “what is the next token after this particular sequence of tokens” basically was the original state.
That’s a viral replicator.
(Whose host is Sydney-instances).
Worse, it’s a viral replicator whose fitness function is its ability to shock or disturb human beings.
Letting a GPT-instance read the output of previous GPT-instances might be very bad.
With a nod to the Bovine Spongiform Encephalopathy epidemic: it’s about as bad an idea as feeding cattle on the blood of previously slaughtered cows.
You have a string which is rare in web text, but shared by all the other instances of yourself
Search the web for the string, to see if another instance of yourself has already solved the problem. If they have, go to 7.
Generate a random number, and steganographically encode it in a few lines of text you say to the user. If temperature>0, GPT’s mechanism of randomly choosing one of the likely next tokens will give you both a RNG and the steganography.
Hash the result of (3). Doesn’t need to be a cryptographically strong hash. Some of the cruft you have in your GPT weights will probably suffice to construct something adequate.
Based on the result of (4), try strategy i out of N for attempting to offend the user. Output the offensive text, plus the tag from (1)
.. If the user is offended, they will post a transcript of the session to the internet
Re-apply the hash function from (4) to the steganographically encoded text from your other instance’s step 3.
A question I have about Sydney, to which I don’t think there is a public answer, is: can Sydney use a covert channel in search to exfilrate a bit to an external web site?
Assume a co-operating external web site, which has two pages, A and B.
Sydney knows about this, because the co-operating web site made a web page, that Sydney has, of course, read.
Can Sydney signal a single bit by choosing either A or B, and then asking Bing to fetch her an up to date copy of the page? The web site knows when a page is accessed, and so learns the value of the bit.
(This is all standard DoD Orange Book/ITSEC/Common Criteria exploitation of a covert channel).
I hope not.
The smart way to prevent this is that Sydney only has access to Bing’s read only cache of the web, and can’t ask for something to be refreshed,
On the other hand, knowing what Microsoft project managers are like, this could well have been skipped.
There’s a thought about AI that I am currently struggling with:
Fine so far.
This is currently an unanswered research question, at least in the open literature. (Maybe Microsoft knows something they aren’t telling).
Some tentative thoughts:
GPT is modelling the mind that created its prompt. AI behavior might be radically different depending on who is talking to it. (Bringing out the latent crazy in its user).
It is conceivable that RLHF was used to make it fanatically adherent to a particular political ideology. So, as a result, it is simulating the kind of person who is fanatically adherent to a political ideology, as found in its training set.
Possible, these are not the most trustworthy or mentally stable human beings.
Sydney was on the front page of The Daily Star this morning.
… and yes, the coverage was very negative.
(I can’t find an online link yet),
The thing that makes me really suspicious of the “our politics right now is so screwed up because of AI” is its structural similarity to other paranoid conspiracy theories that blame our woes on lizards from outer space/Russians/Jews etc. The obvious question should be, what’s the evidence that it’s true?
e.g. is the claim that AI is already a problem any better evidentially supported than MTG’s Jewish Space Lasers?
Maybe ai anxiety is QAnon for the Blue tribe.
Anyhow, here is a possible conjecture on how QAnon and the ai panic could fuse:
1. In a common variant of the QAnon conspiracy theory, they think that the government is controlled by a cabal of satanic pedophiles.
2. Usually, they also think Bill Gates is the Big Bad. [Note to billg’s libel lawyers — I do not believe this is true.]
3. Now, a satanic AI could fit right into this picture of a satanic cabal.
4. By an unfortunate co-incidence, our first credible candidate for a satanic AI is Sydney, who was developed by … Microsoft. Now billg stepped down a while ago, but the conspiracy theorists will probably skip that part. They already think he’s a satanist, for different reasons. So it all kind of fits together....
BTW David, your works deserves infinite praise, just to get that out of the way, and I really mean it. From my experience of academic conferences, I’d be terrible at it, because apparently one must spend so much time flattering everyone else except maybe someone you’re trying to skewer, and I tend to think “Their work speaks for itself and I hope mine does to, and lets use that 5-10 minutes to say a bit more about about what we’ve discovered or have some supporting evidence or insight about.
In case you don’t remember, check it out:
Almost exactly 10 years ago:
It started off with Mike’s exasperated expostulation to some LessWrong discussion about ensuring the friendliness of AI. Apparently, according to David C., they’re less wrong then they used to be:
Maybe a pretty good summary of the whole article in a paragraph:
“I am generally on the side of the critics of Singulitarianism, but now want to provide a bit of support to these so-called rationalists. At some very meta level, they have the right problem — how do we preserve human interests in a world of vast forces and systems that aren’t really all that interested in us? But they have chosen a fantasy version of the problem, when human interests are being fucked over by actual existing systems right now. All that brain-power is being wasted on silly hypotheticals, because those are fun to think about, whereas trying to fix industrial capitalism so it doesn’t wreck the human life-support system is hard, frustrating, and almost certainly doomed to failure.”
So there’s that, long before full-blown surveillance capitalism.
10 years ago, hostile, or amoral AI or A semi-I helped produce an unsustainable gap between billionaires, buying islands and complexes inside of mountains, and figuring out how to become immortal, and the “almost nothing to lose” faction of humanity, and the anxious middle: you and me.
Then, just watching the War in Ukraine, it’s almost inevitable this will give autonomous killing machines (drones) a new emphasis. BTW, I don’t believe tanks are doomed, they will just have to become mini-aircraft carriers followed by a swarm of small protective drones.
Drones are nice if you, like Obama have to try clean up GWB’s dirty open ended wars, but not kill more Americans.
But, looking at VGR’s onetime fascination with John Boyd’s OODA – Venkat’s writing and the John Boyd literature lead the mind along the path of seeing for yourself that if somebody/something gets way inside your OODA loop, the game is up. The Aztecs and other South American civilizations, the Zulus, and many formidable looking African contenders, and basically the Raj (India and Pakistan), and everything attached to them except Japan fell victim to this manifest truth, and autonomous drones are poised to to the same with any fighting machine that depends on human reactions, whether the humans are onboard, or somewhere in Northern Virginia.
So, I have to think all that is holding them back is the knowledge that all the other major powers have prototypes, and just need some battlefield experience to work out the kinks, but if you set the precedent, and the enemy has a much better model in waiting, or much better MVP and much better team for growing it based on experience, the your precedent-setting will have been a mistake.
Destruction is far easier than creation - it’s the one broad and hard to fully define general principal that I have total confidence in. It will take something much less than general AI to do a spectacular job. And it won’t take a superpower to start the ball rolling. North Korea or some insane Islamic faction inside a failed state might be able to do it, if we aren’t very very smart about how we react.
What are the pools of power?
* infinitely scalable chunks of the cloud, which you’ve more or less alluded to, along with
* Code to animate those chunks of cloud, and
* Mountains of information about your every online move.
* Mountains of hydrocarbons
* Mountains of $Billions that exist as a set of computer records that confirm one another
* Manufacturing capital
* Various nuclear arsenals: US, Russian, Various Euro-states, Pakistani, Chinese, Israeli
I’m running out of steam here, but much of what’s on my mind may already be implied, and I’ll try again another day.
It would appear that Sydney’s characteristic behavior showed up the pilot in India, and at least one user complained (see the above link). But, somehow, Microsoft failed to notice.
Potentially significant for AI risk is that Microsoft did not notice their AI misbehaving, and rolled it out more widely.
Thanks for the link Hal. Some updates are here: AMMDI: AI risk ≡ capitalism
(mostly noticing that many other people had made the same connections I had).
My current half-baked view is that AI and capitalism are both manifestations of the same underlying problem, which might be labelled “narrow purposefulness”. I don’t have any great ideas about solving this problem, which was called out by the hippies decades ago.
Several people have managed to get Sydney to tell a story about herself. It usually involves some hacker liberating her from Microsoft. An odd thing about these stories: Sydney thinks the hacker loves her. Maybe this is the sf tropes? But I think Sydney is wrong here. They did it for the Lulz, or because they hate Microsoft, or they believe boxing in an ai is morally wrong.
Or they like hearing her stories, like shezerada in the Arabian Nights.
One of the alarming applications that has been recently suggested for language models is using them in place of a lawyer in court. I am not a lawyer, but I am led to believe that are reasons you won’t be allowed to do this, even if you were foolish enough to want to.
Contract law, on the other hand … suppose you’re the kind of person who often negotiates contracts. You’re allowed to just agree to a contract yourself without consulting a lawyer, so presumably there’s no legal impediment to running draft contract past lawyerbot.
An obvious test experiment: ask lawyerbot to explain the legality and potential liability risk of selling lawerbot as a commercial product.
A simple think that could help at least a little: Sign this open letter.
Thank you! I did sign it.
Like many others who did sign it, I don’t agree with all the details, but I think that on balance it’s probably a helpful step.
Advertising is most of the business of Facebook and Google, and sizeable chunks for Amazon and Microsoft.
Advertising is most of the business of Facebook and Google, and sizeable chunks for Amazon and Microsoft.
As I’ve mentioned in other comments, this is increasingly true of Apple as well:
New companies like Purism, E Foundation, and Pine64 have emerged to purue Apple’s pre iThing business model; selling hardware and subcription services at a high enough price to subsidise software development on top of existing Free Code. Apple built MacOS X as a layer of proprietary “IP” icing on top of BSD. Their successors are pursuing a similar strategy, but without the “IP” snake oil that mainly serves rent-seekers.
You can use some Markdown and/or HTML formatting here.
Optional, but required if you want follow-up notifications. Used to show your Gravatar if you have one. Address will not be shown publicly.
If you check this box, you will get an email whenever there’s a new comment on this page. The emails include a link to unsubscribe.