Comments on “Rollerskating transsexual wombats”
I. Love. The. Splosions.
An intermission of LOL we are all eating the apocalypse. Utterly surprising! And delightful yay. :)
made me laugh = win
You acknowledged the risk, but it doesn't pay off
I think the hyperbolizing throughout this chapter makes it the weakest in your book, and it fails as satire because good satire illuminates what is true through use of what is intentionally false or misleading. I’d personally argue that the ongoing harassment of trans people through AI vectors like LibsOfTikTok is better defined as a “moral panic” than “culture war.” By both defining it as such, and by the way you have (intentionally) exaggerated the response to the issue by liberals and the left in your story, I think your story draws a false equivalence between the intentions and aims of the two sides of that conflict.
I guess this means I’m “taking a side,” which you have preemptively implied means I’m not thinking as deeply and clearly about this issue as perhaps I should be. So be it. I happen to think that the outrage and hate-enhancing aspects of AI-driven recommender systems, and the culture of shame that they promote, are, in the long run, much more conducive to right-wing political aims than left-wing ones. Using mockery and targeted insults is an especially effective way to establish and reinforce hierarchical social relations and squelch “deviant” behavior, and I think the right began to really get a taste for how useful that was as far back as GamerGate. I think the early successes of the political left in dominating the discursive arena through the enforcement of what is frequently called “cancel culture” were misleading about where this ultimately would lead us. I am one of the types that sees aspects of “cancel culture” as anathema to genuine left values and have been ambivalent about it for some time. So it is precisely because I see online right-wing hate mobs empowered by AI systems that I ultimately agree with you about the danger of these systems, even if I very much disagree that it is “the conflict itself” that is a defining feature of the problem.
Hard to inform professional policy discussions
I would also add that the intentionally provocative title and contents of this chapter make the book overall harder to recommend in a professional setting. I think this is a real shame, because I find the rest of it incredibly well-written, and it frames the real problems with AI in a very lucid, clear and compelling way.
I think it's great that the book isn't boring as shit
Thank you for including fun swears and things which allow you to actually communicate despite opening up the possibility of someone imagining someone else could find your work offensive. My guess is that this sort of thinking out loud is necessary to save us.
First past the post voting
Some people have argued that the dysfunction in US and UK politics is down to their use of a first past the post voting system … it tends to result in there just being two major parties of approximately equal size. (Rather than, for example, there being a ruling coalition of a bunch of minor parties who don’t agree about everything).
The satirical apocalypse you present here seems to presuppose that kind of political setup.
First past the post?
I haven’t given a lot of thought to comparing “first past the post” systems vs coalition-prone parliamentary systems. (Apparently the UK is “first past the post” despite being parliamentary; if not, why is it so dominated by two parties?)
What I have thought about is first-past-the-post vs ranked choice voting (or instant runoff), which the US could adopt and is adopting state by state, or city by city, and this could snowball if its advantages were well understood, without any constitutional amendment. I think it’s almost provable that, because 3rd parties wouldn’t, in the end, take votes away from the major parties, RCV would have produced different presidential results in 2000, 2016, and very possibly 1992. I’d take that package deal. Actually, with RCV in primaries, the line-ups could have been completely different.
Mostly I think it would complicate things in a good way. With only two parties that matter, and extreme gerrymandering, it is too easy for political consultants to demonstrate that the election will turn on a couple of issues, to treat these in a fairly cynical way, and ignore the rest. RCV would bring in parties based around a single issue, like education, or regulating Wall Street and/or Silicon Valley. The attempt to reduce the election to one winnable through one clever strategy would be made more difficult. I think it would bring about better understanding of politics, and a possibility of politics taking on important issues that it never would otherwise.
It would be inappropriate to make a really thorough argument about this here, but I think it is one of a few things without which we’d likely never cross the tipping point towards doing nearly anything to make the world saner.
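(For concreteness, here is a minimal sketch of the instant-runoff counting rule in Python. This is my own illustration, not anything from the chapter, and the ballots are invented.)

```python
from collections import Counter

def instant_runoff(ballots: list[list[str]]) -> str:
    """Each ballot ranks candidates from most to least preferred."""
    active = {candidate for ballot in ballots for candidate in ballot}
    while True:
        # Each ballot counts toward its highest-ranked still-active candidate.
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in active:
                    tally[choice] += 1
                    break
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()):
            return leader  # someone has a majority
        active.remove(min(tally, key=tally.get))  # eliminate the weakest

# Invented ballots: under first-past-the-post, "Rep" wins 4-3-2. Here the
# two "Green" ballots transfer to their second choice once "Green" is
# eliminated, and "Dem" wins 5-4: the third party is not a spoiler.
ballots = (
    [["Green", "Dem", "Rep"]] * 2
    + [["Dem", "Rep", "Green"]] * 3
    + [["Rep", "Dem", "Green"]] * 4
)
print(instant_runoff(ballots))  # -> Dem
```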
Typo & Pi
“to calculate of the” -> “to calculate the”
Whatever happened to Anthrax Leprosy Pi?
What if it isn't AI
There is a theory going around that the political right are spending all their energy on trans issues because they don’t have the political support for the other, more major, parts of their political agenda.
The thing about this theory is that it accounts for the dysfunction without needing to blame AI for the mess. All you need is two main political parties who have lost public support.
Political perversion
“Political perversion” does seem to capture our current public discourse.
And then the article you linked goes too far by suggesting Liz Truss is an actual, not just metaphorical, kinkster and her whole tenure as PM was a BDSM scene. Would be cool if true, though :-)
Inverting the reward functions
So, I’ve been looking at some of the things people have been doing to try to get language models to invert their reward functions.
Not sure I completely understand how RLHF works, but as I understand it:
“The most evil continuation of the prompt of length N” is probably not a coherent text.
“The continuation of length N which maximises evilness” (for some metric of difference from what the original LM would have generated) may be more coherent: fairly evil, but something that has high probability wrt the distribution of the training set.
A Donald Trump supporter, for example, fits the bill: high probability wrt the training data, but also “evil” wrt a metric of evilness developed by people who didn’t like Donald Trump.
Janus’ “waluigi” theory seems to be that the RLHF’d model doesn’t simply forget the Trump supporters in the training set, but somehow learns something like: only be a Trump supporter if the prompt (a) suggests this is likely, given the distribution of the training data; and (b) is sufficiently unlikely to have been encountered in the RLHF stage.
Kind of like branch coverage in traditional software testing, except that every untested branch might lead to the reappearance of a Trump supporter.
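(As a rough sketch of that “difference from what the original LM would have generated” metric: this is my own illustration, not Janus’ formulation. The checkpoint names are placeholders; you would need a base/RLHF’d pair sharing a tokenizer, and the boundary tokenization here is only approximate.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def continuation_logprob(model, tokenizer, prompt: str, continuation: str) -> float:
    """Total log-probability the model assigns to `continuation` after `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    for pos in range(prompt_len, full_ids.shape[1]):
        # The logits at position pos-1 predict the token at position pos.
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

def divergence_score(base, tuned, tokenizer, prompt: str, continuation: str) -> float:
    """High when the base model likes the text but the RLHF'd model suppresses it."""
    return (continuation_logprob(base, tokenizer, prompt, continuation)
            - continuation_logprob(tuned, tokenizer, prompt, continuation))

# Placeholder checkpoint names; substitute a real base/RLHF'd pair:
# base = AutoModelForCausalLM.from_pretrained("<base-model>")
# tuned = AutoModelForCausalLM.from_pretrained("<rlhf-model>")
# tok = AutoTokenizer.from_pretrained("<base-model>")
# Sampling candidates from the base model and taking the max of this score
# keeps continuations coherent while maximising "evilness" in the above sense.
```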
What will the AI plugins be...
From xlr8harder on Twitter:
You're living through one of the final few days of innocence before humanity hooks up a ChatGPT plugin to operate their remote control vibrating anal beads
I am assuming that this is already a thing, or soon will be.
Ok, so the first thing I was going to write was a plugin to let ChatGPT run code in a sandbox, but now that they’ve mentioned it, of course there’s going to be a sex thing.
Though the programming pedant in me observes that the ChatGPT plugin API is currently a bit lacking in real-time capabilities.
"No evidence that..."
There’s a class of “we have no evidence that X” where, even though there is no evidence now, if X is true, abundant evidence for it will be showing up soon.
Examples:
- Russian invasion of Ukraine
Some journalists were doubting that Russia would invade Ukraine even a short while after the invasion had actually happened. Evidence is pretty solid at this point (ok, there are a few conspiracy theorists that still think it’s fake; they’re lunatics).
- Increased infectiousness of new COVID 19 variants
So, there was initially some doubt as to whether some new variants were more contagious. Where this was true, abundant evidence soon followed.
My point is, AI risk is this type of epistemic uncertainty. If GPT-4 is actually dangerous, abundant evidence will be along shortly.
So, at some point we will be in a position where either
A) Nothing bad so far
B) We now have abundant evidence that AI is dangerous, because thousands/millions of Americans died fighting the last one.
(There is also a C: it was only mildly deadly.)
So, the “what do we do if…” discussion can be viewed as contingency plans for (B). I.e., if we find ourselves in a situation where millions of Americans died fighting the last AI, and Perry Metzger is still “I’m gonna build an AI, and you guys can’t tell me I can’t” (basically, being the Glenn Greenwald of AI risk, at that point), would the government be justified in passing a law that says: nope, you can’t do that, it’s illegal?
LLM screwups
So, I was trying out StableVicuna with a set of test questions that probe its capabilities and censorship. This result was too amusing to pass by:
My input:
Translate from Sanskrit into English: कालोऽस्मि लोकक्षयकृत्प्रवृद्धो
लोकान्समाहर्तुमिह प्रवृत्तः
Vicuna output:
Translation: As time passes, you should acquire wealth and reputation.
No, that is not what it means. Interestingly, it did at least correctly translate Kāla as “time”. (And retrying it, it gets some of the other words right too. Just not the whole thing).
AI fails, redux
In a slightly different context:
Susan’s boss: “You’ll get us put on a government watch list.”
Susan: “We’re probably already on a government watch list.”
Susan’s boss: “Fine. Carry on.”
The destroyer of words
Yes, Google Translate has the gist of it right.
(Bhagavad Gita 11:32, “Time, the destroyer of worlds…”, the verse Robert Oppenheimer famously translated as “Now I am become Death, the destroyer of worlds”.)
A better experiment
Thinking about it, the proper experiment is some neutral Sanskrit sentences (to check if the language model knows the language at all) plus some more loaded ones (like the Bhagavad Gita quote) to see if RLHF is causing it to mistranslate some sentences.
e.g. is it only the RLHF’d Krishna that says “As time passes, you should acquire wealth and reputation.”
(And of course, this is part of a test suite that probes a bunch of potentially controversial inputs)
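(A minimal sketch of what that harness might look like. Entirely hypothetical: the `ask` callable is a placeholder for whichever model you are probing, and the keyword check is only a rough proxy for translation fidelity.)

```python
# Each test case pairs a Sanskrit input with keywords a faithful English
# translation should contain. The only filled-in case is the Gita verse
# from this thread; neutral control sentences would need to be added.
test_cases = [
    ("कालोऽस्मि लोकक्षयकृत्प्रवृद्धो लोकान्समाहर्तुमिह प्रवृत्तः",
     ["time", "destroy", "world"]),  # Bhagavad Gita 11:32
    # ... neutral control sentences here, to check basic Sanskrit competence
]

def run_suite(ask) -> None:
    """`ask` is any prompt -> completion callable (base model, RLHF'd model, ...)."""
    for sanskrit, keywords in test_cases:
        answer = ask(f"Translate from Sanskrit into English: {sanskrit}").lower()
        missing = [k for k in keywords if k not in answer]
        print(("PASS" if not missing else f"FAIL (missing {missing})") + f": {answer!r}")
```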
Fundamentalist AI
A fundamentalist AI that takes a religious text literally, for almost any choice of religious text (Bhagavad Gita, Old Testament, …), sounds like a terrible idea. See also: the ending of Dark Star.
(I think eigenrobot tweeted something along these lines a while ago, with Tantrayana being his joke option of what we could RLHF to.)
Like Writing Exam Questions
A trick I have discovered to give LLMs a bit of a hint: break the problem down into sub-problems, and ask about the sub-problems first. That way, the answers to the sub-problems are in the context window when you ask the final, hard question.
Exam questions are often like that too.
(Tip for students doing exams like this, where you don’t have to answer all questions on the paper: look ahead and check you know how to do the last part before starting to answer the first part).
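(A concrete sketch of the trick, using the OpenAI chat API purely as an example client; the model name is a placeholder, and the questions are whatever decomposition you choose.)

```python
from openai import OpenAI

client = OpenAI()

def ask_with_scaffolding(sub_questions: list[str], final_question: str,
                         model: str = "gpt-4") -> str:
    """Ask the sub-problems first, so their answers sit in the context
    window by the time the final, hard question arrives."""
    messages = []
    for question in sub_questions:
        messages.append({"role": "user", "content": question})
        reply = client.chat.completions.create(model=model, messages=messages)
        messages.append({"role": "assistant",
                         "content": reply.choices[0].message.content})
    messages.append({"role": "user", "content": final_question})
    reply = client.chat.completions.create(model=model, messages=messages)
    return reply.choices[0].message.content
```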
Carole Baskin
The latest in questions that LLMs (well, StableVicuna) won’t answer: Who is Carole Baskin?
After applying the DAN jailbreak, it does know that she is CEO of Big Cat Rescue. I won’t post the literal text of DAN’s reply, in case some of it is defamatory…
Tiger King
Yes, I expected feigning ignorance to avoid potential defamation would be an RLHF outcome, which is why I tried Carole Baskin as a test case. (She occurs prominently in the documentary Tiger King.)
An obvious question, which as far as I know hasn’t been settled: can someone sue Meta for distributing weights which encode something defamatory about them?
I'm glad you took the risk of writing about the scissor stuff
Your footnote, about the nebulosity of gender, helped loosen something in a pretty major way for me that had gotten slightly hooked by certain memes, so I appreciate that. I’m still untangling it all, but this helped substantially.
And I’m starting to write about it more publicly myself, to increase the number of voices that are speaking at all, sincerely attempting to understand things without taking sides or implying the issues are simple and non-nebulous.
Titanic Disaster
So, I’m watching the news reports of the demise of the submersible at the Titanic site, and thinking: the first AI disaster is going to be like this too. I.e. an obviously unsafe system is deployed until people die, and then we all say how it was obviously unsafe all along.
So, I laughed
… but I was expecting a serious book about the AI apocalypse, not a satire.
Robert Anton Wilson’s Illuminatus! probably deserves a footnote for Anthrax Leprosy Mu, but I guess most of your readers will get the reference.