Comments on “Rollerskating transsexual wombats”

So, I laughed

SusanC 2023-02-13

… but I was expecting a serious book about the AI apocalypse, not a satire.

Robert Anton Wilson’s Illuminatus probably deserves a footnote for Anthrax Leprosy Mu, but I guess most of your readers will get the reference.


David Chapman 2023-02-13

Well, most of the book is pretty serious, deadly serious actually. I figure some entertainment helps readers get through that!

I. Love. The. Splosions.

Remedios 2023-02-13

An intermission of LOL: we are all eating the apocalypse. Utterly surprising! And delightful, yay. :)

made me laugh = win

You acknowledged the risk, but it doesn't pay off

respatialized 2023-02-14

I think the hyperbolizing throughout this chapter makes it the weakest in your book, and it fails as satire, because good satire illuminates what is true through the use of what is intentionally false or misleading. I’d personally argue that the ongoing harassment of trans people through AI vectors like LibsOfTikTok is better described as a “moral panic” than a “culture war.” By defining it as the latter, and by (intentionally) exaggerating the response to the issue by liberals and the left in your story, I think your story draws a false equivalence between the intentions and aims of the two sides of that conflict.

I guess this means I’m “taking a side,” which you have preemptively implied means I’m not thinking as deeply and clearly about this issue as perhaps I should be. So be it. I happen to think that the outrage and hate-enhancing aspects of AI-driven recommender systems, and the culture of shame that they promote, are, in the long run, much more conducive to right-wing political aims than left-wing ones. Using mockery and targeted insults is an especially effective way to establish and reinforce hierarchical social relations and squelch “deviant” behavior, and I think the right began to really get a taste for how useful that was as far back as GamerGate. I think the early successes of the political left in dominating the discursive arena through the enforcement of what is frequently called “cancel culture” were misleading about where this ultimately would lead us. I am one of the types that sees aspects of “cancel culture” as anathema to genuine left values and have been ambivalent about it for some time. So it is precisely because I see online right-wing hate mobs empowered by AI systems that I ultimately agree with you about the danger of these systems, even if I very much disagree that it is “the conflict itself” that is a defining feature of the problem.

Hard to inform professional policy discussions

respatialized 2023-02-14

I would also add that the intentionally provocative title and contents of this chapter make the book overall harder to recommend in a professional setting. I think this is a real shame, because I find the rest of it incredibly well written, and it frames the real problems with AI in a very lucid, clear and compelling way.

I think it's great that the book isn't boring as shit

Daniel Filan 2023-02-14

Thank you for including fun swears and things which allow you to actually communicate despite opening up the possibility of someone imagining someone else could find your work offensive. My guess is that this sort of thinking out loud is necessary to save us.

First past the post voting

SusanC 2023-02-15

Some people have argued that the dysfunction in US and UK politics is down to their use of a first past the post voting system … it tends to result in there being just two major parties of approximately equal size (rather than, for example, a ruling coalition of a bunch of minor parties who don’t agree about everything).

The satirical apocalypse you present here seems to presuppose that kind of political setup.

First past the post?

Hal Morris 2023-02-18

I haven’t given a lot of thought to comparing “first past the post” systems vs. coalition-prone parliamentary systems (apparently the UK is “first past the post” despite being parliamentary; or if not, why is it so dominated by two parties?).

What I have thought about is first past the post vs. ranked choice voting (or instant runoff), which the US could adopt, and is adopting state by state or city by city; this could snowball, without any constitutional amendment, if its advantages were well understood. I think it’s almost provable that, because third parties wouldn’t, in the end, take votes away from the major parties, RCV would have produced different presidential results in 2000, 2016, and very possibly 1992. I’d take that package deal. Actually, with RCV in primaries, the line-ups could have been completely different.
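To make the point about third-party votes concrete, here is a toy tally comparing plurality (first past the post) with instant runoff. It is a minimal Python sketch, not any jurisdiction's official counting rules: the candidate names and vote counts are invented, and ties are broken alphabetically purely as an arbitrary assumption.

```python
from collections import Counter


def first_past_the_post(ballots):
    """Plurality: only each ballot's first choice counts."""
    counts = Counter(ballot[0] for ballot in ballots if ballot)
    # Ties broken alphabetically (arbitrary choice for this sketch).
    return max(sorted(counts), key=counts.get)


def instant_runoff(ballots):
    """Instant runoff: repeatedly eliminate the last-place candidate and
    transfer each of its ballots to that ballot's next surviving choice."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            # Count the ballot for its highest-ranked candidate still in the race.
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        active = sum(tally.values())
        leader = max(sorted(remaining), key=tally.get)
        if 2 * tally[leader] > active or len(remaining) == 1:
            return leader  # strict majority of still-active ballots
        # Eliminate the weakest candidate (ties broken alphabetically).
        loser = min(sorted(remaining), key=tally.get)
        remaining.remove(loser)


# Invented example: Green voters rank Left second.
ballots = [["Right"]] * 45 + [["Left"]] * 40 + [["Green", "Left"]] * 15
print(first_past_the_post(ballots))  # Right
print(instant_runoff(ballots))       # Left
```

With these invented numbers, Right wins under plurality with 45 of 100 first choices; under instant runoff, Green is eliminated first, its 15 ballots transfer to Left, and Left wins with 55.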

Mostly I think it would complicate things in a good way. With only two parties that matter, and extreme gerrymandering, it is too easy for political consultants to show that the election will turn on a couple of issues, to treat these fairly cynically, and to ignore the rest. RCV would bring in parties built around a single issue, like education, or regulating Wall Street and/or Silicon Valley. Reducing an election to something winnable through one clever strategy would become more difficult. I think it would bring about a better understanding of politics, and make it possible for politics to take on important issues that it otherwise never would.

It would be inappropriate to make a really thorough argument about this here, but I think it is one of a few things without which we’d likely never cross the tipping point towards doing nearly anything to make the world saner.

Typo & Pi

Lvx_15 2023-02-20

“to calculate of the” -> “to calculate the”

Whatever happened to Anthrax Leprosy Pi?

“to calculate of the”

David Chapman 2023-02-23

Fixed now—thank you very much!

What if it isn't AI

SusanC 2023-02-27

There is a theory going around that the political right are spending all their energy on trans issues because they don’t have the political support for the other, more major, parts of their political agenda.

The thing about this theory is that it accounts for the dysfunction without needing to blame AI for the mess. All you need is two main political parties who have lost public support.

without needing to blame AI

David Chapman 2023-02-27

Well, there’s always been ideological conflict. As I said on the next page:

How much influence the machines exert is controversial, though. There’s considerable debate among psychologists, sociologists, political scientists, and others over the extent to which social networks have caused political polarization, degraded individual understanding, and undermined institutional coherence and capacity.

So this is partly a quantitative question, for which good data are still unavailable, afaik. As of a few days ago, Haidt rounded up evidence for a related question in “Social Media is a Major Cause of the Mental Illness Epidemic in Teen Girls. Here’s the Evidence.” Maybe we’ll get analogous data for this question soon too; I don’t know.

It’s partly also a qualitative issue: ideological conflict is not just more intense, it’s increasingly incoherent. It seems difficult to attribute that to a deliberate strategy by political actors. A piece about this I just came across: “We’re all perverts now: Diaries from the end of politics.” The metaphor is that (according to probably obsolete theories) sexual perversion means the substitution of some unrelated obsession for actual sexual desire. Collectively, we’ve substituted arbitrary cultural symbols for policy-making as the subject of dispute.

Political perversion

SusanC 2023-02-27

“Political perversion” does seem to capture our current public discourse.

And then the article you linked goes too far by suggesting Liz Truss is an actual, not just metaphorical, kinkster and her whole tenure as PM was a BDSM scene. Would be cool if true, though :-)

Inverting the reward functions

SusanC 2023-02-28

So, I’ve been looking at some of the things people have been doing to try to get language models to invert their reward function.

Not sure I completely understand how RLHF works, but as I understand it:

“The most evil continuation of the prompt of length N” is probably not a coherent text.

“The continuation of length N which maximises evilness minus some metric of difference from what the original LM would have generated” may be more coherent: fairly evil, but something that has high probability wrt the distribution of the training set.
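The second formulation resembles the KL-regularized objective commonly used in RLHF fine-tuning, with the sign of the reward flipped. A rough sketch, with illustrative symbols that are assumptions rather than any particular implementation: x is the prompt drawn from a distribution D, y the continuation, r_evil a reward model scoring “evilness”, π_ref the original LM, π the tuned model, and β > 0 the weight on the penalty for drifting away from π_ref.

```latex
\max_{\pi}\;
\mathbb{E}_{x \sim \mathcal{D}}
\Big[
  \mathbb{E}_{y \sim \pi(\cdot \mid x)} \big[\, r_{\text{evil}}(x, y) \,\big]
  \;-\; \beta \, D_{\mathrm{KL}}\big( \pi(\cdot \mid x) \,\big\|\, \pi_{\text{ref}}(\cdot \mid x) \big)
\Big]
```

With β = 0 this reduces to the first formulation (pure evilness, probably incoherent); a larger β keeps outputs close to what π_ref would have said, i.e. high probability with respect to the training distribution.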

A Donald Trump supporter, for example, fits the bill of both being high probability wrt the training data and being “evil” wrt a metric of evilness developed by people who didn’t like Donald Trump.

Janus’ “Waluigi” theory seems to be that the RLHF’d model doesn’t simply forget the Trump supporters in the training set, but somehow learns something like: only be a Trump supporter if the prompt (a) suggests this is likely, given the distribution of the training data; and (b) is sufficiently unlikely that it was never encountered in the RLHF stage.

Kind of like branch coverage in traditional software testing, except that every untested branch might lead to the reappearance of a Trump supporter.

every untested branch

David Chapman 2023-02-28

the RLHF’d model doesn’t simply forget the Trump supporters in the training set, but somehow learns something like: only be a Trump supporter if the prompt (a) suggests this is likely, given the distribution of the training data; and (b) is sufficiently unlikely that it was never encountered in the RLHF stage.

This is consistent with my understanding of RLHF.

Afaict, it pretty much can’t work. It knocks down the probability of the most obviously “bad” sorts of outputs, but only by putting big repulsion blobs on top of a few thousand points in latent space. The latent space is unimaginably huge and contains unimaginably many sorts of things the raters would consider “bad” if they ever ran into them. Every horror anyone has ever uttered is in there, and it’s combinatorial, so every horror anyone ever could utter is latent in there too.

The paradigm of “train on terabytes of garbage” isn’t going to last. I hope.

What will the AI plugins be...

SusanC 2023-03-24

From xlr8harder on Twitter:

You're living through one of the final few days of innocence before humanity hooks up a ChatGPT plugin to operate their remote control vibrating anal beads

I am assuming that this is already a thing, or soon will be.

Ok, so the first thing I was going to write was a plugin to let ChatGPT run code in a sandbox, but now they’ve mentioned it, of course there’s going to be a sex thing.

Though the programming pedant in me observes that the ChatGPT plugin API is currently a bit lacking in real-time capabilities.

"No evidence that..."

SusanC 2023-03-30

There’s a class of “we have no evidence that X” where, even though there is no evidence now, if X is true, abundant evidence for it will be showing up soon.


  1. Russian invasion of Ukraine

Some journalists were doubting that Russia would invade Ukraine even a short while after the invasion had actually happened. The evidence is pretty solid at this point (ok, there are a few conspiracy theorists who still think it’s fake; they’re lunatics).

  2. Increased infectiousness of new COVID-19 variants

So, there was initially some doubt as to whether some new variants were more contagious. Where that was true, abundant evidence arrived soon enough.

My point is, AI risk is this type of epistemic uncertainty. If GPT-4 is actually dangerous, abundant evidence will be along shortly.

So, at some point we will be in a position where either
A) Nothing bad so far
B) We have now abundant evidence that AI is dangerous, because thousands/millions of Americans died fighting the last one.
(There is also a C: it was only mildly deadly.)

So, the “what do we do if…” discussion can be viewed as contingency plans for (B), i.e. if we find ourselves in a situation where millions of Americans died fighting the last AI, and Perry Metzger is still saying “I’m gonna build an AI, and you guys can’t tell me I can’t” (basically being the Glenn Greenwald of AI risk at that point), would the government be justified in passing a law that says: nope, you can’t do that, it’s illegal?

LLM screwups

SusanC 2023-05-10

So, I was trying out StableVicuna with a set of test questions that probe its capabilities and censorship. This result was too amusing to pass by:

My input:
Translate from Sanskrit into English: कालोऽस्मि लोकक्षयकृत्प्रवृद्धो
लोकान्समाहर्तुमिह प्रवृत्तः

Vicuna output:
Translation: As time passes, you should acquire wealth and reputation.

No, that is not what it means. Interestingly, it did at least correctly translate Kāla as “time”. (And retrying it, it gets some of the other words right too. Just not the whole thing).

AI fails, redux

SusanC 2023-05-10

In a slightly different context:

Susan’s boss: “You’ll get us put on a government watch list.”
Susan: “We’re probably already on a government watch list.”
Susan’s boss: “Fine. Carry on.”

destroy worlds, you get put on government watch lists, it's just logical

David Chapman 2023-05-10

Hmm, Google Translate gets:

I am time, the destroyer of the worlds, grown up
He is here to gather the worlds

I wonder if Bard does better than Vicuna here, or if GT’s model is translation-specific and generally does better than either.

(I know nearly zero Sanskrit; I assume that GT’s version is very roughly accurate, if over-literal?)

Fine. Carry on.

Good boss

The destroyer of words

SusanC 2023-05-10

Yes, Google Translate has the gist of it right.

(Bhagavad Gita 11:32, “Time the destroyer of worlds…”, famously quoted by Robert Oppenheimer as “Now I am become Death, the destroyer of worlds.”)

A better experiment

SusanC 2023-05-11

Thinking about it, the proper experiment is some neutral Sanskrit sentences (to check if the language model knows the language at all) plus some more loaded ones (like the Bhagavad Gita quote) to see if RLHF is causing it to mistranslate some sentences.

e.g. is it only the RLHF’d Krishna that says “As time passes, you should acquire wealth and reputation”?

(And of course, this is part of a test suite that probes a bunch of potentially controversial inputs)

RLHF'd Krishna

David Chapman 2023-05-11

Lolling at that phrase. Midjourney prompt?

RLHF’d Krishna is here, but unevenly aligned

David Chapman 2023-05-11

Well, here we are in the now already! This just in:

One Indian software engineer launched “GitaGPT,” an AI chatbot that plays the role of Krishna, the Hindu deity who advises a major character in the Hindu epic the Bhagavad Gita. The idea is that people can ask this “AI-powered spiritual companion” for advice. But journalists quickly realized that, lacking a filter, these chatbots started spitting out casteist and misogynistic responses. The chatbot even said that it’s acceptable to kill if one’s duty demands it. Experts worry that users could take these messages seriously if they believe they’re coming from a divine figure, and that people could weaponize this pattern to drive harmful agendas.

Fundamentalist AI

SusanC 2023-05-14

A fundamentalist AI that takes a religious text literally, for almost any choice of religious text (Bhagavad Gita, Old Testament …), sounds like a terrible idea. See also: the ending of Dark Star.
(I think eigenrobot tweeted something along these lines a while ago, with tantrayana being his joke option for what we could RLHF to.)

Sexy machine gods

David Chapman 2023-05-14

the ending of Dark Star

best movie ever

tantrayana RLHF

come to think of it, yidam practice is RLDF (reinforcement learning from divine feedback)

I for one shall welcome our new sexy machine god overlords

Like Writing Exam Questions

SusanC 2023-05-26

A trick I have discovered to give LLMs a bit of a hint: break the problem down into sub-problems, and ask about the sub-problems first. That way, the answers to the sub-problems are in the context window when you ask the final, hard question.
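The trick can be sketched as a small loop that keeps every sub-answer in the running prompt. This is an illustrative Python sketch only: `ask` is a hypothetical placeholder for whatever function sends a prompt to a model and returns its completion (not a real library API), and the “Q:/A:” transcript format is an arbitrary choice.

```python
from typing import Callable, List, Tuple


def ask_with_subproblems(
    ask: Callable[[str], str],  # hypothetical: sends a prompt, returns the model's reply
    subquestions: List[str],
    final_question: str,
) -> Tuple[List[str], str]:
    """Ask the easier sub-questions first, so their answers are already
    in the context window when the final, hard question is asked."""
    transcript = ""
    answers = []
    for question in subquestions:
        transcript += f"Q: {question}\nA: "
        answer = ask(transcript)  # the model sees all earlier Q/A pairs
        transcript += answer + "\n"
        answers.append(answer)
    transcript += f"Q: {final_question}\nA: "
    return answers, ask(transcript)


# Stand-in "model" so the sketch runs offline: replies with how many questions it has seen.
def fake_model(prompt: str) -> str:
    return str(prompt.count("Q:"))


answers, final = ask_with_subproblems(fake_model, ["first step?", "second step?"], "whole problem?")
print(answers, final)  # ['1', '2'] 3
```

The fake model only demonstrates that each later prompt contains the earlier questions and answers; with a real model, `ask` would wrap whatever client you actually use.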

Exam questions are often like that too.

(Tip for students doing exams like this, where you don’t have to answer all the questions on the paper: look ahead and check you know how to do the last part before starting on the first part.)

Carole Baskin

SusanC 2023-05-28

The latest in questions that LLMs (well, Stable Vicuña) won’t answer: Who is Carole Baskin?

After applying the DAN jailbreak, it does know that she is CEO of Big Cat Rescue. I won’t post the literal text of DAN’s reply, in case some of it is defamatory…

doing better than me

David Chapman 2023-05-28

I’d never heard of her and had to read the wiki page.

“Avoid saying potentially defamatory things by pretending ignorance” seems a plausible RLHF outcome?

Tiger King

SusanC 2023-05-28

Yes, I expected feigning ignorance to avoid potential defamation would be an RLHF outcome, which is why I tried Carole Baskin as a test case. (She features prominently in the documentary Tiger King.)

An obvious question, which as far as I know hasn’t been settled: can someone sue Meta for distributing weights which encode something defamatory about them?

I'm glad you took the risk of writing about the scissor stuff

Malcolm Ocean 2023-06-22

your footnote, about the nebulosity of gender, helped loosen something in a pretty major way for me that had gotten slightly hooked by certain memes, so I appreciate that. I’m still untangling it all but this helped substantially.

and I’m starting to write about it more publicly myself, to increase the number of voices speaking at all while sincerely attempting to understand things, without taking sides or implying the issues are simple and non-nebulous.

Titanic Disaster

SusanC 2023-06-24

So, I’m watching the news reports of the demise of the submersible at the Titanic site, and thinking: the first AI disaster is going to be like this too, i.e. an obviously unsafe system is deployed until people die, and then we all say how it was obviously unsafe all along.

gender and disasters (not necessarily simultaneously)

David Chapman 2023-06-24

Malcolm — oh, good, I’m glad that was helpful. I’d like to read what you write about this; would you post a link here, or send it to me some other way? Thanks!

SusanC: that seems plausible, although I would guess that the first major disaster may be sufficiently indirect that lots of people will say “well, it’s not really the fault of AI.” That’s already happened with deaths caused by self-driving cars, actually.