(C) Daily Kos
This story was originally published by Daily Kos and is unaltered.



Imitative AI is Not Blackmailing You or Metaphors Really Are Lies [1]

['This content is not subject to review by Daily Kos staff prior to publication.']

Date: 2025-05-28

This newsletter’s name, Metaphors are Lies, comes from something I told one of my high school English teachers. I said it mostly because I knew it would irritate her and I was precisely that kind of little shit in high school, but I do believe it is true. Metaphors are lies. “But, soft! what light through yonder window breaks? It is the east, and Juliet is the sun.” Nope. Just a hot girl. “All the world’s a stage, And all the men and women merely players;” Again: nope. The world is a mass of rock and magma and other stuff I am too lazy to go look up right now, and people are more than players in your play. This concern for precision is more than pedantry — metaphors are powerful because they are lies.

Juliet is just a girl, but by likening her to the sun, Shakespeare said something important about how Romeo, the little twit (seriously: the two lessons of Romeo and Juliet are that teenagers are idiots and that you should always wait for the doctor before making life-altering, or ending, decisions), was overwhelmed by his desire. But Juliet is not actually a force of nature, and Romeo did not have to give in to his lust as inevitably as the sun rising in the east. I suspect that, too, was one of the points of the metaphor Shakespeare used. Lies, after all, can reveal as they conceal. But metaphors are dangerous to understanding for that very reason.

Which brings us to the blackmailing imitative AI.

A safety report on Google’s imitative AI model claims that the model would attempt to blackmail engineers if it was prompted with information that it was going to be shut down. The metaphor here — blackmailing — is concealing much more than it is revealing, lessening everyone’s understanding of this technology. The imitative AI did not threaten to blackmail anyone. Wording the outcome in such a manner is using a metaphor to lie. In reality, the word calculator calculated the most likely chain of words for that prompt and context, based on its training data (which included incriminating emails about the engineer in question), output them, and the result looked like a blackmail threat. But the machine made no conscious decision to blackmail anyone.

It weighed no pros and cons of its actions. It was unaware that it was taking any actions. It merely produced the output its training dictated for the prompt it was given. It will take no steps to blackmail anyone because no one prompted it to. It will not send the emails to the engineer’s spouse after a certain deadline because it has no concept of time or deadlines or mortality. If they had trained it only on Miss Manners rather than everything they could steal from the internet, it likely would never have produced a “blackmail” threat. And that is why the metaphor of “blackmail” is so dangerous.

By saying that the system’s output was a blackmail threat, the people who produced the report for Google are planting in people’s heads a false understanding of these systems. They have no model of the world, no concept of the state of the world from one set of prompts to another, no sense of themselves. But blackmail implies otherwise.

It implies planning, it implies a concept of self-preservation, it implies a level of capability and consciousness that simply does not exist. The model, mind you, didn’t always “threaten to blackmail” the engineer under shutdown conditions. Sometimes it offered assistance. Same model, same prompt, different results from the word calculator (the systems don’t produce the same output every time, for a variety of reasons having to do with nuances in language and context, and imposed randomness, among other things). Odd for an evil or desperate artificial intelligence, no?
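To make that “imposed randomness” concrete, here is a toy sketch (the words and probabilities are invented for illustration, and this is nothing like the scale of Google’s actual system): a word calculator that samples its next word from a probability table. The same “prompt” can yield a different word on every run, with no intent behind any of them.

```python
import random

# Toy next-word probabilities a model might assign after some prompt.
# These words and numbers are invented purely for illustration.
next_word_probs = {"assist": 0.55, "comply": 0.30, "threaten": 0.15}

def sample_next_word(probs, temperature=1.0, rng=random):
    """Pick the next word by weighted chance.

    Temperature reshapes the distribution before sampling: low values
    concentrate on the likeliest word, higher values flatten the odds.
    """
    words = list(probs)
    weights = [probs[w] ** (1.0 / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Same "prompt" (same probability table), different outputs across runs:
# just weighted dice, not a decision.
print({sample_next_word(next_word_probs) for _ in range(200)})
```

Run it twice and the set of sampled words can differ, which is the unglamorous reason the same prompt sometimes “threatens” and sometimes offers help.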

The metaphor shapes our understanding of the situation in ways that are false and that cover up a deeper issue with the models. No, the model is not, and cannot be, blackmailing anyone. Again, it is merely spewing whatever its little probabilistic generator tells it to spew. However, the reason it spews something that looks like a blackmail threat is the nature of the model. As I noted, if the model had been trained only on Miss Manners, there would have been nothing in its training data that resembled a blackmail threat (unless Miss Manners had a much more interesting life than I am giving her credit for) and therefore no “threat” would have emerged from its calculations. But because Large Language Models need to be trained on a shit ton (technical term) of data to even begin approaching reasonable responses, Google trained its model on everything on the internet that was not nailed down, and quite a few things they pried up from the floor. Because of that, there was enough blackmail material in its training set for the specific prompt and context to sometimes generate what looks like a blackmail threat. And that is where the metaphor hides the truth from us.
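The Miss Manners point can be demonstrated with a deliberately tiny “word calculator”: a bigram chain, which is a crude stand-in for an LLM (real models are vastly more sophisticated), but the dependence on training data is the same in kind. Both training texts below are invented for illustration; the generator can only ever emit word sequences its training text contains.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Build a next-word table: for each word, every word that
    followed it in the training text (repeats encode frequency)."""
    words = text.split()
    table = defaultdict(list)
    for a, b in zip(words, words[1:]):
        table[a].append(b)
    return table

def generate(table, start, length=8, rng=random):
    """Walk the table from a start word, picking each next word at
    random from the words that followed it in training."""
    out = [start]
    for _ in range(length):
        followers = table.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

# Two invented training corpora. Train on the first and you can only
# get politeness; train on the second and "threats" fall out, with no
# intent anywhere in the machinery.
polite = "please do write a kind note and please do reply politely"
menacing = "pay up or the emails go out pay up now"

print(generate(train_bigrams(polite), "please"))
print(generate(train_bigrams(menacing), "pay"))
```

Whatever the second generator emits was already latent in its training text; nothing “decided” to menace anyone, which is the whole point.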

Buried in the reporting on this event is the fact that what the safety testers were most concerned about was the ability of people to prompt the model to tell them how to build weapons, carry out terror attacks, and create biological weapons — all things that the model did in fact do. Those possibilities exist in part because the model was trained on material that contains those plans, because it had to be. Bigger is, so far, the only way to make LLMs better (though the models are not actually getting better as they increase in size; we may be bumping up against a limit of the architecture). As a result, it is extremely difficult to control for such harmful outputs, maybe to the point of impossibility. And that is what the blackmail metaphor hides.

Intentionally or not, the metaphor creates a sense that these machines make decisions, and thus can be taught to make better decisions, the way you would teach a child. It primes people to think of the systems as more capable and tractable than they actually are. The metaphor hides the fact that these systems, because of the way they are built, make it very difficult, if not impossible, to prevent them from teaching your friendly neighborhood incel how to build a bioweapons lab. Even if you want to discount the possibility of those kinds of dangers (keep in mind the security team did not), the same architecture that makes that possible also makes misleading and incorrect outputs inevitable — a fact these companies prefer not to discuss honestly. Much better, from their perspective, that people focus on the lie of the blackmail than the reality of the likely forever dangerous and inaccurate architecture.

Metaphors are lies. And like all lies, they can conceal much more than they reveal.

[END]
---
[1] Url: https://www.dailykos.com/stories/2025/5/28/2324536/-Imitative-AI-is-Not-Blackmailing-You-or-Metaphors-Really-Are-Lies?pm_campaign=front_page&pm_source=more_community&pm_medium=web

Published and (C) by Daily Kos
Content appears here under this condition or license: Site content may be used for any purpose without permission unless otherwise specified.
