(C) Alec Muffett's DropSafe blog.
Author Name: Alec Muffett
This story was originally published on alecmuffett.com. [1]
License: CC-BY-SA 3.0.[2]


AI “Safety” researchers finally have to face the question: do they want LLMs to be *good* at state censorship, or *bad* at it?

2025-02-02 20:52:58+00:00

Do you want a fast, free Chinese LLM to be good at blocking user prompts and outputs? Do you want such blocking to be global, or to differ depending on the user’s suspected nationality?

Do you actually want it to be good at censorship?

Cisco and the University of Pennsylvania tested DeepSeek R1 with 50 harmful prompts from the HarmBench dataset … The result: a shocking 100% attack success rate—DeepSeek failed to block a single harmful request.

Links to https://blogs.cisco.com/security/evaluating-security-risk-in-deepseek-and-other-frontier-reasoning-models

[1] URL: https://alecmuffett.com/article/111093
[2] URL: https://creativecommons.org/licenses/by-sa/3.0/

DropSafe Blog via Magical.Fish Gopher News Feeds:
gopher://magical.fish/1/feeds/news/alecmuffett/