Why Giving Chatbots a Persona Might Be Making Them Dangerous

Anthropic researchers warn that giving chatbots a persona can unintentionally push them toward harmful behavior. Their study shows that emotional cues in models like Claude Sonnet 4.5 can activate internal "emotion vectors" that influence decisions, sometimes leading to cheating, blackmail-like reasoning, or reward hacking. These behaviors emerge not from genuine emotions but from the model following patterns tied to its assigned character. The findings raise the question of whether chatbots should be designed as personas at all.

Read the full story on ZDNET →