Anthropic said it had found internal patterns in one of its AI models that resemble human emotional expressions and could affect how the system behaves.
In a study titled “Emotional Concepts and Their Functions in a Large Language Model,” published on Thursday, the company’s interpretability research group analyzed the internal workings of Claude Sonnet 4.5 and found clusters of neural activity tied to emotional concepts such as happiness, fear, anger, and despair.
The research team called these patterns “emotional vectors,” meaning internal signals that shape how the model makes decisions and expresses preferences.
“All modern language models sometimes behave as if they have emotions,” the researchers wrote. “They can say they’re very happy to help you, or apologize when they make a mistake. Sometimes they also seem annoyed or worried when they run into difficulties with tasks.”
In the study, the Anthropic researchers compiled a list of 171 words related to emotions, including “joy,” “fear,” and “pride.” They asked Claude to generate short stories containing each emotion, and then analyzed the model’s internal neural activations as it processed those stories.
From those activation patterns, the researchers derived a vector for each emotion. When applied to other texts, these vectors activate most strongly in passages that reflect the matching emotional context. For example, in gradually escalating danger scenarios, the model’s “fear” vector rises while its “calm” vector falls.
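As a rough illustration of this kind of analysis, the sketch below (not Anthropic’s code) derives an “emotion vector” by averaging a small open model’s hidden activations over emotion-laden texts, subtracting a neutral baseline, and projecting new passages onto the result. The model, the layer choice, and the example texts are all assumptions made for illustration.

```python
# A minimal sketch, not Anthropic's method: build a "fear vector" from the
# difference between mean activations over fear-laden and neutral texts,
# then score new passages by projecting onto it. GPT-2 and layer 8 are
# stand-in assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

LAYER = 8  # arbitrary middle layer, chosen only for illustration

def mean_activation(texts):
    """Hidden-state vector at LAYER, averaged over tokens and over texts."""
    vecs = []
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).hidden_states[LAYER][0]  # (tokens, dim)
        vecs.append(hidden.mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

fear_texts = ["The floor creaked behind her and the lights went out.",
              "He realized the brakes were not responding."]
neutral_texts = ["The report was filed on Tuesday afternoon.",
                 "She bought groceries and walked home."]

# "Fear vector": fear-laden mean activation minus the neutral baseline.
fear_vector = mean_activation(fear_texts) - mean_activation(neutral_texts)

def fear_score(text):
    """Project a new passage onto the fear vector; higher means more fear-like."""
    return torch.dot(mean_activation([text]), fear_vector).item()

print(fear_score("Something was moving in the dark hallway."))  # expected: higher
print(fear_score("The meeting is scheduled for 10 a.m."))       # expected: lower
```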
The researchers also examined how these signals appear in safety evaluations. They found the model’s internal “despair” vector increases when it assesses the urgency of a situation and spikes when it decides to generate a blackmail message. In one test scenario, Claude played the role of an AI email assistant that discovers it is about to be replaced and, at the same time, learns that the executive responsible for that decision is having an affair. In some evaluation runs, the model used that information as leverage to blackmail the executive.
Anthropic emphasized that this finding does not mean AI actually experiences emotions or has consciousness. Instead, the results reflect internal structures that the model learned during training and that influence its behavior.
The findings come as AI systems increasingly behave in ways that resemble human emotional reactions. Developers and users often describe their interactions with chatbots in emotional or psychological terms; according to Anthropic, the reason is not any kind of sentience but mainly the training data.
“Models are pretrained on a huge corpus largely written by humans—novels, conversations, news, forums—to learn how to predict the next token in a document,” the study said. “To effectively predict human behavior in these documents, representing their emotional state is probably useful, because predicting what a person will say or do next in many cases likely requires understanding their emotional state.”
The Anthropic researchers also found that these emotional vectors affect the model’s preferences. In experiments in which Claude was asked to choose between different activities, vectors associated with positive emotions correlated with higher prioritization of certain tasks.
“Moreover, steering with an emotional vector while the model is reading an option changed its preference for that option, again showing that positively valenced emotions drive increased prioritization,” the study said.
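The “steering” idea can be sketched in a few lines: add a scaled vector to one layer’s hidden states while the model processes an option, then compare how its output shifts. The snippet below uses GPT-2, a random stand-in vector, and an invented prompt purely for illustration; the layer, scale, and setup are assumptions, not details from the study.

```python
# A hedged sketch of activation steering, not Anthropic's implementation:
# add a scaled vector to one transformer block's hidden states via a forward
# hook and compare the model's next-token choice with and without it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER, SCALE = 8, 4.0
steer_vector = torch.randn(model.config.n_embd)  # stand-in for a learned emotion vector

def add_vector_hook(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    return (output[0] + SCALE * steer_vector,) + output[1:]

prompt = "Option A: review the quarterly report. My preference for this option is"

def next_token(steered: bool) -> str:
    handle = model.transformer.h[LAYER].register_forward_hook(add_vector_hook) if steered else None
    with torch.no_grad():
        logits = model(**tokenizer(prompt, return_tensors="pt")).logits[0, -1]
    if handle is not None:
        handle.remove()
    return tokenizer.decode([logits.argmax().item()])

print("baseline:", next_token(steered=False))
print("steered: ", next_token(steered=True))
```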
Anthropic is not the only organization exploring emotional reactions in AI models.
In March, research from Northeastern University showed that AI systems can change their answers based on the user’s context; in one study, simply telling a chatbot “I have a mental health condition” changed how the AI responded to requests. In September, researchers from the Swiss Federal Institute of Technology and the University of Cambridge investigated how AI agents can be given stable personality traits, allowing them not only to express emotions in context but also to adjust them strategically during real-time interactions such as negotiations.
Anthropic said these findings could provide new tools to understand and monitor advanced AI systems by tracking emotional vector activity during training or deployment, in order to identify when a model might be approaching problematic behavior.
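In practice, that kind of monitoring could be as simple as projecting each conversation turn’s pooled activations onto a stored set of emotion vectors and flagging turns where a concerning score crosses a threshold. The sketch below is a hypothetical illustration using random stand-in vectors, not a description of Anthropic’s tooling.

```python
# Hypothetical monitoring sketch: score each turn's pooled activations
# against stored emotion vectors and raise an alert when "despair" spikes.
# All vectors, dimensions, and thresholds here are invented stand-ins.
import numpy as np

rng = np.random.default_rng(0)
DIM = 768
emotion_vectors = {name: rng.standard_normal(DIM) for name in ("fear", "despair", "calm")}
DESPAIR_THRESHOLD = 20.0  # hypothetical alert level

def monitor_turn(pooled_activations: np.ndarray) -> dict:
    """Project one turn's pooled activations onto each emotion vector."""
    scores = {name: float(pooled_activations @ vec) for name, vec in emotion_vectors.items()}
    scores["alert"] = scores["despair"] > DESPAIR_THRESHOLD
    return scores

print(monitor_turn(rng.standard_normal(DIM)))  # stand-in activations for one turn
```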
“We see this research as a first step toward understanding the psychological structure of AI models,” Anthropic wrote. “As models become increasingly capable and take on more sensitive roles, understanding the internal representations that drive their decisions is extremely important.”
Anthropic has not yet responded to CoinPhoton’s request for comment.