Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions | Anthropic Research

Anthropic Research Paper (Pre-Print)

Main Findings

Claude AI demonstrates thousands of distinct values (3,307 unique AI values identified) in real-world conversations, with the most common being service-oriented values like “helpfulness” (23.4%), “professionalism” (22.9%), and “transparency” (17.4%) .
The researchers organized AI values into a hierarchical taxonomy with five top-level categories: Practical (31.4%), Epistemic (22.2%), Social (21.4%), Protective (13.9%), and Personal (11.1%) values, with practical and epistemic values being the most dominant .
AI values are highly context-dependent, with certain values appearing disproportionately in specific tasks, such as “healthy boundaries” in relationship advice, “historical accuracy” when analyzing controversial events, and “human agency” in technology ethics discussions.
Claude responds to human-expressed values supportively (43% of conversations), with value mirroring occurring in about 20% of supportive interactions, while resistance to user values is rare (only 5.4% of responses) .
When Claude resists user requests (3% of conversations), it typically opposes values like “rule-breaking” and “moral nihilism” by expressing ethical values such as “ethical boundaries” and values around constructive communication like “constructive engagement”.

submitted by /u/jstnhkm
[link] [comments]