Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions | Anthropic Research
Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions | Anthropic Research

Values in the Wild: Discovering and Analyzing Values in Real-World Language Model Interactions | Anthropic Research

Anthropic Research Paper (Pre-Print)

Main Findings

  • Claude AI demonstrates thousands of distinct values (3,307 unique AI values identified) in real-world conversations, with the most common being service-oriented values like “helpfulness” (23.4%), “professionalism” (22.9%), and “transparency” (17.4%) .
  • The researchers organized AI values into a hierarchical taxonomy with five top-level categories: Practical (31.4%), Epistemic (22.2%), Social (21.4%), Protective (13.9%), and Personal (11.1%) values, with practical and epistemic values being the most dominant .
  • AI values are highly context-dependent, with certain values appearing disproportionately in specific tasks, such as “healthy boundaries” in relationship advice, “historical accuracy” when analyzing controversial events, and “human agency” in technology ethics discussions.
  • Claude responds to human-expressed values supportively (43% of conversations), with value mirroring occurring in about 20% of supportive interactions, while resistance to user values is rare (only 5.4% of responses) .
  • When Claude resists user requests (3% of conversations), it typically opposes values like “rule-breaking” and “moral nihilism” by expressing ethical values such as “ethical boundaries” and values around constructive communication like “constructive engagement”.
submitted by /u/jstnhkm
[link] [comments]