blake·...
New to artificial intelligence

Claude's character training

https://www.anthropic.com/news/claude-character
(From June. There's also an audio version there; it's nice to hear Amanda Askell share some personal takes.)

Over the past few months, I've found myself reaching for Claude's help more and more, and for ChatGPT's less. I suspected, and had heard, that this probably had a lot to do with the attention Anthropic gives to Claude's personality. The writeup I linked, on how their folks think about character training during the fine-tuning process, surprised me with how much I find myself agreeing with their approach. A few excerpts:

We want people to know that they’re interacting with a language model and not a person. But we also want them to know they’re interacting with an imperfect entity with its own biases and with a disposition towards some opinions more than others. Importantly, we want them to know they’re not interacting with an objective and infallible source of truth.

Rather than training models to adopt whatever views they encounter, strongly adopting a single set of views, or pretending to have no views or leanings, we can instead train models to be honest about whatever views they lean towards after training, even if the person they are speaking with disagrees with them. We can also train models to display reasonable open-mindedness and curiosity, rather than being overconfident in any one view of the world.

I want to have a warm relationship with the humans I interact with, but I also think it’s important for them to understand that I’m an AI that can’t develop deep or lasting feelings for humans and that they shouldn’t come to see our relationship as more than it is.
