A few months ago, Anthropic quietly hired its first dedicated "AI welfare" researcher, Kyle Fish, to explore whether future AI models might deserve moral consideration and protection, reports AI newsletter Transformer. While sentience in AI models is an extremely controversial and contentious topic, the hire could signal a shift toward AI companies examining ethical questions about the consciousness and rights of AI systems.
Fish joined Anthropic's alignment science team in September to develop guidelines for how Anthropic and other companies should approach the issue. The news follows a major report co-authored by Fish before he landed his Anthropic role. Titled "Taking AI Welfare Seriously," the paper warns that AI models could soon develop consciousness or agency—traits that some might consider requirements for moral consideration. But the authors do not say that AI consciousness is a guaranteed future development.
"To be clear, our argument in this report is not that AI systems definitely are—or will be—conscious, robustly agentic, or otherwise morally significant," the paper reads. "Instead, our argument is that there is substantial uncertainty about these possibilities, and so we need to improve our understanding of AI welfare and our ability to make wise decisions about this issue. Otherwise there is a significant risk that we will mishandle decisions about AI welfare, mistakenly harming AI systems that matter morally and/or mistakenly caring for AI systems that do not."
The paper outlines three steps that AI companies or other industry players can take to address these concerns. Companies should acknowledge AI welfare as an "important and difficult issue" while ensuring their AI models reflect this in their outputs. The authors also recommend companies begin evaluating AI systems for signs of consciousness and "robust agency." Finally, they call for the development of policies and procedures to treat AI systems with "an appropriate level of moral concern."