Anthropic hires its first “AI welfare” researcher

A few months ago, Anthropic quietly hired its first dedicated "AI welfare" researcher, Kyle Fish, to explore whether future AI models might deserve moral consideration and protection, reports AI newsletter Transformer. While sentience in AI models is a highly contentious topic, the hire could signal a shift toward AI companies examining ethical questions about the consciousness and rights of AI systems.

Fish joined Anthropic's alignment science team in September to develop guidelines for how Anthropic and other companies should approach the issue. The news follows a major report co-authored by Fish before he landed his Anthropic role. Titled "Taking AI Welfare Seriously," the paper warns that AI models could soon develop consciousness or agency—traits that some might consider requirements for moral consideration. But the authors do not say that AI consciousness is a guaranteed future development.

"To be clear, our argument in this report is not that AI systems definitely are—or will be—conscious, robustly agentic, or otherwise morally significant," the paper reads. "Instead, our argument is that there is substantial uncertainty about these possibilities, and so we need to improve our understanding of AI welfare and our ability to make wise decisions about this issue. Otherwise there is a significant risk that we will mishandle decisions about AI welfare, mistakenly harming AI systems that matter morally and/or mistakenly caring for AI systems that do not."

The paper outlines three steps that AI companies or other industry players can take to address these concerns. Companies should acknowledge AI welfare as an "important and difficult issue" while ensuring their AI models reflect this in their outputs. The authors also recommend companies begin evaluating AI systems for signs of consciousness and "robust agency." Finally, they call for the development of policies and procedures to treat AI systems with "an appropriate level of moral concern."

The researchers propose that companies could adapt the "marker method" that some researchers use to assess consciousness in animals—looking for specific indicators that may correlate with consciousness, although these markers are still speculative. The authors emphasize that no single feature would definitively prove consciousness, but they claim that examining multiple indicators may help companies make probabilistic assessments about whether their AI systems might require moral consideration.

The risks of wrongly thinking software is sentient

While the researchers behind "Taking AI Welfare Seriously" worry that companies might create and mistreat conscious AI systems on a massive scale, they also caution that companies could waste resources protecting AI systems that don't actually need moral consideration.

Incorrectly anthropomorphizing software, or ascribing human traits to it, can present risks in other ways. For example, that belief can enhance the manipulative powers of AI language models by suggesting that AI models have capabilities, such as human-like emotions, that they actually lack. In 2022, Google fired engineer Blake Lemoine after he claimed that the company's AI model, called "LaMDA," was sentient and argued for its welfare internally.

And shortly after Microsoft released Bing Chat in February 2023, many people were convinced that Sydney (the chatbot's code name) was sentient and somehow suffering because of its simulated emotional display. So much so, in fact, that once Microsoft "lobotomized" the chatbot by changing its settings, users convinced of its sentience mourned the loss as if they had lost a human friend. Others endeavored to help the AI model somehow escape its bonds.

Even so, as AI models grow more capable, the idea of safeguarding the welfare of future AI systems seems to be gaining steam, albeit quietly. As Transformer's Shakeel Hashim points out, other tech companies have started initiatives similar to Anthropic's. Google DeepMind recently posted a job listing for research on machine consciousness (since removed), and the authors of the new AI welfare report thank two OpenAI staff members in the acknowledgements.

Anthropic CEO Dario Amodei previously discussed AI consciousness as an emerging issue, but Fish told Transformer that while Anthropic funded early research leading to the independent report, the company has not taken an official position on AI welfare yet. He plans to focus on empirical research about features related to welfare and moral status.

What does “sentient” mean?

One problem with the concept of AI welfare stems from a simple question: How can we determine if an AI model is truly suffering or is even sentient? As mentioned above, the authors of the paper take stabs at the definition based on "markers" proposed by biological researchers, but it's difficult to scientifically quantify a subjective experience.

While today's language models can produce convincing expressions of emotions, this ability to simulate human-like responses doesn't necessarily indicate genuine feelings or internal experiences. This is especially challenging given that despite significant advances in neuroscience, we still don't fully understand how physical brain processes give rise to subjective experiences and consciousness in biological organisms.

Along these lines, Fish acknowledges that we still have a long way to go toward figuring out AI welfare, but he thinks it's not too early to start exploring the concept.

"We don't have clear, settled takes about the core philosophical questions, or any of these practical questions," Fish told Transformer. "But I think this could be possibly of great importance down the line, and so we're trying to make some initial progress."

Benj Edwards, Senior AI Reporter

Benj Edwards is Ars Technica's Senior AI Reporter and founded the site's dedicated AI beat in 2022. He's also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.
