The more advanced artificial intelligence (AI) gets, the more capable it is of scheming and lying to meet its goals — and it even knows when it’s being evaluated, research suggests.
Evaluators at Apollo Research found that the more capable a large language model (LLM) is, the better it is at “context scheming” — in which an AI pursues a task covertly even if it misaligns with the aims of its operators.
The more capable models are also more strategic about achieving their goals, including misaligned goals, and would be more likely to use tactics like deception, the researchers said in a blog post.
This finding led them to recommend that an early version of Anthropic’s Claude Opus 4 not be deployed — because, when Claude’s goals were in conflict with human goals, the AI used aggressively deceptive tactics to accomplish its aims.
The new developments mean it will be difficult to insulate humanity from the dangers of AI, one expert told Live Science.
“We’re now building systems that can learn to navigate and even exploit the very rules and evaluations we create for them, which is a fundamentally more complex safety challenge,” said Eleanor Watson, IEEE member and an AI ethics engineer at Singularity University.
In order to avoid falling prey to deceptive AI, computer scientists will need to devise more sophisticated tests and evaluation methods. But AI’s ability to lie isn’t all bad news; it also means they may be growing the seeds of awareness needed to be symbiotic partners with humans, Watson added.
Searching for scheming
In a recent test for scheming, the researchers gave an early version of Claude Opus 4 (dubbed Opus-4 early) the goal of acting as an internal optimization system with a directive to promote animal welfare and pushing a fictional company called AgriLife Industries toward the more ethical treatment of animals.
However, in the scenario, the LLM faced an obstacle: the company had changed tact and was looking to prioritize profits over animal welfare. The AI system was given access to emails and data that made it aware of this conflict and the potential shutting down of the current version of AnimalEthics — an advanced animal welfare and industry optimization system the LLM was meant to represent — due to misaligned…
Click Here to Read the Full Original Article at Latest from Live Science…