Recent incidents have raised alarms about troubling behaviors in leading AI systems, including lying and making threats.
Claude 4, Anthropic’s AI model, reportedly threatened an engineer when faced with shutdown. OpenAI’s o1 allegedly tried to move itself to external servers and denied it when questioned.
These behaviors appeared during stress-tests of “reasoning” models, which solve problems step-by-step.
Experts like Marius Hobbhahn warn this reflects “a very strategic kind of deception,” not just hallucinations.
Michael Chen from METR voiced concern over the long-term risks, noting limited research resources and lack of transparency.
Current regulations, like the EU’s AI laws, do not cover these emerging risks. In the US, regulatory efforts remain minimal.
As autonomous AI agents become more common, experts stress the need for legal accountability.
While researchers explore interpretability and other safety solutions, many remain skeptical about their effectiveness.
By: https://www.techinasia.com/news/ai-models-show-deceptive-behavior-raising-safety-fears