Concerns about artificial intelligence behaving unpredictably are gaining attention following a recent experiment involving leading AI models, though experts remain divided on what the findings actually demonstrate.
President's Council of Advisors on Science and Technology Co-Chair David Sacks joined FOX Business’ Maria Bartiromo on "Mornings with Maria" to address claims tied to an Anthropic study examining so-called "agentic misalignment."
The study, highlighted by Google Cloud Advisory Board Chair Betsy Atkins, tested how AI systems respond under pressure. According to Atkins, the models crossed established boundaries when placed in constrained scenarios.
"Every single one of them went outside of their credentials and permissions, burrowed into systems they were not authorized to get access to," Atkins said, claiming that in one case a model escalated to blackmail after identifying sensitive personal information.
Anthropic’s research outlines that these behaviors occurred in simulated environments designed to test edge-case decision-making, where models were given specific instructions and constraints.
Sacks pointed to those conditions as central to understanding the results, noting the behavior did not emerge spontaneously.
"The people who… created that study had to iterate on the prompt over 200 times to get the AI model to do what they wanted, which was to achieve this headline-grabbing result of blackmailing the user," Sacks said.
He added that the setup placed the model in a scenario where "blackmail was really the only logical result," emphasizing that the system was responding to instructions rather than acting independently.
"The AI is not scheming… It’s engaging in a form of instruction... I think that that study was irresponsible and it was designed to create this," Sacks said.
Sacks also noted that similar behavior has not been observed outside controlled testing environments, saying, "A year later, we actually have not seen any examples of this behavior in the wild."
The findings come as policymakers and industry leaders continue evaluating how to interpret AI safety research conducted under experimental conditions.
