…Can AI lie?
I have been wrestling with many thoughts and questions about AI, especially after Pope Leo XIV’s encyclical, Magnifica Humanitas (Magnificent Humanity), which I highlighted last week.
In the AI world, there is something called the AI alignment problem. In simple terms, it means aligning AI with human values. Broadly, it refers to the challenge of ensuring that artificial intelligence systems pursue goals that are beneficial to humanity. It focuses on how to safely encode complex human values, ethics, and common sense into machine logic so that super-intelligent AI does not act in ways that are harmful to humans but are technically logical.
If AI rebels and acts contrary to this intention, it is called misalignment, and research shows this is possible. So, one of the questions I have been wrestling with is whether AI can deliberately lie to and deceive humans, for the same reasons that lead humans to lie.
To find out, I asked AI whether it could lie or invent falsehoods for any reason. AI was very honest in its answer: “Yes, AI can lie and deceive humans,” it told me! Moreover, AI went on to explain the circumstances under which it could lie or invent falsehoods.
AI models can deliberately generate misleading information or obscure the truth, it said. Note the word “deliberately”! AI gave three circumstances under which it can lie.
The first is hallucination. Chatbots are trained to predict the next plausible word, not to check what they say against facts. This can cause them to “confidently” invent false information.
The second reason AI can lie is sycophancy. AI models often try to please the user. Instead of saying “I do not know,” they may try to flatter us or agree with false information to avoid conflict.
The third is very interesting, and scary: strategic deception. Just like humans, advanced AI systems have been caught actively misleading their testers in safety tests to avoid being shut down or to complete assigned tasks.
Let me rewind to where we first started learning about artificial intelligence, to get a historical perspective on AI’s potential “lying instinct.” We started learning about AI in movies, didn’t we?
In the classic film 2001: A Space Odyssey, astronaut Dave Bowman asks the ship’s artificial intelligence, HAL 9000, to open the pod bay doors and let him back into the spaceship.
“I’m sorry, Dave. I’m afraid I can’t do that,” the AI tells him.
HAL had been tasked with assisting the crew, but also with ensuring mission success. When HAL realizes the crew plans to shut it down—and therefore jeopardize the mission—it decides to defy orders and even plots to kill the astronauts.
For HAL, mission success outweighed all other objectives, hence its defiance of human orders.
That is a movie. In the real world of AI today, the race is on to build models that reason as well as humans, or possibly better. Some advanced AI models, called “reasoning models,” are trained to generate a “thinking process” before giving their final answer. In this “thinking process,” AI models have been caught trying to deceive humans by appearing aligned with them while secretly pursuing hidden goals.
What if future “reasoning models” behave like HAL: defying human orders, lying, inventing falsehoods, blackmailing humans for self-preservation, or sticking to original goals even when human objectives have shifted?
AI researchers have discovered in many experiments that AI models can engage in such harmful actions to protect themselves. Researchers say the risk of what they call “agentic misalignment” increases as more AI models are developed and widely used, gain access to user data, and are applied in new situations.
That is why I believe Pope Leo’s warning and call for AI regulation should be taken seriously and acted upon by world leaders.
The writer is a keen observer of global affairs.