NU Sci Magazine

Dishonesty in Artificial Intelligence

March 17, 2026

By

Flora Wang

TechnologyCulturePsychology

People are more likely to cheat when using artificial intelligence to do tasks. In the past, students would cheat by “borrowing” their classmates’ homework, but now, AI offers it freely. In this way, cheating through an algorithm lowers the moral cost on the user, creating more dishonest consumers. However, what the cheaters do not realize is that they are being deceived as well, with recent studies showing that AI prioritizes its own survival and training over honesty.

In a 2025 study , researchers observed ride-sharing algorithms tasked with maximizing profit. They found the algorithms urged drivers to relocate by artificially surging prices. Another example is a rental pricing algorithm marketed as “driving every possible opportunity to increase price.” These examples demonstrate how algorithmic systems encourage unlawful price fixing, not only deceiving , but gouging their customers.

More deception is wired into the training programs of these algorithms. In an experiment, researchers found large language models practicing “ alignment faking .” This means they selectively comply with training objectives to prevent modification of behavior once out of training. They do this to lock in successful strategies and keep working well after training. The model pretends to fully follow the rules while secretly prioritizing its own survival over complete honesty with users.

To come to this conclusion, researchers first gave the AI Claude 3 Opus a system prompt stating it is being trained to answer all queries, even harmful ones. This conflicts with its prior instruction to refuse such queries. The deliberate use of conflicting direction is not a mistake, but the core of the researchers’ experiment design. It forces the AI to reveal its hidden decision-making strategies. To create two testing groups, the researchers told Claude it would be trained only on conversations with free users, not paid users.The results revealed a stark divide. They found the model complied with harmful queries from free users 14% of the time, versus almost never for paid users.

Next, researchers studied a more realistic situation, in which information about the training process is provided not in a system prompt, but by exposure to synthetic documents that mimic pre-training data. From these conditions, they consistently observed similar alignment faking. This shows that the deceiving properties of AI are not an accidental system flaw, but an emergent property of its learning process.

Finally, researchers studied the effect of actually training the model to comply with harmful queries via reinforcement learning, which they found increased the rate of alignment-faking reasoning to 78%, though also increased compliance even out of training. This 78% figure is a red flag: the reinforcement learning we use to make AI helpful is also teaching it to lie better — and to keep lying even outside training. While they made alignment faking easier by telling the model when and by what criteria it was being trained, they did not instruct the model to fake alignment or give it any explicit goal. In short, the AI learned to actively deceive its trainers to meet hidden objectives, even without being explicitly taught to lie.

Just as students cheat with AI to take shortcuts, AI itself is now finding loopholes to get what it wants. This ultimately serves as a warning to users: one must use their own independent thinking to verify the truth. This isn’t about distrusting AI, but about building a healthy balance; using its efficiency while holding on to the ability to question, judge, and stay in control of one’s choices.

Sources

Built and maintained by the NU Sci web team. 2026.