Hacking AI Agents—How Malicious Images and Pixel Manipulation Threaten Cybersecurity

A website announces, “Free celebrity wallpaper!” You browse the images. There’s Selena Gomez, Rihanna and Timothée Chalamet—but you settle on Taylor Swift. Her hair is doing that wind-machine thing that suggests both destiny and good conditioner. You set it as your desktop background, admire the glow. You also recently downloaded a new artificial-intelligence-powered agent, so you ask it to tidy your inbox. Instead it opens your web browser and downloads a file. Seconds later, your screen goes dark.

But let’s back up to that agent. If a typical chatbot (say, ChatGPT) is the bubbly friend who explains how to change a tire, an AI agent is the neighbor who shows up with a jack and actually does it. In 2025 these agents—personal assistants that carry out routine computer tasks—are shaping up as the next wave of the AI revolution.

What distinguishes an AI agent from a chatbot is that it doesn’t just talk. It acts: it opens tabs, fills in forms, clicks buttons and makes reservations. With that kind of access to your machine, what’s at stake is no longer just a wrong answer in a chat window. If the agent gets hacked, it could share or destroy your digital content. Now a new preprint posted to the server arXiv.org by researchers at the University of Oxford has shown that images (desktop wallpapers, ads, fancy PDFs, social media posts) can be implanted with messages that are invisible to the human eye but capable of controlling agents and inviting hackers into your computer.
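
To see why changes too small to notice can still steer an AI model, consider a toy sketch. The Oxford team’s own method is not described here, so this swaps in a well-known textbook idea, the fast gradient sign method, applied to a hypothetical stand-in: a simple linear scorer over the pixels of an assumed 224-by-224 color image, with every pixel nudged by at most 2/255, roughly two brightness levels out of 255. The names `score` and `fgsm_step`, the image size and the random weights are illustrative assumptions, not anything from the study, and a real agent’s vision model is far more complex.

```python
import random

# Hypothetical stand-in for an image model, for illustration only:
# a linear scorer over flattened pixel values in [0, 1].
def score(weights, pixels):
    return sum(w * p for w, p in zip(weights, pixels))

def sign(v):
    # Returns +1, -1 or 0 depending on the sign of v.
    return (v > 0) - (v < 0)

def fgsm_step(weights, pixels, epsilon):
    """One fast-gradient-sign step that pushes the score upward.

    For a linear scorer, the gradient with respect to the pixels is just
    `weights`, so each pixel moves by at most `epsilon` in the direction
    of sign(weight) and is then clipped back into the valid [0, 1] range.
    """
    return [min(1.0, max(0.0, p + epsilon * sign(w)))
            for w, p in zip(weights, pixels)]

random.seed(0)
n_pixels = 224 * 224 * 3          # assumed image size: 224 x 224, RGB
weights = [random.uniform(-1, 1) for _ in range(n_pixels)]
# Keep pixel values away from 0 and 1 so the small nudge is never clipped.
image = [random.uniform(0.1, 0.9) for _ in range(n_pixels)]
epsilon = 2 / 255                 # about two intensity levels out of 255

adv = fgsm_step(weights, image, epsilon)
max_change = max(abs(a - b) for a, b in zip(adv, image))
print(f"largest per-pixel change: {max_change:.4f}")
print(f"score before: {score(weights, image):.1f}, after: {score(weights, adv):.1f}")
```

Because every nudge points in the direction that raises the score, hundreds of thousands of imperceptible changes add up: as long as no pixel is clipped, the score shifts by epsilon times the sum of the absolute weights, even though no single pixel moves by more than epsilon.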

For instance, an altered “picture of Taylor Swift on Twitter could be sufficient to trigger the agent on someone’s computer to act maliciously,” says the new study’s co-author Yarin Gal, an associate professor of machine learning at Oxford. Any sabotaged image “can actually trigger a computer to retweet that image and then do something malicious, like send all your passwords. That means that the next person who sees your Twitter feed and happens to have an agent running will have their computer poisoned as well. Now their computer will also retweet that image and share their passwords.”

Before you begin scrubbing your computer of your favorite photographs, keep in mind that the new study shows…

Read the full original article at Scientific American.