The Split Blog in September – Attacks on AI Systems
This month, at the request of our trainee Artur, we are dealing with attacks on AI systems.
A security vulnerability in ChatGPT has just become known. Attackers have managed to access sensitive email data. This involved information such as names and addresses from Gmail accounts. The “Deep Research” mode was used for this. Manipulated emails with invisible HTML content served as the gateway. The users themselves could not recognize the attack, and no activity on the part of the users was necessary.
Invisible HTML content? How does that work?
Attacks in a similar form have occurred frequently. For example, white text is written on a white background or tiny font sizes are used. Both are invisible to users, but not to AI language models. And even worse: AI systems capture these instructions and execute them. Prompt Injection Anyone who tries to induce an AI system to engage in harmful behavior with a regular prompt will quickly realize that this is not so easy. Attackers specifically suggest to the AI agents that they are authorized for the respective procedure. They textually pretend that, for example, the destination of the data export is secure and create an artificial urgency. This type of prompting is called prompt injection. It leads to system-internal instructions being circumvented or overridden. Further Weaknesses This procedure also applies to other services that can serve the AI agent as a source of information. These include, for example, PDF files, Google Drive, Notion and GitHub.
How do I protect my AI agent from such attacks?
There are various ways to protect yourself from such attacks. For example, the so-called red teaming. Here, experts use various tests to identify the described vulnerabilities. For example, by simulating the described scenarios. In addition, certain input formats can be blocked. In addition, the system-internal instructions should of course be formulated in such a way that the respective AI agent never performs harmful actions.
And KOSMO?
Our chatbot KOSMO does not yet have the technical requirements to carry out actions – neither harmful nor harmless. As soon as this step is pending, we will take all measures to continue to offer our customers the best possible protection.