Language models and other AI systems behave non-deterministically. The same input can lead to different results depending on context, prompt, or data state. This makes generative AI models powerful — but also unpredictable.
And that’s exactly the challenge:
Anyone who wants to rely on an AI system must understand what it does, when, how, and why. But that behavior can’t be tested as easily as in classical IT systems.
What we need is a new understanding of how security in generative AI can be measured — and what should actually be tested. Because how do you evaluate a system that learns from language and responds in language? That’s the focus of this article.
With the EU AI Act, for the first time there is a regulatory framework that categorizes AI systems into risk classes. Particularly relevant for companies: Risk Class 2 (High-Risk Systems).
Class 1 deals with prohibited systems, while Classes 3 and 4 cover minimal to no-risk applications — less relevant when discussing testing methods for widely used, risk-prone AI systems.
Risk Class 2 systems — used, for example, in HR, critical infrastructure, or automated decision-making — will face extensive obligations starting August 2026, including:
This may sound like a long way off, but for those who still need to prepare, time is short.
As with most cybersecurity regulations, the AI Act’s wording is intentionally broad and open to interpretation.
In practice, this leaves many open questions:
What does “quality management” actually mean for AI? How often must testing occur? And most importantly: how should it be done?
Our assessment:
To comply with the AI Act, organizations will need reliable testing procedures, automated reporting, and transparent documentation of all security measures.
To close this gap, we founded the AI Security Expert Group together with the German Federal Office for Information Security (BSI) and the German Alliance for Cyber Security.
Our goal:
The result is a comprehensive guide to penetration testing of large language models (LLMs) that provides clarity and can be applied in practice.
Simplified process overview:
It may sound straightforward — but it isn’t. As with most security topics, the devil is in the details.
Once AI systems fall under the AI Act, a one-time security confirmation will no longer be sufficient.
Ongoing operation must be as verifiable and secure as development itself. That means technical testing and organizational measures must continuously work hand in hand — not just on paper.
In practice, this requires:
When the AI Act takes effect in August 2026, compliance with these requirements will become mandatory — including for supervisory audits.
Preparation is no longer optional. It’s a prerequisite.
Want to be ready?
We support you with:
✔️ Assessing your current AI systems
✔️ Implementing technical testing procedures
✔️ Preparing for AI Act compliance
CHRISTOPH ENDRES
CEO
sequire technology
Other articles that might be interesting for you