The use of Large Language Models (LLMs) has become widespread and now covers a wide range of applications. However, awareness of the associated security risks is often incomplete. Many LLM providers use chat histories to train newer model versions. As a result, data shared in what users assume to be a confidential conversation may later resurface in responses that newer model versions generate for other users. Depending on the application, such data may also be stored within a company and shared across departments.
So-called guardrails are put in place to ensure that only the information needed to process a request is transmitted, and to maximize security for users.
Guardrail services often rely on LLMs themselves to decide whether certain criteria are met and, accordingly, whether content should be blocked.
Natural Language Processing (NLP) refers to the computational processing of natural language, both written and spoken. NLP analyzes structures within given content, such as grammar or word order, in order to process and generate language, typically using statistical methods.
LLMs are built on the same foundations. Because they are trained on massive datasets, they can produce text with correct grammar, proper word order, and coherent semantic relationships with high probability.
However, this is also where major problems arise. Training on huge amounts of data requires significant computational resources, and every interaction with an LLM, every single chat message, requires considerable resources to generate a response. For small tasks in particular, this cost is often disproportionate to what actually needs to be achieved.
This issue is amplified by another well-known problem: LLMs can hallucinate.
A hallucination occurs when an LLM produces responses that are not grounded in reality. This happens because an LLM does not possess actual “knowledge”; instead, it predicts the statistically most likely sequence of words in a given context. Thanks to large datasets, it often performs this task impressively well, but mistakes still occur frequently. For example, models may invent books or articles that do not exist, written by authors who either do not exist or never wrote such works. As a result, the generated outputs cannot always be considered reliable.
For small, less complex tasks in particular, it is often simpler and more reliable to fall back on traditional NLP methods.
For example, instead of asking an LLM whether a specific word appears in a text, simple algorithms can determine not only whether the exact word appears, but also whether a synonym or misspelled variation exists. LLMs can answer such questions as well, but as mentioned earlier, this requires significantly more computational resources and introduces the hidden risk of hallucinations.
Traditional algorithmic methods, by contrast, are deterministic: the same input always produces the same score on a predefined scale. This allows users to define thresholds, for example specifying that two words should be treated as equivalent once their similarity exceeds a certain percentage.
Within a blocklist, this is useful because it allows not only explicitly blocked words to be detected, but also synonyms or misspellings, which makes it much harder for users to bypass the blocklist.
A classic blocklist works by defining words or phrases that are not permitted in a given environment, such as a chat or forum. Before texts become visible to others, they are checked against the blocklist. If a blocked word or phrase appears, the text is rejected.
This works extremely well for specific words and phrases, as long as they are written exactly as defined in the blocklist. To also catch typos or slight modifications, however, every conceivable variant would have to be added manually, which quickly becomes impractical and also increases processing time.
To avoid this issue, several classic NLP methods can be applied. Similarity calculations at the level of sentences, phrases, words, or word fragments, whether vector-based or not, offer different approaches, each with its own advantages and disadvantages.
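A minimal sketch of such a classic exact-match blocklist in Python, assuming a simple tokenizer that lowercases the text and splits it on non-word characters. The blocklist entry and the function name `is_rejected` are hypothetical. The last two calls show the weakness described above: small modifications slip through.

```python
import re

# Hypothetical blocklist entry used in this sketch.
BLOCKLIST = {"unsafe"}


def is_rejected(text: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """Reject the text if any token matches a blocklist entry exactly."""
    tokens = re.findall(r"\w+", text.lower())  # lowercase and split on non-word characters
    return any(token in blocklist for token in tokens)


print(is_rejected("This is unsafe."))   # True: exact match
print(is_rejected("This is unsafee."))  # False: one extra letter bypasses the list
print(is_rejected("This is un-safe."))  # False: the hyphen splits the word into "un" and "safe"
```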
We begin with a general overview of similarity calculations in NLP.
Similarity calculations determine how similar two objects are — for example, two words.
Two identical objects receive the maximum similarity score, while different objects receive values depending on how closely they resemble one another.
For example, if the word “unsafe” is on the blocklist, the exact word “unsafe” in a text receives the maximum similarity score and is therefore blocked, just as with a traditional blocklist. In addition, these calculations identify misspellings such as “unsfe,” “unsafee,” or “un-safe” as highly similar, so they can be blocked as well.
By defining a threshold value, it becomes possible to specify how similar a word must be before it is also blocked. This helps avoid unintentionally blocking only loosely related words.
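The threshold idea can be sketched with Python's standard-library `difflib.SequenceMatcher`, whose `ratio()` method returns a similarity between 0 and 1 (1.0 for identical strings). Note that `difflib` is based on Ratcliff and Obershelp's “gestalt pattern matching”, not on the Levenshtein distance; the threshold of 0.85, the tokenizer, and the function name `find_blocked` are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

BLOCKLIST = ["unsafe"]  # hypothetical blocklist entry


def find_blocked(text: str, blocklist: list[str],
                 threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return (token, blocklist entry, score) for every token scoring at or above the threshold."""
    hits = []
    # Keep hyphens inside tokens so that "un-safe" is compared as one word.
    for token in re.findall(r"[\w-]+", text.lower()):
        for entry in blocklist:
            score = SequenceMatcher(None, token, entry).ratio()
            if score >= threshold:
                hits.append((token, entry, round(score, 2)))
    return hits


print(find_blocked("unsfe, unsafee and un-safe", BLOCKLIST))    # all three variants are caught
print(find_blocked("this is safe", BLOCKLIST))                  # []
print(find_blocked("this is safe", BLOCKLIST, threshold=0.75))  # "safe" now gets blocked
```

With a threshold of 0.75, “safe” (score 0.8) would be blocked even though it means the opposite of “unsafe”, which shows why the threshold needs to be tuned with care.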
Algorithms such as the Levenshtein distance (Levenshtein 1965) are particularly useful here. It counts the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one word into another.
If “unsafe” appears on the blocklist and “unsafee” appears in the text, the distance is 1, because only one edit (removing the extra “e”) is necessary. A distance of 0 means that the words are identical.
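A compact dynamic-programming sketch of the Levenshtein distance, assuming a cost of 1 for every insertion, deletion, and substitution. The helper `similarity` (1 minus the distance divided by the length of the longer word) is one common way, not the only one, to map the distance onto a 0-to-1 scale so that a threshold can be applied; both function names are our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,          # insertion
                previous[j] + 1,             # deletion
                previous[j - 1] + (ca != cb) # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1], where 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


for variant in ["unsafe", "unsafee", "unsfe", "un-safe"]:
    print(variant, levenshtein("unsafe", variant))  # 0, 1, 1, 1
print(round(similarity("unsafe", "unsafee"), 2))     # 0.86
print(levenshtein("fantastic", "fan-f***ing-tastic"))  # 9: large, despite the similar meaning
```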
Some words or phrases are semantically similar even though they are not particularly similar on a character level.
A common example involves colloquial “injections.” For instance, “fantastic” and “fan-f***ing-tastic” carry essentially the same meaning, although the inserted expletive considerably reduces their character-level similarity.
In such cases, vector-based similarity methods become useful. These methods learn from a dataset how words are used in context and represent each word as a vector; words that are used in similar contexts receive similar vectors.
Thus, words such as “king” and “queen,” which differ strongly on a letter level, are considered very similar in vector space because they frequently appear in similar contexts.
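Similarity between word vectors is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors, an assumption purely for illustration, to show how “king” and “queen” can score close to 1 while an unrelated word scores much lower. Real vectors would come from an embedding model trained on a dataset.

```python
import math

# Made-up vectors for illustration only; real embeddings are learned from a
# dataset and typically have tens to hundreds of dimensions.
VECTORS = {
    "king": (0.80, 0.65, 0.10),
    "queen": (0.75, 0.70, 0.15),
    "apple": (0.10, 0.20, 0.95),
}


def cosine_similarity(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(round(cosine_similarity(VECTORS["king"], VECTORS["queen"]), 3))  # close to 1
print(round(cosine_similarity(VECTORS["king"], VECTORS["apple"]), 3))  # much lower
```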
How well these methods work depends not only on the chosen algorithm but also on the underlying dataset. If a word does not appear in the dataset, it becomes difficult to generate a meaningful vector representation.
One way to address this problem is by adjusting the granularity at which vectors are generated.
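One simple way to picture this at the level of word fragments is a bag-of-character-n-grams vector: each word is represented by the counts of its overlapping three-character fragments, so even a word that never appears in the dataset still gets a vector. Subword embedding models such as fastText build on a related idea by learning vectors for character n-grams; the sketch below is a much simpler, untrained stand-in, and the names `char_ngrams` and `ngram_cosine` as well as the choice n = 3 are assumptions.

```python
import math
from collections import Counter


def char_ngrams(word: str, n: int = 3) -> Counter:
    """Count the overlapping character n-grams of a word."""
    padded = f"<{word.lower()}>"  # boundary markers so prefixes and suffixes are distinct
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_cosine(a: str, b: str, n: int = 3) -> float:
    """Cosine similarity of the two words' n-gram count vectors."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


print(round(ngram_cosine("unsafe", "unsafee"), 2))             # high: most fragments are shared
print(ngram_cosine("unsafe", "secure"))                        # 0.0: no shared fragments
print(round(ngram_cosine("fantastic", "fan-f***ing-tastic"), 2))
```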
An important aspect of text processing is deciding at which level a text should be analyzed. Whether the entire text is processed at once or only smaller fragments are examined can significantly affect the results, depending on the objective and the method applied.
Typical levels include the full text, individual sentences, individual words, and n-grams (sequences of n consecutive characters or words).
Each approach comes with its own strengths and weaknesses.
Full-text and sentence-level methods are useful when filtering not only for blocked words but also for phrases. Word-level analysis works especially well for identifying words and their variations within context. N-grams can be similarly useful and may also help when dealing with words that do not appear in the underlying dataset.
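A minimal sketch of splitting a text into these levels, assuming naive regular-expression rules (sentences end at “.”, “!”, or “?” followed by whitespace); a production system would more likely use a dedicated tokenizer. The function names are hypothetical. Word n-grams are built per sentence so that phrases do not span sentence boundaries.

```python
import re


def sentences(text: str) -> list[str]:
    """Naive sentence split: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(text: str) -> list[str]:
    """Lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def word_ngrams(text: str, n: int = 2) -> list[str]:
    """Word n-grams (phrases of n consecutive words), built within each sentence."""
    grams = []
    for sentence in sentences(text):
        tokens = words(sentence)
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


text = "This is unsafe. Please stop now!"
print(sentences(text))    # ['This is unsafe.', 'Please stop now!']
print(words(text))        # ['this', 'is', 'unsafe', 'please', 'stop', 'now']
print(word_ngrams(text))  # ['this is', 'is unsafe', 'please stop', 'stop now']
```

A multi-word blocklist entry such as a phrase can then be compared against the word n-grams instead of against single words.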
Some algorithms require datasets as a basis for comparison. Ideally, these should consist of real-world data that closely resembles the actual context in which the blocklist will be used.
For example, a review platform should ideally use review texts, and a forum should ideally use a forum dataset.
This helps capture writing style and, in particular, context-specific vocabulary, and represent them more accurately as vectors.
Although real-world data is preferable, it is not always available. For some contexts or languages, only limited datasets exist. In such cases, synthetic data may also be used, either from existing synthetic datasets or generated yourself with tools such as datafast (Fleith 2025).
These tools allow customized dataset generation according to specified requirements. The specifications are forwarded to selected LLMs to generate the data. While this aims to approximate real-world language, synthetic datasets often remain recognizable as artificial.
It is also important to note that this is not a way to bypass LLM restrictions. For example, generating datasets containing harmful or illegal content may still be prohibited depending on the selected model and prompt instructions.
At SequiSAS, we use guardrails to strengthen security when working with LLMs. This can happen at different levels, for example by ensuring that no fabricated outputs (often referred to as hallucinations; see Siebert 2024) are returned, or by preventing the disclosure of private information.
Text-based approaches can also be used to detect and block specific patterns (such as email addresses) or specific words (such as names).
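A minimal sketch of such pattern- and word-based redaction with Python's `re` module. The email pattern is deliberately simplified and does not cover every address the email standards allow; the function `redact`, the placeholder text, and the example name are hypothetical.

```python
import re

# Simplified email pattern: local part, "@", then a domain with at least one dot.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str, names: list[str], placeholder: str = "[REDACTED]") -> str:
    """Replace email addresses and the given names (case-insensitive, whole words)."""
    text = EMAIL_PATTERN.sub(placeholder, text)  # emails first, so names inside them are not split
    if names:
        name_pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
            re.IGNORECASE,
        )
        text = name_pattern.sub(placeholder, text)
    return text


print(redact("Contact Jane Doe at jane.doe@example.com.", ["Jane Doe"]))
# Contact [REDACTED] at [REDACTED].
```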
For more information on how to secure your company and ensure compliance with regulations such as the EU AI Act, feel free to contact us.
Chloe B. “Measuring Word Similarity with Edit Distance.” Medium, 06.03.2025, https://medium.com/@chloebre/measuring-word-similarity-with-edit-distance-b30812b0bf29. Accessed 28.04.2026.
Julien Siebert. “Halluzinationen von generativer KI und großen Sprachmodellen (LLMs) – Blog des Fraunhofer IESE.” Fraunhofer IESE, 20.09.2024, https://www.iese.fraunhofer.de/blog/halluzinationen-generative-ki-llm/. Accessed 30.03.2026.
Patrick Fleith. “datafast.” GitHub, last updated 03/2026, https://github.com/patrickfleith/datafast.
Peiqi Sui, Eamon Duede, Sophie Wu, and Richard Jean So. “Confabulation: The Surprising Value of Large Language Model Hallucinations.” Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, vol. 1, 2024, pp. 14274–14284.
Vladimir I. Levenshtein. “Binary Codes Capable of Correcting Deletions, Insertions, and Reversals.” Doklady Akademii Nauk SSSR, vol. 163, no. 4, 1965, pp. 845–848. nymity.ch, https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf. Accessed 30.03.2026.
ANNEGRET JANSZO
AI Research Engineer
sequire technology