How to Think About Threat Modeling for AI and LLM Applications
Mar 11, 2026
Your threat modeling playbook was built for software that behaves predictably.
But AI systems don’t.
Large language models change how systems behave, how data moves, and where control actually lives. A prompt can alter outcomes. A retrieval pipeline can shift what the model knows. A model update can quietly change system behaviour overnight.
And suddenly your old threat model stops making sense.
Threat modeling for AI and LLM systems forces a different question during design reviews: not just where data flows, but where behaviour can be influenced. Until you see those trust boundaries clearly, you’re defending a system you don’t fully understand.
Table of Contents
- Mapping the Real Attack Surface of an LLM Application
- Modeling Threats Across the AI Lifecycle (Not Just the Application Layer)
- Identifying AI-Specific Abuse Scenarios and Failure Modes
- How to Make Threat Modeling Practical for AI Engineering Teams
- AI systems expand the threat landscape far beyond traditional application security
Mapping the Real Attack Surface of an LLM Application
Early threat models for AI systems often start at the wrong place. Security reviews focus on the chatbot interface or the API endpoint where a prompt enters the system. That surface is visible, easy to diagram, and familiar from traditional application security.
The real exposure lives deeper in the system.
Modern LLM applications are built as pipelines. A single user request can pass through orchestration logic, retrieval systems, embedding models, vector databases, tool integrations, and output processing layers before a response reaches the user. Each component introduces its own trust boundary and its own way for attacker input to influence system behavior.
A typical LLM application stack includes components such as:
- Prompt orchestration layers that assemble instructions and context before sending them to the model
- Retrieval pipelines that pull external data into the prompt through RAG workflows
- Embedding models and vector databases that store and retrieve semantic data
- Tool plugins or external APIs that the model can call to perform actions
- Output post-processing systems that transform model responses before delivery
Each of these layers creates a different type of security exposure.
- Prompt orchestration layers can be influenced through prompt injection that alters how instructions are constructed for the model.
- Retrieval pipelines can expose the system to poisoned documents or manipulated embeddings that influence what the model sees during inference.
- Vector databases introduce retrieval risks where sensitive or unexpected data surfaces through similarity search.
- Tool integrations and external APIs allow model outputs to trigger downstream actions across internal systems or third-party services.
For security architects reviewing these systems, the key shift is understanding where control actually exists.
Traditional threat modeling focuses on where untrusted input enters the system and how it flows through code. LLM systems require a second layer of analysis. You also need to examine where model output influences downstream behaviour. A generated response can trigger a workflow, call an API, retrieve data, or modify system state without a developer writing explicit control logic for that path.
This creates indirect execution paths where user input shapes the model response, and the model response drives system behaviour.
Threat modeling in LLM architectures focuses on identifying the control points inside that chain. The goal is to find the places where attacker input can influence model decisions or trigger automated actions that the system will execute without question.
Modeling Threats Across the AI Lifecycle
AI systems introduce risk long before a user ever sends a prompt. The model you deploy is the outcome of a long pipeline that includes data collection, training, evaluation, deployment, and continuous updates. Each stage creates a different opportunity for an attacker to influence the system.
Threat modeling therefore has to follow the lifecycle of the model itself.
A typical AI lifecycle includes several phases, each with its own threat profile:
Training phase
During training, the model learns patterns from large datasets. If an attacker can influence that dataset, they can introduce poisoned examples or manipulated samples that shape how the model behaves later. Malicious training data can quietly embed bias, hidden triggers, or unsafe responses into the model.
Fine-tuning and evaluation
Teams often fine-tune foundation models with domain data and evaluate them using curated test sets. This stage can introduce subtle backdoors in model behavior. A model may appear to behave correctly under standard tests while responding differently when specific triggers appear in prompts.
Deployment
Once the model becomes part of an application, prompt injection and input manipulation become the dominant risk. Attackers interact with the model directly and attempt to override system instructions, influence reasoning, or bypass guardrails embedded in the prompt structure.
Inference and system integration
The model rarely operates in isolation. It retrieves data, calls tools, interacts with APIs, and feeds results into other systems. This stage creates opportunities for tool abuse, sensitive data exposure, and unintended actions triggered by model output.
Monitoring and retraining
Many AI systems continuously collect feedback and operational data to improve performance. If those feedback signals are manipulated, the retraining process can reinforce malicious behavior or gradually corrupt the model’s responses over time.
Each phase involves different actors with different levels of influence over the system. Data engineers control training datasets. ML engineers manage model weights and fine-tuning. Application teams integrate the model into services and workflows. Users interact with the system at runtime.
Security controls therefore need to align with the lifecycle:
- Training stage controls: dataset validation and data integrity checks
- Model management controls: provenance tracking, versioning, and restricted access to model weights and checkpoints
- Infrastructure controls: access control around training environments and model artifacts
- Runtime controls: monitoring for abnormal outputs, unusual prompt patterns, and unexpected tool usage
Once you analyze AI systems through this lifecycle, threat modeling becomes pipeline-oriented instead of application-oriented. That shift exposes risks that never appear in traditional application diagrams but still influence how the system behaves in production.
Identifying AI-Specific Abuse Scenarios and Failure Modes
Traditional threat modeling focuses on weaknesses in code, libraries, or infrastructure. If an attacker finds a vulnerability, they exploit it. The security team patches the flaw and the risk goes away.
AI systems introduce a different category of exposure. The application can be perfectly secure from a code perspective and still behave in ways that create risk.
Large language models respond to instructions, context, and retrieved data. Attackers take advantage of how the model interprets those inputs. The result is a system that behaves incorrectly or unsafely without any exploitable software vulnerability. Several abuse scenarios appear repeatedly in LLM deployments:
- Prompt injection attacks: Attackers craft inputs that override or manipulate the system’s instructions. The model may ignore internal rules, reveal hidden prompts, or follow malicious directions embedded in user input.
- Data exfiltration through model outputs: The model can expose sensitive information through generated responses. This can include proprietary data retrieved from internal sources, fragments of training data, or content pulled from connected systems.
- Model manipulation through malicious training data: Poisoned datasets can introduce patterns that influence how the model behaves in production. Under specific inputs, the model may produce responses that align with the attacker’s intent.
- Indirect prompt injection through retrieved content: Retrieval pipelines often pull documents directly into the prompt context. If an attacker can plant malicious instructions in those documents, the model may treat them as trusted input.
- Tool misuse through model-generated API calls: When LLMs are allowed to call tools or external APIs, model output can trigger actions in connected systems. A crafted prompt can push the model to execute unintended operations through those integrations.
What makes these scenarios different is the underlying failure mode. The software stack may function exactly as designed. The risk comes from the model’s reasoning process and how it interprets instructions, context, and retrieved information.
This creates a new class of security exposure that looks more like behavioral failure than a technical vulnerability. The system behaves in a way that violates security expectations even though the code and infrastructure remain intact.
Security leaders evaluating AI deployments need to account for these behaviors during threat modeling. The question is whether the system has safeguards around how the model interacts with inputs, data, and tools. Controls often include output filtering, boundaries around what tools the model can access, isolation between user prompts and system instructions, and monitoring for abnormal prompt patterns.
How to Make Threat Modeling Practical for AI Engineering Teams
AI systems change constantly. A model update can alter behaviour, a retraining cycle can introduce new data patterns, and a small change in prompt engineering can modify how the system responds to the same input. When architecture shifts this quickly, a threat model created during a design workshop loses accuracy almost immediately.
Security teams need an approach that evolves with the system.
A practical model treats threat modeling as part of the engineering lifecycle instead of a one-time exercise. The goal is to keep the analysis close to the places where architecture actually changes. Several patterns help make this workable for AI systems:
- Lightweight threat modeling during feature design: When teams design a new AI feature, they examine how the model interacts with prompts, data sources, and external tools before the feature moves into development.
- Component-based modeling of AI pipelines: Instead of modeling the entire application as a single system, teams map threats to specific pipeline components such as retrieval layers, orchestration logic, vector stores, and tool integrations.
- Continuous updates triggered by architecture changes: Changes such as a new data source, a model upgrade, or an added tool integration should trigger a quick reassessment of the affected components.
- Integration with architecture documentation and design reviews: Threat modeling becomes part of the same design artifacts engineers already produce. Architecture diagrams, system descriptions, and feature specifications serve as inputs to the review.
For security leaders, the challenge becomes scale. A single AI product may include dozens of models, pipelines, and integrations. Large organizations can end up managing hundreds of AI-enabled features across different teams. Running a manual threat modeling exercise for every change quickly becomes impossible.
This is where automation begins to matter. Systems that analyze architecture artifacts and design documentation can help surface likely attack paths as the system evolves. AI-assisted design review platforms can examine diagrams, specs, and integration points to identify where new trust boundaries appear or where model output could influence downstream actions.
The goal is not to replace security expertise. The goal is to maintain visibility as the architecture grows and changes, without turning threat modeling into a bottleneck that slows engineering down.
AI systems expand the threat landscape far beyond traditional application security
AI systems change how risk enters your architecture. Model behaviour can shift through prompt context, retrieved data, and system integrations that execute actions beyond the application itself. If threat modeling stops at the API or chatbot layer, large parts of the attack surface remain invisible during design.
That gap becomes dangerous as AI systems evolve. Model updates, new retrieval sources, tool integrations, and prompt changes can alter system behavior overnight. Without a threat modeling approach that tracks those changes, security teams lose visibility into how attacker influence can move through the pipeline.
This is why AI threat modeling needs to become part of how systems are designed and reviewed. Security architects and AppSec leaders need to examine trust boundaries across the AI pipeline, understand how model outputs influence downstream actions, and continuously reassess those paths as the architecture evolves.
If your team is building or reviewing AI systems today, the next step is straightforward. Start evaluating your AI architecture with these trust boundaries and lifecycle risks in mind. The earlier those design questions appear in your engineering workflow, the easier it becomes to control how AI systems behave under real-world pressure.
Don't miss a beat!
New moves, motivation, and classes delivered to your inbox.