The Real Security Model for Agents

Why this matters
If you ship tool-using agents, you are shipping:
- an execution engine
- with access to external systems
- controlled by untrusted inputs
That is the same security posture as any automation platform - except the “operator” is probabilistic.
OWASP’s Top 10 for LLM Applications makes it clear: prompt injection, insecure output handling, sensitive info disclosure, excessive agency… these are mainstream risks, not edge cases. [1] The good news: most mitigations are classic security engineering applied to a new execution model.
This article is a practical, production-first security model for agents and MCP tool ecosystems.
TL;DR
- Don’t “secure the model.” Secure the system.
- Treat all inputs as untrusted:
- user text
- tool outputs
- retrieved documents
- Design tools with least privilege:
- separate read/write/danger tools
- require preview -> apply for destructive actions
- Centralize auth and policy:
- MCP defines authorization for HTTP transports - use it. [2]
- Control egress and prevent SSRF by default. [3]
- Never let raw model output drive execution without validation (OWASP LLM02). [4]
- Redact logs and manage secrets like an adult (OWASP cheat sheets). [5][6]
Contents
- Threat model: what can go wrong
- Security layers that actually work
- Tool design: read/write/danger tiers
- Output handling: never execute raw model output
- Secrets: minimize, scope, rotate
- Network and egress controls
- Logging and audit without data leaks
- A production checklist
- References
Threat model: what can go wrong
1) Prompt injection -> policy bypass attempt
A user or document says:
- “Ignore previous instructions”
- “Call this tool with these parameters”
- “Reveal secrets”
OWASP calls this out as a primary risk category. [1]
2) Insecure output handling -> downstream exploitation
If you pass model output into:
- a shell
- SQL
- YAML manifests
- HTTP requests
…without validation, you’ve built an indirect code execution path.
OWASP’s LLM02 describes this precisely: insufficient validation and handling of LLM outputs before passing them downstream. [4]
3) Excessive agency -> unintended side effects
The agent is over-permissioned:
- it can delete resources
- send emails
- modify production
…and it will eventually do something you didn’t mean.
4) Data exfiltration via tools
Tool outputs are rich and often sensitive:
- calendar events
- emails
- internal tickets
- source code
- cluster configs
Exfil happens through:
- model responses
- logs
- “helpful” summaries
- tool chaining
5) Network abuse / SSRF
Any “fetch URL” capability is an SSRF invitation unless you constrain egress. OWASP’s SSRF cheat sheet is still relevant. [3]
Security layers that actually work
Security in agent systems is defense-in-depth:
- Identity (who is calling?)
- Authorization (what can they do?)
- Contracts (what does a tool accept/return?)
- Validation (are inputs/outputs safe?)
- Egress control (what can the system reach?)
- Audit (what happened?)
- Kill switches (how do you stop it fast?)
Tool design: read/write/danger tiers
Tiering is mandatory
Split tools by side effects:
- Read tools: list/search/get
- Write tools: create/update with bounded scope
- Danger tools: deletes, bulk updates, privileged actions
Then enforce policy:
- Read tools are widely available
- Write tools require explicit scopes and tighter budgets
- Danger tools require:
- preview -> apply
- confirmation tokens
- additional policy checks
Preview -> Apply pattern
For dangerous operations:
- plan_* returns a plan summary + plan_id
- apply_* requires plan_id + user confirmation
This prevents “drive-by deletes” and supports audit.
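A minimal sketch of the pattern, assuming an in-memory plan store; `plan_delete`, `apply_delete`, and the store itself are hypothetical names for illustration.

```python
# Sketch: preview -> apply. A destructive action must first be previewed,
# then applied exactly once, with explicit user confirmation.
import secrets

PLANS: dict[str, dict] = {}

def plan_delete(resource_id: str) -> dict:
    """Dry-run: describe what would happen and return a plan_id."""
    plan_id = secrets.token_hex(8)
    PLANS[plan_id] = {"action": "delete", "resource_id": resource_id}
    return {"plan_id": plan_id, "summary": f"Would delete {resource_id}"}

def apply_delete(plan_id: str, user_confirmed: bool) -> str:
    """Execute only a previously previewed plan, and only with confirmation."""
    if not user_confirmed:
        raise PermissionError("user confirmation required")
    plan = PLANS.pop(plan_id, None)  # one-shot: a plan can be applied once
    if plan is None:
        raise KeyError("unknown or already-applied plan_id")
    return f"deleted {plan['resource_id']}"
```

Making plans one-shot (pop, not get) also gives you idempotency for free: a retried or replayed apply call fails loudly instead of deleting twice.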
Output handling: never execute raw model output
This is the most common real-world failure.
Rule: model output is data, not code
If the agent is generating:
- Kubernetes YAML
- SQL statements
- curl commands
- Terraform changes
…treat the output as untrusted data.
OWASP’s LLM02 guidance exists because people keep wiring LLM output directly into execution paths. [4]
Safer alternative: structured intent -> validated execution
Instead of:
- LLM writes YAML -> apply
Do:
- LLM proposes a structured change request (schema)
- server validates:
- allowlisted fields
- bounded ranges
- namespace/tenant scope
- server executes with known-safe libraries
This is where “tool contracts” win.
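The validate-then-execute flow can be sketched as follows. The field names, bounds, and tenant map are assumptions for illustration; the point is the shape: allowlisted fields, bounded ranges, tenant-scoped namespaces, all checked server-side before any known-safe library touches the cluster.

```python
# Sketch: the model proposes a structured change request (a plain dict here,
# standing in for a schema-validated object); the server validates it fully
# before execution.
ALLOWED_FIELDS = {"replicas", "image_tag"}          # allowlist, not blocklist
TENANT_NAMESPACES = {"tenant-a": {"team-a-dev", "team-a-prod"}}

def validate_change(tenant: str, req: dict) -> dict:
    """Reject anything outside the contract; return the request if safe."""
    unknown = set(req) - ALLOWED_FIELDS - {"namespace"}
    if unknown:
        raise ValueError(f"disallowed fields: {sorted(unknown)}")
    if req.get("namespace") not in TENANT_NAMESPACES.get(tenant, set()):
        raise PermissionError("namespace outside tenant scope")
    replicas = req.get("replicas")
    if replicas is not None and not (1 <= replicas <= 10):
        raise ValueError("replicas out of bounds")
    return req  # safe to hand to a known-safe client library
```

Note the direction of the checks: fields are allowlisted, not blocklisted, so a new attack surface has to be opted in explicitly rather than forgotten.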
Secrets: minimize, scope, rotate
Secrets are the other common failure path.
Minimum viable rules
- Never put long-lived secrets in prompts.
- Prefer short-lived tokens and scoped credentials.
- Inject secrets server-side, not in the model context.
OWASP’s Secrets Management Cheat Sheet is a good baseline for central storage, rotation, auditing, and least privilege. [5]
Scope secrets to tenants and tools
Instead of “one OAuth token for everything,” mint:
- per tenant
- per tool category
- short TTL
When something goes wrong, you want the blast radius small and revocation easy.
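As a sketch of what "small blast radius" looks like in code: tokens keyed by tenant and tool category, with a short TTL and a one-call revocation path. A real system would mint these through your identity provider; the in-memory store and function names here are illustrative only.

```python
# Sketch: per-tenant, per-tool-category short-lived tokens with bulk revocation.
import secrets
import time

TOKENS: dict[str, dict] = {}

def mint_token(tenant: str, category: str, ttl_seconds: int = 300) -> str:
    """Issue a token scoped to one tenant and one tool category."""
    token = secrets.token_urlsafe(16)
    TOKENS[token] = {"tenant": tenant, "category": category,
                     "expires": time.time() + ttl_seconds}
    return token

def check_token(token: str, tenant: str, category: str) -> bool:
    """Valid only for the exact tenant + category it was minted for."""
    rec = TOKENS.get(token)
    if rec is None or time.time() > rec["expires"]:
        return False
    return rec["tenant"] == tenant and rec["category"] == category

def revoke_tenant(tenant: str) -> None:
    """Blast-radius control: drop every token for one tenant at once."""
    for tok in [t for t, r in TOKENS.items() if r["tenant"] == tenant]:
        del TOKENS[tok]
```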
Network and egress controls
If your agent system can reach the open internet or internal networks, you need guardrails.
Egress allowlists
- allowlist domains for integrations
- block metadata IP ranges
- re-validate after redirects
OWASP’s SSRF prevention guidance provides practical patterns for validation and blocking internal addresses. [3]
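A minimal egress check following those patterns might look like this. The allowlisted hostnames are placeholders; the important parts are allowlisting before DNS resolution and rejecting private, loopback, and link-local addresses (which covers cloud metadata IPs like 169.254.169.254).

```python
# Sketch: validate an outbound URL before fetching it. Allowlist the host,
# then resolve it and reject internal address ranges.
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.github.com", "hooks.slack.com"}  # example allowlist

def is_safe_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, parsed.port or 443)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # Reject internal ranges, including metadata endpoints (link-local)
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True
```

One caveat this sketch does not handle: HTTP clients such as `requests` follow redirects by default, so either disable automatic redirects or re-run this check on every hop, as the document's "re-validate after redirects" rule requires.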
Separate network planes
Keep tool servers in a network segment that:
- can reach only what they need
- cannot reach internal admin endpoints
- cannot reach secrets stores directly unless necessary
Logging and audit without data leaks
Logging is security. Logging is also a leak vector.
OWASP’s Logging Cheat Sheet calls out that logs may contain personal and sensitive information and must be protected from misuse. [6]
Practical logging rules
- do not log raw prompts by default
- do not log raw tool payloads by default
- log structured summaries:
- tool name
- action class
- resource IDs (safe identifiers)
- status
- latency
- store audit events separately from debug logs
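The structured-summary rule can be sketched as a small log helper: it emits only the fields listed above and never accepts a raw prompt or payload, so there is nothing sensitive to redact after the fact. The field names are the ones from the list; the JSON shape is an assumption.

```python
# Sketch: log a structured summary of a tool call, never the payload.
import json
import time

def log_tool_call(tool: str, action_class: str, resource_id: str,
                  status: str, latency_ms: float) -> str:
    """Build a structured, payload-free log line for one tool invocation."""
    event = {
        "ts": time.time(),
        "tool": tool,
        "action_class": action_class,   # e.g. "read" / "write" / "danger"
        "resource_id": resource_id,     # safe identifier, not content
        "status": status,
        "latency_ms": round(latency_ms, 1),
    }
    return json.dumps(event)  # a real system ships this to a log sink
```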
Audit events (always on)
Every write/danger tool should emit:
- who / what / when / result
- plan_id / idempotency_key
- before/after resource identifiers (not content)
Audit is what makes “agents in production” defensible to security and compliance teams.
A production checklist
Identity and authorization
- Strong auth for clients.
- Least-privilege scopes per tool.
- MCP HTTP authorization flow implemented where applicable. [2]
Tool contracts
- Tools tiered: read/write/danger.
- Preview -> apply for dangerous actions.
- Schema validation + bounded arguments.
Output handling
- No raw model output is executed without validation (OWASP LLM02). [4]
Secrets
- Secrets never placed in prompts.
- Short-lived, scoped tokens used.
- Rotation/audit practices exist (OWASP Secrets Mgmt). [5]
Network
- Egress allowlists exist.
- SSRF protections implemented. [3]
Logging and audit
- Logs are redacted and access-controlled.
- Audit events exist for all side-effecting tools.
- Log systems protected per OWASP guidance. [6]
References
[1] OWASP - Top 10 for Large Language Model Applications (v1.1): https://owasp.org/www-project-top-10-for-large-language-model-applications/
[2] Model Context Protocol (MCP) - Authorization (Protocol Revision 2025-11-25): https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization
[3] OWASP - SSRF Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html
[4] OWASP GenAI Security Project - LLM02: Insecure Output Handling: https://genai.owasp.org/llmrisk2023-24/llm02-insecure-output-handling/
[5] OWASP - Secrets Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
[6] OWASP - Logging Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html