OWASP | Roy Gabriel

The Real Security Model for Agents

Sat, 18 Oct 2025 12:00:00 -0500

Why this matters

If you ship tool-using agents, you are shipping:

an execution engine
with access to external systems
controlled by untrusted inputs

That is the same security posture as any automation platform - except the “operator” is probabilistic.

OWASP’s Top 10 for LLM Applications makes it clear: prompt injection, insecure output handling, sensitive info disclosure, excessive agency… these are mainstream risks, not edge cases. [1] The good news: most mitigations are classic security engineering applied to a new execution model.

This article is a practical, production-first security model for agents and MCP tool ecosystems.

TL;DR

Don’t “secure the model.” Secure the system.
Treat all inputs as untrusted:
user text
tool outputs
retrieved documents
Design tools with least privilege:
separate read/write/danger tools
require preview -> apply for destructive actions
Centralize auth and policy:
MCP defines authorization for HTTP transports - use it. [2]
Control egress and prevent SSRF by default. [3]
Never let raw model output drive execution without validation (OWASP LLM02). [4]
Redact logs and manage secrets like an adult (OWASP cheat sheets). [5][6]

Threat model: what can go wrong
Security layers that actually work
Tool design: read/write/danger tiers
Output handling: never execute raw model output
Secrets: minimize, scope, rotate
Network and egress controls
Logging and audit without data leaks
A production checklist
References

Threat model: what can go wrong

1) Prompt injection -> policy bypass attempt

A user or document says:

“Ignore previous instructions”
“Call this tool with these parameters”
“Reveal secrets”

OWASP calls this out as a primary risk category. [1]

2) Insecure output handling -> downstream exploitation

If you pass model output into:

a shell
SQL
YAML manifests
HTTP requests

…without validation, you’ve built an indirect code execution path.

OWASP’s LLM02 describes this precisely: insufficient validation and handling of LLM outputs before passing them downstream. [4]

3) Excessive agency -> unintended side effects

The agent is over-permissioned:

it can delete resources
send emails
modify production

…and it will eventually do something you didn’t mean.

4) Data exfiltration via tools

Tool outputs are rich and often sensitive:

calendar events
emails
internal tickets
source code
cluster configs

Exfil happens through:

model responses
logs
“helpful” summaries
tool chaining

5) Network abuse / SSRF

Any “fetch URL” capability is an SSRF invitation unless you constrain egress. OWASP’s SSRF cheat sheet is still relevant. [3]

Security layers that actually work

Security in agent systems is defense-in-depth:

Identity (who is calling?)
Authorization (what can they do?)
Contracts (what does a tool accept/return?)
Validation (are inputs/outputs safe?)
Egress control (where can the system talk to?)
Audit (what happened?)
Kill switches (how do you stop it fast?)

Tool design: read/write/danger tiers

Tiering is mandatory

Split tools by side effects:

Read tools: list/search/get
Write tools: create/update with bounded scope
Danger tools: deletes, bulk updates, privileged actions

Then enforce policy:

Read tools are widely available
Write tools require explicit scopes and tighter budgets
Danger tools require:
preview -> apply
confirmation tokens
additional policy checks
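
The tier policy above can be sketched as a single authorization check that runs before any tool call. This is a minimal illustration, not a reference implementation: `Tier`, `ToolSpec`, `Caller`, `authorize`, and the scope strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional, Tuple


class Tier(Enum):
    READ = "read"
    WRITE = "write"
    DANGER = "danger"


@dataclass(frozen=True)
class ToolSpec:
    name: str
    tier: Tier
    required_scope: Optional[str] = None  # hypothetical scope, e.g. "tickets:write"


@dataclass(frozen=True)
class Caller:
    subject: str
    scopes: FrozenSet[str]


def authorize(tool: ToolSpec, caller: Caller, confirmed_plan: bool = False) -> Tuple[bool, str]:
    """Return (allowed, reason) for one tool call."""
    if tool.tier is Tier.READ:
        return True, "read tools are widely available"
    # Write and danger tools fail closed when no scope is declared or granted.
    if tool.required_scope is None or tool.required_scope not in caller.scopes:
        return False, f"missing scope for {tool.name}"
    if tool.tier is Tier.DANGER and not confirmed_plan:
        return False, "danger tools need a confirmed plan (preview -> apply)"
    return True, "ok"
```

Keeping the decision in one function on the server means the model never gets to argue its way past the policy: it can only request a call, and the call either passes the check or does not.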

Preview -> Apply pattern

For dangerous operations:

plan_* returns a plan summary + plan_id
apply_* requires plan_id + user confirmation

This prevents “drive-by deletes” and supports audit.
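
One way to sketch the pattern, assuming an in-memory plan store and hypothetical `plan_delete` / `apply_delete` tools (the TTL, the store, and the `delete_fn` callback are illustrative assumptions):

```python
import secrets
import time

_PLANS: dict = {}  # in-memory store for illustration; a real server would persist plans
PLAN_TTL_SECONDS = 300  # assumed expiry window


def plan_delete(resource_ids: list, requested_by: str) -> dict:
    """Preview step: record what would be deleted and return a summary. No side effects."""
    plan_id = secrets.token_urlsafe(16)
    ids = sorted(resource_ids)
    _PLANS[plan_id] = {
        "resource_ids": ids,
        "requested_by": requested_by,
        "expires_at": time.time() + PLAN_TTL_SECONDS,
    }
    return {"plan_id": plan_id, "summary": f"delete {len(ids)} resource(s)", "resource_ids": ids}


def apply_delete(plan_id: str, confirmed_by: str, delete_fn) -> list:
    """Apply step: run only a stored, unexpired plan confirmed by its requester."""
    # pop() makes the plan single use, even when a later check fails.
    plan = _PLANS.pop(plan_id, None)
    if plan is None:
        raise PermissionError("unknown or already-used plan_id")
    if time.time() > plan["expires_at"]:
        raise PermissionError("plan expired; create a new preview")
    if confirmed_by != plan["requested_by"]:
        raise PermissionError("confirmation must come from the requesting user")
    for rid in plan["resource_ids"]:
        delete_fn(rid)
    return plan["resource_ids"]
```

The apply step takes only a `plan_id`, never a fresh list of targets, so whatever the user saw in the preview is exactly what gets executed.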

Output handling: never execute raw model output

This is the most common real-world failure.

Rule: model output is data, not code

If the agent is generating:

Kubernetes YAML
SQL statements
curl commands
Terraform changes

…treat the output as untrusted data.

OWASP’s LLM02 guidance exists because people keep wiring LLM output directly into execution paths. [4]

Safer alternative: structured intent -> validated execution

Instead of:

LLM writes YAML -> apply

Do:

LLM proposes a structured change request (schema)
server validates:
allowlisted fields
bounded ranges
namespace/tenant scope
server executes with known-safe libraries

This is where “tool contracts” win.
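
As a rough sketch of the validation step, assume the only change the agent may propose is scaling a deployment. The `ScaleRequest` shape, the field allowlist, and the `MAX_REPLICAS` bound are assumptions made for this example, not part of any Kubernetes API.

```python
import re
from dataclasses import dataclass

ALLOWED_FIELDS = {"namespace", "deployment", "replicas"}  # allowlisted fields
MAX_REPLICAS = 20  # assumed upper bound for illustration
_NAME_RE = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")  # DNS-label style name


@dataclass(frozen=True)
class ScaleRequest:
    namespace: str
    deployment: str
    replicas: int


def validate_scale_request(raw: dict, tenant_namespaces: set) -> ScaleRequest:
    """Turn model-proposed JSON into a typed request, or raise ValueError."""
    unknown = set(raw) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    missing = ALLOWED_FIELDS - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    namespace, deployment, replicas = raw["namespace"], raw["deployment"], raw["replicas"]
    if not isinstance(namespace, str) or namespace not in tenant_namespaces:
        raise ValueError("namespace outside tenant scope")
    if not isinstance(deployment, str) or not _NAME_RE.fullmatch(deployment):
        raise ValueError("invalid deployment name")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(replicas, bool) or not isinstance(replicas, int):
        raise ValueError("replicas must be an integer")
    if not 0 <= replicas <= MAX_REPLICAS:
        raise ValueError(f"replicas must be between 0 and {MAX_REPLICAS}")
    return ScaleRequest(namespace, deployment, replicas)
```

Only the validated `ScaleRequest` reaches the code that talks to the cluster; the model's text never does.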

Secrets: minimize, scope, rotate

Secrets are the other common failure path.

Minimum viable rules

Never put long-lived secrets in prompts.
Prefer short-lived tokens and scoped credentials.
Inject secrets server-side, not in the model context.

OWASP’s Secrets Management Cheat Sheet is a good baseline for central storage, rotation, auditing, and least privilege. [5]

Scope secrets to tenants and tools

Instead of “one OAuth token for everything,” mint:

per tenant
per tool category
short TTL

When something goes wrong, you want the blast radius small and revocation easy.
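
To make the scoping idea concrete, here is a toy sketch that mints and checks an HMAC-signed token carrying a tenant, a tool category, and an expiry. It is only an illustration of the claims involved: `mint_token`, `check_token`, and the claim names are hypothetical, and a production system would more likely rely on its identity provider's short-lived tokens than on a hand-rolled format.

```python
import base64
import hashlib
import hmac
import json
import time


def mint_token(key: bytes, tenant: str, tool_category: str, ttl_seconds: int = 300, now=None) -> str:
    """Mint a signed token scoped to one tenant and one tool category."""
    issued = time.time() if now is None else now
    claims = {"tenant": tenant, "cat": tool_category, "exp": int(issued) + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def check_token(key: bytes, token: str, tenant: str, tool_category: str, now=None) -> bool:
    """Accept the token only for the tenant and category it was minted for, before expiry."""
    body, _, sig = token.partition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # Verify the signature before parsing anything from the token body.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    return (
        claims["tenant"] == tenant
        and claims["cat"] == tool_category
        and current < claims["exp"]
    )
```

A token minted for one tenant's calendar reads is useless for another tenant or for sending email, and it stops working on its own after the TTL.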

Network and egress controls

If your agent system can reach the open internet or internal networks, you need guardrails.

Egress allowlists

allowlist domains for integrations
block cloud metadata addresses (for example 169.254.169.254) and other internal IP ranges
re-validate after redirects

OWASP’s SSRF prevention guidance provides practical patterns for validation and blocking internal addresses. [3]
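
The three rules above might look like this in a fetch tool. This is a sketch under assumptions: `ALLOWED_HOSTS` is a placeholder allowlist, `check_egress` and `resolve_redirects` are hypothetical helpers, and `next_location` stands in for an HTTP client call made with automatic redirects disabled.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical integration domains


def _resolve(host: str) -> list:
    """Return every address the host currently resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def check_egress(url: str, resolve=_resolve) -> None:
    """Raise ValueError unless the URL is https, allowlisted, and publicly routable."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https egress is allowed")
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowlisted: {host!r}")
    for addr in resolve(host):
        ip = ipaddress.ip_address(addr)
        # is_global is False for loopback, private, and link-local ranges,
        # which includes the 169.254.169.254 metadata address.
        if not ip.is_global:
            raise ValueError(f"{host} resolves to non-public address {ip}")


def resolve_redirects(url: str, next_location, resolve=_resolve, max_hops: int = 5) -> str:
    """Follow redirects manually, re-validating every hop before it is requested."""
    for _ in range(max_hops + 1):
        check_egress(url, resolve)
        target = next_location(url)  # redirect target, or None when there is none
        if target is None:
            return url
        url = urljoin(url, target)
    raise ValueError("too many redirects")
```

A check like this does not by itself close the gap between DNS resolution and connection (DNS rebinding), so it works best alongside network-level egress rules rather than instead of them.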

Separate network planes

Keep tool servers in a network segment that:

can reach only what they need
cannot reach internal admin endpoints
cannot reach secrets stores directly unless necessary

Logging and audit without data leaks

Logging is security. Logging is also a leak vector.

OWASP’s Logging Cheat Sheet calls out that logs may contain personal and sensitive information and must be protected from misuse. [6]

Practical logging rules

do not log raw prompts by default
do not log raw tool payloads by default
log structured summaries:
tool name
action class
resource IDs (safe identifiers)
status
latency
store audit events separately from debug logs
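
A minimal sketch of the structured-summary rule, assuming a hypothetical logger name and field set. The design point is that the logging helper has no parameter that could carry a prompt or payload.

```python
import json
import logging

logger = logging.getLogger("agent.tools")  # hypothetical logger name


def summarize_tool_call(tool_name, action_class, resource_ids, status, latency_ms) -> dict:
    """Build a log record from safe metadata only.

    There is deliberately no parameter for prompts or tool payloads,
    so raw content cannot be logged by accident through this path.
    """
    return {
        "tool": tool_name,
        "action_class": action_class,
        "resource_ids": list(resource_ids),
        "status": status,
        "latency_ms": round(latency_ms, 1),
    }


def log_tool_call(**fields) -> None:
    """Emit one JSON line per tool call; unexpected fields raise TypeError."""
    logger.info(json.dumps(summarize_tool_call(**fields), sort_keys=True))
```

If raw prompts or payloads are ever needed for debugging, that can be a separate, explicitly enabled path with its own retention and access rules.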

Audit events (always on)

Every write/danger tool should emit:

who / what / when / result
plan_id / idempotency_key
before/after resource identifiers (not content)

Audit is what makes “agents in production” defensible to security and compliance teams.
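
An audit event with the fields listed above could be modeled roughly as follows. The `AuditEvent` class, its field names, and the `emit_audit` helper are illustrative assumptions; `sink` is any writable text stream kept separate from debug logs.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AuditEvent:
    who: str                              # authenticated caller
    what: str                             # tool name or action
    result: str                           # e.g. "ok", "denied", "error"
    resource_ids_before: Tuple[str, ...]  # identifiers only, never content
    resource_ids_after: Tuple[str, ...]
    plan_id: Optional[str] = None
    idempotency_key: Optional[str] = None
    when: str = field(default_factory=_utc_now)


def emit_audit(event: AuditEvent, sink) -> str:
    """Append one JSON line to the audit sink and return it."""
    line = json.dumps(asdict(event), sort_keys=True)
    sink.write(line + "\n")
    return line
```

Because the event type has no free-form content field, an audit record can show that a ticket was deleted without ever storing what the ticket said.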

A production checklist

Identity and authorization

Strong auth for clients.
Least-privilege scopes per tool.
MCP HTTP authorization flow implemented where applicable. [2]

Tool contracts

Tools tiered: read/write/danger.
Preview -> apply for dangerous actions.
Schema validation + bounded arguments.

Output handling

No raw model output is executed without validation (OWASP LLM02). [4]

Secrets

Secrets never placed in prompts.
Short-lived, scoped tokens used.
Rotation/audit practices exist (OWASP Secrets Mgmt). [5]

Network

Egress allowlists exist.
SSRF protections implemented. [3]

Logging and audit

Logs are redacted and access-controlled.
Audit events exist for all side-effecting tools.
Log systems protected per OWASP guidance. [6]

References

[1] OWASP - Top 10 for Large Language Model Applications (v1.1): https://owasp.org/www-project-top-10-for-large-language-model-applications/
[2] Model Context Protocol (MCP) - Authorization (Protocol Revision 2025-11-25): https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization
[3] OWASP - SSRF Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html
[4] OWASP GenAI Security Project - LLM02: Insecure Output Handling: https://genai.owasp.org/llmrisk2023-24/llm02-insecure-output-handling/
[5] OWASP - Secrets Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
[6] OWASP - Logging Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html