The Real Security Model for Agents

October 18, 2025 · 5 min read

Why this matters

If you ship tool-using agents, you are shipping:

  • an execution engine
  • with access to external systems
  • controlled by untrusted inputs

That is the same security posture as any automation platform - except the “operator” is probabilistic.

OWASP’s Top 10 for LLM Applications makes it clear: prompt injection, insecure output handling, sensitive info disclosure, excessive agency… these are mainstream risks, not edge cases. [1] The good news: most mitigations are classic security engineering applied to a new execution model.

This article is a practical, production-first security model for agents and MCP tool ecosystems.


TL;DR

  • Don’t “secure the model.” Secure the system.
  • Treat all inputs as untrusted:
    • user text
    • tool outputs
    • retrieved documents
  • Design tools with least privilege:
    • separate read/write/danger tools
    • require preview -> apply for destructive actions
  • Centralize auth and policy:
    • MCP defines authorization for HTTP transports - use it. [2]
  • Control egress and prevent SSRF by default. [3]
  • Never let raw model output drive execution without validation (OWASP LLM02). [4]
  • Redact logs and manage secrets like an adult (OWASP cheat sheets). [5][6]


Threat model: what can go wrong

1) Prompt injection -> policy bypass attempt

A user or document says:

  • “Ignore previous instructions”
  • “Call this tool with these parameters”
  • “Reveal secrets”

OWASP calls this out as a primary risk category. [1]

2) Insecure output handling -> downstream exploitation

If you pass model output into:

  • a shell
  • SQL
  • YAML manifests
  • HTTP requests

…without validation, you’ve built an indirect code execution path.

OWASP’s LLM02 describes this precisely: insufficient validation and handling of LLM outputs before passing them downstream. [4]

3) Excessive agency -> unintended side effects

The agent is over-permissioned:

  • it can delete resources
  • send emails
  • modify production

…and it will eventually do something you didn’t mean.

4) Data exfiltration via tools

Tool outputs are rich and often sensitive:

  • calendar events
  • emails
  • internal tickets
  • source code
  • cluster configs

Exfil happens through:

  • model responses
  • logs
  • “helpful” summaries
  • tool chaining

5) Network abuse / SSRF

Any “fetch URL” capability is an SSRF invitation unless you constrain egress. OWASP’s SSRF cheat sheet is still relevant. [3]


Security layers that actually work

Security in agent systems is defense-in-depth:

  1. Identity (who is calling?)
  2. Authorization (what can they do?)
  3. Contracts (what does a tool accept/return?)
  4. Validation (are inputs/outputs safe?)
  5. Egress control (where can the system reach?)
  6. Audit (what happened?)
  7. Kill switches (how do you stop it fast?)

Tool design: read/write/danger tiers

Tiering is mandatory

Split tools by side effects:

  • Read tools: list/search/get
  • Write tools: create/update with bounded scope
  • Danger tools: deletes, bulk updates, privileged actions

Then enforce policy:

  • Read tools are widely available
  • Write tools require explicit scopes and tighter budgets
  • Danger tools require:
    • preview -> apply
    • confirmation tokens
    • additional policy checks
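
A minimal sketch of that policy in Go. The tool names, scope strings, and tier table are illustrative, not from any real MCP server; the point is that unknown tools and missing scopes fail closed:

```go
package main

import "fmt"

// Tier classifies a tool by its side effects.
type Tier int

const (
	Read Tier = iota
	Write
	Danger
)

// toolTiers is an illustrative registry; in practice this comes
// from each tool's declared contract.
var toolTiers = map[string]Tier{
	"list_pods":        Read,
	"create_ticket":    Write,
	"delete_namespace": Danger,
}

// Authorize applies the tier policy: reads pass, writes need an
// explicit scope, danger tools additionally need a confirmation token.
// Anything not in the registry is denied by default.
func Authorize(tool string, scopes map[string]bool, confirmed bool) error {
	tier, ok := toolTiers[tool]
	if !ok {
		return fmt.Errorf("unknown tool %q: deny by default", tool)
	}
	switch tier {
	case Read:
		return nil
	case Write:
		if !scopes["write:"+tool] {
			return fmt.Errorf("missing scope write:%s", tool)
		}
		return nil
	default: // Danger
		if !scopes["danger:"+tool] {
			return fmt.Errorf("missing scope danger:%s", tool)
		}
		if !confirmed {
			return fmt.Errorf("%s requires a confirmation token", tool)
		}
		return nil
	}
}

func main() {
	scopes := map[string]bool{"write:create_ticket": true}
	fmt.Println(Authorize("list_pods", scopes, false)) // <nil>
	fmt.Println(Authorize("delete_namespace", scopes, false))
}
```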

Preview -> Apply pattern

For dangerous operations:

  1. plan_* returns a plan summary + plan_id
  2. apply_* requires plan_id + user confirmation

This prevents “drive-by deletes” and supports audit.
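
The pattern can be sketched as a small in-memory planner (hypothetical types; a real server would persist plans, bind them to the requesting user, and expire them):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// Plan is what plan_* returns: a human-readable summary plus an
// opaque plan_id the client must echo back to apply_*.
type Plan struct {
	ID      string
	Summary string
	Applied bool
}

type Planner struct{ plans map[string]*Plan }

func NewPlanner() *Planner { return &Planner{plans: map[string]*Plan{}} }

// PlanDelete stages a destructive action and returns it for review;
// nothing is executed yet.
func (p *Planner) PlanDelete(resource string) *Plan {
	buf := make([]byte, 8)
	rand.Read(buf)
	plan := &Plan{ID: hex.EncodeToString(buf), Summary: "delete " + resource}
	p.plans[plan.ID] = plan
	return plan
}

// Apply executes only a previously staged plan, only once, and only
// with explicit user confirmation.
func (p *Planner) Apply(planID string, confirmed bool) error {
	plan, ok := p.plans[planID]
	if !ok {
		return errors.New("unknown plan_id: nothing to apply")
	}
	if plan.Applied {
		return errors.New("plan already applied")
	}
	if !confirmed {
		return errors.New("user confirmation required")
	}
	plan.Applied = true
	fmt.Println("applying:", plan.Summary)
	return nil
}

func main() {
	p := NewPlanner()
	plan := p.PlanDelete("namespace/staging")
	fmt.Println("review:", plan.Summary)
	fmt.Println(p.Apply(plan.ID, true))
}
```

The plan_id doubles as an idempotency key: replaying an applied plan fails, which is exactly what you want in an audit trail.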


Output handling: never execute raw model output

This is the most common real-world failure.

Rule: model output is data, not code

If the agent is generating:

  • Kubernetes YAML
  • SQL statements
  • curl commands
  • Terraform changes

…treat the output as untrusted data.

OWASP’s LLM02 guidance exists because people keep wiring LLM output directly into execution paths. [4]

Safer alternative: structured intent -> validated execution

Instead of:

  • LLM writes YAML -> apply

Do:

  • LLM proposes a structured change request (schema)
  • server validates:
    • allowlisted fields
    • bounded ranges
    • namespace/tenant scope
  • server executes with known-safe libraries

This is where “tool contracts” win.
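
A sketch of this flow in Go, using a hypothetical `ScaleRequest` schema. Strict decoding rejects any field the contract doesn't name, and validation enforces tenant scope and bounded ranges server-side:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ScaleRequest is a hypothetical structured change request the model
// emits instead of raw YAML: only these fields exist, so there is
// nothing else for the model to smuggle in.
type ScaleRequest struct {
	Namespace  string `json:"namespace"`
	Deployment string `json:"deployment"`
	Replicas   int    `json:"replicas"`
}

// Validate enforces tenant scope and bounded ranges; the limits here
// are illustrative.
func (r ScaleRequest) Validate(tenantPrefix string) error {
	if !strings.HasPrefix(r.Namespace, tenantPrefix) {
		return fmt.Errorf("namespace %q outside tenant scope %q*", r.Namespace, tenantPrefix)
	}
	if r.Deployment == "" {
		return fmt.Errorf("deployment is required")
	}
	if r.Replicas < 0 || r.Replicas > 20 {
		return fmt.Errorf("replicas %d out of allowed range [0,20]", r.Replicas)
	}
	return nil
}

// ParseIntent decodes model output strictly: unknown fields are an error.
func ParseIntent(raw string) (ScaleRequest, error) {
	var req ScaleRequest
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&req); err != nil {
		return req, err
	}
	return req, nil
}

func main() {
	req, err := ParseIntent(`{"namespace":"team-a-prod","deployment":"api","replicas":3}`)
	if err != nil {
		fmt.Println("reject:", err)
		return
	}
	fmt.Println("validate:", req.Validate("team-a-"))
}
```

The server then translates the validated request into an API call with known-safe libraries; the model never authors the executable artifact.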


Secrets: minimize, scope, rotate

Secrets are the other common failure path.

Minimum viable rules

  • Never put long-lived secrets in prompts.
  • Prefer short-lived tokens and scoped credentials.
  • Inject secrets server-side, not in the model context.

OWASP’s Secrets Management Cheat Sheet is a good baseline for central storage, rotation, auditing, and least privilege. [5]

Scope secrets to tenants and tools

Instead of “one OAuth token for everything,” mint:

  • per tenant
  • per tool category
  • short TTL

When something goes wrong, you want the blast radius small and revocation easy.


Network and egress controls

If your agent system can reach the open internet or internal networks, you need guardrails.

Egress allowlists

  • allowlist domains for integrations
  • block metadata IP ranges
  • re-validate after redirects

OWASP’s SSRF prevention guidance provides practical patterns for validation and blocking internal addresses. [3]

Separate network planes

Keep tool servers in a network segment that:

  • can reach only what they need
  • cannot reach internal admin endpoints
  • cannot reach secrets stores directly unless necessary

Logging and audit without data leaks

Logging is security. Logging is also a leak vector.

OWASP’s Logging Cheat Sheet calls out that logs may contain personal and sensitive information and must be protected from misuse. [6]

Practical logging rules

  • do not log raw prompts by default
  • do not log raw tool payloads by default
  • log structured summaries:
    • tool name
    • action class
    • resource IDs (safe identifiers)
    • status
    • latency
  • store audit events separately from debug logs

Audit events (always on)

Every write/danger tool should emit:

  • who / what / when / result
  • plan_id / idempotency_key
  • before/after resource identifiers (not content)

Audit is what makes “agents in production” defensible to security and compliance teams.


A production checklist

Identity and authorization

  • Strong auth for clients.
  • Least-privilege scopes per tool.
  • MCP HTTP authorization flow implemented where applicable. [2]

Tool contracts

  • Tools tiered: read/write/danger.
  • Preview -> apply for dangerous actions.
  • Schema validation + bounded arguments.

Output handling

  • No raw model output is executed without validation (OWASP LLM02). [4]

Secrets

  • Secrets never placed in prompts.
  • Short-lived, scoped tokens used.
  • Rotation/audit practices exist (OWASP Secrets Mgmt). [5]

Network

  • Egress allowlists exist.
  • SSRF protections implemented. [3]

Logging and audit

  • Logs are redacted and access-controlled.
  • Audit events exist for all side-effecting tools.
  • Log systems protected per OWASP guidance. [6]

References

[1] OWASP - Top 10 for Large Language Model Applications (v1.1): https://owasp.org/www-project-top-10-for-large-language-model-applications/
[2] Model Context Protocol (MCP) - Authorization (Protocol Revision 2025-11-25): https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization
[3] OWASP - SSRF Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html
[4] OWASP GenAI Security Project - LLM02: Insecure Output Handling: https://genai.owasp.org/llmrisk2023-24/llm02-insecure-output-handling/
[5] OWASP - Secrets Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
[6] OWASP - Logging Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html

Authors
DevOps Architect · Applied AI Engineer
I’ve spent 20 years building systems across embedded systems, microcontrollers, PLCs, security platforms, fintech, SRE, and platform architecture. Today I focus on production AI systems in Go: multi-agent orchestration, MCP server ecosystems, and the DevOps platforms that keep them running. I care about systems that work under pressure: observable, recoverable, and built to last.