<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Observability | Roy Gabriel</title><link>https://roygabriel.dev/tags/observability/</link><description>Roy Gabriel: DevOps Architect &amp; Applied AI Engineer. Technical blog on Go, MCP servers, Kubernetes, and production AI systems.</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Fri, 27 Feb 2026 03:18:04 +0000</lastBuildDate><atom:link href="https://roygabriel.dev/tags/observability/index.xml" rel="self" type="application/rss+xml"/><item><title>MCP Servers in Production: Hardening, Backpressure, and Observability (Go)</title><link>https://roygabriel.dev/blog/mcp-servers-production-hardening-go/</link><pubDate>Sat, 31 Jan 2026 09:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/mcp-servers-production-hardening-go/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; MCP is evolving. This article references the MCP specification versioned &lt;strong&gt;2025-11-25&lt;/strong&gt; and related docs; verify details against the current spec before shipping changes. [1][2][4]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most “agent demos” fail in production for boring reasons: missing timeouts, unbounded concurrency, ambiguous tool interfaces, and logging that accidentally turns into data exfiltration.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;MCP server&lt;/strong&gt; isn’t “just an integration.” It’s a &lt;strong&gt;capability boundary&lt;/strong&gt; between an LLM host (IDE, desktop app, agent runner) and the real world: files, APIs, databases, tickets, home automation, and anything else you wire up. MCP uses JSON-RPC 2.0 messages over transports like stdio (local) and Streamable HTTP (remote). [1][2][5]&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; MCP is evolving. This article references the MCP specification versioned &lt;strong&gt;2025-11-25&lt;/strong&gt; and related docs; verify details against the current spec before shipping changes. [1][2][4]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most “agent demos” fail in production for boring reasons: missing timeouts, unbounded concurrency, ambiguous tool interfaces, and logging that accidentally turns into data exfiltration.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;MCP server&lt;/strong&gt; isn’t “just an integration.” It’s a &lt;strong&gt;capability boundary&lt;/strong&gt; between an LLM host (IDE, desktop app, agent runner) and the real world: files, APIs, databases, tickets, home automation, and anything else you wire up. MCP uses JSON-RPC 2.0 messages over transports like stdio (local) and Streamable HTTP (remote). [1][2][5]&lt;/p&gt;
&lt;p&gt;That means an MCP server is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an API gateway for tools&lt;/li&gt;
&lt;li&gt;a policy enforcement point (whether you intended it or not)&lt;/li&gt;
&lt;li&gt;a reliability hotspot (tool calls are where latency and failure concentrate)&lt;/li&gt;
&lt;li&gt;a security hotspot (tools are where “read” becomes “exfil” and “write” becomes “impact”)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This post is a pragmatic checklist + a set of Go patterns to harden an MCP server so it keeps working when it’s under real load, and remains safe when the model gets “creative.”&lt;/p&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat tool inputs as &lt;strong&gt;untrusted&lt;/strong&gt;. Validate and constrain everything.&lt;/li&gt;
&lt;li&gt;Put &lt;strong&gt;budgets&lt;/strong&gt; everywhere: timeouts, concurrency limits, rate limits, and payload caps.&lt;/li&gt;
&lt;li&gt;Build for &lt;strong&gt;partial failure&lt;/strong&gt;: retries, idempotency keys, circuit breaking, fallbacks.&lt;/li&gt;
&lt;li&gt;Log like a security engineer: &lt;strong&gt;structured&lt;/strong&gt;, &lt;strong&gt;redacted&lt;/strong&gt;, &lt;strong&gt;auditable&lt;/strong&gt;, and &lt;strong&gt;useful&lt;/strong&gt;. [11]&lt;/li&gt;
&lt;li&gt;Instrument with traces/metrics early; “we’ll add telemetry later” is a trap. [13]&lt;/li&gt;
&lt;li&gt;Prefer Go for MCP servers because deployment and operational behavior are predictable: single binary, fast startup, structured concurrency via &lt;code&gt;context&lt;/code&gt;, and a strong standard library.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#a-production-mental-model-for-mcp-servers"&gt;A production mental model for MCP servers&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#threat-model-what-actually-goes-wrong"&gt;Threat model: what actually goes wrong&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-1-identity-and-authorization"&gt;Hardening layer 1: identity and authorization&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-2-tool-contracts-that-resist-ambiguity"&gt;Hardening layer 2: tool contracts that resist ambiguity&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-3-budgets-and-backpressure"&gt;Hardening layer 3: budgets and backpressure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-4-safe-networking-and-ssrf-containment"&gt;Hardening layer 4: safe networking and SSRF containment&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-5-observability-without-leaking-secrets"&gt;Hardening layer 5: observability without leaking secrets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-6-versioning-and-rollout-discipline"&gt;Hardening layer 6: versioning and rollout discipline&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="a-production-mental-model-for-mcp-servers"&gt;A production mental model for MCP servers&lt;/h2&gt;
&lt;p&gt;MCP’s docs describe a host (the AI application), a client (connector inside the host), and servers (capabilities/providers). Servers can be “local” (stdio) or “remote” (Streamable HTTP). [2][3]&lt;/p&gt;
&lt;p&gt;Here’s the production mental model that matters:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Your MCP server is a tool gateway.&lt;/strong&gt;&lt;br&gt;
Every tool is effectively an RPC method exposed to an agent. MCP uses JSON-RPC 2.0 semantics for requests/responses/notifications. [1][5]&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LLM tool arguments are not trustworthy.&lt;/strong&gt;&lt;br&gt;
Even if the LLM is “helpful,” arguments can be malformed, overbroad, or dangerous, especially under prompt injection or user-provided hostile input.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The host UI is not a security boundary.&lt;/strong&gt;&lt;br&gt;
The spec emphasizes user consent and tool safety, but the protocol can’t enforce your policy for you. You still need server-side controls. [1]&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Transport changes your blast radius, not your responsibilities.&lt;/strong&gt;&lt;br&gt;
Stdio reduces network exposure, but doesn’t remove safety requirements. Streamable HTTP adds multi-client/multi-tenant concerns and requires real auth. [2][3]&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you remember nothing else: treat the MCP server like a production API you’d be willing to put on call for.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="threat-model-what-actually-goes-wrong"&gt;Threat model: what actually goes wrong&lt;/h2&gt;
&lt;p&gt;When MCP servers cause incidents, it’s usually one of these:&lt;/p&gt;
&lt;h3 id="1-input-ambiguity--destructive-actions"&gt;1) Input ambiguity → destructive actions&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;A “delete” tool with optional filters&lt;/li&gt;
&lt;li&gt;A “run command” tool with free-form strings&lt;/li&gt;
&lt;li&gt;A “sync” tool that can touch thousands of objects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; schema + semantic validation, safe defaults, two-phase commit patterns (preview then apply), and explicit “danger gates.”&lt;/p&gt;
&lt;h3 id="2-prompt-injection--tool-misuse"&gt;2) Prompt injection → tool misuse&lt;/h3&gt;
&lt;p&gt;The model can be tricked into calling tools with attacker-provided arguments. If your tool can read internal data or call internal APIs, you’ve created an exfil path.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; least privilege, allowlists, strong auth, egress controls, and redaction.&lt;/p&gt;
&lt;h3 id="3-ssrf--network-pivoting"&gt;3) SSRF / network pivoting&lt;/h3&gt;
&lt;p&gt;Any tool that fetches URLs, loads webhooks, or calls dynamic endpoints can be abused to hit internal networks or metadata endpoints. OWASP treats SSRF as a major category for a reason. [10]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; deny-by-default networking (CIDR blocks, DNS/IP resolution checks, allowlisted destinations).&lt;/p&gt;
&lt;h3 id="4-unbounded-concurrency--resource-collapse"&gt;4) Unbounded concurrency → resource collapse&lt;/h3&gt;
&lt;p&gt;Agents can fire tools in parallel. Without limits you’ll blow up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;API quotas&lt;/li&gt;
&lt;li&gt;DB connections&lt;/li&gt;
&lt;li&gt;CPU/memory&lt;/li&gt;
&lt;li&gt;downstream latency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; per-tenant rate limiting, concurrency caps, queues, and backpressure.&lt;/p&gt;
&lt;h3 id="5-helpful-logs--data-leak"&gt;5) “Helpful logs” → data leak&lt;/h3&gt;
&lt;p&gt;Tool arguments and tool responses often contain secrets, tokens, or private data. If you log everything, you’ve built an involuntary data lake.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; structured + redacted logging, security logging guidelines, and minimal retention. [11][12]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-1-identity-and-authorization"&gt;Hardening layer 1: identity and authorization&lt;/h2&gt;
&lt;p&gt;If you run &lt;strong&gt;Streamable HTTP&lt;/strong&gt;, assume:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;multiple clients&lt;/li&gt;
&lt;li&gt;untrusted networks&lt;/li&gt;
&lt;li&gt;tokens will leak eventually&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;MCP’s architecture guidance recommends standard HTTP authentication methods and mentions OAuth as a recommended way to obtain tokens for remote servers. [2][3]&lt;/p&gt;
&lt;h3 id="practical-rules"&gt;Practical rules&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Authenticate every request.&lt;/strong&gt;&lt;br&gt;
Use bearer tokens or mTLS depending on environment.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Authorize per tool.&lt;/strong&gt;&lt;br&gt;
“Authenticated” ≠ “allowed to run &lt;code&gt;delete_everything&lt;/code&gt;”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefer short-lived tokens&lt;/strong&gt; and rotate them. [12]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-tenant?&lt;/strong&gt; Put the tenant identity into:
&lt;ul&gt;
&lt;li&gt;auth token claims, or&lt;/li&gt;
&lt;li&gt;an explicit, validated tenant header (signed), then&lt;/li&gt;
&lt;li&gt;enforce it everywhere.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="go-pattern-a-minimal-auth-middleware-skeleton-http-transport"&gt;Go pattern: a minimal auth middleware skeleton (HTTP transport)&lt;/h3&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;This is &lt;em&gt;not&lt;/em&gt; a full MCP implementation, just the hardening pattern you’ll wrap around your MCP handler.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// Pseudocode-ish middleware skeleton. Replace verifyToken with your auth logic.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;authMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;HandlerFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;strings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TrimPrefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Authorization&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Bearer &amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;missing auth&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusUnauthorized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;verifyToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// includes tenant + scopes&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;invalid auth&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusUnauthorized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ctxKeyIdentity&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Key point:&lt;/strong&gt; authorization should happen &lt;em&gt;after&lt;/em&gt; you parse the requested tool name, but &lt;em&gt;before&lt;/em&gt; you execute anything.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-2-tool-contracts-that-resist-ambiguity"&gt;Hardening layer 2: tool contracts that resist ambiguity&lt;/h2&gt;
&lt;p&gt;Most MCP tool failures are self-inflicted: tool interfaces are too vague.&lt;/p&gt;
&lt;h3 id="design-tools-like-production-apis"&gt;Design tools like production APIs&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Bad tool signature:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;run(command: string)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Better:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;run_command(program: enum, args: string[], cwd: string, timeout_ms: int, dry_run: bool)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why it’s better:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;forces structure&lt;/li&gt;
&lt;li&gt;allows you to enforce allowlists&lt;/li&gt;
&lt;li&gt;gives you timeouts and safe defaults&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="add-a-preview--apply-flow-for-risky-tools"&gt;Add a “preview → apply” flow for risky tools&lt;/h3&gt;
&lt;p&gt;For any tool that writes data or triggers side effects, do a two-step approach:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;plan_*&lt;/code&gt; returns a machine-readable plan + a &lt;code&gt;plan_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;apply_*&lt;/code&gt; requires &lt;code&gt;plan_id&lt;/code&gt; and optional user confirmation token&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This mirrors how we run infra changes (plan/apply) and dramatically reduces accidental blast radius.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-3-budgets-and-backpressure"&gt;Hardening layer 3: budgets and backpressure&lt;/h2&gt;
&lt;p&gt;Production systems are budget systems.&lt;/p&gt;
&lt;p&gt;If you don’t set explicit budgets, your MCP server will eventually allocate them for you via outages.&lt;/p&gt;
&lt;h3 id="budget-checklist"&gt;Budget checklist&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Server timeouts&lt;/strong&gt; (header read, request read, write, idle)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Request body caps&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outbound timeouts&lt;/strong&gt; to dependencies&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concurrency caps&lt;/strong&gt; per tool and per tenant&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rate limits&lt;/strong&gt; per tenant and per identity&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Queue limits&lt;/strong&gt; (bounded channels) to avoid memory blowups&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Circuit breaking&lt;/strong&gt; for flaky downstream dependencies&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="go-server-timeouts-are-not-optional"&gt;Go: server timeouts are not optional&lt;/h3&gt;
&lt;p&gt;Go’s &lt;code&gt;net/http&lt;/code&gt; provides explicit server timeouts; leaving them at zero is a common footgun. [6][7]&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;srv&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Addr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;:8080&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// your MCP handler + middleware&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ReadHeaderTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ReadTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;WriteTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;IdleTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;srv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="go-propagate-cancellation-everywhere-with-context"&gt;Go: propagate cancellation everywhere with &lt;code&gt;context&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;context.Context&lt;/code&gt; is the backbone of “structured concurrency” in Go: deadlines and cancellation signals flow through your call stack. [8][9]&lt;/p&gt;
&lt;p&gt;Rule: &lt;strong&gt;every tool execution must accept a &lt;code&gt;context.Context&lt;/code&gt;&lt;/strong&gt;, and every outbound call must honor it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ToolRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ToolResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cancel&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// ... outbound calls use ctx&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;integration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="go-per-tenant-rate-limiting-with-xtimerate"&gt;Go: per-tenant rate limiting with &lt;code&gt;x/time/rate&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;golang.org/x/time/rate&lt;/code&gt; implements a token bucket limiter. [9]&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;limiters&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sync&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Mutex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Limiter&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;limiters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Limiter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Limiter&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// Example: 5 req/sec with bursts up to 10&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;rateLimitMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lims&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;limiters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;HandlerFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;mustIdentity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;lims&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TenantID&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;rate limited&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusTooManyRequests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="backpressure-choose-a-policy"&gt;Backpressure: choose a policy&lt;/h3&gt;
&lt;p&gt;When you’re overloaded, you need a policy. Pick one explicitly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fail fast&lt;/strong&gt; with 429 / “busy” (simplest, safest)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Queue&lt;/strong&gt; with bounded depth (more complex; must cap memory)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Degrade&lt;/strong&gt; by disabling expensive tools first&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The “fail fast” approach is often correct for tool gateways.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-4-safe-networking-and-ssrf-containment"&gt;Hardening layer 4: safe networking and SSRF containment&lt;/h2&gt;
&lt;p&gt;If any tool can fetch a user-provided URL or call a user-influenced endpoint, SSRF is on the table. [10]&lt;/p&gt;
&lt;h3 id="ssrf-containment-strategies-that-actually-work"&gt;SSRF containment strategies that actually work&lt;/h3&gt;
&lt;p&gt;OWASP’s SSRF guidance boils down to a few themes: don’t trust user-controlled URLs, use allowlists, and enforce network controls. [10]&lt;/p&gt;
&lt;p&gt;In practice, for MCP servers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Prefer allowlists over blocklists.&lt;/strong&gt;&lt;br&gt;
“Only these domains” beats “block internal IPs.” Attackers are creative.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Resolve and validate IPs before dialing.&lt;/strong&gt;&lt;br&gt;
DNS can be weaponized. Validate the final destination IP (and re-validate on redirects).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Disable redirects or re-validate each hop.&lt;/strong&gt;&lt;br&gt;
Redirect chains are SSRF’s favorite tool.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enforce egress policy at the network layer too.&lt;/strong&gt;&lt;br&gt;
Kubernetes NetworkPolicies / firewall rules are your last line of defense.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="go-pattern-an-outbound-http-client-with-strict-timeouts"&gt;Go pattern: an outbound HTTP client with strict timeouts&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// whole request budget&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Transport&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Proxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ProxyFromEnvironment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;DialContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;net&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Dialer&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;KeepAlive&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nx"&gt;DialContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;TLSHandshakeTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ResponseHeaderTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ExpectContinueTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;MaxIdleConns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;IdleConnTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then wrap URL validation around any request creation. Keep it boring and strict.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-5-observability-without-leaking-secrets"&gt;Hardening layer 5: observability without leaking secrets&lt;/h2&gt;
&lt;p&gt;Telemetry is how you prove:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;you’re within budgets&lt;/li&gt;
&lt;li&gt;tools behave as expected&lt;/li&gt;
&lt;li&gt;failures are localized&lt;/li&gt;
&lt;li&gt;incidents can be diagnosed without “ssh and guess”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But logging is also where teams accidentally leak sensitive data.&lt;/p&gt;
&lt;p&gt;OWASP’s logging guidance emphasizes logging that supports detection/response while avoiding sensitive data exposure. [11] Pair that with secrets management discipline. [12]&lt;/p&gt;
&lt;h3 id="what-to-measure-minimum-viable-mcp-telemetry"&gt;What to measure (minimum viable MCP telemetry)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Counters&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool_calls_total{tool, tenant, status}&lt;/li&gt;
&lt;li&gt;auth_failures_total{reason}&lt;/li&gt;
&lt;li&gt;rate_limited_total{tenant}&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Histograms&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool_latency_seconds{tool}&lt;/li&gt;
&lt;li&gt;outbound_latency_seconds{dependency}&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Gauges&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;in_flight_tool_calls{tool}&lt;/li&gt;
&lt;li&gt;queue_depth{tool}&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="trace-boundaries"&gt;Trace boundaries&lt;/h3&gt;
&lt;p&gt;Instrument:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;request → tool routing&lt;/li&gt;
&lt;li&gt;tool execution span&lt;/li&gt;
&lt;li&gt;downstream calls span&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;OpenTelemetry’s Go docs show how to add instrumentation and emit traces/metrics. [13]&lt;/p&gt;
&lt;h3 id="logging-rules-that-save-you-later"&gt;Logging rules that save you later&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Use structured logging (JSON).&lt;/li&gt;
&lt;li&gt;Add correlation IDs (trace IDs) to logs.&lt;/li&gt;
&lt;li&gt;Redact:
&lt;ul&gt;
&lt;li&gt;Authorization headers&lt;/li&gt;
&lt;li&gt;tokens&lt;/li&gt;
&lt;li&gt;cookies&lt;/li&gt;
&lt;li&gt;tool payload fields known to contain secrets&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Log &lt;em&gt;events&lt;/em&gt;, not raw payloads:
&lt;ul&gt;
&lt;li&gt;“tool X called”&lt;/li&gt;
&lt;li&gt;“resource Y read”&lt;/li&gt;
&lt;li&gt;“write operation requested (dry_run=true)”&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Audit logs&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For high-impact tools, write an append-only audit record:
&lt;ul&gt;
&lt;li&gt;who (identity)&lt;/li&gt;
&lt;li&gt;what (tool + parameters summary)&lt;/li&gt;
&lt;li&gt;when&lt;/li&gt;
&lt;li&gt;result (success/failure)&lt;/li&gt;
&lt;li&gt;plan_id / idempotency_key&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Audit logs should be treated as security data.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-6-versioning-and-rollout-discipline"&gt;Hardening layer 6: versioning and rollout discipline&lt;/h2&gt;
&lt;p&gt;MCP uses string-based version identifiers like &lt;code&gt;YYYY-MM-DD&lt;/code&gt; to represent the last date of backwards-incompatible changes. [4]&lt;/p&gt;
&lt;p&gt;That’s helpful, but it doesn’t solve the operational problem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;clients upgrade at different times&lt;/li&gt;
&lt;li&gt;schema changes drift&lt;/li&gt;
&lt;li&gt;hosts differ in which capabilities they support&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="practical-compatibility-rules"&gt;Practical compatibility rules&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pin your server’s supported protocol version&lt;/strong&gt; and expose it in &lt;code&gt;health&lt;/code&gt; or diagnostics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Add contract tests&lt;/strong&gt; that run against:
&lt;ul&gt;
&lt;li&gt;one “current” client&lt;/li&gt;
&lt;li&gt;one “previous” client version&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Support additive changes&lt;/strong&gt; first:
&lt;ul&gt;
&lt;li&gt;new tools&lt;/li&gt;
&lt;li&gt;new optional fields&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Use feature flags for risky tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="rollout-like-a-platform-team"&gt;Rollout like a platform team&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Canaries for remote servers&lt;/li&gt;
&lt;li&gt;“Shadow mode” for new tools (log what would happen)&lt;/li&gt;
&lt;li&gt;Slow ramp with budget monitoring&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;p&gt;If you’re building (or inheriting) an MCP server, run this checklist:&lt;/p&gt;
&lt;h3 id="safety"&gt;Safety&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool contracts are structured (no free-form “do anything” strings).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Every tool has a safe default (&lt;code&gt;dry_run=true&lt;/code&gt;, &lt;code&gt;limit&lt;/code&gt; required, etc.).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Destructive tools require a plan/apply step (or explicit confirmation gates).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool inputs are validated and bounded (length, ranges, enums).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="identity--access"&gt;Identity &amp;amp; access&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Remote transport requires authentication and per-tool authorization.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tokens are short-lived and rotated; secrets are not in source control. [12]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tenant identity is enforced at every access point (not “best effort”).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="budgets--resilience"&gt;Budgets &amp;amp; resilience&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; HTTP server timeouts are configured. [6][7]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Outbound clients have timeouts and connection limits.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Rate limiting exists per tenant/identity. [9]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Concurrency caps exist per tool; overload behavior is explicit (fail fast / queue).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Retries are bounded and idempotent where side effects exist.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="networking"&gt;Networking&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; URL fetch tools have allowlists and SSRF protections. [10]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Redirect policies are explicit (disabled or re-validated).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Egress is constrained at the network layer (not only in code).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="observability"&gt;Observability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Metrics cover tool calls, latency, errors, and rate limiting.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tracing exists across tool execution and downstream calls. [13]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Logs are structured, correlated, and redacted. [11]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit logging exists for high-impact tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="operations"&gt;Operations&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Health checks and readiness checks exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Configuration is explicit and validated on startup.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Versioning strategy is documented and tested. [4]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Model Context Protocol (MCP) Specification (version 2025-11-25): &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-11-25&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Architecture Overview (participants, transports, concepts): &lt;a href="https://modelcontextprotocol.io/docs/learn/architecture" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/docs/learn/architecture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Transport details (Streamable HTTP transport overview): &lt;a href="https://modelcontextprotocol.io/specification/2025-03-26/basic/transports" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-03-26/basic/transports&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Versioning: &lt;a href="https://modelcontextprotocol.io/specification/versioning" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/versioning&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;JSON-RPC 2.0 Specification: &lt;a href="https://www.jsonrpc.org/specification" target="_blank" rel="noopener noreferrer"&gt;https://www.jsonrpc.org/specification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go &lt;code&gt;net/http&lt;/code&gt; package documentation: &lt;a href="https://pkg.go.dev/net/http" target="_blank" rel="noopener noreferrer"&gt;https://pkg.go.dev/net/http&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cloudflare: “The complete guide to Go net/http timeouts”: &lt;a href="https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/" target="_blank" rel="noopener noreferrer"&gt;https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go &lt;code&gt;context&lt;/code&gt; package documentation: &lt;a href="https://pkg.go.dev/context" target="_blank" rel="noopener noreferrer"&gt;https://pkg.go.dev/context&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go &lt;code&gt;x/time/rate&lt;/code&gt; documentation: &lt;a href="https://pkg.go.dev/golang.org/x/time/rate" target="_blank" rel="noopener noreferrer"&gt;https://pkg.go.dev/golang.org/x/time/rate&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;OWASP SSRF Prevention Cheat Sheet / SSRF category references:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html" target="_blank" rel="noopener noreferrer"&gt;https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://owasp.org/Top10/2021/A10_2021-Server-Side_Request_Forgery_%28SSRF%29/" target="_blank" rel="noopener noreferrer"&gt;https://owasp.org/Top10/2021/A10_2021-Server-Side_Request_Forgery_%28SSRF%29/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="11"&gt;
&lt;li&gt;OWASP Logging Cheat Sheet (security-focused logging guidance): &lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html" target="_blank" rel="noopener noreferrer"&gt;https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Secrets management guidance:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;OWASP Secrets Management Cheat Sheet: &lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html" target="_blank" rel="noopener noreferrer"&gt;https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Kubernetes “Good practices for Kubernetes Secrets”: &lt;a href="https://kubernetes.io/docs/concepts/security/secrets-good-practices/" target="_blank" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/security/secrets-good-practices/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="13"&gt;
&lt;li&gt;OpenTelemetry Go instrumentation docs: &lt;a href="https://opentelemetry.io/docs/languages/go/instrumentation/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/languages/go/instrumentation/&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item><item><title>Agent Observability That Doesn't Lie</title><link>https://roygabriel.dev/blog/agent-observability-that-doesnt-lie/</link><pubDate>Sat, 20 Dec 2025 12:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/agent-observability-that-doesnt-lie/</guid><description>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most &amp;ldquo;agent observability&amp;rdquo; is either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;too shallow&lt;/strong&gt; (a chat transcript and a couple logs), or&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;too noisy&lt;/strong&gt; (every token logged, every tool payload stored, no signal)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither works in production.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re serious about operating agents, you need observability that answers three questions quickly:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What happened?&lt;/strong&gt; (forensics)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why did it happen?&lt;/strong&gt; (debuggability)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How often does it happen?&lt;/strong&gt; (reliability)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;OpenTelemetry exists to standardize how you instrument, generate, and export telemetry across traces, metrics, and logs. [1] W3C Trace Context defines how trace context propagates across service boundaries. [2]&lt;/p&gt;</description><content:encoded>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most &amp;ldquo;agent observability&amp;rdquo; is either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;too shallow&lt;/strong&gt; (a chat transcript and a couple logs), or&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;too noisy&lt;/strong&gt; (every token logged, every tool payload stored, no signal)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither works in production.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re serious about operating agents, you need observability that answers three questions quickly:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What happened?&lt;/strong&gt; (forensics)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why did it happen?&lt;/strong&gt; (debuggability)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How often does it happen?&lt;/strong&gt; (reliability)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;OpenTelemetry exists to standardize how you instrument, generate, and export telemetry across traces, metrics, and logs. [1] W3C Trace Context defines how trace context propagates across service boundaries. [2]&lt;/p&gt;
&lt;p&gt;Agents add two new requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool calls are part of your &amp;ldquo;distributed trace&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;decisioning&amp;rdquo; is a first-class component (not just business logic)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This article is a practical blueprint.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Instrument agents like distributed systems:&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;traces&lt;/strong&gt; for causality (what triggered what)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;metrics&lt;/strong&gt; for health (p95 latency, error rates)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;logs&lt;/strong&gt; for human context (but redacted)&lt;/li&gt;
&lt;li&gt;Propagate a single trace across:&lt;/li&gt;
&lt;li&gt;agent runtime -&amp;gt; MCP gateway -&amp;gt; MCP tool servers -&amp;gt; upstream APIs&lt;/li&gt;
&lt;li&gt;Capture &lt;strong&gt;decision summaries&lt;/strong&gt;, not chain-of-thought.&lt;/li&gt;
&lt;li&gt;Treat cost as a production signal: emit per-run and per-tool cost metrics.&lt;/li&gt;
&lt;li&gt;Use semantic conventions where possible to keep telemetry queryable. [3]&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t turn observability into a data breach: OWASP highlights sensitive info disclosure and prompt injection as key risks. [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-to-observe-in-an-agent-system"&gt;What to observe in an agent system&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-trace-model-for-agents"&gt;A trace model for agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#metrics-that-matter"&gt;Metrics that matter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#logs-and-redaction"&gt;Logs and redaction&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#audit-events-vs-debug-logs"&gt;Audit events vs debug logs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#dashboards-and-alerts"&gt;Dashboards and alerts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="what-to-observe-in-an-agent-system"&gt;What to observe in an agent system&lt;/h2&gt;
&lt;p&gt;Agents have four observable subsystems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Planner/Reasoner&lt;/strong&gt; (creates the plan, chooses tools)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool execution&lt;/strong&gt; (calls MCP tools and interprets results)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory/state&lt;/strong&gt; (what was stored or retrieved)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Policy/budget&lt;/strong&gt; (what was allowed or blocked)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you only observe #2, you&amp;rsquo;ll miss why the agent chose the wrong tool.
If you only observe #1, you&amp;rsquo;ll miss production failures.&lt;/p&gt;
&lt;p&gt;You need the full chain.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-trace-model-for-agents"&gt;A trace model for agents&lt;/h2&gt;
&lt;h3 id="the-core-idea"&gt;The core idea&lt;/h3&gt;
&lt;p&gt;A single &amp;ldquo;agent run&amp;rdquo; is a distributed trace:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it spans model calls&lt;/li&gt;
&lt;li&gt;tool calls&lt;/li&gt;
&lt;li&gt;downstream system calls&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use W3C Trace Context (&lt;code&gt;traceparent&lt;/code&gt;, &lt;code&gt;tracestate&lt;/code&gt;) to propagate the trace across boundaries. [2]&lt;/p&gt;
&lt;h3 id="suggested-spans-minimum-viable"&gt;Suggested spans (minimum viable)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Root span&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent.run&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;agent.name&lt;/code&gt;, &lt;code&gt;tenant&lt;/code&gt;, &lt;code&gt;user&lt;/code&gt;, &lt;code&gt;session&lt;/code&gt;, &lt;code&gt;goal_hash&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Planner&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent.plan&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;planner.model&lt;/code&gt;, &lt;code&gt;plan.step_count&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model calls&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm.call&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;prompt_tokens&lt;/code&gt;, &lt;code&gt;completion_tokens&lt;/code&gt;, &lt;code&gt;latency_ms&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Tool selection&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent.tool_select&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;selector.version&lt;/code&gt;, &lt;code&gt;candidate_count&lt;/code&gt;, &lt;code&gt;selected_count&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Tool call&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tool.call&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;tool.name&lt;/code&gt;, &lt;code&gt;tool.class&lt;/code&gt; (read/write/danger), &lt;code&gt;tool.server&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Policy&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;policy.check&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;policy.rule_id&lt;/code&gt;, &lt;code&gt;decision&lt;/code&gt; (allow/deny), &lt;code&gt;reason_code&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;memory.read&lt;/code&gt; / &lt;code&gt;memory.write&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;store&lt;/code&gt;, &lt;code&gt;keys&lt;/code&gt;, &lt;code&gt;bytes&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-spans--logs"&gt;Why spans &amp;gt; logs&lt;/h3&gt;
&lt;p&gt;Spans give you causality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;which tool call caused a failure&lt;/li&gt;
&lt;li&gt;which step blew the budget&lt;/li&gt;
&lt;li&gt;which upstream dependency was slow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With OpenTelemetry, you can emit traces and metrics using the same SDK approach. [1][4]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="metrics-that-matter"&gt;Metrics that matter&lt;/h2&gt;
&lt;h3 id="tool-health-metrics"&gt;Tool health metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tool_calls_total{tool,status}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_latency_ms_bucket{tool}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_timeouts_total{tool}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_retries_total{tool}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="agent-run-health-metrics"&gt;Agent run health metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent_runs_total{status}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agent_run_latency_ms_bucket{agent}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agent_steps_total_bucket{agent}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="cost-metrics-treat-cost-like-reliability"&gt;Cost metrics (treat cost like reliability)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm_tokens_total{model,type=prompt|completion}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;llm_cost_usd_total{model}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;run_cost_usd_bucket{agent}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="policy-metrics"&gt;Policy metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;policy_denied_total{rule_id}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;danger_tool_attempt_total{tool}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Semantic conventions help your metrics stay queryable and consistent across systems. OpenTelemetry documents semantic conventions for HTTP spans/metrics, for example. [3][5]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="logs-and-redaction"&gt;Logs and redaction&lt;/h2&gt;
&lt;p&gt;Logs should add human context, not become a data lake of secrets.&lt;/p&gt;
&lt;p&gt;Rules I like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Do not log prompts by default.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Do not log tool payloads by default.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Log summaries and hashes:&lt;/li&gt;
&lt;li&gt;&lt;code&gt;goal_hash&lt;/code&gt;, &lt;code&gt;plan_hash&lt;/code&gt;, &lt;code&gt;tool_args_hash&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Log &lt;strong&gt;structured error reasons&lt;/strong&gt;:&lt;/li&gt;
&lt;li&gt;&lt;code&gt;validation_error&lt;/code&gt;, &lt;code&gt;upstream_rate_limited&lt;/code&gt;, &lt;code&gt;auth_failed&lt;/code&gt;, &lt;code&gt;policy_denied&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For agent systems, OWASP highlights sensitive information disclosure and insecure output handling. Logging is one of the easiest ways to accidentally create both. [7]&lt;/p&gt;
&lt;h3 id="debug-mode-that-isnt-dangerous"&gt;&amp;ldquo;Debug mode&amp;rdquo; that isn&amp;rsquo;t dangerous&lt;/h3&gt;
&lt;p&gt;If you must support deeper logs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;only enable per tenant/user for a limited window&lt;/li&gt;
&lt;li&gt;auto-expire&lt;/li&gt;
&lt;li&gt;redact aggressively&lt;/li&gt;
&lt;li&gt;never store raw secrets&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="audit-events-vs-debug-logs"&gt;Audit events vs debug logs&lt;/h2&gt;
&lt;p&gt;Treat them as different products:&lt;/p&gt;
&lt;h3 id="audit-events-for-governance"&gt;Audit events (for governance)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;immutable-ish records of side effects&lt;/li&gt;
&lt;li&gt;minimal sensitive data&lt;/li&gt;
&lt;li&gt;always on&lt;/li&gt;
&lt;li&gt;long retention&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example audit fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;who: tenant/user/client&lt;/li&gt;
&lt;li&gt;what: tool + action class (create/update/delete)&lt;/li&gt;
&lt;li&gt;when: timestamp&lt;/li&gt;
&lt;li&gt;where: environment&lt;/li&gt;
&lt;li&gt;result: success/failure&lt;/li&gt;
&lt;li&gt;resource IDs (safe identifiers)&lt;/li&gt;
&lt;li&gt;idempotency keys / plan IDs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="debug-logs-for-engineers"&gt;Debug logs (for engineers)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;short retention&lt;/li&gt;
&lt;li&gt;more context&lt;/li&gt;
&lt;li&gt;highly controlled access&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mixing these two is how you end up with &amp;ldquo;SharePoint logs full of PII&amp;rdquo; and no one wants to touch them.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="dashboards-and-alerts"&gt;Dashboards and alerts&lt;/h2&gt;
&lt;h3 id="dashboards-start-simple"&gt;Dashboards (start simple)&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Tool reliability&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;top tools by error rate&lt;/li&gt;
&lt;li&gt;top tools by p95 latency&lt;/li&gt;
&lt;li&gt;timeouts per tool&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Agent success&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;success rate by agent type&lt;/li&gt;
&lt;li&gt;&amp;ldquo;stuck runs&amp;rdquo; (runs exceeding max duration)&lt;/li&gt;
&lt;li&gt;average steps per run&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;cost per run&lt;/li&gt;
&lt;li&gt;cost per tenant&lt;/li&gt;
&lt;li&gt;top drivers (which tools/model calls)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="alerts-avoid-noise"&gt;Alerts (avoid noise)&lt;/h3&gt;
&lt;p&gt;Alert on what is actionable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool error rate spikes for critical tools&lt;/li&gt;
&lt;li&gt;tool latency p95 spikes beyond SLO&lt;/li&gt;
&lt;li&gt;budget exceeded spike (runaway behavior)&lt;/li&gt;
&lt;li&gt;policy denied spike (possible prompt injection attempt)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you use SLOs and error budgets, Google&amp;rsquo;s SRE material is a practical reference for turning SLOs into alerting strategies. [6]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;h3 id="tracing"&gt;Tracing&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Every agent run has a trace ID.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Trace context propagates across MCP boundaries (W3C Trace Context). [2]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool calls are spans with stable tool identifiers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="metrics"&gt;Metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool success/error/latency metrics exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Agent run success/latency/steps metrics exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Cost metrics exist and are monitored.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="logging"&gt;Logging&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Default logs are redacted summaries, not raw payloads.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Debug logging is time-bounded and access-controlled.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="audit"&gt;Audit&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit events exist for all side-effecting tools.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit records include &amp;ldquo;who/what/when/result&amp;rdquo; without leaking secrets.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="security"&gt;Security&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Observability does not become a secret exfil path (OWASP risks considered). [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] OpenTelemetry - Documentation (overview): &lt;a href="https://opentelemetry.io/docs/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/&lt;/a&gt;
[2] W3C - Trace Context: &lt;a href="https://www.w3.org/TR/trace-context/" target="_blank" rel="noopener noreferrer"&gt;https://www.w3.org/TR/trace-context/&lt;/a&gt;
[3] OpenTelemetry - Semantic conventions for HTTP (spans/metrics/logs): &lt;a href="https://opentelemetry.io/docs/specs/semconv/http/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/specs/semconv/http/&lt;/a&gt;
[4] OpenTelemetry Go - Instrumentation docs: &lt;a href="https://opentelemetry.io/docs/languages/go/instrumentation/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/languages/go/instrumentation/&lt;/a&gt;
[5] OpenTelemetry - Semantic conventions for HTTP metrics: &lt;a href="https://opentelemetry.io/docs/specs/semconv/http/http-metrics/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/specs/semconv/http/http-metrics/&lt;/a&gt;
[6] Google SRE Workbook - Alerting on SLOs: &lt;a href="https://sre.google/workbook/alerting-on-slos/" target="_blank" rel="noopener noreferrer"&gt;https://sre.google/workbook/alerting-on-slos/&lt;/a&gt;
[7] OWASP - Top 10 for Large Language Model Applications: &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/" target="_blank" rel="noopener noreferrer"&gt;https://owasp.org/www-project-top-10-for-large-language-model-applications/&lt;/a&gt;
&lt;/p&gt;</content:encoded></item></channel></rss>