<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Go | Roy Gabriel</title><link>https://roygabriel.dev/tags/go/</link><description>Roy Gabriel: DevOps Architect &amp; Applied AI Engineer. Technical blog on Go, MCP servers, Kubernetes, and production AI systems.</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Fri, 27 Feb 2026 03:18:04 +0000</lastBuildDate><atom:link href="https://roygabriel.dev/tags/go/index.xml" rel="self" type="application/rss+xml"/><item><title>Cruvero - AI Agent Ecosystem Platform</title><link>https://roygabriel.dev/projects/cruvero/</link><pubDate>Thu, 12 Feb 2026 19:25:00 -0500</pubDate><guid>https://roygabriel.dev/projects/cruvero/</guid><description>A production-grade, Temporal-native AI agent orchestration platform. 90,000+ lines of Go powering durable multi-agent workflows, neuro-inspired intelligence, enterprise governance, and a full React operational UI.</description><content:encoded>&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Cruvero is a production-grade AI agent orchestration platform I designed and built from the ground up in Go. It treats durability, observability, and operational control as infrastructure guarantees, not library afterthoughts.&lt;/p&gt;
&lt;p&gt;Where frameworks like LangGraph bolt checkpointing onto a graph abstraction, Cruvero inverts the model: Temporal&amp;rsquo;s battle-tested workflow engine &lt;em&gt;is&lt;/em&gt; the foundation, and the agent abstraction compiles down to it. The result is a platform where retry logic, failure recovery, human-in-the-loop approval, and multi-agent coordination aren&amp;rsquo;t library features; they&amp;rsquo;re infrastructure guarantees backed by the same technology that runs Uber&amp;rsquo;s and Stripe&amp;rsquo;s most critical workflows.&lt;/p&gt;
&lt;p&gt;The system currently spans 90,000+ lines of Go and TypeScript, with a comprehensive React UI, Kubernetes deployment via Helm and ArgoCD, and an enterprise MCP gateway architecture designed to support 1,000+ concurrent agents across 150+ integrations.&lt;/p&gt;
&lt;h2 id="the-problem"&gt;The Problem&lt;/h2&gt;
&lt;p&gt;Every major agent framework optimizes for the same thing: time-to-demo. Spin up a LangGraph chain, wire a few tools, get a result in 30 seconds. Impressive on a slide. Catastrophic in production.&lt;/p&gt;
&lt;p&gt;The failure modes are predictable. An agent workflow running for 40 minutes crashes mid-execution; state is gone. A tool call to an external API times out; the entire run fails with no recovery. A billing-sensitive agent hallucinates a $50,000 API call; no cost guardrails existed to stop it. An agent enters a reasoning loop, calling the same tool 15 times with near-identical arguments; nothing detects the degeneration.&lt;/p&gt;
&lt;p&gt;These aren&amp;rsquo;t edge cases. They&amp;rsquo;re the baseline reality of running AI agents at enterprise scale. Cruvero was built to make them structurally impossible.&lt;/p&gt;
&lt;h2 id="architecture"&gt;Architecture&lt;/h2&gt;
&lt;p&gt;Cruvero&amp;rsquo;s architecture is layered around a single principle: every agent action is a Temporal activity, and every workflow survives infrastructure failure by default.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core Runtime:&lt;/strong&gt; The agent loop follows a deterministic &lt;code&gt;decide → act → observe → repeat&lt;/code&gt; state machine. Each cycle produces an immutable &lt;code&gt;DecisionRecord&lt;/code&gt; with content-addressed hashes of the prompt, state, tool schemas, and model config. This gives you complete forensic capability: for any decision an agent made, you can see the exact inputs, replay the decision with a different model, or run counterfactual analysis (&amp;ldquo;what if it had chosen differently at step 4?&amp;rdquo;).&lt;/p&gt;
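&lt;p&gt;As a minimal sketch of what content-addressed hashing of a decision cycle could look like (an illustration only, not the Cruvero implementation): the struct fields, the &lt;code&gt;contentHash&lt;/code&gt; helper, and the choice of SHA-256 are all assumptions made for this example.&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// DecisionRecord is a hypothetical shape for one decide/act/observe cycle.
// Field names are assumptions for illustration.
type DecisionRecord struct {
	Step            int
	PromptHash      string
	StateHash       string
	ToolSchemasHash string
	ModelConfigHash string
}

// contentHash returns the hex-encoded SHA-256 digest of the input.
// Identical content always yields the same hash, so it can act as an address.
func contentHash(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// newDecisionRecord hashes each input of a decision cycle.
func newDecisionRecord(step int, prompt, state, toolSchemas, modelConfig string) DecisionRecord {
	return DecisionRecord{
		Step:            step,
		PromptHash:      contentHash(prompt),
		StateHash:       contentHash(state),
		ToolSchemasHash: contentHash(toolSchemas),
		ModelConfigHash: contentHash(modelConfig),
	}
}

func main() {
	rec := newDecisionRecord(4, "summarize the incident", `{"open":2}`, `["search"]`, `{"model":"example"}`)
	fmt.Println(rec.Step, rec.PromptHash[:12])
}
```

&lt;p&gt;Two records with identical hashes saw identical inputs, so a replay with a different model would only need to change the model config and compare the resulting decision.&lt;/p&gt;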
&lt;p&gt;&lt;strong&gt;Durable Execution:&lt;/strong&gt; Temporal manages all workflow state. Agent runs survive process crashes, worker restarts, and infrastructure failures transparently. Long-running workflows (minutes to hours) use continue-as-new with automatic state compaction. There is zero data loss on agent failure, guaranteed by Temporal&amp;rsquo;s event sourcing, not by application-level retry logic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Multi-Agent Coordination:&lt;/strong&gt; A first-class supervisor pattern supports seven coordination strategies: delegate, broadcast, debate, pipeline, map-reduce, voting, and saga with compensation. Agents communicate through signals, shared blackboard state, and pub/sub events. A supervisor can launch child agents, aggregate their results, and handle partial failures, all as durable Temporal workflows with full replay capability.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Graph DSL &amp;amp; Workflow Engine:&lt;/strong&gt; A custom graph DSL compiles structured execution plans (steps, conditional routes, parallel branches, join semantics, subgraphs) into Temporal workflows. Join modes include all, any, N-of-M, and voting. The visual workflow builder (React Flow) provides bidirectional serialization between the visual canvas and the underlying graph definition.&lt;/p&gt;
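&lt;p&gt;To make the join semantics concrete, here is a small sketch of how the all, any, and N-of-M modes could be evaluated. The type and function names are hypothetical, and the voting mode is left out because its rules are not described here.&lt;/p&gt;

```go
package main

import "fmt"

// JoinMode values are assumptions for illustration.
type JoinMode string

const (
	JoinAll  JoinMode = "all"
	JoinAny  JoinMode = "any"
	JoinNOfM JoinMode = "n-of-m"
)

// joinSatisfied reports whether enough parallel branches have completed
// for execution to continue past the join.
// completed is the number of finished branches, total the number launched,
// and n the threshold used only by the N-of-M mode.
func joinSatisfied(mode JoinMode, completed, total, n int) bool {
	switch mode {
	case JoinAll:
		return completed >= total
	case JoinAny:
		return completed >= 1
	case JoinNOfM:
		return completed >= n
	default:
		// Unknown modes never release the join.
		return false
	}
}

func main() {
	fmt.Println(joinSatisfied(JoinAll, 2, 3, 0))
	fmt.Println(joinSatisfied(JoinAny, 1, 3, 0))
	fmt.Println(joinSatisfied(JoinNOfM, 2, 5, 2))
}
```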
&lt;h2 id="neuro-inspired-intelligence"&gt;Neuro-Inspired Intelligence&lt;/h2&gt;
&lt;p&gt;This is the feature set that no other agent framework implements. Drawing from neuroscience and cognitive architecture research, this layer introduces eight subsystems that fundamentally change how agents reason, learn, and self-correct.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Metacognitive Monitoring:&lt;/strong&gt; Modeled on prefrontal cortex performance monitoring. The system tracks tool call hashes, observation hashes, progress deltas, confidence entropy, and goal-drift scores (via embedding cosine similarity against the original prompt). When it detects degradation, such as repetition loops, stalled progress, drifting goals, or collapsing confidence, it triggers graduated backpressure: forced reflection, model escalation (swap to a more capable model mid-run), context reset, mandatory strategy pivots, or human escalation. No more agents spinning their wheels for 200 steps.&lt;/p&gt;
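&lt;p&gt;The goal-drift score mentioned above can be sketched with plain cosine similarity. This is an illustration under stated assumptions: embeddings are represented as &lt;code&gt;[]float64&lt;/code&gt;, drift is defined as one minus similarity, and the threshold is a placeholder to be tuned.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between two embedding
// vectors. It returns 0 when the lengths differ or either vector is all zeros.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// goalDrift is 1 minus similarity: 0 means on target, larger means drifting.
func goalDrift(original, current []float64) float64 {
	return 1 - cosineSimilarity(original, current)
}

// hasDrifted applies a threshold; the value is an assumption to be tuned.
func hasDrifted(original, current []float64, threshold float64) bool {
	return goalDrift(original, current) > threshold
}

func main() {
	prompt := []float64{0.9, 0.1, 0.0}  // toy embedding of the original prompt
	current := []float64{0.1, 0.2, 0.9} // toy embedding of the current step
	fmt.Printf("drift=%.2f drifted=%v\n", goalDrift(prompt, current), hasDrifted(prompt, current, 0.5))
}
```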
&lt;p&gt;&lt;strong&gt;Attention-Weighted Context Windows:&lt;/strong&gt; Inspired by hippocampal memory replay. Instead of dumping context linearly into the prompt, a multi-factor salience scorer (relevance, recency, confidence, usage frequency) re-ranks all memory before assembly. A dynamic token budget allocator shifts allocation by task phase. Planning phases boost semantic/procedural memory, execution phases boost tool schemas, and review phases boost episodic memory. An interference detector flags contradictory facts explicitly in the prompt rather than letting the LLM silently pick one.&lt;/p&gt;
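&lt;p&gt;One simple way to picture a multi-factor salience scorer is a weighted sum over the four factors named above, followed by a re-rank. The linear combination, the weights, and the type names below are assumptions for illustration; the actual scoring function may differ.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// MemoryItem is a hypothetical memory entry; each factor is normalized to [0, 1].
type MemoryItem struct {
	Text       string
	Relevance  float64
	Recency    float64
	Confidence float64
	Usage      float64
}

// Weights holds per-factor weights. The values used in main are assumptions.
type Weights struct {
	Relevance, Recency, Confidence, Usage float64
}

// salience combines the four factors into a single weighted score.
func salience(m MemoryItem, w Weights) float64 {
	return w.Relevance*m.Relevance + w.Recency*m.Recency +
		w.Confidence*m.Confidence + w.Usage*m.Usage
}

// rankBySalience returns a copy of items sorted from most to least salient.
func rankBySalience(items []MemoryItem, w Weights) []MemoryItem {
	ranked := append([]MemoryItem(nil), items...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return salience(ranked[i], w) > salience(ranked[j], w)
	})
	return ranked
}

func main() {
	w := Weights{Relevance: 0.4, Recency: 0.3, Confidence: 0.2, Usage: 0.1}
	items := []MemoryItem{
		{Text: "fresh but off-topic", Relevance: 0.2, Recency: 0.9, Confidence: 0.6, Usage: 0.1},
		{Text: "old but relevant", Relevance: 0.9, Recency: 0.1, Confidence: 0.8, Usage: 0.5},
	}
	for _, m := range rankBySalience(items, w) {
		fmt.Printf("%.2f %s\n", salience(m, w), m.Text)
	}
}
```

&lt;p&gt;A phase-aware allocator could then swap in a different &lt;code&gt;Weights&lt;/code&gt; value per task phase before ranking.&lt;/p&gt;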
&lt;p&gt;&lt;strong&gt;Temporal Reasoning:&lt;/strong&gt; Deadline-aware execution with soft and hard deadlines, graduated pressure levels (relaxed through critical), automatic model switching under time pressure, and structured time context injection into every prompt.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent Immune System:&lt;/strong&gt; Anomaly signature tracking with automatic tool quarantine. When a tool&amp;rsquo;s behavior degrades or produces anomalous outputs, the immune system hashes the failure pattern, tracks hit counts, and quarantines the tool after a configurable threshold. A vaccination CLI injects procedural memory to teach agents how to work around quarantined capabilities.&lt;/p&gt;
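&lt;p&gt;The hash, count, and quarantine sequence described above can be sketched as follows. Everything here is hypothetical: the type and method names, the use of SHA-256 for the signature, and the in-memory maps stand in for whatever the real system persists.&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ImmuneSystem is a hypothetical tracker for anomaly signatures per tool.
type ImmuneSystem struct {
	threshold   int
	hits        map[string]int  // signature hash to hit count
	quarantined map[string]bool // tool name to quarantine flag
}

func newImmuneSystem(threshold int) *ImmuneSystem {
	s := new(ImmuneSystem)
	s.threshold = threshold
	s.hits = make(map[string]int)
	s.quarantined = make(map[string]bool)
	return s
}

// signature hashes the tool name together with the failure pattern.
func signature(tool, failure string) string {
	sum := sha256.Sum256([]byte(tool + "\x00" + failure))
	return hex.EncodeToString(sum[:])
}

// RecordFailure counts one occurrence of a failure pattern and quarantines
// the tool once the count reaches the threshold. It returns the new state.
func (s *ImmuneSystem) RecordFailure(tool, failure string) bool {
	sig := signature(tool, failure)
	s.hits[sig]++
	if s.hits[sig] >= s.threshold {
		s.quarantined[tool] = true
	}
	return s.quarantined[tool]
}

// IsQuarantined reports whether a tool is currently blocked.
func (s *ImmuneSystem) IsQuarantined(tool string) bool {
	return s.quarantined[tool]
}

func main() {
	immune := newImmuneSystem(3)
	for i := 0; i != 3; i++ {
		immune.RecordFailure("search", "timeout after 30s")
	}
	fmt.Println(immune.IsQuarantined("search"))
}
```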
&lt;p&gt;&lt;strong&gt;Compositional Tool Synthesis:&lt;/strong&gt; Meta-tools that chain multiple tool calls into atomic pipelines with pre/postcondition contracts, typed argument mapping, and enforcement of non-retryable errors on contract violations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Federated Trust &amp;amp; Delegation:&lt;/strong&gt; Trust scoring for multi-agent delegation. Agents build trust through successful task completion; supervisors automatically select agents based on capability manifests and accumulated trust scores. Delegation chains provide full accountability tracking for post-mortem analysis.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Execution Provenance Graph:&lt;/strong&gt; A tamper-evident DAG tracking every action, decision, and data dependency in an agent run. Supports ancestor/descendant queries, subgraph extraction, and run diffing to compare two executions and identify the exact point of divergence.&lt;/p&gt;
&lt;h2 id="enterprise-governance"&gt;Enterprise Governance&lt;/h2&gt;
&lt;p&gt;Cruvero&amp;rsquo;s enterprise hardening philosophy is &amp;ldquo;tenant isolation is a property of the architecture, not a feature.&amp;rdquo; Every boundary is enforced at the infrastructure layer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Multi-Tenancy &amp;amp; Namespace Isolation:&lt;/strong&gt; Temporal namespaces, Postgres row-level security, and network policies enforce tenant boundaries. Per-tenant model selection, tool access control, and resource quotas are infrastructure-level guarantees that cannot be bypassed by application code.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rate Limiting, Quotas &amp;amp; Cost Guardrails:&lt;/strong&gt; Per-decision cost tracking (estimated and actual) with configurable policies: max cost per run, max cost per step, prefer-cheaper-model flags. Budget enforcement halts runs before they exceed limits. A model catalog with pricing metadata enables real-time cost optimization across providers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Audit Logging &amp;amp; Compliance:&lt;/strong&gt; Every tool call, LLM invocation, and state mutation is authenticated, authorized, and recorded in a tamper-evident audit trail. SOC 2-ready export formats. PII detection across five enforcement boundaries (audit, output, tool I/O, memory, events) with 12 PII types, unified secret detection, Shannon entropy analysis, HMAC-based stable tokenization, and a risk scoring engine.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Security Hardening:&lt;/strong&gt; OWASP Top 10 mitigations, RBAC with four role levels (Viewer, Editor, Admin, Super Admin), OIDC authentication, CSRF protection, input sanitization, and CSP headers.&lt;/p&gt;
&lt;h2 id="tool-ecosystem--mcp-integration"&gt;Tool Ecosystem &amp;amp; MCP Integration&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Semantic Tool Discovery:&lt;/strong&gt; A three-stage pipeline (keyword search → embedding similarity → quality-weighted reranking) selects tools dynamically rather than dumping all tool schemas into every prompt. Tool quality tracking quarantines degraded tools automatically.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MCP Protocol:&lt;/strong&gt; 150+ Model Context Protocol integrations (Notion, GitHub, AWS, Azure, O365, ServiceNow, Slack, and more) with standardized tool interfaces. The current architecture uses stdio subprocesses; the enterprise target architecture introduces a gateway-mediated Streamable HTTP model with per-integration scaling, Dragonfly response caching, circuit breakers, Vault-backed credential isolation, and KEDA autoscaling, designed for 1,000+ concurrent agents.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Event-Driven Architecture:&lt;/strong&gt; NATS provides async event fan-out alongside Temporal&amp;rsquo;s durable execution. MCP server lifecycle management, embedding pipeline intake, audit/telemetry buffering, and external consumer subscriptions (Teams/Telegram bots, dashboards, webhook relays) all flow through NATS, without ever entering the workflow deterministic path.&lt;/p&gt;
&lt;h2 id="observability--operations"&gt;Observability &amp;amp; Operations&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Distributed Tracing:&lt;/strong&gt; OpenTelemetry spans per decision cycle, tool call, memory operation, and MCP invocation. Full correlation IDs from workflow entry through every activity.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Structured Logging:&lt;/strong&gt; Zap-based structured logging with per-tenant, per-run, and per-step context propagation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Production API:&lt;/strong&gt; RESTful API with automatic OpenAPI 3.1 documentation, SSE streaming for live run updates, and comprehensive endpoints for run management, approval workflows, replay, tracing, cost queries, and tool management.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;React Operational UI:&lt;/strong&gt; A full-featured React 18 / TypeScript interface replacing the original htmx console. Surfaces every runtime capability: run management with live SSE streaming, approval queues, replay console with counterfactual analysis, causal trace explorer, tool registry browser, memory explorer with salience scores, cost dashboards (ECharts), supervisor multi-agent visualization, visual workflow builder (React Flow), live workflow inspection, speculative execution, and differential model testing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kubernetes Deployment:&lt;/strong&gt; Helm chart with environment-aware value overlays, ArgoCD ApplicationSet for GitOps promotion (dev/staging/prod), ServiceMonitor templates, and ingress configuration.&lt;/p&gt;
&lt;h2 id="key-decisions"&gt;Key Decisions&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Go over Python:&lt;/strong&gt; Single-binary deploys, predictable latency, deterministic resource usage, and a strong concurrency model for managing hundreds of concurrent agent sessions. No GIL, no dependency hell, no runtime surprises.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Temporal over custom durability:&lt;/strong&gt; Rather than implementing checkpointing, retry logic, and state recovery as library features, Cruvero delegates all of it to Temporal&amp;rsquo;s battle-tested workflow engine. This is the same infrastructure that runs mission-critical systems at companies processing millions of transactions per day.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Neuroscience-grounded intelligence:&lt;/strong&gt; The cognitive architecture isn&amp;rsquo;t marketing. Each subsystem maps to a specific neuroscience principle (prefrontal monitoring, hippocampal salience, temporal reasoning, immune response). The result is agents that self-correct, learn from failures, and degrade gracefully, capabilities no other framework offers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context management as a competitive advantage:&lt;/strong&gt; Most frameworks dump everything into the context window and pray. Cruvero&amp;rsquo;s context pipeline includes phase-aware budget allocation, five-component salience scoring, semantic tool search, interference detection, observation masking, and proactive compression triggers. The competitive analysis shows clear advantages over LangChain/LangGraph across every dimension.&lt;/p&gt;
&lt;h2 id="outcome"&gt;Outcome&lt;/h2&gt;
&lt;p&gt;Cruvero runs production agent workloads with infrastructure-grade reliability guarantees. The platform handles long-running workflows (minutes to hours), survives arbitrary infrastructure failures without data loss, enforces per-tenant cost and security policies, and provides complete observability from workflow entry through every LLM decision and tool call.&lt;/p&gt;
&lt;p&gt;The codebase represents 90,000+ lines of production code, 80%+ test coverage, comprehensive documentation published via Hugo, and a development methodology designed for systematic LLM-assisted engineering at scale.&lt;/p&gt;
&lt;h2 id="stack"&gt;Stack&lt;/h2&gt;
&lt;p&gt;Go · Temporal · PostgreSQL · NATS · React 18 · TypeScript · Vite · React Flow · ECharts · Tailwind CSS · Kubernetes · Helm · ArgoCD · Qdrant · Dragonfly · Ollama · OpenTelemetry · Zap · Keycloak · Docker&lt;/p&gt;</content:encoded></item><item><title>Go vs Spring Boot for Enterprise APIs: Cost, Performance, and Cloud-Native Ops</title><link>https://roygabriel.dev/blog/go-vs-springboot-enterprise-apis/</link><pubDate>Sun, 01 Feb 2026 10:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/go-vs-springboot-enterprise-apis/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; This is a production engineering perspective, not a benchmark scoreboard. If you care about cost or p99 latency, measure &lt;em&gt;your&lt;/em&gt; service with &lt;em&gt;your&lt;/em&gt; dependencies and &lt;em&gt;your&lt;/em&gt; deployment constraints.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-comparison-keeps-showing-up"&gt;Why this comparison keeps showing up&lt;/h2&gt;
&lt;p&gt;If you build enterprise APIs long enough, you’ll see the same pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The “language choice” isn’t what breaks production.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;runtime envelope&lt;/em&gt; and &lt;em&gt;operational model&lt;/em&gt; usually are.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When teams compare &lt;strong&gt;Go&lt;/strong&gt; and &lt;strong&gt;Java Spring Boot&lt;/strong&gt;, they’re often asking a more specific question:&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; This is a production engineering perspective, not a benchmark scoreboard. If you care about cost or p99 latency, measure &lt;em&gt;your&lt;/em&gt; service with &lt;em&gt;your&lt;/em&gt; dependencies and &lt;em&gt;your&lt;/em&gt; deployment constraints.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-comparison-keeps-showing-up"&gt;Why this comparison keeps showing up&lt;/h2&gt;
&lt;p&gt;If you build enterprise APIs long enough, you’ll see the same pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The “language choice” isn’t what breaks production.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;runtime envelope&lt;/em&gt; and &lt;em&gt;operational model&lt;/em&gt; usually are.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When teams compare &lt;strong&gt;Go&lt;/strong&gt; and &lt;strong&gt;Java Spring Boot&lt;/strong&gt;, they’re often asking a more specific question:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;“What will it cost to run this API at scale, and how predictable is it under real production conditions?”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Spring Boot’s value proposition is speed-to-service: stand-alone, production-grade Spring applications you can “just run,” with strong ecosystem defaults and integration breadth. [1]&lt;/p&gt;
&lt;p&gt;Go’s value proposition is operational simplicity: compile to an executable, ship a small container, run with fewer moving pieces, and keep latency and resource usage easier to reason about. &lt;code&gt;go build&lt;/code&gt; compiles packages into an executable. [5]&lt;/p&gt;
&lt;p&gt;This article is about the production-relevant tradeoffs: &lt;strong&gt;cost/resource usage&lt;/strong&gt;, &lt;strong&gt;performance under load&lt;/strong&gt;, &lt;strong&gt;cloud-native deployability&lt;/strong&gt;, and the “you will be on call for this” realities.&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;On code quality:&lt;/strong&gt; This isn’t “Go good / Java bad.” It’s an observation about failure modes: framework-heavy stacks can hide complexity until it shows up in startup time, memory, and surprises under load. Go’s bias toward explicitness often makes problems easier to see and cheaper to operate, even before the codebase is perfect.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;If your org is already Spring-heavy, Spring Boot can be the fastest path to a robust API, especially when you need Spring’s ecosystem (security, data, integrations). [1]&lt;/li&gt;
&lt;li&gt;If you run many small services, care about density, or need fast scale-to-zero/scale-from-zero behavior, Go often has an operational edge due to simpler packaging and typically lower baseline resource footprint.&lt;/li&gt;
&lt;li&gt;Kubernetes costs are strongly influenced by &lt;strong&gt;requests/limits&lt;/strong&gt; and scheduling density, so &lt;em&gt;baseline memory&lt;/em&gt; is often a bigger lever than micro-optimizing CPU. [7][8]&lt;/li&gt;
&lt;li&gt;Both ecosystems support hardened container builds (including &lt;strong&gt;distroless&lt;/strong&gt;) to reduce attack surface. [9][10]&lt;/li&gt;
&lt;li&gt;Observability is excellent in both; Java has very mature &lt;strong&gt;zero-code&lt;/strong&gt; instrumentation via the OpenTelemetry Java agent. [13][14] Go has strong SDK support and growing options for auto-instrumentation. [11]&lt;/li&gt;
&lt;li&gt;“Best” depends on your constraints. The best move is to benchmark your service envelope and compare &lt;strong&gt;p95/p99 latency&lt;/strong&gt;, &lt;strong&gt;RSS&lt;/strong&gt;, &lt;strong&gt;startup&lt;/strong&gt;, and &lt;strong&gt;error rates&lt;/strong&gt; under load.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="the-cost-model-what-you-actually-pay-for"&gt;The cost model: what you actually pay for&lt;/h2&gt;
&lt;p&gt;In cloud and Kubernetes environments, cost is strongly driven by:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;How many replicas you need&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How much CPU/memory you request per replica&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How quickly you can scale (up and down)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How much time you spend operating the service&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Kubernetes scheduling and resource guarantees are based on &lt;strong&gt;requests&lt;/strong&gt; and &lt;strong&gt;limits&lt;/strong&gt;. Requests influence where Pods can be scheduled; limits cap what they can consume. [7][8]&lt;/p&gt;
&lt;p&gt;That means your “baseline footprint” matters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A service that requests 512Mi RAM &lt;em&gt;even when idle&lt;/em&gt; reduces node density.&lt;/li&gt;
&lt;li&gt;A service that requests 128Mi RAM allows more Pods per node.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="a-simple-illustrative-density-example"&gt;A simple (illustrative) density example&lt;/h3&gt;
&lt;p&gt;Assume you run 100 replicas of an API, and memory is your limiting resource:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Case A:&lt;/strong&gt; 100 × 512Mi = 51,200Mi ≈ &lt;strong&gt;50Gi&lt;/strong&gt; reserved&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Case B:&lt;/strong&gt; 100 × 128Mi = 12,800Mi ≈ &lt;strong&gt;12.5Gi&lt;/strong&gt; reserved&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That’s a ~&lt;strong&gt;37.5Gi&lt;/strong&gt; delta in reserved memory &lt;em&gt;before&lt;/em&gt; you count overhead (sidecars, DaemonSets, kube-system). This is not “Go vs Java math.” It’s “baseline footprint sets cluster size.”&lt;/p&gt;
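&lt;p&gt;The same arithmetic as a small Go sketch, useful for plugging in your own replica counts and requests. The helper name &lt;code&gt;reservedGi&lt;/code&gt; is made up for this example, and it only models memory requests, not node overhead.&lt;/p&gt;

```go
package main

import "fmt"

// reservedGi returns the total memory reserved by a deployment, in Gi,
// given a replica count and a per-replica memory request in Mi.
func reservedGi(replicas int, requestMi int) float64 {
	return float64(replicas*requestMi) / 1024.0
}

func main() {
	caseA := reservedGi(100, 512)
	caseB := reservedGi(100, 128)
	fmt.Printf("Case A: %.1fGi\n", caseA)
	fmt.Printf("Case B: %.1fGi\n", caseB)
	fmt.Printf("Delta:  %.1fGi\n", caseA-caseB)
}
```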
&lt;p&gt;The point: &lt;strong&gt;cost discussions are often memory-and-startup discussions wearing a language-comparison mask.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="gos-production-advantages-when-they-matter"&gt;Go’s production advantages (when they matter)&lt;/h2&gt;
&lt;h3 id="1-packaging-simplicity-and-deployment-surface"&gt;1) Packaging simplicity and deployment surface&lt;/h3&gt;
&lt;p&gt;Go’s toolchain compiles code into an executable (&lt;code&gt;go build&lt;/code&gt;). [5] Go’s toolchain management (including automatic toolchain selection, introduced in Go 1.21) helps keep builds reproducible across environments. [6]&lt;/p&gt;
&lt;p&gt;In practice, Go services often ship as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a single process&lt;/li&gt;
&lt;li&gt;a single container layer containing a single binary&lt;/li&gt;
&lt;li&gt;minimal runtime dependencies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That tends to reduce:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;container image complexity&lt;/li&gt;
&lt;li&gt;“works on my machine” drift&lt;/li&gt;
&lt;li&gt;runtime patch surface area&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;This matters most when you operate many services&lt;/strong&gt; and want upgrades to be boring.&lt;/p&gt;
&lt;h3 id="2-fast-start-and-scale-events"&gt;2) Fast start and “scale events”&lt;/h3&gt;
&lt;p&gt;In real systems, performance isn’t only request/response speed; it’s also how the service behaves during:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;deployments&lt;/li&gt;
&lt;li&gt;autoscaling&lt;/li&gt;
&lt;li&gt;node drains&lt;/li&gt;
&lt;li&gt;crashes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Go services commonly start quickly because they don’t require JVM warmup/classloading/JIT compilation. (Exact numbers vary; measure your service.)&lt;/p&gt;
&lt;p&gt;Spring Boot can start fast enough for most use cases, but cold starts can become a visible factor when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;you scale from zero frequently (serverless-like patterns)&lt;/li&gt;
&lt;li&gt;you do aggressive HPA scaling&lt;/li&gt;
&lt;li&gt;you run lots of short-lived jobs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Spring Boot also supports building &lt;strong&gt;native images&lt;/strong&gt; with GraalVM, which can materially improve startup and memory in some cases, but introduces different tradeoffs (build time, reflection limits, operational differences). [3][4]&lt;/p&gt;
&lt;h3 id="3-resource-envelope-predictability"&gt;3) Resource envelope predictability&lt;/h3&gt;
&lt;p&gt;For many “API gateway / orchestration / integration” services, CPU isn’t the bottleneck. Latency, network, and downstream behavior are.&lt;/p&gt;
&lt;p&gt;Go’s strengths here tend to be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;predictable concurrency behavior&lt;/li&gt;
&lt;li&gt;straightforward backpressure patterns (bounded queues, semaphores)&lt;/li&gt;
&lt;li&gt;fewer runtime tuning knobs compared to JVM-heavy stacks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not “Go always uses less RAM.” It’s “Go often gives you a tighter baseline envelope for simpler services, which improves scheduling density.”&lt;/p&gt;
&lt;h3 id="4-cloud-native-ergonomics-minimalism-wins-over-time"&gt;4) Cloud-native ergonomics: minimalism wins over time&lt;/h3&gt;
&lt;p&gt;Enterprise services accrete complexity over years. The less your runtime depends on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;classpath complexity&lt;/li&gt;
&lt;li&gt;reflection-driven magic&lt;/li&gt;
&lt;li&gt;extensive framework graphs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;…the easier it is to keep production surprises rare.&lt;/p&gt;
&lt;p&gt;Go’s bias toward explicit wiring tends to help with long-term operability, especially in platform/API layers where consistency matters.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="where-spring-boot-is-still-the-right-tool"&gt;Where Spring Boot is still the right tool&lt;/h2&gt;
&lt;p&gt;Spring Boot exists for a reason, and in many enterprises it’s still the correct default:&lt;/p&gt;
&lt;h3 id="1-ecosystem-and-starter-leverage"&gt;1) Ecosystem and “starter” leverage&lt;/h3&gt;
&lt;p&gt;Spring Boot’s opinionated defaults and starter ecosystem are an enormous accelerator for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;auth (OAuth2/OIDC)&lt;/li&gt;
&lt;li&gt;data access and ORM patterns&lt;/li&gt;
&lt;li&gt;enterprise integrations&lt;/li&gt;
&lt;li&gt;standardized configuration and profiles&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Spring Boot is explicitly designed to minimize configuration and help you ship “production-grade” applications quickly. [1]&lt;/p&gt;
&lt;p&gt;If you already have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;shared Spring libraries&lt;/li&gt;
&lt;li&gt;internal Spring starters&lt;/li&gt;
&lt;li&gt;company-wide Spring conventions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;…then choosing Go for “purity” can be expensive in human terms.&lt;/p&gt;
&lt;h3 id="2-jvm-performance-can-be-excellent"&gt;2) JVM performance can be excellent&lt;/h3&gt;
&lt;p&gt;For long-lived services under sustained load, HotSpot JIT compilation can deliver extremely strong performance, sometimes outperforming Go in CPU-bound or allocation-sensitive scenarios.&lt;/p&gt;
&lt;p&gt;It’s a mistake to assume “compiled native binary” automatically means “faster.” The real question is: &lt;strong&gt;p99 latency, throughput per core, and behavior under GC pressure for your workload.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="3-operational-maturity-and-tooling"&gt;3) Operational maturity and tooling&lt;/h3&gt;
&lt;p&gt;Spring Boot has well-worn operational patterns:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;actuator endpoints&lt;/li&gt;
&lt;li&gt;consistent configuration patterns&lt;/li&gt;
&lt;li&gt;deep tracing/profiling options&lt;/li&gt;
&lt;li&gt;broad community knowledge&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Also: if your org has deep Java on-call expertise, “operational simplicity” may already be solved socially.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="cloud-native-reality-images-cves-and-deploy-surface"&gt;Cloud-native reality: images, CVEs, and deploy surface&lt;/h2&gt;
&lt;h3 id="distroless-is-not-a-go-only-advantage"&gt;Distroless is not a Go-only advantage&lt;/h3&gt;
&lt;p&gt;A common Go pattern is “static binary + scratch/distroless.” But distroless images exist for Java too.&lt;/p&gt;
&lt;p&gt;Distroless images contain only the application and its runtime dependencies, with no package manager and no shell, reducing attack surface. [9] The distroless project includes Java images as well. [10]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Operational implication:&lt;/strong&gt; smaller, simpler images usually mean:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;faster pulls and rollouts&lt;/li&gt;
&lt;li&gt;fewer things to patch&lt;/li&gt;
&lt;li&gt;fewer “shell inside container” habits (a feature, not a bug)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whether you ship Go or Spring Boot, you can adopt hardened bases.&lt;/p&gt;
&lt;h3 id="two-dockerfile-patterns-illustrative"&gt;Two Dockerfile patterns (illustrative)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Go (multi-stage + distroless):&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-Dockerfile" data-lang="Dockerfile"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;golang:1.22-alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;/src&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; go.mod go.sum ./&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;RUN&lt;/span&gt; go mod download&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; . .&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;RUN&lt;/span&gt; &lt;span class="nv"&gt;CGO_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linux go build -trimpath -ldflags &lt;span class="s2"&gt;&amp;#34;-s -w&amp;#34;&lt;/span&gt; -o /out/api ./cmd/api&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;gcr.io/distroless/static-debian12:nonroot&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; --from&lt;span class="o"&gt;=&lt;/span&gt;build /out/api /api&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;nonroot:nonroot&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;/api&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Spring Boot (JAR + distroless Java):&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-Dockerfile" data-lang="Dockerfile"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;eclipse-temurin:21-jdk&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;/src&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; . .&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;RUN&lt;/span&gt; ./mvnw -DskipTests package&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;gcr.io/distroless/java21-debian12:nonroot&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; --from&lt;span class="o"&gt;=&lt;/span&gt;build /src/target/app.jar /app.jar&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;nonroot:nonroot&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;java&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;-jar&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;/app.jar&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The important part isn’t the exact base image; it&amp;rsquo;s the &lt;em&gt;principle&lt;/em&gt;: reduce image surface area and keep the deploy artifact boring.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="observability-and-operations"&gt;Observability and operations&lt;/h2&gt;
&lt;p&gt;Both ecosystems are strong here, but they differ in “how quickly can I get real telemetry.”&lt;/p&gt;
&lt;h3 id="opentelemetry-support"&gt;OpenTelemetry support&lt;/h3&gt;
&lt;p&gt;OpenTelemetry is the vendor-neutral standard for traces/metrics/logs. [11]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Go language docs: SDK + instrumentation guidance. [11]&lt;/li&gt;
&lt;li&gt;Java language docs: SDK + instrumentation guidance. [12]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="javas-advantage-zero-code-instrumentation"&gt;Java’s advantage: zero-code instrumentation&lt;/h3&gt;
&lt;p&gt;The OpenTelemetry Java agent can attach to Java applications and automatically instrument popular libraries via bytecode injection. [13] The OpenTelemetry Java instrumentation project provides the agent and broad library coverage. [14]&lt;/p&gt;
&lt;p&gt;Practical implication: &lt;strong&gt;you can often get useful traces without touching code.&lt;/strong&gt; That’s a meaningful ops advantage in large enterprises.&lt;/p&gt;
&lt;h3 id="gos-reality-explicit-instrumentation-plus-growing-options"&gt;Go’s reality: explicit instrumentation (plus growing options)&lt;/h3&gt;
&lt;p&gt;Go’s OpenTelemetry SDK support is strong. [11] Go auto-instrumentation options exist and are improving, but your fastest path today is still typically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;instrument key inbound/outbound edges in code&lt;/li&gt;
&lt;li&gt;standardize middleware across services&lt;/li&gt;
&lt;li&gt;treat telemetry as part of the API contract&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That’s not bad. It’s just a different default.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-decision-matrix"&gt;A decision matrix&lt;/h2&gt;
&lt;p&gt;Use this as a starting point, not a rule.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Constraint / Goal&lt;/th&gt;
&lt;th&gt;Go tends to win&lt;/th&gt;
&lt;th&gt;Spring Boot tends to win&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Many small services, high density&lt;/td&gt;
&lt;td&gt;✅ smaller baseline envelopes often help&lt;/td&gt;
&lt;td&gt;⚠️ can be heavier per-service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast scale-from-zero, frequent redeploys&lt;/td&gt;
&lt;td&gt;✅ typically quick startup&lt;/td&gt;
&lt;td&gt;✅ with care; ✅✅ with native image tradeoffs [3][4]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise integration breadth&lt;/td&gt;
&lt;td&gt;⚠️ you build more glue yourself&lt;/td&gt;
&lt;td&gt;✅ Spring ecosystem leverage [1]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team expertise&lt;/td&gt;
&lt;td&gt;✅ if Go is your platform standard&lt;/td&gt;
&lt;td&gt;✅ if Java/Spring is your standard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Boring deployments”&lt;/td&gt;
&lt;td&gt;✅ single binary patterns&lt;/td&gt;
&lt;td&gt;✅ well-trodden JVM patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zero-code observability&lt;/td&gt;
&lt;td&gt;⚠️ emerging&lt;/td&gt;
&lt;td&gt;✅ OTel Java agent maturity [13][14]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-lived CPU-heavy services&lt;/td&gt;
&lt;td&gt;✅ sometimes&lt;/td&gt;
&lt;td&gt;✅ JVM can be extremely strong&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="how-to-validate-with-a-real-experiment"&gt;How to validate with a real experiment&lt;/h2&gt;
&lt;p&gt;If you want a decision you can defend, run a 2-4 hour experiment:&lt;/p&gt;
&lt;h3 id="1-define-a-representative-endpoint-mix"&gt;1) Define a representative endpoint mix&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;1 simple “health/read” endpoint&lt;/li&gt;
&lt;li&gt;1 endpoint that hits your DB&lt;/li&gt;
&lt;li&gt;1 endpoint that calls a downstream HTTP service&lt;/li&gt;
&lt;li&gt;1 endpoint with payload validation + auth&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="2-measure-the-four-numbers-that-matter"&gt;2) Measure the four numbers that matter&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Startup time&lt;/strong&gt; (cold start to ready)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Steady-state RSS&lt;/strong&gt; at idle&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;p95 / p99 latency&lt;/strong&gt; under load&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Error rate&lt;/strong&gt; under load + partial downstream failure&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="3-run-the-same-load-and-failure-profile"&gt;3) Run the same load and failure profile&lt;/h3&gt;
&lt;p&gt;Use the same:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;container runtime&lt;/li&gt;
&lt;li&gt;resource requests/limits&lt;/li&gt;
&lt;li&gt;ingress configuration&lt;/li&gt;
&lt;li&gt;downstream simulators&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="4-compare-operational-work-not-only-performance"&gt;4) Compare operational work, not only performance&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;How painful is debugging?&lt;/li&gt;
&lt;li&gt;How much config is required?&lt;/li&gt;
&lt;li&gt;How quickly can your team ship fixes safely?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where enterprise reality lives.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="common-failure-modes"&gt;Common failure modes&lt;/h2&gt;
&lt;h3 id="go-pitfalls"&gt;Go pitfalls&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Teams reinvent frameworks inconsistently across services.&lt;/li&gt;
&lt;li&gt;Too much “just a handler” code without shared middleware for auth, limits, tracing, and error handling.&lt;/li&gt;
&lt;li&gt;Ignoring backpressure (unbounded goroutines) → memory blowups.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="spring-boot-pitfalls"&gt;Spring Boot pitfalls&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Default dependency graphs grow quietly until startup time and memory become a problem.&lt;/li&gt;
&lt;li&gt;Classpath/auto-config complexity makes “why did it do that?” debugging expensive.&lt;/li&gt;
&lt;li&gt;Container runtime tuning gets deferred, then becomes urgent during cost reviews.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="both-ecosystems"&gt;Both ecosystems&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;No explicit timeouts (inbound and outbound).&lt;/li&gt;
&lt;li&gt;No limits or budgets.&lt;/li&gt;
&lt;li&gt;No telemetry until after the first incident.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="closing-thought"&gt;Closing thought&lt;/h2&gt;
&lt;p&gt;If your enterprise APIs are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;small, numerous, latency-sensitive, and cost-sensitive&lt;br&gt;
…Go is often a strong default.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your enterprise APIs are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;integration-heavy, domain-rich, and built on existing Spring conventions&lt;br&gt;
…Spring Boot is usually the shortest path to “production-grade.”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The best answer is the one you can operate confidently, on call, at scale.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Spring Boot project overview&lt;/li&gt;
&lt;li&gt;Spring Boot reference: Graceful Shutdown&lt;/li&gt;
&lt;li&gt;Spring Boot reference: GraalVM Native Images&lt;/li&gt;
&lt;li&gt;GraalVM guide: Build a Spring Boot app into a native executable&lt;/li&gt;
&lt;li&gt;Go tutorial: Compile and install the application (&lt;code&gt;go build&lt;/code&gt; produces an executable)&lt;/li&gt;
&lt;li&gt;Go docs: Toolchains and the &lt;code&gt;go&lt;/code&gt; command&lt;/li&gt;
&lt;li&gt;Kubernetes docs: Resource Management for Pods and Containers (requests/limits)&lt;/li&gt;
&lt;li&gt;Google Cloud: Kubernetes best practices for resource requests and limits&lt;/li&gt;
&lt;li&gt;Distroless container images (project overview)&lt;/li&gt;
&lt;li&gt;Distroless Java images&lt;/li&gt;
&lt;li&gt;OpenTelemetry Go docs&lt;/li&gt;
&lt;li&gt;OpenTelemetry Java docs&lt;/li&gt;
&lt;li&gt;OpenTelemetry Java Agent (zero-code)&lt;/li&gt;
&lt;li&gt;OpenTelemetry Java instrumentation (agent JAR + library coverage)&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item><item><title>MCP Servers in Production: Hardening, Backpressure, and Observability (Go)</title><link>https://roygabriel.dev/blog/mcp-servers-production-hardening-go/</link><pubDate>Sat, 31 Jan 2026 09:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/mcp-servers-production-hardening-go/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; MCP is evolving. This article references the MCP specification versioned &lt;strong&gt;2025-11-25&lt;/strong&gt; and related docs; verify details against the current spec before shipping changes. [1][2][4]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most “agent demos” fail in production for boring reasons: missing timeouts, unbounded concurrency, ambiguous tool interfaces, and logging that accidentally turns into data exfiltration.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;MCP server&lt;/strong&gt; isn’t “just an integration.” It’s a &lt;strong&gt;capability boundary&lt;/strong&gt; between an LLM host (IDE, desktop app, agent runner) and the real world: files, APIs, databases, tickets, home automation, and anything else you wire up. MCP uses JSON-RPC 2.0 messages over transports like stdio (local) and Streamable HTTP (remote). [1][2][5]&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; MCP is evolving. This article references the MCP specification versioned &lt;strong&gt;2025-11-25&lt;/strong&gt; and related docs; verify details against the current spec before shipping changes. [1][2][4]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most “agent demos” fail in production for boring reasons: missing timeouts, unbounded concurrency, ambiguous tool interfaces, and logging that accidentally turns into data exfiltration.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;MCP server&lt;/strong&gt; isn’t “just an integration.” It’s a &lt;strong&gt;capability boundary&lt;/strong&gt; between an LLM host (IDE, desktop app, agent runner) and the real world: files, APIs, databases, tickets, home automation, and anything else you wire up. MCP uses JSON-RPC 2.0 messages over transports like stdio (local) and Streamable HTTP (remote). [1][2][5]&lt;/p&gt;
&lt;p&gt;That means an MCP server is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an API gateway for tools&lt;/li&gt;
&lt;li&gt;a policy enforcement point (whether you intended it or not)&lt;/li&gt;
&lt;li&gt;a reliability hotspot (tool calls are where latency and failure concentrate)&lt;/li&gt;
&lt;li&gt;a security hotspot (tools are where “read” becomes “exfil” and “write” becomes “impact”)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This post is a pragmatic checklist + a set of Go patterns to harden an MCP server so it keeps working when it’s under real load, and remains safe when the model gets “creative.”&lt;/p&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat tool inputs as &lt;strong&gt;untrusted&lt;/strong&gt;. Validate and constrain everything.&lt;/li&gt;
&lt;li&gt;Put &lt;strong&gt;budgets&lt;/strong&gt; everywhere: timeouts, concurrency limits, rate limits, and payload caps.&lt;/li&gt;
&lt;li&gt;Build for &lt;strong&gt;partial failure&lt;/strong&gt;: retries, idempotency keys, circuit breaking, fallbacks.&lt;/li&gt;
&lt;li&gt;Log like a security engineer: &lt;strong&gt;structured&lt;/strong&gt;, &lt;strong&gt;redacted&lt;/strong&gt;, &lt;strong&gt;auditable&lt;/strong&gt;, and &lt;strong&gt;useful&lt;/strong&gt;. [11]&lt;/li&gt;
&lt;li&gt;Instrument with traces/metrics early; “we’ll add telemetry later” is a trap. [13]&lt;/li&gt;
&lt;li&gt;Prefer Go for MCP servers because deployment and operational behavior are predictable: single binary, fast startup, cancellation and deadline propagation via &lt;code&gt;context&lt;/code&gt;, and a strong standard library.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#a-production-mental-model-for-mcp-servers"&gt;A production mental model for MCP servers&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#threat-model-what-actually-goes-wrong"&gt;Threat model: what actually goes wrong&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-1-identity-and-authorization"&gt;Hardening layer 1: identity and authorization&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-2-tool-contracts-that-resist-ambiguity"&gt;Hardening layer 2: tool contracts that resist ambiguity&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-3-budgets-and-backpressure"&gt;Hardening layer 3: budgets and backpressure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-4-safe-networking-and-ssrf-containment"&gt;Hardening layer 4: safe networking and SSRF containment&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-5-observability-without-leaking-secrets"&gt;Hardening layer 5: observability without leaking secrets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-6-versioning-and-rollout-discipline"&gt;Hardening layer 6: versioning and rollout discipline&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="a-production-mental-model-for-mcp-servers"&gt;A production mental model for MCP servers&lt;/h2&gt;
&lt;p&gt;MCP’s docs describe a host (the AI application), a client (connector inside the host), and servers (capabilities/providers). Servers can be “local” (stdio) or “remote” (Streamable HTTP). [2][3]&lt;/p&gt;
&lt;p&gt;Here’s the production mental model that matters:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Your MCP server is a tool gateway.&lt;/strong&gt;&lt;br&gt;
Every tool is effectively an RPC method exposed to an agent. MCP uses JSON-RPC 2.0 semantics for requests/responses/notifications. [1][5]&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LLM tool arguments are not trustworthy.&lt;/strong&gt;&lt;br&gt;
Even if the LLM is “helpful,” arguments can be malformed, overbroad, or dangerous, especially under prompt injection or user-provided hostile input.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The host UI is not a security boundary.&lt;/strong&gt;&lt;br&gt;
The spec emphasizes user consent and tool safety, but the protocol can’t enforce your policy for you. You still need server-side controls. [1]&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Transport changes your blast radius, not your responsibilities.&lt;/strong&gt;&lt;br&gt;
Stdio reduces network exposure, but doesn’t remove safety requirements. Streamable HTTP adds multi-client/multi-tenant concerns and requires real auth. [2][3]&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you remember nothing else: treat the MCP server like a production API you’d be willing to put on call for.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="threat-model-what-actually-goes-wrong"&gt;Threat model: what actually goes wrong&lt;/h2&gt;
&lt;p&gt;When MCP servers cause incidents, it’s usually one of these:&lt;/p&gt;
&lt;h3 id="1-input-ambiguity--destructive-actions"&gt;1) Input ambiguity → destructive actions&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;A “delete” tool with optional filters&lt;/li&gt;
&lt;li&gt;A “run command” tool with free-form strings&lt;/li&gt;
&lt;li&gt;A “sync” tool that can touch thousands of objects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; schema + semantic validation, safe defaults, two-step patterns (preview, then apply), and explicit “danger gates.”&lt;/p&gt;
&lt;h3 id="2-prompt-injection--tool-misuse"&gt;2) Prompt injection → tool misuse&lt;/h3&gt;
&lt;p&gt;The model can be tricked into calling tools with attacker-provided arguments. If your tool can read internal data or call internal APIs, you’ve created an exfil path.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; least privilege, allowlists, strong auth, egress controls, and redaction.&lt;/p&gt;
&lt;h3 id="3-ssrf--network-pivoting"&gt;3) SSRF / network pivoting&lt;/h3&gt;
&lt;p&gt;Any tool that fetches URLs, loads webhooks, or calls dynamic endpoints can be abused to hit internal networks or metadata endpoints. OWASP treats SSRF as a major category for a reason. [10]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; deny-by-default networking (CIDR blocks, DNS/IP resolution checks, allowlisted destinations).&lt;/p&gt;
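&lt;p&gt;One piece of that mitigation, the IP check, can be sketched with &lt;code&gt;net/netip&lt;/code&gt;. The function below is an illustration under assumptions: &lt;code&gt;CheckDialAddr&lt;/code&gt; and &lt;code&gt;ErrBlockedAddr&lt;/code&gt; are hypothetical names, and it only rejects well-known internal address classes. It is not a destination allowlist, and it has to be applied to the address you actually connect to (after DNS resolution), not to the hostname in the URL.&lt;/p&gt;

```go
package main

import (
	"errors"
	"net/netip"
)

// ErrBlockedAddr is returned for addresses a URL-fetching tool should never reach.
var ErrBlockedAddr = errors.New("destination address is not allowed")

// CheckDialAddr rejects loopback, private, link-local (which covers the
// 169.254.169.254 metadata address), multicast, and unspecified addresses.
func CheckDialAddr(ip string) error {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return err
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 the same as 10.0.0.1
	if addr.IsLoopback() || addr.IsPrivate() || addr.IsLinkLocalUnicast() ||
		addr.IsLinkLocalMulticast() || addr.IsMulticast() || addr.IsUnspecified() {
		return ErrBlockedAddr
	}
	return nil
}
```

&lt;p&gt;Checking the resolved address at connect time matters because a hostname can resolve to a public address during validation and to an internal one moments later.&lt;/p&gt;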
&lt;h3 id="4-unbounded-concurrency--resource-collapse"&gt;4) Unbounded concurrency → resource collapse&lt;/h3&gt;
&lt;p&gt;Agents can fire tools in parallel. Without limits you’ll blow up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;API quotas&lt;/li&gt;
&lt;li&gt;DB connections&lt;/li&gt;
&lt;li&gt;CPU/memory&lt;/li&gt;
&lt;li&gt;downstream latency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; per-tenant rate limiting, concurrency caps, queues, and backpressure.&lt;/p&gt;
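&lt;p&gt;A per-tenant concurrency cap does not need much machinery. The sketch below rejects excess calls immediately instead of queueing them, which is the simplest form of backpressure; &lt;code&gt;TenantLimiter&lt;/code&gt; and its methods are hypothetical names, and a real server would likely combine this with rate limiting and a bounded queue.&lt;/p&gt;

```go
package main

import "sync"

// TenantLimiter caps in-flight tool calls per tenant and rejects the excess
// right away rather than letting work pile up.
type TenantLimiter struct {
	mu       sync.Mutex
	limit    int
	inFlight map[string]int
}

// NewTenantLimiter allows at most maxPerTenant concurrent calls per tenant.
func NewTenantLimiter(maxPerTenant int) *TenantLimiter {
	l := new(TenantLimiter)
	l.limit = maxPerTenant
	l.inFlight = make(map[string]int)
	return l
}

// TryAcquire reserves a slot for tenant and reports whether it succeeded.
func (l *TenantLimiter) TryAcquire(tenant string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inFlight[tenant] >= l.limit {
		return false
	}
	l.inFlight[tenant]++
	return true
}

// Release frees a slot previously reserved with TryAcquire.
func (l *TenantLimiter) Release(tenant string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inFlight[tenant] > 1 {
		l.inFlight[tenant]--
		return
	}
	delete(l.inFlight, tenant) // drop idle tenants so the map stays bounded
}
```

&lt;p&gt;A handler would call &lt;code&gt;TryAcquire&lt;/code&gt; before running a tool, return a “busy, retry later” error when it fails, and &lt;code&gt;defer&lt;/code&gt; the matching &lt;code&gt;Release&lt;/code&gt; when it succeeds.&lt;/p&gt;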
&lt;h3 id="5-helpful-logs--data-leak"&gt;5) “Helpful logs” → data leak&lt;/h3&gt;
&lt;p&gt;Tool arguments and tool responses often contain secrets, tokens, or private data. If you log everything, you’ve built an involuntary data lake.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; structured + redacted logging, security logging guidelines, and minimal retention. [11][12]&lt;/p&gt;
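&lt;p&gt;Redaction is easiest to enforce when it happens in one place, before anything reaches the logger. The helper below is a minimal sketch: &lt;code&gt;RedactArgs&lt;/code&gt; and the &lt;code&gt;sensitiveHints&lt;/code&gt; list are illustrative assumptions, and key-name matching will not catch secrets embedded in free-form values.&lt;/p&gt;

```go
package main

import "strings"

// sensitiveHints lists substrings that mark an argument name as secret.
// The list is an illustrative assumption, not an exhaustive policy.
var sensitiveHints = []string{"token", "secret", "password", "authorization", "api_key"}

// RedactArgs returns a copy of args that is safer to log: values under
// sensitive-looking keys are masked, and nested maps are handled recursively.
func RedactArgs(args map[string]any) map[string]any {
	out := make(map[string]any, len(args))
	for k, v := range args {
		if isSensitive(k) {
			out[k] = "[REDACTED]"
			continue
		}
		if nested, ok := v.(map[string]any); ok {
			out[k] = RedactArgs(nested)
			continue
		}
		out[k] = v
	}
	return out
}

// isSensitive reports whether key contains any of the sensitive hints,
// ignoring case.
func isSensitive(key string) bool {
	lower := strings.ToLower(key)
	for _, hint := range sensitiveHints {
		if strings.Contains(lower, hint) {
			return true
		}
	}
	return false
}
```

&lt;p&gt;Returning a copy keeps the original arguments intact for the tool itself, so logging can never change what gets executed.&lt;/p&gt;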
&lt;hr&gt;
&lt;h2 id="hardening-layer-1-identity-and-authorization"&gt;Hardening layer 1: identity and authorization&lt;/h2&gt;
&lt;p&gt;If you run &lt;strong&gt;Streamable HTTP&lt;/strong&gt;, assume:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;multiple clients&lt;/li&gt;
&lt;li&gt;untrusted networks&lt;/li&gt;
&lt;li&gt;tokens will leak eventually&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;MCP’s architecture guidance recommends standard HTTP authentication methods for remote servers, with OAuth as the suggested way to obtain tokens. [2][3]&lt;/p&gt;
&lt;h3 id="practical-rules"&gt;Practical rules&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Authenticate every request.&lt;/strong&gt;&lt;br&gt;
Use bearer tokens or mTLS depending on environment.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Authorize per tool.&lt;/strong&gt;&lt;br&gt;
“Authenticated” ≠ “allowed to run &lt;code&gt;delete_everything&lt;/code&gt;”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefer short-lived tokens&lt;/strong&gt; and rotate them. [12]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-tenant?&lt;/strong&gt; Carry the tenant identity in one of:
&lt;ul&gt;
&lt;li&gt;auth token claims, or&lt;/li&gt;
&lt;li&gt;an explicit, signed tenant header that you validate.&lt;/li&gt;
&lt;/ul&gt;
Then enforce it everywhere.
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="go-pattern-a-minimal-auth-middleware-skeleton-http-transport"&gt;Go pattern: a minimal auth middleware skeleton (HTTP transport)&lt;/h3&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;This is &lt;em&gt;not&lt;/em&gt; a full MCP implementation, just the hardening pattern you’ll wrap around your MCP handler.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// Pseudocode-ish middleware skeleton. Replace verifyToken with your auth logic.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;authMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;HandlerFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;strings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CutPrefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Authorization&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Bearer &amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;missing auth&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusUnauthorized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;verifyToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// includes tenant + scopes&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;invalid auth&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusUnauthorized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ctxKeyIdentity&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Key point:&lt;/strong&gt; authorization should happen &lt;em&gt;after&lt;/em&gt; you parse the requested tool name, but &lt;em&gt;before&lt;/em&gt; you execute anything.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-2-tool-contracts-that-resist-ambiguity"&gt;Hardening layer 2: tool contracts that resist ambiguity&lt;/h2&gt;
&lt;p&gt;Most MCP tool failures are self-inflicted: tool interfaces are too vague.&lt;/p&gt;
&lt;h3 id="design-tools-like-production-apis"&gt;Design tools like production APIs&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Bad tool signature:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;run(command: string)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Better:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;run_command(program: enum, args: string[], cwd: string, timeout_ms: int, dry_run: bool)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why it’s better:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;forces structured input instead of a free-form command string&lt;/li&gt;
&lt;li&gt;lets you enforce an allowlist on &lt;code&gt;program&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;gives you timeouts and safe defaults&lt;/li&gt;
&lt;/ul&gt;
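&lt;p&gt;A minimal sketch of how a server might enforce that contract before executing anything. The Go type, the allowlist contents, the workspace rule for &lt;code&gt;cwd&lt;/code&gt;, and the 5s default / 30s maximum timeouts are all assumptions for illustration, not part of MCP or any SDK.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
	"time"
)

// RunCommandArgs mirrors the structured run_command signature.
// Field names and limits are illustrative assumptions.
type RunCommandArgs struct {
	Program   string
	Args      []string
	Cwd       string
	TimeoutMS int
	DryRun    bool
}

// allowedPrograms is a hypothetical allowlist: the "enum" in the contract.
var allowedPrograms = map[string]bool{"git": true, "ls": true}

const (
	defaultTimeout = 5 * time.Second
	maxTimeout     = 30 * time.Second
)

// Validate rejects anything outside the contract and returns the effective timeout.
func (a RunCommandArgs) Validate(workspace string) (time.Duration, error) {
	if !allowedPrograms[a.Program] {
		return 0, fmt.Errorf("program %q is not allowlisted", a.Program)
	}

	// Clean both paths so ".." segments cannot escape the workspace.
	ws := filepath.Clean(workspace)
	cwd := filepath.Clean(a.Cwd)
	inside := cwd == ws || strings.HasPrefix(cwd, ws+string(filepath.Separator))
	if !inside {
		return 0, errors.New("cwd must stay inside the workspace")
	}

	timeout := defaultTimeout
	switch {
	case a.TimeoutMS == 0:
		// Keep the safe default.
	case a.TimeoutMS > 0:
		timeout = time.Duration(a.TimeoutMS) * time.Millisecond
	default:
		return 0, errors.New("timeout_ms must not be negative")
	}
	if timeout > maxTimeout {
		return 0, errors.New("timeout_ms exceeds the maximum")
	}
	return timeout, nil
}
```

&lt;p&gt;Call &lt;code&gt;Validate&lt;/code&gt; after authorization and before execution, then pass the returned duration to &lt;code&gt;context.WithTimeout&lt;/code&gt;.&lt;/p&gt;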
&lt;h3 id="add-a-preview--apply-flow-for-risky-tools"&gt;Add a “preview → apply” flow for risky tools&lt;/h3&gt;
&lt;p&gt;For any tool that writes data or triggers side effects, do a two-step approach:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;plan_*&lt;/code&gt; returns a machine-readable plan plus a &lt;code&gt;plan_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;apply_*&lt;/code&gt; requires that &lt;code&gt;plan_id&lt;/code&gt; and, optionally, a user confirmation token&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This mirrors how we run infra changes (plan/apply) and dramatically reduces accidental blast radius.&lt;/p&gt;
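&lt;p&gt;The two-step flow can be sketched as a small in-memory store. Everything here (&lt;code&gt;PlanStore&lt;/code&gt;, &lt;code&gt;Create&lt;/code&gt;, &lt;code&gt;Take&lt;/code&gt;, the TTL) is a hypothetical shape, not an MCP API. The idea is that a &lt;code&gt;plan_id&lt;/code&gt; is unguessable, expires, and can be applied at most once.&lt;/p&gt;

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"sync"
	"time"
)

// Plan is a hypothetical machine-readable plan; real plans carry tool-specific steps.
type Plan struct {
	ID      string
	Steps   []string
	Expires time.Time
}

// PlanStore keeps pending plans in memory. The zero value is ready to use.
// A production store would likely be durable and scoped per tenant.
type PlanStore struct {
	mu    sync.Mutex
	plans map[string]Plan
}

// ErrUnknownPlan covers missing, expired, and already-applied plans alike.
var ErrUnknownPlan = errors.New("unknown, expired, or already applied plan")

// Create records what a plan_* tool computed and returns the plan with a random ID.
func (s *PlanStore) Create(steps []string, ttl time.Duration) (Plan, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return Plan{}, err
	}
	p := Plan{ID: hex.EncodeToString(buf), Steps: steps, Expires: time.Now().Add(ttl)}

	s.mu.Lock()
	defer s.mu.Unlock()
	if s.plans == nil {
		s.plans = map[string]Plan{}
	}
	s.plans[p.ID] = p
	return p, nil
}

// Take removes and returns a plan, so an apply_* tool can use each plan_id only once.
func (s *PlanStore) Take(id string) (Plan, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	p, ok := s.plans[id]
	delete(s.plans, id)
	if !ok || time.Now().After(p.Expires) {
		return Plan{}, ErrUnknownPlan
	}
	return p, nil
}
```

&lt;p&gt;An &lt;code&gt;apply_*&lt;/code&gt; handler would call &lt;code&gt;Take&lt;/code&gt; first and execute only the steps stored in the plan, never steps supplied again by the caller.&lt;/p&gt;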
&lt;hr&gt;
&lt;h2 id="hardening-layer-3-budgets-and-backpressure"&gt;Hardening layer 3: budgets and backpressure&lt;/h2&gt;
&lt;p&gt;Production systems are budget systems.&lt;/p&gt;
&lt;p&gt;If you don’t set explicit budgets, your MCP server will eventually allocate them for you via outages.&lt;/p&gt;
&lt;h3 id="budget-checklist"&gt;Budget checklist&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Server timeouts&lt;/strong&gt; (header read, request read, write, idle)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Request body caps&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outbound timeouts&lt;/strong&gt; to dependencies&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concurrency caps&lt;/strong&gt; per tool and per tenant&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rate limits&lt;/strong&gt; per tenant and per identity&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Queue limits&lt;/strong&gt; (bounded channels) to avoid memory blowups&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Circuit breaking&lt;/strong&gt; for flaky downstream dependencies&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="go-server-timeouts-are-not-optional"&gt;Go: server timeouts are not optional&lt;/h3&gt;
&lt;p&gt;Go’s &lt;code&gt;net/http&lt;/code&gt; provides explicit server timeouts; leaving them all at zero means no timeouts at all, which is a common footgun. [6][7]&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;srv&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Addr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;:8080&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// your MCP handler + middleware&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ReadHeaderTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ReadTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;WriteTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;IdleTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;srv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="go-propagate-cancellation-everywhere-with-context"&gt;Go: propagate cancellation everywhere with &lt;code&gt;context&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;context.Context&lt;/code&gt; is the backbone of “structured concurrency” in Go: deadlines and cancellation signals flow through your call stack. [8][9]&lt;/p&gt;
&lt;p&gt;Rule: &lt;strong&gt;every tool execution must accept a &lt;code&gt;context.Context&lt;/code&gt;&lt;/strong&gt;, and every outbound call must honor it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ToolRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ToolResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cancel&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// ... outbound calls use ctx&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;integration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="go-per-tenant-rate-limiting-with-xtimerate"&gt;Go: per-tenant rate limiting with &lt;code&gt;x/time/rate&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;golang.org/x/time/rate&lt;/code&gt; implements a token bucket limiter. [9]&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;limiters&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sync&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Mutex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Limiter&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;limiters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Limiter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Limiter&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// Example: 5 req/sec with bursts up to 10&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;rateLimitMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lims&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;limiters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;HandlerFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;mustIdentity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;lims&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TenantID&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;rate limited&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusTooManyRequests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="backpressure-choose-a-policy"&gt;Backpressure: choose a policy&lt;/h3&gt;
&lt;p&gt;When you’re overloaded, you need a policy. Pick one explicitly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fail fast&lt;/strong&gt; with 429 / “busy” (simplest, safest)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Queue&lt;/strong&gt; with bounded depth (more complex; must cap memory)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Degrade&lt;/strong&gt; by disabling expensive tools first&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The “fail fast” approach is often correct for tool gateways.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-4-safe-networking-and-ssrf-containment"&gt;Hardening layer 4: safe networking and SSRF containment&lt;/h2&gt;
&lt;p&gt;If any tool can fetch a user-provided URL or call a user-influenced endpoint, SSRF is on the table. [10]&lt;/p&gt;
&lt;h3 id="ssrf-containment-strategies-that-actually-work"&gt;SSRF containment strategies that actually work&lt;/h3&gt;
&lt;p&gt;OWASP’s SSRF guidance boils down to a few themes: don’t trust user-controlled URLs, use allowlists, and enforce network controls. [10]&lt;/p&gt;
&lt;p&gt;In practice, for MCP servers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Prefer allowlists over blocklists.&lt;/strong&gt;&lt;br&gt;
“Only these domains” beats “block internal IPs.” Attackers are creative.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Resolve and validate IPs before dialing.&lt;/strong&gt;&lt;br&gt;
DNS can be weaponized: a hostname that passes your check can resolve to an internal address by the time you connect. Validate the IP you actually dial (and re-validate on redirects).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Disable redirects or re-validate each hop.&lt;/strong&gt;&lt;br&gt;
Redirect chains are SSRF’s favorite tool.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enforce egress policy at the network layer too.&lt;/strong&gt;&lt;br&gt;
Kubernetes NetworkPolicies / firewall rules are your last line of defense.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="go-pattern-an-outbound-http-client-with-strict-timeouts"&gt;Go pattern: an outbound HTTP client with strict timeouts&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// whole request budget&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Transport&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Proxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ProxyFromEnvironment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;DialContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;net&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Dialer&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;KeepAlive&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nx"&gt;DialContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;TLSHandshakeTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ResponseHeaderTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ExpectContinueTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;MaxIdleConns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;IdleConnTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then validate every URL before you create a request for it. Keep the validation boring and strict.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-5-observability-without-leaking-secrets"&gt;Hardening layer 5: observability without leaking secrets&lt;/h2&gt;
&lt;p&gt;Telemetry is how you prove:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;you’re within budgets&lt;/li&gt;
&lt;li&gt;tools behave as expected&lt;/li&gt;
&lt;li&gt;failures are localized&lt;/li&gt;
&lt;li&gt;incidents can be diagnosed without “ssh and guess”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But logging is also where teams accidentally leak sensitive data.&lt;/p&gt;
&lt;p&gt;OWASP’s logging guidance emphasizes logging that supports detection/response while avoiding sensitive data exposure. [11] Pair that with secrets management discipline. [12]&lt;/p&gt;
&lt;h3 id="what-to-measure-minimum-viable-mcp-telemetry"&gt;What to measure (minimum viable MCP telemetry)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Counters&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool_calls_total{tool, tenant, status}&lt;/li&gt;
&lt;li&gt;auth_failures_total{reason}&lt;/li&gt;
&lt;li&gt;rate_limited_total{tenant}&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Histograms&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool_latency_seconds{tool}&lt;/li&gt;
&lt;li&gt;outbound_latency_seconds{dependency}&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Gauges&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;in_flight_tool_calls{tool}&lt;/li&gt;
&lt;li&gt;queue_depth{tool}&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="trace-boundaries"&gt;Trace boundaries&lt;/h3&gt;
&lt;p&gt;Instrument:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;request → tool routing&lt;/li&gt;
&lt;li&gt;tool execution span&lt;/li&gt;
&lt;li&gt;downstream calls span&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;OpenTelemetry’s Go docs show how to add instrumentation and emit traces/metrics. [13]&lt;/p&gt;
&lt;h3 id="logging-rules-that-save-you-later"&gt;Logging rules that save you later&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Use structured logging (JSON).&lt;/li&gt;
&lt;li&gt;Add correlation IDs (trace IDs) to logs.&lt;/li&gt;
&lt;li&gt;Redact:
&lt;ul&gt;
&lt;li&gt;Authorization headers&lt;/li&gt;
&lt;li&gt;tokens&lt;/li&gt;
&lt;li&gt;cookies&lt;/li&gt;
&lt;li&gt;tool payload fields known to contain secrets&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Log &lt;em&gt;events&lt;/em&gt;, not raw payloads:
&lt;ul&gt;
&lt;li&gt;“tool X called”&lt;/li&gt;
&lt;li&gt;“resource Y read”&lt;/li&gt;
&lt;li&gt;“write operation requested (dry_run=true)”&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Audit logs&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For high-impact tools, write an append-only audit record:
&lt;ul&gt;
&lt;li&gt;who (identity)&lt;/li&gt;
&lt;li&gt;what (tool + parameters summary)&lt;/li&gt;
&lt;li&gt;when&lt;/li&gt;
&lt;li&gt;result (success/failure)&lt;/li&gt;
&lt;li&gt;plan_id / idempotency_key&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Audit logs should be treated as security data.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-6-versioning-and-rollout-discipline"&gt;Hardening layer 6: versioning and rollout discipline&lt;/h2&gt;
&lt;p&gt;MCP uses string-based version identifiers in &lt;code&gt;YYYY-MM-DD&lt;/code&gt; format that indicate the last date on which backwards-incompatible changes were made. [4]&lt;/p&gt;
&lt;p&gt;That’s helpful, but it doesn’t solve the operational problem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;clients upgrade at different times&lt;/li&gt;
&lt;li&gt;schemas drift as changes accumulate&lt;/li&gt;
&lt;li&gt;hosts differ in which capabilities they support&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="practical-compatibility-rules"&gt;Practical compatibility rules&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pin your server’s supported protocol version&lt;/strong&gt; and expose it in &lt;code&gt;health&lt;/code&gt; or diagnostics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Add contract tests&lt;/strong&gt; that run against:
&lt;ul&gt;
&lt;li&gt;one “current” client&lt;/li&gt;
&lt;li&gt;one “previous” client version&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Support additive changes&lt;/strong&gt; first:
&lt;ul&gt;
&lt;li&gt;new tools&lt;/li&gt;
&lt;li&gt;new optional fields&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Use feature flags for risky tools.&lt;/li&gt;
&lt;/ul&gt;
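&lt;p&gt;One way to pin versions, sketched under assumptions: the server keeps an explicit list of protocol versions it speaks, answers with the client-requested version when it is on the list, and otherwise answers with its newest. The dates in &lt;code&gt;supportedVersions&lt;/code&gt; are examples taken from the references, and the function names are hypothetical.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"time"
)

// supportedVersions is the pinned set this server speaks, newest first.
// The dates are examples; pin whatever your server actually implements.
var supportedVersions = []string{"2025-11-25", "2025-03-26"}

// negotiate returns the protocol version the server should answer with:
// the requested version when supported, otherwise the newest pinned one.
func negotiate(requested string) string {
	for _, v := range supportedVersions {
		if v == requested {
			return v
		}
	}
	return supportedVersions[0]
}

// validVersion reports whether s is a well-formed YYYY-MM-DD identifier.
func validVersion(s string) bool {
	_, err := time.Parse("2006-01-02", s)
	return err == nil
}

func main() {
	for _, req := range []string{"2025-03-26", "2024-11-05", "latest"} {
		fmt.Printf("requested=%q valid=%v answer=%q\n", req, validVersion(req), negotiate(req))
	}
}
```

&lt;p&gt;The same list can back the &lt;code&gt;health&lt;/code&gt; output and the contract tests, so the pinned versions live in exactly one place.&lt;/p&gt;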
&lt;h3 id="rollout-like-a-platform-team"&gt;Rollout like a platform team&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Canaries for remote servers&lt;/li&gt;
&lt;li&gt;“Shadow mode” for new tools (log what would happen)&lt;/li&gt;
&lt;li&gt;Slow ramp with budget monitoring&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;p&gt;If you’re building (or inheriting) an MCP server, run this checklist:&lt;/p&gt;
&lt;h3 id="safety"&gt;Safety&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool contracts are structured (no free-form “do anything” strings).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Every tool has a safe default (&lt;code&gt;dry_run=true&lt;/code&gt;, &lt;code&gt;limit&lt;/code&gt; required, etc.).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Destructive tools require a plan/apply step (or explicit confirmation gates).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool inputs are validated and bounded (length, ranges, enums).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="identity--access"&gt;Identity &amp;amp; access&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Remote transport requires authentication and per-tool authorization.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tokens are short-lived and rotated; secrets are not in source control. [12]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tenant identity is enforced at every access point (not “best effort”).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="budgets--resilience"&gt;Budgets &amp;amp; resilience&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; HTTP server timeouts are configured. [6][7]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Outbound clients have timeouts and connection limits.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Rate limiting exists per tenant/identity. [9]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Concurrency caps exist per tool; overload behavior is explicit (fail fast / queue).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Retries are bounded and idempotent where side effects exist.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="networking"&gt;Networking&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; URL fetch tools have allowlists and SSRF protections. [10]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Redirect policies are explicit (disabled or re-validated).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Egress is constrained at the network layer (not only in code).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="observability"&gt;Observability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Metrics cover tool calls, latency, errors, and rate limiting.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tracing exists across tool execution and downstream calls. [13]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Logs are structured, correlated, and redacted. [11]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit logging exists for high-impact tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="operations"&gt;Operations&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Health checks and readiness checks exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Configuration is explicit and validated on startup.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Versioning strategy is documented and tested. [4]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Model Context Protocol (MCP) Specification (version 2025-11-25): &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-11-25&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Architecture Overview (participants, transports, concepts): &lt;a href="https://modelcontextprotocol.io/docs/learn/architecture" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/docs/learn/architecture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Transport details (Streamable HTTP transport overview): &lt;a href="https://modelcontextprotocol.io/specification/2025-03-26/basic/transports" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-03-26/basic/transports&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Versioning: &lt;a href="https://modelcontextprotocol.io/specification/versioning" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/versioning&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;JSON-RPC 2.0 Specification: &lt;a href="https://www.jsonrpc.org/specification" target="_blank" rel="noopener noreferrer"&gt;https://www.jsonrpc.org/specification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go &lt;code&gt;net/http&lt;/code&gt; package documentation: &lt;a href="https://pkg.go.dev/net/http" target="_blank" rel="noopener noreferrer"&gt;https://pkg.go.dev/net/http&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cloudflare: “The complete guide to Go net/http timeouts”: &lt;a href="https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/" target="_blank" rel="noopener noreferrer"&gt;https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go &lt;code&gt;context&lt;/code&gt; package documentation: &lt;a href="https://pkg.go.dev/context" target="_blank" rel="noopener noreferrer"&gt;https://pkg.go.dev/context&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go &lt;code&gt;x/time/rate&lt;/code&gt; documentation: &lt;a href="https://pkg.go.dev/golang.org/x/time/rate" target="_blank" rel="noopener noreferrer"&gt;https://pkg.go.dev/golang.org/x/time/rate&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;OWASP SSRF Prevention Cheat Sheet / SSRF category references:
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html" target="_blank" rel="noopener noreferrer"&gt;https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://owasp.org/Top10/2021/A10_2021-Server-Side_Request_Forgery_%28SSRF%29/" target="_blank" rel="noopener noreferrer"&gt;https://owasp.org/Top10/2021/A10_2021-Server-Side_Request_Forgery_%28SSRF%29/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;OWASP Logging Cheat Sheet (security-focused logging guidance): &lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html" target="_blank" rel="noopener noreferrer"&gt;https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Secrets management guidance:
&lt;ul&gt;
&lt;li&gt;OWASP Secrets Management Cheat Sheet: &lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html" target="_blank" rel="noopener noreferrer"&gt;https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Kubernetes “Good practices for Kubernetes Secrets”: &lt;a href="https://kubernetes.io/docs/concepts/security/secrets-good-practices/" target="_blank" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/security/secrets-good-practices/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;OpenTelemetry Go instrumentation docs: &lt;a href="https://opentelemetry.io/docs/languages/go/instrumentation/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/languages/go/instrumentation/&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item><item><title>Agent Observability That Doesn't Lie</title><link>https://roygabriel.dev/blog/agent-observability-that-doesnt-lie/</link><pubDate>Sat, 20 Dec 2025 12:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/agent-observability-that-doesnt-lie/</guid><description>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most &amp;ldquo;agent observability&amp;rdquo; is either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;too shallow&lt;/strong&gt; (a chat transcript and a couple of logs), or&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;too noisy&lt;/strong&gt; (every token logged, every tool payload stored, no signal)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither works in production.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re serious about operating agents, you need observability that answers three questions quickly:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What happened?&lt;/strong&gt; (forensics)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why did it happen?&lt;/strong&gt; (debuggability)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How often does it happen?&lt;/strong&gt; (reliability)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;OpenTelemetry exists to standardize how you instrument, generate, and export telemetry across traces, metrics, and logs. [1] W3C Trace Context defines how trace context propagates across service boundaries. [2]&lt;/p&gt;</description><content:encoded>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most &amp;ldquo;agent observability&amp;rdquo; is either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;too shallow&lt;/strong&gt; (a chat transcript and a couple of logs), or&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;too noisy&lt;/strong&gt; (every token logged, every tool payload stored, no signal)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither works in production.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re serious about operating agents, you need observability that answers three questions quickly:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What happened?&lt;/strong&gt; (forensics)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why did it happen?&lt;/strong&gt; (debuggability)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How often does it happen?&lt;/strong&gt; (reliability)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;OpenTelemetry exists to standardize how you instrument, generate, and export telemetry across traces, metrics, and logs. [1] W3C Trace Context defines how trace context propagates across service boundaries. [2]&lt;/p&gt;
&lt;p&gt;Agents add two new requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool calls are part of your &amp;ldquo;distributed trace&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;decisioning&amp;rdquo; is a first-class component (not just business logic)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This article is a practical blueprint.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Instrument agents like distributed systems:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;traces&lt;/strong&gt; for causality (what triggered what)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;metrics&lt;/strong&gt; for health (p95 latency, error rates)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;logs&lt;/strong&gt; for human context (but redacted)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Propagate a single trace across:
&lt;ul&gt;
&lt;li&gt;agent runtime -&amp;gt; MCP gateway -&amp;gt; MCP tool servers -&amp;gt; upstream APIs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Capture &lt;strong&gt;decision summaries&lt;/strong&gt;, not chain-of-thought.&lt;/li&gt;
&lt;li&gt;Treat cost as a production signal: emit per-run and per-tool cost metrics.&lt;/li&gt;
&lt;li&gt;Use semantic conventions where possible to keep telemetry queryable. [3]&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t turn observability into a data breach: OWASP highlights sensitive info disclosure and prompt injection as key risks. [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-to-observe-in-an-agent-system"&gt;What to observe in an agent system&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-trace-model-for-agents"&gt;A trace model for agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#metrics-that-matter"&gt;Metrics that matter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#logs-and-redaction"&gt;Logs and redaction&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#audit-events-vs-debug-logs"&gt;Audit events vs debug logs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#dashboards-and-alerts"&gt;Dashboards and alerts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="what-to-observe-in-an-agent-system"&gt;What to observe in an agent system&lt;/h2&gt;
&lt;p&gt;Agents have four observable subsystems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Planner/Reasoner&lt;/strong&gt; (creates the plan, chooses tools)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool execution&lt;/strong&gt; (calls MCP tools and interprets results)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory/state&lt;/strong&gt; (what was stored or retrieved)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Policy/budget&lt;/strong&gt; (what was allowed or blocked)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you only observe #2, you&amp;rsquo;ll miss why the agent chose the wrong tool.
If you only observe #1, you&amp;rsquo;ll miss production failures.&lt;/p&gt;
&lt;p&gt;You need the full chain.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-trace-model-for-agents"&gt;A trace model for agents&lt;/h2&gt;
&lt;h3 id="the-core-idea"&gt;The core idea&lt;/h3&gt;
&lt;p&gt;A single &amp;ldquo;agent run&amp;rdquo; is a distributed trace:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it spans model calls&lt;/li&gt;
&lt;li&gt;tool calls&lt;/li&gt;
&lt;li&gt;downstream system calls&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use W3C Trace Context (&lt;code&gt;traceparent&lt;/code&gt;, &lt;code&gt;tracestate&lt;/code&gt;) to propagate the trace across boundaries. [2]&lt;/p&gt;
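&lt;p&gt;To make the header concrete, here is a hedged sketch that parses a version-00 &lt;code&gt;traceparent&lt;/code&gt; value and builds the value to send downstream. The type and function names are hypothetical, and the parser handles only version &lt;code&gt;00&lt;/code&gt; and ignores &lt;code&gt;tracestate&lt;/code&gt;. In a real service you would normally rely on the OpenTelemetry propagators instead of hand-rolling this.&lt;/p&gt;

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the fields of a W3C traceparent header (version 00).
type TraceParent struct {
	TraceID  string // 32 lowercase hex characters
	ParentID string // 16 lowercase hex characters
	Sampled  bool   // lowest bit of the trace-flags byte
}

var errBadTraceParent = errors.New("invalid traceparent")

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n || s != strings.ToLower(s) {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

// ParseTraceParent validates and splits a version-00 traceparent value.
func ParseTraceParent(h string) (TraceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errBadTraceParent
	}
	traceID, parentID, flags := parts[1], parts[2], parts[3]
	if !isLowerHex(traceID, 32) || !isLowerHex(parentID, 16) || !isLowerHex(flags, 2) {
		return TraceParent{}, errBadTraceParent
	}
	// All-zero IDs are invalid per the Trace Context spec.
	if traceID == strings.Repeat("0", 32) || parentID == strings.Repeat("0", 16) {
		return TraceParent{}, errBadTraceParent
	}
	b, _ := hex.DecodeString(flags) // already validated above
	return TraceParent{TraceID: traceID, ParentID: parentID, Sampled: b[0]%2 == 1}, nil
}

// Child returns the header value to send downstream: same trace ID,
// with the caller's own span ID as the new parent.
func (tp TraceParent) Child(spanID string) string {
	flags := "00"
	if tp.Sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s", tp.TraceID, spanID, flags)
}

func main() {
	tp, err := ParseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	fmt.Println(tp.Child("b7ad6b7169203331"))
}
```

&lt;p&gt;The trace ID stays constant from the agent runtime through the gateway and tool servers. Only the parent ID changes at each hop.&lt;/p&gt;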
&lt;h3 id="suggested-spans-minimum-viable"&gt;Suggested spans (minimum viable)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Root span&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent.run&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;agent.name&lt;/code&gt;, &lt;code&gt;tenant&lt;/code&gt;, &lt;code&gt;user&lt;/code&gt;, &lt;code&gt;session&lt;/code&gt;, &lt;code&gt;goal_hash&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Planner&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent.plan&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;planner.model&lt;/code&gt;, &lt;code&gt;plan.step_count&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model calls&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm.call&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;prompt_tokens&lt;/code&gt;, &lt;code&gt;completion_tokens&lt;/code&gt;, &lt;code&gt;latency_ms&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Tool selection&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent.tool_select&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;selector.version&lt;/code&gt;, &lt;code&gt;candidate_count&lt;/code&gt;, &lt;code&gt;selected_count&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Tool call&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tool.call&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;tool.name&lt;/code&gt;, &lt;code&gt;tool.class&lt;/code&gt; (read/write/danger), &lt;code&gt;tool.server&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Policy&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;policy.check&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;policy.rule_id&lt;/code&gt;, &lt;code&gt;decision&lt;/code&gt; (allow/deny), &lt;code&gt;reason_code&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;memory.read&lt;/code&gt; / &lt;code&gt;memory.write&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;store&lt;/code&gt;, &lt;code&gt;keys&lt;/code&gt;, &lt;code&gt;bytes&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-spans--logs"&gt;Why spans &amp;gt; logs&lt;/h3&gt;
&lt;p&gt;Spans give you causality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;which tool call caused a failure&lt;/li&gt;
&lt;li&gt;which step blew the budget&lt;/li&gt;
&lt;li&gt;which upstream dependency was slow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With OpenTelemetry, you can emit both traces and metrics through the same SDK. [1][4]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="metrics-that-matter"&gt;Metrics that matter&lt;/h2&gt;
&lt;h3 id="tool-health-metrics"&gt;Tool health metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tool_calls_total{tool,status}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_latency_ms_bucket{tool}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_timeouts_total{tool}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_retries_total{tool}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="agent-run-health-metrics"&gt;Agent run health metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent_runs_total{status}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agent_run_latency_ms_bucket{agent}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agent_steps_total_bucket{agent}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="cost-metrics-treat-cost-like-reliability"&gt;Cost metrics (treat cost like reliability)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm_tokens_total{model,type=prompt|completion}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;llm_cost_usd_total{model}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;run_cost_usd_bucket{agent}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="policy-metrics"&gt;Policy metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;policy_denied_total{rule_id}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;danger_tool_attempt_total{tool}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Semantic conventions help your metrics stay queryable and consistent across systems. OpenTelemetry documents semantic conventions for HTTP spans/metrics, for example. [3][5]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="logs-and-redaction"&gt;Logs and redaction&lt;/h2&gt;
&lt;p&gt;Logs should add human context, not become a data lake of secrets.&lt;/p&gt;
&lt;p&gt;Rules I like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Do not log prompts by default.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Do not log tool payloads by default.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Log summaries and hashes:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;goal_hash&lt;/code&gt;, &lt;code&gt;plan_hash&lt;/code&gt;, &lt;code&gt;tool_args_hash&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Log &lt;strong&gt;structured error reasons&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;validation_error&lt;/code&gt;, &lt;code&gt;upstream_rate_limited&lt;/code&gt;, &lt;code&gt;auth_failed&lt;/code&gt;, &lt;code&gt;policy_denied&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For agent systems, OWASP highlights sensitive information disclosure and insecure output handling. Logging is one of the easiest ways to accidentally create both. [7]&lt;/p&gt;
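&lt;p&gt;A possible way to produce values like &lt;code&gt;goal_hash&lt;/code&gt; or &lt;code&gt;tool_args_hash&lt;/code&gt;, sketched with the Go standard library. Using a keyed HMAC rather than a plain digest is an assumption on my part: it makes short, guessable inputs harder to recover from the hash alone. The function name, key, and truncation length are illustrative.&lt;/p&gt;

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// summaryHash returns a short keyed hash of sensitive text, so logs can
// correlate identical goals or arguments without storing the raw content.
// The 16-character truncation is an arbitrary choice for readability.
func summaryHash(key []byte, text string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(text))
	return hex.EncodeToString(mac.Sum(nil))[:16]
}

func main() {
	// In a real deployment the key would come from a secret store, not source code.
	key := []byte("example-key-load-from-secret-store")
	goal := "Refund order 1234 for customer jane@example.com"
	fmt.Println("goal_hash =", summaryHash(key, goal))
}
```

&lt;p&gt;Two runs with the same goal produce the same hash, which is enough to group and count them in a log query without ever storing the prompt.&lt;/p&gt;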
&lt;h3 id="debug-mode-that-isnt-dangerous"&gt;&amp;ldquo;Debug mode&amp;rdquo; that isn&amp;rsquo;t dangerous&lt;/h3&gt;
&lt;p&gt;If you must support deeper logs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;only enable per tenant/user for a limited window&lt;/li&gt;
&lt;li&gt;auto-expire&lt;/li&gt;
&lt;li&gt;redact aggressively&lt;/li&gt;
&lt;li&gt;never store raw secrets&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="audit-events-vs-debug-logs"&gt;Audit events vs debug logs&lt;/h2&gt;
&lt;p&gt;Treat them as different products:&lt;/p&gt;
&lt;h3 id="audit-events-for-governance"&gt;Audit events (for governance)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;immutable-ish records of side effects&lt;/li&gt;
&lt;li&gt;minimal sensitive data&lt;/li&gt;
&lt;li&gt;always on&lt;/li&gt;
&lt;li&gt;long retention&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example audit fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;who: tenant/user/client&lt;/li&gt;
&lt;li&gt;what: tool + action class (create/update/delete)&lt;/li&gt;
&lt;li&gt;when: timestamp&lt;/li&gt;
&lt;li&gt;where: environment&lt;/li&gt;
&lt;li&gt;result: success/failure&lt;/li&gt;
&lt;li&gt;resource IDs (safe identifiers)&lt;/li&gt;
&lt;li&gt;idempotency keys / plan IDs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="debug-logs-for-engineers"&gt;Debug logs (for engineers)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;short retention&lt;/li&gt;
&lt;li&gt;more context&lt;/li&gt;
&lt;li&gt;highly controlled access&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mixing the two is how you end up with &amp;ldquo;SharePoint logs full of PII&amp;rdquo; that no one wants to touch.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="dashboards-and-alerts"&gt;Dashboards and alerts&lt;/h2&gt;
&lt;h3 id="dashboards-start-simple"&gt;Dashboards (start simple)&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Tool reliability&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;top tools by error rate&lt;/li&gt;
&lt;li&gt;top tools by p95 latency&lt;/li&gt;
&lt;li&gt;timeouts per tool&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent success&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;success rate by agent type&lt;/li&gt;
&lt;li&gt;&amp;ldquo;stuck runs&amp;rdquo; (runs exceeding max duration)&lt;/li&gt;
&lt;li&gt;average steps per run&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;cost per run&lt;/li&gt;
&lt;li&gt;cost per tenant&lt;/li&gt;
&lt;li&gt;top drivers (which tools/model calls)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="alerts-avoid-noise"&gt;Alerts (avoid noise)&lt;/h3&gt;
&lt;p&gt;Alert on what is actionable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool error rate spikes for critical tools&lt;/li&gt;
&lt;li&gt;tool latency p95 spikes beyond SLO&lt;/li&gt;
&lt;li&gt;budget exceeded spike (runaway behavior)&lt;/li&gt;
&lt;li&gt;policy denied spike (possible prompt injection attempt)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you use SLOs and error budgets, Google&amp;rsquo;s SRE material is a practical reference for turning SLOs into alerting strategies. [6]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;h3 id="tracing"&gt;Tracing&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Every agent run has a trace ID.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Trace context propagates across MCP boundaries (W3C Trace Context). [2]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool calls are spans with stable tool identifiers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="metrics"&gt;Metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool success/error/latency metrics exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Agent run success/latency/steps metrics exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Cost metrics exist and are monitored.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="logging"&gt;Logging&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Default logs are redacted summaries, not raw payloads.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Debug logging is time-bounded and access-controlled.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="audit"&gt;Audit&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit events exist for all side-effecting tools.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit records include &amp;ldquo;who/what/when/result&amp;rdquo; without leaking secrets.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="security"&gt;Security&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Observability does not become a secret exfil path (OWASP risks considered). [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] OpenTelemetry - Documentation (overview): &lt;a href="https://opentelemetry.io/docs/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/&lt;/a&gt;&lt;br&gt;
[2] W3C - Trace Context: &lt;a href="https://www.w3.org/TR/trace-context/" target="_blank" rel="noopener noreferrer"&gt;https://www.w3.org/TR/trace-context/&lt;/a&gt;&lt;br&gt;
[3] OpenTelemetry - Semantic conventions for HTTP (spans/metrics/logs): &lt;a href="https://opentelemetry.io/docs/specs/semconv/http/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/specs/semconv/http/&lt;/a&gt;&lt;br&gt;
[4] OpenTelemetry Go - Instrumentation docs: &lt;a href="https://opentelemetry.io/docs/languages/go/instrumentation/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/languages/go/instrumentation/&lt;/a&gt;&lt;br&gt;
[5] OpenTelemetry - Semantic conventions for HTTP metrics: &lt;a href="https://opentelemetry.io/docs/specs/semconv/http/http-metrics/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/specs/semconv/http/http-metrics/&lt;/a&gt;&lt;br&gt;
[6] Google SRE Workbook - Alerting on SLOs: &lt;a href="https://sre.google/workbook/alerting-on-slos/" target="_blank" rel="noopener noreferrer"&gt;https://sre.google/workbook/alerting-on-slos/&lt;/a&gt;&lt;br&gt;
[7] OWASP - Top 10 for Large Language Model Applications: &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/" target="_blank" rel="noopener noreferrer"&gt;https://owasp.org/www-project-top-10-for-large-language-model-applications/&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Durable Agents with Temporal: Retries, Idempotency, and Long-Running State</title><link>https://roygabriel.dev/blog/durable-agents-with-temporal/</link><pubDate>Sat, 06 Dec 2025 12:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/durable-agents-with-temporal/</guid><description>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Agents are often framed as &amp;ldquo;reason + tools.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In production, the actual problem is &lt;strong&gt;execution&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;calls fail&lt;/li&gt;
&lt;li&gt;networks flake&lt;/li&gt;
&lt;li&gt;credentials expire&lt;/li&gt;
&lt;li&gt;humans need to approve steps&lt;/li&gt;
&lt;li&gt;tasks take hours/days&lt;/li&gt;
&lt;li&gt;systems restart&lt;/li&gt;
&lt;li&gt;you need a forensic trail of what happened&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your agent runtime is &amp;ldquo;one process with a loop,&amp;rdquo; you will eventually lose state and repeat a side effect.&lt;/p&gt;
&lt;p&gt;This is why workflow engines exist.&lt;/p&gt;</description><content:encoded>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Agents are often framed as &amp;ldquo;reason + tools.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In production, the actual problem is &lt;strong&gt;execution&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;calls fail&lt;/li&gt;
&lt;li&gt;networks flake&lt;/li&gt;
&lt;li&gt;credentials expire&lt;/li&gt;
&lt;li&gt;humans need to approve steps&lt;/li&gt;
&lt;li&gt;tasks take hours/days&lt;/li&gt;
&lt;li&gt;systems restart&lt;/li&gt;
&lt;li&gt;you need a forensic trail of what happened&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your agent runtime is &amp;ldquo;one process with a loop,&amp;rdquo; you will eventually lose state and repeat a side effect.&lt;/p&gt;
&lt;p&gt;This is why workflow engines exist.&lt;/p&gt;
&lt;p&gt;Temporal&amp;rsquo;s model - durable workflows with deterministic execution and event history - maps incredibly well to tool-using agents. Temporal explicitly requires workflow code to be deterministic and provides APIs for versioning long-running workflows. [1][2]&lt;/p&gt;
&lt;p&gt;This article is a production pattern: &lt;strong&gt;use Temporal to make agents durable.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Represent an agent run as a &lt;strong&gt;Temporal Workflow&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Make tool calls &lt;strong&gt;Activities&lt;/strong&gt; (retryable, timeout-bounded).&lt;/li&gt;
&lt;li&gt;Put side-effecting tools behind:
&lt;ul&gt;
&lt;li&gt;idempotency keys&lt;/li&gt;
&lt;li&gt;preview -&amp;gt; apply&lt;/li&gt;
&lt;li&gt;durable &amp;ldquo;exactly-once&amp;rdquo; semantics (from the workflow&amp;rsquo;s perspective)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Use Temporal&amp;rsquo;s retry policies for Activities and explicit failure handling. [3]&lt;/li&gt;
&lt;li&gt;Use event history and replay for forensics (Temporal events are first-class). [4]&lt;/li&gt;
&lt;li&gt;Use workflow versioning for safe evolution of long-running agents. [2]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-agents-need-durable-execution"&gt;Why agents need durable execution&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#mapping-an-agent-to-temporal"&gt;Mapping an agent to Temporal&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#determinism-and-why-it-matters"&gt;Determinism and why it matters&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#retries-timeouts-and-idempotency"&gt;Retries, timeouts, and idempotency&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#human-in-the-loop-as-a-first-class-step"&gt;Human-in-the-loop as a first-class step&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#replay-audit-and-debugging"&gt;Replay, audit, and debugging&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#versioning-evolving-agents-safely"&gt;Versioning: evolving agents safely&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="why-agents-need-durable-execution"&gt;Why agents need durable execution&lt;/h2&gt;
&lt;p&gt;A few failure modes you&amp;rsquo;ll recognize:&lt;/p&gt;
&lt;h3 id="partial-side-effects"&gt;Partial side effects&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;agent creates a ticket&lt;/li&gt;
&lt;li&gt;process dies before storing the ticket ID&lt;/li&gt;
&lt;li&gt;agent retries and creates a duplicate&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="long-running-waits"&gt;Long-running waits&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;wait for PR approvals&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;wait for a CI pipeline&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;wait for a meeting to complete&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your agent can&amp;rsquo;t wait durably, it becomes a polling daemon.&lt;/p&gt;
&lt;h3 id="human-approval"&gt;Human approval&lt;/h3&gt;
&lt;p&gt;Some steps should not be automated:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;apply to prod&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;send email&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;delete resources&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You need durable pause/resume with a clean audit trail.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="mapping-an-agent-to-temporal"&gt;Mapping an agent to Temporal&lt;/h2&gt;
&lt;h3 id="workflow--agent-run"&gt;Workflow = agent run&lt;/h3&gt;
&lt;p&gt;One agent run becomes a single Temporal Workflow Execution. Temporal workflows are designed for long-running, durable coordination. [5]&lt;/p&gt;
&lt;p&gt;Inside the workflow you model steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;interpret goal&lt;/li&gt;
&lt;li&gt;choose tools&lt;/li&gt;
&lt;li&gt;call tools&lt;/li&gt;
&lt;li&gt;react to results&lt;/li&gt;
&lt;li&gt;request approvals&lt;/li&gt;
&lt;li&gt;finalize output&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="activities--tool-calls-and-external-io"&gt;Activities = tool calls and external IO&lt;/h3&gt;
&lt;p&gt;All external calls should be Activities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCP tool calls&lt;/li&gt;
&lt;li&gt;HTTP calls&lt;/li&gt;
&lt;li&gt;DB writes&lt;/li&gt;
&lt;li&gt;notifications&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why? Activities are where retries and timeouts belong. Temporal defines retry policies as configuration for how and when to retry failures. [3]&lt;/p&gt;
&lt;h3 id="signals--external-events"&gt;Signals = external events&lt;/h3&gt;
&lt;p&gt;Use signals for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;human approvals&lt;/li&gt;
&lt;li&gt;&amp;ldquo;cancel&amp;rdquo;&lt;/li&gt;
&lt;li&gt;updated user intent&lt;/li&gt;
&lt;li&gt;out-of-band events (&amp;ldquo;incident resolved&amp;rdquo;)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="queries--introspection"&gt;Queries = introspection&lt;/h3&gt;
&lt;p&gt;Expose workflow state:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;current step&lt;/li&gt;
&lt;li&gt;last tool call&lt;/li&gt;
&lt;li&gt;pending approvals&lt;/li&gt;
&lt;li&gt;budget remaining&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="determinism-and-why-it-matters"&gt;Determinism and why it matters&lt;/h2&gt;
&lt;p&gt;Temporal requires workflow code to be deterministic. [1] Determinism is what allows Temporal to replay history and rebuild state after worker crashes.&lt;/p&gt;
&lt;p&gt;Practical consequence:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Don&amp;rsquo;t do IO in workflow code.&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t read the current time directly in workflow code (use Temporal APIs).&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t call random generators without deterministic control.&lt;/li&gt;
&lt;li&gt;Keep workflow logic as &amp;ldquo;orchestration,&amp;rdquo; not execution.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you violate determinism, replay can fail with a non-determinism error, because the commands the code issues no longer match the recorded event history. This is also why changes to workflow code need care. [1][2]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="retries-timeouts-and-idempotency"&gt;Retries, timeouts, and idempotency&lt;/h2&gt;
&lt;h3 id="retry-policies-activities"&gt;Retry policies (Activities)&lt;/h3&gt;
&lt;p&gt;Temporal retry policies control backoff and retry behavior for activity failures. [3]&lt;/p&gt;
&lt;p&gt;Use them intentionally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;retries for transient failures (rate limits, timeouts)&lt;/li&gt;
&lt;li&gt;limited retries for &amp;ldquo;probably broken&amp;rdquo; failures&lt;/li&gt;
&lt;li&gt;exponential backoff with jitter (avoid thundering herd)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="timeouts-are-not-optional"&gt;Timeouts are not optional&lt;/h3&gt;
&lt;p&gt;Set explicit timeouts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ScheduleToStart&lt;/li&gt;
&lt;li&gt;StartToClose&lt;/li&gt;
&lt;li&gt;ScheduleToClose&lt;/li&gt;
&lt;/ul&gt;
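&lt;p&gt;Temporal&amp;rsquo;s default retry policy does not cap the number of attempts, so without a bounding timeout (or an explicit maximum attempt count) a failing Activity can keep retrying &amp;ldquo;forever&amp;rdquo; in practice. [3]&lt;/p&gt;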
&lt;h3 id="idempotency-keys-for-side-effects"&gt;Idempotency keys for side effects&lt;/h3&gt;
&lt;p&gt;Your workflow can be retried/replayed. Your Activity can be retried. Upstream systems can time out after performing the operation.&lt;/p&gt;
&lt;p&gt;For side-effecting tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;generate an idempotency key in the workflow&lt;/li&gt;
&lt;li&gt;pass it into the tool Activity&lt;/li&gt;
&lt;li&gt;store &amp;ldquo;operation result&amp;rdquo; in workflow state&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When the Activity retries, it reuses the key so the upstream system deduplicates.&lt;/p&gt;
&lt;p&gt;This is the difference between &amp;ldquo;retries&amp;rdquo; and &amp;ldquo;duplicates.&amp;rdquo;&lt;/p&gt;
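&lt;p&gt;One way to get a key that survives retries and replays is to derive it from values that never change for a given step, rather than generating something random. The sketch below assumes a workflow ID, a tool name, and a step counter are available; the &lt;code&gt;IdempotencyKey&lt;/code&gt; function and its inputs are illustrative names, not part of any SDK.&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// IdempotencyKey derives a stable key from identifiers that do not change
// across retries or replays: the workflow ID, the tool name, and the
// position of the call within the run. All three inputs are assumptions
// of this sketch.
func IdempotencyKey(workflowID, tool string, step int) string {
	// %q quotes each string so that ("a", "b|c") and ("a|b", "c")
	// cannot collapse into the same input.
	input := fmt.Sprintf("%q|%q|%d", workflowID, tool, step)
	sum := sha256.Sum256([]byte(input))
	return hex.EncodeToString(sum[:])
}
```

&lt;p&gt;Because the key is a pure function of its inputs, every retry of the same step sends the same key, while the next step in the same run gets a different one. The upstream system still has to honor the key for deduplication to work.&lt;/p&gt;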
&lt;hr&gt;
&lt;h2 id="human-in-the-loop-as-a-first-class-step"&gt;Human-in-the-loop as a first-class step&lt;/h2&gt;
&lt;p&gt;For dangerous operations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pause&lt;/li&gt;
&lt;li&gt;ask for approval with the plan summary&lt;/li&gt;
&lt;li&gt;resume when approved&lt;/li&gt;
&lt;/ul&gt;
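&lt;p&gt;A Temporal workflow that is waiting for a signal does not need to hold a thread or a running process for the duration of the wait. Its state is recoverable from event history, so the wait survives worker restarts and can last hours or days.&lt;/p&gt;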
&lt;p&gt;This is one of the cleanest ways to build a &amp;ldquo;preview -&amp;gt; approve -&amp;gt; apply&amp;rdquo; flow without building a bunch of custom state machinery.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="replay-audit-and-debugging"&gt;Replay, audit, and debugging&lt;/h2&gt;
&lt;p&gt;Temporal events are recorded as part of the workflow&amp;rsquo;s event history. [4]&lt;/p&gt;
&lt;p&gt;This yields production superpowers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reconstruct exactly what happened&lt;/li&gt;
&lt;li&gt;understand why a step was taken&lt;/li&gt;
&lt;li&gt;replay a run to test a bug fix&lt;/li&gt;
&lt;li&gt;implement &amp;ldquo;reset&amp;rdquo; patterns (carefully)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For agents, this is the difference between &amp;ldquo;the model did something weird&amp;rdquo; and &amp;ldquo;step 7 called tool X with args Y after tool Z returned response R.&amp;rdquo;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="versioning-evolving-agents-safely"&gt;Versioning: evolving agents safely&lt;/h2&gt;
&lt;p&gt;Agent logic will change. Prompts will change. Tool contracts will change.&lt;/p&gt;
&lt;p&gt;If you have long-running agents, you need a strategy that doesn&amp;rsquo;t break in-flight executions.&lt;/p&gt;
&lt;p&gt;Temporal provides workflow versioning mechanisms because determinism means you can&amp;rsquo;t simply change workflow logic without thought. [2]&lt;/p&gt;
&lt;p&gt;Production approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;keep existing executions on old code paths&lt;/li&gt;
&lt;li&gt;route new executions to new paths&lt;/li&gt;
&lt;li&gt;migrate intentionally&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This prevents &amp;ldquo;deploy broke every running workflow.&amp;rdquo;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;h3 id="architecture"&gt;Architecture&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Agent runs modeled as workflows; tool calls as activities.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; External events modeled as signals; state exposed via queries.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="determinism"&gt;Determinism&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; No IO in workflow code (only orchestration).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Workflow changes use versioning strategy. [2]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="reliability"&gt;Reliability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Retry policies defined for Activities. [3]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Timeouts defined and bounded.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Idempotency keys used for side-effecting actions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="governance"&gt;Governance&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Human approval gates exist for dangerous operations.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit trails include plan summaries and results.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="operability"&gt;Operability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Event history used for debugging and incident analysis. [4]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] Temporal - Workflow Definition (determinism requirement): &lt;a href="https://docs.temporal.io/workflow-definition" target="_blank" rel="noopener noreferrer"&gt;https://docs.temporal.io/workflow-definition&lt;/a&gt;
[2] Temporal Go SDK - Versioning (evolving deterministic workflows safely): &lt;a href="https://docs.temporal.io/develop/go/versioning" target="_blank" rel="noopener noreferrer"&gt;https://docs.temporal.io/develop/go/versioning&lt;/a&gt;
[3] Temporal - Retry Policies (how and when retries happen): &lt;a href="https://docs.temporal.io/encyclopedia/retry-policies" target="_blank" rel="noopener noreferrer"&gt;https://docs.temporal.io/encyclopedia/retry-policies&lt;/a&gt;
[4] Temporal - Events reference (event history): &lt;a href="https://docs.temporal.io/references/events" target="_blank" rel="noopener noreferrer"&gt;https://docs.temporal.io/references/events&lt;/a&gt;
[5] Temporal - Workflows overview: &lt;a href="https://docs.temporal.io/workflows" target="_blank" rel="noopener noreferrer"&gt;https://docs.temporal.io/workflows&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>From Stdio to Enterprise: The MCP Gateway Pattern</title><link>https://roygabriel.dev/blog/mcp-gateway-pattern/</link><pubDate>Sat, 22 Nov 2025 12:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/mcp-gateway-pattern/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; MCP evolves quickly. This article references the MCP spec revision &lt;strong&gt;2025-11-25&lt;/strong&gt;. Validate details against the current spec before shipping changes. [1][2][3]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Local MCP servers over &lt;strong&gt;stdio&lt;/strong&gt; are an amazing developer experience: you install a tool server, the host (Claude Desktop / Claude Code / an agent runtime) launches it, and you&amp;rsquo;re productive in minutes. [2]&lt;/p&gt;
&lt;p&gt;But as soon as MCP becomes &lt;em&gt;shared infrastructure&lt;/em&gt; - multiple clients, multiple users, multiple environments - the &amp;ldquo;local tool server&amp;rdquo; model runs into the same constraints every integration layer hits:&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; MCP evolves quickly. This article references the MCP spec revision &lt;strong&gt;2025-11-25&lt;/strong&gt;. Validate details against the current spec before shipping changes. [1][2][3]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Local MCP servers over &lt;strong&gt;stdio&lt;/strong&gt; are an amazing developer experience: you install a tool server, the host (Claude Desktop / Claude Code / an agent runtime) launches it, and you&amp;rsquo;re productive in minutes. [2]&lt;/p&gt;
&lt;p&gt;But as soon as MCP becomes &lt;em&gt;shared infrastructure&lt;/em&gt; - multiple clients, multiple users, multiple environments - the &amp;ldquo;local tool server&amp;rdquo; model runs into the same constraints every integration layer hits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Who is allowed to call what tool?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How do you prevent one noisy user from melting shared dependencies?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How do you audit tool side effects?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How do you roll out tool changes without breaking clients?&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How do you keep secrets out of prompts, logs, and screenshots?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where the &lt;strong&gt;MCP Gateway Pattern&lt;/strong&gt; shows up.&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;A gateway is not &amp;ldquo;another service.&amp;rdquo; It&amp;rsquo;s a &lt;strong&gt;capability boundary&lt;/strong&gt;: the place where you enforce policy, budgets, and observability for tool use at scale.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stdio is great for local, single-user, low-blast-radius setups.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HTTP transports&lt;/strong&gt; (Streamable HTTP) enable multi-client servers - but they also require real auth and multi-tenant safety. [2][3]&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;MCP gateway&lt;/strong&gt; sits between clients and tool servers to provide:
&lt;ul&gt;
&lt;li&gt;authentication &amp;amp; authorization&lt;/li&gt;
&lt;li&gt;tenant isolation&lt;/li&gt;
&lt;li&gt;rate limits / concurrency / cost budgets&lt;/li&gt;
&lt;li&gt;consistent tool schemas + safety gates&lt;/li&gt;
&lt;li&gt;audit logs and observability&lt;/li&gt;
&lt;li&gt;routing, versioning, rollout controls&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Build the gateway to be boring: small surface area, strict validation, explicit policies, great telemetry.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#when-stdio-stops-being-enough"&gt;When stdio stops being enough&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-mcp-gateway-pattern"&gt;The MCP Gateway Pattern&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#responsibilities-of-a-gateway"&gt;Responsibilities of a gateway&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#reference-architecture"&gt;Reference architecture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#policy-patterns-that-actually-work"&gt;Policy patterns that actually work&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#scaling-and-isolation-strategies"&gt;Scaling and isolation strategies&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#observability-and-audit"&gt;Observability and audit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#rollouts-and-versioning"&gt;Rollouts and versioning&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="when-stdio-stops-being-enough"&gt;When stdio stops being enough&lt;/h2&gt;
&lt;p&gt;MCP supports multiple transports; stdio is common for local servers. [2] In that model, the host controls process lifetime and secrets typically come from the environment on the local machine.&lt;/p&gt;
&lt;p&gt;Stdio starts to strain when you need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;multi-client concurrency&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;shared tenancy&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;central policy enforcement&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;centralized audit&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;fleet-level rollout controls&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At that point, you&amp;rsquo;re effectively building a platform. The platform needs a stable ingress point with consistent security and operational behavior.&lt;/p&gt;
&lt;p&gt;MCP&amp;rsquo;s &lt;strong&gt;HTTP-based transports&lt;/strong&gt; (like Streamable HTTP) are designed for servers that can handle multiple connections and enable streaming/notifications. [2] MCP also defines an authorization flow for HTTP-based transports. [3]&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s the entry point for a gateway.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="the-mcp-gateway-pattern"&gt;The MCP Gateway Pattern&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt; An MCP gateway is an MCP server (or MCP-adjacent ingress layer) that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;authenticates and authorizes the client&lt;/li&gt;
&lt;li&gt;routes requests to one or more downstream MCP servers (or tool backends)&lt;/li&gt;
&lt;li&gt;enforces budgets and safety gates&lt;/li&gt;
&lt;li&gt;emits consistent telemetry and audit records&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It looks like an API gateway, but the payload is &amp;ldquo;tool capability&amp;rdquo; not &amp;ldquo;REST endpoints.&amp;rdquo;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="responsibilities-of-a-gateway"&gt;Responsibilities of a gateway&lt;/h2&gt;
&lt;h3 id="1-authentication-and-authorization"&gt;1) Authentication and authorization&lt;/h3&gt;
&lt;p&gt;If you expose MCP servers over HTTP, you need strong auth. MCP includes an authorization framework at the transport layer for HTTP-based transports. [3]&lt;/p&gt;
&lt;p&gt;Practical gateway rules:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Authenticate every client&lt;/strong&gt; (bearer tokens, mTLS, OAuth-derived access tokens).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Authorize per tool&lt;/strong&gt;, not per server.&lt;/li&gt;
&lt;li&gt;Prefer &lt;strong&gt;least privilege&lt;/strong&gt; scopes:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;calendar.read&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;calendar.write&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;email.read&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;email.send&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;k8s.readonly&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;k8s.apply&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;For high-impact tools: require explicit confirmation tokens and/or multi-party approval.&lt;/li&gt;
&lt;/ul&gt;
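&lt;p&gt;Per-tool authorization can be as small as a lookup from tool name to required scope. The following Go sketch assumes the gateway has already authenticated the caller and extracted its granted scopes; the tool names in &lt;code&gt;requiredScope&lt;/code&gt; are hypothetical, and the scopes are the ones listed above.&lt;/p&gt;

```go
package main

import "slices"

// requiredScope maps a tool name to the scope it needs. The tool names
// are hypothetical examples for this sketch.
var requiredScope = map[string]string{
	"calendar_list_events":  "calendar.read",
	"calendar_create_event": "calendar.write",
	"email_send":            "email.send",
	"k8s_get":               "k8s.readonly",
	"k8s_apply":             "k8s.apply",
}

// Authorize reports whether a caller holding the granted scopes may call
// the tool. Tools that are not registered are denied by default.
func Authorize(granted []string, tool string) bool {
	scope, known := requiredScope[tool]
	if !known {
		return false
	}
	return slices.Contains(granted, scope)
}
```

&lt;p&gt;Denying unknown tools by default matters more than the lookup itself: a tool that was added to a server but never registered with the gateway stays unreachable until someone assigns it a scope.&lt;/p&gt;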
&lt;h3 id="2-tool-contract-enforcement"&gt;2) Tool contract enforcement&lt;/h3&gt;
&lt;p&gt;MCP tools are invoked by an LLM-driven client. That means tool arguments are &lt;strong&gt;untrusted&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The gateway is the ideal place to enforce:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;schema validation&lt;/li&gt;
&lt;li&gt;payload size caps&lt;/li&gt;
&lt;li&gt;allowlists and blocklists&lt;/li&gt;
&lt;li&gt;&amp;ldquo;danger gates&amp;rdquo; (preview/apply, confirmations)&lt;/li&gt;
&lt;li&gt;&amp;ldquo;semantic validation&amp;rdquo; (not just types - e.g., limits required, date ranges bounded)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;MCP&amp;rsquo;s spec is grounded in structured schemas; treat those schemas as contracts. [1]&lt;/p&gt;
&lt;h3 id="3-budgets-and-backpressure"&gt;3) Budgets and backpressure&lt;/h3&gt;
&lt;p&gt;Agents can trigger bursty tool calls. Without backpressure you get the classic cascade:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;upstream rate limits&lt;/li&gt;
&lt;li&gt;DB pool exhaustion&lt;/li&gt;
&lt;li&gt;thread/goroutine explosion&lt;/li&gt;
&lt;li&gt;timeouts everywhere&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At the gateway you can enforce:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;per-tenant rate limits&lt;/li&gt;
&lt;li&gt;per-tool concurrency limits&lt;/li&gt;
&lt;li&gt;timeouts and deadline propagation&lt;/li&gt;
&lt;li&gt;queue depth caps (bounded memory)&lt;/li&gt;
&lt;li&gt;circuit breakers for flaky dependencies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where you keep &amp;ldquo;one user spamming tools&amp;rdquo; from becoming &amp;ldquo;everyone is down.&amp;rdquo;&lt;/p&gt;
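&lt;p&gt;As an illustration of a per-tool concurrency limit, here is a small Go sketch that rejects calls once a tool is at capacity instead of queueing them. &lt;code&gt;ToolLimiter&lt;/code&gt; and its methods are hypothetical names; a production gateway would typically combine this with per-tenant rate limits and deadlines.&lt;/p&gt;

```go
package main

import "sync"

// ToolLimiter caps how many calls to each tool may be in flight at once.
// The limit is assumed to be at least 1.
type ToolLimiter struct {
	mu       sync.Mutex
	limit    int
	inFlight map[string]int
}

// NewToolLimiter returns a limiter that applies the same cap to every tool.
func NewToolLimiter(limit int) *ToolLimiter {
	l := new(ToolLimiter)
	l.limit = limit
	l.inFlight = make(map[string]int)
	return l
}

// TryAcquire reserves a slot for the tool, or returns false when the tool
// is already at its limit. Rejecting instead of queueing keeps memory
// bounded and pushes backpressure to the caller.
func (l *ToolLimiter) TryAcquire(tool string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inFlight[tool] == l.limit {
		return false
	}
	l.inFlight[tool]++
	return true
}

// Release frees a slot previously reserved with TryAcquire.
func (l *ToolLimiter) Release(tool string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inFlight[tool] != 0 {
		l.inFlight[tool]--
	}
}
```

&lt;p&gt;A handler would call &lt;code&gt;TryAcquire&lt;/code&gt; before forwarding a tool call, return a retryable error when it fails, and defer &lt;code&gt;Release&lt;/code&gt; when it succeeds. Each tool has its own counter, so a burst against one tool does not block the others.&lt;/p&gt;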
&lt;h3 id="4-secret-handling-and-redaction"&gt;4) Secret handling and redaction&lt;/h3&gt;
&lt;p&gt;Gateways are a natural place to centralize:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;secret injection (short-lived tokens per tenant)&lt;/li&gt;
&lt;li&gt;output redaction (strip tokens, emails, PII fields)&lt;/li&gt;
&lt;li&gt;logging policies (never log raw tool payloads by default)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For agent systems, OWASP highlights risks like prompt injection and sensitive info disclosure as major categories. [7]&lt;/p&gt;
&lt;p&gt;Your gateway should assume that anything returned by a tool could be coerced into exfiltration if you&amp;rsquo;re careless.&lt;/p&gt;
&lt;h3 id="5-observability-and-audit"&gt;5) Observability and audit&lt;/h3&gt;
&lt;p&gt;Operationally, the gateway is your best place to emit consistent:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;request logs&lt;/li&gt;
&lt;li&gt;tool call metrics&lt;/li&gt;
&lt;li&gt;traces across tool chains&lt;/li&gt;
&lt;li&gt;audit events for side effects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;OpenTelemetry is the de facto standard for collecting and exporting telemetry. [5] W3C Trace Context defines headers like &lt;code&gt;traceparent&lt;/code&gt;/&lt;code&gt;tracestate&lt;/code&gt; for trace propagation across services. [6]&lt;/p&gt;
&lt;p&gt;If you want an enterprise to trust agents, you need the forensic trail.&lt;/p&gt;
&lt;h3 id="6-routing-and-discovery-at-scale"&gt;6) Routing and discovery at scale&lt;/h3&gt;
&lt;p&gt;The gateway becomes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the routing table (&amp;ldquo;tool X lives in cluster Y&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;the discovery system (&amp;ldquo;list tools available for tenant Z&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;the version broker (&amp;ldquo;tool schema v3 for client A, v4 for client B&amp;rdquo;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is also where you can implement &amp;ldquo;tool quality&amp;rdquo; policies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;quarantine tools with high error rates&lt;/li&gt;
&lt;li&gt;fallback to read-only alternatives&lt;/li&gt;
&lt;li&gt;degrade gracefully under partial outages&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="reference-architecture"&gt;Reference architecture&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s a simple, effective gateway architecture:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;--------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Agent host / IDE / runtime -
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- (MCP client) -
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;--------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; - Streamable HTTP / JSON-RPC [2][4]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; v
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- MCP Gateway -
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- - AuthN/Z [3] -
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- - Schema + safety gates -
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- - Budgets (rate, concurrency, cost) -
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- - Audit + telemetry (OTel) [5][6] -
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- - Routing + tool registry -
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; v v
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;----------------- ------------------
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- MCP Server A - - MCP Server B -
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- (calendar) - - (k8s, github...)-
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;------------------ ------------------
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; v v
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; Upstream APIs Upstream APIs
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Key design decision: &lt;strong&gt;the gateway should not contain business logic&lt;/strong&gt;. It enforces policy and routes tool calls. Tool semantics live in tool servers.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="policy-patterns-that-actually-work"&gt;Policy patterns that actually work&lt;/h2&gt;
&lt;h3 id="pattern-read-vs-write-tool-classes"&gt;Pattern: Read vs write tool classes&lt;/h3&gt;
&lt;p&gt;Classify tools into tiers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Read-only:&lt;/strong&gt; listing, searching, fetching&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Write-safe:&lt;/strong&gt; creates/updates that are naturally reversible&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dangerous:&lt;/strong&gt; deletes, bulk updates, destructive actions, privileged ops&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then enforce different rules per tier:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read-only: wide availability, higher concurrency&lt;/li&gt;
&lt;li&gt;Write-safe: lower concurrency, stronger audit, idempotency keys&lt;/li&gt;
&lt;li&gt;Dangerous: preview/apply, explicit confirmations, restricted scopes&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="pattern-preview---apply"&gt;Pattern: Preview -&amp;gt; Apply&lt;/h3&gt;
&lt;p&gt;For any tool that can cause harm:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;plan_*&lt;/code&gt; returns a plan + summary + &lt;code&gt;plan_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;apply_*&lt;/code&gt; requires &lt;code&gt;plan_id&lt;/code&gt; (and optionally a user confirmation token)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is the &amp;ldquo;terraform plan/apply&amp;rdquo; mental model applied to tools.&lt;/p&gt;
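&lt;p&gt;The two-step contract can be sketched as a small in-memory store in Go. &lt;code&gt;PlanStore&lt;/code&gt;, its methods, and the sequential &lt;code&gt;plan-N&lt;/code&gt; identifiers are assumptions made for illustration; a real gateway would likely use unguessable IDs, expiry, and durable storage.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrUnknownPlan is returned when apply is called without a valid preview.
var ErrUnknownPlan = errors.New("unknown or already applied plan_id")

// PlanStore remembers previewed plans so that apply can only run
// something that was previewed first.
type PlanStore struct {
	mu    sync.Mutex
	next  int
	plans map[string]string // plan_id to human-readable summary
}

// NewPlanStore returns an empty store.
func NewPlanStore() *PlanStore {
	s := new(PlanStore)
	s.plans = make(map[string]string)
	return s
}

// Plan records a summary of the intended change and returns its plan_id.
func (s *PlanStore) Plan(summary string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.next++
	id := fmt.Sprintf("plan-%d", s.next)
	s.plans[id] = summary
	return id
}

// Apply consumes the plan, so a plan_id can be applied at most once.
func (s *PlanStore) Apply(planID string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	summary, ok := s.plans[planID]
	if !ok {
		return "", ErrUnknownPlan
	}
	delete(s.plans, planID)
	return summary, nil
}
```

&lt;p&gt;Consuming the plan on apply is the important design choice: a model that replays an old &lt;code&gt;plan_id&lt;/code&gt; gets an error instead of a second destructive action.&lt;/p&gt;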
&lt;h3 id="pattern-allowlisted-egress-ssrf-containment"&gt;Pattern: Allowlisted egress (SSRF containment)&lt;/h3&gt;
&lt;p&gt;If tools can fetch URLs or call arbitrary endpoints, treat it as SSRF risk. OWASP&amp;rsquo;s SSRF prevention guidance is a useful baseline. [8]&lt;/p&gt;
&lt;p&gt;At the gateway, enforce:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;allowlisted domains&lt;/li&gt;
&lt;li&gt;IP/CIDR blocks for internal metadata ranges&lt;/li&gt;
&lt;li&gt;redirect re-validation&lt;/li&gt;
&lt;/ul&gt;
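&lt;p&gt;A minimal Go sketch of these checks follows. The allowlist entries and the &lt;code&gt;CheckEgress&lt;/code&gt; name are hypothetical. It only inspects the URL itself, so it is an assumption of this sketch that resolved addresses are checked separately at connect time; a hostname that resolves to an internal address would otherwise slip through.&lt;/p&gt;

```go
package main

import (
	"errors"
	"net/netip"
	"net/url"
)

// allowedHosts is a hypothetical allowlist for illustration.
var allowedHosts = map[string]bool{
	"api.github.com": true,
	"example.com":    true,
}

// ErrEgressDenied is returned for any URL the gateway refuses to fetch.
var ErrEgressDenied = errors.New("egress denied")

// CheckEgress validates a URL before a tool is allowed to fetch it.
// Call it again for every redirect target, not just the first URL.
func CheckEgress(raw string) error {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme != "https" {
		return ErrEgressDenied
	}
	host := u.Hostname()
	// Literal IPs are rejected when they point at loopback, private,
	// link-local (where cloud metadata endpoints usually live), or
	// unspecified addresses.
	if addr, err := netip.ParseAddr(host); err == nil {
		if addr.IsLoopback() || addr.IsPrivate() || addr.IsLinkLocalUnicast() || addr.IsUnspecified() {
			return ErrEgressDenied
		}
	}
	// Everything else must match the allowlist exactly.
	if !allowedHosts[host] {
		return ErrEgressDenied
	}
	return nil
}
```

&lt;p&gt;The function fails closed: anything it cannot parse, any non-HTTPS scheme, and any host that is not an exact allowlist match is denied.&lt;/p&gt;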
&lt;h3 id="pattern-tenant-bound-tokens"&gt;Pattern: Tenant-bound tokens&lt;/h3&gt;
&lt;p&gt;Instead of giving tool servers &amp;ldquo;global&amp;rdquo; credentials, mint tenant-scoped tokens and inject them for each call.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reduces blast radius&lt;/li&gt;
&lt;li&gt;makes audit meaningful&lt;/li&gt;
&lt;li&gt;enables &amp;ldquo;kill switch&amp;rdquo; revocation per tenant&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="scaling-and-isolation-strategies"&gt;Scaling and isolation strategies&lt;/h2&gt;
&lt;p&gt;A gateway is where multi-tenancy becomes real. Choose an isolation model:&lt;/p&gt;
&lt;h3 id="option-a-process-isolation-per-tool-server-simple-strong-isolation"&gt;Option A: Process isolation per tool server (simple, strong isolation)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;each integration is its own process/container&lt;/li&gt;
&lt;li&gt;faults stay contained&lt;/li&gt;
&lt;li&gt;rollouts per integration are easy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Tradeoff: more processes to manage.&lt;/p&gt;
&lt;h3 id="option-b-shared-server-with-strong-tenant-sandboxing"&gt;Option B: Shared server with strong tenant sandboxing&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;single multi-tenant server handles many clients&lt;/li&gt;
&lt;li&gt;cheaper to run&lt;/li&gt;
&lt;li&gt;requires rigorous isolation inside the process&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Tradeoff: higher risk if a bug leaks across tenants.&lt;/p&gt;
&lt;h3 id="option-c-hybrid"&gt;Option C: Hybrid&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;sensitive&amp;rdquo; integrations are isolated&lt;/li&gt;
&lt;li&gt;&amp;ldquo;low-risk&amp;rdquo; read-only tools can be multi-tenant&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most enterprises end up here.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="observability-and-audit"&gt;Observability and audit&lt;/h2&gt;
&lt;h3 id="what-to-emit-minimum-viable"&gt;What to emit (minimum viable)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tool_calls_total{tool, tenant, status}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_latency_ms{tool}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rate_limited_total{tenant}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;budget_exceeded_total{tenant, budget_type}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Traces&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;request span (client -&amp;gt; gateway)&lt;/li&gt;
&lt;li&gt;tool execution span (gateway -&amp;gt; server)&lt;/li&gt;
&lt;li&gt;downstream spans (server -&amp;gt; upstream API)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Audit events&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;who (tenant/user/client)&lt;/li&gt;
&lt;li&gt;what (tool + summarized parameters)&lt;/li&gt;
&lt;li&gt;when&lt;/li&gt;
&lt;li&gt;result (success/failure)&lt;/li&gt;
&lt;li&gt;side effect IDs (resource IDs, &lt;code&gt;plan_id&lt;/code&gt;, &lt;code&gt;idempotency_key&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;OpenTelemetry&amp;rsquo;s Go docs are a good reference for instrumentation patterns. [5]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="rollouts-and-versioning"&gt;Rollouts and versioning&lt;/h2&gt;
&lt;p&gt;Tool contracts drift. Clients upgrade at different times. Gateways can reduce pain by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pinning tool schema versions per client&lt;/li&gt;
&lt;li&gt;supporting additive changes first (new fields optional)&lt;/li&gt;
&lt;li&gt;allowing parallel tool versions for a period&lt;/li&gt;
&lt;li&gt;enabling canary rollouts per tenant&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you do nothing else: &lt;strong&gt;never deploy a breaking tool change to 100% of tenants at once.&lt;/strong&gt;&lt;/p&gt;
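&lt;p&gt;One way to get per-tenant canaries is stable hashing: each tenant lands in a fixed bucket, and the rollout percentage decides which buckets see the new schema. This is a sketch under assumptions: the FNV-1a hash, the 100-bucket split, and the &lt;code&gt;v1&lt;/code&gt;/&lt;code&gt;v2&lt;/code&gt; labels are illustrative choices, and &lt;code&gt;InCanary&lt;/code&gt; and &lt;code&gt;SchemaVersion&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// InCanary reports whether a tenant falls into the first `percent`
// of 100 stable buckets. The same tenant always gets the same answer
// for a given percentage.
func InCanary(tenantID string, percent uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(tenantID))
	return percent > h.Sum32()%100
}

// SchemaVersion picks the tool schema version for a tenant.
// A pinned version always wins over the canary decision.
func SchemaVersion(tenantID, pinned string, percent uint32) string {
	if pinned != "" {
		return pinned
	}
	if InCanary(tenantID, percent) {
		return "v2"
	}
	return "v1"
}

func main() {
	for _, tenant := range []string{"acme", "globex", "initech"} {
		fmt.Println(tenant, "at 10% ->", SchemaVersion(tenant, "", 10))
	}
	fmt.Println("pinned tenant ->", SchemaVersion("acme", "v1", 100))
}
```

&lt;p&gt;Raising the percentage only ever adds tenants to the canary set, so a tenant that already received the new schema keeps it as the rollout widens.&lt;/p&gt;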
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;h3 id="security"&gt;Security&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; AuthN required for all HTTP-based access. [3]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; AuthZ enforced per tool (least privilege).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool inputs validated and bounded.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Dangerous tools require preview/apply and explicit confirmations.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Egress allowlists exist for URL/network tools. [8]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="reliability"&gt;Reliability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Per-tenant rate limiting and per-tool concurrency caps.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Timeouts everywhere; deadlines propagate.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Bounded queues (no unbounded memory growth).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Circuit breakers for flaky dependencies.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="operability"&gt;Operability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Traces propagate end-to-end (W3C Trace Context). [6]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Metrics and logs are consistent and redacted.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit events exist for side effects.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="delivery"&gt;Delivery&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool schemas versioned; canary rollouts supported.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Quarantine and fallback policies exist for failing tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] Model Context Protocol (MCP) - Specification (Protocol Revision 2025-11-25): &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-11-25&lt;/a&gt;
[2] MCP - Transports (including Streamable HTTP): &lt;a href="https://modelcontextprotocol.io/specification/2025-03-26/basic/transports" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-03-26/basic/transports&lt;/a&gt;
[3] MCP - Authorization (HTTP-based transports): &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization&lt;/a&gt;
[4] JSON-RPC 2.0 Specification: &lt;a href="https://www.jsonrpc.org/specification" target="_blank" rel="noopener noreferrer"&gt;https://www.jsonrpc.org/specification&lt;/a&gt;
[5] OpenTelemetry Go - Instrumentation docs: &lt;a href="https://opentelemetry.io/docs/languages/go/instrumentation/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/languages/go/instrumentation/&lt;/a&gt;
[6] W3C - Trace Context: &lt;a href="https://www.w3.org/TR/trace-context/" target="_blank" rel="noopener noreferrer"&gt;https://www.w3.org/TR/trace-context/&lt;/a&gt;
[7] OWASP - Top 10 for Large Language Model Applications: &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/" target="_blank" rel="noopener noreferrer"&gt;https://owasp.org/www-project-top-10-for-large-language-model-applications/&lt;/a&gt;
[8] OWASP - SSRF Prevention Cheat Sheet: &lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html" target="_blank" rel="noopener noreferrer"&gt;https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Tool Discovery at Scale: Solving the Million Tool Problem</title><link>https://roygabriel.dev/blog/million-tool-problem/</link><pubDate>Sat, 15 Nov 2025 12:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/million-tool-problem/</guid><description>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Tool-using agents are powerful &lt;em&gt;because&lt;/em&gt; they can do real work: read systems, change systems, orchestrate workflows.&lt;/p&gt;
&lt;p&gt;The trap is what I call the &lt;strong&gt;Million Tool Problem&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;The moment you have &amp;ldquo;enough tools,&amp;rdquo; tool selection becomes harder than tool execution.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;At small scale, you can stuff tool schemas into the prompt and hope the model chooses correctly. At scale, that approach breaks:&lt;/p&gt;</description><content:encoded>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Tool-using agents are powerful &lt;em&gt;because&lt;/em&gt; they can do real work: read systems, change systems, orchestrate workflows.&lt;/p&gt;
&lt;p&gt;The trap is what I call the &lt;strong&gt;Million Tool Problem&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;The moment you have &amp;ldquo;enough tools,&amp;rdquo; tool selection becomes harder than tool execution.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;At small scale, you can stuff tool schemas into the prompt and hope the model chooses correctly. At scale, that approach breaks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;token budgets explode&lt;/li&gt;
&lt;li&gt;accuracy drops (models confuse similar tools)&lt;/li&gt;
&lt;li&gt;latency rises (bigger prompts, more reasoning)&lt;/li&gt;
&lt;li&gt;safety degrades (wrong tool, wrong args, wrong side effects)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This isn&amp;rsquo;t hypothetical. Tool-use research exists because selection is hard. Benchmarks like ToolBench and AgentBench exist specifically to evaluate this capability in interactive settings. [3][6]&lt;/p&gt;
&lt;p&gt;This post is a production-first design for tool discovery that stays:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;fast&lt;/strong&gt; (low latency, bounded prompt size)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;safe&lt;/strong&gt; (tool contracts and policy gates)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;debuggable&lt;/strong&gt; (you can explain why a tool was chosen)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;maintainable&lt;/strong&gt; (tool catalogs evolve constantly)&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Tool discovery is an &lt;strong&gt;IR problem + a policy problem&lt;/strong&gt;, not a prompt trick.&lt;/li&gt;
&lt;li&gt;Use a &lt;strong&gt;3-stage selector&lt;/strong&gt;:
&lt;ol&gt;
&lt;li&gt;coarse filter (tags / domain / allowlist)&lt;/li&gt;
&lt;li&gt;retrieval (BM25 + embeddings)&lt;/li&gt;
&lt;li&gt;rerank (LLM or learned ranker)&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Treat tool descriptions as a product:
&lt;ul&gt;
&lt;li&gt;consistent naming&lt;/li&gt;
&lt;li&gt;sharp &amp;ldquo;when to use&amp;rdquo; / &amp;ldquo;when not to use&amp;rdquo;&lt;/li&gt;
&lt;li&gt;examples of correct arguments&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;tool quality scoring&lt;/strong&gt; (latency, error rate, drift, safety incidents).&lt;/li&gt;
&lt;li&gt;Build a tight evaluation harness (ToolBench/StableToolBench ideas apply). [3][4]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-include-all-tools-fails"&gt;Why &amp;ldquo;include all tools&amp;rdquo; fails&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-3-stage-tool-selector"&gt;The 3-stage tool selector&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#tool-metadata-that-makes-models-smarter"&gt;Tool metadata that makes models smarter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#ranking-bm25--embeddings--rerank"&gt;Ranking: BM25 + embeddings + rerank&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#safety-allowlists-danger-gates-and-budgets"&gt;Safety: allowlists, &amp;ldquo;danger gates,&amp;rdquo; and budgets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#quality-scoring-and-tool-quarantine"&gt;Quality scoring and tool quarantine&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#debuggability-explainable-tool-selection"&gt;Debuggability: explainable tool selection&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-minimal-reference-architecture"&gt;A minimal reference architecture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="why-include-all-tools-fails"&gt;Why &amp;ldquo;include all tools&amp;rdquo; fails&lt;/h2&gt;
&lt;h3 id="token-and-latency-pressure"&gt;Token and latency pressure&lt;/h3&gt;
&lt;p&gt;Even if your tool schemas are &amp;ldquo;small,&amp;rdquo; they add up. Once you cross a few dozen tools, you spend more tokens describing tools than describing the task.&lt;/p&gt;
&lt;h3 id="confusability"&gt;Confusability&lt;/h3&gt;
&lt;p&gt;Tools with similar names or overlapping domains cause selection errors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;search_events&lt;/code&gt; vs &lt;code&gt;list_events&lt;/code&gt; vs &lt;code&gt;get_event&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;create_task&lt;/code&gt; vs &lt;code&gt;create_issue&lt;/code&gt; vs &lt;code&gt;create_ticket&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-long-tail-problem"&gt;The long tail problem&lt;/h3&gt;
&lt;p&gt;Most catalogs have a long tail:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;10 tools get used daily&lt;/li&gt;
&lt;li&gt;100 tools get used weekly&lt;/li&gt;
&lt;li&gt;1,000 tools are niche, but critical when needed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is exactly the kind of situation information retrieval was invented for.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="the-3-stage-tool-selector"&gt;The 3-stage tool selector&lt;/h2&gt;
&lt;p&gt;Think like a search engine:&lt;/p&gt;
&lt;h3 id="stage-0-policy-filter-mandatory"&gt;Stage 0: Policy filter (mandatory)&lt;/h3&gt;
&lt;p&gt;Before ranking, enforce policy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;which tools is this client allowed to call?&lt;/li&gt;
&lt;li&gt;which tools are enabled for this tenant/environment?&lt;/li&gt;
&lt;li&gt;which tools are safe for this context (read-only mode, incident mode, etc.)?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;MCP makes tool discovery explicit via listing tools and schemas. That&amp;rsquo;s an interface you can mediate with policy. [1]&lt;/p&gt;
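&lt;p&gt;The three policy questions above can be sketched as a plain filter that runs before any ranking. The &lt;code&gt;Tool&lt;/code&gt; and &lt;code&gt;Caller&lt;/code&gt; types and their fields are hypothetical; a real gateway would load them from its catalog and auth layer:&lt;/p&gt;

```go
package main

import "fmt"

// Tool is a hypothetical catalog entry.
type Tool struct {
	Name       string
	SideEffect string          // "read", "create", "update", or "delete"
	Tenants    map[string]bool // tenants for which the tool is enabled
}

// Caller describes who is asking and in which mode.
type Caller struct {
	Tenant       string
	AllowedTools map[string]bool // tools this client may call
	ReadOnly     bool            // e.g. read-only or incident mode
}

// PolicyFilter drops every tool the caller must not see.
// Ranking only ever runs on what this function returns.
func PolicyFilter(catalog []Tool, c Caller) []Tool {
	var out []Tool
	for _, t := range catalog {
		if !c.AllowedTools[t.Name] {
			continue // client is not allowed to call it
		}
		if !t.Tenants[c.Tenant] {
			continue // not enabled for this tenant
		}
		if c.ReadOnly {
			if t.SideEffect != "read" {
				continue // unsafe for the current context
			}
		}
		out = append(out, t)
	}
	return out
}

func main() {
	catalog := []Tool{
		{Name: "list_events", SideEffect: "read", Tenants: map[string]bool{"acme": true}},
		{Name: "delete_event", SideEffect: "delete", Tenants: map[string]bool{"acme": true}},
		{Name: "list_pods", SideEffect: "read", Tenants: map[string]bool{"globex": true}},
	}
	caller := Caller{
		Tenant:       "acme",
		AllowedTools: map[string]bool{"list_events": true, "delete_event": true, "list_pods": true},
		ReadOnly:     true,
	}
	for _, t := range PolicyFilter(catalog, caller) {
		fmt.Println("visible:", t.Name)
	}
}
```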
&lt;h3 id="stage-1-coarse-routing-cheap"&gt;Stage 1: Coarse routing (cheap)&lt;/h3&gt;
&lt;p&gt;Route into the right &amp;ldquo;tool neighborhood&amp;rdquo; using:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tags (&lt;code&gt;kubernetes&lt;/code&gt;, &lt;code&gt;calendar&lt;/code&gt;, &lt;code&gt;email&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;domains (&amp;ldquo;devops&amp;rdquo;, &amp;ldquo;productivity&amp;rdquo;, &amp;ldquo;security&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;environment (&amp;ldquo;prod&amp;rdquo; vs &amp;ldquo;dev&amp;rdquo;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Goal: reduce the candidate set from 10,000 -&amp;gt; 300.&lt;/p&gt;
&lt;h3 id="stage-2-retrieval-bm25--embeddings"&gt;Stage 2: Retrieval (BM25 + embeddings)&lt;/h3&gt;
&lt;p&gt;Run a hybrid search over:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool name&lt;/li&gt;
&lt;li&gt;tool description&lt;/li&gt;
&lt;li&gt;parameter names&lt;/li&gt;
&lt;li&gt;example calls&lt;/li&gt;
&lt;li&gt;&amp;ldquo;when not to use&amp;rdquo; hints&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hybrid search is pragmatic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;lexical retrieval (BM25-style) is great for exact matches and acronyms [9]&lt;/li&gt;
&lt;li&gt;embeddings are great for semantic similarity [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Goal: 300 -&amp;gt; 30.&lt;/p&gt;
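&lt;p&gt;This post does not prescribe how to merge the lexical and embedding results. One common option is reciprocal rank fusion (RRF), which combines ranked lists without needing comparable scores. The sketch below assumes each retriever already returned an ordered list of tool names; &lt;code&gt;FuseRRF&lt;/code&gt; is a hypothetical helper, and the constant 60 is a commonly used value rather than a tuned one:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// FuseRRF merges ranked lists with reciprocal rank fusion.
// Each list contributes 1/(k + rank) per item, with rank starting at 1.
func FuseRRF(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, name := range list {
			scores[name] += 1.0 / (k + float64(i+1))
		}
	}
	names := make([]string, 0, len(scores))
	for name := range scores {
		names = append(names, name)
	}
	sort.Slice(names, func(a, b int) bool {
		if scores[names[a]] != scores[names[b]] {
			return scores[names[a]] > scores[names[b]]
		}
		return names[b] > names[a] // deterministic tie-break by name
	})
	return names
}

func main() {
	lexical := []string{"get_event", "list_events", "search_events"}
	semantic := []string{"search_events", "find_conflicts", "list_events"}
	for i, name := range FuseRRF(60, lexical, semantic) {
		fmt.Println(i+1, name)
	}
}
```

&lt;p&gt;Tools that appear in both lists rise to the top, which is usually the behavior you want from hybrid retrieval: agreement between two different signals counts for more than a high rank in either one.&lt;/p&gt;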
&lt;h3 id="stage-3-rerank-expensive-accurate"&gt;Stage 3: Rerank (expensive, accurate)&lt;/h3&gt;
&lt;p&gt;Rerank the top-K tools using:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an LLM judge (cheap if K is small)&lt;/li&gt;
&lt;li&gt;or a learned ranker&lt;/li&gt;
&lt;li&gt;or deterministic rules + a smaller LLM tie-breaker&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Goal: 30 -&amp;gt; 5.&lt;/p&gt;
&lt;p&gt;Then the agent sees a small, high-quality tool set.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tool-metadata-that-makes-models-smarter"&gt;Tool metadata that makes models smarter&lt;/h2&gt;
&lt;p&gt;If you want better tool selection, stop treating tool schemas as &amp;ldquo;just types.&amp;rdquo; Add metadata that improves discrimination.&lt;/p&gt;
&lt;h3 id="tool-card-fields-recommended"&gt;Tool card fields (recommended)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Name&lt;/strong&gt;: stable, verb-first&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: one sentence&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;When to use&lt;/strong&gt;: 2-4 bullets&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;When NOT to use&lt;/strong&gt;: 2-4 bullets (this is underrated)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Side effects&lt;/strong&gt;: none / read-only / creates / updates / deletes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Required arguments&lt;/strong&gt;: and why they&amp;rsquo;re required&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Examples&lt;/strong&gt;: 2-3 example invocations with realistic args&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Error modes&lt;/strong&gt;: rate limit, auth, not found, validation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This reduces tool confusion because it gives the model &lt;em&gt;differentiating features&lt;/em&gt; to choose between otherwise similar tools.&lt;/p&gt;
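&lt;p&gt;The tool card fields map naturally onto a struct plus a small lint that a catalog build could run. This is an illustrative sketch: &lt;code&gt;ToolCard&lt;/code&gt; and &lt;code&gt;Validate&lt;/code&gt; are hypothetical names, and which fields count as mandatory is an assumption:&lt;/p&gt;

```go
package main

import "fmt"

// ToolCard mirrors the recommended tool card fields.
type ToolCard struct {
	Name         string
	Purpose      string
	WhenToUse    []string
	WhenNotToUse []string
	SideEffects  string            // "none", "read-only", "creates", "updates", "deletes"
	RequiredArgs map[string]string // argument name -> why it is required
	Examples     []string
	ErrorModes   []string
}

// Validate returns the hygiene problems found in a card.
// An empty result means the card passes this (assumed) minimum bar.
func (c ToolCard) Validate() []string {
	var problems []string
	if c.Name == "" {
		problems = append(problems, "missing name")
	}
	if c.Purpose == "" {
		problems = append(problems, "missing purpose")
	}
	if len(c.WhenNotToUse) == 0 {
		problems = append(problems, "missing when-not-to-use bullets")
	}
	if c.SideEffects == "" {
		problems = append(problems, "side effects not classified")
	}
	if len(c.Examples) == 0 {
		problems = append(problems, "missing examples")
	}
	return problems
}

func main() {
	card := ToolCard{
		Name:        "list_events",
		Purpose:     "List calendar events in a time window.",
		WhenToUse:   []string{"the user asks what is scheduled"},
		SideEffects: "read-only",
		Examples:    []string{"list_events(start=2025-11-17, end=2025-11-21)"},
	}
	for _, p := range card.Validate() {
		fmt.Println("list_events:", p)
	}
}
```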
&lt;hr&gt;
&lt;h2 id="ranking-bm25--embeddings--rerank"&gt;Ranking: BM25 + embeddings + rerank&lt;/h2&gt;
&lt;h3 id="lexical-retrieval-bm25"&gt;Lexical retrieval (BM25)&lt;/h3&gt;
&lt;p&gt;BM25 and probabilistic retrieval approaches are foundational in search. [9]&lt;/p&gt;
&lt;p&gt;Practical benefit: it handles exact-term queries where embeddings can be inconsistent, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;S3&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;JWT&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;PodDisruptionBudget&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Cron&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="embeddings"&gt;Embeddings&lt;/h3&gt;
&lt;p&gt;Sentence embeddings (like SBERT-style approaches) are designed to enable efficient semantic similarity search. [7]&lt;/p&gt;
&lt;p&gt;Practical benefit: they handle intent queries like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;delete all tasks due tomorrow&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;find calendar conflicts next week&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;check if deployment is stuck&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="approximate-nearest-neighbor-indexing"&gt;Approximate nearest neighbor indexing&lt;/h3&gt;
&lt;p&gt;At scale, you&amp;rsquo;ll want ANN indexing (FAISS is a well-known library in this space). [8]&lt;/p&gt;
&lt;h3 id="rerank"&gt;Rerank&lt;/h3&gt;
&lt;p&gt;This is where you incorporate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool quality score&lt;/li&gt;
&lt;li&gt;tenant policy&lt;/li&gt;
&lt;li&gt;&amp;ldquo;danger tool&amp;rdquo; gating&lt;/li&gt;
&lt;li&gt;recent tool drift&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Reranking is also where you can enforce &amp;ldquo;don&amp;rsquo;t pick write tools unless necessary.&amp;rdquo;&lt;/p&gt;
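&lt;p&gt;A deterministic version of that rerank step might look like the sketch below. The scoring formula (relevance times quality, minus a flat penalty for write tools) is an assumption chosen for readability, not a recommended weighting, and &lt;code&gt;Candidate&lt;/code&gt; and &lt;code&gt;Rerank&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// Candidate is a tool that survived retrieval.
type Candidate struct {
	Name      string
	Relevance float64 // retrieval score, assumed to be in [0, 1]
	Quality   float64 // tool quality score, assumed to be in [0, 1]
	Writes    bool    // tool has side effects
	Dangerous bool    // tool is behind a danger gate
}

// Rerank drops gated tools, penalizes write tools, and keeps the top k.
func Rerank(cands []Candidate, allowDangerous bool, writePenalty float64, k int) []Candidate {
	kept := make([]Candidate, 0, len(cands))
	for _, c := range cands {
		if c.Dangerous {
			if !allowDangerous {
				continue
			}
		}
		kept = append(kept, c)
	}
	score := func(c Candidate) float64 {
		s := c.Relevance * c.Quality
		if c.Writes {
			s -= writePenalty // prefer read tools unless a write clearly wins
		}
		return s
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return score(kept[i]) > score(kept[j])
	})
	if len(kept) > k {
		kept = kept[:k]
	}
	return kept
}

func main() {
	cands := []Candidate{
		{Name: "delete_tasks", Relevance: 0.9, Quality: 0.9, Writes: true, Dangerous: true},
		{Name: "update_task", Relevance: 0.8, Quality: 0.9, Writes: true},
		{Name: "list_tasks", Relevance: 0.7, Quality: 0.95},
	}
	for _, c := range Rerank(cands, false, 0.2, 5) {
		fmt.Println(c.Name)
	}
}
```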
&lt;hr&gt;
&lt;h2 id="safety-allowlists-danger-gates-and-budgets"&gt;Safety: allowlists, &amp;ldquo;danger gates,&amp;rdquo; and budgets&lt;/h2&gt;
&lt;p&gt;Tool discovery is not neutral. It&amp;rsquo;s an &lt;em&gt;authorization problem&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Your selector should be policy-aware:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Read-only mode&lt;/strong&gt;: only surface read tools&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No-delete mode&lt;/strong&gt;: deletes never appear&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prod incident mode&lt;/strong&gt;: allow observation tools, restrict mutation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Human approval mode&lt;/strong&gt;: show write tools, but require confirmation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Also: build budgets into selection.
If a tool is expensive (slow, rate-limited, high blast radius), rank it lower unless strongly justified.&lt;/p&gt;
&lt;p&gt;For tool-using agents, OWASP highlights prompt injection and excessive agency as key risks - exactly the failure modes you get when tools are over-exposed without gates. [10]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="quality-scoring-and-tool-quarantine"&gt;Quality scoring and tool quarantine&lt;/h2&gt;
&lt;p&gt;You need a &lt;strong&gt;tool quality score&lt;/strong&gt; because tools drift:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;upstream APIs change&lt;/li&gt;
&lt;li&gt;auth breaks&lt;/li&gt;
&lt;li&gt;quotas shift&lt;/li&gt;
&lt;li&gt;tool server regressions happen&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Track per tool:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;p50 / p95 latency&lt;/li&gt;
&lt;li&gt;error rate&lt;/li&gt;
&lt;li&gt;timeout rate&lt;/li&gt;
&lt;li&gt;&amp;ldquo;invalid argument&amp;rdquo; rate (often a selection problem)&lt;/li&gt;
&lt;li&gt;&amp;ldquo;unsafe attempt&amp;rdquo; rate (policy violations)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then take action:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;quarantine tools with regression spikes&lt;/li&gt;
&lt;li&gt;degrade to read-only tools during outages&lt;/li&gt;
&lt;li&gt;route to backups (alternate implementations)&lt;/li&gt;
&lt;/ul&gt;
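&lt;p&gt;To make the tracking and the resulting actions concrete, here is a sketch that turns per-tool counters into a decision. The thresholds, the decision labels, and the names &lt;code&gt;ToolStats&lt;/code&gt;, &lt;code&gt;Thresholds&lt;/code&gt;, and &lt;code&gt;Decide&lt;/code&gt; are all assumptions for illustration; real values depend on your traffic and risk tolerance:&lt;/p&gt;

```go
package main

import "fmt"

// ToolStats holds counters for one tool over some window.
type ToolStats struct {
	Calls        int
	Errors       int
	Timeouts     int
	InvalidArgs  int
	P95LatencyMs float64
}

// Thresholds are illustrative limits, not recommendations.
type Thresholds struct {
	MinCalls          int
	MaxErrorRate      float64
	MaxTimeoutRate    float64
	MaxInvalidArgRate float64
	MaxP95Ms          float64
}

func rate(n, total int) float64 {
	if total == 0 {
		return 0
	}
	return float64(n) / float64(total)
}

// Decide maps stats to an action for the selector.
func Decide(s ToolStats, t Thresholds) string {
	if t.MinCalls > s.Calls {
		return "insufficient_data"
	}
	switch {
	case rate(s.Errors, s.Calls) > t.MaxErrorRate:
		return "quarantine"
	case rate(s.Timeouts, s.Calls) > t.MaxTimeoutRate:
		return "quarantine"
	case rate(s.InvalidArgs, s.Calls) > t.MaxInvalidArgRate:
		return "review_metadata" // often a selection problem, not a tool fault
	case s.P95LatencyMs > t.MaxP95Ms:
		return "degraded"
	}
	return "healthy"
}

func main() {
	limits := Thresholds{MinCalls: 50, MaxErrorRate: 0.05, MaxTimeoutRate: 0.02, MaxInvalidArgRate: 0.10, MaxP95Ms: 2000}
	stats := ToolStats{Calls: 1000, Errors: 100, Timeouts: 5, InvalidArgs: 20, P95LatencyMs: 800}
	fmt.Println("decision:", Decide(stats, limits))
}
```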
&lt;hr&gt;
&lt;h2 id="debuggability-explainable-tool-selection"&gt;Debuggability: explainable tool selection&lt;/h2&gt;
&lt;p&gt;If you can&amp;rsquo;t answer &lt;strong&gt;&amp;ldquo;why did the agent pick that tool?&amp;rdquo;&lt;/strong&gt;, you won&amp;rsquo;t be able to operate the system.&lt;/p&gt;
&lt;p&gt;Log (or attach to traces) the selection evidence:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;query text&lt;/li&gt;
&lt;li&gt;candidate tools (top 30)&lt;/li&gt;
&lt;li&gt;retrieval scores&lt;/li&gt;
&lt;li&gt;rerank scores&lt;/li&gt;
&lt;li&gt;policy filters applied&lt;/li&gt;
&lt;li&gt;final selected tools and why&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This also becomes training data later.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-minimal-reference-architecture"&gt;A minimal reference architecture&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+-----------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;|   Agent runtime (planner)   |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+-----------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;               |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;               v
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+-----------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;|    Tool Selector Service    |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;|  - policy filter            |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;|  - hybrid retrieval         |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;|  - rerank                   |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;|  - tool quality weighting   |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+-----------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;               |  returns top-K tools + schemas
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;               v
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+-----------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;|       Agent execution       |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;|  - calls tools via MCP      |
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;+-----------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Where MCP fits: MCP provides a standardized way for clients to discover tools and invoke them. [1]&lt;/p&gt;
&lt;p&gt;The selector doesn&amp;rsquo;t replace MCP. It makes MCP usable at scale.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;h3 id="tool-catalog-hygiene"&gt;Tool catalog hygiene&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Stable naming conventions.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; &amp;ldquo;When NOT to use&amp;rdquo; bullets exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Examples exist for the top tools.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool side effects are classified.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="selection-pipeline"&gt;Selection pipeline&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Mandatory policy filter before ranking.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Hybrid retrieval (lexical + embeddings). [7][9]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Rerank top-K with quality + policy.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Candidate set bounded (K is small).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="safety"&gt;Safety&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Dangerous tools are gated and not surfaced by default.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Budget-aware ranking exists.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; OWASP LLM risks considered in tool exposure strategy. [10]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="operability"&gt;Operability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Selection decisions are explainable (log evidence).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool quality scoring exists and drives quarantine.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Selection regressions are covered by evals (next article).&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] Model Context Protocol (MCP) - Specification (Protocol Revision 2025-11-25): &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-11-25&lt;/a&gt;
[2] MCP - Transports (including stdio and Streamable HTTP): &lt;a href="https://modelcontextprotocol.io/specification/2025-03-26/basic/transports" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-03-26/basic/transports&lt;/a&gt;
[3] ToolLLM / ToolBench (tool-use dataset + evaluation): &lt;a href="https://arxiv.org/abs/2307.16789" target="_blank" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2307.16789&lt;/a&gt;
[4] StableToolBench (stable tool-use benchmarking): &lt;a href="https://arxiv.org/abs/2403.07714" target="_blank" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2403.07714&lt;/a&gt;
[5] tau-bench (tool-agent-user interaction benchmark): &lt;a href="https://arxiv.org/abs/2406.12045" target="_blank" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2406.12045&lt;/a&gt;
[6] AgentBench (evaluating LLMs as agents): &lt;a href="https://arxiv.org/abs/2308.03688" target="_blank" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2308.03688&lt;/a&gt;
[7] Sentence-BERT (efficient semantic similarity search via embeddings): &lt;a href="https://arxiv.org/abs/1908.10084" target="_blank" rel="noopener noreferrer"&gt;https://arxiv.org/abs/1908.10084&lt;/a&gt;
[8] FAISS / Billion-scale similarity search with GPUs: &lt;a href="https://arxiv.org/abs/1702.08734" target="_blank" rel="noopener noreferrer"&gt;https://arxiv.org/abs/1702.08734&lt;/a&gt;
and &lt;a href="https://github.com/facebookresearch/faiss" target="_blank" rel="noopener noreferrer"&gt;https://github.com/facebookresearch/faiss&lt;/a&gt;
[9] Robertson (BM25 and probabilistic relevance framework): &lt;a href="https://dl.acm.org/doi/abs/10.1561/1500000019" target="_blank" rel="noopener noreferrer"&gt;https://dl.acm.org/doi/abs/10.1561/1500000019&lt;/a&gt;
[10] OWASP - Top 10 for Large Language Model Applications: &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/" target="_blank" rel="noopener noreferrer"&gt;https://owasp.org/www-project-top-10-for-large-language-model-applications/&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>The Service Template That Prevents Incidents</title><link>https://roygabriel.dev/blog/paved-road-service-template/</link><pubDate>Sat, 25 Oct 2025 12:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/paved-road-service-template/</guid><description>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most enterprises try to standardize software delivery with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PDFs&lt;/li&gt;
&lt;li&gt;Confluence pages&lt;/li&gt;
&lt;li&gt;slide decks&lt;/li&gt;
&lt;li&gt;architecture review boards&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It doesn&amp;rsquo;t scale.&lt;/p&gt;
&lt;p&gt;Teams don&amp;rsquo;t move faster because the &lt;em&gt;rules&lt;/em&gt; exist. Teams move faster because the &lt;strong&gt;defaults&lt;/strong&gt; exist.&lt;/p&gt;
&lt;p&gt;Platform engineering language captures this well: paved roads / golden paths reduce cognitive load and make the &amp;ldquo;right way&amp;rdquo; the easy way. [1][2]
The CNCF Platforms White Paper makes the case for internal platforms as a lever that impacts value streams indirectly - through better flow and developer experience. [3]&lt;/p&gt;</description><content:encoded>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most enterprises try to standardize software delivery with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PDFs&lt;/li&gt;
&lt;li&gt;Confluence pages&lt;/li&gt;
&lt;li&gt;slide decks&lt;/li&gt;
&lt;li&gt;architecture review boards&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It doesn&amp;rsquo;t scale.&lt;/p&gt;
&lt;p&gt;Teams don&amp;rsquo;t move faster because the &lt;em&gt;rules&lt;/em&gt; exist. Teams move faster because the &lt;strong&gt;defaults&lt;/strong&gt; exist.&lt;/p&gt;
&lt;p&gt;Platform engineering language captures this well: paved roads / golden paths reduce cognitive load and make the &amp;ldquo;right way&amp;rdquo; the easy way. [1][2]
The CNCF Platforms White Paper makes the case for internal platforms as a lever that impacts value streams indirectly - through better flow and developer experience. [3]&lt;/p&gt;
&lt;p&gt;This article is a practical blueprint for the thing that actually changes outcomes:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;A service template that bakes reliability, security, and operability into day-one defaults.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Build one paved road for APIs: repo template + CI pipeline + runtime defaults.&lt;/li&gt;
&lt;li&gt;Include &amp;ldquo;boring&amp;rdquo; but critical capabilities:
&lt;ul&gt;
&lt;li&gt;health probes, resource requests/limits, disruption budgets [4][5][6]&lt;/li&gt;
&lt;li&gt;tracing/metrics/logging via OpenTelemetry [7]&lt;/li&gt;
&lt;li&gt;timeouts, retries, rate limits&lt;/li&gt;
&lt;li&gt;standardized deployment and rollout&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Measure success with outcomes (DORA metrics): lead time, deploy frequency, change failure rate, MTTR. [8]&lt;/li&gt;
&lt;li&gt;Optimize for day 2 to day 50, not just &amp;ldquo;hello world.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-a-paved-road-is-and-isnt"&gt;What a paved road is (and isn&amp;rsquo;t)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-api-service-template-required-capabilities"&gt;The API service template: required capabilities&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-reference-repository-structure"&gt;A reference repository structure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#kubernetes-defaults-that-save-you-later"&gt;Kubernetes defaults that save you later&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#observability-by-default"&gt;Observability by default&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#security-by-default"&gt;Security by default&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#rollouts-and-operational-controls"&gt;Rollouts and operational controls&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#how-to-roll-this-out-without-a-platform-revolt"&gt;How to roll this out without a platform revolt&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="what-a-paved-road-is-and-isnt"&gt;What a paved road is (and isn&amp;rsquo;t)&lt;/h2&gt;
&lt;h3 id="a-paved-road-is"&gt;A paved road is&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;recommended&lt;/strong&gt; path to production&lt;/li&gt;
&lt;li&gt;preconfigured defaults that make safe delivery easy&lt;/li&gt;
&lt;li&gt;automation that eliminates repetitive decisions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Microsoft describes this in internal developer platform terms: recommended and supported development paths, incrementally paved through an internal platform. [2]&lt;/p&gt;
&lt;h3 id="a-paved-road-is-not"&gt;A paved road is not&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;a mandate that blocks all other approaches&lt;/li&gt;
&lt;li&gt;a committee process&lt;/li&gt;
&lt;li&gt;a doc nobody reads&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your paved road becomes a gate, teams will route around it.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="the-api-service-template-required-capabilities"&gt;The API service template: required capabilities&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s what &amp;ldquo;enterprise production API&amp;rdquo; should mean out of the box.&lt;/p&gt;
&lt;h3 id="operability"&gt;Operability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;structured logging with correlation IDs&lt;/li&gt;
&lt;li&gt;metrics (request rate/latency/errors)&lt;/li&gt;
&lt;li&gt;tracing across inbound/outbound calls [7]&lt;/li&gt;
&lt;li&gt;runtime config and feature flags&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="reliability"&gt;Reliability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;timeouts everywhere&lt;/li&gt;
&lt;li&gt;bounded retries with backoff&lt;/li&gt;
&lt;li&gt;health probes (liveness/readiness/startup) [5]&lt;/li&gt;
&lt;li&gt;graceful shutdown&lt;/li&gt;
&lt;li&gt;rate limits / concurrency caps&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="platform-fit"&gt;Platform fit&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes-ready manifests&lt;/li&gt;
&lt;li&gt;resource requests/limits [4]&lt;/li&gt;
&lt;li&gt;PodDisruptionBudget for availability during maintenance [6]&lt;/li&gt;
&lt;li&gt;standardized rollout strategy&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="security"&gt;Security&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;auth middleware&lt;/li&gt;
&lt;li&gt;input validation&lt;/li&gt;
&lt;li&gt;secret injection patterns (no secrets in repo)&lt;/li&gt;
&lt;li&gt;least privilege service accounts&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="delivery"&gt;Delivery&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;CI pipeline: lint/test/build/scan&lt;/li&gt;
&lt;li&gt;SBOM generation&lt;/li&gt;
&lt;li&gt;deploy automation (GitOps or pipeline)&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="a-reference-repository-structure"&gt;A reference repository structure&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;├── cmd/service/          # main
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;├── internal/             # business logic
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;├── pkg/                  # shared libs (optional)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;├── api/                  # OpenAPI spec, schemas
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;├── deploy/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│   ├── k8s/              # manifests (or Helm/Kustomize)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│   └── policy/           # OPA/constraints (optional)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;├── docs/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│   ├── index.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│   └── runbooks/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;├── Makefile
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;└── .github/workflows/    # CI
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Key idea: the template is not just code - it is the full production story:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;how to run locally&lt;/li&gt;
&lt;li&gt;how to deploy&lt;/li&gt;
&lt;li&gt;how to observe&lt;/li&gt;
&lt;li&gt;how to operate on-call&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="kubernetes-defaults-that-save-you-later"&gt;Kubernetes defaults that save you later&lt;/h2&gt;
&lt;h3 id="1-resource-requests-and-limits"&gt;1) Resource requests and limits&lt;/h3&gt;
&lt;p&gt;Kubernetes scheduling and stability depend on requests/limits. The official docs explain how pod requests/limits are derived from container values. [4]&lt;/p&gt;
&lt;p&gt;Template default:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;set conservative requests&lt;/li&gt;
&lt;li&gt;set safe limits&lt;/li&gt;
&lt;li&gt;provide guidance for right-sizing&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="2-probes"&gt;2) Probes&lt;/h3&gt;
&lt;p&gt;Kubernetes supports liveness, readiness, and startup probes. The docs describe how to configure them and why they matter. [5]&lt;/p&gt;
&lt;p&gt;Template default:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;readinessProbe&lt;/code&gt; ensures traffic only goes to ready pods&lt;/li&gt;
&lt;li&gt;&lt;code&gt;livenessProbe&lt;/code&gt; catches deadlocks / stuck processes&lt;/li&gt;
&lt;li&gt;&lt;code&gt;startupProbe&lt;/code&gt; prevents early restarts for slow boot services&lt;/li&gt;
&lt;/ul&gt;
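&lt;p&gt;On the service side, the probes need endpoints to hit. The following Go sketch shows one possible shape; the &lt;code&gt;/livez&lt;/code&gt; and &lt;code&gt;/readyz&lt;/code&gt; paths, the &lt;code&gt;ready&lt;/code&gt; flag, and the &lt;code&gt;probeStatus&lt;/code&gt; helper are assumptions for illustration, and a real readiness check would usually also verify critical dependencies.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// ready flips to true once startup work (cache warmup, migrations, ...) is done.
var ready atomic.Bool

// livez answers the liveness probe: the process is up and can serve HTTP.
func livez(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// readyz answers the readiness probe: 503 until the service can take traffic.
func readyz(w http.ResponseWriter, _ *http.Request) {
	if !ready.Load() {
		http.Error(w, "not ready", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probeStatus calls a handler in-process and returns the HTTP status code.
func probeStatus(h http.HandlerFunc, path string) int {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(probeStatus(readyz, "/readyz")) // 503: startup not finished
	ready.Store(true)
	fmt.Println(probeStatus(readyz, "/readyz")) // 200: ready for traffic
	fmt.Println(probeStatus(livez, "/livez"))   // 200: process is alive
}
```

&lt;p&gt;Keeping liveness trivial and putting the real checks behind readiness is a deliberate choice in this sketch: a failing dependency then removes the pod from rotation instead of restarting it.&lt;/p&gt;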
&lt;h3 id="3-disruption-budgets"&gt;3) Disruption budgets&lt;/h3&gt;
&lt;p&gt;PodDisruptionBudgets limit concurrent disruptions during voluntary maintenance. [6]&lt;/p&gt;
&lt;p&gt;Template default:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;include a PDB for replicated services&lt;/li&gt;
&lt;li&gt;define min available or max unavailable&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="observability-by-default"&gt;Observability by default&lt;/h2&gt;
&lt;p&gt;If you do only one thing, instrument the template so every service ships with telemetry from its first deploy.&lt;/p&gt;
&lt;p&gt;OpenTelemetry provides the framework for standard traces/metrics/logs. [7]&lt;/p&gt;
&lt;p&gt;Template defaults:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;standard HTTP server instrumentation&lt;/li&gt;
&lt;li&gt;propagation of trace context (W3C headers)&lt;/li&gt;
&lt;li&gt;request logs include trace IDs&lt;/li&gt;
&lt;li&gt;golden dashboard:
&lt;ul&gt;
&lt;li&gt;RPS&lt;/li&gt;
&lt;li&gt;p95 latency&lt;/li&gt;
&lt;li&gt;error rate&lt;/li&gt;
&lt;li&gt;saturation (CPU/memory)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
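&lt;p&gt;To make &amp;ldquo;request logs include trace IDs&amp;rdquo; concrete, the Go sketch below pulls the trace ID out of a W3C &lt;code&gt;traceparent&lt;/code&gt; header value and attaches it to a log line. In practice the OpenTelemetry SDK handles propagation; the hand-rolled &lt;code&gt;TraceID&lt;/code&gt; function and the log format here are assumptions meant only to show what travels in the header. It handles the four-field layout of version &lt;code&gt;00&lt;/code&gt; only.&lt;/p&gt;

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// TraceID extracts the trace ID from a W3C traceparent header value
// ("version-traceid-parentid-flags"). It returns false for malformed input.
func TraceID(traceparent string) (string, bool) {
	parts := strings.Split(traceparent, "-")
	if len(parts) != 4 {
		return "", false
	}
	id := parts[1]
	if len(id) != 32 || id != strings.ToLower(id) {
		return "", false // trace IDs are 32 lowercase hex characters
	}
	if _, err := hex.DecodeString(id); err != nil {
		return "", false
	}
	if id == strings.Repeat("0", 32) {
		return "", false // an all-zero trace ID is invalid
	}
	return id, true
}

func main() {
	header := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	if id, ok := TraceID(header); ok {
		// Every request log line carries the ID, so logs and traces can be joined.
		fmt.Printf("method=GET path=/orders status=200 trace_id=%s\n", id)
	}
}
```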
&lt;hr&gt;
&lt;h2 id="security-by-default"&gt;Security by default&lt;/h2&gt;
&lt;p&gt;Avoid &amp;ldquo;security guidance documents.&amp;rdquo; Ship secure defaults instead.&lt;/p&gt;
&lt;p&gt;Template defaults:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;auth middleware with standardized claims/roles mapping&lt;/li&gt;
&lt;li&gt;structured validation for request bodies&lt;/li&gt;
&lt;li&gt;outbound allowlists (where feasible)&lt;/li&gt;
&lt;li&gt;secret injection via environment/secret store (no plaintext secrets in the repo)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Your paved road becomes a security accelerator because teams start secure.&lt;/p&gt;
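&lt;p&gt;One way to read &amp;ldquo;structured validation for request bodies&amp;rdquo; is sketched below in Go using only the standard library. The &lt;code&gt;CreateOrder&lt;/code&gt; type, its fields, and the 1 to 100 quantity rule are hypothetical; the technique shown is strict decoding (unknown fields rejected) followed by explicit field rules.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// CreateOrder is a hypothetical request body used only for illustration.
type CreateOrder struct {
	SKU      string `json:"sku"`
	Quantity uint   `json:"quantity"` // unsigned: negative values fail at decode time
}

// DecodeCreateOrder parses a JSON body, rejects unknown fields, and then
// applies field-level rules.
func DecodeCreateOrder(body string) (CreateOrder, error) {
	req := new(CreateOrder)
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields()
	if err := dec.Decode(req); err != nil {
		return CreateOrder{}, fmt.Errorf("invalid JSON body: %w", err)
	}
	if req.SKU == "" {
		return CreateOrder{}, errors.New("sku is required")
	}
	if req.Quantity == 0 || req.Quantity > 100 {
		return CreateOrder{}, errors.New("quantity must be between 1 and 100")
	}
	return *req, nil
}

func main() {
	bodies := []string{
		`{"sku":"A1","quantity":2}`,
		`{"sku":"A1","quantity":2,"admin":true}`, // unknown field
		`{"sku":"A1","quantity":500}`,            // out of range
	}
	for _, b := range bodies {
		_, err := DecodeCreateOrder(b)
		fmt.Println("accepted:", err == nil)
	}
}
```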
&lt;hr&gt;
&lt;h2 id="rollouts-and-operational-controls"&gt;Rollouts and operational controls&lt;/h2&gt;
&lt;p&gt;Default rollout patterns:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;canary or progressive delivery when needed&lt;/li&gt;
&lt;li&gt;safe rollback&lt;/li&gt;
&lt;li&gt;feature flags for risky changes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Default operational controls:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;rate limiting&lt;/li&gt;
&lt;li&gt;concurrency limits&lt;/li&gt;
&lt;li&gt;timeouts and circuit breakers&lt;/li&gt;
&lt;li&gt;&amp;ldquo;maintenance mode&amp;rdquo; toggle&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="how-to-roll-this-out-without-a-platform-revolt"&gt;How to roll this out without a platform revolt&lt;/h2&gt;
&lt;p&gt;This is the part platform teams often miss.&lt;/p&gt;
&lt;h3 id="1-make-it-optional---but-obviously-better"&gt;1) Make it optional - but obviously better&lt;/h3&gt;
&lt;p&gt;If adopting the template reduces weeks of work to hours, teams will choose it.&lt;/p&gt;
&lt;h3 id="2-provide-migration-paths"&gt;2) Provide migration paths&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;minimal adoption: observability + probes&lt;/li&gt;
&lt;li&gt;medium: deploy manifests + CI&lt;/li&gt;
&lt;li&gt;full: service template + libraries&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="3-measure-outcomes-not-adoption"&gt;3) Measure outcomes, not adoption&lt;/h3&gt;
&lt;p&gt;Use DORA metrics to show impact: lead time, deploy frequency, change failure rate, time to restore service. [8]&lt;/p&gt;
&lt;p&gt;If the paved road doesn&amp;rsquo;t move these, it&amp;rsquo;s not paved.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;h3 id="template"&gt;Template&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Repo template includes CI, deploy, docs, runbooks.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Observability instrumentation included by default. [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="kubernetes"&gt;Kubernetes&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Resource requests/limits included. [4]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Liveness/readiness/startup probes included. [5]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; PodDisruptionBudget included for replicated services. [6]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="reliability-1"&gt;Reliability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Timeouts and bounded retries are standard.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Graceful shutdown is implemented.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Rate limiting/concurrency caps exist.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="security-1"&gt;Security&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Auth middleware included.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Secrets handled via secure injection (not repo).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="outcomes"&gt;Outcomes&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; DORA metrics tracked to validate improvement. [8]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] CNCF - What is platform engineering? (golden paths/paved roads framing): &lt;a href="https://www.cncf.io/blog/2025/11/19/what-is-platform-engineering/" target="_blank" rel="noopener noreferrer"&gt;https://www.cncf.io/blog/2025/11/19/what-is-platform-engineering/&lt;/a&gt;
[2] Microsoft Learn - What is platform engineering? (paved paths / internal developer platform): &lt;a href="https://learn.microsoft.com/en-us/platform-engineering/what-is-platform-engineering" target="_blank" rel="noopener noreferrer"&gt;https://learn.microsoft.com/en-us/platform-engineering/what-is-platform-engineering&lt;/a&gt;
[3] CNCF TAG App Delivery - Platforms White Paper: &lt;a href="https://tag-app-delivery.cncf.io/whitepapers/platforms/" target="_blank" rel="noopener noreferrer"&gt;https://tag-app-delivery.cncf.io/whitepapers/platforms/&lt;/a&gt;
[4] Kubernetes - Resource Management for Pods and Containers (requests/limits): &lt;a href="https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/" target="_blank" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/&lt;/a&gt;
[5] Kubernetes - Configure Liveness, Readiness and Startup Probes: &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/" target="_blank" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/&lt;/a&gt;
[6] Kubernetes - Specifying a Disruption Budget for your Application (PDB): &lt;a href="https://kubernetes.io/docs/tasks/run-application/configure-pdb/" target="_blank" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/tasks/run-application/configure-pdb/&lt;/a&gt;
[7] OpenTelemetry - Documentation (instrumentation and telemetry): &lt;a href="https://opentelemetry.io/docs/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/&lt;/a&gt;
[8] DORA - DORA&amp;rsquo;s software delivery performance metrics: &lt;a href="https://dora.dev/guides/dora-metrics/" target="_blank" rel="noopener noreferrer"&gt;https://dora.dev/guides/dora-metrics/&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Go MCP Server Ecosystem</title><link>https://roygabriel.dev/projects/mcp-servers/</link><pubDate>Sun, 01 Sep 2024 00:00:00 +0000</pubDate><guid>https://roygabriel.dev/projects/mcp-servers/</guid><description>Production-grade MCP servers in Go that expose iCloud, Todoist, and Notion as safe, typed tools for LLM agents.</description><content:encoded>&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;This project is a growing ecosystem of &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; servers written in &lt;strong&gt;Go&lt;/strong&gt;. Each server wraps a real service (calendar, email, task management, knowledge base, etc.) and exposes it as a &lt;strong&gt;typed, tool-based interface&lt;/strong&gt; for MCP clients (e.g., Claude Desktop / Claude Code). [1][2]&lt;/p&gt;
&lt;p&gt;The theme is simple: &lt;strong&gt;agents are only as useful as the tools they can call&lt;/strong&gt;, and &amp;ldquo;tooling&amp;rdquo; needs the same production bar as any other integration layer: security boundaries, backpressure, observability, and predictable failure modes.&lt;/p&gt;
&lt;h3 id="open-source-mcp-servers"&gt;Open-source MCP servers&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;iCloud Calendar MCP Server (CalDAV):&lt;/strong&gt; list calendars, search events, create/update/delete events; includes recurring event expansion, multi-account support, rate limiting, retries, audit logs, and Prometheus/health endpoints. [4]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;iCloud Email MCP Server (IMAP/SMTP):&lt;/strong&gt; search and read mail, send/reply, manage folders, handle attachments, and apply safety annotations (read-only vs destructive) with strict input validation. [5]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Todoist MCP Server (REST API v2 + Sync batching):&lt;/strong&gt; manage tasks/projects/labels/comments; supports bulk operations with rate-limit-aware batching and Todoist filter syntax. [6]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Notion MCP Server (Notion REST API):&lt;/strong&gt; pages, databases, blocks, comments, users; includes templates, exports (Markdown/CSV), smart queries, and built-in throttling/retries. [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="private--not-yet-open-sourced-connectors"&gt;Private / not-yet-open-sourced connectors&lt;/h3&gt;
&lt;p&gt;I&amp;rsquo;ve also built MCP connectors for enterprise systems that aren&amp;rsquo;t ready to open-source yet (due to org-specific assumptions, credentials, or hard-coded domain models):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes&lt;/li&gt;
&lt;li&gt;Argo CD&lt;/li&gt;
&lt;li&gt;SonarQube&lt;/li&gt;
&lt;li&gt;GitHub&lt;/li&gt;
&lt;li&gt;Temporal&lt;/li&gt;
&lt;li&gt;OpenText Octane&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(These follow the same design patterns described below.)&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;Agents need to interact with real systems: calendars, email, task systems, and internal developer platforms. Without a standard interface, every tool integration becomes a one-off, and reliability/guardrails drift between projects.&lt;/p&gt;
&lt;p&gt;MCP solves the &amp;ldquo;standard interface&amp;rdquo; problem by defining how a host/client can discover and call server-exposed tools over a consistent protocol. [1][2]
This ecosystem focuses on solving the remaining hard part: &lt;strong&gt;making those integrations production-grade&lt;/strong&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="constraints"&gt;Constraints&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Local-first security boundary&lt;/strong&gt;: credentials live on the host where the server runs (env vars, secret mounts, keychain tooling); the server talks directly to the upstream service with no proxy SaaS. [4][5]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Safety by design&lt;/strong&gt;: explicit tool schemas, input validation, and tool classification (read-only vs mutating) so clients can apply guardrails. [4][5]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fast &amp;amp; predictable&lt;/strong&gt;: low startup time and bounded tool-call latency (timeouts + backpressure). [4][5][7]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operable like a real service&lt;/strong&gt;: logs that correlate per request, rate limiting, retries/backoff where appropriate, and health/metrics where it matters. [4][6][7]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Portable distribution&lt;/strong&gt;: ship as single Go binaries (and containers where useful), so the &amp;ldquo;tool layer&amp;rdquo; is easy to deploy alongside agents. [4][5]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="architecture"&gt;Architecture&lt;/h2&gt;
&lt;p&gt;At a high level, every server follows the same pattern:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;MCP client (hosted by Claude / an agent runtime)&lt;/strong&gt; communicates with the server (typically over stdio transport).&lt;/li&gt;
&lt;li&gt;The server validates inputs, applies middleware (timeouts, logging, rate limits), and calls the upstream API/protocol.&lt;/li&gt;
&lt;li&gt;Results are mapped into safe, typed tool outputs (and errors are normalized for the client).&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;┌───────────────────────────┐ MCP (tools) ┌────────────────────────────┐
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│ Claude Desktop / Code │ ───────────────────────────▶ │ mcp-&amp;lt;service&amp;gt; (Go binary) │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;│ (MCP host + client) │ ◀─────────────────────────── │ - tool schemas + handlers │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;└───────────────────────────┘ JSON-RPC/session │ - auth + validation │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; │ - rate limit + retries │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; └───────────┬────────────────┘
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; │ service protocol / API
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ▼
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ┌──────────────────────────────┐
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; │ iCloud / Todoist / Notion ... │
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; └──────────────────────────────┘
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="cross-cutting-production-traits"&gt;Cross-cutting &amp;ldquo;production traits&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;Instead of building one-off scripts, these servers implement common production patterns:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Timeout middleware&lt;/strong&gt; on every tool call (so agents don&amp;rsquo;t hang forever). [4][5][7]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Request correlation IDs&lt;/strong&gt; and structured logs (debuggable across multi-step agent runs). [4][5]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rate limiting + backoff&lt;/strong&gt; when upstream services throttle (e.g., iCloud and Notion). [4][7]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bulk operation strategies&lt;/strong&gt; that reduce API calls (e.g., Todoist Sync API batching for bulk changes). [6]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Health + metrics endpoints&lt;/strong&gt; where running in containers makes sense (notably the iCloud Calendar server). [4]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated CI&lt;/strong&gt; (race detector, linting, vulnerability checks) to keep &amp;ldquo;tool servers&amp;rdquo; from becoming unreviewed glue. [4][5]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="key-decisions"&gt;Key decisions&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Go for tool servers&lt;/strong&gt;: predictable concurrency, easy cross-platform builds, and the &amp;ldquo;single static-ish binary&amp;rdquo; deployment model fits MCP servers well, especially when they&amp;rsquo;re launched per-session or run as small sidecars. [4][5]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Independent binaries per integration&lt;/strong&gt;: calendar ≠ email ≠ tasks. Separate processes isolate failures, limit blast radius, and make upgrades/rollbacks straightforward.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local-first auth&lt;/strong&gt;: app-specific passwords (iCloud), API tokens (Todoist), integration tokens (Notion). The servers are designed so secrets stay on your machine / in your cluster secrets manager, not copied into prompts. [4][5][6][7]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use an MCP SDK, focus on semantics&lt;/strong&gt;: the implementations use the Go MCP SDK (&lt;code&gt;mark3labs/mcp-go&lt;/code&gt;) so most effort goes into tool behavior, validation, and safety. [8][9]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="outcome"&gt;Outcome&lt;/h2&gt;
&lt;p&gt;This ecosystem has produced multiple MCP servers that are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;useful&lt;/strong&gt; (real workflows: schedule management, inbox operations, task execution, knowledge base automation),&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;operationally hardened&lt;/strong&gt; (timeouts, retries, rate limits, observability),&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;portable&lt;/strong&gt; (binaries + releases for easy distribution),&lt;/li&gt;
&lt;li&gt;and &lt;strong&gt;structured enough to be safe&lt;/strong&gt; (typed schemas, validation, tool annotations).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Concrete examples from the current repos:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;iCloud Calendar server&lt;/strong&gt; exposes &lt;strong&gt;5 tools&lt;/strong&gt;, supports &lt;strong&gt;multi-account&lt;/strong&gt;, and includes &lt;strong&gt;health + Prometheus metrics&lt;/strong&gt;, &lt;strong&gt;audit logging without PII&lt;/strong&gt;, retries/backoff, and rate limiting. [4]&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;iCloud Email server&lt;/strong&gt; exposes &lt;strong&gt;14 tools&lt;/strong&gt; and includes &lt;strong&gt;thread-safe IMAP access&lt;/strong&gt;, request correlation IDs, strict validation, and &amp;ldquo;read-only vs destructive&amp;rdquo; tool annotations. [5]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tagged releases&lt;/strong&gt; exist across the servers (e.g., iCloud Calendar &lt;code&gt;v1.1.0&lt;/code&gt;, iCloud Email &lt;code&gt;v0.6.0&lt;/code&gt;, Todoist &lt;code&gt;v1.0.0&lt;/code&gt;, Notion &lt;code&gt;v0.8.0&lt;/code&gt; published on Feb 7, 2026). [4][5][6][7]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="stack"&gt;Stack&lt;/h2&gt;
&lt;p&gt;Go, MCP, &lt;code&gt;mark3labs/mcp-go&lt;/code&gt;, CalDAV, IMAP/SMTP, REST APIs (Todoist/Notion), Docker (distroless where applicable), Prometheus metrics (where applicable).&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] Model Context Protocol (MCP): Specification (Protocol Revision 2025-11-25). &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-11-25&lt;/a&gt;
[2] Model Context Protocol (MCP): Architecture (Protocol Revision 2025-06-18). &lt;a href="https://modelcontextprotocol.io/specification/2025-06-18/architecture" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-06-18/architecture&lt;/a&gt;
[3] Roy Gabriel: &amp;ldquo;Go MCP Server Ecosystem&amp;rdquo; (original portfolio page). &lt;a href="https://www.roygabriel.dev/projects/mcp-servers/" target="_blank" rel="noopener noreferrer"&gt;https://www.roygabriel.dev/projects/mcp-servers/&lt;/a&gt;
[4] GitHub: roygabriel/mcp-icloud-calendar. &lt;a href="https://github.com/roygabriel/mcp-icloud-calendar" target="_blank" rel="noopener noreferrer"&gt;https://github.com/roygabriel/mcp-icloud-calendar&lt;/a&gt;
[5] GitHub: roygabriel/mcp-icloud-email. &lt;a href="https://github.com/roygabriel/mcp-icloud-email" target="_blank" rel="noopener noreferrer"&gt;https://github.com/roygabriel/mcp-icloud-email&lt;/a&gt;
[6] GitHub: roygabriel/mcp-todoist. &lt;a href="https://github.com/roygabriel/mcp-todoist" target="_blank" rel="noopener noreferrer"&gt;https://github.com/roygabriel/mcp-todoist&lt;/a&gt;
[7] GitHub: roygabriel/mcp-notion. &lt;a href="https://github.com/roygabriel/mcp-notion" target="_blank" rel="noopener noreferrer"&gt;https://github.com/roygabriel/mcp-notion&lt;/a&gt;
[8] GitHub: mark3labs/mcp-go. &lt;a href="https://github.com/mark3labs/mcp-go" target="_blank" rel="noopener noreferrer"&gt;https://github.com/mark3labs/mcp-go&lt;/a&gt;
[9] go.mod (module dependencies) for the MCP servers (e.g., &lt;code&gt;mark3labs/mcp-go&lt;/code&gt; used in this ecosystem).
- &lt;a href="https://raw.githubusercontent.com/roygabriel/mcp-icloud-calendar/main/go.mod" target="_blank" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/roygabriel/mcp-icloud-calendar/main/go.mod&lt;/a&gt;
- &lt;a href="https://raw.githubusercontent.com/roygabriel/mcp-icloud-email/main/go.mod" target="_blank" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/roygabriel/mcp-icloud-email/main/go.mod&lt;/a&gt;
- &lt;a href="https://raw.githubusercontent.com/roygabriel/mcp-todoist/main/go.mod" target="_blank" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/roygabriel/mcp-todoist/main/go.mod&lt;/a&gt;
- &lt;a href="https://raw.githubusercontent.com/roygabriel/mcp-notion/main/go.mod" target="_blank" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/roygabriel/mcp-notion/main/go.mod&lt;/a&gt;
&lt;/p&gt;</content:encoded></item></channel></rss>