SRE | Roy Gabriel

Go vs Spring Boot for Enterprise APIs: Cost, Performance, and Cloud-Native Ops

Sun, 01 Feb 2026 10:00:00 -0500

As-of note: This is a production engineering perspective, not a benchmark scoreboard. If you care about cost or p99 latency, measure your service with your dependencies and your deployment constraints.

Why this comparison keeps showing up

If you build enterprise APIs long enough, you’ll see the same pattern:

The “language choice” isn’t what breaks production.
The runtime envelope and operational model usually are.

When teams compare Go and Java Spring Boot, they’re often asking a more specific question:

“What will it cost to run this API at scale, and how predictable is it under real production conditions?”

Spring Boot’s value proposition is speed-to-service: stand-alone, production-grade Spring applications you can “just run,” with strong ecosystem defaults and integration breadth. [1]

Go’s value proposition is operational simplicity: go build compiles your packages into an executable [5], so you can ship a small container, run with fewer moving pieces, and keep latency and resource usage easier to reason about.

This article is about the production-relevant tradeoffs: cost/resource usage, performance under load, cloud-native deployability, and the “you will be on call for this” realities.

On code quality: This isn’t “Go good / Java bad.” It’s an observation about failure modes: framework-heavy stacks can hide complexity until it shows up in startup time, memory, and surprises under load. Go’s bias toward explicitness often makes problems easier to see and cheaper to operate, even before the codebase is perfect.

TL;DR

If your org is already Spring-heavy, Spring Boot can be the fastest path to a robust API, especially when you need Spring’s ecosystem (security, data, integrations). [1]
If you run many small services, care about density, or need fast scale-to-zero/scale-from-zero behavior, Go often has an operational edge due to simpler packaging and typically lower baseline resource footprint.
Kubernetes costs are strongly influenced by requests/limits and scheduling density, so baseline memory is often a bigger lever than micro-optimizing CPU. [7][8]
Both ecosystems support hardened container builds (including distroless) to reduce attack surface. [9][10]
Observability is excellent in both; Java has very mature zero-code instrumentation via the OpenTelemetry Java agent. [13][14] Go has strong SDK support and growing options for auto-instrumentation. [11]
“Best” depends on your constraints. The best move is to benchmark your service envelope and compare p95/p99 latency, RSS, startup, and error rates under load.

The cost model: what you actually pay for

In cloud and Kubernetes environments, cost is strongly driven by:

How many replicas you need
How much CPU/memory you request per replica
How quickly you can scale (up and down)
How much time you spend operating the service

Kubernetes scheduling and resource guarantees are based on requests and limits. Requests influence where Pods can be scheduled; limits cap what they can consume. [7][8]

That means your “baseline footprint” matters:

A service that requests 512Mi RAM even when idle reduces node density.
A service that requests 128Mi RAM allows more Pods per node.

A simple (illustrative) density example

Assume you run 100 replicas of an API, and memory is your limiting resource:

Case A: 100 × 512Mi = 51,200Mi ≈ 50Gi reserved
Case B: 100 × 128Mi = 12,800Mi ≈ 12.5Gi reserved

That’s a ~37.5Gi delta in reserved memory before you count overhead (sidecars, DaemonSets, kube-system). This is not “Go vs Java math.” It’s “baseline footprint sets cluster size.”
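
To make the arithmetic above easy to rerun with your own numbers, here is a minimal Go sketch. The helper names (reservedGi, podsPerNode) and the node size used in the usage note are assumptions for illustration, not Kubernetes APIs or defaults.

```go
package main

// reservedGi converts a per-replica memory request (in Mi) into the total
// memory reserved across all replicas, expressed in Gi (1Gi = 1024Mi).
func reservedGi(replicas, requestMi int) float64 {
	return float64(replicas*requestMi) / 1024.0
}

// podsPerNode estimates how many replicas fit on one node when memory is the
// limiting resource. allocatableMi is whatever your nodes actually offer
// after system reservations; it is an input, not a Kubernetes default.
func podsPerNode(allocatableMi, requestMi int) int {
	if requestMi <= 0 {
		return 0
	}
	return allocatableMi / requestMi
}
```

reservedGi(100, 512) gives 50 and reservedGi(100, 128) gives 12.5, matching Case A and Case B. On a hypothetical node with 16,384Mi allocatable, podsPerNode gives 32 Pods at 512Mi and 128 Pods at 128Mi.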

The point: cost discussions are often memory-and-startup discussions wearing a language-comparison mask.

Go’s production advantages (when they matter)

1) Packaging simplicity and deployment surface

Go’s toolchain compiles code into an executable (go build). [5] Since Go 1.21, the go command can also select the toolchain version declared in go.mod, which helps keep builds reproducible across environments. [6]

In practice, Go services often ship as:

a single process
a small image whose only application layer is a single binary
minimal runtime dependencies

That tends to reduce:

container image complexity
“works on my machine” drift
runtime patch surface area

This matters most when you operate many services and want upgrades to be boring.

2) Fast start and “scale events”

In real systems, performance isn’t only request/response speed; it’s also how the service behaves during:

deployments
autoscaling
node drains
crashes

Go services commonly start quickly because they don’t require JVM warmup/classloading/JIT compilation. (Exact numbers vary; measure your service.)

Spring Boot can start fast enough for most use cases, but cold starts can become a visible factor when:

you scale from zero frequently (serverless-like patterns)
you do aggressive HPA scaling
you run lots of short-lived jobs

Spring Boot also supports building native images with GraalVM, which can materially improve startup and memory in some cases, but introduces different tradeoffs (build time, reflection limits, operational differences). [3][4]

3) Resource envelope predictability

For many “API gateway / orchestration / integration” services, CPU isn’t the bottleneck. Latency, network, and downstream behavior are.

Go’s strengths here tend to be:

predictable concurrency behavior
straightforward backpressure patterns (bounded queues, semaphores)
fewer runtime tuning knobs compared to JVM-heavy stacks

This is not “Go always uses less RAM.” It’s “Go often gives you a tighter baseline envelope for simpler services, which improves scheduling density.”

4) Cloud-native ergonomics: minimalism wins over time

Enterprise services accrete complexity over years. The less your runtime depends on:

classpath complexity
reflection-driven magic
extensive framework graphs

…the easier it is to keep production surprises rare.

Go’s bias toward explicit wiring tends to help with long-term operability, especially in platform/API layers where consistency matters.

Where Spring Boot is still the right tool

Spring Boot exists for a reason, and in many enterprises it’s still the correct default:

1) Ecosystem and “starter” leverage

Spring Boot’s opinionated defaults and starter ecosystem are an enormous accelerator for:

auth (OAuth2/OIDC)
data access and ORM patterns
enterprise integrations
standardized configuration and profiles

Spring Boot is explicitly designed to minimize configuration and help you ship “production-grade” applications quickly. [1]

If you already have:

shared Spring libraries
internal Spring starters
company-wide Spring conventions

…then choosing Go for “purity” can be expensive in human terms.

2) JVM performance can be excellent

For long-lived services under sustained load, HotSpot JIT compilation can deliver extremely strong performance, sometimes outperforming Go in CPU-bound or allocation-sensitive scenarios.

It’s a mistake to assume “compiled native binary” automatically means “faster.” What matters is p99 latency, throughput per core, and behavior under GC pressure for your workload.

3) Operational maturity and tooling

Spring Boot has well-worn operational patterns:

actuator endpoints
consistent configuration patterns
deep tracing/profiling options
broad community knowledge

Also: if your org has deep Java on-call expertise, “operational simplicity” may already be solved socially.

Cloud-native reality: images, CVEs, and deploy surface

Distroless is not a Go-only advantage

A common Go pattern is “static binary + scratch/distroless.” But distroless images exist for Java too.

Distroless images contain only the application and its runtime dependencies, with no package manager and no shell, reducing attack surface. [9] The distroless project includes Java images as well. [10]

Operational implication: smaller, simpler images usually mean:

faster pulls and rollouts
fewer things to patch
fewer “shell inside container” habits (a feature, not a bug)

Whether you ship Go or Spring Boot, you can adopt hardened bases.

Two Dockerfile patterns (illustrative)

Go (multi-stage + distroless):

FROM golang:1.22-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -trimpath -ldflags "-s -w" -o /out/api ./cmd/api

FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/api /api
USER nonroot:nonroot
ENTRYPOINT ["/api"]

Spring Boot (JAR + distroless Java):

FROM eclipse-temurin:21-jdk AS build
WORKDIR /src
COPY . .
RUN ./mvnw -DskipTests package

FROM gcr.io/distroless/java21-debian12:nonroot
# Assumes the build writes target/app.jar (for example, <finalName>app</finalName> in pom.xml).
COPY --from=build /src/target/app.jar /app.jar
USER nonroot:nonroot
ENTRYPOINT ["java","-jar","/app.jar"]

The important part isn’t the exact base image; it’s the principle: reduce image surface area and keep the deploy artifact boring.

Observability and operations

Both ecosystems are strong here, but they differ in “how quickly can I get real telemetry.”

OpenTelemetry support

OpenTelemetry is the vendor-neutral standard for traces/metrics/logs. [11]

Go language docs: SDK + instrumentation guidance. [11]
Java language docs: SDK + instrumentation guidance. [12]

Java’s advantage: zero-code instrumentation

The OpenTelemetry Java agent can attach to Java applications and automatically instrument popular libraries via bytecode injection. [13] The OpenTelemetry Java instrumentation project provides the agent and broad library coverage. [14]

Practical implication: you can often get useful traces without touching code. That’s a meaningful ops advantage in large enterprises.

Go’s reality: explicit instrumentation (plus growing options)

Go’s OpenTelemetry SDK support is strong. [11] Go auto-instrumentation options exist and are improving, but your fastest path today is still typically:

instrument key inbound/outbound edges in code
standardize middleware across services
treat telemetry as part of the API contract

That’s not bad. It’s just a different default.

A decision matrix

Use this as a starting point, not a rule.

Constraint / Goal	Go tends to win	Spring Boot tends to win
Many small services, high density	✅ smaller baseline envelopes often help	⚠️ can be heavier per-service
Fast scale-from-zero, frequent redeploys	✅ typically quick startup	✅ with care; ✅✅ with native image tradeoffs [3][4]
Enterprise integration breadth	⚠️ you build more glue yourself	✅ Spring ecosystem leverage [1]
Team expertise	✅ if Go is your platform standard	✅ if Java/Spring is your standard
“Boring deployments”	✅ single binary patterns	✅ well-trodden JVM patterns
Zero-code observability	⚠️ emerging	✅ OTel Java agent maturity [13][14]
Long-lived CPU-heavy services	✅ sometimes	✅ JVM can be extremely strong

How to validate with a real experiment

If you want a decision you can defend, run a 2-4 hour experiment:

1) Define a representative endpoint mix

1 simple “health/read” endpoint
1 endpoint that hits your DB
1 endpoint that calls a downstream HTTP service
1 endpoint with payload validation + auth

2) Measure the four numbers that matter

Startup time (cold start to ready)
Steady-state RSS at idle
p95 / p99 latency under load
Error rate under load + partial downstream failure
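
If your load tool hands you raw latency samples instead of percentiles, p95/p99 can be computed with a nearest-rank percentile. This is a minimal sketch, assuming samples are recorded in milliseconds; percentile is a hypothetical helper, and most load tools and metrics backends use their own (often interpolated or histogram-based) definitions, so compare like with like.

```go
package main

import (
	"math"
	"sort"
)

// percentile returns the nearest-rank percentile (0 < p <= 100) of the
// samples. It sorts a copy so the caller's slice is left untouched.
// It returns 0 for empty input or an out-of-range p.
func percentile(samplesMs []float64, p float64) float64 {
	if len(samplesMs) == 0 || p <= 0 || p > 100 {
		return 0
	}
	sorted := make([]float64, len(samplesMs))
	copy(sorted, samplesMs)
	sort.Float64s(sorted)

	// Nearest rank: the smallest rank r such that r/N >= p/100.
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	return sorted[rank-1]
}
```

Run the same function over both services’ samples; the absolute definition matters less than using one definition for both sides of the comparison.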

3) Run the same load and failure profile

Use the same:

container runtime
resource requests/limits
ingress configuration
downstream simulators

4) Compare operational work, not only performance

How painful is debugging?
How much config is required?
How quickly can your team ship fixes safely?

This is where enterprise reality lives.

Common failure modes

Go pitfalls

Teams reinvent frameworks inconsistently across services.
Too much “just a handler” code without shared middleware for auth, limits, tracing, and error handling.
Ignoring backpressure (unbounded goroutines) → memory blowups.

Spring Boot pitfalls

Default dependency graphs grow quietly until startup time and memory become a problem.
Classpath/auto-config complexity makes “why did it do that?” debugging expensive.
Container runtime tuning gets deferred, then becomes urgent during cost reviews.

Both ecosystems

No explicit timeouts (inbound and outbound).
No limits or budgets.
No telemetry until after the first incident.

Closing thought

If your enterprise APIs are:

small, numerous, latency-sensitive, and cost-sensitive
…Go is often a strong default.

If your enterprise APIs are:

integration-heavy, domain-rich, and built on existing Spring conventions
…Spring Boot is usually the shortest path to “production-grade.”

The best answer is the one you can operate confidently, on call, at scale.

References

[1] Spring Boot project overview
[2] Spring Boot reference: Graceful Shutdown
[3] Spring Boot reference: GraalVM Native Images
[4] GraalVM guide: Build a Spring Boot app into a native executable
[5] Go tutorial: Compile and install the application (go build produces an executable)
[6] Go docs: Toolchains and the go command
[7] Kubernetes docs: Resource Management for Pods and Containers (requests/limits)
[8] Google Cloud: Kubernetes best practices for resource requests and limits
[9] Distroless container images (project overview)
[10] Distroless Java images
[11] OpenTelemetry Go docs
[12] OpenTelemetry Java docs
[13] OpenTelemetry Java Agent (zero-code)
[14] OpenTelemetry Java instrumentation (agent JAR + library coverage)

MCP Servers in Production: Hardening, Backpressure, and Observability (Go)

Sat, 31 Jan 2026 09:00:00 -0500

As-of note: MCP is evolving. This article references the MCP specification versioned 2025-11-25 and related docs; verify details against the current spec before shipping changes. [1][2][4]

Why this matters

Most “agent demos” fail in production for boring reasons: missing timeouts, unbounded concurrency, ambiguous tool interfaces, and logging that accidentally turns into data exfiltration.

An MCP server isn’t “just an integration.” It’s a capability boundary between an LLM host (IDE, desktop app, agent runner) and the real world: files, APIs, databases, tickets, home automation, and anything else you wire up. MCP uses JSON-RPC 2.0 messages over transports like stdio (local) and Streamable HTTP (remote). [1][2][5]

That means an MCP server is:

an API gateway for tools
a policy enforcement point (whether you intended it or not)
a reliability hotspot (tool calls are where latency and failure concentrate)
a security hotspot (tools are where “read” becomes “exfil” and “write” becomes “impact”)

This post is a pragmatic checklist + a set of Go patterns to harden an MCP server so it keeps working when it’s under real load, and remains safe when the model gets “creative.”

TL;DR

Treat tool inputs as untrusted. Validate and constrain everything.
Put budgets everywhere: timeouts, concurrency limits, rate limits, and payload caps.
Build for partial failure: retries, idempotency keys, circuit breaking, fallbacks.
Log like a security engineer: structured, redacted, auditable, and useful. [11]
Instrument with traces/metrics early; “we’ll add telemetry later” is a trap. [13]
Prefer Go for MCP servers because deployment and operational behavior are predictable: single binary, fast startup, deadline and cancellation propagation via context, and a strong standard library.

A production mental model for MCP servers
Threat model: what actually goes wrong
Hardening layer 1: identity and authorization
Hardening layer 2: tool contracts that resist ambiguity
Hardening layer 3: budgets and backpressure
Hardening layer 4: safe networking and SSRF containment
Hardening layer 5: observability without leaking secrets
Hardening layer 6: versioning and rollout discipline
A production checklist
References

A production mental model for MCP servers

MCP’s docs describe a host (the AI application), a client (connector inside the host), and servers (capabilities/providers). Servers can be “local” (stdio) or “remote” (Streamable HTTP). [2][3]

Here’s the production mental model that matters:

Your MCP server is a tool gateway.
Every tool is effectively an RPC method exposed to an agent. MCP uses JSON-RPC 2.0 semantics for requests/responses/notifications. [1][5]
LLM tool arguments are not trustworthy.
Even if the LLM is “helpful,” arguments can be malformed, overbroad, or dangerous, especially under prompt injection or user-provided hostile input.
The host UI is not a security boundary.
The spec emphasizes user consent and tool safety, but the protocol can’t enforce your policy for you. You still need server-side controls. [1]
Transport changes your blast radius, not your responsibilities.
Stdio reduces network exposure, but doesn’t remove safety requirements. Streamable HTTP adds multi-client/multi-tenant concerns and requires real auth. [2][3]

If you remember nothing else: treat the MCP server like a production API you’d be willing to put on call for.

Threat model: what actually goes wrong

When MCP servers cause incidents, it’s usually one of these:

1) Input ambiguity → destructive actions

A “delete” tool with optional filters
A “run command” tool with free-form strings
A “sync” tool that can touch thousands of objects

Mitigation: schema + semantic validation, safe defaults, two-step flows (preview, then apply), and explicit “danger gates.”

2) Prompt injection → tool misuse

The model can be tricked into calling tools with attacker-provided arguments. If your tool can read internal data or call internal APIs, you’ve created an exfil path.

Mitigation: least privilege, allowlists, strong auth, egress controls, and redaction.

3) SSRF / network pivoting

Any tool that fetches URLs, loads webhooks, or calls dynamic endpoints can be abused to hit internal networks or metadata endpoints. OWASP treats SSRF as a major category for a reason. [10]

Mitigation: deny-by-default networking (CIDR blocks, DNS/IP resolution checks, allowlisted destinations).

4) Unbounded concurrency → resource collapse

Agents can fire tools in parallel. Without limits you’ll blow up:

API quotas
DB connections
CPU/memory
downstream latency

Mitigation: per-tenant rate limiting, concurrency caps, queues, and backpressure.

5) “Helpful logs” → data leak

Tool arguments and tool responses often contain secrets, tokens, or private data. If you log everything, you’ve built an involuntary data lake.

Mitigation: structured + redacted logging, security logging guidelines, and minimal retention. [11][12]

Hardening layer 1: identity and authorization

If you run Streamable HTTP, assume:

multiple clients
untrusted networks
tokens will leak eventually

MCP’s architecture guidance points to standard HTTP authentication methods for remote servers and recommends OAuth as the way to obtain tokens. [2][3]

Practical rules

Authenticate every request.
Use bearer tokens or mTLS depending on environment.
Authorize per tool.
“Authenticated” ≠ “allowed to run delete_everything”.
Prefer short-lived tokens and rotate them. [12]
Multi-tenant? Carry the tenant identity in one of:
- auth token claims, or
- an explicit, signed, and validated tenant header
Then enforce it everywhere.

Go pattern: a minimal auth middleware skeleton (HTTP transport)

This is not a full MCP implementation, just the hardening pattern you’ll wrap around your MCP handler.

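import (
	"context"
	"net/http"
	"strings"
)

// ctxKeyIdentity is an unexported key type, so other packages cannot collide with it.
type ctxKeyIdentity struct{}

// Pseudocode-ish middleware skeleton. Replace verifyToken with your auth logic.
func authMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Require the Bearer scheme explicitly (strings.CutPrefix needs Go 1.20+).
		// TrimPrefix alone would pass a header like "Basic abc" through as a token.
		token, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
		if !ok || token == "" {
			http.Error(w, "missing auth", http.StatusUnauthorized)
			return
		}

		ident, err := verifyToken(r.Context(), token) // includes tenant + scopes
		if err != nil {
			http.Error(w, "invalid auth", http.StatusUnauthorized)
			return
		}

		// Downstream handlers read the identity back out of the request context.
		ctx := context.WithValue(r.Context(), ctxKeyIdentity{}, ident)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}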

Key point: authorization should happen after you parse the requested tool name, but before you execute anything.
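
One way to sketch that per-tool check is a deny-by-default lookup from tool name to required scope. Everything here is an assumption for illustration: the Identity shape (whatever your verifyToken returns), the tool names, and the scope strings are hypothetical and not part of MCP.

```go
package main

import "fmt"

// Identity is a hypothetical shape for the value produced by token verification.
type Identity struct {
	TenantID string
	Scopes   map[string]bool
}

// requiredScope maps each exposed tool to the scope a caller must hold.
// Tool and scope names are illustrative only.
var requiredScope = map[string]string{
	"search_tickets": "tickets:read",
	"close_ticket":   "tickets:write",
}

// authorizeTool denies by default: a tool missing from the map is rejected,
// and so is a caller that lacks the mapped scope.
func authorizeTool(ident Identity, tool string) error {
	scope, known := requiredScope[tool]
	if !known {
		return fmt.Errorf("unknown tool %q", tool)
	}
	if !ident.Scopes[scope] {
		return fmt.Errorf("tenant %s lacks scope %q for tool %q", ident.TenantID, scope, tool)
	}
	return nil
}
```

Call authorizeTool once the tool name has been parsed out of the request and before any tool code runs. Keeping the map in one place also gives reviewers a single list of what is exposed.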

Hardening layer 2: tool contracts that resist ambiguity

Most MCP tool failures are self-inflicted: tool interfaces are too vague.

Design tools like production APIs

Bad tool signature:

run(command: string)

Better:

run_command(program: enum, args: string[], cwd: string, timeout_ms: int, dry_run: bool)

Why it’s better:

forces structure
allows you to enforce allowlists
gives you timeouts and safe defaults
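
As a rough illustration of how the structured signature becomes enforceable, here is a sketch of the arguments as a Go struct with a Validate method. The allowlist entries and the numeric limits are placeholder assumptions; cwd confinement (for example, to a workspace root) is deliberately left out and would need its own check.

```go
package main

import "fmt"

// RunCommandArgs mirrors the structured run_command signature.
type RunCommandArgs struct {
	Program   string   `json:"program"`
	Args      []string `json:"args"`
	Cwd       string   `json:"cwd"`
	TimeoutMS int      `json:"timeout_ms"`
	DryRun    *bool    `json:"dry_run"` // pointer so an absent field can default to true
}

// Illustrative limits and allowlist; tune these for your environment.
var allowedPrograms = map[string]bool{"git": true, "ls": true}

const (
	maxArgs      = 16
	maxArgLen    = 256
	maxTimeoutMS = 30000
)

// Validate enforces the allowlist and the bounds before anything executes.
func (a RunCommandArgs) Validate() error {
	if !allowedPrograms[a.Program] {
		return fmt.Errorf("program %q is not allowlisted", a.Program)
	}
	if len(a.Args) > maxArgs {
		return fmt.Errorf("too many args: %d > %d", len(a.Args), maxArgs)
	}
	for i, arg := range a.Args {
		if len(arg) > maxArgLen {
			return fmt.Errorf("arg %d exceeds %d bytes", i, maxArgLen)
		}
	}
	if a.TimeoutMS <= 0 || a.TimeoutMS > maxTimeoutMS {
		return fmt.Errorf("timeout_ms must be in 1..%d", maxTimeoutMS)
	}
	return nil
}

// IsDryRun treats a missing dry_run as true, so side effects are opt-in.
func (a RunCommandArgs) IsDryRun() bool { return a.DryRun == nil || *a.DryRun }
```

The pointer-typed DryRun is the design choice worth copying: a model that forgets the field gets the safe behavior instead of the dangerous one.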

Add a “preview → apply” flow for risky tools

For any tool that writes data or triggers side effects, do a two-step approach:

plan_* returns a machine-readable plan + a plan_id
apply_* requires the plan_id and, optionally, a user confirmation token

This mirrors how we run infra changes (plan/apply) and dramatically reduces accidental blast radius.
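
A minimal in-memory sketch of the plan_id handshake, assuming a single server process: plan_* stores what it intends to do and returns an unguessable ID, and apply_* can redeem that ID exactly once, only for the tenant that created it. The type names, the five-minute TTL, and the in-memory map are illustrative assumptions; a multi-replica deployment would need shared storage.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"sync"
	"time"
)

// planTTL is an illustrative lifetime for unapplied plans.
const planTTL = 5 * time.Minute

// Plan is a hypothetical record of what apply_* would do.
type Plan struct {
	TenantID  string
	Summary   string
	ExpiresAt time.Time
}

type planStore struct {
	mu    sync.Mutex
	plans map[string]Plan
}

func newPlanStore() *planStore { return &planStore{plans: map[string]Plan{}} }

// create stores a plan and returns a random 128-bit plan_id (hex encoded).
func (s *planStore) create(tenantID, summary string) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	id := hex.EncodeToString(buf)

	s.mu.Lock()
	defer s.mu.Unlock()
	s.plans[id] = Plan{TenantID: tenantID, Summary: summary, ExpiresAt: time.Now().Add(planTTL)}
	return id, nil
}

// consume returns the plan at most once, and only to the tenant that created it.
func (s *planStore) consume(tenantID, id string) (Plan, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	p, ok := s.plans[id]
	if !ok || p.TenantID != tenantID {
		// Same error for "missing" and "wrong tenant" to avoid leaking existence.
		return Plan{}, errors.New("unknown plan_id")
	}
	delete(s.plans, id) // single use, even if it turns out to be expired
	if time.Now().After(p.ExpiresAt) {
		return Plan{}, errors.New("plan expired")
	}
	return p, nil
}
```

The apply_* handler would call consume first and execute only the stored plan, never arguments re-sent by the model.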

Hardening layer 3: budgets and backpressure

Production systems are budget systems.

If you don’t set explicit budgets, your MCP server will eventually allocate them for you via outages.

Budget checklist

Server timeouts (header read, request read, write, idle)
Request body caps
Outbound timeouts to dependencies
Concurrency caps per tool and per tenant
Rate limits per tenant and per identity
Queue limits (bounded channels) to avoid memory blowups
Circuit breaking for flaky downstream dependencies

Go: server timeouts are not optional

Go’s net/http provides explicit server timeouts, but their zero value means “no timeout”; leaving them unset is a common footgun. [6][7]

srv := &http.Server{
	Addr:              ":8080",
	Handler:           handler, // your MCP handler + middleware
	ReadHeaderTimeout: 5 * time.Second,
	ReadTimeout:       30 * time.Second,
	// WriteTimeout bounds the whole response, so it also cuts off long-lived streaming responses.
	WriteTimeout: 30 * time.Second,
	IdleTimeout:  60 * time.Second,
}
log.Fatal(srv.ListenAndServe())

Go: propagate cancellation everywhere with `context`

context.Context is the backbone of “structured concurrency” in Go: deadlines and cancellation signals flow through your call stack. [8][9]

Rule: every tool execution must accept a context.Context, and every outbound call must honor it.

func (s *Server) toolCall(ctx context.Context, req ToolRequest) (ToolResponse, error) {
	ctx, cancel := context.WithTimeout(ctx, 15*time.Second)
	defer cancel()

	// ... outbound calls use ctx
	return s.integration.Do(ctx, req)
}

Go: per-tenant rate limiting with `x/time/rate`

golang.org/x/time/rate implements a token bucket limiter. [9]

// limiters hands out one token bucket per key (here: tenant ID).
// Entries are never evicted; add expiry if the key space is unbounded.
type limiters struct {
	mu sync.Mutex
	m  map[string]*rate.Limiter
}

func (l *limiters) get(key string) *rate.Limiter {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.m == nil {
		l.m = map[string]*rate.Limiter{}
	}
	if lim, ok := l.m[key]; ok {
		return lim
	}

	// Example: 5 req/sec with bursts up to 10
	lim := rate.NewLimiter(5, 10)
	l.m[key] = lim
	return lim
}

func rateLimitMiddleware(lims *limiters, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// mustIdentity is assumed to return the identity stored by the auth middleware.
		ident := mustIdentity(r.Context())
		if !lims.get(ident.TenantID).Allow() {
			http.Error(w, "rate limited", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

Backpressure: choose a policy

When you’re overloaded, you need a policy. Pick one explicitly:

Fail fast with 429 / “busy” (simplest, safest)
Queue with bounded depth (more complex; must cap memory)
Degrade by disabling expensive tools first

The “fail fast” approach is often correct for tool gateways.
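
A fail-fast concurrency cap can be sketched with a buffered channel used as a semaphore: a request either gets a slot immediately or is turned away with 429. The names (gate, failFast) and the Retry-After value are assumptions for illustration.

```go
package main

import "net/http"

// gate caps in-flight work using a buffered channel as a counting semaphore.
type gate struct{ slots chan struct{} }

func newGate(max int) *gate { return &gate{slots: make(chan struct{}, max)} }

// tryAcquire never blocks: it reports false when every slot is taken.
func (g *gate) tryAcquire() bool {
	select {
	case g.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees one slot; call it exactly once per successful tryAcquire.
func (g *gate) release() { <-g.slots }

// inFlight reports the number of slots currently held (useful as a gauge).
func (g *gate) inFlight() int { return len(g.slots) }

// failFast rejects excess requests with 429 instead of queueing them.
func failFast(g *gate, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !g.tryAcquire() {
			w.Header().Set("Retry-After", "1") // illustrative hint, in seconds
			http.Error(w, "busy", http.StatusTooManyRequests)
			return
		}
		defer g.release()
		next.ServeHTTP(w, r)
	})
}
```

One gate per expensive tool (rather than one global gate) keeps a slow tool from starving the cheap ones.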

Hardening layer 4: safe networking and SSRF containment

If any tool can fetch a user-provided URL or call a user-influenced endpoint, SSRF is on the table. [10]

SSRF containment strategies that actually work

OWASP’s SSRF guidance boils down to a few themes: don’t trust user-controlled URLs, use allowlists, and enforce network controls. [10]

In practice, for MCP servers:

Prefer allowlists over blocklists.
“Only these domains” beats “block internal IPs.” Attackers are creative.
Resolve and validate IPs before dialing.
DNS can be weaponized. Validate the final destination IP (and re-validate on redirects).
Disable redirects or re-validate each hop.
Redirect chains are SSRF’s favorite tool.
Enforce egress policy at the network layer too.
Kubernetes NetworkPolicies / firewall rules are your last line of defense.

Go pattern: an outbound HTTP client with strict timeouts

client := &http.Client{
	Timeout: 10 * time.Second, // whole request budget
	Transport: &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   5 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		TLSHandshakeTimeout:   5 * time.Second,
		ResponseHeaderTimeout: 5 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
		MaxIdleConns:          100,
		IdleConnTimeout:       90 * time.Second,
	},
}

Then wrap URL validation around any request creation. Keep it boring and strict.
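
URL validation alone can be sidestepped by DNS tricks, so one complementary sketch is to check the resolved IP at dial time using net.Dialer’s Control hook, which runs after name resolution and before the connection is made. The function names are hypothetical and the blocked categories are a starting point, not a complete policy (for example, carrier-grade NAT ranges and any internal ranges specific to your network are not covered). If the Transport uses a proxy, the hook sees the proxy’s address, not the final destination.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"syscall"
	"time"
)

// checkDialAddr rejects destinations that are loopback, private, link-local,
// multicast, or unspecified. address is the literal "ip:port" about to be dialed.
func checkDialAddr(address string) error {
	ap, err := netip.ParseAddrPort(address)
	if err != nil {
		return fmt.Errorf("unparseable dial address %q: %w", address, err)
	}
	ip := ap.Addr().Unmap() // treat ::ffff:10.0.0.1 as 10.0.0.1
	switch {
	case ip.IsLoopback(), ip.IsPrivate(), ip.IsLinkLocalUnicast(),
		ip.IsLinkLocalMulticast(), ip.IsMulticast(), ip.IsUnspecified():
		return fmt.Errorf("destination %s is not allowed", ip)
	}
	return nil
}

// newGuardedDialer returns a dialer whose Control hook validates every
// resolved address, including those reached through redirects.
func newGuardedDialer() *net.Dialer {
	return &net.Dialer{
		Timeout:   5 * time.Second,
		KeepAlive: 30 * time.Second,
		Control: func(network, address string, _ syscall.RawConn) error {
			return checkDialAddr(address)
		},
	}
}
```

To use it, set the Transport’s DialContext to newGuardedDialer().DialContext. Because the check runs on every new connection, a redirect to an internal address fails at dial time even if the original URL passed validation.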

Hardening layer 5: observability without leaking secrets

Telemetry is how you prove:

you’re within budgets
tools behave as expected
failures are localized
incidents can be diagnosed without “ssh and guess”

But logging is also where teams accidentally leak sensitive data.

OWASP’s logging guidance emphasizes logging that supports detection/response while avoiding sensitive data exposure. [11] Pair that with secrets management discipline. [12]

What to measure (minimum viable MCP telemetry)

Counters

tool_calls_total{tool, tenant, status}
auth_failures_total{reason}
rate_limited_total{tenant}

Histograms

tool_latency_seconds{tool}
outbound_latency_seconds{dependency}

Gauges

in_flight_tool_calls{tool}
queue_depth{tool}

Trace boundaries

Instrument:

request → tool routing
tool execution span
downstream calls span

OpenTelemetry’s Go docs show how to add instrumentation and emit traces/metrics. [13]

Logging rules that save you later

Use structured logging (JSON).
Add correlation IDs (trace IDs) to logs.
Redact:
- Authorization headers
- tokens
- cookies
- tool payload fields known to contain secrets
Log events, not raw payloads:
- “tool X called”
- “resource Y read”
- “write operation requested (dry_run=true)”
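
The redaction rule can be enforced centrally rather than at each call site. A minimal sketch using the standard library’s log/slog (Go 1.21+) and its ReplaceAttr hook; the key list is an illustrative assumption and only covers top-level attribute keys, so secrets embedded inside free-form strings still need to be kept out of logs by design.

```go
package main

import (
	"io"
	"log/slog"
	"strings"
)

// sensitiveKeys is an illustrative list; extend it with the tool payload
// fields you know carry secrets.
var sensitiveKeys = map[string]bool{
	"authorization": true,
	"cookie":        true,
	"token":         true,
	"api_key":       true,
}

// isSensitiveKey matches attribute keys case-insensitively.
func isSensitiveKey(key string) bool { return sensitiveKeys[strings.ToLower(key)] }

// redactAttr is a slog ReplaceAttr hook: it keeps the key (so the log line
// still shows the field was present) and replaces the value.
func redactAttr(_ []string, a slog.Attr) slog.Attr {
	if isSensitiveKey(a.Key) {
		return slog.String(a.Key, "[REDACTED]")
	}
	return a
}

// newRedactingLogger emits JSON lines with sensitive attributes masked.
func newRedactingLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{ReplaceAttr: redactAttr}))
}
```

With this in place, a call such as logger.Info("tool called", "tool", "search", "Authorization", header) still records that an Authorization value was present, without recording the value.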

Audit logs

For high-impact tools, write an append-only audit record:
- who (identity)
- what (tool + parameters summary)
- when
- result (success/failure)
- plan_id / idempotency_key

Audit logs should be treated as security data.
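
One way to keep a “parameters summary” useful without storing raw payloads is to record a hash of the arguments. The sketch below assumes JSON-serializable arguments; the AuditRecord field names are illustrative. It relies on encoding/json writing map keys in sorted order, so equal argument maps hash equally. Note that a plain hash of low-entropy values can be guessed by brute force, so treat it as a correlation aid, not as protection for secrets.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"time"
)

// AuditRecord follows the who/what/when/result fields of an audit entry.
type AuditRecord struct {
	Who      string    `json:"who"`
	Tool     string    `json:"tool"`
	ArgsHash string    `json:"args_hash"` // summary instead of raw parameters
	When     time.Time `json:"when"`
	Result   string    `json:"result"`
	PlanID   string    `json:"plan_id,omitempty"`
}

// hashArgs returns a hex SHA-256 over the JSON encoding of args.
func hashArgs(args map[string]any) (string, error) {
	b, err := json.Marshal(args) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// newAuditRecord builds a record stamped with the current UTC time.
func newAuditRecord(who, tool, result, planID string, args map[string]any) (AuditRecord, error) {
	h, err := hashArgs(args)
	if err != nil {
		return AuditRecord{}, err
	}
	return AuditRecord{
		Who: who, Tool: tool, ArgsHash: h,
		When: time.Now().UTC(), Result: result, PlanID: planID,
	}, nil
}
```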

Hardening layer 6: versioning and rollout discipline

MCP uses string-based version identifiers like YYYY-MM-DD to represent the last date of backwards-incompatible changes. [4]

That’s helpful, but it doesn’t solve the operational problem:

clients upgrade at different times
schema changes drift
hosts differ in which capabilities they support

Practical compatibility rules

Pin your server’s supported protocol version and expose it in health or diagnostics.
Add contract tests that run against:
- one “current” client
- one “previous” client version
Support additive changes first:
- new tools
- new optional fields
Use feature flags for risky tools.

Rollout like a platform team

Canaries for remote servers
“Shadow mode” for new tools (log what would happen)
Slow ramp with budget monitoring

A production checklist

If you’re building (or inheriting) an MCP server, run this checklist:

Safety

Tool contracts are structured (no free-form “do anything” strings).
Every tool has a safe default (dry_run=true, limit required, etc.).
Destructive tools require a plan/apply step (or explicit confirmation gates).
Tool inputs are validated and bounded (length, ranges, enums).

Identity & access

Remote transport requires authentication and per-tool authorization.
Tokens are short-lived and rotated; secrets are not in source control. [12]
Tenant identity is enforced at every access point (not “best effort”).

Budgets & resilience

HTTP server timeouts are configured. [6][7]
Outbound clients have timeouts and connection limits.
Rate limiting exists per tenant/identity. [9]
Concurrency caps exist per tool; overload behavior is explicit (fail fast / queue).
Retries are bounded and idempotent where side effects exist.

Networking

URL fetch tools have allowlists and SSRF protections. [10]
Redirect policies are explicit (disabled or re-validated).
Egress is constrained at the network layer (not only in code).

Observability

Metrics cover tool calls, latency, errors, and rate limiting.
Tracing exists across tool execution and downstream calls. [13]
Logs are structured, correlated, and redacted. [11]
Audit logging exists for high-impact tools.

Operations

Health checks and readiness checks exist.
Configuration is explicit and validated on startup.
Versioning strategy is documented and tested. [4]

References

[1] Model Context Protocol (MCP) Specification (version 2025-11-25): https://modelcontextprotocol.io/specification/2025-11-25
[2] MCP Architecture Overview (participants, transports, concepts): https://modelcontextprotocol.io/docs/learn/architecture
[3] MCP Transport details (Streamable HTTP transport overview): https://modelcontextprotocol.io/specification/2025-03-26/basic/transports
[4] MCP Versioning: https://modelcontextprotocol.io/specification/versioning
[5] JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification
[6] Go net/http package documentation: https://pkg.go.dev/net/http
[7] Cloudflare: “The complete guide to Go net/http timeouts”: https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/
[8] Go context package documentation: https://pkg.go.dev/context
[9] Go x/time/rate documentation: https://pkg.go.dev/golang.org/x/time/rate
[10] OWASP SSRF Prevention Cheat Sheet / SSRF category references:
[11] OWASP Logging Cheat Sheet (security-focused logging guidance): https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html
[12] Secrets management guidance:
- OWASP Secrets Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
- Kubernetes “Good practices for Kubernetes Secrets”: https://kubernetes.io/docs/concepts/security/secrets-good-practices/
[13] OpenTelemetry Go instrumentation docs: https://opentelemetry.io/docs/languages/go/instrumentation/

Agent Observability That Doesn't Lie

Sat, 20 Dec 2025 12:00:00 -0500

Why this matters

Most “agent observability” is either:

too shallow (a chat transcript and a couple logs), or
too noisy (every token logged, every tool payload stored, no signal)

Neither works in production.

If you’re serious about operating agents, you need observability that answers three questions quickly:

What happened? (forensics)
Why did it happen? (debuggability)
How often does it happen? (reliability)

OpenTelemetry exists to standardize how you instrument, generate, and export telemetry across traces, metrics, and logs. [1] W3C Trace Context defines how trace context propagates across service boundaries. [2]

Agents add two new requirements:

tool calls are part of your “distributed trace”
“decisioning” is a first-class component (not just business logic)

This article is a practical blueprint.

TL;DR

Instrument agents like distributed systems:
traces for causality (what triggered what)
metrics for health (p95 latency, error rates)
logs for human context (but redacted)
Propagate a single trace across:
agent runtime -> MCP gateway -> MCP tool servers -> upstream APIs
Capture decision summaries, not chain-of-thought.
Treat cost as a production signal: emit per-run and per-tool cost metrics.
Use semantic conventions where possible to keep telemetry queryable. [3]
Don’t turn observability into a data breach: OWASP highlights sensitive info disclosure and prompt injection as key risks. [7]

What to observe in an agent system
A trace model for agents
Metrics that matter
Logs and redaction
Audit events vs debug logs
Dashboards and alerts
A production checklist
References

What to observe in an agent system

Agents have four observable subsystems:

Planner/Reasoner (creates the plan, chooses tools)
Tool execution (calls MCP tools and interprets results)
Memory/state (what was stored or retrieved)
Policy/budget (what was allowed or blocked)

If you only observe #2, you’ll miss why the agent chose the wrong tool. If you only observe #1, you’ll miss production failures.

You need the full chain.

A trace model for agents

The core idea

A single “agent run” is a distributed trace that spans:

model calls
tool calls
downstream system calls

Use W3C Trace Context (traceparent, tracestate) to propagate the trace across boundaries. [2]
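
As a rough illustration of what propagation means at each boundary, the sketch below builds and forwards a traceparent value by hand. In practice an OpenTelemetry SDK propagator does this for you; the function names here are hypothetical and the code only covers version 00 of the header.

```python
import re
import secrets

# W3C Trace Context, version "00": 16-byte trace-id, 8-byte parent-id and
# 1-byte flags, all lowercase hex, joined by "-".
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace, e.g. at the root agent.run span."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def child_traceparent(parent: str) -> str:
    """Keep the trace-id and flags, mint a new span id for the outgoing call."""
    match = _TRACEPARENT.match(parent)
    if match is None or set(match.group(1)) == {"0"} or set(match.group(2)) == {"0"}:
        # Unparseable or all-zero ids: this sketch starts a fresh trace
        # instead of forwarding a broken header.
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

The agent runtime would call new_traceparent() once per run, and each hop (gateway, tool server) would send child_traceparent(incoming) on its own outbound requests so every span shares one trace ID.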

Suggested spans (minimum viable)

Root span

agent.run
attributes: agent.name, tenant, user, session, goal_hash

Planner

agent.plan
attributes: planner.model, plan.step_count

Model calls

llm.call
attributes: model, prompt_tokens, completion_tokens, latency_ms

Tool selection

agent.tool_select
attributes: selector.version, candidate_count, selected_count

Tool call

tool.call
attributes: tool.name, tool.class (read/write/danger), tool.server, status

Policy

policy.check
attributes: policy.rule_id, decision (allow/deny), reason_code

Memory

memory.read / memory.write
attributes: store, keys, bytes

Why spans > logs

Spans give you causality:

which tool call caused a failure
which step blew the budget
which upstream dependency was slow

With OpenTelemetry, you can emit traces and metrics using the same SDK approach. [1][4]

Metrics that matter

Tool health metrics

tool_calls_total{tool,status}
tool_latency_ms_bucket{tool}
tool_timeouts_total{tool}
tool_retries_total{tool}

Agent run health metrics

agent_runs_total{status}
agent_run_latency_ms_bucket{agent}
agent_steps_total_bucket{agent}

Cost metrics (treat cost like reliability)

llm_tokens_total{model,type=prompt|completion}
llm_cost_usd_total{model}
run_cost_usd_bucket{agent}

Policy metrics

policy_denied_total{rule_id}
danger_tool_attempt_total{tool}

Semantic conventions help your metrics stay queryable and consistent across systems. OpenTelemetry documents semantic conventions for HTTP spans/metrics, for example. [3][5]
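
A minimal sketch of how the token and cost counters above could be fed from a single model call. The price table, model name, and in-memory dictionaries are assumptions for illustration; a real service would use its metrics SDK's counters and its provider's actual rates.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's rates.
PRICE_PER_1K = {
    "small-model": {"prompt": 0.0005, "completion": 0.0015},
}

# In-memory stand-ins for llm_tokens_total{model,type} and llm_cost_usd_total{model}.
llm_tokens_total = defaultdict(int)
llm_cost_usd_total = defaultdict(float)


def record_llm_call(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Update both counters for one model call and return the call's cost in USD."""
    prices = PRICE_PER_1K[model]
    cost = (
        prompt_tokens * prices["prompt"] + completion_tokens * prices["completion"]
    ) / 1000
    llm_tokens_total[(model, "prompt")] += prompt_tokens
    llm_tokens_total[(model, "completion")] += completion_tokens
    llm_cost_usd_total[model] += cost
    return cost
```

Summing the returned costs over a run gives the value you would observe into the run_cost_usd histogram.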

Logs and redaction

Logs should add human context, not become a data lake of secrets.

Rules I like:

Do not log prompts by default.
Do not log tool payloads by default.
Log summaries and hashes:
goal_hash, plan_hash, tool_args_hash
Log structured error reasons:
validation_error, upstream_rate_limited, auth_failed, policy_denied
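
One way to log a hash instead of a payload, sketched under the assumption that tool arguments are JSON-serializable. The record layout and function names are hypothetical.

```python
import hashlib
import json
from typing import Optional


def stable_hash(value: object) -> str:
    """SHA-256 over canonical JSON, truncated to keep log lines readable."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def tool_call_log_record(tool: str, args: dict, error_reason: Optional[str] = None) -> dict:
    """Build a log record that carries the shape of the call, never its values."""
    return {
        "event": "tool.call",
        "tool": tool,
        "tool_args_hash": stable_hash(args),
        "tool_arg_keys": sorted(args),
        "error_reason": error_reason,  # e.g. "upstream_rate_limited"
    }
```

The hash lets you correlate repeated calls without storing arguments. A plain hash of a low-entropy value (an email address, a short ID) can still be guessed by brute force, so a keyed hash (HMAC with a secret held outside the logs) is worth considering when arguments are that predictable.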

For agent systems, OWASP highlights sensitive information disclosure and insecure output handling. Logging is one of the easiest ways to accidentally create both. [7]

“Debug mode” that isn’t dangerous

If you must support deeper logs:

only enable per tenant/user for a limited window
auto-expire
redact aggressively
never store raw secrets

Audit events vs debug logs

Treat them as different products:

Audit events (for governance)

immutable-ish records of side effects
minimal sensitive data
always on
long retention

Example audit fields:

who: tenant/user/client
what: tool + action class (create/update/delete)
when: timestamp
where: environment
result: success/failure
resource IDs (safe identifiers)
idempotency keys / plan IDs

Debug logs (for engineers)

short retention
more context
highly controlled access

Mixing these two is how you end up with “SharePoint logs full of PII” that no one wants to touch.

Dashboards and alerts

Dashboards (start simple)

Tool reliability

top tools by error rate
top tools by p95 latency
timeouts per tool

Agent success

success rate by agent type
“stuck runs” (runs exceeding max duration)
average steps per run

Cost

cost per run
cost per tenant
top drivers (which tools/model calls)

Alerts (avoid noise)

Alert on what is actionable:

tool error rate spikes for critical tools
tool latency p95 spikes beyond SLO
budget exceeded spike (runaway behavior)
policy denied spike (possible prompt injection attempt)

If you use SLOs and error budgets, Google’s SRE material is a practical reference for turning SLOs into alerting strategies. [6]
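
As a sketch of the burn-rate idea from that material, applied to a tool's error rate: page only when both a long and a short window are burning error budget fast. The 14.4 threshold comes from the workbook's example of spending 2% of a 30-day budget in one hour; the function names and default SLO are assumptions.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are being spent."""
    return error_ratio / (1.0 - slo_target)


def should_page(
    long_window_error_ratio: float,   # e.g. tool error ratio over the last 1h
    short_window_error_ratio: float,  # e.g. tool error ratio over the last 5m
    slo_target: float = 0.999,
    threshold: float = 14.4,
) -> bool:
    """Require both windows to exceed the threshold so stale spikes do not page."""
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )
```

The short window makes the alert reset quickly once the tool recovers, which is most of what keeps this kind of alert actionable.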

A production checklist

Tracing

Every agent run has a trace ID.
Trace context propagates across MCP boundaries (W3C Trace Context). [2]
Tool calls are spans with stable tool identifiers.

Metrics

Tool success/error/latency metrics exist.
Agent run success/latency/steps metrics exist.
Cost metrics exist and are monitored.

Logging

Default logs are redacted summaries, not raw payloads.
Debug logging is time-bounded and access-controlled.

Audit

Audit events exist for all side-effecting tools.
Audit records include “who/what/when/result” without leaking secrets.

Security

Observability does not become a secret exfil path (OWASP risks considered). [7]

References

[1] OpenTelemetry - Documentation (overview): https://opentelemetry.io/docs/
[2] W3C - Trace Context: https://www.w3.org/TR/trace-context/
[3] OpenTelemetry - Semantic conventions for HTTP (spans/metrics/logs): https://opentelemetry.io/docs/specs/semconv/http/
[4] OpenTelemetry Go - Instrumentation docs: https://opentelemetry.io/docs/languages/go/instrumentation/
[5] OpenTelemetry - Semantic conventions for HTTP metrics: https://opentelemetry.io/docs/specs/semconv/http/http-metrics/
[6] Google SRE Workbook - Alerting on SLOs: https://sre.google/workbook/alerting-on-slos/
[7] OWASP - Top 10 for Large Language Model Applications: https://owasp.org/www-project-top-10-for-large-language-model-applications/

Cost Is a Reliability Problem

Sat, 13 Dec 2025 12:00:00 -0500

Why this matters

Traditional reliability focuses on uptime. AI systems add a second axis:

Your system can be “up” while your budget is on fire.

A runaway agent doesn’t always crash services. Sometimes it:

loops tool calls
retries incorrectly
escalates to larger models repeatedly
expands context windows unnecessarily
performs expensive searches without stopping

The result: surprise bills, throttling, and eventually hard outages when quotas are hit.

Google’s SRE framing around error budgets is a useful mental model: budgets create a control mechanism that balances stability with velocity. [1][2] FinOps frames cost management as a collaboration practice between engineering, finance, and business. [3]

This article is the practical bridge: use budgets and guardrails like you would for reliability.

TL;DR

Treat cost as an SLO: define acceptable spend per run / per tenant / per day.
Enforce budgets at multiple layers:
per request/run
per tool
per tenant
per environment
Use hard limits + soft limits:
soft: degrade model/tool choices
hard: stop the run and ask for approval
Add cost circuit breakers:
abort on runaway loops
quarantine tools causing repeated retries
Make cost visible (metrics + dashboards) so teams can improve it.
Align with FinOps: shared accountability, not “billing surprises.” [3]

Cost failure modes in agent systems
Define cost SLOs and budgets
Budget layers: run, tool, tenant, environment
Soft limits vs hard limits
Circuit breakers for runaway behavior
Cost-aware tool and model selection
Dashboards and alerts
A production checklist
References

Cost failure modes in agent systems

1) Infinite or long loops

Common triggers:

ambiguous tool outputs
brittle parsing
“try again” reflexes
non-idempotent retries

2) Tool spam

Agents sometimes “search until confident.” If you don’t cap it, you get 20+ tool calls on a single request.

3) Model escalation cascades

If your policy says “if uncertain, use a better model,” you can create a cost escalator:

cheap model -> “uncertain” -> expensive model
expensive model -> still uncertain -> more calls

4) Context growth

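If you keep appending tool outputs to the prompt, every later model call re-sends that history, so cumulative token cost grows superlinearly with the number of steps (roughly quadratically when outputs are similar in size), and performance can degrade.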
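
To make the growth concrete, here is a small worked calculation. It assumes a fixed base prompt and a fixed number of tokens added per tool output, which real runs will not match exactly; the numbers are illustrative only.

```python
def cumulative_prompt_tokens(steps: int, base_tokens: int, tokens_per_tool_output: int) -> int:
    """Total prompt tokens billed across a run when each step re-sends the full history."""
    total = 0
    context = base_tokens
    for _ in range(steps):
        total += context                   # this call pays for everything so far
        context += tokens_per_tool_output  # the next call carries one more tool output
    return total


ten_steps = cumulative_prompt_tokens(10, base_tokens=1000, tokens_per_tool_output=500)
twenty_steps = cumulative_prompt_tokens(20, base_tokens=1000, tokens_per_tool_output=500)
# ten_steps == 32_500 and twenty_steps == 115_000
```

Doubling the step count from 10 to 20 multiplies prompt tokens by about 3.5 in this example, which is why summarizing or truncating tool outputs matters more as runs get longer.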

5) External quotas become outages

Even if cost is acceptable, external services (email APIs, GitHub, calendars) can rate limit you. Cost and reliability are coupled.

Define cost SLOs and budgets

Start with simple “production truths”:

How much is one agent run allowed to cost?
What is an acceptable daily spend per tenant?
What is the max “blast radius” of a single request?

This maps cleanly to SRE’s error budget concept: budgets constrain unsafe behavior while preserving velocity. [2]

Example cost SLOs (pragmatic)

Per run: <= $0.10 (p95), <= $0.50 (max)
Per tenant/day: <= $50/day
Per user/day: <= $5/day
Expensive tool calls: <= 3 per run

These aren’t universal. They’re explicit. That’s what matters.

Budget layers: run, tool, tenant, environment

1) Per-run budget

Tracks:

max model tokens
max tool calls
max wall-clock time
max “expensive operations” count

This is the most important budget: it is where you stop runaway behavior early.
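
A minimal sketch of a per-run budget object that tracks the four quantities listed above. The class name and every default limit are assumptions, not recommendations.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a run crosses one of its hard per-run limits."""


@dataclass
class RunBudget:
    # All default limits are illustrative assumptions.
    max_tokens: int = 50_000
    max_tool_calls: int = 20
    max_seconds: float = 120.0
    max_expensive_ops: int = 3
    tokens: int = 0
    tool_calls: int = 0
    expensive_ops: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def exceeded(self) -> list:
        """Names of every limit the run has crossed so far."""
        usage = {
            "tokens": (self.tokens, self.max_tokens),
            "tool_calls": (self.tool_calls, self.max_tool_calls),
            "expensive_ops": (self.expensive_ops, self.max_expensive_ops),
            "seconds": (time.monotonic() - self.started_at, self.max_seconds),
        }
        return [name for name, (used, limit) in usage.items() if used > limit]

    def charge(self, tokens: int = 0, tool_calls: int = 0, expensive_ops: int = 0) -> None:
        """Record usage, then raise if the run is now over any limit."""
        self.tokens += tokens
        self.tool_calls += tool_calls
        self.expensive_ops += expensive_ops
        over = self.exceeded()
        if over:
            raise BudgetExceeded(", ".join(over))
```

The agent loop would create one RunBudget per run and call charge() around every model and tool call, so a runaway run fails at the step that crosses the limit rather than at the end.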

2) Per-tool budget

Some tools are inherently expensive:

large searches
long-running jobs
heavy data exports

Budget these separately:

max calls
max payload size
max time range

3) Per-tenant budget

Without this, your best customers can melt your infra.

Per-tenant limits:

requests/min
concurrent runs
daily cost cap

4) Per-environment budget

Environments have different rules:

dev: cheap, permissive, more logging
prod: bounded, gated, auditable

This is where you implement “read-only mode” during incidents.

Soft limits vs hard limits

Soft limits (degrade gracefully)

When approaching budget:

switch to cheaper models
reduce context size (summarize)
narrow tool search range
skip non-essential steps

Hard limits (stop the run)

When budget is exceeded:

stop tool calls
stop escalation
request user confirmation / approval
produce a partial answer with an explanation

This is exactly the “control mechanism” idea behind error budgets: it gives the system permission to shift focus when constraints are exceeded. [1]
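
The soft/hard split can be sketched as a single decision function. The 80% soft threshold and the names are assumptions chosen for illustration.

```python
from enum import Enum


class BudgetAction(Enum):
    CONTINUE = "continue"
    DEGRADE = "degrade"  # soft limit: cheaper model, smaller context, narrower search
    STOP = "stop"        # hard limit: stop, return a partial answer, ask for approval


def budget_action(spent_usd: float, hard_limit_usd: float, soft_fraction: float = 0.8) -> BudgetAction:
    """Map current spend to the action the agent loop should take next."""
    if spent_usd >= hard_limit_usd:
        return BudgetAction.STOP
    if spent_usd >= soft_fraction * hard_limit_usd:
        return BudgetAction.DEGRADE
    return BudgetAction.CONTINUE
```

Checking this before every step, rather than after the run, is what lets the soft limit change behavior while there is still budget left to finish.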

Circuit breakers for runaway behavior

Add circuit breakers that detect “this is going bad”:

loop detector: same tool called with similar args repeatedly
retry storm: high retry count for a tool within a run
no progress: plan step count increases without new evidence
latency breaker: tool p95 spikes beyond threshold

When triggered:

stop the run
quarantine the tool for this run
degrade to safe alternatives
emit high-signal telemetry
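
A minimal loop detector, as a sketch. The text above talks about similar arguments; this version is narrower and only catches identical arguments after canonicalization, and the repeat threshold is an assumption.

```python
import hashlib
import json
from collections import Counter


class LoopDetector:
    """Flags a run when the same tool is called with the same arguments too often."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._seen = Counter()

    def record(self, tool: str, args: dict) -> bool:
        """Record one call; return True when it should trip the breaker."""
        canonical = json.dumps(args, sort_keys=True, default=str)
        key = (tool, hashlib.sha256(canonical.encode("utf-8")).hexdigest())
        self._seen[key] += 1
        return self._seen[key] > self.max_repeats
```

One detector instance per run keeps the state bounded and makes "quarantine the tool for this run" a local decision.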

Cost-aware tool and model selection

Cost control is easier if it’s designed into selection:

Rank tools with a “cost weight” (latency + upstream cost + risk)
Prefer read-only tools unless a write is required
Use caches for common retrieval results
Use deterministic summarization boundaries for tool outputs

If you already implement a tool selector (see “Million Tool Problem”), cost becomes another rerank feature.
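
One possible shape for that rerank feature, sketched with made-up normalization constants. The field names, the 2,000 ms and $0.05 ceilings, and the penalty weight are all assumptions to be tuned against your own data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCandidate:
    name: str
    relevance: float         # 0..1, from the existing selector
    p95_latency_ms: float
    upstream_cost_usd: float
    risk: float              # 0 = read-only, 1 = dangerous


def cost_weight(tool: ToolCandidate) -> float:
    """Blend latency, upstream cost, and risk into a single 0..1 weight."""
    latency = min(tool.p95_latency_ms / 2000.0, 1.0)
    cost = min(tool.upstream_cost_usd / 0.05, 1.0)
    return (latency + cost + tool.risk) / 3.0


def rerank(candidates: list, cost_penalty: float = 0.5) -> list:
    """Order candidates by relevance minus a cost penalty, best first."""
    return sorted(
        candidates,
        key=lambda tool: tool.relevance - cost_penalty * cost_weight(tool),
        reverse=True,
    )
```

With cost_penalty set to 0 this reduces to plain relevance ranking, which makes it easy to roll out behind a flag and compare.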

Dashboards and alerts

This is where FinOps and SRE meet: cost is an operational signal.

Dashboards

spend/day by tenant
cost per run distribution
top cost drivers (tools and models)
runaway breaker triggers

Alerts

daily spend exceeded
sudden spend spikes (slope alerts)
high frequency of loop breaker events
high fraction of runs hitting hard limits

AWS’s Well-Architected Cost Optimization pillar frames cost optimization as a continual process across the workload lifecycle. That mindset applies here too. [4]

A production checklist

Budgets

Per-run cost and tool-call budgets exist.
Per-tenant daily caps exist.
Per-tool “expensive operation” caps exist.

Enforcement

Soft limits degrade gracefully (cheaper models, narrower queries).
Hard limits stop and request approval.
Circuit breakers detect loops/retry storms.

Telemetry

Cost metrics emitted per run and per tenant.
Breaker events recorded and alertable.

Culture

Cost management is a shared practice (FinOps), not a surprise invoice. [3]

References

[1] Google SRE Workbook - Example Error Budget Policy: https://sre.google/workbook/error-budget-policy/
[2] Google SRE Book - Embracing Risk (error budgets as control mechanism): https://sre.google/sre-book/embracing-risk/
[3] FinOps Foundation - What is FinOps? (definition and principles): https://www.finops.org/introduction/what-is-finops/
[4] AWS Well-Architected Framework - Cost Optimization pillar: https://docs.aws.amazon.com/wellarchitected/latest/framework/cost-optimization.html

From Stdio to Enterprise: The MCP Gateway Pattern

Sat, 22 Nov 2025 12:00:00 -0500

As-of note: MCP evolves quickly. This article references the MCP spec revision 2025-11-25. Validate details against the current spec before shipping changes. [1][2][3]

Why this matters

Local MCP servers over stdio are an amazing developer experience: you install a tool server, the host (Claude Desktop / Claude Code / an agent runtime) launches it, and you’re productive in minutes. [2]

But as soon as MCP becomes shared infrastructure - multiple clients, multiple users, multiple environments - the “local tool server” model runs into the same constraints every integration layer hits:

Who is allowed to call what tool?
How do you prevent one noisy user from melting shared dependencies?
How do you audit tool side effects?
How do you roll out tool changes without breaking clients?
How do you keep secrets out of prompts, logs, and screenshots?

This is where the MCP Gateway Pattern shows up.

A gateway is not “another service.” It’s a capability boundary: the place where you enforce policy, budgets, and observability for tool use at scale.

TL;DR

Stdio is great for local, single-user, low-blast-radius setups.
HTTP transports (Streamable HTTP) enable multi-client servers - but they also require real auth and multi-tenant safety. [2][3]
An MCP gateway sits between clients and tool servers to provide:
authentication & authorization
tenant isolation
rate limits / concurrency / cost budgets
consistent tool schemas + safety gates
audit logs and observability
routing, versioning, rollout controls
Build the gateway to be boring: small surface area, strict validation, explicit policies, great telemetry.

When stdio stops being enough
The MCP Gateway Pattern
Responsibilities of a gateway
Reference architecture
Policy patterns that actually work
Scaling and isolation strategies
Observability and audit
Rollouts and versioning
A production checklist
References

When stdio stops being enough

MCP supports multiple transports; stdio is common for local servers. [2] In that model, the host controls process lifetime and secrets typically come from the environment on the local machine.

Stdio starts to strain when you need:

multi-client concurrency
shared tenancy
central policy enforcement
centralized audit
fleet-level rollout controls

At that point, you’re effectively building a platform. The platform needs a stable ingress point with consistent security and operational behavior.

MCP’s HTTP-based transports (like Streamable HTTP) are designed for servers that can handle multiple connections and enable streaming/notifications. [2] MCP also defines an authorization flow for HTTP-based transports. [3]

That’s the entry point for a gateway.

The MCP Gateway Pattern

Definition: An MCP gateway is an MCP server (or MCP-adjacent ingress layer) that:

authenticates and authorizes the client
routes requests to one or more downstream MCP servers (or tool backends)
enforces budgets and safety gates
emits consistent telemetry and audit records

It looks like an API gateway, but the payload is “tool capability” not “REST endpoints.”

Responsibilities of a gateway

1) Authentication and authorization

If you expose MCP servers over HTTP, you need strong auth. MCP includes an authorization framework at the transport layer for HTTP-based transports. [3]

Practical gateway rules:

Authenticate every client (bearer tokens, mTLS, OAuth-derived access tokens).
Authorize per tool, not per server.
Prefer least privilege scopes:
calendar.read
calendar.write
email.read
email.send
k8s.readonly
k8s.apply
For high-impact tools: require explicit confirmation tokens and/or multi-party approval.
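
Per-tool authorization can be as small as a deny-by-default lookup. The scope names below come from the list above; the tool names and the mapping itself are hypothetical.

```python
# Hypothetical mapping from tool name to the scopes a caller must hold.
TOOL_REQUIRED_SCOPES = {
    "calendar.list_events": {"calendar.read"},
    "calendar.create_event": {"calendar.write"},
    "email.send_message": {"email.send"},
    "k8s.apply_manifest": {"k8s.apply"},
}


def authorize_tool_call(tool: str, granted_scopes: set) -> bool:
    """Deny by default: unknown tools and missing scopes are both rejected."""
    required = TOOL_REQUIRED_SCOPES.get(tool)
    if required is None:
        return False
    return required <= set(granted_scopes)
```

Keeping the mapping at the gateway means a newly registered tool is unreachable until someone deliberately assigns it a scope.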

2) Tool contract enforcement

MCP tools are invoked by an LLM-driven client. That means tool arguments are untrusted.

The gateway is the ideal place to enforce:

schema validation
payload size caps
allowlists and blocklists
“danger gates” (preview/apply, confirmations)
“semantic validation” (not just types - e.g., limits required, date ranges bounded)

MCP’s spec is grounded in structured schemas; treat those schemas as contracts. [1]
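
Semantic validation goes beyond what a type schema checks. The sketch below validates a hypothetical search tool's arguments; the field names (limit, start, end) and every bound are assumptions.

```python
import json
from datetime import date, timedelta

MAX_PAYLOAD_BYTES = 16_384
MAX_LIMIT = 100
MAX_RANGE = timedelta(days=31)


def validate_search_args(args: dict) -> list:
    """Return a list of violations; an empty list means the call may proceed."""
    errors = []
    if len(json.dumps(args).encode("utf-8")) > MAX_PAYLOAD_BYTES:
        errors.append("payload_too_large")
    limit = args.get("limit")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(limit, int) or isinstance(limit, bool) or not 1 <= limit <= MAX_LIMIT:
        errors.append("limit_required_and_bounded")
    try:
        start = date.fromisoformat(args["start"])
        end = date.fromisoformat(args["end"])
    except (KeyError, TypeError, ValueError):
        errors.append("date_range_required")
    else:
        if end < start or end - start > MAX_RANGE:
            errors.append("date_range_out_of_bounds")
    return errors
```

Returning stable reason codes rather than free-form messages keeps rejections countable in metrics and safe to log.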

3) Budgets and backpressure

Agents can trigger bursty tool calls. Without backpressure you get the classic cascade:

upstream rate limits
DB pool exhaustion
thread/goroutine explosion
timeouts everywhere

At the gateway you can enforce:

per-tenant rate limits
per-tool concurrency limits
timeouts and deadline propagation
queue depth caps (bounded memory)
circuit breakers for flaky dependencies

This is where you keep “one user spamming tools” from becoming “everyone is down.”
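
Per-tenant rate limiting is commonly done with a token bucket. This is a single-process, non-thread-safe sketch with an injectable clock so it can be tested; a shared gateway would need locking or an external store, and the class name is an assumption.

```python
import time


class TokenBucket:
    """Allows `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._updated = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        # Reject immediately; queueing here without a bound defeats backpressure.
        return False
```

Keeping one bucket per tenant (for example in a dictionary keyed by tenant ID) is what turns this into isolation rather than a global throttle.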

4) Secret handling and redaction

Gateways are a natural place to centralize:

secret injection (short-lived tokens per tenant)
output redaction (strip tokens, emails, PII fields)
logging policies (never log raw tool payloads by default)

For agent systems, OWASP highlights risks like prompt injection and sensitive info disclosure as major categories. [7]

Your gateway should assume that anything returned by a tool could be coerced into exfiltration if you’re careless.
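
A rough sketch of output redaction at the gateway. The key list and the two patterns are assumptions and far from exhaustive; pattern-based redaction is a safety net, not a substitute for keeping secrets out of tool outputs in the first place.

```python
import re

SENSITIVE_KEYS = {"password", "token", "access_token", "authorization", "api_key", "secret"}
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
BEARER = re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+")


def redact(value):
    """Recursively mask sensitive keys and token/email-looking strings."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", BEARER.sub("[TOKEN]", value))
    return value
```

Applying the same function to tool outputs and to anything headed for logs keeps the two redaction policies from drifting apart.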

5) Observability and audit

Operationally, the gateway is your best place to emit consistent:

request logs
tool call metrics
traces across tool chains
audit events for side effects

OpenTelemetry is the de facto standard for collecting and exporting telemetry. [5] W3C Trace Context defines headers like traceparent/tracestate for trace propagation across services. [6]

If you want an enterprise to trust agents, you need the forensic trail.

6) Routing and discovery at scale

The gateway becomes:

the routing table (“tool X lives in cluster Y”)
the discovery system (“list tools available for tenant Z”)
the version broker (“tool schema v3 for client A, v4 for client B”)

This is also where you can implement “tool quality” policies:

quarantine tools with high error rates
fallback to read-only alternatives
degrade gracefully under partial outages

Reference architecture

Here’s a simple, effective gateway architecture:

+--------------------------------+
|  Agent host / IDE / runtime    |
|  (MCP client)                  |
+--------------------------------+
                |  Streamable HTTP / JSON-RPC [2][4]
                v
+------------------------------------------------+
|  MCP Gateway                                   |
|  - AuthN/Z [3]                                 |
|  - Schema + safety gates                       |
|  - Budgets (rate, concurrency, cost)           |
|  - Audit + telemetry (OTel) [5][6]             |
|  - Routing + tool registry                     |
+------------------------------------------------+
                |
        +-------+----------------+
        v                        v
+---------------+      +-------------------+
|  MCP Server A |      |  MCP Server B     |
|  (calendar)   |      |  (k8s, github...) |
+---------------+      +-------------------+
        |                        |
        v                        v
  Upstream APIs            Upstream APIs

Key design decision: the gateway should not contain business logic. It enforces policy and routes tool calls. Tool semantics live in tool servers.

Policy patterns that actually work

Pattern: Read vs write tool classes

Classify tools into tiers:

Read-only: listing, searching, fetching
Write-safe: creates/updates that are naturally reversible
Dangerous: deletes, bulk updates, destructive actions, privileged ops

Then enforce different rules per tier:

Read-only: wide availability, higher concurrency
Write-safe: lower concurrency, stronger audit, idempotency keys
Dangerous: preview/apply, explicit confirmations, restricted scopes

Pattern: Preview -> Apply

For any tool that can cause harm:

plan_* returns a plan + summary + plan_id
apply_* requires plan_id (and optionally a user confirmation token)

This is the “terraform plan/apply” mental model applied to tools.
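
A minimal in-memory sketch of the plan/apply split for a hypothetical delete tool. The class and method names are assumptions; a real gateway would persist plans with a TTL and record an audit event on apply.

```python
import secrets


class PlanStore:
    """Holds previewed plans until they are applied exactly once."""

    def __init__(self) -> None:
        self._plans = {}

    def plan_delete(self, tenant: str, resource_ids: list) -> dict:
        """Preview step: record what would be deleted and return a plan_id."""
        plan_id = secrets.token_urlsafe(16)
        self._plans[plan_id] = {"tenant": tenant, "resource_ids": list(resource_ids)}
        return {"plan_id": plan_id, "summary": f"delete {len(resource_ids)} resource(s)"}

    def apply_delete(self, tenant: str, plan_id: str) -> list:
        """Apply step: only the stored plan is executed, never fresh arguments."""
        plan = self._plans.get(plan_id)
        if plan is None or plan["tenant"] != tenant:
            raise PermissionError("unknown, already used, or foreign plan_id")
        del self._plans[plan_id]  # single use
        return plan["resource_ids"]  # the caller deletes exactly these
```

Because apply takes only a plan_id, the model cannot change the target set between the preview a human saw and the action that runs.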

Pattern: Allowlisted egress (SSRF containment)

If tools can fetch URLs or call arbitrary endpoints, treat it as SSRF risk. OWASP’s SSRF prevention guidance is a useful baseline. [8]

At the gateway, enforce:

allowlisted domains
deny rules for internal and cloud metadata IP/CIDR ranges (e.g., 169.254.169.254)
redirect re-validation
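
The three checks above can be sketched with the standard library alone. The allowlist contents are hypothetical, and DNS resolution is left to the caller so the example stays offline: pass in the addresses the hostname actually resolved to, and run the same check again on every redirect target.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.github.com", "www.googleapis.com"}  # hypothetical allowlist


def is_blocked_ip(address: str) -> bool:
    """True for private, loopback, link-local (cloud metadata) and similar ranges."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return True  # not a valid IP at all: refuse rather than guess
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def egress_allowed(url: str, resolved_ips: list) -> bool:
    """Allow only https URLs to allowlisted hosts that resolve to public addresses."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or host not in ALLOWED_HOSTS:
        return False
    # Checking resolved addresses stops an allowlisted name from being
    # pointed at an internal address.
    return bool(resolved_ips) and not any(is_blocked_ip(ip) for ip in resolved_ips)
```

The connection should then be made to one of the addresses that was checked, otherwise a second DNS lookup can return something different.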

Pattern: Tenant-bound tokens

Instead of giving tool servers “global” credentials, mint tenant-scoped tokens and inject them for each call.

reduces blast radius
makes audit meaningful
enables “kill switch” revocation per tenant

Scaling and isolation strategies

A gateway is where multi-tenancy becomes real. Choose an isolation model:

Option A: Process isolation per tool server (simple, strong isolation)

each integration is its own process/container
faults stay contained
rollouts per integration are easy

Tradeoff: more processes to manage.

Option B: Shared server with strong tenant sandboxing

single multi-tenant server handles many clients
cheaper to run
requires rigorous isolation inside the process

Tradeoff: higher risk if a bug leaks across tenants.

Option C: Hybrid

“sensitive” integrations are isolated
“low-risk” read-only tools can be multi-tenant

Most enterprises end up here.

Observability and audit

What to emit (minimum viable)

Metrics

tool_calls_total{tool, tenant, status}
tool_latency_ms{tool}
rate_limited_total{tenant}
budget_exceeded_total{tenant, budget_type}

Traces

request span (client -> gateway)
tool execution span (gateway -> server)
downstream spans (server -> upstream API)

Audit events

who (tenant/user/client)
what (tool + summarized parameters)
when
result (success/failure)
side effect IDs (resource IDs, plan_id, idempotency_key)

OpenTelemetry’s Go docs are a good reference for instrumentation patterns. [5]

Rollouts and versioning

Tool contracts drift. Clients upgrade at different times. Gateways can reduce pain by:

pinning tool schema versions per client
supporting additive changes first (new fields optional)
allowing parallel tool versions for a period
enabling canary rollouts per tenant

If you do nothing else: never deploy a breaking tool change to 100% of tenants at once.
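
Per-tenant canaries need a stable way to decide who is in the cohort. A common approach, sketched here with hypothetical function names, is to hash the tenant ID into a bucket from 0 to 99 and compare it with the rollout percentage.

```python
import hashlib


def in_canary(tenant_id: str, tool: str, percent: int) -> bool:
    """Deterministically place a tenant in the canary cohort for one tool rollout."""
    digest = hashlib.sha256(f"{tool}:{tenant_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def schema_version_for(tenant_id: str, tool: str, stable: str, canary: str, percent: int) -> str:
    """Pick which tool schema version this tenant should be served."""
    return canary if in_canary(tenant_id, tool, percent) else stable
```

Because the bucket is fixed per tenant and tool, raising the percentage only adds tenants to the cohort, and including the tool name keeps the same tenants from being first in line for every rollout.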

A production checklist

Security

AuthN required for all HTTP-based access. [3]
AuthZ enforced per tool (least privilege).
Tool inputs validated and bounded.
Dangerous tools require preview/apply and explicit confirmations.
Egress allowlists exist for URL/network tools. [8]

Reliability

Per-tenant rate limiting and per-tool concurrency caps.
Timeouts everywhere; deadlines propagate.
Bounded queues (no unbounded memory growth).
Circuit breakers for flaky dependencies.

Operability

Traces propagate end-to-end (W3C Trace Context). [6]
Metrics and logs are consistent and redacted.
Audit events exist for side effects.

Delivery

Tool schemas versioned; canary rollouts supported.
Quarantine and fallback policies exist for failing tools.

References

[1] Model Context Protocol (MCP) - Specification (Protocol Revision 2025-11-25): https://modelcontextprotocol.io/specification/2025-11-25
[2] MCP - Transports (including Streamable HTTP): https://modelcontextprotocol.io/specification/2025-03-26/basic/transports
[3] MCP - Authorization (HTTP-based transports): https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization
[4] JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification
[5] OpenTelemetry Go - Instrumentation docs: https://opentelemetry.io/docs/languages/go/instrumentation/
[6] W3C - Trace Context: https://www.w3.org/TR/trace-context/
[7] OWASP - Top 10 for Large Language Model Applications: https://owasp.org/www-project-top-10-for-large-language-model-applications/
[8] OWASP - SSRF Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html

The Service Template That Prevents Incidents

Sat, 25 Oct 2025 12:00:00 -0500

Why this matters

Most enterprises try to standardize software delivery with:

PDFs
Confluence pages
slide decks
architecture review boards

It doesn’t scale.

Teams don’t move faster because the rules exist. Teams move faster because the defaults exist.

Platform engineering language captures this well: paved roads / golden paths reduce cognitive load and make the “right way” the easy way. [1][2] The CNCF Platforms White Paper makes the case for internal platforms as a lever that impacts value streams indirectly - through better flow and developer experience. [3]

This article is a practical blueprint for the thing that actually changes outcomes:

A service template that bakes reliability, security, and operability into day-one defaults.

TL;DR

Build one paved road for APIs:
repo template + CI pipeline + runtime defaults
Include “boring” but critical capabilities:
health probes, resource requests/limits, disruption budgets [4][5][6]
tracing/metrics/logging via OpenTelemetry [7]
timeouts, retries, rate limits
standardized deployment and rollout
Measure success with outcomes (DORA metrics): lead time, deploy frequency, change failure rate, MTTR. [8]
Optimize for day 2 to day 50, not just “hello world.”

What a paved road is (and isn’t)
The API service template: required capabilities
A reference repository structure
Kubernetes defaults that save you later
Observability by default
Security by default
Rollouts and operational controls
How to roll this out without a platform revolt
A production checklist
References

What a paved road is (and isn’t)

A paved road is

a recommended path to production
preconfigured defaults that make safe delivery easy
automation that eliminates repetitive decisions

Microsoft describes this in internal developer platform terms: recommended and supported development paths, incrementally paved through an internal platform. [2]

A paved road is not

a mandate that blocks all other approaches
a committee process
a doc nobody reads

If your paved road becomes a gate, teams will route around it.

The API service template: required capabilities

Here’s what “enterprise production API” should mean out of the box.

Operability

structured logging with correlation IDs
metrics (request rate/latency/errors)
tracing across inbound/outbound calls [7]
runtime config and feature flags

Reliability

timeouts everywhere
bounded retries with backoff
health probes (liveness/readiness/startup) [5]
graceful shutdown
rate limits / concurrency caps
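
As an example of what one of these defaults looks like in a template library, here is a sketch of bounded retries with exponential backoff and full jitter. The function name, defaults, and retryable exception types are assumptions.

```python
import random
import time


def call_with_retries(
    fn,
    attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep=time.sleep,
    retry_on=(TimeoutError, ConnectionError),
):
    """Call fn(); retry a bounded number of times on transient errors only."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # full jitter avoids synchronized retries
```

Only operations that are safe to repeat should go through a helper like this; anything with side effects needs an idempotency key first.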

Platform fit

Kubernetes-ready manifests
resource requests/limits [4]
PodDisruptionBudget for availability during maintenance [6]
standardized rollout strategy

Security

auth middleware
input validation
secret injection patterns (no secrets in repo)
least privilege service accounts

Delivery

CI pipeline: lint/test/build/scan
SBOM generation
deploy automation (GitOps or pipeline)

A reference repository structure

.
├── cmd/service/        # main
├── internal/           # business logic
├── pkg/                # shared libs (optional)
├── api/                # OpenAPI spec, schemas
├── deploy/
│   ├── k8s/            # manifests (or Helm/Kustomize)
│   └── policy/         # OPA/constraints (optional)
├── docs/
│   ├── index.md
│   └── runbooks/
├── Makefile
└── .github/workflows/  # CI

Key idea: the template is not just code - it is the full production story:

how to run locally
how to deploy
how to observe
how to operate on-call

Kubernetes defaults that save you later

1) Resource requests and limits

Kubernetes scheduling and stability depend on requests/limits. The official docs explain how pod requests/limits are derived from container values. [4]

Template default:

set conservative requests
set safe limits
provide guidance for right-sizing

2) Probes

Kubernetes supports liveness, readiness, and startup probes. The docs describe how to configure them and why they matter. [5]

Template default:

readinessProbe ensures traffic only goes to ready pods
livenessProbe catches deadlocks / stuck processes
startupProbe prevents early restarts for slow boot services

3) Disruption budgets

PodDisruptionBudgets limit concurrent disruptions during voluntary maintenance. [6]

Template default:

include a PDB for replicated services
define min available or max unavailable

Observability by default

If you do one thing: instrument the template so every service ships with telemetry.

OpenTelemetry provides the framework for standard traces/metrics/logs. [7]

Template defaults:

standard HTTP server instrumentation
propagation of trace context (W3C headers)
request logs include trace IDs
golden dashboard:
RPS
p95 latency
error rate
saturation (CPU/memory)

Security by default

Avoid “security guidance documents.” Make secure defaults.

Template defaults:

auth middleware with standardized claims/roles mapping
structured validation for request bodies
outbound allowlists (where feasible)
secret injection via environment/secret store (no plain text)

Your paved road becomes a security accelerator because teams start secure.

Rollouts and operational controls

Default rollout patterns:

canary or progressive delivery when needed
safe rollback
feature flags for risky changes

Default operational controls:

rate limiting
concurrency limits
timeouts and circuit breakers
“maintenance mode” toggle

How to roll this out without a platform revolt

This is the part platform teams often miss.

1) Make it optional - but obviously better

If adopting the template reduces weeks of work to hours, teams will choose it.

2) Provide migration paths

minimal adoption: observability + probes
medium: deploy manifests + CI
full: service template + libraries

3) Measure outcomes, not adoption

Use DORA metrics to show impact: lead time, deploy frequency, change failure rate, time to restore service. [8]

If the paved road doesn’t move these, it’s not paved.

A production checklist

Template

Repo template includes CI, deploy, docs, runbooks.
Observability instrumentation included by default. [7]

Kubernetes

Resource requests/limits included. [4]
Liveness/readiness/startup probes included. [5]
PodDisruptionBudget included for replicated services. [6]

Reliability

Timeouts and bounded retries are standard.
Graceful shutdown is implemented.
Rate limiting/concurrency caps exist.

Security

Auth middleware included.
Secrets handled via secure injection (not repo).

Outcomes

DORA metrics tracked to validate improvement. [8]

References

[1] CNCF - What is platform engineering? (golden paths/paved roads framing): https://www.cncf.io/blog/2025/11/19/what-is-platform-engineering/
[2] Microsoft Learn - What is platform engineering? (paved paths / internal developer platform): https://learn.microsoft.com/en-us/platform-engineering/what-is-platform-engineering
[3] CNCF TAG App Delivery - Platforms White Paper: https://tag-app-delivery.cncf.io/whitepapers/platforms/
[4] Kubernetes - Resource Management for Pods and Containers (requests/limits): https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
[5] Kubernetes - Configure Liveness, Readiness and Startup Probes: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
[6] Kubernetes - Specifying a Disruption Budget for your Application (PDB): https://kubernetes.io/docs/tasks/run-application/configure-pdb/
[7] OpenTelemetry - Documentation (instrumentation and telemetry): https://opentelemetry.io/docs/
[8] DORA - DORA’s software delivery performance metrics: https://dora.dev/guides/dora-metrics/

The Real Security Model for Agents

Sat, 18 Oct 2025 12:00:00 -0500

Why this matters

If you ship tool-using agents, you are shipping:

an execution engine
with access to external systems
controlled by untrusted inputs

That is the same security posture as any automation platform - except the “operator” is probabilistic.

OWASP’s Top 10 for LLM Applications makes it clear: prompt injection, insecure output handling, sensitive info disclosure, excessive agency… these are mainstream risks, not edge cases. [1] The good news: most mitigations are classic security engineering applied to a new execution model.

This article is a practical, production-first security model for agents and MCP tool ecosystems.

TL;DR

Don’t “secure the model.” Secure the system.
Treat all inputs as untrusted:
user text
tool outputs
retrieved documents
Design tools with least privilege:
separate read/write/danger tools
require preview -> apply for destructive actions
Centralize auth and policy:
MCP defines authorization for HTTP transports - use it. [2]
Control egress and prevent SSRF by default. [3]
Never let raw model output drive execution without validation (OWASP LLM02). [4]
Redact logs and manage secrets like an adult (OWASP cheat sheets). [5][6]

Threat model: what can go wrong
Security layers that actually work
Tool design: read/write/danger tiers
Output handling: never execute raw model output
Secrets: minimize, scope, rotate
Network and egress controls
Logging and audit without data leaks
A production checklist
References

Threat model: what can go wrong

1) Prompt injection -> policy bypass attempt

A user or document says:

“Ignore previous instructions”
“Call this tool with these parameters”
“Reveal secrets”

OWASP calls this out as a primary risk category. [1]

2) Insecure output handling -> downstream exploitation

If you pass model output into:

a shell
SQL
YAML manifests
HTTP requests

…without validation, you’ve built an indirect code execution path.

OWASP’s LLM02 describes this precisely: insufficient validation and handling of LLM outputs before passing them downstream. [4]

3) Excessive agency -> unintended side effects

The agent is over-permissioned:

it can delete resources
send emails
modify production

…and it will eventually do something you didn’t mean.

4) Data exfiltration via tools

Tool outputs are rich and often sensitive:

calendar events
emails
internal tickets
source code
cluster configs

Exfil happens through:

model responses
logs
“helpful” summaries
tool chaining

5) Network abuse / SSRF

Any “fetch URL” capability is an SSRF invitation unless you constrain egress. OWASP’s SSRF cheat sheet is still relevant. [3]

Security layers that actually work

Security in agent systems is defense-in-depth:

Identity (who is calling?)
Authorization (what can they do?)
Contracts (what does a tool accept/return?)
Validation (are inputs/outputs safe?)
Egress control (where can the system talk to?)
Audit (what happened?)
Kill switches (how do you stop it fast?)

Tool design: read/write/danger tiers

Tiering is mandatory

Split tools by side effects:

Read tools: list/search/get
Write tools: create/update with bounded scope
Danger tools: deletes, bulk updates, privileged actions

Then enforce policy:

Read tools are widely available
Write tools require explicit scopes and tighter budgets
Danger tools require:
preview -> apply
confirmation tokens
additional policy checks
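
The tier rules above can be enforced at dispatch time, before any tool code runs. The following is a minimal sketch, not a reference implementation: the `Call` struct, the `write:<tool>` / `danger:<tool>` scope naming, and the `Authorize` function are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Tier classifies a tool by its side effects.
type Tier int

const (
	TierRead Tier = iota
	TierWrite
	TierDanger
)

// Call is a hypothetical description of one tool invocation.
type Call struct {
	Tool      string
	Tier      Tier
	Scopes    map[string]bool // scopes granted to the caller (assumed naming)
	PlanID    string          // set only after a preview step
	Confirmed bool            // true once the user confirmed the plan
}

// ErrDenied marks every rejection so callers can test with errors.Is.
var ErrDenied = errors.New("tool call denied by tier policy")

// Authorize applies the tier rules: read tools are open, write tools need an
// explicit scope, danger tools need a scope plus a confirmed plan.
func Authorize(c Call) error {
	switch c.Tier {
	case TierRead:
		return nil
	case TierWrite:
		if !c.Scopes["write:"+c.Tool] {
			return fmt.Errorf("%w: missing scope write:%s", ErrDenied, c.Tool)
		}
		return nil
	case TierDanger:
		if !c.Scopes["danger:"+c.Tool] {
			return fmt.Errorf("%w: missing scope danger:%s", ErrDenied, c.Tool)
		}
		if c.PlanID == "" || !c.Confirmed {
			return fmt.Errorf("%w: %s requires a confirmed plan", ErrDenied, c.Tool)
		}
		return nil
	default:
		// Unknown tiers fail closed.
		return fmt.Errorf("%w: unknown tier for %s", ErrDenied, c.Tool)
	}
}

func main() {
	// A danger call with the right scope but no plan is still rejected.
	err := Authorize(Call{
		Tool:   "delete_cluster",
		Tier:   TierDanger,
		Scopes: map[string]bool{"danger:delete_cluster": true},
	})
	fmt.Println(err)
}
```

Keeping the check in one function means a new tool cannot skip policy by accident: it has to declare a tier, and unknown tiers are denied.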

Preview -> Apply pattern

For dangerous operations:

plan_* returns a plan summary + plan_id
apply_* requires plan_id + user confirmation

This prevents “drive-by deletes” and supports audit.
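
One way to sketch the plan/apply split in Go is an in-memory store of single-use, expiring plans. Everything named here (`PlanStore`, `PlanDelete`, `ApplyDelete`, the TTL) is an assumption for illustration; a real deployment would likely persist plans and bind them to the caller's identity, and would use a confirmation token rather than a bare boolean.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
	"time"
)

// Plan is a stored preview of a dangerous operation.
type Plan struct {
	ID        string
	Summary   string
	Targets   []string
	ExpiresAt time.Time
}

// PlanStore keeps pending plans in memory. Plans are single-use and expire.
type PlanStore struct {
	mu    sync.Mutex
	plans map[string]Plan
	ttl   time.Duration
}

func NewPlanStore(ttl time.Duration) *PlanStore {
	return &PlanStore{plans: make(map[string]Plan), ttl: ttl}
}

// PlanDelete is the plan_* half: it records what would be deleted and
// returns a summary plus a random plan ID. Nothing is deleted here.
func (s *PlanStore) PlanDelete(targets []string) (Plan, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return Plan{}, fmt.Errorf("generate plan id: %w", err)
	}
	p := Plan{
		ID:        hex.EncodeToString(buf),
		Summary:   fmt.Sprintf("delete %d resource(s)", len(targets)),
		Targets:   append([]string(nil), targets...), // copy so later edits cannot change the plan
		ExpiresAt: time.Now().Add(s.ttl),
	}
	s.mu.Lock()
	s.plans[p.ID] = p
	s.mu.Unlock()
	return p, nil
}

// ApplyDelete is the apply_* half: it requires explicit confirmation and a
// known, unexpired plan ID. The plan is consumed so it cannot be replayed.
func (s *PlanStore) ApplyDelete(planID string, confirmed bool) ([]string, error) {
	if !confirmed {
		return nil, errors.New("user confirmation required")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	p, ok := s.plans[planID]
	if !ok {
		return nil, errors.New("unknown or already applied plan_id")
	}
	delete(s.plans, planID)
	if time.Now().After(p.ExpiresAt) {
		return nil, errors.New("plan expired; run the plan step again")
	}
	// The caller deletes exactly these targets, not whatever the model says now.
	return p.Targets, nil
}

func main() {
	store := NewPlanStore(5 * time.Minute)
	plan, err := store.PlanDelete([]string{"vm-1", "vm-2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(plan.Summary)
	targets, err := store.ApplyDelete(plan.ID, true)
	fmt.Println(targets, err)
}
```

The important property is that apply executes the stored targets, so the model cannot change the operation between preview and confirmation.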

Output handling: never execute raw model output

This is the most common real-world failure.

Rule: model output is data, not code

If the agent is generating:

Kubernetes YAML
SQL statements
curl commands
Terraform changes

…treat the output as untrusted data.

OWASP’s LLM02 guidance exists because people keep wiring LLM output directly into execution paths. [4]

Safer alternative: structured intent -> validated execution

Instead of:

LLM writes YAML -> apply

Do:

LLM proposes a structured change request (schema)
server validates:
allowlisted fields
bounded ranges
namespace/tenant scope
server executes with known-safe libraries

This is where “tool contracts” win.
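
As a rough illustration of "structured intent", the sketch below has the model propose a small JSON change request that the server decodes strictly and validates before doing anything. The `ScaleRequest` type, its fields, and the `maxReplicas` bound are hypothetical; the point is the shape of the checks, not the specific operation.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// ScaleRequest is a hypothetical structured change request. The model fills
// in these fields; it never writes YAML or commands.
type ScaleRequest struct {
	Namespace  string `json:"namespace"`
	Deployment string `json:"deployment"`
	Replicas   int    `json:"replicas"`
}

// maxReplicas is an assumed upper bound for illustration.
const maxReplicas = 20

// ParseScaleRequest decodes model output strictly and validates it against
// the caller's namespace scope. Only a validated request should reach the
// code that talks to the cluster.
func ParseScaleRequest(raw []byte, allowedNamespaces map[string]bool) (ScaleRequest, error) {
	var req ScaleRequest
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // allowlisted fields only
	if err := dec.Decode(&req); err != nil {
		return ScaleRequest{}, fmt.Errorf("invalid change request: %w", err)
	}
	if !allowedNamespaces[req.Namespace] {
		return ScaleRequest{}, fmt.Errorf("namespace %q is outside the caller's scope", req.Namespace)
	}
	if req.Deployment == "" {
		return ScaleRequest{}, errors.New("deployment is required")
	}
	if req.Replicas < 0 || req.Replicas > maxReplicas {
		return ScaleRequest{}, fmt.Errorf("replicas %d is outside the range 0..%d", req.Replicas, maxReplicas)
	}
	return req, nil
}

func main() {
	scope := map[string]bool{"team-a": true}
	req, err := ParseScaleRequest([]byte(`{"namespace":"team-a","deployment":"api","replicas":3}`), scope)
	fmt.Println(req, err)

	// An extra field the schema does not know about is rejected outright.
	_, err = ParseScaleRequest([]byte(`{"namespace":"team-a","deployment":"api","replicas":3,"image":"x"}`), scope)
	fmt.Println(err)
}
```

After validation, the server would apply the change through a typed client library, so no model-authored text is ever parsed as a manifest or command.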

Secrets: minimize, scope, rotate

Secrets are the other common failure path.

Minimum viable rules

Never put long-lived secrets in prompts.
Prefer short-lived tokens and scoped credentials.
Inject secrets server-side, not in the model context.

OWASP’s Secrets Management Cheat Sheet is a good baseline for central storage, rotation, auditing, and least privilege. [5]

Scope secrets to tenants and tools

Instead of “one OAuth token for everything,” mint:

per tenant
per tool category
short TTL

When something goes wrong, you want the blast radius small and revocation easy.
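
To make the scoping concrete, here is a small sketch of a credential bound to one tenant, one tool category, and a short TTL. `ScopedToken`, `Mint`, and `Allows` are hypothetical names, and the random value is a stand-in: in practice the token would come from your identity provider or secrets manager, not from local code.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// ScopedToken is a hypothetical credential bound to one tenant and one tool
// category. Value stays server-side and is never placed in the model context.
type ScopedToken struct {
	Tenant    string
	Category  string // e.g. "read", "write", "danger"
	ExpiresAt time.Time
	Value     string
}

// Mint creates a token for exactly one tenant/category pair with a short TTL.
// The random value stands in for whatever a real issuer would return.
func Mint(tenant, category string, ttl time.Duration, now time.Time) (ScopedToken, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return ScopedToken{}, fmt.Errorf("mint token: %w", err)
	}
	return ScopedToken{
		Tenant:    tenant,
		Category:  category,
		ExpiresAt: now.Add(ttl),
		Value:     hex.EncodeToString(buf),
	}, nil
}

// Allows reports whether the token may be used for this tenant and category
// at the given time. Any mismatch or expiry is a denial.
func (t ScopedToken) Allows(tenant, category string, now time.Time) bool {
	return t.Tenant == tenant && t.Category == category && now.Before(t.ExpiresAt)
}

func main() {
	now := time.Now()
	tok, err := Mint("tenant-a", "read", 5*time.Minute, now)
	if err != nil {
		panic(err)
	}
	fmt.Println(tok.Allows("tenant-a", "read", now))
	fmt.Println(tok.Allows("tenant-a", "write", now))
	fmt.Println(tok.Allows("tenant-a", "read", now.Add(10*time.Minute)))
}
```

With tokens shaped like this, revoking one tenant's write access does not touch any other tenant or tier.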

Network and egress controls

If your agent system can reach the open internet or internal networks, you need guardrails.

Egress allowlists

allowlist domains for integrations
block metadata IP ranges
re-validate after redirects

OWASP’s SSRF prevention guidance provides practical patterns for validation and blocking internal addresses. [3]
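
The three rules above map fairly directly onto Go's standard library. The sketch below is one possible arrangement, under stated assumptions: `allowedHosts`, `CheckURL`, `CheckDialAddr`, and `NewEgressClient` are hypothetical names, `api.example.com` is a placeholder, and the timeouts and redirect limit are arbitrary. It checks the hostname against an allowlist, re-validates on every redirect, and checks the resolved IP at connect time so a DNS answer pointing at an internal or metadata address is refused.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/url"
	"syscall"
	"time"
)

// allowedHosts is a hypothetical integration allowlist.
var allowedHosts = map[string]bool{"api.example.com": true}

// CheckURL validates scheme and hostname before a request is sent.
func CheckURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse url: %w", err)
	}
	if u.Scheme != "https" {
		return fmt.Errorf("scheme %q is not allowed", u.Scheme)
	}
	if !allowedHosts[u.Hostname()] {
		return fmt.Errorf("host %q is not on the egress allowlist", u.Hostname())
	}
	return nil
}

// CheckDialAddr inspects the resolved "ip:port" the dialer is about to
// connect to. Link-local covers the 169.254.169.254 metadata address.
func CheckDialAddr(address string) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return fmt.Errorf("unparseable dial address %q", address) // fail closed
	}
	if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
		return fmt.Errorf("egress to %s is blocked", ip)
	}
	return nil
}

// NewEgressClient builds an HTTP client that enforces both checks.
func NewEgressClient() *http.Client {
	dialer := &net.Dialer{
		Timeout: 5 * time.Second,
		// Control runs after DNS resolution, just before connect.
		Control: func(_, address string, _ syscall.RawConn) error {
			return CheckDialAddr(address)
		},
	}
	return &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{DialContext: dialer.DialContext},
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= 3 {
				return errors.New("too many redirects")
			}
			return CheckURL(req.URL.String()) // re-validate every hop
		},
	}
}

func main() {
	fmt.Println(CheckURL("https://api.example.com/v1/items"))
	fmt.Println(CheckURL("http://169.254.169.254/latest/meta-data/"))
	fmt.Println(CheckDialAddr("169.254.169.254:80"))
	_ = NewEgressClient()
}
```

Note that `CheckRedirect` only covers redirects, so callers still need to run `CheckURL` on the initial URL before calling the client. Network-level egress policy remains the stronger control; this is the in-process layer.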

Separate network planes

Keep tool servers in a network segment that:

can reach only what they need
cannot reach internal admin endpoints
cannot reach secrets stores directly unless necessary

Logging and audit without data leaks

Logging is security. Logging is also a leak vector.

OWASP’s Logging Cheat Sheet calls out that logs may contain personal and sensitive information and must be protected from misuse. [6]

Practical logging rules

do not log raw prompts by default
do not log raw tool payloads by default
log structured summaries:
tool name
action class
resource IDs (safe identifiers)
status
latency
store audit events separately from debug logs

Audit events (always on)

Every write/danger tool should emit:

who / what / when / result
plan_id / idempotency_key
before/after resource identifiers (not content)

Audit is what makes “agents in production” defensible to security and compliance teams.
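
A minimal sketch of these logging and audit rules, using Go's `log/slog` (Go 1.21+): the raw payload never reaches the logger, and the audit stream gets its own logger so it can be routed and retained separately from debug logs. The `ToolCall` and `AuditEvent` types, the field names, and the `stream` attribute are assumptions for illustration.

```go
package main

import (
	"io"
	"log/slog"
	"os"
	"time"
)

// ToolCall is a hypothetical record of one finished tool invocation.
type ToolCall struct {
	Actor, Tool, ActionClass, Status string
	PlanID, IdempotencyKey          string
	ResourceIDs                     []string
	Latency                         time.Duration
	RawPayload                      []byte // may hold sensitive content; never logged
}

// AuditEvent carries only fields that are safe to persist.
type AuditEvent struct {
	Actor, Tool, ActionClass, Status string
	PlanID, IdempotencyKey          string
	ResourceIDs                     []string
	Latency                         time.Duration
}

// Summarize copies the safe fields and drops the raw payload.
func Summarize(c ToolCall) AuditEvent {
	return AuditEvent{
		Actor:          c.Actor,
		Tool:           c.Tool,
		ActionClass:    c.ActionClass,
		Status:         c.Status,
		PlanID:         c.PlanID,
		IdempotencyKey: c.IdempotencyKey,
		ResourceIDs:    append([]string(nil), c.ResourceIDs...),
		Latency:        c.Latency,
	}
}

// NewAuditLogger returns a JSON logger dedicated to audit events.
func NewAuditLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil)).With("stream", "audit")
}

// Emit writes one structured audit record. slog adds the timestamp.
func Emit(l *slog.Logger, e AuditEvent) {
	l.Info("tool_call",
		"actor", e.Actor,
		"tool", e.Tool,
		"action_class", e.ActionClass,
		"status", e.Status,
		"plan_id", e.PlanID,
		"idempotency_key", e.IdempotencyKey,
		"resource_ids", e.ResourceIDs,
		"latency_ms", e.Latency.Milliseconds(),
	)
}

func main() {
	audit := NewAuditLogger(os.Stdout)
	Emit(audit, Summarize(ToolCall{
		Actor:       "user-123",
		Tool:        "delete_ticket",
		ActionClass: "danger",
		Status:      "ok",
		PlanID:      "plan-abc",
		ResourceIDs: []string{"TICKET-1"},
		Latency:     42 * time.Millisecond,
		RawPayload:  []byte("ticket body that must not be logged"),
	}))
}
```

Because `Emit` only accepts an `AuditEvent`, there is no code path that can log a raw prompt or payload by mistake.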

A production checklist

Identity and authorization

Strong auth for clients.
Least-privilege scopes per tool.
MCP HTTP authorization flow implemented where applicable. [2]

Tool contracts

Tools tiered: read/write/danger.
Preview -> apply for dangerous actions.
Schema validation + bounded arguments.

Output handling

No raw model output is executed without validation (OWASP LLM02). [4]

Secrets

Secrets never placed in prompts.
Short-lived, scoped tokens used.
Rotation/audit practices exist (OWASP Secrets Mgmt). [5]

Network

Egress allowlists exist.
SSRF protections implemented. [3]

Logging and audit

Logs are redacted and access-controlled.
Audit events exist for all side-effecting tools.
Log systems protected per OWASP guidance. [6]

References

[1] OWASP - Top 10 for Large Language Model Applications (v1.1): https://owasp.org/www-project-top-10-for-large-language-model-applications/
[2] Model Context Protocol (MCP) - Authorization (Protocol Revision 2025-11-25): https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization
[3] OWASP - SSRF Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html
[4] OWASP GenAI Security Project - LLM02: Insecure Output Handling: https://genai.owasp.org/llmrisk2023-24/llm02-insecure-output-handling/
[5] OWASP - Secrets Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
[6] OWASP - Logging Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html
