<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Roy Gabriel - DevOps Architect &amp; Applied AI Engineer | Roy Gabriel</title><link>https://roygabriel.dev/</link><description>Roy Gabriel: DevOps Architect &amp; Applied AI Engineer. Technical blog on Go, MCP servers, Kubernetes, and production AI systems.</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Fri, 27 Feb 2026 03:18:04 +0000</lastBuildDate><atom:link href="https://roygabriel.dev/index.xml" rel="self" type="application/rss+xml"/><item><title>Agent Swarms: The New Workforce Architects and Engineers Must Lead</title><link>https://roygabriel.dev/blog/agent-swarms-new-workforce/</link><pubDate>Thu, 26 Feb 2026 13:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/agent-swarms-new-workforce/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; This is a production engineering perspective from building and running agent swarms in early 2026. The workflows, numbers, and fine-tuning example come from real runs on my own hardware, held to my own standards.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-shift-keeps-showing-up"&gt;Why this shift keeps showing up&lt;/h2&gt;
&lt;p&gt;If you build production systems long enough, you see the same pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The “tool choice” is not what breaks velocity.&lt;/li&gt;
&lt;li&gt;The operational model and who does the work usually are.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When architects and engineers compare traditional coding to agent swarms, they are really asking:&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; This is a production engineering perspective from building and running agent swarms in early 2026. The workflows, numbers, and fine-tuning example come from real runs on my own hardware, held to my own standards.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-shift-keeps-showing-up"&gt;Why this shift keeps showing up&lt;/h2&gt;
&lt;p&gt;If you build production systems long enough, you see the same pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The “tool choice” is not what breaks velocity.&lt;/li&gt;
&lt;li&gt;The operational model and who does the work usually are.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When architects and engineers compare traditional coding to agent swarms, they are really asking:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;“What will it cost to ship features at scale in the next 12 to 36 months, and who will actually build them?”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Single agents autocomplete code. Agent swarms plan features, assign tasks, review security, run tests, and deploy. The output is cleaner and more consistent than what most senior engineers produce alone.&lt;/p&gt;
&lt;p&gt;Anthropic&amp;rsquo;s 2026 Agentic Coding Trends Report shows single agents evolving into coordinated teams that handle long-running tasks across the full development lifecycle. Multi-agent orchestration is the breakthrough.&lt;/p&gt;
&lt;p&gt;Forbes reported in December 2025 that Gartner expects 40 percent of enterprise applications to embed task-specific agents by the end of 2026. Early adopters of orchestrated swarms already see 30 to 50 percent faster feature delivery.&lt;/p&gt;
&lt;p&gt;O&amp;rsquo;Reilly&amp;rsquo;s Signals for 2026 confirms the pattern: engineers move from writing code to orchestrating agents. Fundamentals stay essential. The difference is scale.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Agent swarms already build production code at higher quality and speed than the top 1 percent of FAANG engineers.&lt;/li&gt;
&lt;li&gt;The new job is writing precise specs, designing the agent workforce, and verifying outcomes.&lt;/li&gt;
&lt;li&gt;I built &lt;a href="https://cruvero.ai" target="_blank" rel="noopener noreferrer"&gt;Cruvero&lt;/a&gt;
exactly for this model. It is the control plane that manages swarms in production.&lt;/li&gt;
&lt;li&gt;In one run I used &lt;a href="https://cruvero.ai" target="_blank" rel="noopener noreferrer"&gt;Cruvero&lt;/a&gt;
to prep my entire personal knowledge base and fine-tune Qwen2.5-Coder-72B on two RTX 5090 GPUs. The resulting model now writes code exactly to my standards.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cruvero.ai" target="_blank" rel="noopener noreferrer"&gt;Cruvero&lt;/a&gt;
will be released as open source in the next two weeks so anyone can run the same production-grade agent swarm control plane.&lt;/li&gt;
&lt;li&gt;Start today or watch the gap widen.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-agent-swarms-do-today"&gt;What Agent Swarms Do Today&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cruvero.ai" target="_blank" rel="noopener noreferrer"&gt;Cruvero&lt;/a&gt;
as the &lt;a href="#cruvero-as-the-working-example"&gt;Working Example&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-new-role-for-architects-and-engineers"&gt;The New Role for Architects and Engineers&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#why-you-must-start-today"&gt;Why You Must Start Today&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="what-agent-swarms-do-today"&gt;What Agent Swarms Do Today&lt;/h2&gt;
&lt;p&gt;A single agent writes a function. A swarm plans an entire feature, breaks it into tasks, assigns roles, iterates on failures, runs tests, and deploys. The output is cleaner, more consistent, and more reliable than what most senior engineers produce alone.&lt;/p&gt;
&lt;p&gt;Multi-agent orchestration lets one agent plan, another implement, a third review for security, and a fourth measure performance. They collaborate without human prompts for hours or days.&lt;/p&gt;
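&lt;p&gt;The role split above can be sketched as a sequential pipeline. This is a minimal illustration, not how any particular platform implements orchestration: the &lt;code&gt;Agent&lt;/code&gt; interface, the &lt;code&gt;WorkItem&lt;/code&gt; struct, and all four role types are hypothetical names invented for this example, and each role is a stub rather than a model call.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// WorkItem is the shared state handed from one role to the next.
// Hypothetical type for illustration only.
type WorkItem struct {
	Spec     string
	Tasks    []string
	Code     string
	Findings []string
}

// Agent is a hypothetical interface: one role, one step over the work item.
type Agent interface {
	Role() string
	Run(item WorkItem) (WorkItem, error)
}

type planner struct{}

func (planner) Role() string { return "planner" }
func (planner) Run(item WorkItem) (WorkItem, error) {
	if strings.TrimSpace(item.Spec) == "" {
		return item, errors.New("empty spec")
	}
	item.Tasks = []string{"implement: " + item.Spec, "test: " + item.Spec}
	return item, nil
}

type implementer struct{}

func (implementer) Role() string { return "implementer" }
func (implementer) Run(item WorkItem) (WorkItem, error) {
	// A real implementer would call a model; this stub only records the task count.
	item.Code = fmt.Sprintf("// stub covering %d tasks", len(item.Tasks))
	return item, nil
}

type securityReviewer struct{}

func (securityReviewer) Role() string { return "security-reviewer" }
func (securityReviewer) Run(item WorkItem) (WorkItem, error) {
	if strings.Contains(item.Code, "password") {
		return item, errors.New("possible hardcoded credential")
	}
	item.Findings = append(item.Findings, "security: ok")
	return item, nil
}

type perfChecker struct{}

func (perfChecker) Role() string { return "performance" }
func (perfChecker) Run(item WorkItem) (WorkItem, error) {
	item.Findings = append(item.Findings, "performance: not measured in this sketch")
	return item, nil
}

// RunPipeline runs each agent in order and stops at the first failure,
// returning the last good state so a human can decide what to do next.
func RunPipeline(agents []Agent, item WorkItem) (WorkItem, error) {
	for _, a := range agents {
		next, err := a.Run(item)
		if err != nil {
			return item, fmt.Errorf("%s failed: %w", a.Role(), err)
		}
		item = next
	}
	return item, nil
}

func main() {
	agents := []Agent{planner{}, implementer{}, securityReviewer{}, perfChecker{}}
	out, err := RunPipeline(agents, WorkItem{Spec: "add /healthz endpoint"})
	if err != nil {
		fmt.Println("escalate to human:", err)
		return
	}
	fmt.Println(out.Tasks, out.Code, out.Findings)
}
```

&lt;p&gt;A real swarm would loop on failures and run roles concurrently. The sketch keeps a strict order so the handoff between roles, and the point where a human gets pulled in, stays easy to see.&lt;/p&gt;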
&lt;hr&gt;
&lt;h2 id="cruvero-as-the-working-example"&gt;&lt;a href="https://cruvero.ai" target="_blank" rel="noopener noreferrer"&gt;Cruvero&lt;/a&gt;
as the Working Example&lt;/h2&gt;
&lt;p&gt;Over the last two months of nights and weekends, I built &lt;a href="https://cruvero.ai" target="_blank" rel="noopener noreferrer"&gt;Cruvero&lt;/a&gt;
as a full workflow builder and a complete DevOps/SRE agent swarm platform. It is 350k lines of clean Go and React running as a production control plane, with a UI and API layer for knowledge bases, multi-agent runs, agent registration, cost tracking, flows, and MCP bindings in Kubernetes. As of early 2026, I have not seen another platform that ships this full stack in one place.&lt;/p&gt;
&lt;p&gt;Core capabilities include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Knowledge bases that store versioned domain context so agents never repeat mistakes&lt;/li&gt;
&lt;li&gt;Runs page for launching, monitoring, and intervening in complex multi-agent workflows with patterns like Delegate&lt;/li&gt;
&lt;li&gt;Agents page that lets you register, trust-score, deploy, and bind agents directly from the UI&lt;/li&gt;
&lt;li&gt;Incident triage that runs faster than human response, then posts root-cause analysis and recommended fixes directly into Slack or Microsoft Teams&lt;/li&gt;
&lt;li&gt;Pre-run previews, compliance checks, per-agent overrides, full audit trails, and cost tracking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This turns raw agent swarms into a manageable, auditable workforce. This week the platform reached the point where its own agent swarms are building new features inside &lt;a href="https://cruvero.ai" target="_blank" rel="noopener noreferrer"&gt;Cruvero&lt;/a&gt;
itself.&lt;/p&gt;
&lt;p&gt;In one recent run, a swarm of 12 agents took a high-level spec for a new MCP gateway, generated the Go handlers, added observability, wrote tests, and produced a Helm chart ready for deployment. The entire process took 47 minutes. I reviewed the plan, approved two overrides, and signed off. The code passed every gate I set.&lt;/p&gt;
&lt;p&gt;Here is a concrete example of how deeply I integrated my own standards. This week I used &lt;a href="https://cruvero.ai" target="_blank" rel="noopener noreferrer"&gt;Cruvero&lt;/a&gt;
to create a domain-specific fine-tuned model. I pointed the swarm at my private knowledge base: every personal code example (no employer code) I have written in the last five years, my strict documentation standards (every public function must include invariants, performance notes, and failure modes in a precise JSDoc format), my architecture standards (hexagonal with CQRS and explicit boundaries), and my full infrastructure layout (multi-region Kubernetes with Cilium CNI, ArgoCD GitOps, external secrets, and cross-cluster service mesh using Istio).&lt;/p&gt;
&lt;p&gt;The swarm prepped the entire dataset in 18 minutes. It extracted, cleaned, deduplicated, and converted everything into instruction-response pairs that match the way I write and review code. Then it launched a QLoRA fine-tune of Qwen2.5-Coder-72B across my two RTX 5090 GPUs. Training completed overnight. The resulting model, now called Cruvero-Gabriel-72B, is registered directly inside the agent swarm as a first-class capability.&lt;/p&gt;
&lt;p&gt;When I give the swarm a new spec today, it calls my fine-tuned model for every code generation step. The output follows my exact coding style, documentation format, architecture patterns, and infrastructure conventions without any manual correction. The swarm thinks and builds like I do, at 100 times the speed.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cruvero.ai" target="_blank" rel="noopener noreferrer"&gt;Cruvero&lt;/a&gt;
will be released as open source in the next two weeks so any engineer or team can self-host the same production-grade agent swarm control plane I use daily.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="the-new-role-for-architects-and-engineers"&gt;The New Role for Architects and Engineers&lt;/h2&gt;
&lt;p&gt;You will spend your time on three things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Write precise specification documents.&lt;/strong&gt; Define goals, constraints, acceptance criteria, security boundaries, and stop rules. These become the source of truth the swarm follows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Design the agent workforce.&lt;/strong&gt; Choose which agents handle planning, implementation, review, and verification. Set trust scores, capabilities, and escalation paths.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Verify and steer.&lt;/strong&gt; Run preflight checks, inspect outputs against intent, and adjust the swarm in real time. Your judgment determines whether the final system meets business needs.&lt;/li&gt;
&lt;/ol&gt;
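&lt;p&gt;One way to keep a specification honest is to make it a typed document that can be checked before any agent runs. The sketch below is an assumption about what such a check could look like, not a format any tool requires: &lt;code&gt;Spec&lt;/code&gt;, &lt;code&gt;StopRules&lt;/code&gt;, their field names, and &lt;code&gt;Validate&lt;/code&gt; are all hypothetical.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// StopRules bound a run. Field names are illustrative assumptions.
type StopRules struct {
	MaxIterations int     // zero means "not set"
	MaxCostUSD    float64 // zero means "not set"
}

// Spec mirrors the sections named in the list above:
// goals, constraints, acceptance criteria, security boundaries, stop rules.
type Spec struct {
	Goal               string
	Constraints        []string
	AcceptanceCriteria []string
	SecurityBoundaries []string
	Stop               StopRules
}

// Validate returns one message per missing section.
// An empty result means the spec is complete enough to hand to a swarm.
func (s Spec) Validate() []string {
	var problems []string
	if strings.TrimSpace(s.Goal) == "" {
		problems = append(problems, "goal is empty")
	}
	if len(s.Constraints) == 0 {
		problems = append(problems, "no constraints listed")
	}
	if len(s.AcceptanceCriteria) == 0 {
		problems = append(problems, "no acceptance criteria listed")
	}
	if len(s.SecurityBoundaries) == 0 {
		problems = append(problems, "no security boundaries listed")
	}
	if s.Stop.MaxIterations == 0 || s.Stop.MaxCostUSD == 0 {
		problems = append(problems, "stop rules need both MaxIterations and MaxCostUSD")
	}
	return problems
}

func main() {
	draft := Spec{
		Goal:               "Add a health endpoint to the MCP gateway",
		AcceptanceCriteria: []string{"GET /healthz returns 200"},
	}
	for _, p := range draft.Validate() {
		fmt.Println("spec incomplete:", p)
	}
}
```

&lt;p&gt;Running the check as a preflight gate means an incomplete spec fails fast, before it costs any agent time.&lt;/p&gt;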
&lt;p&gt;This is harder than writing code. It requires deep systems thinking, clear communication, and production discipline. The best architects already excel at it. The rest must learn fast.&lt;/p&gt;
&lt;p&gt;Pragmatic Engineer noted in January 2026 that teams where AI writes 90 percent of the code still need senior engineers for architecture decisions and coherence. Karpathy has said the same: technical expertise becomes more valuable because skilled people extract far more from agents.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="why-you-must-start-today"&gt;Why You Must Start Today&lt;/h2&gt;
&lt;p&gt;The gap between those who manage agent swarms and those who do not will widen quickly. In 12 months, the best teams will ship features that used to take quarters. In 36 months, organizations without swarm capability will struggle to compete on velocity and quality.&lt;/p&gt;
&lt;p&gt;Start small. Take one workflow you own. Break it into phases. Build a prompt library. Run a swarm on a non-critical task. Measure the output against your own work. Iterate.&lt;/p&gt;
&lt;p&gt;The tools exist now: Claude Code, Codex, open MCP gateways, and platforms like &lt;a href="https://cruvero.ai" target="_blank" rel="noopener noreferrer"&gt;Cruvero&lt;/a&gt;. The question is whether you will lead the workforce or watch someone else do it.&lt;/p&gt;
&lt;p&gt;The next generation of software systems will be built by agent swarms under human direction. Architects and engineers who master this model will define the future. Those who wait will find their skills replaced.&lt;/p&gt;
&lt;p&gt;The shift is here. Lead it.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Anthropic. 2026 Agentic Coding Trends Report. January 2026.&lt;/li&gt;
&lt;li&gt;Minevich, Mark. &amp;ldquo;Agentic AI Takes Over — 11 Shocking 2026 Predictions.&amp;rdquo; Forbes. December 31, 2025.&lt;/li&gt;
&lt;li&gt;O&amp;rsquo;Reilly Media. &amp;ldquo;Signals for 2026.&amp;rdquo; January 9, 2026.&lt;/li&gt;
&lt;li&gt;Various sources on Karpathy statements compiled in Pragmatic Engineer and The New Stack, 2025–2026.&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item><item><title>Chapter 16: Worked Example: Converting an Ansible Playbook to a Go Temporal Workflow</title><link>https://roygabriel.dev/blog/llm-development-guide/15-worked-example-ansible-to-temporal/</link><pubDate>Fri, 13 Feb 2026 09:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/15-worked-example-ansible-to-temporal/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 16 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/14-worked-example-helm-chart/"&gt;Chapter 15: Worked Example: Creating a Helm Chart From a Reference Chart&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to migrate a procedural automation (for example, an Ansible playbook) into a durable Temporal workflow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extract discrete steps from the playbook.&lt;/li&gt;
&lt;li&gt;Map steps to activities.&lt;/li&gt;
&lt;li&gt;Implement a workflow that follows your team&amp;rsquo;s existing Temporal patterns.&lt;/li&gt;
&lt;li&gt;Add verification and tests so the migration is not faith-based.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Start from a working reference workflow in your repo.&lt;/li&gt;
&lt;li&gt;Paste both the playbook and the reference into the planning prompt.&lt;/li&gt;
&lt;li&gt;Define activities first, then the workflow.&lt;/li&gt;
&lt;li&gt;Verify with Temporal workflow tests and any integration checks you can run safely.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#scenario"&gt;Scenario&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#reference-inputs"&gt;Reference inputs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#plan-and-phase-structure"&gt;Plan and phase structure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#implementation-skeleton-go"&gt;Implementation skeleton (Go)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gotchas"&gt;Gotchas&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scenario"&gt;Scenario&lt;/h2&gt;
&lt;p&gt;Example: convert a playbook that creates a Kubernetes namespace and resource quota into a Temporal workflow.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 16 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/14-worked-example-helm-chart/"&gt;Chapter 15: Worked Example: Creating a Helm Chart From a Reference Chart&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to migrate a procedural automation (for example, an Ansible playbook) into a durable Temporal workflow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extract discrete steps from the playbook.&lt;/li&gt;
&lt;li&gt;Map steps to activities.&lt;/li&gt;
&lt;li&gt;Implement a workflow that follows your team&amp;rsquo;s existing Temporal patterns.&lt;/li&gt;
&lt;li&gt;Add verification and tests so the migration is not faith-based.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Start from a working reference workflow in your repo.&lt;/li&gt;
&lt;li&gt;Paste both the playbook and the reference into the planning prompt.&lt;/li&gt;
&lt;li&gt;Define activities first, then the workflow.&lt;/li&gt;
&lt;li&gt;Verify with Temporal workflow tests and any integration checks you can run safely.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#scenario"&gt;Scenario&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#reference-inputs"&gt;Reference inputs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#plan-and-phase-structure"&gt;Plan and phase structure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#implementation-skeleton-go"&gt;Implementation skeleton (Go)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gotchas"&gt;Gotchas&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scenario"&gt;Scenario&lt;/h2&gt;
&lt;p&gt;Example: convert a playbook that creates a Kubernetes namespace and resource quota into a Temporal workflow.&lt;/p&gt;
&lt;p&gt;Why this is a good fit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Work is step-based.&lt;/li&gt;
&lt;li&gt;You want retries and observability.&lt;/li&gt;
&lt;li&gt;You want an execution history.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="reference-inputs"&gt;Reference inputs&lt;/h2&gt;
&lt;p&gt;Paste these into your planning prompt:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The playbook file (or the relevant section): &lt;code&gt;playbooks/create-namespace.yml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;A reference workflow file that represents your team&amp;rsquo;s patterns.&lt;/li&gt;
&lt;li&gt;A reference activity implementation file (if you have one).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Suggested commands:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sed -n &lt;span class="s1"&gt;&amp;#39;1,200p&amp;#39;&lt;/span&gt; playbooks/create-namespace.yml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sed -n &lt;span class="s1"&gt;&amp;#39;1,200p&amp;#39;&lt;/span&gt; internal/workflows/provision_cluster.go
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ls -la internal/activities &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="plan-and-phase-structure"&gt;Plan and phase structure&lt;/h2&gt;
&lt;p&gt;A reasonable phase split:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Phase 1: define activity I/O and implement activities.&lt;/li&gt;
&lt;li&gt;Phase 2: implement workflow with retries and timeouts.&lt;/li&gt;
&lt;li&gt;Phase 3: add tests.&lt;/li&gt;
&lt;li&gt;Phase 4: register and deploy.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The plan should include verification per phase.&lt;/p&gt;
&lt;h2 id="implementation-skeleton-go"&gt;Implementation skeleton (Go)&lt;/h2&gt;
&lt;p&gt;This is a minimal skeleton to illustrate structure. It is intentionally not a full program.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;package&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;workflows&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;time&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;go.temporal.io/sdk/temporal&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;go.temporal.io/sdk/workflow&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CreateNamespaceInput&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Namespace&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CPULimit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;MemLimit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CreateNamespaceOutput&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Namespace&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="c1"&gt;// CreateNamespaceWorkflow orchestrates namespace creation.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="c1"&gt;// It assumes there are activities registered for create + quota + verify.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;CreateNamespaceWorkflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CreateNamespaceInput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;CreateNamespaceOutput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;starting namespace workflow&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;namespace&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Namespace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithActivityOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ActivityOptions&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;StartToCloseTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Minute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;RetryPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;temporal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RetryPolicy&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;MaximumAttempts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// Pseudocode: adapt to your activity names and input/output types.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// var nsResult activities.CreateNamespaceOutput&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// err := workflow.ExecuteActivity(ctx, activities.CreateNamespace, activities.CreateNamespaceInput{...}).Get(ctx, &amp;amp;nsResult)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// if err != nil { return nil, err }&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// var quotaResult activities.CreateResourceQuotaOutput&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// err = workflow.ExecuteActivity(ctx, activities.CreateResourceQuota, activities.CreateResourceQuotaInput{...}).Get(ctx, &amp;amp;quotaResult)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// if err != nil { return nil, err }&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// var verifyResult activities.VerifyNamespaceOutput&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// err = workflow.ExecuteActivity(ctx, activities.VerifyNamespace, activities.VerifyNamespaceInput{...}).Get(ctx, &amp;amp;verifyResult)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// if err != nil { return nil, err }&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;CreateNamespaceOutput&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;Namespace&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Namespace&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;workflow.ExecuteActivity&lt;/code&gt; calls are left as pseudocode because activity package names and types are repo-specific.&lt;/li&gt;
&lt;li&gt;Keep the skeleton syntactically correct.&lt;/li&gt;
&lt;li&gt;Use your reference workflow as the style guide.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;Your verification should include at least one of these:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Temporal workflow unit tests (Temporal test framework).&lt;/li&gt;
&lt;li&gt;Activity unit tests (mock external systems).&lt;/li&gt;
&lt;li&gt;A safe integration test against a non-production environment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example commands (adapt to your repo):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Run unit tests.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; ./...
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# If you have a focused workflow test package.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; ./internal/workflows -run TestCreateNamespaceWorkflow
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tests exit with code 0.&lt;/li&gt;
&lt;li&gt;Failures are actionable (not timeouts with no logs).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="gotchas"&gt;Gotchas&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;If you skip the reference workflow, your new workflow will not match team patterns.&lt;/li&gt;
&lt;li&gt;If you skip tests, you will not know whether retries and timeouts behave correctly.&lt;/li&gt;
&lt;li&gt;LLMs will happily invent Temporal APIs. Verify imports and method names exist in your actual SDK version.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Chapter 15: Worked Example: Creating a Helm Chart From a Reference Chart</title><link>https://roygabriel.dev/blog/llm-development-guide/14-worked-example-helm-chart/</link><pubDate>Wed, 11 Feb 2026 07:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/14-worked-example-helm-chart/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 15 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/13-building-a-prompt-library/"&gt;Chapter 14: Building a Prompt Library: Governance + Quality Bar&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/15-worked-example-ansible-to-temporal/"&gt;Chapter 16: Worked Example: Converting an Ansible Playbook to a Go Temporal Workflow&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to create a production-quality Helm chart by following an existing chart in your repo as the reference:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gather high-signal reference inputs.&lt;/li&gt;
&lt;li&gt;Produce a phased plan and prompt docs.&lt;/li&gt;
&lt;li&gt;Execute in reviewable commits.&lt;/li&gt;
&lt;li&gt;Verify the chart renders and lints cleanly.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The reference chart is the source of truth for structure and conventions.&lt;/li&gt;
&lt;li&gt;Paste the reference inputs into your planning prompt.&lt;/li&gt;
&lt;li&gt;Execute one file at a time, with &lt;code&gt;helm lint&lt;/code&gt; and &lt;code&gt;helm template&lt;/code&gt; as gates.&lt;/li&gt;
&lt;li&gt;If you do not have a real reference chart, pick a different worked example.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#scenario"&gt;Scenario&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#reference-inputs"&gt;Reference inputs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phase-1-plan"&gt;Phase 1: Plan&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phase-2-prompt-docs"&gt;Phase 2: Prompt docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phase-3-execute-in-logical-units"&gt;Phase 3: Execute in logical units&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gotchas"&gt;Gotchas&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scenario"&gt;Scenario&lt;/h2&gt;
&lt;p&gt;Goal: create a new chart (example: &lt;code&gt;metrics-gateway&lt;/code&gt;) based on a known-good reference chart (example: &lt;code&gt;event-processor&lt;/code&gt;).&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 15 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/13-building-a-prompt-library/"&gt;Chapter 14: Building a Prompt Library: Governance + Quality Bar&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/15-worked-example-ansible-to-temporal/"&gt;Chapter 16: Worked Example: Converting an Ansible Playbook to a Go Temporal Workflow&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to create a production-quality Helm chart by following an existing chart in your repo as the reference:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gather high-signal reference inputs.&lt;/li&gt;
&lt;li&gt;Produce a phased plan and prompt docs.&lt;/li&gt;
&lt;li&gt;Execute in reviewable commits.&lt;/li&gt;
&lt;li&gt;Verify the chart renders and lints cleanly.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The reference chart is the source of truth for structure and conventions.&lt;/li&gt;
&lt;li&gt;Paste the reference inputs into your planning prompt.&lt;/li&gt;
&lt;li&gt;Execute one file at a time, with &lt;code&gt;helm lint&lt;/code&gt; and &lt;code&gt;helm template&lt;/code&gt; as gates.&lt;/li&gt;
&lt;li&gt;If you do not have a real reference chart, pick a different worked example.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#scenario"&gt;Scenario&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#reference-inputs"&gt;Reference inputs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phase-1-plan"&gt;Phase 1: Plan&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phase-2-prompt-docs"&gt;Phase 2: Prompt docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phase-3-execute-in-logical-units"&gt;Phase 3: Execute in logical units&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gotchas"&gt;Gotchas&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scenario"&gt;Scenario&lt;/h2&gt;
&lt;p&gt;Goal: create a new chart (example: &lt;code&gt;metrics-gateway&lt;/code&gt;) based on a known-good reference chart (example: &lt;code&gt;event-processor&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;This is a workflow example: the chart names and paths are placeholders, so substitute your own.&lt;/p&gt;
&lt;h2 id="reference-inputs"&gt;Reference inputs&lt;/h2&gt;
&lt;p&gt;Run these commands in your repo and paste the output into the planning prompt.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Chart structure and key files.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;tree charts/event-processor/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sed -n &lt;span class="s1"&gt;&amp;#39;1,200p&amp;#39;&lt;/span&gt; charts/event-processor/Chart.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sed -n &lt;span class="s1"&gt;&amp;#39;1,200p&amp;#39;&lt;/span&gt; charts/event-processor/values.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sed -n &lt;span class="s1"&gt;&amp;#39;1,200p&amp;#39;&lt;/span&gt; charts/event-processor/templates/_helpers.tpl
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# If your reference chart uses these, include them too.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ls -la charts/event-processor &lt;span class="p"&gt;|&lt;/span&gt; rg -n &lt;span class="s2"&gt;&amp;#34;values-&amp;#34;&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Why this matters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Structure: avoids &amp;ldquo;generic Helm&amp;rdquo; output.&lt;/li&gt;
&lt;li&gt;Naming and labels: keeps your charts consistent.&lt;/li&gt;
&lt;li&gt;Values shape: keeps operator UX consistent.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="phase-1-plan"&gt;Phase 1: Plan&lt;/h2&gt;
&lt;p&gt;Create a plan that mostly covers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What files will exist.&lt;/li&gt;
&lt;li&gt;What differences are specific to &lt;code&gt;metrics-gateway&lt;/code&gt; (ports, probes, resources).&lt;/li&gt;
&lt;li&gt;How you will verify each phase.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example plan skeleton:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# metrics-gateway Helm Chart Plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Goals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Create charts/metrics-gateway matching reference structure.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Render successfully with helm template.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Lint cleanly.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## References
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; charts/event-processor/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Phase 1: Analysis
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Document naming conventions from reference.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Verification:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; tree charts/event-processor
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Phase 2: Scaffold
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Create Chart.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Create values.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Create templates/_helpers.tpl
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Verification:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; helm lint charts/metrics-gateway
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Phase 3: Core templates
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; deployment
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; service
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; configmap
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Verification:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; helm template charts/metrics-gateway &amp;gt; /tmp/rendered.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Definition of done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; helm lint exits 0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; helm template exits 0
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="phase-2-prompt-docs"&gt;Phase 2: Prompt docs&lt;/h2&gt;
&lt;p&gt;Generate one prompt file per phase. Include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The plan path.&lt;/li&gt;
&lt;li&gt;The work-notes path.&lt;/li&gt;
&lt;li&gt;Reference chart file paths.&lt;/li&gt;
&lt;li&gt;Deliverables (exact files).&lt;/li&gt;
&lt;li&gt;Constraints (MUST and MUST NOT).&lt;/li&gt;
&lt;li&gt;Verification commands.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A good constraint to include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Match reference structure exactly.&amp;rdquo; Then spell out what &amp;ldquo;exactly&amp;rdquo; covers, for example file layout, helper names, and label keys.&lt;/li&gt;
&lt;/ul&gt;
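&lt;p&gt;As a rough sketch, a prompt doc for the Scaffold phase of the example plan might look like the file below. The &lt;code&gt;docs/prompts&lt;/code&gt;, &lt;code&gt;docs/plans&lt;/code&gt;, and &lt;code&gt;docs/work-notes&lt;/code&gt; paths are hypothetical, as is the file name; use whatever layout your repo already has.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Hypothetical paths: adjust to your repo layout.
mkdir -p docs/prompts

# Write one prompt doc for the Scaffold phase of the plan.
cat &amp;gt; docs/prompts/metrics-gateway-phase-2.md &amp;lt;&amp;lt;'MD'
# Phase 2: Scaffold metrics-gateway

## Inputs
- Plan: docs/plans/metrics-gateway-plan.md
- Work notes: docs/work-notes/metrics-gateway.md
- Reference: charts/event-processor/Chart.yaml
- Reference: charts/event-processor/values.yaml
- Reference: charts/event-processor/templates/_helpers.tpl

## Deliverables
- charts/metrics-gateway/Chart.yaml
- charts/metrics-gateway/values.yaml
- charts/metrics-gateway/templates/_helpers.tpl

## Constraints
- MUST match the reference file layout, helper names, and label keys.
- MUST NOT add templates that belong to a later phase (no ingress yet).

## Verification
- helm lint charts/metrics-gateway
MD
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Keeping the deliverables as exact file paths makes it easy to tell when the model has drifted outside the phase.&lt;/p&gt;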
&lt;h2 id="phase-3-execute-in-logical-units"&gt;Phase 3: Execute in logical units&lt;/h2&gt;
&lt;p&gt;You have two implementation strategies.&lt;/p&gt;
&lt;h3 id="strategy-a-recommended-copy-the-reference-chart-then-adapt"&gt;Strategy A (recommended): copy the reference chart, then adapt&lt;/h3&gt;
&lt;p&gt;This is often the fastest way to get structural consistency.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cp -R charts/event-processor charts/metrics-gateway
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Then rename strings and values in a controlled way.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Review each replacement before committing.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;event-processor&amp;#34;&lt;/span&gt; charts/metrics-gateway
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now execute in logical units:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Update &lt;code&gt;Chart.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Update &lt;code&gt;values.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Update &lt;code&gt;_helpers.tpl&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Update one template file at a time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For each logical unit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Update work notes.&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;helm lint&lt;/code&gt; and &lt;code&gt;helm template&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Propose a commit.&lt;/li&gt;
&lt;/ul&gt;
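&lt;p&gt;The per-unit gate can be wrapped in a small shell function, sketched here. The &lt;code&gt;gate&lt;/code&gt; name and the &lt;code&gt;HELM&lt;/code&gt; override variable are illustrative and not part of Helm; the function runs &lt;code&gt;helm lint&lt;/code&gt;, then &lt;code&gt;helm template&lt;/code&gt;, and stops at the first failure.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Sketch of a per-unit gate: lint, then render, stopping at the first failure.
# HELM is a hypothetical override (not a Helm feature) for pointing the gate at a specific binary.
HELM="${HELM:-helm}"

gate() {
  gate_chart="$1"
  if ! "$HELM" lint "$gate_chart"; then
    echo "gate: helm lint failed for $gate_chart"
    return 1
  fi
  # Rendered output is discarded here; only the exit status matters for the gate.
  if ! "$HELM" template "$gate_chart" &amp;gt; /dev/null; then
    echo "gate: helm template failed for $gate_chart"
    return 1
  fi
  echo "gate: $gate_chart passed; safe to propose a commit"
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Run it as &lt;code&gt;gate charts/metrics-gateway&lt;/code&gt; after each logical unit, before you propose the commit.&lt;/p&gt;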
&lt;h3 id="strategy-b-scaffold-from-scratch-guided-by-the-reference"&gt;Strategy B: scaffold from scratch, guided by the reference&lt;/h3&gt;
&lt;p&gt;Use this when copying would bring too much baggage.&lt;/p&gt;
&lt;p&gt;You still paste the reference files, but ask the model to reproduce the structure explicitly.&lt;/p&gt;
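&lt;p&gt;One way to make the structure explicit, assuming you want the file layout fixed before the model writes any content, is to create an empty skeleton that mirrors the reference chart. The &lt;code&gt;mirror_layout&lt;/code&gt; helper below is a hypothetical sketch, not a Helm command.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Sketch: create empty files in the new chart for every file in the reference chart.
# Pass both directories without a trailing slash.
mirror_layout() {
  ref_dir="$1"
  new_dir="$2"
  find "$ref_dir" -type f | sort | while IFS= read -r ref_file; do
    # Strip the reference directory prefix to get the chart-relative path.
    rel_path="${ref_file#"$ref_dir"/}"
    mkdir -p "$(dirname "$new_dir/$rel_path")"
    # touch creates the file if it is missing and never truncates existing content.
    touch "$new_dir/$rel_path"
  done
}

# Usage: mirror_layout charts/event-processor charts/metrics-gateway
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;With the skeleton in place, each prompt can target one empty file, and the structure diff in the verification step should show no differences until you remove files on purpose.&lt;/p&gt;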
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;Run both linting and rendering.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;helm lint charts/metrics-gateway
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;helm template charts/metrics-gateway &amp;gt; /tmp/metrics-gateway.rendered.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;test&lt;/span&gt; -s /tmp/metrics-gateway.rendered.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All commands exit with code 0.&lt;/li&gt;
&lt;li&gt;The rendered YAML is non-empty.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Optional: diff against the reference chart structure:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Compare structure only.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; charts/event-processor &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; find . -type f &lt;span class="p"&gt;|&lt;/span&gt; sort&lt;span class="o"&gt;)&lt;/span&gt; &amp;gt; /tmp/ref-files.txt
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; charts/metrics-gateway &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; find . -type f &lt;span class="p"&gt;|&lt;/span&gt; sort&lt;span class="o"&gt;)&lt;/span&gt; &amp;gt; /tmp/new-files.txt
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;diff -u /tmp/ref-files.txt /tmp/new-files.txt &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected result:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The file lists match except for differences you made on purpose.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="gotchas"&gt;Gotchas&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;If you do not paste the reference files, you will get generic charts.&lt;/li&gt;
&lt;li&gt;Be explicit about service ports, probe paths, and resource defaults.&lt;/li&gt;
&lt;li&gt;Add negative constraints (&amp;ldquo;do not add ingress yet&amp;rdquo;) so scope doesn&amp;rsquo;t expand.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/15-worked-example-ansible-to-temporal/"&gt;Chapter 16: Worked Example: Converting an Ansible Playbook to a Go Temporal Workflow&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 14: Building a Prompt Library: Governance + Quality Bar</title><link>https://roygabriel.dev/blog/llm-development-guide/13-building-a-prompt-library/</link><pubDate>Mon, 09 Feb 2026 06:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/13-building-a-prompt-library/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 14 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/12-templates-checklists/"&gt;Chapter 13: Templates + Checklists: The Copy/Paste Kit&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/14-worked-example-helm-chart/"&gt;Chapter 15: Worked Example: Creating a Helm Chart From a Reference Chart&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to build a prompt library that doesn&amp;rsquo;t turn into a junk drawer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Organize prompts by task type.&lt;/li&gt;
&lt;li&gt;Define a consistent prompt entry format.&lt;/li&gt;
&lt;li&gt;Set a contribution and maintenance policy.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A prompt library is a shared collection of prompts proven in real usage.&lt;/li&gt;
&lt;li&gt;Require prereqs, recommended model tier, expected output, and common failure fixes.&lt;/li&gt;
&lt;li&gt;Assign maintainers.&lt;/li&gt;
&lt;li&gt;Version prompts with a changelog.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#library-structure"&gt;Library structure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#prompt-entry-template"&gt;Prompt entry template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#contribution-guidelines"&gt;Contribution guidelines&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#governance"&gt;Governance&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="library-structure"&gt;Library structure&lt;/h2&gt;
&lt;p&gt;A simple layout that scales:&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 14 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/12-templates-checklists/"&gt;Chapter 13: Templates + Checklists: The Copy/Paste Kit&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/14-worked-example-helm-chart/"&gt;Chapter 15: Worked Example: Creating a Helm Chart From a Reference Chart&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to build a prompt library that doesn&amp;rsquo;t turn into a junk drawer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Organize prompts by task type.&lt;/li&gt;
&lt;li&gt;Define a consistent prompt entry format.&lt;/li&gt;
&lt;li&gt;Set a contribution and maintenance policy.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A prompt library is a shared collection of prompts proven in real usage.&lt;/li&gt;
&lt;li&gt;Require prereqs, recommended model tier, expected output, and common failure fixes.&lt;/li&gt;
&lt;li&gt;Assign maintainers.&lt;/li&gt;
&lt;li&gt;Version prompts with a changelog.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#library-structure"&gt;Library structure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#prompt-entry-template"&gt;Prompt entry template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#contribution-guidelines"&gt;Contribution guidelines&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#governance"&gt;Governance&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="library-structure"&gt;Library structure&lt;/h2&gt;
&lt;p&gt;A simple layout that scales:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;prompt-library/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; README.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; CONTRIBUTING.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; planning/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; implementation/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; testing/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; review/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; debugging/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Keep it boring. Avoid inventing new categories every week.&lt;/p&gt;
&lt;h2 id="prompt-entry-template"&gt;Prompt entry template&lt;/h2&gt;
&lt;p&gt;Require a consistent format so prompts are reusable:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# &amp;lt;Task Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## When to use
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Prerequisites
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;-
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Recommended model tier
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## The prompt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Customization points
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Expected output
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Common issues and fixes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Examples
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Changelog
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; YYYY-MM-DD: &amp;lt;what changed&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="contribution-guidelines"&gt;Contribution guidelines&lt;/h2&gt;
&lt;p&gt;Set a quality bar:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A prompt must have been used successfully at least three times.&lt;/li&gt;
&lt;li&gt;It must specify required reference files.&lt;/li&gt;
&lt;li&gt;It must include verification steps.&lt;/li&gt;
&lt;li&gt;It must include common failure modes and fixes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A contribution checklist:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Used successfully 3+ times.&lt;/li&gt;
&lt;li&gt;Another person can run it with the listed prereqs.&lt;/li&gt;
&lt;li&gt;Changelog updated.&lt;/li&gt;
&lt;/ul&gt;
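&lt;p&gt;Part of this quality bar can be checked mechanically. The sketch below assumes each entry is a Markdown file that uses the headings from the prompt entry template verbatim; the &lt;code&gt;check_entry&lt;/code&gt; name and the exact heading list are illustrative, so trim or extend them to match your policy.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Sketch of a pre-merge check: report every required heading a prompt entry is missing.
# Returns 0 when all headings are present, 1 otherwise.
check_entry() {
  entry_file="$1"
  entry_status=0
  for heading in \
    "## Prerequisites" \
    "## Recommended model tier" \
    "## The prompt" \
    "## Expected output" \
    "## Common issues and fixes" \
    "## Changelog"
  do
    # -x matches the whole line, -F treats the heading as a fixed string, -q stays quiet.
    if ! grep -qxF -- "$heading" "$entry_file"; then
      echo "$entry_file: missing heading '$heading'"
      entry_status=1
    fi
  done
  return "$entry_status"
}

# Usage (run from the repo root):
#   for f in prompt-library/*/*.md; do check_entry "$f" || failed=1; done
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This only proves the sections exist. Whether the prompt has actually worked three or more times is still a reviewer call.&lt;/p&gt;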
&lt;h2 id="governance"&gt;Governance&lt;/h2&gt;
&lt;p&gt;If nobody owns it, it rots.&lt;/p&gt;
&lt;p&gt;Assign one or two maintainers to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Review new prompts.&lt;/li&gt;
&lt;li&gt;De-duplicate similar prompts.&lt;/li&gt;
&lt;li&gt;Archive prompts that no longer work.&lt;/li&gt;
&lt;li&gt;Run a quarterly cleanup.&lt;/li&gt;
&lt;/ul&gt;
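&lt;p&gt;For the de-duplication pass, a cheap first signal is entries that share a title. The sketch below assumes every entry puts its title on the first line, as the entry template does; &lt;code&gt;duplicate_titles&lt;/code&gt; is a hypothetical helper name. It only catches exact matches, so near-duplicates still need a human read.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# Sketch: print each first-line title that appears in more than one Markdown file.
# The directory argument defaults to prompt-library.
duplicate_titles() {
  find "${1:-prompt-library}" -type f -name '*.md' -exec sed -n '1p' {} \; | sort | uniq -d
}

# Usage: duplicate_titles prompt-library
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;An empty result means no two entries share a title; any line printed is a candidate for merging or archiving during the quarterly cleanup.&lt;/p&gt;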
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;Bootstrap the skeleton:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mkdir -p prompt-library/&lt;span class="o"&gt;{&lt;/span&gt;planning,implementation,testing,review,debugging&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;touch prompt-library/README.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;touch prompt-library/CONTRIBUTING.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; prompt-library/planning/new-task.md &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;MD&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;# New Task Planning
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## When to use
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Prerequisites
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Recommended model tier
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## The prompt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Verification
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;MD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected result:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You have a real place to put prompts that worked, with enough structure to keep it maintainable.&lt;/li&gt;
&lt;/ul&gt;
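&lt;p&gt;The quality bar is easier to hold when part of it is mechanical. A minimal sketch, assuming prompt files keep the exact section headings from the skeleton above; the &lt;code&gt;check_prompt&lt;/code&gt; function name is hypothetical, and it only checks that the headings exist, not that the sections are filled in:&lt;/p&gt;

```shell
# Sketch: report which skeleton sections a prompt file is missing.
# check_prompt is a hypothetical helper, not part of any existing tool.
check_prompt() {
  local file="$1"
  local missing=0
  local section
  for section in "When to use" "Prerequisites" "Recommended model tier" "The prompt" "Verification"; do
    # -x matches the whole line, -F treats the heading as a literal string.
    if ! grep -qxF "## ${section}" "$file"; then
      echo "missing section: ${section}"
      missing=1
    fi
  done
  # Exit status 0 means every heading was found, 1 means at least one is missing.
  return "$missing"
}

# Usage: check_prompt prompt-library/planning/new-task.md
```

&lt;p&gt;Run over &lt;code&gt;prompt-library/*/*.md&lt;/code&gt;, a check like this can catch structural gaps before a maintainer spends review time on them.&lt;/p&gt;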
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/14-worked-example-helm-chart/"&gt;Chapter 15: Worked Example: Creating a Helm Chart From a Reference Chart&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>When Enterprise Defaults Become Enterprise Debt</title><link>https://roygabriel.dev/blog/enterprise-defaults-enterprise-debt/</link><pubDate>Sat, 07 Feb 2026 09:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/enterprise-defaults-enterprise-debt/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Note on examples:&lt;/strong&gt; The scenarios below are &lt;strong&gt;anonymized composites&lt;/strong&gt;. They&amp;rsquo;re not a critique of any one organization; they&amp;rsquo;re patterns that repeat across industries.
The goal isn&amp;rsquo;t to &amp;ldquo;modernize for fun.&amp;rdquo; It&amp;rsquo;s to protect speed-to-market &lt;em&gt;and&lt;/em&gt; reliability as systems and organizations scale.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most enterprises don&amp;rsquo;t lose because they picked the &amp;ldquo;wrong&amp;rdquo; framework or cloud provider. They lose because old defaults - once rational - become invisible policy.&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Note on examples:&lt;/strong&gt; The scenarios below are &lt;strong&gt;anonymized composites&lt;/strong&gt;. They&amp;rsquo;re not a critique of any one organization; they&amp;rsquo;re patterns that repeat across industries.
The goal isn&amp;rsquo;t to &amp;ldquo;modernize for fun.&amp;rdquo; It&amp;rsquo;s to protect speed-to-market &lt;em&gt;and&lt;/em&gt; reliability as systems and organizations scale.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most enterprises don&amp;rsquo;t lose because they picked the &amp;ldquo;wrong&amp;rdquo; framework or cloud provider. They lose because old defaults - once rational - become invisible policy.&lt;/p&gt;
&lt;p&gt;The 90s and early 2000s optimized for constraints that were real at the time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;hardware was expensive&lt;/li&gt;
&lt;li&gt;automation was immature&lt;/li&gt;
&lt;li&gt;environments were scarce&lt;/li&gt;
&lt;li&gt;security controls were largely manual&lt;/li&gt;
&lt;li&gt;uptime was achieved by cautious change, not by safe change&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Those constraints have shifted. But many organizations still run on &lt;strong&gt;architectural and governance defaults&lt;/strong&gt; designed for a different era.&lt;/p&gt;
&lt;p&gt;The result is predictable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;innovation slows&lt;/strong&gt; (lead time grows)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;quality degrades&lt;/strong&gt; (late integration + big-bang changes)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;reliability suffers&lt;/strong&gt; (risk is batched, blast radius expands)&lt;/li&gt;
&lt;li&gt;engineers spend more time navigating the system than improving it&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want a single sentence summary: &lt;strong&gt;old patterns don&amp;rsquo;t just slow delivery - they also create the conditions for outages.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Retire &amp;ldquo;analysis as delivery.&amp;rdquo; Timebox discovery and ship thin vertical slices.&lt;/li&gt;
&lt;li&gt;Treat cloud primitives as &lt;em&gt;primitives&lt;/em&gt;, not research projects (e.g., object storage is solved).&lt;/li&gt;
&lt;li&gt;Default to &lt;strong&gt;containers + orchestration&lt;/strong&gt; for most stateless services; use VMs deliberately, not reflexively. [5]&lt;/li&gt;
&lt;li&gt;Replace ticket queues and approval boards with &lt;strong&gt;guardrails + paved roads + policy-as-code&lt;/strong&gt;. [7][8]&lt;/li&gt;
&lt;li&gt;Measure what matters: &lt;strong&gt;lead time, deploy frequency, change failure rate, MTTR&lt;/strong&gt;. [1][2]&lt;/li&gt;
&lt;li&gt;Modernization works best as an incremental program, not a rewrite (Strangler Fig pattern). [12]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#pattern-1-analysis-as-a-substitute-for-delivery"&gt;Pattern 1: Analysis as a substitute for delivery&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-2-reinventing-commodity-infrastructure"&gt;Pattern 2: Reinventing commodity infrastructure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-3-vm-first-thinking-as-the-default"&gt;Pattern 3: VM-first thinking as the default&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-4-ticket-driven-infrastructure"&gt;Pattern 4: Ticket-driven infrastructure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-5-change-advisory-board-for-routine-changes"&gt;Pattern 5: Change Advisory Board for routine changes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-6-the-shared-database-empire"&gt;Pattern 6: The shared database empire&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-7-central-integration-as-a-chokepoint"&gt;Pattern 7: Central integration as a chokepoint&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-8-perma-pocs-and-innovation-theater"&gt;Pattern 8: Perma-POCs and innovation theater&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#replace-committees-with-guardrails"&gt;Replace committees with guardrails&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#modernize-without-a-rewrite"&gt;Modernize without a rewrite&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification-how-you-know-its-working"&gt;Verification: how you know it&amp;rsquo;s working&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-practical-checklist"&gt;A practical checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-1-analysis-as-a-substitute-for-delivery"&gt;Pattern 1: Analysis as a substitute for delivery&lt;/h2&gt;
&lt;h3 id="what-it-looks-like"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;A team spends months (sometimes a year) doing &amp;ldquo;analysis&amp;rdquo; for a capability that won&amp;rsquo;t be used until it&amp;rsquo;s built - often with the intention of eliminating all risk up front.&lt;/p&gt;
&lt;p&gt;Common examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;designing multi-tenant &amp;ldquo;high availability image storage&amp;rdquo; from scratch&lt;/li&gt;
&lt;li&gt;designing bespoke event systems when managed queues exist&lt;/li&gt;
&lt;li&gt;writing 40-page architecture documents before the first running slice exists&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-existed"&gt;Why it existed&lt;/h3&gt;
&lt;p&gt;When provisioning took weeks and environments were scarce, analysis was a rational risk-reducer.&lt;/p&gt;
&lt;h3 id="the-hidden-tax"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;You push real learning to the end (integration failures happen late).&lt;/li&gt;
&lt;li&gt;Decisions get made with imaginary constraints, not measured ones.&lt;/li&gt;
&lt;li&gt;Teams optimize for &amp;ldquo;approval&amp;rdquo; rather than &amp;ldquo;outcome.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Timebox discovery and require a running slice early.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A strong default:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1-2 week spike to validate constraints&lt;/li&gt;
&lt;li&gt;a thin vertical slice in production (even behind a flag)&lt;/li&gt;
&lt;li&gt;iterate based on real telemetry and user feedback&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="transition-step-low-drama"&gt;Transition step (low drama)&lt;/h3&gt;
&lt;p&gt;Create an &amp;ldquo;RFC-lite&amp;rdquo; template:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;problem statement + constraints&lt;/li&gt;
&lt;li&gt;1-2 options with tradeoffs&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;plan to measure&lt;/strong&gt; (latency, cost, reliability)&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;thin-slice milestone&lt;/strong&gt; date&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-2-reinventing-commodity-infrastructure"&gt;Pattern 2: Reinventing commodity infrastructure&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-1"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;Teams treat widely proven primitives as novel:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;object storage&lt;/li&gt;
&lt;li&gt;queues&lt;/li&gt;
&lt;li&gt;identity&lt;/li&gt;
&lt;li&gt;metrics + tracing&lt;/li&gt;
&lt;li&gt;load balancing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A classic symptom: &amp;ldquo;We need to design HA multi-tenant object storage,&amp;rdquo; as if durable object storage isn&amp;rsquo;t already a standard building block.&lt;/p&gt;
&lt;h3 id="why-it-existed-1"&gt;Why it existed&lt;/h3&gt;
&lt;p&gt;On-prem and early hosting eras forced you to build a lot yourself.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-1"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Reinventing primitives becomes a multi-quarter project.&lt;/li&gt;
&lt;li&gt;Reliability becomes your problem (and you will be on call for it).&lt;/li&gt;
&lt;li&gt;The business pays for the same capability twice: once in time, and again in incidents.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-1"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Default to &lt;strong&gt;managed or proven primitives&lt;/strong&gt; unless you have a documented reason not to.&lt;/p&gt;
&lt;p&gt;For example, modern object storage services are explicitly designed for very high durability and availability (provider details vary). [11]&lt;/p&gt;
&lt;h3 id="transition-step"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;Maintain a &amp;ldquo;Reference Implementations&amp;rdquo; catalog:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;How we do object storage&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;How we do queues&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;How we do auth&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;How we do telemetry&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the default is documented and supported, teams stop re-litigating fundamentals.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pattern-3-vm-first-thinking-as-the-default"&gt;Pattern 3: VM-first thinking as the default&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-2"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;Everything runs on VMs because &amp;ldquo;that&amp;rsquo;s what we do,&amp;rdquo; even when the workload is a stateless API, worker, or event consumer.&lt;/p&gt;
&lt;h3 id="why-it-existed-2"&gt;Why it existed&lt;/h3&gt;
&lt;p&gt;VMs were the universal unit of deployment for a long time, and they map cleanly to org boundaries (&amp;ldquo;this server is mine&amp;rdquo;).&lt;/p&gt;
&lt;h3 id="the-hidden-tax-2"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;drift (snowflake servers)&lt;/li&gt;
&lt;li&gt;slow rollouts&lt;/li&gt;
&lt;li&gt;inconsistent security posture&lt;/li&gt;
&lt;li&gt;wasted compute due to poor bin-packing&lt;/li&gt;
&lt;li&gt;limited standardization across services&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-2"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;For many enterprise services, &lt;strong&gt;containers orchestrated by Kubernetes&lt;/strong&gt; are a strong default for stateless workloads. The Kubernetes documentation describes Deployments as a good fit for managing stateless applications where Pods are interchangeable and replaceable. [5]&lt;/p&gt;
&lt;p&gt;This doesn&amp;rsquo;t mean &amp;ldquo;Kubernetes for everything,&amp;rdquo; but it does mean:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;prefer declarative workloads with health checks and rollout controls&lt;/li&gt;
&lt;li&gt;keep VMs for deliberate cases (legacy constraints, special licensing, unique state, or when orchestration adds no value)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="transition-step-1"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;Start with &amp;ldquo;Kubernetes-first for new stateless services,&amp;rdquo; not a migration mandate.&lt;/p&gt;
&lt;p&gt;Then build operational guardrails:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;resource requests/limits so services behave predictably under load [6]&lt;/li&gt;
&lt;li&gt;standardized readiness/liveness probes&lt;/li&gt;
&lt;li&gt;standard ingress + auth patterns&lt;/li&gt;
&lt;/ul&gt;
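&lt;p&gt;One way to make these guardrails checkable before merge is a crude text check, sketched below. It assumes manifests are plain YAML files and only looks for the standard &lt;code&gt;requests&lt;/code&gt;, &lt;code&gt;limits&lt;/code&gt;, &lt;code&gt;readinessProbe&lt;/code&gt;, and &lt;code&gt;livenessProbe&lt;/code&gt; keys. The &lt;code&gt;check_manifest&lt;/code&gt; name is hypothetical, and a policy engine that actually parses the manifest would be the sturdier choice:&lt;/p&gt;

```shell
# Sketch: crude text check that a workload manifest declares the guardrail
# fields listed above. It greps for keys; it does not parse or validate YAML.
check_manifest() {
  local file="$1"
  local status=0
  local key
  for key in requests limits readinessProbe livenessProbe; do
    # Match the key at any indentation, e.g. "    readinessProbe:".
    if ! grep -qE "^[[:space:]]*${key}:" "$file"; then
      echo "${file}: missing ${key}"
      status=1
    fi
  done
  # Exit status 0 means all four keys were found somewhere in the file.
  return "$status"
}

# Usage: check_manifest deploy/my-service.yaml   (hypothetical path)
```

&lt;p&gt;The design choice is deliberate: a check this small can live in the service template itself, so new services inherit it without filing a request.&lt;/p&gt;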
&lt;hr&gt;
&lt;h2 id="pattern-4-ticket-driven-infrastructure"&gt;Pattern 4: Ticket-driven infrastructure&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-3"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;Need a database? Ticket.
Need an environment? Ticket.
Need DNS? Ticket.
Need a queue? Ticket.&lt;/p&gt;
&lt;p&gt;Eventually, the ticketing system becomes the true control plane.&lt;/p&gt;
&lt;h3 id="why-it-existed-3"&gt;Why it existed&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s a reasonable response when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;environments are scarce&lt;/li&gt;
&lt;li&gt;changes are risky&lt;/li&gt;
&lt;li&gt;platform knowledge is specialized&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-hidden-tax-3"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;queues become normalized (&amp;ldquo;it takes 3 weeks to get a namespace&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;teams route around the platform&lt;/li&gt;
&lt;li&gt;reliability doesn&amp;rsquo;t improve; delivery just slows&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-3"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Self-service via &lt;strong&gt;GitOps&lt;/strong&gt; and platform &amp;ldquo;paved roads.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;OpenGitOps describes itself as a set of open-source standards and best practices for adopting a structured, standardized approach to GitOps. [7] The point isn&amp;rsquo;t a specific tool - it&amp;rsquo;s the principle: &lt;strong&gt;desired state is declarative and auditable.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="transition-step-2"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;Pick one high-frequency request and eliminate it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;create a service with a standard ingress/auth/telemetry&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;provision a queue&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;create a dev environment&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Make the paved road the path of least resistance.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pattern-5-change-advisory-board-for-routine-changes"&gt;Pattern 5: Change Advisory Board for routine changes&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-4"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;Every change - routine or risky - requires synchronous approval.&lt;/p&gt;
&lt;h3 id="why-it-existed-4"&gt;Why it existed&lt;/h3&gt;
&lt;p&gt;When changes were large, rare, and manual, centralized review reduced catastrophic surprises.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-4"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;you batch changes (bigger releases are riskier)&lt;/li&gt;
&lt;li&gt;emergency changes bypass process (creating inconsistency)&lt;/li&gt;
&lt;li&gt;&amp;ldquo;approval&amp;rdquo; becomes the goal rather than &lt;strong&gt;evidence of safety&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;DORA&amp;rsquo;s guidance on streamlining change approval emphasizes making the regular change process fast and reliable enough that it can handle emergencies, and reframes how CAB fits into continuous delivery. [3] Continuous delivery literature makes a similar point: smaller, more frequent changes reduce risk and ease remediation. [4]&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-4"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Move to &lt;strong&gt;evidence-based change approval&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;automated tests&lt;/li&gt;
&lt;li&gt;policy-as-code checks&lt;/li&gt;
&lt;li&gt;progressive delivery (canaries, phased rollouts)&lt;/li&gt;
&lt;li&gt;real-time telemetry tied to the release&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="transition-step-3"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;Keep CAB, but change its scope:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;focus on high-risk changes and cross-team coordination&lt;/li&gt;
&lt;li&gt;use automation and metrics for routine changes&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-6-the-shared-database-empire"&gt;Pattern 6: The shared database empire&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-5"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;A central database is shared by many services.
Teams coordinate schema changes across multiple apps and releases.&lt;/p&gt;
&lt;p&gt;Microservices.io describes the &amp;ldquo;shared database&amp;rdquo; pattern explicitly: multiple services access a single database directly. [10]&lt;/p&gt;
&lt;h3 id="why-it-existed-5"&gt;Why it existed&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s simple at first:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;one place for data&lt;/li&gt;
&lt;li&gt;easy joins&lt;/li&gt;
&lt;li&gt;one backup plan&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-hidden-tax-5"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;coupling spreads everywhere&lt;/li&gt;
&lt;li&gt;every change becomes cross-team work&lt;/li&gt;
&lt;li&gt;reliability suffers because one DB problem becomes everyone&amp;rsquo;s problem&lt;/li&gt;
&lt;li&gt;schema evolution becomes political&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-5"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Prefer service-owned data boundaries. Microservices.io&amp;rsquo;s &amp;ldquo;database per service&amp;rdquo; pattern describes keeping a service&amp;rsquo;s data private and accessible only via its API. [9]&lt;/p&gt;
&lt;h3 id="transition-step-4"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;You don&amp;rsquo;t have to &amp;ldquo;microservices everything.&amp;rdquo;
Start by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;carving out new tables owned by one service&lt;/li&gt;
&lt;li&gt;introducing an API boundary&lt;/li&gt;
&lt;li&gt;migrating consumers gradually&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-7-central-integration-as-a-chokepoint"&gt;Pattern 7: Central integration as a chokepoint&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-6"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;All integrations must go through a single shared integration layer/team (classic ESB gravity).&lt;/p&gt;
&lt;h3 id="why-it-existed-6"&gt;Why it existed&lt;/h3&gt;
&lt;p&gt;Centralizing integration gave consistency when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;protocols were messy&lt;/li&gt;
&lt;li&gt;tooling was expensive&lt;/li&gt;
&lt;li&gt;teams lacked automation&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-hidden-tax-6"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;integration lead times explode&lt;/li&gt;
&lt;li&gt;teams stop experimenting&lt;/li&gt;
&lt;li&gt;one backlog becomes everyone&amp;rsquo;s bottleneck&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-6"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Standardize:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;interfaces&lt;/strong&gt; (auth, tracing, deployment, contract testing)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;platform guardrails&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&amp;hellip;not every internal implementation detail.&lt;/p&gt;
&lt;h3 id="transition-step-5"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;Carve out one &amp;ldquo;self-service integration&amp;rdquo; paved road:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;standard service template&lt;/li&gt;
&lt;li&gt;standard auth&lt;/li&gt;
&lt;li&gt;standard telemetry&lt;/li&gt;
&lt;li&gt;contracts + examples&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-8-perma-pocs-and-innovation-theater"&gt;Pattern 8: Perma-POCs and innovation theater&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-7"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;Prototypes exist forever, never becoming production systems.&lt;/p&gt;
&lt;p&gt;Especially common with AI initiatives:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;impressive demos&lt;/li&gt;
&lt;li&gt;no production constraints&lt;/li&gt;
&lt;li&gt;no ownership for operability&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-existed-7"&gt;Why it existed&lt;/h3&gt;
&lt;p&gt;POCs are a safe way to explore unknowns.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-7"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;teams lose trust (&amp;ldquo;innovation never ships&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;production teams inherit half-baked work&lt;/li&gt;
&lt;li&gt;opportunity cost compounds&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-7"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;From day one, require:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an owner&lt;/li&gt;
&lt;li&gt;a production path&lt;/li&gt;
&lt;li&gt;a thin slice in a real environment&lt;/li&gt;
&lt;li&gt;explicit safety requirements (timeouts, budgets, telemetry)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="transition-step-6"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;Make &amp;ldquo;POC exit criteria&amp;rdquo; mandatory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;what metrics prove value?&lt;/li&gt;
&lt;li&gt;what is the minimum shippable slice?&lt;/li&gt;
&lt;li&gt;what must be true for reliability and security?&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="replace-committees-with-guardrails"&gt;Replace committees with guardrails&lt;/h2&gt;
&lt;p&gt;A recurring theme: &lt;strong&gt;humans are expensive control planes&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The modern move is to convert &amp;ldquo;tribal rules&amp;rdquo; into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;templates&lt;/li&gt;
&lt;li&gt;automation&lt;/li&gt;
&lt;li&gt;policy-as-code&lt;/li&gt;
&lt;li&gt;paved paths&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Microsoft&amp;rsquo;s platform engineering work describes &amp;ldquo;paved paths&amp;rdquo; within an internal developer platform as recommended paths to production that guide developers through requirements without sacrificing velocity. [8]&lt;/p&gt;
&lt;p&gt;Guardrails beat gatekeepers because guardrails are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;consistent&lt;/li&gt;
&lt;li&gt;fast&lt;/li&gt;
&lt;li&gt;auditable&lt;/li&gt;
&lt;li&gt;scalable&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="modernize-without-a-rewrite"&gt;Modernize without a rewrite&lt;/h2&gt;
&lt;p&gt;Big-bang rewrites are expensive and risky. Incremental modernization is usually the winning move.&lt;/p&gt;
&lt;p&gt;The Strangler Fig pattern is a well-known approach: wrap or route traffic so you can replace parts of a legacy system gradually. [12]&lt;/p&gt;
&lt;p&gt;Practical approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;put a facade in front of the legacy surface&lt;/li&gt;
&lt;li&gt;carve off one slice at a time&lt;/li&gt;
&lt;li&gt;measure outcomes&lt;/li&gt;
&lt;li&gt;keep rollback easy&lt;/li&gt;
&lt;/ul&gt;
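&lt;p&gt;The facade&amp;rsquo;s core job is a routing decision. The sketch below reduces it to a prefix lookup so the shape is visible; the paths, the backend names, and the &lt;code&gt;route&lt;/code&gt; function are all hypothetical, and in practice this logic usually lives in a reverse proxy or gateway configuration rather than a script:&lt;/p&gt;

```shell
# Sketch: the routing decision a strangler facade makes, reduced to a lookup.
# Each carved-off slice adds one prefix; removing the prefix is the rollback.
MIGRATED_PREFIXES="/invoices /reports"

route() {
  local path="$1"
  local prefix
  for prefix in $MIGRATED_PREFIXES; do
    case "$path" in
      # Match the prefix itself or anything beneath it, but not "/invoicesX".
      "$prefix"|"$prefix"/*) echo "new-service"; return 0 ;;
    esac
  done
  # Everything not yet migrated keeps going to the legacy system.
  echo "legacy"
}

# Usage: route /invoices/42
```

&lt;p&gt;Keeping the migrated list as data rather than code is what makes rollback cheap: reverting a slice is a one-line change.&lt;/p&gt;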
&lt;p&gt;This isn&amp;rsquo;t glamorous. It works.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="verification-how-you-know-its-working"&gt;Verification: how you know it&amp;rsquo;s working&lt;/h2&gt;
&lt;p&gt;If you want to avoid &amp;ldquo;modernization theater,&amp;rdquo; measure.&lt;/p&gt;
&lt;p&gt;DORA&amp;rsquo;s metrics guidance is a solid baseline: deployment frequency, lead time for changes, change failure rate, and time to restore service (MTTR). [1] The 2024 DORA report continues to focus on the organizational capabilities that drive high performance. [2]&lt;/p&gt;
&lt;p&gt;A simple evidence loop:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Pick one value stream (one product or platform slice).&lt;/li&gt;
&lt;li&gt;Baseline the four DORA metrics.&lt;/li&gt;
&lt;li&gt;Remove one friction point (one pattern).&lt;/li&gt;
&lt;li&gt;Re-measure.&lt;/li&gt;
&lt;/ol&gt;
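&lt;p&gt;Baselining does not need a dashboard to start. A minimal sketch, assuming you can export a deploy log with one line per production deploy in the form &lt;code&gt;YYYY-MM-DD,status&lt;/code&gt;, where status is &lt;code&gt;ok&lt;/code&gt; or &lt;code&gt;failed&lt;/code&gt;. That log format and the &lt;code&gt;dora_baseline&lt;/code&gt; name are hypothetical. It covers deployment frequency and change failure rate only; lead time and time to restore need commit and incident timestamps that this format does not carry:&lt;/p&gt;

```shell
# Sketch: baseline two of the four metrics from a simple deploy log.
# Arguments: the log file, and the number of days the log covers (positive).
dora_baseline() {
  local log="$1"
  local days="$2"
  awk -F, -v days="$days" '
    { total += 1 }                    # every line is one deploy
    $2 == "failed" { failed += 1 }    # deploys that needed remediation
    END {
      if (total == 0) { print "no deploys recorded"; exit }
      printf "deploys_per_day=%.2f\n", total / days
      printf "change_failure_rate_pct=%.1f\n", 100 * failed / total
    }
  ' "$log"
}

# Usage: dora_baseline deploys.csv 30
```

&lt;p&gt;For example, a log with 10 deploys over 5 days, 2 of them failed, yields 2.00 deploys per day and a 20.0% change failure rate. Run it before and after removing a friction point, against the same value stream.&lt;/p&gt;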
&lt;p&gt;If your metrics don&amp;rsquo;t move, you didn&amp;rsquo;t remove the real constraint.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-practical-checklist"&gt;A practical checklist&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re trying to retire &amp;ldquo;enterprise debt&amp;rdquo; safely:&lt;/p&gt;
&lt;h3 id="delivery"&gt;Delivery&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Timebox analysis; require a running slice early.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Prefer small changes and frequent releases; avoid batching.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="platform"&gt;Platform&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Provide a paved road for common workflows (service template, auth, telemetry). [8]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Remove ticket queues for repeatable requests (self-service + GitOps). [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="reliability"&gt;Reliability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Standardize timeouts, retries, budgets, and resource requests/limits. [6]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Use progressive delivery where risk is high.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="architecture"&gt;Architecture&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Reduce shared DB coupling; establish service-owned boundaries. [9][10]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Modernize incrementally (Strangler Fig), not via big-bang rewrites. [12]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="governance"&gt;Governance&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Replace routine approvals with evidence: tests + policy-as-code + telemetry. [3][4]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] DORA - &amp;ldquo;DORA&amp;rsquo;s software delivery performance metrics (guide)&amp;rdquo;. &lt;a href="https://dora.dev/guides/dora-metrics/" target="_blank" rel="noopener noreferrer"&gt;https://dora.dev/guides/dora-metrics/&lt;/a&gt;
[2] DORA - &amp;ldquo;Accelerate State of DevOps Report 2024&amp;rdquo;. &lt;a href="https://dora.dev/research/2024/dora-report/" target="_blank" rel="noopener noreferrer"&gt;https://dora.dev/research/2024/dora-report/&lt;/a&gt;
[3] DORA - &amp;ldquo;Streamlining change approval (capability)&amp;rdquo;. &lt;a href="https://dora.dev/capabilities/streamlining-change-approval/" target="_blank" rel="noopener noreferrer"&gt;https://dora.dev/capabilities/streamlining-change-approval/&lt;/a&gt;
[4] ContinuousDelivery.com - &amp;ldquo;Continuous Delivery and ITIL: Change Management&amp;rdquo;. &lt;a href="https://continuousdelivery.com/2010/11/continuous-delivery-and-itil-change-management/" target="_blank" rel="noopener noreferrer"&gt;https://continuousdelivery.com/2010/11/continuous-delivery-and-itil-change-management/&lt;/a&gt;
[5] Kubernetes docs - &amp;ldquo;Workloads (Deployments are a good fit for stateless workloads)&amp;rdquo;. &lt;a href="https://kubernetes.io/docs/concepts/workloads/" target="_blank" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/workloads/&lt;/a&gt;
[6] Kubernetes docs - &amp;ldquo;Resource Management for Pods and Containers (requests/limits)&amp;rdquo;. &lt;a href="https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/" target="_blank" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/&lt;/a&gt;
[7] OpenGitOps - &amp;ldquo;What is OpenGitOps?&amp;rdquo; and project background. &lt;a href="https://opengitops.dev/" target="_blank" rel="noopener noreferrer"&gt;https://opengitops.dev/&lt;/a&gt;
and &lt;a href="https://opengitops.dev/about/" target="_blank" rel="noopener noreferrer"&gt;https://opengitops.dev/about/&lt;/a&gt;
[8] Microsoft Engineering Blog - &amp;ldquo;Building paved paths: the journey to platform engineering&amp;rdquo;. &lt;a href="https://devblogs.microsoft.com/engineering-at-microsoft/building-paved-paths-the-journey-to-platform-engineering/" target="_blank" rel="noopener noreferrer"&gt;https://devblogs.microsoft.com/engineering-at-microsoft/building-paved-paths-the-journey-to-platform-engineering/&lt;/a&gt;
[9] Microservices.io - &amp;ldquo;Database per service&amp;rdquo; pattern. &lt;a href="https://microservices.io/patterns/data/database-per-service" target="_blank" rel="noopener noreferrer"&gt;https://microservices.io/patterns/data/database-per-service&lt;/a&gt;
[10] Microservices.io - &amp;ldquo;Shared database&amp;rdquo; pattern. &lt;a href="https://microservices.io/patterns/data/shared-database.html" target="_blank" rel="noopener noreferrer"&gt;https://microservices.io/patterns/data/shared-database.html&lt;/a&gt;
[11] AWS documentation - &amp;ldquo;Data protection in Amazon S3 (durability/availability design goals)&amp;rdquo;. &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html" target="_blank" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html&lt;/a&gt;
[12] Martin Fowler - &amp;ldquo;Strangler Fig Application&amp;rdquo; (legacy modernization pattern). &lt;a href="https://martinfowler.com/bliki/StranglerFigApplication.html" target="_blank" rel="noopener noreferrer"&gt;https://martinfowler.com/bliki/StranglerFigApplication.html&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 13: Templates + Checklists: The Copy/Paste Kit</title><link>https://roygabriel.dev/blog/llm-development-guide/12-templates-checklists/</link><pubDate>Sat, 07 Feb 2026 04:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/12-templates-checklists/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 13 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/11-team-collaboration/"&gt;Chapter 12: Team Collaboration: Handoffs, Shared Prompts, and Review&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/13-building-a-prompt-library/"&gt;Chapter 14: Building a Prompt Library: Governance + Quality Bar&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to bootstrap the workflow in minutes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create &lt;code&gt;plan/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt;, and &lt;code&gt;work-notes/&lt;/code&gt; with consistent templates.&lt;/li&gt;
&lt;li&gt;Add a phase spec template for large, multi-phase projects.&lt;/li&gt;
&lt;li&gt;Add a phase implementation prompt template for prompt-by-prompt execution.&lt;/li&gt;
&lt;li&gt;Use session start and end checklists.&lt;/li&gt;
&lt;li&gt;Generate PR descriptions that explain intent and verification.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Templates reduce prompt drift.&lt;/li&gt;
&lt;li&gt;Keep them short and consistent.&lt;/li&gt;
&lt;li&gt;Add verification to every phase.&lt;/li&gt;
&lt;li&gt;For large projects, pair phase specs with implementation prompt docs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#plan-template"&gt;Plan template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#prompt-template"&gt;Prompt template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phase-spec-template-large-projects"&gt;Phase spec template (large projects)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phase-implementation-prompt-template-large-projects"&gt;Phase implementation prompt template (large projects)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#work-notes-template"&gt;Work notes template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#session-checklists"&gt;Session checklists&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pr-description-template"&gt;PR description template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="plan-template"&gt;Plan template&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# &amp;lt;Project&amp;gt; Plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Overview
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Goals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;-
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Context
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Reference implementation:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Environment:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Phases
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Definition of done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; [ ]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Out of scope
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;-
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Risks / open questions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;-
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="prompt-template"&gt;Prompt template&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase &amp;lt;X&amp;gt; - &amp;lt;Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Role
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Context
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Plan:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Work notes:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; References:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Task
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Deliverables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;1.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Constraints
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; MUST
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; MUST NOT
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Session management
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Update work notes with decisions, assumptions, open questions, and a session log entry.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Verification
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Command:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Expected:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Commit discipline
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Propose a commit message and wait for approval.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="phase-spec-template-large-projects"&gt;Phase spec template (large projects)&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase &amp;lt;N&amp;gt;&amp;lt;Letter&amp;gt; - &amp;lt;Phase Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Planned
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Depends on
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Phase dependency&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Feature flag
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;flag name or n/a&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Migration
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;none or required steps&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Design rationale
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&amp;lt;Why this phase exists and what risk it reduces&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Tasks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Prompt 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;task&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Prompt 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;task&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Files
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### New
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;path&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Modified
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;path&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Referenced (read-only)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;path&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Exit criteria
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;build command&amp;gt; exits 0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;vet/lint command&amp;gt; exits 0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;test command&amp;gt; exits 0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; No returned errors are ignored
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Progress notes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="phase-implementation-prompt-template-large-projects"&gt;Phase implementation prompt template (large projects)&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase &amp;lt;N&amp;gt;&amp;lt;Letter&amp;gt; - Implementation Prompts
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Complete prompts sequentially. Do not continue if verification fails.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Prompt 1 of &amp;lt;Total&amp;gt;: &amp;lt;Prompt Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Context files to load:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;4 to 6 explicit paths&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Task:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;exact implementation task&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Constraints:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Stay within this prompt&amp;#39;s scope.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Handle all returned errors.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Keep code small and reviewable.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Do not proceed to the next prompt until verification passes.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Verification:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;build command&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;vet/lint command&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;test command&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Commit discipline:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Summarize what changed.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Propose a commit message.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Wait for approval before moving on.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="work-notes-template"&gt;Work notes template&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase &amp;lt;X&amp;gt; - &amp;lt;Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Not started
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; In progress
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Blocked
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Complete
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Decisions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Assumptions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Open questions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Session log
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Commits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="session-checklists"&gt;Session checklists&lt;/h2&gt;
&lt;p&gt;Session start:&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 13 of 16&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to bootstrap the workflow in minutes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Create &lt;code&gt;plan/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt;, and &lt;code&gt;work-notes/&lt;/code&gt; with consistent templates.&lt;/li&gt;
&lt;li&gt;Add a phase spec template for large, multi-phase projects.&lt;/li&gt;
&lt;li&gt;Add a phase implementation prompt template for prompt-by-prompt execution.&lt;/li&gt;
&lt;li&gt;Use session start and end checklists.&lt;/li&gt;
&lt;li&gt;Generate PR descriptions that explain intent and verification.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Templates reduce prompt drift.&lt;/li&gt;
&lt;li&gt;Keep them short and consistent.&lt;/li&gt;
&lt;li&gt;Add verification to every phase.&lt;/li&gt;
&lt;li&gt;For large projects, pair phase specs with implementation prompt docs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#plan-template"&gt;Plan template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#prompt-template"&gt;Prompt template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phase-spec-template-large-projects"&gt;Phase spec template (large projects)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#phase-implementation-prompt-template-large-projects"&gt;Phase implementation prompt template (large projects)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#work-notes-template"&gt;Work notes template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#session-checklists"&gt;Session checklists&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pr-description-template"&gt;PR description template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="plan-template"&gt;Plan template&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# &amp;lt;Project&amp;gt; Plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Overview
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Goals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;-
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Context
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Reference implementation:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Environment:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Phases
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Definition of done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; [ ]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Out of scope
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;-
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Risks / open questions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;-
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="prompt-template"&gt;Prompt template&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase &amp;lt;X&amp;gt; - &amp;lt;Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Role
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Context
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Plan:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Work notes:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; References:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Task
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Deliverables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;1.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Constraints
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; MUST
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; MUST NOT
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Session management
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Update work notes with decisions, assumptions, open questions, and a session log entry.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Verification
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Command:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Expected:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Commit discipline
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Propose a commit message and wait for approval.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="phase-spec-template-large-projects"&gt;Phase spec template (large projects)&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase &amp;lt;N&amp;gt;&amp;lt;Letter&amp;gt; - &amp;lt;Phase Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Planned
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Depends on
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Phase dependency&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Feature flag
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;flag name or n/a&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Migration
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;none or required steps&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Design rationale
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&amp;lt;Why this phase exists and what risk it reduces&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Tasks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Prompt 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;task&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Prompt 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;task&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Files
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### New
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;path&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Modified
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;path&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Referenced (read-only)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;path&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Exit criteria
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;build command&amp;gt; exits 0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;vet/lint command&amp;gt; exits 0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;test command&amp;gt; exits 0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; No ignored returned errors
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Progress notes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="phase-implementation-prompt-template-large-projects"&gt;Phase implementation prompt template (large projects)&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase &amp;lt;N&amp;gt;&amp;lt;Letter&amp;gt; - Implementation Prompts
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Complete prompts sequentially. Do not continue when verification fails.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Prompt 1 of &amp;lt;Total&amp;gt;: &amp;lt;Prompt Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Context files to load:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;4 to 6 explicit paths&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Task:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;exact implementation task&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Constraints:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Stay within this prompt&amp;#39;s scope.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Handle all returned errors.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Keep code small and reviewable.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Do not proceed to the next prompt until verification passes.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Verification:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;build command&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;vet/lint command&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;test command&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Commit discipline:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Summarize what changed.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Propose a commit message.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Wait for approval before moving on.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="work-notes-template"&gt;Work notes template&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase &amp;lt;X&amp;gt; - &amp;lt;Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Not started
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; In progress
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Blocked
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Complete
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Decisions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Assumptions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Open questions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Session log
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Commits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="session-checklists"&gt;Session checklists&lt;/h2&gt;
&lt;p&gt;Session start:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identify the phase.&lt;/li&gt;
&lt;li&gt;Load the phase prompt.&lt;/li&gt;
&lt;li&gt;Load the current work notes.&lt;/li&gt;
&lt;li&gt;Re-state the smallest goal for this session.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Session end:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Work notes updated.&lt;/li&gt;
&lt;li&gt;Decisions logged with rationale.&lt;/li&gt;
&lt;li&gt;Verification run.&lt;/li&gt;
&lt;li&gt;Commits made (or clearly blocked).&lt;/li&gt;
&lt;li&gt;Next step written down.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="pr-description-template"&gt;PR description template&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Summary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Changes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;-
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Out of scope
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;-
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Verification
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;-
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Review guide
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Follow-up
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; [ ]
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## References
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Work notes:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Plan:
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;Create a local &lt;code&gt;templates/&lt;/code&gt; folder and seed the files:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mkdir -p templates
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; templates/PLAN-template.md &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;MD&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;# Project Plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Overview
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Goals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Phases
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Definition of done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;MD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; templates/PROMPT-template.md &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;MD&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;# Phase - Prompt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Role
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Context
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Task
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Constraints
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Verification
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Commit discipline
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;MD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; templates/WORK-NOTES-template.md &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;MD&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;# Phase - Work Notes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Decisions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Session log
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;MD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected result:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can start a new project by copying these templates and editing the placeholders.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/13-building-a-prompt-library/"&gt;Chapter 14: Building a Prompt Library: Governance + Quality Bar&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 12: Team Collaboration: Handoffs, Shared Prompts, and Review</title><link>https://roygabriel.dev/blog/llm-development-guide/11-team-collaboration/</link><pubDate>Thu, 05 Feb 2026 02:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/11-team-collaboration/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 12 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/10-measuring-success/"&gt;Chapter 11: Measuring Success: Solo + Team Metrics Without Fake Precision&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/12-templates-checklists/"&gt;Chapter 13: Templates + Checklists: The Copy/Paste Kit&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to run this workflow on a team without turning it into process theater:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hand off work mid-phase without a meeting.&lt;/li&gt;
&lt;li&gt;Share prompts that actually work.&lt;/li&gt;
&lt;li&gt;Review LLM-assisted code with the same rigor as human code.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Teams fail at LLM work because chat context is not shareable.&lt;/li&gt;
&lt;li&gt;Plans, prompt docs, and work notes make context portable.&lt;/li&gt;
&lt;li&gt;Keep review focused on code and verification, not on how the code was produced.&lt;/li&gt;
&lt;li&gt;Maintain a small set of &amp;ldquo;golden&amp;rdquo; reference implementations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#handoff-patterns"&gt;Handoff patterns&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#shared-prompt-libraries"&gt;Shared prompt libraries&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#review-checklist"&gt;Review checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="handoff-patterns"&gt;Handoff patterns&lt;/h2&gt;
&lt;h3 id="mid-phase-handoff"&gt;Mid-phase handoff&lt;/h3&gt;
&lt;p&gt;If you hand off in the middle of a phase, provide:&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 12 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/10-measuring-success/"&gt;Chapter 11: Measuring Success: Solo + Team Metrics Without Fake Precision&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/12-templates-checklists/"&gt;Chapter 13: Templates + Checklists: The Copy/Paste Kit&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to run this workflow on a team without turning it into process theater:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hand off work mid-phase without a meeting.&lt;/li&gt;
&lt;li&gt;Share prompts that actually work.&lt;/li&gt;
&lt;li&gt;Review LLM-assisted code with the same rigor as human code.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Teams fail at LLM work because chat context is not shareable.&lt;/li&gt;
&lt;li&gt;Plans, prompt docs, and work notes make context portable.&lt;/li&gt;
&lt;li&gt;Keep review focused on code and verification, not on how the code was produced.&lt;/li&gt;
&lt;li&gt;Maintain a small set of &amp;ldquo;golden&amp;rdquo; reference implementations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#handoff-patterns"&gt;Handoff patterns&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#shared-prompt-libraries"&gt;Shared prompt libraries&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#review-checklist"&gt;Review checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="handoff-patterns"&gt;Handoff patterns&lt;/h2&gt;
&lt;h3 id="mid-phase-handoff"&gt;Mid-phase handoff&lt;/h3&gt;
&lt;p&gt;If you hand off in the middle of a phase, provide:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Updated work notes with status, decisions, open questions, and the exact next step.&lt;/li&gt;
&lt;li&gt;The phase prompt doc.&lt;/li&gt;
&lt;li&gt;The reference implementation paths used.&lt;/li&gt;
&lt;li&gt;Any verification output (test results, lint output).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Handoff template:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Handoff: &amp;lt;Phase&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&amp;lt;What&amp;#39;s done, what&amp;#39;s in progress, what&amp;#39;s blocked&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Files to review
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;file 1&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;file 2&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Key decisions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Decision&amp;gt;: &amp;lt;Rationale&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Open questions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;Question&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Immediate next step
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&amp;lt;Exact command or file edit to do next&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### How to resume
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;1.&lt;/span&gt; Load prompts/&amp;lt;phase&amp;gt;.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;2.&lt;/span&gt; Load work-notes/&amp;lt;phase&amp;gt;.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;3.&lt;/span&gt; Continue from the last session log entry
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="phase-boundary-handoff"&gt;Phase boundary handoff&lt;/h3&gt;
&lt;p&gt;Phase boundary handoffs are easier:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Work notes are marked complete.&lt;/li&gt;
&lt;li&gt;The next phase starts cleanly.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="shared-prompt-libraries"&gt;Shared prompt libraries&lt;/h2&gt;
&lt;p&gt;A shared prompt library reduces rework and increases consistency.&lt;/p&gt;
&lt;p&gt;A reasonable structure:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;prompt-library/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; planning/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; implementation/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; testing/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; review/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Quality bar:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prompts are specific enough to be useful.&lt;/li&gt;
&lt;li&gt;Prompts are general enough to be reused.&lt;/li&gt;
&lt;li&gt;Prompts record &amp;ldquo;when to use&amp;rdquo; and &amp;ldquo;prereqs&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Prompts have been used successfully multiple times.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="review-checklist"&gt;Review checklist&lt;/h2&gt;
&lt;p&gt;LLM-assisted work should be reviewed like any other work.&lt;/p&gt;
&lt;p&gt;High-signal checks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Imports and APIs exist (no hallucinations).&lt;/li&gt;
&lt;li&gt;Error handling is complete.&lt;/li&gt;
&lt;li&gt;Output matches reference patterns.&lt;/li&gt;
&lt;li&gt;Verification was actually run.&lt;/li&gt;
&lt;li&gt;Commits are atomic and explain intent.&lt;/li&gt;
&lt;li&gt;Tests assert behavior, not just that the code exists.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;Create a shared template file so handoffs are consistent:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mkdir -p docs
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; docs/llm-handoff-template.md &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;MD&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;# LLM Work Handoff Template
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Phase
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Files to review
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Key decisions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Open questions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Verification run
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- &amp;lt;command&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Expected: &amp;lt;...&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Next step
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## How to resume
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Prompt:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Work notes:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- References:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;MD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected result:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Anyone can hand off work in under five minutes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/12-templates-checklists/"&gt;Chapter 13: Templates + Checklists: The Copy/Paste Kit&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 11: Measuring Success: Solo + Team Metrics Without Fake Precision</title><link>https://roygabriel.dev/blog/llm-development-guide/10-measuring-success/</link><pubDate>Tue, 03 Feb 2026 00:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/10-measuring-success/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 11 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/09-stop-rules-pitfalls/"&gt;Chapter 10: Stop Rules + Pitfalls: When to Upgrade, Bail, or Go Manual&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/11-team-collaboration/"&gt;Chapter 12: Team Collaboration: Handoffs, Shared Prompts, and Review&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to tell, with reasonable honesty, whether the workflow is helping:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pick a small set of metrics you can actually measure.&lt;/li&gt;
&lt;li&gt;Separate leading indicators (process) from lagging indicators (outcomes).&lt;/li&gt;
&lt;li&gt;Avoid fake precision and vanity metrics.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;If you can&amp;rsquo;t measure reliably, don&amp;rsquo;t invent numbers.&lt;/li&gt;
&lt;li&gt;Track a baseline (a few representative tasks) before you claim improvement.&lt;/li&gt;
&lt;li&gt;Favor cheap metrics: time to first commit, PR revision rounds, post-merge bugs.&lt;/li&gt;
&lt;li&gt;Use leading indicators daily; use lagging indicators in retros.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-to-measure"&gt;What to measure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#solo-baseline"&gt;Solo baseline&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#leading-vs-lagging-indicators"&gt;Leading vs lagging indicators&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#lightweight-reporting-template"&gt;Lightweight reporting template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-to-measure"&gt;What to measure&lt;/h2&gt;
&lt;p&gt;Pick a small set that maps to real outcomes.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 11 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/09-stop-rules-pitfalls/"&gt;Chapter 10: Stop Rules + Pitfalls: When to Upgrade, Bail, or Go Manual&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/11-team-collaboration/"&gt;Chapter 12: Team Collaboration: Handoffs, Shared Prompts, and Review&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to tell, with reasonable honesty, whether the workflow is helping:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pick a small set of metrics you can actually measure.&lt;/li&gt;
&lt;li&gt;Separate leading indicators (process) from lagging indicators (outcomes).&lt;/li&gt;
&lt;li&gt;Avoid fake precision and vanity metrics.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;If you can&amp;rsquo;t measure reliably, don&amp;rsquo;t invent numbers.&lt;/li&gt;
&lt;li&gt;Track a baseline (a few representative tasks) before you claim improvement.&lt;/li&gt;
&lt;li&gt;Favor cheap metrics: time to first commit, PR revision rounds, post-merge bugs.&lt;/li&gt;
&lt;li&gt;Use leading indicators daily; use lagging indicators in retros.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-to-measure"&gt;What to measure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#solo-baseline"&gt;Solo baseline&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#leading-vs-lagging-indicators"&gt;Leading vs lagging indicators&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#lightweight-reporting-template"&gt;Lightweight reporting template&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-to-measure"&gt;What to measure&lt;/h2&gt;
&lt;p&gt;Pick a small set that maps to real outcomes.&lt;/p&gt;
&lt;p&gt;Velocity indicators:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Time to first commit.&lt;/li&gt;
&lt;li&gt;Phase completion time.&lt;/li&gt;
&lt;li&gt;PR cycle time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Quality indicators:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PR revision rounds.&lt;/li&gt;
&lt;li&gt;Bugs caught in review.&lt;/li&gt;
&lt;li&gt;Post-merge bugs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Efficiency indicators:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rework rate (time fixing output vs total time).&lt;/li&gt;
&lt;li&gt;Session count per task.&lt;/li&gt;
&lt;li&gt;Handoff success (can someone else continue without re-explaining).&lt;/li&gt;
&lt;/ul&gt;
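&lt;p&gt;As a rough sketch, rework rate can be computed as the minutes spent fixing LLM output divided by total task minutes. The function name and the sample numbers are illustrative assumptions, not measurements from this guide.&lt;/p&gt;

```python
def rework_rate(fix_minutes: float, total_minutes: float) -> float:
    """Fraction of task time spent fixing generated output (0.0 to 1.0)."""
    if not total_minutes > 0:
        raise ValueError("total_minutes must be positive")
    if fix_minutes > total_minutes or 0 > fix_minutes:
        raise ValueError("fix_minutes must be between 0 and total_minutes")
    return fix_minutes / total_minutes


# Hypothetical task: 90 minutes total, 18 of them spent correcting output.
print(f"{rework_rate(18, 90):.0%}")  # prints "20%"
```

&lt;p&gt;A single value means little on its own; the trend across comparable tasks is what is worth watching.&lt;/p&gt;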
&lt;h2 id="solo-baseline"&gt;Solo baseline&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re working solo, you can still create a baseline.&lt;/p&gt;
&lt;p&gt;Track per task:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Start time.&lt;/li&gt;
&lt;li&gt;First commit time.&lt;/li&gt;
&lt;li&gt;Total time to done.&lt;/li&gt;
&lt;li&gt;Number of &amp;ldquo;LLM retries&amp;rdquo; (how many prompt iterations for the same logical unit).&lt;/li&gt;
&lt;li&gt;Bugs you found after &amp;ldquo;done&amp;rdquo;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The point is not perfect measurement. The point is noticing patterns.&lt;/p&gt;
&lt;h2 id="leading-vs-lagging-indicators"&gt;Leading vs lagging indicators&lt;/h2&gt;
&lt;p&gt;Leading indicators predict success:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Work notes are updated.&lt;/li&gt;
&lt;li&gt;Prompts contain verification.&lt;/li&gt;
&lt;li&gt;Commits are atomic.&lt;/li&gt;
&lt;li&gt;References are provided.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Lagging indicators confirm success:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PR merged with low rework.&lt;/li&gt;
&lt;li&gt;Low post-merge bug rate.&lt;/li&gt;
&lt;li&gt;Handoffs succeed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="lightweight-reporting-template"&gt;Lightweight reporting template&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## LLM-Assisted Development Summary (Month)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Adoption
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Tasks completed with workflow: &amp;lt;N&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Velocity
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Median time to first commit: &amp;lt;X&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Median PR cycle time: &amp;lt;Y&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Quality
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Median PR revision rounds: &amp;lt;Z&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Post-merge bugs: &amp;lt;N&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Costs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; LLM cost estimate: &amp;lt;X&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Notes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; What worked:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; What failed:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Changes for next month:
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;Keep a simple CSV so you can graph later if you want.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mkdir -p work-notes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; work-notes/metrics.csv &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;CSV&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;date,task,time_to_first_commit_minutes,total_time_minutes,llm_retries,pr_revision_rounds,post_merge_bugs,notes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;CSV&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected result:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can append one row per task in under a minute.&lt;/li&gt;
&lt;/ul&gt;
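&lt;p&gt;One possible way to append rows and produce the medians used in the reporting template is sketched below. The column names come from the CSV header above; the helper names &lt;code&gt;append_row&lt;/code&gt; and &lt;code&gt;summarize&lt;/code&gt;, the temporary file, and the sample rows are assumptions for illustration only.&lt;/p&gt;

```python
import csv
import statistics
import tempfile
from pathlib import Path

# Column names match the header created in the shell snippet above.
FIELDS = [
    "date", "task", "time_to_first_commit_minutes", "total_time_minutes",
    "llm_retries", "pr_revision_rounds", "post_merge_bugs", "notes",
]


def append_row(path: Path, row: dict) -> None:
    """Append one task row, writing the header first if the file is new or empty."""
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def summarize(path: Path) -> dict:
    """Return the numbers the monthly template asks for; needs at least one row."""
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    return {
        "tasks": len(rows),
        "median_time_to_first_commit": statistics.median(
            float(r["time_to_first_commit_minutes"]) for r in rows
        ),
        "median_pr_revision_rounds": statistics.median(
            float(r["pr_revision_rounds"]) for r in rows
        ),
        "post_merge_bugs": sum(int(r["post_merge_bugs"]) for r in rows),
    }


# Demo on a throwaway file with made-up rows (values follow FIELDS order).
demo = Path(tempfile.mkdtemp()) / "metrics.csv"
for values in (
    ("2026-01-05", "example-a", 25, 120, 3, 1, 0, ""),
    ("2026-01-06", "example-b", 40, 200, 5, 2, 1, "flaky test"),
    ("2026-01-07", "example-c", 15, 90, 2, 1, 0, ""),
):
    append_row(demo, dict(zip(FIELDS, values)))

print(summarize(demo))
```

&lt;p&gt;In practice you would point the path at &lt;code&gt;work-notes/metrics.csv&lt;/code&gt;. Because that file already has its header, the helper only appends data rows.&lt;/p&gt;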
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/11-team-collaboration/"&gt;Chapter 12: Team Collaboration: Handoffs, Shared Prompts, and Review&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Go vs Spring Boot for Enterprise APIs: Cost, Performance, and Cloud-Native Ops</title><link>https://roygabriel.dev/blog/go-vs-springboot-enterprise-apis/</link><pubDate>Sun, 01 Feb 2026 10:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/go-vs-springboot-enterprise-apis/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; This is a production engineering perspective, not a benchmark scoreboard. If you care about cost or p99 latency, measure &lt;em&gt;your&lt;/em&gt; service with &lt;em&gt;your&lt;/em&gt; dependencies and &lt;em&gt;your&lt;/em&gt; deployment constraints.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-comparison-keeps-showing-up"&gt;Why this comparison keeps showing up&lt;/h2&gt;
&lt;p&gt;If you build enterprise APIs long enough, you’ll see the same pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The “language choice” isn’t what breaks production.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;runtime envelope&lt;/em&gt; and &lt;em&gt;operational model&lt;/em&gt; usually are.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When teams compare &lt;strong&gt;Go&lt;/strong&gt; and &lt;strong&gt;Java Spring Boot&lt;/strong&gt;, they’re often asking a more specific question:&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; This is a production engineering perspective, not a benchmark scoreboard. If you care about cost or p99 latency, measure &lt;em&gt;your&lt;/em&gt; service with &lt;em&gt;your&lt;/em&gt; dependencies and &lt;em&gt;your&lt;/em&gt; deployment constraints.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-comparison-keeps-showing-up"&gt;Why this comparison keeps showing up&lt;/h2&gt;
&lt;p&gt;If you build enterprise APIs long enough, you’ll see the same pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The “language choice” isn’t what breaks production.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;runtime envelope&lt;/em&gt; and &lt;em&gt;operational model&lt;/em&gt; usually are.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When teams compare &lt;strong&gt;Go&lt;/strong&gt; and &lt;strong&gt;Java Spring Boot&lt;/strong&gt;, they’re often asking a more specific question:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;“What will it cost to run this API at scale, and how predictable is it under real production conditions?”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Spring Boot’s value proposition is speed-to-service: stand-alone, production-grade Spring applications you can “just run,” with strong ecosystem defaults and integration breadth. [1]&lt;/p&gt;
&lt;p&gt;Go’s value proposition is operational simplicity: compile to an executable, ship a small container, run with fewer moving pieces, and keep latency and resource usage easier to reason about. &lt;code&gt;go build&lt;/code&gt; compiles packages into an executable. [5]&lt;/p&gt;
&lt;p&gt;This article is about the production-relevant tradeoffs: &lt;strong&gt;cost/resource usage&lt;/strong&gt;, &lt;strong&gt;performance under load&lt;/strong&gt;, &lt;strong&gt;cloud-native deployability&lt;/strong&gt;, and the “you will be on call for this” realities.&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;On code quality:&lt;/strong&gt; This isn’t “Go good / Java bad.” It’s an observation about failure modes: framework-heavy stacks can hide complexity until it shows up in startup time, memory, and surprises under load. Go’s bias toward explicitness often makes problems easier to see and cheaper to operate, even before the codebase is perfect.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;If your org is already Spring-heavy, Spring Boot can be the fastest path to a robust API, especially when you need Spring’s ecosystem (security, data, integrations). [1]&lt;/li&gt;
&lt;li&gt;If you run many small services, care about density, or need fast scale-to-zero/scale-from-zero behavior, Go often has an operational edge due to simpler packaging and typically lower baseline resource footprint.&lt;/li&gt;
&lt;li&gt;Kubernetes costs are strongly influenced by &lt;strong&gt;requests/limits&lt;/strong&gt; and scheduling density, so &lt;em&gt;baseline memory&lt;/em&gt; is often a bigger lever than micro-optimizing CPU. [7][8]&lt;/li&gt;
&lt;li&gt;Both ecosystems support hardened container builds (including &lt;strong&gt;distroless&lt;/strong&gt;) to reduce attack surface. [9][10]&lt;/li&gt;
&lt;li&gt;Observability is excellent in both; Java has very mature &lt;strong&gt;zero-code&lt;/strong&gt; instrumentation via the OpenTelemetry Java agent. [13][14] Go has strong SDK support and growing options for auto-instrumentation. [11]&lt;/li&gt;
&lt;li&gt;“Best” depends on your constraints. The best move is to benchmark your service envelope and compare &lt;strong&gt;p95/p99 latency&lt;/strong&gt;, &lt;strong&gt;RSS&lt;/strong&gt;, &lt;strong&gt;startup&lt;/strong&gt;, and &lt;strong&gt;error rates&lt;/strong&gt; under load.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="the-cost-model-what-you-actually-pay-for"&gt;The cost model: what you actually pay for&lt;/h2&gt;
&lt;p&gt;In cloud and Kubernetes environments, cost is strongly driven by:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;How many replicas you need&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How much CPU/memory you request per replica&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How quickly you can scale (up and down)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How much time you spend operating the service&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Kubernetes scheduling and resource guarantees are based on &lt;strong&gt;requests&lt;/strong&gt; and &lt;strong&gt;limits&lt;/strong&gt;. Requests influence where Pods can be scheduled; limits cap what they can consume. [7][8]&lt;/p&gt;
&lt;p&gt;That means your “baseline footprint” matters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A service that requests 512Mi RAM &lt;em&gt;even when idle&lt;/em&gt; reduces node density.&lt;/li&gt;
&lt;li&gt;A service that requests 128Mi RAM allows more Pods per node.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="a-simple-illustrative-density-example"&gt;A simple (illustrative) density example&lt;/h3&gt;
&lt;p&gt;Assume you run 100 replicas of an API, and memory is your limiting resource:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Case A:&lt;/strong&gt; 100 × 512Mi = 51,200Mi ≈ &lt;strong&gt;50Gi&lt;/strong&gt; reserved&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Case B:&lt;/strong&gt; 100 × 128Mi = 12,800Mi ≈ &lt;strong&gt;12.5Gi&lt;/strong&gt; reserved&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That’s a ~&lt;strong&gt;37.5Gi&lt;/strong&gt; delta in reserved memory &lt;em&gt;before&lt;/em&gt; you count overhead (sidecars, DaemonSets, kube-system). This is not “Go vs Java math.” It’s “baseline footprint sets cluster size.”&lt;/p&gt;
&lt;p&gt;The point: &lt;strong&gt;cost discussions are often memory-and-startup discussions wearing a language-comparison mask.&lt;/strong&gt;&lt;/p&gt;
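&lt;p&gt;The arithmetic above can be reproduced with a small sketch. It assumes memory requests are the only scheduling constraint, and the 14Gi of allocatable memory per node is a hypothetical figure rather than a recommendation.&lt;/p&gt;

```python
import math


def reserved_gib(replicas: int, request_mib: int) -> float:
    """Total memory reserved by requests, in GiB (1 GiB = 1024 MiB)."""
    return replicas * request_mib / 1024


def nodes_needed(replicas: int, request_mib: int, allocatable_mib: int) -> int:
    """Nodes required if memory requests are the only scheduling constraint."""
    pods_per_node = allocatable_mib // request_mib
    if pods_per_node == 0:
        raise ValueError("request exceeds node allocatable memory")
    return math.ceil(replicas / pods_per_node)


# Hypothetical node with 14 GiB allocatable after system overhead.
ALLOCATABLE_MIB = 14 * 1024

for label, request in (("Case A", 512), ("Case B", 128)):
    print(
        label,
        reserved_gib(100, request), "Gi reserved,",
        nodes_needed(100, request, ALLOCATABLE_MIB), "node(s)",
    )
```

&lt;p&gt;With these assumed numbers, Case A needs 4 nodes and Case B fits on 1. Real clusters also have CPU requests, DaemonSets, and headroom policies, so treat the output as a first-order estimate.&lt;/p&gt;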
&lt;hr&gt;
&lt;h2 id="gos-production-advantages-when-they-matter"&gt;Go’s production advantages (when they matter)&lt;/h2&gt;
&lt;h3 id="1-packaging-simplicity-and-deployment-surface"&gt;1) Packaging simplicity and deployment surface&lt;/h3&gt;
&lt;p&gt;Go’s toolchain compiles code into an executable (&lt;code&gt;go build&lt;/code&gt;). [5] Go’s modern toolchain approach (including toolchain selection via the &lt;code&gt;toolchain&lt;/code&gt; directive and &lt;code&gt;GOTOOLCHAIN&lt;/code&gt;, introduced in Go 1.21) helps keep builds reproducible across environments. [6]&lt;/p&gt;
&lt;p&gt;In practice, Go services often ship as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a single process&lt;/li&gt;
&lt;li&gt;a single container layer containing a single binary&lt;/li&gt;
&lt;li&gt;minimal runtime dependencies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That tends to reduce:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;container image complexity&lt;/li&gt;
&lt;li&gt;“works on my machine” drift&lt;/li&gt;
&lt;li&gt;runtime patch surface area&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;This matters most when you operate many services&lt;/strong&gt; and want upgrades to be boring.&lt;/p&gt;
&lt;h3 id="2-fast-start-and-scale-events"&gt;2) Fast start and “scale events”&lt;/h3&gt;
&lt;p&gt;In real systems, performance isn’t only request/response speed; it’s also how the service behaves during:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;deployments&lt;/li&gt;
&lt;li&gt;autoscaling&lt;/li&gt;
&lt;li&gt;node drains&lt;/li&gt;
&lt;li&gt;crashes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Go services commonly start quickly because they don’t require JVM warmup/classloading/JIT compilation. (Exact numbers vary; measure your service.)&lt;/p&gt;
&lt;p&gt;Spring Boot can start fast enough for most use cases, but cold starts can become a visible factor when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;you scale from zero frequently (serverless-like patterns)&lt;/li&gt;
&lt;li&gt;you do aggressive HPA scaling&lt;/li&gt;
&lt;li&gt;you run lots of short-lived jobs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Spring Boot also supports building &lt;strong&gt;native images&lt;/strong&gt; with GraalVM, which can materially improve startup and memory in some cases, but introduces different tradeoffs (build time, reflection limits, operational differences). [3][4]&lt;/p&gt;
&lt;h3 id="3-resource-envelope-predictability"&gt;3) Resource envelope predictability&lt;/h3&gt;
&lt;p&gt;For many “API gateway / orchestration / integration” services, CPU isn’t the bottleneck. Latency, network, and downstream behavior are.&lt;/p&gt;
&lt;p&gt;Go’s strengths here tend to be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;predictable concurrency behavior&lt;/li&gt;
&lt;li&gt;straightforward backpressure patterns (bounded queues, semaphores)&lt;/li&gt;
&lt;li&gt;fewer runtime tuning knobs compared to JVM-heavy stacks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not “Go always uses less RAM.” It’s “Go often gives you a tighter baseline envelope for simpler services, which improves scheduling density.”&lt;/p&gt;
&lt;h3 id="4-cloud-native-ergonomics-minimalism-wins-over-time"&gt;4) Cloud-native ergonomics: minimalism wins over time&lt;/h3&gt;
&lt;p&gt;Enterprise services accrete complexity over years. The less your runtime depends on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;classpath complexity&lt;/li&gt;
&lt;li&gt;reflection-driven magic&lt;/li&gt;
&lt;li&gt;extensive framework graphs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;…the easier it is to keep production surprises rare.&lt;/p&gt;
&lt;p&gt;Go’s bias toward explicit wiring tends to help with long-term operability, especially in platform/API layers where consistency matters.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="where-spring-boot-is-still-the-right-tool"&gt;Where Spring Boot is still the right tool&lt;/h2&gt;
&lt;p&gt;Spring Boot exists for a reason, and in many enterprises it’s still the correct default:&lt;/p&gt;
&lt;h3 id="1-ecosystem-and-starter-leverage"&gt;1) Ecosystem and “starter” leverage&lt;/h3&gt;
&lt;p&gt;Spring Boot’s opinionated defaults and starter ecosystem are an enormous accelerator for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;auth (OAuth2/OIDC)&lt;/li&gt;
&lt;li&gt;data access and ORM patterns&lt;/li&gt;
&lt;li&gt;enterprise integrations&lt;/li&gt;
&lt;li&gt;standardized configuration and profiles&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Spring Boot is explicitly designed to minimize configuration and help you ship “production-grade” applications quickly. [1]&lt;/p&gt;
&lt;p&gt;If you already have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;shared Spring libraries&lt;/li&gt;
&lt;li&gt;internal Spring starters&lt;/li&gt;
&lt;li&gt;company-wide Spring conventions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;…then choosing Go for “purity” can be expensive in human terms.&lt;/p&gt;
&lt;h3 id="2-jvm-performance-can-be-excellent"&gt;2) JVM performance can be excellent&lt;/h3&gt;
&lt;p&gt;For long-lived services under sustained load, HotSpot JIT compilation can deliver extremely strong performance, sometimes outperforming Go in CPU-bound or allocation-sensitive scenarios.&lt;/p&gt;
&lt;p&gt;It’s a mistake to assume “compiled native binary” automatically means “faster.” The real question is: &lt;strong&gt;p99 latency, throughput per core, and behavior under GC pressure for your workload.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="3-operational-maturity-and-tooling"&gt;3) Operational maturity and tooling&lt;/h3&gt;
&lt;p&gt;Spring Boot has well-worn operational patterns:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;actuator endpoints&lt;/li&gt;
&lt;li&gt;consistent configuration patterns&lt;/li&gt;
&lt;li&gt;deep tracing/profiling options&lt;/li&gt;
&lt;li&gt;broad community knowledge&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Also: if your org has deep Java on-call expertise, “operational simplicity” may already be solved socially.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="cloud-native-reality-images-cves-and-deploy-surface"&gt;Cloud-native reality: images, CVEs, and deploy surface&lt;/h2&gt;
&lt;h3 id="distroless-is-not-a-go-only-advantage"&gt;Distroless is not a Go-only advantage&lt;/h3&gt;
&lt;p&gt;A common Go pattern is “static binary + scratch/distroless.” But distroless images exist for Java too.&lt;/p&gt;
&lt;p&gt;Distroless images contain only the application and its runtime dependencies, with no package manager and no shell, reducing attack surface. [9] The distroless project includes Java images as well. [10]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Operational implication:&lt;/strong&gt; smaller, simpler images usually mean:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;faster pulls and rollouts&lt;/li&gt;
&lt;li&gt;fewer things to patch&lt;/li&gt;
&lt;li&gt;fewer “shell inside container” habits (a feature, not a bug)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whether you ship Go or Spring Boot, you can adopt hardened bases.&lt;/p&gt;
&lt;h3 id="two-dockerfile-patterns-illustrative"&gt;Two Dockerfile patterns (illustrative)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Go (multi-stage + distroless):&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-Dockerfile" data-lang="Dockerfile"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;golang:1.22-alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;/src&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; go.mod go.sum ./&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;RUN&lt;/span&gt; go mod download&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; . .&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;RUN&lt;/span&gt; &lt;span class="nv"&gt;CGO_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linux go build -trimpath -ldflags &lt;span class="s2"&gt;&amp;#34;-s -w&amp;#34;&lt;/span&gt; -o /out/api ./cmd/api&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;gcr.io/distroless/static-debian12:nonroot&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; --from&lt;span class="o"&gt;=&lt;/span&gt;build /out/api /api&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;nonroot:nonroot&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;/api&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Spring Boot (JAR + distroless Java):&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-Dockerfile" data-lang="Dockerfile"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;eclipse-temurin:21-jdk&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;/src&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; . .&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;RUN&lt;/span&gt; ./mvnw -DskipTests package&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;gcr.io/distroless/java21-debian12:nonroot&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;COPY&lt;/span&gt; --from&lt;span class="o"&gt;=&lt;/span&gt;build /src/target/app.jar /app.jar&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;nonroot:nonroot&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="err"&gt;&lt;/span&gt;&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;java&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;-jar&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;/app.jar&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The important part isn’t the exact base image; it&amp;rsquo;s the &lt;em&gt;principle&lt;/em&gt;: reduce image surface area and keep the deploy artifact boring.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="observability-and-operations"&gt;Observability and operations&lt;/h2&gt;
&lt;p&gt;Both ecosystems are strong here, but they differ in “how quickly can I get real telemetry.”&lt;/p&gt;
&lt;h3 id="opentelemetry-support"&gt;OpenTelemetry support&lt;/h3&gt;
&lt;p&gt;OpenTelemetry is the vendor-neutral standard for traces/metrics/logs. [11]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Go language docs: SDK + instrumentation guidance. [11]&lt;/li&gt;
&lt;li&gt;Java language docs: SDK + instrumentation guidance. [12]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="javas-advantage-zero-code-instrumentation"&gt;Java’s advantage: zero-code instrumentation&lt;/h3&gt;
&lt;p&gt;The OpenTelemetry Java agent can attach to Java applications and automatically instrument popular libraries via bytecode injection. [13] The OpenTelemetry Java instrumentation project provides the agent and broad library coverage. [14]&lt;/p&gt;
&lt;p&gt;Practical implication: &lt;strong&gt;you can often get useful traces without touching code.&lt;/strong&gt; That’s a meaningful ops advantage in large enterprises.&lt;/p&gt;
&lt;h3 id="gos-reality-explicit-instrumentation-plus-growing-options"&gt;Go’s reality: explicit instrumentation (plus growing options)&lt;/h3&gt;
&lt;p&gt;Go’s OpenTelemetry SDK support is strong. [11] Go auto-instrumentation options exist and are improving, but your fastest path today is still typically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;instrument key inbound/outbound edges in code&lt;/li&gt;
&lt;li&gt;standardize middleware across services&lt;/li&gt;
&lt;li&gt;treat telemetry as part of the API contract&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That’s not bad. It’s just a different default.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-decision-matrix"&gt;A decision matrix&lt;/h2&gt;
&lt;p&gt;Use this as a starting point, not a rule.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Constraint / Goal&lt;/th&gt;
&lt;th&gt;Go tends to win&lt;/th&gt;
&lt;th&gt;Spring Boot tends to win&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Many small services, high density&lt;/td&gt;
&lt;td&gt;✅ smaller baseline envelopes often help&lt;/td&gt;
&lt;td&gt;⚠️ can be heavier per-service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast scale-from-zero, frequent redeploys&lt;/td&gt;
&lt;td&gt;✅ typically quick startup&lt;/td&gt;
&lt;td&gt;✅ with care; ✅✅ with native image tradeoffs [3][4]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise integration breadth&lt;/td&gt;
&lt;td&gt;⚠️ you build more glue yourself&lt;/td&gt;
&lt;td&gt;✅ Spring ecosystem leverage [1]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team expertise&lt;/td&gt;
&lt;td&gt;✅ if Go is your platform standard&lt;/td&gt;
&lt;td&gt;✅ if Java/Spring is your standard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Boring deployments”&lt;/td&gt;
&lt;td&gt;✅ single binary patterns&lt;/td&gt;
&lt;td&gt;✅ well-trodden JVM patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zero-code observability&lt;/td&gt;
&lt;td&gt;⚠️ emerging&lt;/td&gt;
&lt;td&gt;✅ OTel Java agent maturity [13][14]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-lived CPU-heavy services&lt;/td&gt;
&lt;td&gt;✅ sometimes&lt;/td&gt;
&lt;td&gt;✅ JVM can be extremely strong&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="how-to-validate-with-a-real-experiment"&gt;How to validate with a real experiment&lt;/h2&gt;
&lt;p&gt;If you want a decision you can defend, run a 2-4 hour experiment:&lt;/p&gt;
&lt;h3 id="1-define-a-representative-endpoint-mix"&gt;1) Define a representative endpoint mix&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;1 simple “health/read” endpoint&lt;/li&gt;
&lt;li&gt;1 endpoint that hits your DB&lt;/li&gt;
&lt;li&gt;1 endpoint that calls a downstream HTTP service&lt;/li&gt;
&lt;li&gt;1 endpoint with payload validation + auth&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="2-measure-the-four-numbers-that-matter"&gt;2) Measure the four numbers that matter&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Startup time&lt;/strong&gt; (cold start to ready)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Steady-state RSS&lt;/strong&gt; at idle&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;p95 / p99 latency&lt;/strong&gt; under load&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Error rate&lt;/strong&gt; under load + partial downstream failure&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="3-run-the-same-load-and-failure-profile"&gt;3) Run the same load and failure profile&lt;/h3&gt;
&lt;p&gt;Use the same:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;container runtime&lt;/li&gt;
&lt;li&gt;resource requests/limits&lt;/li&gt;
&lt;li&gt;ingress configuration&lt;/li&gt;
&lt;li&gt;downstream simulators&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="4-compare-operational-work-not-only-performance"&gt;4) Compare operational work, not only performance&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;How painful is debugging?&lt;/li&gt;
&lt;li&gt;How much config is required?&lt;/li&gt;
&lt;li&gt;How quickly can your team ship fixes safely?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where enterprise reality lives.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="common-failure-modes"&gt;Common failure modes&lt;/h2&gt;
&lt;h3 id="go-pitfalls"&gt;Go pitfalls&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Teams reinvent frameworks inconsistently across services.&lt;/li&gt;
&lt;li&gt;Too much “just a handler” code without shared middleware for auth, limits, tracing, and error handling.&lt;/li&gt;
&lt;li&gt;Ignoring backpressure (unbounded goroutines) → memory blowups.&lt;/li&gt;
&lt;/ul&gt;
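&lt;p&gt;One way to cap goroutine count is a fixed pool of workers. The sketch below assumes Go 1.19+ (for &lt;code&gt;atomic.Int64&lt;/code&gt;) and uses a shared atomic index rather than the more common buffered-channel semaphore; &lt;code&gt;processAll&lt;/code&gt; and its parameters are hypothetical names chosen for illustration:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// processAll runs fn over items using at most `workers` goroutines.
// Each worker claims the next unclaimed index atomically, so the number of
// goroutines stays fixed no matter how many items there are.
// Hypothetical helper for illustration only.
func processAll(items []int, workers int, fn func(int) int) []int {
	if 1 > workers {
		workers = 1
	}
	results := make([]int, len(items))
	var next atomic.Int64 // count of indices claimed so far
	var wg sync.WaitGroup
	for w := 0; workers > w; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				i := int(next.Add(1)) - 1
				if i >= len(items) {
					return
				}
				// Each index is written by exactly one worker.
				results[i] = fn(items[i])
			}
		}()
	}
	wg.Wait()
	return results
}

func main() {
	squares := processAll([]int{1, 2, 3, 4, 5}, 2, func(x int) int { return x * x })
	fmt.Println(squares)
}
```

&lt;p&gt;The design choice is that memory use scales with the worker count rather than with the size of the input. A request-serving path would usually also need a rule for what happens when all workers are busy (block, queue with a bound, or reject).&lt;/p&gt;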
&lt;h3 id="spring-boot-pitfalls"&gt;Spring Boot pitfalls&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Default dependency graphs grow quietly until startup time and memory become a problem.&lt;/li&gt;
&lt;li&gt;Classpath/auto-config complexity makes “why did it do that?” debugging expensive.&lt;/li&gt;
&lt;li&gt;Container runtime tuning gets deferred, then becomes urgent during cost reviews.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="both-ecosystems"&gt;Both ecosystems&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;No explicit timeouts (inbound and outbound).&lt;/li&gt;
&lt;li&gt;No explicit limits or budgets (request size, concurrency, retries).&lt;/li&gt;
&lt;li&gt;No telemetry until after the first incident.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="closing-thought"&gt;Closing thought&lt;/h2&gt;
&lt;p&gt;If your enterprise APIs are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;small, numerous, latency-sensitive, and cost-sensitive&lt;br&gt;
…Go is often a strong default.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your enterprise APIs are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;integration-heavy, domain-rich, and built on existing Spring conventions&lt;br&gt;
…Spring Boot is usually the shortest path to “production-grade.”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The best answer is the one you can operate confidently, on call, at scale.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Spring Boot project overview&lt;/li&gt;
&lt;li&gt;Spring Boot reference: Graceful Shutdown&lt;/li&gt;
&lt;li&gt;Spring Boot reference: GraalVM Native Images&lt;/li&gt;
&lt;li&gt;GraalVM guide: Build a Spring Boot app into a native executable&lt;/li&gt;
&lt;li&gt;Go tutorial: Compile and install the application (&lt;code&gt;go build&lt;/code&gt; produces an executable)&lt;/li&gt;
&lt;li&gt;Go docs: Toolchains and the &lt;code&gt;go&lt;/code&gt; command&lt;/li&gt;
&lt;li&gt;Kubernetes docs: Resource Management for Pods and Containers (requests/limits)&lt;/li&gt;
&lt;li&gt;Google Cloud: Kubernetes best practices for resource requests and limits&lt;/li&gt;
&lt;li&gt;Distroless container images (project overview)&lt;/li&gt;
&lt;li&gt;Distroless Java images&lt;/li&gt;
&lt;li&gt;OpenTelemetry Go docs&lt;/li&gt;
&lt;li&gt;OpenTelemetry Java docs&lt;/li&gt;
&lt;li&gt;OpenTelemetry Java Agent (zero-code)&lt;/li&gt;
&lt;li&gt;OpenTelemetry Java instrumentation (agent JAR + library coverage)&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item><item><title>Chapter 10: Stop Rules + Pitfalls: When to Upgrade, Bail, or Go Manual</title><link>https://roygabriel.dev/blog/llm-development-guide/09-stop-rules-pitfalls/</link><pubDate>Sat, 31 Jan 2026 23:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/09-stop-rules-pitfalls/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 10 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/08-security-sensitive-data/"&gt;Chapter 9: Security &amp;amp; Sensitive Data: Sanitize, Don&amp;rsquo;t Paste Secrets&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/10-measuring-success/"&gt;Chapter 11: Measuring Success: Solo + Team Metrics Without Fake Precision&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to avoid the two common failure outcomes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Spending hours fighting the model.&lt;/li&gt;
&lt;li&gt;Shipping output you can&amp;rsquo;t review.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You&amp;rsquo;ll do it with explicit stop rules, upgrade triggers, and a short recovery checklist.&lt;/p&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;If you could make the change by hand in under a minute, do it manually.&lt;/li&gt;
&lt;li&gt;If you can&amp;rsquo;t review the output competently, don&amp;rsquo;t ship it.&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re on your third attempt for the same logical unit, upgrade or re-scope.&lt;/li&gt;
&lt;li&gt;Add verification steps to plans and prompts so &amp;ldquo;done&amp;rdquo; is testable.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#stop-rules"&gt;Stop rules&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#top-pitfalls"&gt;Top pitfalls&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#recovery-checklist"&gt;Recovery checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="stop-rules"&gt;Stop rules&lt;/h2&gt;
&lt;p&gt;These are pragmatic defaults. Tune them to your environment.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 10 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/08-security-sensitive-data/"&gt;Chapter 9: Security &amp;amp; Sensitive Data: Sanitize, Don&amp;rsquo;t Paste Secrets&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/10-measuring-success/"&gt;Chapter 11: Measuring Success: Solo + Team Metrics Without Fake Precision&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to avoid the two common failure outcomes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Spending hours fighting the model.&lt;/li&gt;
&lt;li&gt;Shipping output you can&amp;rsquo;t review.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You&amp;rsquo;ll do it with explicit stop rules, upgrade triggers, and a short recovery checklist.&lt;/p&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;If you could make the change by hand in under a minute, do it manually.&lt;/li&gt;
&lt;li&gt;If you can&amp;rsquo;t review the output competently, don&amp;rsquo;t ship it.&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re on your third attempt for the same logical unit, upgrade or re-scope.&lt;/li&gt;
&lt;li&gt;Add verification steps to plans and prompts so &amp;ldquo;done&amp;rdquo; is testable.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#stop-rules"&gt;Stop rules&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#top-pitfalls"&gt;Top pitfalls&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#recovery-checklist"&gt;Recovery checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="stop-rules"&gt;Stop rules&lt;/h2&gt;
&lt;p&gt;These are pragmatic defaults. Tune them to your environment.&lt;/p&gt;
&lt;h3 id="stop-rule-1-tiny-changes"&gt;Stop rule 1: tiny changes&lt;/h3&gt;
&lt;p&gt;If it is a tiny change (one line, one rename, one version bump), do it manually.&lt;/p&gt;
&lt;p&gt;LLM overhead is real:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You still have to explain.&lt;/li&gt;
&lt;li&gt;You still have to review.&lt;/li&gt;
&lt;li&gt;You still have to verify.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="stop-rule-2-you-cant-review-it"&gt;Stop rule 2: you can&amp;rsquo;t review it&lt;/h3&gt;
&lt;p&gt;Never commit code you could not explain in a review.&lt;/p&gt;
&lt;p&gt;If you don&amp;rsquo;t understand the domain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;break the work into smaller pieces you can understand, or&lt;/li&gt;
&lt;li&gt;involve a reviewer who does.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="stop-rule-3-youre-fighting-output-quality"&gt;Stop rule 3: you&amp;rsquo;re fighting output quality&lt;/h3&gt;
&lt;p&gt;The 10-minute rule:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you&amp;rsquo;ve spent about 10 minutes fighting the output, stop.&lt;/li&gt;
&lt;li&gt;Upgrade the model tier, or shrink the scope to a smaller logical unit.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="stop-rule-4-high-risk-code-needs-extra-caution"&gt;Stop rule 4: high-risk code needs extra caution&lt;/h3&gt;
&lt;p&gt;Be cautious with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Authentication and authorization.&lt;/li&gt;
&lt;li&gt;Cryptography.&lt;/li&gt;
&lt;li&gt;Payment flows.&lt;/li&gt;
&lt;li&gt;Input validation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can still use LLMs, but the bar for review and verification is higher.&lt;/p&gt;
&lt;h2 id="top-pitfalls"&gt;Top pitfalls&lt;/h2&gt;
&lt;p&gt;These show up repeatedly.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Trusting output without review.&lt;/li&gt;
&lt;li&gt;Skipping planning.&lt;/li&gt;
&lt;li&gt;Not providing reference implementations.&lt;/li&gt;
&lt;li&gt;Letting sessions run too long.&lt;/li&gt;
&lt;li&gt;Scope creep mid-session.&lt;/li&gt;
&lt;li&gt;Vague prompts.&lt;/li&gt;
&lt;li&gt;Not capturing decisions.&lt;/li&gt;
&lt;li&gt;No verification step.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A simple rule:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you wouldn&amp;rsquo;t merge a junior developer&amp;rsquo;s PR without review, don&amp;rsquo;t merge LLM output without review.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="recovery-checklist"&gt;Recovery checklist&lt;/h2&gt;
&lt;p&gt;When things go wrong:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stop iterating on bad output.&lt;/li&gt;
&lt;li&gt;Decide what kind of problem it is:
&lt;ul&gt;
&lt;li&gt;prompt problem,&lt;/li&gt;
&lt;li&gt;model capability problem,&lt;/li&gt;
&lt;li&gt;task is a poor fit.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Simplify:
&lt;ul&gt;
&lt;li&gt;smaller logical unit,&lt;/li&gt;
&lt;li&gt;more references,&lt;/li&gt;
&lt;li&gt;clearer constraints.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Start a fresh session if context has drifted.&lt;/li&gt;
&lt;li&gt;Fall back to manual work when needed; that is a valid outcome.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;Create a one-page stop-rules file so you can apply this consistently across tasks:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mkdir -p work-notes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; work-notes/stop-rules.md &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;MD&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;# Stop Rules (Personal Defaults)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Manual first
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- If change is &amp;lt;= 1 minute manually, do it manually.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Upgrade triggers
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Third attempt on same logical unit.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Repeated misunderstandings.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Output ignores constraints.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Bail triggers
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- I cannot review this competently.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Task requires live debugging with runtime state.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Sensitive data would be required to reproduce.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Required gates
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Verification commands exist in plan.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Verification commands exist in prompt.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Work notes updated before continuing.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;MD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected result:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You have a written policy you can apply without debating every time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/10-measuring-success/"&gt;Chapter 11: Measuring Success: Solo + Team Metrics Without Fake Precision&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Stop Shipping Slide Decks</title><link>https://roygabriel.dev/blog/stop-shipping-slide-decks/</link><pubDate>Sat, 31 Jan 2026 11:15:00 -0500</pubDate><guid>https://roygabriel.dev/blog/stop-shipping-slide-decks/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Position:&lt;/strong&gt; This is not &amp;ldquo;documentation bad.&amp;rdquo;
This is &amp;ldquo;documentation is a tool.&amp;rdquo; If it increases lead time, hides truth, or replaces learning, it&amp;rsquo;s not helping.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;In software, the real &amp;ldquo;source of truth&amp;rdquo; is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;running systems&lt;/li&gt;
&lt;li&gt;code and configuration&lt;/li&gt;
&lt;li&gt;production telemetry&lt;/li&gt;
&lt;li&gt;incident history&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Documentation should reduce uncertainty and speed up decisions. But two artifacts routinely do the opposite in large organizations:&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Position:&lt;/strong&gt; This is not &amp;ldquo;documentation bad.&amp;rdquo;
This is &amp;ldquo;documentation is a tool.&amp;rdquo; If it increases lead time, hides truth, or replaces learning, it&amp;rsquo;s not helping.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;In software, the real &amp;ldquo;source of truth&amp;rdquo; is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;running systems&lt;/li&gt;
&lt;li&gt;code and configuration&lt;/li&gt;
&lt;li&gt;production telemetry&lt;/li&gt;
&lt;li&gt;incident history&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Documentation should reduce uncertainty and speed up decisions. But two artifacts routinely do the opposite in large organizations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;the 40-page slide deck&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;the Word doc living somewhere in SharePoint that nobody can find&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These artifacts often become &lt;em&gt;deliverables&lt;/em&gt; - a substitute for building. They make it possible to spend months &amp;ldquo;progressing&amp;rdquo; without ever encountering reality.&lt;/p&gt;
&lt;p&gt;And here&amp;rsquo;s the part most orgs miss:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;If you&amp;rsquo;re going to fail, you want to fail &lt;strong&gt;quickly and cheaply&lt;/strong&gt;, not slowly and expensively. [4]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That doesn&amp;rsquo;t mean reckless shipping. It means running a tight learning loop and letting reality correct you early - before you&amp;rsquo;ve sunk quarters of time into the wrong solution.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Decks are great for storytelling. They are bad as an engineering system of record.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;SharePoint architecture docs&amp;rdquo; become a &lt;strong&gt;document cemetery&lt;/strong&gt;: hard to find, hard to diff, and easy to ignore.&lt;/li&gt;
&lt;li&gt;The Agile Manifesto explicitly values &lt;strong&gt;working software over comprehensive documentation&lt;/strong&gt;. [1] And one Agile principle states that working software is the primary measure of progress. [2]&lt;/li&gt;
&lt;li&gt;Replace decks/docs-as-deliverables with:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RFC-lite&lt;/strong&gt; (1-2 pages) + a &lt;strong&gt;running thin slice&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ADRs&lt;/strong&gt; (Architecture Decision Records) to capture decisions + tradeoffs [5][6]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docs-as-code&lt;/strong&gt; (Markdown in the repo, reviewed like code)&lt;/li&gt;
&lt;li&gt;diagrams that are versioned and easy to update&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Measure improvement with system outcomes (lead time, deploy frequency, change failure rate, MTTR). [3]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#pattern-1-deck-driven-development"&gt;Pattern 1: Deck-driven development&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-2-sharepoint-document-cemeteries"&gt;Pattern 2: SharePoint document cemeteries&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-3-architecture-as-narrative-not-decisions"&gt;Pattern 3: Architecture as narrative, not decisions&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-4-design-phase-gating"&gt;Pattern 4: &amp;ldquo;Design phase&amp;rdquo; gating&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-5-documentation-that-never-gets-pruned"&gt;Pattern 5: Documentation that never gets pruned&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-to-do-instead-a-documentation-system-that-ships"&gt;What to do instead: a documentation system that ships&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification-how-you-know-its-working"&gt;Verification: how you know it&amp;rsquo;s working&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-practical-checklist"&gt;A practical checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-1-deck-driven-development"&gt;Pattern 1: Deck-driven development&lt;/h2&gt;
&lt;h3 id="what-it-looks-like"&gt;What it looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;A 40-page deck is created to describe a system that doesn&amp;rsquo;t exist yet.&lt;/li&gt;
&lt;li&gt;The deck gets reviewed by multiple groups.&lt;/li&gt;
&lt;li&gt;Approval is treated as progress.&lt;/li&gt;
&lt;li&gt;When implementation starts, the world has changed - or key constraints were missed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-exists"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;Decks are socially useful:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;they compress complexity into a narrative&lt;/li&gt;
&lt;li&gt;they help leaders &amp;ldquo;see&amp;rdquo; a plan&lt;/li&gt;
&lt;li&gt;they make uncertainty feel controlled&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-hidden-tax"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;Decks are a poor engineering artifact because they&amp;rsquo;re:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;low fidelity&lt;/strong&gt;: they rarely contain executable truth&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;hard to maintain&lt;/strong&gt;: updates are manual and usually lag reality&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;hard to diff&lt;/strong&gt;: you can&amp;rsquo;t easily review what changed and why&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;easy to perform&lt;/strong&gt;: a deck can look complete while the design is still untested&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;not tied to code&lt;/strong&gt;: no direct path from &amp;ldquo;decision&amp;rdquo; -&amp;gt; &amp;ldquo;implementation&amp;rdquo; -&amp;gt; &amp;ldquo;verification&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The worst outcome isn&amp;rsquo;t that the deck is wrong. It&amp;rsquo;s that the deck delays the point where you discover what&amp;rsquo;s wrong.&lt;/p&gt;
&lt;h3 id="the-replacement-pattern"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Use decks for storytelling &lt;strong&gt;after&lt;/strong&gt; you have reality. Use engineering artifacts to discover reality.&lt;/p&gt;
&lt;p&gt;A strong default:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RFC-lite&lt;/strong&gt; (1-2 pages)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;a runnable thin slice&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;measurable verification&lt;/strong&gt; (latency, cost envelope, failure mode)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This aligns with the Agile principle that working software is the primary measure of progress. [2]&lt;/p&gt;
&lt;h3 id="transition-step-low-drama"&gt;Transition step (low drama)&lt;/h3&gt;
&lt;p&gt;Replace &amp;ldquo;deck required for approval&amp;rdquo; with &amp;ldquo;evidence required for approval&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;link to the RFC&lt;/li&gt;
&lt;li&gt;link to a running demo / branch / sandbox&lt;/li&gt;
&lt;li&gt;explicit constraints + tradeoffs&lt;/li&gt;
&lt;li&gt;an exit-criteria checklist for the slice&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-2-sharepoint-document-cemeteries"&gt;Pattern 2: SharePoint document cemeteries&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-1"&gt;What it looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Architecture docs exist as Word/PDF files in SharePoint.&lt;/li&gt;
&lt;li&gt;Multiple versions exist (&amp;ldquo;Final_v7_REAL_FINAL.docx&amp;rdquo;).&lt;/li&gt;
&lt;li&gt;Search works poorly unless you already know what to search for.&lt;/li&gt;
&lt;li&gt;Nobody updates the doc because it&amp;rsquo;s painful and risky (&amp;ldquo;what if I change the blessed doc?&amp;rdquo;).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-exists-1"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s an enterprise default:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SharePoint is &amp;ldquo;official&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Word docs feel formal&lt;/li&gt;
&lt;li&gt;it&amp;rsquo;s familiar to non-engineering stakeholders&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-hidden-tax-1"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;SharePoint docs typically fail at the things engineering needs most:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;discoverability&lt;/strong&gt; (people don&amp;rsquo;t know where to look)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ownership&lt;/strong&gt; (no clear maintainer)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;reviewability&lt;/strong&gt; (diffs and PR discussion are weak)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;linking to reality&lt;/strong&gt; (code, configs, dashboards, runbooks)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;keeping current&lt;/strong&gt; (documentation drift becomes the norm)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So teams stop trusting docs and rely on tribal knowledge - until they page someone at 2 a.m.&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-1"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Treat documentation as part of the codebase:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Markdown in the repo&lt;/li&gt;
&lt;li&gt;reviewed via PR like code&lt;/li&gt;
&lt;li&gt;versioned with implementation&lt;/li&gt;
&lt;li&gt;linked to:
&lt;ul&gt;
&lt;li&gt;APIs (OpenAPI specs)&lt;/li&gt;
&lt;li&gt;dashboards&lt;/li&gt;
&lt;li&gt;runbooks&lt;/li&gt;
&lt;li&gt;incident writeups&lt;/li&gt;
&lt;li&gt;ADRs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Google&amp;rsquo;s documentation best practices make the point directly: a small set of fresh, accurate docs is better than a large pile in disrepair. [7]&lt;/p&gt;
&lt;h3 id="transition-step"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;You don&amp;rsquo;t have to &amp;ldquo;migrate all docs.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Start with a triage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identify the top 10 documents people actually need.&lt;/li&gt;
&lt;li&gt;Recreate them as Markdown in a &lt;code&gt;docs/&lt;/code&gt; folder with an index.&lt;/li&gt;
&lt;li&gt;Leave the rest as archived references, not living truth.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-3-architecture-as-narrative-not-decisions"&gt;Pattern 3: Architecture as narrative, not decisions&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-2"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;The doc describes a target architecture but doesn&amp;rsquo;t answer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;why this approach?&lt;/li&gt;
&lt;li&gt;what alternatives were considered?&lt;/li&gt;
&lt;li&gt;what tradeoffs were accepted?&lt;/li&gt;
&lt;li&gt;what constraints matter most?&lt;/li&gt;
&lt;li&gt;what did we decide not to do?&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-exists-2"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;Narratives are easier than decision logs. It&amp;rsquo;s simpler to write &amp;ldquo;the system will&amp;hellip;&amp;rdquo; than to record the messy reality of tradeoffs.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-2"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;When decisions aren&amp;rsquo;t recorded, teams re-litigate them repeatedly. The same arguments come back every quarter - often because new people joined and the reasoning isn&amp;rsquo;t captured.&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-adrs"&gt;The replacement pattern: ADRs&lt;/h3&gt;
&lt;p&gt;Use &lt;strong&gt;Architecture Decision Records (ADRs)&lt;/strong&gt;: short, structured notes that capture an important decision with its context and consequences. [5] The practice is commonly attributed to Michael Nygard&amp;rsquo;s 2011 write-up. [6]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ADRs are the opposite of a 40-slide deck:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;small&lt;/li&gt;
&lt;li&gt;specific&lt;/li&gt;
&lt;li&gt;diffable&lt;/li&gt;
&lt;li&gt;linkable to code changes&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="transition-step-1"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;Start with one ADR per &amp;ldquo;architecturally significant decision&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;database choice&lt;/li&gt;
&lt;li&gt;messaging pattern&lt;/li&gt;
&lt;li&gt;tenancy model&lt;/li&gt;
&lt;li&gt;auth model&lt;/li&gt;
&lt;li&gt;deployment model&lt;/li&gt;
&lt;li&gt;data boundary decisions&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-4-design-phase-gating"&gt;Pattern 4: &amp;ldquo;Design phase&amp;rdquo; gating&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-3"&gt;What it looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;We can&amp;rsquo;t start implementation until the analysis is complete.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;The analysis expands to include every possible future case.&lt;/li&gt;
&lt;li&gt;The design grows more &amp;ldquo;complete&amp;rdquo; and less true.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-exists-3"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;Enterprises are understandably afraid of failure.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-3"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;This approach doesn&amp;rsquo;t eliminate failure. It defers it - making it more expensive.&lt;/p&gt;
&lt;p&gt;Lean Startup describes progress as validated learning and emphasizes moving quickly through a build-measure-learn loop. [4] The point isn&amp;rsquo;t startups. The point is learning fast when you&amp;rsquo;re uncertain.&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-2"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Timebox design, then validate with a thin slice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;write the RFC-lite doc&lt;/li&gt;
&lt;li&gt;implement the smallest realistic end-to-end path&lt;/li&gt;
&lt;li&gt;measure the constraints&lt;/li&gt;
&lt;li&gt;then expand&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="transition-step-2"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;Define &amp;ldquo;analysis exit criteria&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;measurable constraints validated (not theorized)&lt;/li&gt;
&lt;li&gt;spike code exists&lt;/li&gt;
&lt;li&gt;a plan for incremental rollout exists&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-5-documentation-that-never-gets-pruned"&gt;Pattern 5: Documentation that never gets pruned&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-4"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;Docs accumulate but aren&amp;rsquo;t maintained:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;outdated architecture diagrams&lt;/li&gt;
&lt;li&gt;old runbooks&lt;/li&gt;
&lt;li&gt;stale onboarding guides&lt;/li&gt;
&lt;li&gt;dead links&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-exists-4"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;Pruning isn&amp;rsquo;t rewarded. Writing new docs feels productive; deleting old docs feels risky.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-4"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;Stale docs are worse than no docs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;they mislead&lt;/li&gt;
&lt;li&gt;they increase cognitive load&lt;/li&gt;
&lt;li&gt;they create false confidence&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-3"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Adopt &amp;ldquo;minimum viable documentation&amp;rdquo; and prune regularly. [7]&lt;/p&gt;
&lt;p&gt;The rule I like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If a doc isn&amp;rsquo;t maintained, label it &lt;strong&gt;ARCHIVED&lt;/strong&gt; and explain why.&lt;/li&gt;
&lt;li&gt;If a doc is required, tie it to ownership and change workflow.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="transition-step-3"&gt;Transition step&lt;/h3&gt;
&lt;p&gt;Make docs part of PR hygiene:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;if the change affects behavior, the docs update ships with it&lt;/li&gt;
&lt;li&gt;run link checks in CI&lt;/li&gt;
&lt;li&gt;keep an index page updated&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="what-to-do-instead-a-documentation-system-that-ships"&gt;What to do instead: a documentation system that ships&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s a simple &amp;ldquo;docs system&amp;rdquo; that works in practice.&lt;/p&gt;
&lt;h3 id="a-repo-structure-that-scales"&gt;A repo structure that scales&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;/README.md # entry point: what this is + how to run it
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;/docs/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; index.md # &amp;#34;start here&amp;#34; documentation map
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; rfc/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; 0001-tenancy-model.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; 0002-storage-approach.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; adr/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; 0001-use-postgres.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; 0002-adopt-opentelemetry.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; architecture/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; context.md # C4-ish: context + boundaries
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; containers.md # top-level services
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; deployment.md # runtime &amp;amp; environments
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; runbooks/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; oncall.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; incident-response.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; api/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; openapi.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="replace-40-slides-with-two-artifacts"&gt;Replace 40 slides with two artifacts&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;RFC-lite (1-2 pages)&lt;/strong&gt;: the &amp;ldquo;what&amp;rdquo; and &amp;ldquo;why&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Thin slice demo&lt;/strong&gt;: the reality check&lt;/li&gt;
&lt;/ol&gt;
&lt;h4 id="rfc-lite-template-copypaste"&gt;RFC-lite template (copy/paste)&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# RFC: &amp;lt;title&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Problem
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;What are we trying to solve? Who is affected?
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Constraints
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Latency, cost, compliance, tenancy, uptime, environments.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Proposal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;What are we building? What does &amp;#34;done&amp;#34; mean?
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Alternatives considered
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Option A / B / C with short tradeoffs.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Risks and mitigations
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;What could go wrong? How will we contain blast radius?
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Verification
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;How will we measure success in production?
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="adr-template-copypaste"&gt;ADR template (copy/paste)&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# ADR-XXXX: &amp;lt;decision&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Proposed | Accepted | Deprecated
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Context
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;What drove this decision? What constraints matter?
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Decision
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;What did we decide?
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Consequences
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;What do we gain? What do we lose? What changes later?
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h2 id="verification-how-you-know-its-working"&gt;Verification: how you know it&amp;rsquo;s working&lt;/h2&gt;
&lt;p&gt;If you replace decks and doc cemeteries with real engineering artifacts, you should see:&lt;/p&gt;
&lt;h3 id="delivery-metrics-improve"&gt;Delivery metrics improve&lt;/h3&gt;
&lt;p&gt;Track the same system-level outcomes DORA promotes: lead time, deploy frequency, change failure rate, and time to restore service. [3]&lt;/p&gt;
&lt;h3 id="fewer-handoffs-and-fewer-alignment-meetings"&gt;Fewer handoffs and fewer &amp;ldquo;alignment meetings&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;If teams can self-serve context from living docs, coordination cost drops.&lt;/p&gt;
&lt;h3 id="faster-first-reality"&gt;Faster &amp;ldquo;first reality&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;A simple heuristic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How long from idea -&amp;gt; first runnable thin slice?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If that number is months, the system is optimized for analysis, not learning.&lt;/p&gt;
&lt;h3 id="docs-stay-alive"&gt;Docs stay alive&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;docs updated alongside code&lt;/li&gt;
&lt;li&gt;fewer stale &amp;ldquo;final_v7&amp;rdquo; files&lt;/li&gt;
&lt;li&gt;fewer tribal-knowledge escalations&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="a-practical-checklist"&gt;A practical checklist&lt;/h2&gt;
&lt;p&gt;If you want to kill deck-driven delivery without starting a culture war:&lt;/p&gt;
&lt;h3 id="stop-treating-decks-as-deliverables"&gt;Stop treating decks as deliverables&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Architecture reviews require an RFC + a runnable slice.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Decks are optional; evidence is not.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="fix-document-discoverability"&gt;Fix document discoverability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; One &lt;code&gt;docs/index.md&lt;/code&gt; that links to the docs that matter.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Make the repo the source of truth for technical docs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="capture-decisions-not-fantasies"&gt;Capture decisions, not fantasies&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Add ADRs for major decisions and link them to PRs. [5][6]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="timebox-analysis"&gt;Timebox analysis&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Set analysis exit criteria.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Optimize for early learning and quick failure when uncertainty is high. [4]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="keep-docs-small-and-alive"&gt;Keep docs small and alive&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Prune regularly; archive what&amp;rsquo;s stale.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Run link checks in CI.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Treat docs like bonsai: maintained and trimmed, not accumulated. [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] Manifesto for Agile Software Development (values; &amp;ldquo;Working software over comprehensive documentation&amp;rdquo;). &lt;a href="https://agilemanifesto.org/" target="_blank" rel="noopener noreferrer"&gt;https://agilemanifesto.org/&lt;/a&gt;&lt;br&gt;
[2] Principles behind the Agile Manifesto (&amp;ldquo;Working software is the primary measure of progress&amp;rdquo;). &lt;a href="https://agilemanifesto.org/principles.html" target="_blank" rel="noopener noreferrer"&gt;https://agilemanifesto.org/principles.html&lt;/a&gt;&lt;br&gt;
[3] DORA - &amp;ldquo;DORA&amp;rsquo;s software delivery performance metrics (guide)&amp;rdquo;. &lt;a href="https://dora.dev/guides/dora-metrics/" target="_blank" rel="noopener noreferrer"&gt;https://dora.dev/guides/dora-metrics/&lt;/a&gt;&lt;br&gt;
[4] Lean Startup principles (Build-Measure-Learn; learning quickly; failing fast/cheaply as a concept). &lt;a href="https://theleanstartup.com/principles" target="_blank" rel="noopener noreferrer"&gt;https://theleanstartup.com/principles&lt;/a&gt;&lt;br&gt;
[5] ADR - Architectural Decision Records (what ADRs are). &lt;a href="https://adr.github.io/" target="_blank" rel="noopener noreferrer"&gt;https://adr.github.io/&lt;/a&gt;&lt;br&gt;
[6] Michael Nygard - &amp;ldquo;Documenting Architecture Decisions&amp;rdquo; (2011; ADR practice origin/popularization). &lt;a href="https://www.cognitect.com/blog/2011/11/15/documenting-architecture-decisions" target="_blank" rel="noopener noreferrer"&gt;https://www.cognitect.com/blog/2011/11/15/documenting-architecture-decisions&lt;/a&gt;&lt;br&gt;
[7] Google Documentation Guide - Best practices (&amp;ldquo;Minimum Viable Documentation&amp;rdquo;; keep docs short, fresh, and pruned). &lt;a href="https://google.github.io/styleguide/docguide/best_practices.html" target="_blank" rel="noopener noreferrer"&gt;https://google.github.io/styleguide/docguide/best_practices.html&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>MCP Servers in Production: Hardening, Backpressure, and Observability (Go)</title><link>https://roygabriel.dev/blog/mcp-servers-production-hardening-go/</link><pubDate>Sat, 31 Jan 2026 09:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/mcp-servers-production-hardening-go/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; MCP is evolving. This article references the MCP specification versioned &lt;strong&gt;2025-11-25&lt;/strong&gt; and related docs; verify details against the current spec before shipping changes. [1][2][4]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most “agent demos” fail in production for boring reasons: missing timeouts, unbounded concurrency, ambiguous tool interfaces, and logging that accidentally turns into data exfiltration.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;MCP server&lt;/strong&gt; isn’t “just an integration.” It’s a &lt;strong&gt;capability boundary&lt;/strong&gt; between an LLM host (IDE, desktop app, agent runner) and the real world: files, APIs, databases, tickets, home automation, and anything else you wire up. MCP uses JSON-RPC 2.0 messages over transports like stdio (local) and Streamable HTTP (remote). [1][2][5]&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;As-of note:&lt;/strong&gt; MCP is evolving. This article references the MCP specification versioned &lt;strong&gt;2025-11-25&lt;/strong&gt; and related docs; verify details against the current spec before shipping changes. [1][2][4]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most “agent demos” fail in production for boring reasons: missing timeouts, unbounded concurrency, ambiguous tool interfaces, and logging that accidentally turns into data exfiltration.&lt;/p&gt;
&lt;p&gt;An &lt;strong&gt;MCP server&lt;/strong&gt; isn’t “just an integration.” It’s a &lt;strong&gt;capability boundary&lt;/strong&gt; between an LLM host (IDE, desktop app, agent runner) and the real world: files, APIs, databases, tickets, home automation, and anything else you wire up. MCP uses JSON-RPC 2.0 messages over transports like stdio (local) and Streamable HTTP (remote). [1][2][5]&lt;/p&gt;
&lt;p&gt;That means an MCP server is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an API gateway for tools&lt;/li&gt;
&lt;li&gt;a policy enforcement point (whether you intended it or not)&lt;/li&gt;
&lt;li&gt;a reliability hotspot (tool calls are where latency and failure concentrate)&lt;/li&gt;
&lt;li&gt;a security hotspot (tools are where “read” becomes “exfil” and “write” becomes “impact”)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This post is a pragmatic checklist + a set of Go patterns to harden an MCP server so it keeps working when it’s under real load, and remains safe when the model gets “creative.”&lt;/p&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat tool inputs as &lt;strong&gt;untrusted&lt;/strong&gt;. Validate and constrain everything.&lt;/li&gt;
&lt;li&gt;Put &lt;strong&gt;budgets&lt;/strong&gt; everywhere: timeouts, concurrency limits, rate limits, and payload caps.&lt;/li&gt;
&lt;li&gt;Build for &lt;strong&gt;partial failure&lt;/strong&gt;: retries, idempotency keys, circuit breaking, fallbacks.&lt;/li&gt;
&lt;li&gt;Log like a security engineer: &lt;strong&gt;structured&lt;/strong&gt;, &lt;strong&gt;redacted&lt;/strong&gt;, &lt;strong&gt;auditable&lt;/strong&gt;, and &lt;strong&gt;useful&lt;/strong&gt;. [11]&lt;/li&gt;
&lt;li&gt;Instrument with traces/metrics early; “we’ll add telemetry later” is a trap. [13]&lt;/li&gt;
&lt;li&gt;Prefer Go for MCP servers because deployment and operational behavior are predictable: single binary, fast startup, cancellation and deadlines propagated via &lt;code&gt;context&lt;/code&gt;, and a strong standard library.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#a-production-mental-model-for-mcp-servers"&gt;A production mental model for MCP servers&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#threat-model-what-actually-goes-wrong"&gt;Threat model: what actually goes wrong&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-1-identity-and-authorization"&gt;Hardening layer 1: identity and authorization&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-2-tool-contracts-that-resist-ambiguity"&gt;Hardening layer 2: tool contracts that resist ambiguity&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-3-budgets-and-backpressure"&gt;Hardening layer 3: budgets and backpressure&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-4-safe-networking-and-ssrf-containment"&gt;Hardening layer 4: safe networking and SSRF containment&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-5-observability-without-leaking-secrets"&gt;Hardening layer 5: observability without leaking secrets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardening-layer-6-versioning-and-rollout-discipline"&gt;Hardening layer 6: versioning and rollout discipline&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="a-production-mental-model-for-mcp-servers"&gt;A production mental model for MCP servers&lt;/h2&gt;
&lt;p&gt;MCP’s docs describe a host (the AI application), a client (connector inside the host), and servers (capabilities/providers). Servers can be “local” (stdio) or “remote” (Streamable HTTP). [2][3]&lt;/p&gt;
&lt;p&gt;Here’s the production mental model that matters:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Your MCP server is a tool gateway.&lt;/strong&gt;&lt;br&gt;
Every tool is effectively an RPC method exposed to an agent. MCP uses JSON-RPC 2.0 semantics for requests/responses/notifications. [1][5]&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LLM tool arguments are not trustworthy.&lt;/strong&gt;&lt;br&gt;
Even if the LLM is “helpful,” arguments can be malformed, overbroad, or dangerous, especially under prompt injection or user-provided hostile input.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The host UI is not a security boundary.&lt;/strong&gt;&lt;br&gt;
The spec emphasizes user consent and tool safety, but the protocol can’t enforce your policy for you. You still need server-side controls. [1]&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Transport changes your blast radius, not your responsibilities.&lt;/strong&gt;&lt;br&gt;
Stdio reduces network exposure, but doesn’t remove safety requirements. Streamable HTTP adds multi-client/multi-tenant concerns and requires real auth. [2][3]&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you remember nothing else: treat the MCP server like a production API you’d be willing to put on call for.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="threat-model-what-actually-goes-wrong"&gt;Threat model: what actually goes wrong&lt;/h2&gt;
&lt;p&gt;When MCP servers cause incidents, it’s usually one of these:&lt;/p&gt;
&lt;h3 id="1-input-ambiguity--destructive-actions"&gt;1) Input ambiguity → destructive actions&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;A “delete” tool with optional filters&lt;/li&gt;
&lt;li&gt;A “run command” tool with free-form strings&lt;/li&gt;
&lt;li&gt;A “sync” tool that can touch thousands of objects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; schema + semantic validation, safe defaults, two-step preview-then-apply flows, and explicit “danger gates.”&lt;/p&gt;
&lt;h3 id="2-prompt-injection--tool-misuse"&gt;2) Prompt injection → tool misuse&lt;/h3&gt;
&lt;p&gt;The model can be tricked into calling tools with attacker-provided arguments. If your tool can read internal data or call internal APIs, you’ve created an exfil path.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; least privilege, allowlists, strong auth, egress controls, and redaction.&lt;/p&gt;
&lt;h3 id="3-ssrf--network-pivoting"&gt;3) SSRF / network pivoting&lt;/h3&gt;
&lt;p&gt;Any tool that fetches URLs, loads webhooks, or calls dynamic endpoints can be abused to hit internal networks or metadata endpoints. OWASP treats SSRF as a major category for a reason. [10]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; deny-by-default networking (CIDR blocks, DNS/IP resolution checks, allowlisted destinations).&lt;/p&gt;
&lt;h3 id="4-unbounded-concurrency--resource-collapse"&gt;4) Unbounded concurrency → resource collapse&lt;/h3&gt;
&lt;p&gt;Agents can fire tools in parallel. Without limits you’ll blow up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;API quotas&lt;/li&gt;
&lt;li&gt;DB connections&lt;/li&gt;
&lt;li&gt;CPU/memory&lt;/li&gt;
&lt;li&gt;downstream latency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; per-tenant rate limiting, concurrency caps, queues, and backpressure.&lt;/p&gt;
&lt;h3 id="5-helpful-logs--data-leak"&gt;5) “Helpful logs” → data leak&lt;/h3&gt;
&lt;p&gt;Tool arguments and tool responses often contain secrets, tokens, or private data. If you log everything, you’ve built an involuntary data lake.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; structured + redacted logging, security logging guidelines, and minimal retention. [11][12]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-1-identity-and-authorization"&gt;Hardening layer 1: identity and authorization&lt;/h2&gt;
&lt;p&gt;If you run &lt;strong&gt;Streamable HTTP&lt;/strong&gt;, assume:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;multiple clients&lt;/li&gt;
&lt;li&gt;untrusted networks&lt;/li&gt;
&lt;li&gt;tokens will leak eventually&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;MCP’s architecture guidance calls for standard HTTP authentication methods on remote servers and recommends OAuth for obtaining tokens. [2][3]&lt;/p&gt;
&lt;h3 id="practical-rules"&gt;Practical rules&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Authenticate every request.&lt;/strong&gt;&lt;br&gt;
Use bearer tokens or mTLS depending on environment.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Authorize per tool.&lt;/strong&gt;&lt;br&gt;
“Authenticated” ≠ “allowed to run &lt;code&gt;delete_everything&lt;/code&gt;”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefer short-lived tokens&lt;/strong&gt; and rotate them. [12]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-tenant?&lt;/strong&gt; Carry the tenant identity in auth token claims or in an explicit, signed, and validated tenant header, then enforce it everywhere.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="go-pattern-a-minimal-auth-middleware-skeleton-http-transport"&gt;Go pattern: a minimal auth middleware skeleton (HTTP transport)&lt;/h3&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;This is &lt;em&gt;not&lt;/em&gt; a full MCP implementation, just the hardening pattern you’ll wrap around your MCP handler.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// Pseudocode-ish middleware skeleton. Replace verifyToken with your auth logic.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;authMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;HandlerFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// CutPrefix (Go 1.20+) also rejects headers that are not Bearer credentials.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;strings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CutPrefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Authorization&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;Bearer &amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;missing auth&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusUnauthorized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;verifyToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// includes tenant + scopes&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;invalid auth&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusUnauthorized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ctxKeyIdentity&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Key point:&lt;/strong&gt; authorization should happen &lt;em&gt;after&lt;/em&gt; you parse the requested tool name, but &lt;em&gt;before&lt;/em&gt; you execute anything.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-2-tool-contracts-that-resist-ambiguity"&gt;Hardening layer 2: tool contracts that resist ambiguity&lt;/h2&gt;
&lt;p&gt;Most MCP tool failures are self-inflicted: the tool interface is too vague to validate or constrain.&lt;/p&gt;
&lt;h3 id="design-tools-like-production-apis"&gt;Design tools like production APIs&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Bad tool signature:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;run(command: string)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Better:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;run_command(program: enum, args: string[], cwd: string, timeout_ms: int, dry_run: bool)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why it’s better:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;forces structured input instead of a free-form shell string&lt;/li&gt;
&lt;li&gt;lets you enforce an allowlist on &lt;code&gt;program&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;gives you a place to enforce timeouts and safe defaults&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="add-a-preview--apply-flow-for-risky-tools"&gt;Add a “preview → apply” flow for risky tools&lt;/h3&gt;
&lt;p&gt;For any tool that writes data or triggers side effects, do a two-step approach:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;plan_*&lt;/code&gt; returns a machine-readable plan plus a &lt;code&gt;plan_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;apply_*&lt;/code&gt; requires the &lt;code&gt;plan_id&lt;/code&gt; and, optionally, a user confirmation token&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This mirrors how we run infra changes (plan/apply) and dramatically reduces accidental blast radius.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-3-budgets-and-backpressure"&gt;Hardening layer 3: budgets and backpressure&lt;/h2&gt;
&lt;p&gt;Production systems are budget systems.&lt;/p&gt;
&lt;p&gt;If you don’t set explicit budgets, your MCP server will eventually allocate them for you via outages.&lt;/p&gt;
&lt;h3 id="budget-checklist"&gt;Budget checklist&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Server timeouts&lt;/strong&gt; (header read, request read, write, idle)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Request body caps&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outbound timeouts&lt;/strong&gt; to dependencies&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concurrency caps&lt;/strong&gt; per tool and per tenant&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rate limits&lt;/strong&gt; per tenant and per identity&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Queue limits&lt;/strong&gt; (bounded channels) to avoid memory blowups&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Circuit breaking&lt;/strong&gt; for flaky downstream dependencies&lt;/li&gt;
&lt;/ul&gt;
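&lt;p&gt;Request body caps are the item on this list that is easiest to forget. A minimal sketch using &lt;code&gt;http.MaxBytesReader&lt;/code&gt; from the standard library; the 1 KiB cap, the &lt;code&gt;echo&lt;/code&gt; handler, and the &lt;code&gt;statusFor&lt;/code&gt; helper are illustrative assumptions.&lt;/p&gt;

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBodyBytes is an illustrative cap; size it to your largest legitimate request.
const maxBodyBytes = 1024

// limitBody caps how many request-body bytes any downstream handler can read.
func limitBody(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)
		next.ServeHTTP(w, r)
	})
}

// echo stands in for a tool handler. For brevity it treats any read error
// as "too large"; production code would inspect the error type.
func echo(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
		return
	}
	w.Write(body)
}

// statusFor sends an n-byte body through the middleware and reports the status code.
func statusFor(n int) int {
	body := strings.NewReader(strings.Repeat("x", n))
	req := httptest.NewRequest(http.MethodPost, "/tools/call", body)
	rec := httptest.NewRecorder()
	limitBody(http.HandlerFunc(echo)).ServeHTTP(rec, req)
	return rec.Code
}
```

&lt;p&gt;Calling &lt;code&gt;statusFor&lt;/code&gt; with a size at or below the cap should yield 200, and one byte over should yield 413.&lt;/p&gt;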
&lt;h3 id="go-server-timeouts-are-not-optional"&gt;Go: server timeouts are not optional&lt;/h3&gt;
&lt;p&gt;Go’s &lt;code&gt;net/http&lt;/code&gt; provides explicit server timeouts, but they default to zero, which means no timeout at all; leaving them unset is a common footgun. [6][7]&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;srv&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Addr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;:8080&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// your MCP handler + middleware&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ReadHeaderTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ReadTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;WriteTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;IdleTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="nx"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;srv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="go-propagate-cancellation-everywhere-with-context"&gt;Go: propagate cancellation everywhere with &lt;code&gt;context&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;context.Context&lt;/code&gt; is the backbone of “structured concurrency” in Go: deadlines and cancellation signals flow through your call stack. [8][9]&lt;/p&gt;
&lt;p&gt;Rule: &lt;strong&gt;every tool execution must accept a &lt;code&gt;context.Context&lt;/code&gt;&lt;/strong&gt;, and every outbound call must honor it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ToolRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ToolResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cancel&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;WithTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// ... outbound calls use ctx&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;integration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Do&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="go-per-tenant-rate-limiting-with-xtimerate"&gt;Go: per-tenant rate limiting with &lt;code&gt;x/time/rate&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;golang.org/x/time/rate&lt;/code&gt; implements a token bucket limiter. [9]&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;limiters&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sync&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Mutex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Limiter&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;limiters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Limiter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Limiter&lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// Example: 5 req/sec with bursts up to 10&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;NewLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lim&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;rateLimitMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lims&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;limiters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Handler&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;HandlerFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;mustIdentity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;lims&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ident&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TenantID&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;rate limited&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusTooManyRequests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ServeHTTP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="backpressure-choose-a-policy"&gt;Backpressure: choose a policy&lt;/h3&gt;
&lt;p&gt;When you’re overloaded, you need a policy. Pick one explicitly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fail fast&lt;/strong&gt; with 429 / “busy” (simplest, safest)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Queue&lt;/strong&gt; with bounded depth (more complex; must cap memory)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Degrade&lt;/strong&gt; by disabling expensive tools first&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The “fail fast” approach is often correct for tool gateways.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-4-safe-networking-and-ssrf-containment"&gt;Hardening layer 4: safe networking and SSRF containment&lt;/h2&gt;
&lt;p&gt;If any tool can fetch a user-provided URL or call a user-influenced endpoint, SSRF is on the table. [10]&lt;/p&gt;
&lt;h3 id="ssrf-containment-strategies-that-actually-work"&gt;SSRF containment strategies that actually work&lt;/h3&gt;
&lt;p&gt;OWASP’s SSRF guidance boils down to a few themes: don’t trust user-controlled URLs, use allowlists, and enforce network controls. [10]&lt;/p&gt;
&lt;p&gt;In practice, for MCP servers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Prefer allowlists over blocklists.&lt;/strong&gt;&lt;br&gt;
“Only these domains” beats “block internal IPs.” Attackers are creative.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Resolve and validate IPs before dialing.&lt;/strong&gt;&lt;br&gt;
DNS can be weaponized (for example, via rebinding). Validate the IP address you actually connect to, not just the hostname, and re-validate on redirects.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Disable redirects or re-validate each hop.&lt;/strong&gt;&lt;br&gt;
Redirect chains are SSRF’s favorite tool.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enforce egress policy at the network layer too.&lt;/strong&gt;&lt;br&gt;
Kubernetes NetworkPolicies / firewall rules are your last line of defense.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="go-pattern-an-outbound-http-client-with-strict-timeouts"&gt;Go pattern: an outbound HTTP client with strict timeouts&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// whole request budget&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Transport&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Proxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ProxyFromEnvironment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;DialContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;net&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Dialer&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;KeepAlive&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nx"&gt;DialContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;TLSHandshakeTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ResponseHeaderTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ExpectContinueTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;MaxIdleConns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;IdleConnTimeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then wrap URL validation around any request creation. Keep it boring and strict.&lt;/p&gt;
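&lt;p&gt;A minimal sketch of that validation step, assuming an https-only policy and a static host allowlist. The names &lt;code&gt;allowedHosts&lt;/code&gt; and &lt;code&gt;validateOutboundURL&lt;/code&gt; and the host &lt;code&gt;api.example.com&lt;/code&gt; are hypothetical. It does not resolve DNS or re-validate redirects, so treat it as a first gate rather than complete SSRF protection.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// allowedHosts is a hypothetical allowlist; replace it with your real dependencies.
var allowedHosts = map[string]bool{
	"api.example.com": true,
}

// validateOutboundURL rejects anything that is not https to an allowlisted host.
// The host comparison is exact and case-sensitive on purpose.
func validateOutboundURL(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse url: %w", err)
	}
	if u.Scheme != "https" {
		return nil, errors.New("scheme must be https")
	}
	if u.User != nil {
		return nil, errors.New("userinfo is not allowed")
	}
	if !allowedHosts[u.Hostname()] {
		return nil, errors.New("host is not allowlisted")
	}
	return u, nil
}

func main() {
	for _, raw := range []string{
		"https://api.example.com/v1/items",
		"http://api.example.com/v1/items",
		"https://169.254.169.254/latest/meta-data/",
	} {
		_, err := validateOutboundURL(raw)
		fmt.Printf("%s allowed=%t\n", raw, err == nil)
	}
}
```

&lt;p&gt;Call the validator before building any request, and build the request from the returned &lt;code&gt;*url.URL&lt;/code&gt; rather than from the raw string.&lt;/p&gt;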
&lt;hr&gt;
&lt;h2 id="hardening-layer-5-observability-without-leaking-secrets"&gt;Hardening layer 5: observability without leaking secrets&lt;/h2&gt;
&lt;p&gt;Telemetry is how you prove:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;you’re within budgets&lt;/li&gt;
&lt;li&gt;tools behave as expected&lt;/li&gt;
&lt;li&gt;failures are localized&lt;/li&gt;
&lt;li&gt;incidents can be diagnosed without “ssh and guess”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But logging is also where teams accidentally leak sensitive data.&lt;/p&gt;
&lt;p&gt;OWASP’s logging guidance calls for logs that support detection and response without exposing sensitive data. [11] Pair that with secrets management discipline. [12]&lt;/p&gt;
&lt;h3 id="what-to-measure-minimum-viable-mcp-telemetry"&gt;What to measure (minimum viable MCP telemetry)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Counters&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool_calls_total{tool, tenant, status}&lt;/li&gt;
&lt;li&gt;auth_failures_total{reason}&lt;/li&gt;
&lt;li&gt;rate_limited_total{tenant}&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Histograms&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool_latency_seconds{tool}&lt;/li&gt;
&lt;li&gt;outbound_latency_seconds{dependency}&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Gauges&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;in_flight_tool_calls{tool}&lt;/li&gt;
&lt;li&gt;queue_depth{tool}&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="trace-boundaries"&gt;Trace boundaries&lt;/h3&gt;
&lt;p&gt;Instrument:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;request → tool routing&lt;/li&gt;
&lt;li&gt;tool execution span&lt;/li&gt;
&lt;li&gt;downstream calls span&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;OpenTelemetry’s Go docs show how to add instrumentation and emit traces/metrics. [13]&lt;/p&gt;
&lt;h3 id="logging-rules-that-save-you-later"&gt;Logging rules that save you later&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Use structured logging (JSON).&lt;/li&gt;
&lt;li&gt;Add correlation IDs (trace IDs) to logs.&lt;/li&gt;
&lt;li&gt;Redact:
&lt;ul&gt;
&lt;li&gt;Authorization headers&lt;/li&gt;
&lt;li&gt;tokens&lt;/li&gt;
&lt;li&gt;cookies&lt;/li&gt;
&lt;li&gt;tool payload fields known to contain secrets&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Log &lt;em&gt;events&lt;/em&gt;, not raw payloads:
&lt;ul&gt;
&lt;li&gt;“tool X called”&lt;/li&gt;
&lt;li&gt;“resource Y read”&lt;/li&gt;
&lt;li&gt;“write operation requested (dry_run=true)”&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Audit logs&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For high-impact tools, write an append-only audit record:
&lt;ul&gt;
&lt;li&gt;who (identity)&lt;/li&gt;
&lt;li&gt;what (tool + parameters summary)&lt;/li&gt;
&lt;li&gt;when&lt;/li&gt;
&lt;li&gt;result (success/failure)&lt;/li&gt;
&lt;li&gt;plan_id / idempotency_key&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Treat audit logs as security data: restrict who can read them and protect them from modification.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="hardening-layer-6-versioning-and-rollout-discipline"&gt;Hardening layer 6: versioning and rollout discipline&lt;/h2&gt;
&lt;p&gt;MCP uses string-based version identifiers like &lt;code&gt;YYYY-MM-DD&lt;/code&gt; to represent the last date of backwards-incompatible changes. [4]&lt;/p&gt;
&lt;p&gt;That’s helpful, but it doesn’t solve the operational problem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;clients upgrade at different times&lt;/li&gt;
&lt;li&gt;tool schemas drift as they change&lt;/li&gt;
&lt;li&gt;hosts differ in which capabilities they support&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="practical-compatibility-rules"&gt;Practical compatibility rules&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pin your server’s supported protocol version&lt;/strong&gt; and expose it in &lt;code&gt;health&lt;/code&gt; or diagnostics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Add contract tests&lt;/strong&gt; that run against:
&lt;ul&gt;
&lt;li&gt;one “current” client&lt;/li&gt;
&lt;li&gt;one “previous” client version&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Support additive changes&lt;/strong&gt; first:
&lt;ul&gt;
&lt;li&gt;new tools&lt;/li&gt;
&lt;li&gt;new optional fields&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Use feature flags for risky tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="rollout-like-a-platform-team"&gt;Rollout like a platform team&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Canaries for remote servers&lt;/li&gt;
&lt;li&gt;“Shadow mode” for new tools (log what would happen)&lt;/li&gt;
&lt;li&gt;Slow ramp with budget monitoring&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;p&gt;If you’re building (or inheriting) an MCP server, run this checklist:&lt;/p&gt;
&lt;h3 id="safety"&gt;Safety&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool contracts are structured (no free-form “do anything” strings).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Every tool has a safe default (&lt;code&gt;dry_run=true&lt;/code&gt;, &lt;code&gt;limit&lt;/code&gt; required, etc.).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Destructive tools require a plan/apply step (or explicit confirmation gates).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool inputs are validated and bounded (length, ranges, enums).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="identity--access"&gt;Identity &amp;amp; access&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Remote transport requires authentication and per-tool authorization.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tokens are short-lived and rotated; secrets are not in source control. [12]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tenant identity is enforced at every access point (not “best effort”).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="budgets--resilience"&gt;Budgets &amp;amp; resilience&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; HTTP server timeouts are configured. [6][7]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Outbound clients have timeouts and connection limits.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Rate limiting exists per tenant/identity. [9]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Concurrency caps exist per tool; overload behavior is explicit (fail fast / queue).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Retries are bounded and idempotent where side effects exist.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="networking"&gt;Networking&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; URL fetch tools have allowlists and SSRF protections. [10]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Redirect policies are explicit (disabled or re-validated).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Egress is constrained at the network layer (not only in code).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="observability"&gt;Observability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Metrics cover tool calls, latency, errors, and rate limiting.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tracing exists across tool execution and downstream calls. [13]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Logs are structured, correlated, and redacted. [11]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit logging exists for high-impact tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="operations"&gt;Operations&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Health checks and readiness checks exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Configuration is explicit and validated on startup.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Versioning strategy is documented and tested. [4]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Model Context Protocol (MCP) Specification (version 2025-11-25): &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-11-25&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Architecture Overview (participants, transports, concepts): &lt;a href="https://modelcontextprotocol.io/docs/learn/architecture" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/docs/learn/architecture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Transport details (Streamable HTTP transport overview): &lt;a href="https://modelcontextprotocol.io/specification/2025-03-26/basic/transports" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/2025-03-26/basic/transports&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Versioning: &lt;a href="https://modelcontextprotocol.io/specification/versioning" target="_blank" rel="noopener noreferrer"&gt;https://modelcontextprotocol.io/specification/versioning&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;JSON-RPC 2.0 Specification: &lt;a href="https://www.jsonrpc.org/specification" target="_blank" rel="noopener noreferrer"&gt;https://www.jsonrpc.org/specification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go &lt;code&gt;net/http&lt;/code&gt; package documentation: &lt;a href="https://pkg.go.dev/net/http" target="_blank" rel="noopener noreferrer"&gt;https://pkg.go.dev/net/http&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cloudflare: “The complete guide to Go net/http timeouts”: &lt;a href="https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/" target="_blank" rel="noopener noreferrer"&gt;https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go &lt;code&gt;context&lt;/code&gt; package documentation: &lt;a href="https://pkg.go.dev/context" target="_blank" rel="noopener noreferrer"&gt;https://pkg.go.dev/context&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Go &lt;code&gt;x/time/rate&lt;/code&gt; documentation: &lt;a href="https://pkg.go.dev/golang.org/x/time/rate" target="_blank" rel="noopener noreferrer"&gt;https://pkg.go.dev/golang.org/x/time/rate&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;OWASP SSRF Prevention Cheat Sheet / SSRF category references:
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html" target="_blank" rel="noopener noreferrer"&gt;https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://owasp.org/Top10/2021/A10_2021-Server-Side_Request_Forgery_%28SSRF%29/" target="_blank" rel="noopener noreferrer"&gt;https://owasp.org/Top10/2021/A10_2021-Server-Side_Request_Forgery_%28SSRF%29/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;OWASP Logging Cheat Sheet (security-focused logging guidance): &lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html" target="_blank" rel="noopener noreferrer"&gt;https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Secrets management guidance:
&lt;ul&gt;
&lt;li&gt;OWASP Secrets Management Cheat Sheet: &lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html" target="_blank" rel="noopener noreferrer"&gt;https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Kubernetes “Good practices for Kubernetes Secrets”: &lt;a href="https://kubernetes.io/docs/concepts/security/secrets-good-practices/" target="_blank" rel="noopener noreferrer"&gt;https://kubernetes.io/docs/concepts/security/secrets-good-practices/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;OpenTelemetry Go instrumentation docs: &lt;a href="https://opentelemetry.io/docs/languages/go/instrumentation/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/languages/go/instrumentation/&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item><item><title>Chapter 9: Security &amp; Sensitive Data: Sanitize, Don't Paste Secrets</title><link>https://roygabriel.dev/blog/llm-development-guide/08-security-sensitive-data/</link><pubDate>Thu, 29 Jan 2026 21:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/08-security-sensitive-data/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 9 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/07-choosing-the-right-model/"&gt;Chapter 8: Choosing the Right Model: Capability Tiers, Not Hype&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/09-stop-rules-pitfalls/"&gt;Chapter 10: Stop Rules + Pitfalls: When to Upgrade, Bail, or Go Manual&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to use LLMs without doing something reckless:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Apply a concrete &amp;ldquo;never paste&amp;rdquo; list.&lt;/li&gt;
&lt;li&gt;Sanitize code, config, and logs into safe examples.&lt;/li&gt;
&lt;li&gt;Add a verification step so you don&amp;rsquo;t ship secrets.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Assume anything you paste could be logged or retained.&lt;/li&gt;
&lt;li&gt;If you wouldn&amp;rsquo;t publish it publicly, don&amp;rsquo;t paste it.&lt;/li&gt;
&lt;li&gt;Replace real values with placeholders.&lt;/li&gt;
&lt;li&gt;Sanitize logs aggressively.&lt;/li&gt;
&lt;li&gt;Verify your workspace for leaked secrets before you commit.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-core-principle"&gt;The core principle&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#never-paste-list"&gt;Never paste list&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sanitization-patterns"&gt;Sanitization patterns&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#failure-modes"&gt;Failure modes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-core-principle"&gt;The core principle&lt;/h2&gt;
&lt;p&gt;Assume anything you send to an LLM could be stored.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 9 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/07-choosing-the-right-model/"&gt;Chapter 8: Choosing the Right Model: Capability Tiers, Not Hype&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/09-stop-rules-pitfalls/"&gt;Chapter 10: Stop Rules + Pitfalls: When to Upgrade, Bail, or Go Manual&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to use LLMs without doing something reckless:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Apply a concrete &amp;ldquo;never paste&amp;rdquo; list.&lt;/li&gt;
&lt;li&gt;Sanitize code, config, and logs into safe examples.&lt;/li&gt;
&lt;li&gt;Add a verification step so you don&amp;rsquo;t ship secrets.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Assume anything you paste could be logged or retained.&lt;/li&gt;
&lt;li&gt;If you wouldn&amp;rsquo;t publish it publicly, don&amp;rsquo;t paste it.&lt;/li&gt;
&lt;li&gt;Replace real values with placeholders.&lt;/li&gt;
&lt;li&gt;Sanitize logs aggressively.&lt;/li&gt;
&lt;li&gt;Verify your workspace for leaked secrets before you commit.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-core-principle"&gt;The core principle&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#never-paste-list"&gt;Never paste list&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sanitization-patterns"&gt;Sanitization patterns&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#failure-modes"&gt;Failure modes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-core-principle"&gt;The core principle&lt;/h2&gt;
&lt;p&gt;Assume anything you send to an LLM could be stored.&lt;/p&gt;
&lt;p&gt;Even with enterprise offerings, policies change. This guidance is current as of 2026-02-14; check the vendor&amp;rsquo;s current policy (and your organization&amp;rsquo;s approved tools list) before using any tool with internal data.&lt;/p&gt;
&lt;h2 id="never-paste-list"&gt;Never paste list&lt;/h2&gt;
&lt;p&gt;Do not paste:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Credentials: API keys, tokens, passwords, private keys.&lt;/li&gt;
&lt;li&gt;PII: customer names, emails, addresses, health data.&lt;/li&gt;
&lt;li&gt;Production data: real records, full dumps, support tickets.&lt;/li&gt;
&lt;li&gt;Security configs: firewall rules, IAM policies, internal IPs.&lt;/li&gt;
&lt;li&gt;Proprietary secrets: unreleased product details, trade secrets.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use the &amp;ldquo;Would I post this publicly?&amp;rdquo; test.&lt;/p&gt;
&lt;h2 id="sanitization-patterns"&gt;Sanitization patterns&lt;/h2&gt;
&lt;p&gt;Replace sensitive values with descriptive placeholders.&lt;/p&gt;
&lt;p&gt;Go example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-go" data-lang="go"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;// Before (do not paste)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="c1"&gt;// db, err := sql.Open(&amp;#34;postgres&amp;#34;, &amp;#34;host=prod-db.internal user=admin password=SuperSecret123 dbname=customers&amp;#34;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="c1"&gt;// After (safe to paste)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;#34;host=DATABASE_HOST user=DATABASE_USER password=DATABASE_PASSWORD dbname=DATABASE_NAME&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;YAML example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c"&gt;# Before (do not paste)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="c"&gt;# data:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="c"&gt;# api-key: YWN0dWFsLWFwaS1rZXktaGVyZQ==&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="c"&gt;# After (safe to paste)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;&lt;/span&gt;&lt;span class="nt"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;api-key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;&amp;lt;BASE64_ENCODED_API_KEY&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;webhook-secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;&amp;lt;BASE64_ENCODED_WEBHOOK_SECRET&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Logs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Remove emails.&lt;/li&gt;
&lt;li&gt;Replace internal hostnames.&lt;/li&gt;
&lt;li&gt;Replace IPs with documentation ranges (&lt;code&gt;192.0.2.0/24&lt;/code&gt;, &lt;code&gt;198.51.100.0/24&lt;/code&gt;, &lt;code&gt;203.0.113.0/24&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
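&lt;p&gt;The three log rules above can be sketched as a small Go filter. This is a starting point under stated assumptions, not a complete scrubber: the &lt;code&gt;.internal&lt;/code&gt; hostname suffix, the placeholder strings, and the function name &lt;code&gt;sanitizeLog&lt;/code&gt; are all hypothetical, and every IPv4 address collapses to a single documentation address.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
	// hostRe assumes internal hostnames end in ".internal"; adjust for your naming scheme.
	hostRe = regexp.MustCompile(`\b[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.internal\b`)
	ipv4Re = regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)
)

// sanitizeLog replaces emails, internal hostnames, and IPv4 addresses
// with placeholders. Emails are handled first so their domains are not
// partially rewritten by the hostname rule.
func sanitizeLog(line string) string {
	line = emailRe.ReplaceAllString(line, "EMAIL_REDACTED")
	line = hostRe.ReplaceAllString(line, "HOST_REDACTED")
	line = ipv4Re.ReplaceAllString(line, "192.0.2.1")
	return line
}

func main() {
	fmt.Println(sanitizeLog("login ok user=bob@corp.example host=prod-db.internal ip=10.1.2.3"))
}
```

&lt;p&gt;Read the output before pasting it anywhere. Pattern-based scrubbing misses tokens in URLs, IPv6 addresses, and anything that does not match these shapes.&lt;/p&gt;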
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;Before you paste or commit, search your workspace for obvious secret patterns.&lt;/p&gt;
&lt;p&gt;These commands produce false positives, but they catch the most common leaks:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# High-signal patterns.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;(AKIA[0-9A-Z]{16}|BEGIN ((RSA|EC|OPENSSH) )?PRIVATE KEY|xox[baprs]-|ghp_[A-Za-z0-9]{36})&amp;#34;&lt;/span&gt; . &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Common key/value names.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;(?i)(api[_-]?key|secret|token|password)\s*[:=]&amp;#34;&lt;/span&gt; . &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Emails (often indicates logs or real data got copied).&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}&amp;#34;&lt;/span&gt; . &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Check staged changes specifically.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git diff --cached
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No obvious credentials or private keys appear in diffs.&lt;/li&gt;
&lt;li&gt;If matches exist, sanitize and regenerate the example.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="failure-modes"&gt;Failure modes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Sharing real logs that contain tokens in URLs.&lt;/li&gt;
&lt;li&gt;Copying a Kubernetes &lt;code&gt;Secret&lt;/code&gt; verbatim.&lt;/li&gt;
&lt;li&gt;Letting an IDE plugin send your whole file without noticing.&lt;/li&gt;
&lt;li&gt;Assuming &amp;ldquo;enterprise&amp;rdquo; means &amp;ldquo;no risk&amp;rdquo; without verifying current policy.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/09-stop-rules-pitfalls/"&gt;Chapter 10: Stop Rules + Pitfalls: When to Upgrade, Bail, or Go Manual&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 8: Choosing the Right Model: Capability Tiers, Not Hype</title><link>https://roygabriel.dev/blog/llm-development-guide/07-choosing-the-right-model/</link><pubDate>Tue, 27 Jan 2026 19:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/07-choosing-the-right-model/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 8 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/07-phase-documents-implementation-prompts/"&gt;Chapter 7: Large Projects with Phase Documents + Implementation Prompts&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/08-security-sensitive-data/"&gt;Chapter 9: Security &amp;amp; Sensitive Data: Sanitize, Don&amp;rsquo;t Paste Secrets&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to pick a model and interface deliberately:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use capability tiers instead of memorizing brand names.&lt;/li&gt;
&lt;li&gt;Upgrade quickly when quality is the bottleneck.&lt;/li&gt;
&lt;li&gt;Avoid wasting flagship models on structured boilerplate.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat model choice as a cost-of-mistakes problem.&lt;/li&gt;
&lt;li&gt;Use flagship models for planning, debugging, and high-stakes decisions.&lt;/li&gt;
&lt;li&gt;Use mid-tier models for implementation with strong references.&lt;/li&gt;
&lt;li&gt;Use fast/cheap models for boilerplate and simple transformations.&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;ve spent ~10 minutes fighting output quality, upgrade or shrink scope.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="as-of-note"&gt;As-of note&lt;/h2&gt;
&lt;p&gt;As of 2026-02-14, model names, pricing, and product policies change frequently. Prefer tier-based guidance, and verify vendor policies directly before using tools with sensitive data.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 8 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/07-phase-documents-implementation-prompts/"&gt;Chapter 7: Large Projects with Phase Documents + Implementation Prompts&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/08-security-sensitive-data/"&gt;Chapter 9: Security &amp;amp; Sensitive Data: Sanitize, Don&amp;rsquo;t Paste Secrets&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to pick a model and interface deliberately:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use capability tiers instead of memorizing brand names.&lt;/li&gt;
&lt;li&gt;Upgrade quickly when quality is the bottleneck.&lt;/li&gt;
&lt;li&gt;Avoid wasting flagship models on structured boilerplate.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat model choice as a cost-of-mistakes problem.&lt;/li&gt;
&lt;li&gt;Use flagship models for planning, debugging, and high-stakes decisions.&lt;/li&gt;
&lt;li&gt;Use mid-tier models for implementation with strong references.&lt;/li&gt;
&lt;li&gt;Use fast/cheap models for boilerplate and simple transformations.&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;ve spent ~10 minutes fighting output quality, upgrade or shrink scope.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="as-of-note"&gt;As-of note&lt;/h2&gt;
&lt;p&gt;As of 2026-02-14, model names, pricing, and product policies change frequently. Prefer tier-based guidance, and verify vendor policies directly before using tools with sensitive data.&lt;/p&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-capability-tiers"&gt;The capability tiers&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#task-to-tier-mapping"&gt;Task-to-tier mapping&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#red-flags-upgrade-now"&gt;Red flags: upgrade now&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-selection-checklist"&gt;A selection checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-capability-tiers"&gt;The capability tiers&lt;/h2&gt;
&lt;p&gt;Think in tiers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Flagship: best reasoning and instruction-following for novel work.&lt;/li&gt;
&lt;li&gt;Mid-tier: strong general performance for structured work with references.&lt;/li&gt;
&lt;li&gt;Fast/cheap: good for simple tasks, but with a higher error rate on complex reasoning.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This framing stays useful even when names change.&lt;/p&gt;
&lt;h2 id="task-to-tier-mapping"&gt;Task-to-tier mapping&lt;/h2&gt;
&lt;p&gt;Use flagship for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Planning and architecture.&lt;/li&gt;
&lt;li&gt;Debugging complex failures.&lt;/li&gt;
&lt;li&gt;Security-sensitive review.&lt;/li&gt;
&lt;li&gt;Anything where mistakes are expensive.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use mid-tier for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implementation that follows existing patterns.&lt;/li&gt;
&lt;li&gt;Refactors with clear examples.&lt;/li&gt;
&lt;li&gt;Writing tests when the behavior is already defined.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use fast/cheap for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Syntax lookups.&lt;/li&gt;
&lt;li&gt;Boilerplate you will review.&lt;/li&gt;
&lt;li&gt;Mechanical transformations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="red-flags-upgrade-now"&gt;Red flags: upgrade now&lt;/h2&gt;
&lt;p&gt;Upgrade when you see:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The model repeats the same misunderstanding.&lt;/li&gt;
&lt;li&gt;Output ignores constraints.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Looks right&amp;rdquo; code fails in tests.&lt;/li&gt;
&lt;li&gt;You are on the third prompt iteration for the same unit.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The cheapest model is the one that gets you to a correct, verified change in the least total time.&lt;/p&gt;
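&lt;p&gt;The red flags above can be turned into a mechanical trigger. The sketch below is a hypothetical shell helper (&lt;code&gt;should_upgrade&lt;/code&gt; is not part of any tool). It assumes you track the prompt iteration count and the minutes spent on one unit, and it reuses the third-iteration and ~10-minute thresholds from this chapter.&lt;/p&gt;

```shell
# Hypothetical helper: decide whether to change approach for the current unit.
# $1 = prompt iterations spent on this unit
# $2 = minutes spent fighting output quality
should_upgrade() {
  su_iterations="$1"
  su_minutes="$2"
  # Third prompt iteration on the same unit is a red flag.
  if [ "$su_iterations" -ge 3 ]; then
    echo "upgrade-or-shrink"
    return 0
  fi
  # Roughly ten minutes of fighting output quality is the other trigger.
  if [ "$su_minutes" -ge 10 ]; then
    echo "upgrade-or-shrink"
    return 0
  fi
  echo "stay"
}
```

&lt;p&gt;The thresholds are rules of thumb, not measurements. Adjust them to your own cost of mistakes.&lt;/p&gt;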
&lt;h2 id="a-selection-checklist"&gt;A selection checklist&lt;/h2&gt;
&lt;p&gt;Before you start, answer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is this novel or pattern-following?&lt;/li&gt;
&lt;li&gt;Do I have reference implementations?&lt;/li&gt;
&lt;li&gt;What is the cost of mistakes?&lt;/li&gt;
&lt;li&gt;Is this structured or ambiguous?&lt;/li&gt;
&lt;li&gt;Am I debugging or implementing?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If uncertain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Start with flagship for planning.&lt;/li&gt;
&lt;li&gt;Drop to mid-tier once you have a stable pattern and good references.&lt;/li&gt;
&lt;/ul&gt;
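&lt;p&gt;As a rough illustration, the checklist answers can be reduced to a small decision function. This is a sketch under stated assumptions: the name &lt;code&gt;pick_tier&lt;/code&gt; and its three inputs (kind of work, cost of mistakes, whether references exist) are hypothetical, and the rules only restate the task-to-tier mapping from this chapter.&lt;/p&gt;

```shell
# Hypothetical helper: map checklist answers to a capability tier.
# $1 = kind of work: plan, debug, implement, or boilerplate
# $2 = cost of mistakes: high or low
# $3 = reference implementations available: yes or no
pick_tier() {
  pt_kind="$1"
  pt_cost="$2"
  pt_refs="$3"
  # Expensive mistakes always justify the flagship tier.
  if [ "$pt_cost" = "high" ]; then
    echo "flagship"
    return 0
  fi
  case "$pt_kind" in
    plan|debug)
      echo "flagship" ;;
    implement)
      # Pattern-following work with references can drop to mid-tier.
      if [ "$pt_refs" = "yes" ]; then
        echo "mid-tier"
      else
        echo "flagship"
      fi ;;
    boilerplate)
      echo "fast" ;;
    *)
      # If uncertain, start with flagship.
      echo "flagship" ;;
  esac
}
```

&lt;p&gt;For example, &lt;code&gt;pick_tier implement low yes&lt;/code&gt; prints &lt;code&gt;mid-tier&lt;/code&gt;, while any unrecognized kind of work falls back to &lt;code&gt;flagship&lt;/code&gt;.&lt;/p&gt;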
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;A practical way to keep this from being hand-wavy is to force a written decision per task.&lt;/p&gt;
&lt;p&gt;Create a small note file per task:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mkdir -p work-notes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; work-notes/model-selection.md &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;MD&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;# Model Selection (Per Task)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Task
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;&amp;lt;What are we doing?&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Risk
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Cost of mistakes:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Can I review the output competently?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## References
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- &amp;lt;Paths to reference implementations&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Model decision
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Tier: &amp;lt;flagship|mid-tier|fast&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Why:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- When to upgrade:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Outcome
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Did we upgrade?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- What broke / what worked:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;MD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected result:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can justify the model choice in one minute.&lt;/li&gt;
&lt;li&gt;You have a trigger for upgrading when output quality is the bottleneck.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/08-security-sensitive-data/"&gt;Chapter 9: Security &amp;amp; Sensitive Data: Sanitize, Don&amp;rsquo;t Paste Secrets&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 7: Large Projects with Phase Documents + Implementation Prompts</title><link>https://roygabriel.dev/blog/llm-development-guide/07-phase-documents-implementation-prompts/</link><pubDate>Mon, 26 Jan 2026 20:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/07-phase-documents-implementation-prompts/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 7 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/06-scaling-the-workflow/"&gt;Chapter 6: Scaling the Workflow: Phases, Parallelism, Hygiene&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/07-choosing-the-right-model/"&gt;Chapter 8: Choosing the Right Model: Capability Tiers, Not Hype&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to run large, multi-phase delivery with less drift by introducing two explicit artifacts and one operating cadence:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A phase specification document that defines scope, dependencies, files, and exit criteria.&lt;/li&gt;
&lt;li&gt;A phase implementation prompt document that defines the prompt-by-prompt execution contract.&lt;/li&gt;
&lt;li&gt;A repeatable operating cadence for execution, verification, and commits.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Large projects fail when a single prompt tries to carry the whole implementation plan.&lt;/li&gt;
&lt;li&gt;Use one phase spec and one implementation-prompt file per sub-phase.&lt;/li&gt;
&lt;li&gt;Execute prompts sequentially; do not continue if build/vet/test gates fail.&lt;/li&gt;
&lt;li&gt;Keep context loading explicit for each prompt.&lt;/li&gt;
&lt;li&gt;For copy/paste templates, use &lt;a href="https://roygabriel.dev/blog/llm-development-guide/12-templates-checklists/"&gt;Chapter 13: Templates + Checklists: The Copy/Paste Kit&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-this-pattern-exists"&gt;Why this pattern exists&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-two-document-system"&gt;The two-document system&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#worked-example-a-multi-phase-engineering-initiative"&gt;Worked example: a multi-phase engineering initiative&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#execution-protocol-for-prompt-files"&gt;Execution protocol for prompt files&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#failure-modes"&gt;Failure modes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-this-pattern-exists"&gt;Why this pattern exists&lt;/h2&gt;
&lt;p&gt;For a one-day task, a plan plus one execution prompt is usually enough.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 7 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/06-scaling-the-workflow/"&gt;Chapter 6: Scaling the Workflow: Phases, Parallelism, Hygiene&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/07-choosing-the-right-model/"&gt;Chapter 8: Choosing the Right Model: Capability Tiers, Not Hype&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to run large, multi-phase delivery with less drift by introducing two explicit artifacts and one operating cadence:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A phase specification document that defines scope, dependencies, files, and exit criteria.&lt;/li&gt;
&lt;li&gt;A phase implementation prompt document that defines the prompt-by-prompt execution contract.&lt;/li&gt;
&lt;li&gt;A repeatable operating cadence for execution, verification, and commits.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Large projects fail when a single prompt tries to carry the whole implementation plan.&lt;/li&gt;
&lt;li&gt;Use one phase spec and one implementation-prompt file per sub-phase.&lt;/li&gt;
&lt;li&gt;Execute prompts sequentially; do not continue if build/vet/test gates fail.&lt;/li&gt;
&lt;li&gt;Keep context loading explicit for each prompt.&lt;/li&gt;
&lt;li&gt;For copy/paste templates, use &lt;a href="https://roygabriel.dev/blog/llm-development-guide/12-templates-checklists/"&gt;Chapter 13: Templates + Checklists: The Copy/Paste Kit&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-this-pattern-exists"&gt;Why this pattern exists&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-two-document-system"&gt;The two-document system&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#worked-example-a-multi-phase-engineering-initiative"&gt;Worked example: a multi-phase engineering initiative&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#execution-protocol-for-prompt-files"&gt;Execution protocol for prompt files&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#failure-modes"&gt;Failure modes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-this-pattern-exists"&gt;Why this pattern exists&lt;/h2&gt;
&lt;p&gt;For a one-day task, a plan plus one execution prompt is usually enough.&lt;/p&gt;
&lt;p&gt;For multi-week work, that breaks down:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Context gets too large and detail gets dropped.&lt;/li&gt;
&lt;li&gt;Sessions diverge when constraints are implied instead of written.&lt;/li&gt;
&lt;li&gt;Verification becomes optional instead of required.&lt;/li&gt;
&lt;li&gt;Commits become large and hard to review.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The fix is to treat phase docs and implementation prompt docs as first-class project artifacts.&lt;/p&gt;
&lt;h2 id="the-two-document-system"&gt;The two-document system&lt;/h2&gt;
&lt;p&gt;For each sub-phase, create two files.&lt;/p&gt;
&lt;h3 id="1-phase-spec-document"&gt;1) Phase spec document&lt;/h3&gt;
&lt;p&gt;Purpose: define what this sub-phase must accomplish and how completion is validated.&lt;/p&gt;
&lt;p&gt;Typical sections:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Status, dependency, and migration notes.&lt;/li&gt;
&lt;li&gt;Design rationale (why this slice exists now).&lt;/li&gt;
&lt;li&gt;Tasks grouped by prompt number.&lt;/li&gt;
&lt;li&gt;Files: new, modified, and referenced-only.&lt;/li&gt;
&lt;li&gt;Exit criteria with concrete commands and expected results.&lt;/li&gt;
&lt;li&gt;Progress notes placeholder.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="2-phase-implementation-prompt-document"&gt;2) Phase implementation prompt document&lt;/h3&gt;
&lt;p&gt;Purpose: define exactly how execution happens, prompt by prompt.&lt;/p&gt;
&lt;p&gt;Each prompt should include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Context files to load (small, explicit list).&lt;/li&gt;
&lt;li&gt;Task details: signatures, interfaces, constraints.&lt;/li&gt;
&lt;li&gt;Quality gates and required verification commands.&lt;/li&gt;
&lt;li&gt;Stop condition: do not proceed until the current prompt passes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A useful pattern is to couple one prompt to one logical implementation unit.&lt;/p&gt;
&lt;h2 id="worked-example-a-multi-phase-engineering-initiative"&gt;Worked example: a multi-phase engineering initiative&lt;/h2&gt;
&lt;p&gt;Assume you are delivering a new runtime capability over six weeks.&lt;/p&gt;
&lt;p&gt;You split work into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Phase A: contracts and types.&lt;/li&gt;
&lt;li&gt;Phase B: core implementation.&lt;/li&gt;
&lt;li&gt;Phase C: API and integration points.&lt;/li&gt;
&lt;li&gt;Phase D: tests and validation.&lt;/li&gt;
&lt;li&gt;Phase E: observability and rollout safety.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For &lt;code&gt;Phase B&lt;/code&gt;, your phase spec might look like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase B - Core Implementation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Planned
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Depends on
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Phase A
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Design rationale
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Phase B isolates core behavior behind the contracts from Phase A.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;This prevents API and infrastructure concerns from polluting the core logic.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Tasks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Prompt 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Implement core orchestration types and constructor.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Prompt 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Implement main execution method with deterministic error paths.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Prompt 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Add unit tests for success and failure branches.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Files
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### New
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; internal/core/runtime.go
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; internal/core/runtime_test.go
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Modified
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; internal/core/types.go
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### Referenced (read-only)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; internal/contracts/interfaces.go
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Exit criteria
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &lt;span class="sb"&gt;`go build ./internal/core/...`&lt;/span&gt; exits 0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &lt;span class="sb"&gt;`go vet ./internal/core/...`&lt;/span&gt; exits 0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &lt;span class="sb"&gt;`go test ./internal/core/...`&lt;/span&gt; exits 0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; No unchecked returned errors
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Progress notes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now pair it with a &lt;code&gt;Phase B&lt;/code&gt; implementation prompt file:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase B - Implementation Prompts
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Prompt 1 of 3: Runtime skeleton
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Context files to load:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; docs/phases/PHASEB.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; internal/contracts/interfaces.go
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; internal/core/types.go
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; README.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Task:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Create &lt;span class="sb"&gt;`internal/core/runtime.go`&lt;/span&gt; with constructor and public methods.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Constraints:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Do not change files outside listed scope.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Handle all returned errors explicitly.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Keep methods short enough to remain reviewable.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Verification:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`go build ./internal/core/...`&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`go vet ./internal/core/...`&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Stop rule:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Do not proceed to Prompt 2 until both commands pass.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This is intentionally boring. Boring is what scales.&lt;/p&gt;
&lt;h2 id="execution-protocol-for-prompt-files"&gt;Execution protocol for prompt files&lt;/h2&gt;
&lt;p&gt;Use the same cadence for every prompt in a sub-phase:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Load only listed context files.&lt;/li&gt;
&lt;li&gt;Execute exactly one prompt.&lt;/li&gt;
&lt;li&gt;Update work notes (decisions, assumptions, blockers, next step).&lt;/li&gt;
&lt;li&gt;Run required verification gates.&lt;/li&gt;
&lt;li&gt;Commit one logical unit.&lt;/li&gt;
&lt;li&gt;Move to the next prompt.&lt;/li&gt;
&lt;/ol&gt;
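&lt;p&gt;Step 4 is the one most often skipped under time pressure, so it helps to make it a single command. The following is a minimal sketch, assuming each gate is an ordinary shell command line; &lt;code&gt;run_gates&lt;/code&gt; is a hypothetical helper name, not an existing tool.&lt;/p&gt;

```shell
# Hypothetical helper: run verification gates in order and stop at the first failure.
# Each argument is one command line, for example "go vet ./internal/core/...".
run_gates() {
  for gate_cmd in "$@"; do
    echo "gate: $gate_cmd"
    # Run the gate in a child shell so its exit status decides whether to continue.
    if ! sh -c "$gate_cmd"; then
      echo "FAILED: $gate_cmd (do not commit or move to the next prompt)"
      return 1
    fi
  done
  echo "all gates passed"
}
```

&lt;p&gt;For the Phase B example this could be invoked as &lt;code&gt;run_gates "go build ./internal/core/..." "go vet ./internal/core/..." "go test ./internal/core/..."&lt;/code&gt;, committing only when it returns 0.&lt;/p&gt;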
&lt;p&gt;Suggested commit discipline:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One commit per prompt when prompts are independent.&lt;/li&gt;
&lt;li&gt;One commit per tightly coupled prompt pair when separation creates broken intermediate states.&lt;/li&gt;
&lt;li&gt;Message format should state scope and intent clearly.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When prompt counts are high, add a completion checklist in work notes:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Prompt progress
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [x]&lt;/span&gt; Prompt 1
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [x]&lt;/span&gt; Prompt 2
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Prompt 3
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Prompt 4
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;You can verify this system is functioning with mechanical checks.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# All phase specs have exit criteria.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;^## Exit criteria&amp;#34;&lt;/span&gt; docs/phases/PHASE*.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# All prompt docs define context loading and verification.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;^Context files to load:|^Verification:&amp;#34;&lt;/span&gt; docs/phases/*-PROMPT.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Work notes track progression.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;^## Prompt progress|^## Session log&amp;#34;&lt;/span&gt; work-notes &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Every phase spec has explicit exit criteria.&lt;/li&gt;
&lt;li&gt;Every prompt file defines context and verification.&lt;/li&gt;
&lt;li&gt;Session state is recoverable without re-explaining the whole project.&lt;/li&gt;
&lt;/ul&gt;
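&lt;p&gt;Searching for headings shows where they exist, but it does not flag a file that lacks one. A complementary sketch is to list the files that are missing a required heading. It assumes a &lt;code&gt;grep&lt;/code&gt; that supports &lt;code&gt;-L&lt;/code&gt; (GNU and BSD grep both do); &lt;code&gt;missing_heading&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```shell
# Hypothetical helper: print every file that lacks a line matching the pattern.
# $1 = extended regular expression, remaining arguments = files to check.
missing_heading() {
  mh_pattern="$1"
  shift
  # With no files to check there is nothing to report.
  if [ "$#" -eq 0 ]; then
    return 0
  fi
  # grep -L lists files with no matching line. Its exit status differs between
  # grep versions, so callers should rely on the printed list instead.
  grep -L -E -e "$mh_pattern" "$@" || true
}
```

&lt;p&gt;Empty output means every file passed. For example, &lt;code&gt;missing_heading '^## Exit criteria' docs/phases/PHASEB.md&lt;/code&gt; prints nothing when the spec has the section. Pass spec files explicitly if your prompt files share the &lt;code&gt;PHASE&lt;/code&gt; prefix, otherwise they will be reported as missing exit criteria.&lt;/p&gt;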
&lt;h2 id="failure-modes"&gt;Failure modes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Phase docs describe architecture but skip executable gates.&lt;/li&gt;
&lt;li&gt;Prompt docs are too broad (&amp;ldquo;implement phase&amp;rdquo;) and lose determinism.&lt;/li&gt;
&lt;li&gt;Prompts proceed despite failing verification.&lt;/li&gt;
&lt;li&gt;Context file lists are bloated and include unrelated material.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If this starts happening, shrink prompt scope and tighten exit criteria before continuing.&lt;/p&gt;
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/07-choosing-the-right-model/"&gt;Chapter 8: Choosing the Right Model: Capability Tiers, Not Hype&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 6: Scaling the Workflow: Phases, Parallelism, Hygiene</title><link>https://roygabriel.dev/blog/llm-development-guide/06-scaling-the-workflow/</link><pubDate>Sun, 25 Jan 2026 18:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/06-scaling-the-workflow/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 6 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/05-execution-loop-commit-discipline/"&gt;Chapter 5: The Execution Loop: Review Discipline + Commit Discipline&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/07-phase-documents-implementation-prompts/"&gt;Chapter 7: Large Projects with Phase Documents + Implementation Prompts&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to take the same workflow and scale it up without chaos:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Split work into phases that do not overlap on files.&lt;/li&gt;
&lt;li&gt;Run parallel sessions or agents safely.&lt;/li&gt;
&lt;li&gt;Decide when artifacts stay local vs. get committed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Scale by splitting phases, not by writing bigger prompts.&lt;/li&gt;
&lt;li&gt;Keep phase files small (rule of thumb: under ~200 lines).&lt;/li&gt;
&lt;li&gt;Parallel work requires clean boundaries and explicit interfaces.&lt;/li&gt;
&lt;li&gt;Decide up front whether &lt;code&gt;plan/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt;, &lt;code&gt;work-notes/&lt;/code&gt; live in git.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#when-to-sub-phase"&gt;When to sub-phase&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#parallel-execution-requirements"&gt;Parallel execution requirements&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#repository-hygiene"&gt;Repository hygiene&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gotchas"&gt;Gotchas&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="when-to-sub-phase"&gt;When to sub-phase&lt;/h2&gt;
&lt;p&gt;Sub-phase when:&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 6 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/05-execution-loop-commit-discipline/"&gt;Chapter 5: The Execution Loop: Review Discipline + Commit Discipline&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/07-phase-documents-implementation-prompts/"&gt;Chapter 7: Large Projects with Phase Documents + Implementation Prompts&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to take the same workflow and scale it up without chaos:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Split work into phases that do not overlap on files.&lt;/li&gt;
&lt;li&gt;Run parallel sessions or agents safely.&lt;/li&gt;
&lt;li&gt;Decide when artifacts stay local vs. get committed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Scale by splitting phases, not by writing bigger prompts.&lt;/li&gt;
&lt;li&gt;Keep phase files small (rule of thumb: under ~200 lines).&lt;/li&gt;
&lt;li&gt;Parallel work requires clean boundaries and explicit interfaces.&lt;/li&gt;
&lt;li&gt;Decide up front whether &lt;code&gt;plan/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt;, &lt;code&gt;work-notes/&lt;/code&gt; live in git.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#when-to-sub-phase"&gt;When to sub-phase&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#parallel-execution-requirements"&gt;Parallel execution requirements&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#repository-hygiene"&gt;Repository hygiene&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gotchas"&gt;Gotchas&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="when-to-sub-phase"&gt;When to sub-phase&lt;/h2&gt;
&lt;p&gt;Sub-phase when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A phase touches too many files.&lt;/li&gt;
&lt;li&gt;The phase cannot be verified independently.&lt;/li&gt;
&lt;li&gt;The phase depends on decisions that are not written down.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example layout:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;plan/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; phase-1a-analysis.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; phase-1b-design.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; phase-2a-scaffold.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; phase-2b-core-impl.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; phase-3-validation.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Keep each phase &amp;ldquo;one session friendly&amp;rdquo;: small enough to complete (or at least checkpoint) in 1 to 2 sessions.&lt;/p&gt;
&lt;p&gt;At this point, many teams benefit from formalizing each sub-phase with two files: a phase spec and a phase implementation prompt file. The next chapter walks through that pattern in detail.&lt;/p&gt;
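&lt;p&gt;A minimal sketch of a size check for phase files, assuming they live under &lt;code&gt;plan/&lt;/code&gt; as in the layout above and using the ~200-line rule of thumb from the TL;DR. The function name &lt;code&gt;oversized_phases&lt;/code&gt; and both of its defaults are hypothetical, and line count is only a rough proxy for phase size:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch: list Markdown phase files whose line count exceeds a limit.
# The function name and both defaults (plan/, 200 lines) are assumptions.
oversized_phases() {
  dir="${1:-plan}"
  limit="${2:-200}"
  # wc prints "COUNT PATH" per file, plus a "total" row when given several files.
  find "$dir" -maxdepth 2 -type f -name '*.md' -exec wc -l {} + |
    awk -v limit="$limit" '
      $2 == "total" { next }                    # skip the summary row
      $1 > limit    { sub(/^ +/, ""); print }   # keep only oversized files
    ' |
    sort -n
}
```

&lt;p&gt;Running &lt;code&gt;oversized_phases plan 200&lt;/code&gt; prints one line per file that is a candidate for sub-phasing, and prints nothing when every file is within the limit.&lt;/p&gt;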
&lt;h2 id="parallel-execution-requirements"&gt;Parallel execution requirements&lt;/h2&gt;
&lt;p&gt;Parallel work is possible, but only if you make boundaries explicit.&lt;/p&gt;
&lt;p&gt;Requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No overlapping files between parallel phases.&lt;/li&gt;
&lt;li&gt;Explicit interfaces (types, APIs, data shapes) written down.&lt;/li&gt;
&lt;li&gt;A merge plan (who rebases, who resolves conflicts, how often you sync).&lt;/li&gt;
&lt;/ul&gt;
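&lt;p&gt;The first requirement can be checked mechanically if each phase keeps a list of the files it owns. A minimal sketch, assuming one path per line with no blank lines; the function name &lt;code&gt;phase_overlap&lt;/code&gt; and the idea of per-phase list files are hypothetical conventions, not something this workflow prescribes:&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch: print every path that appears in both phase file lists.
# Assumes one path per line; the list files are a hypothetical convention.
phase_overlap() {
  # Deduplicate each list on its own so a path repeated inside one list
  # is not mistaken for an overlap, then keep paths that occur twice.
  { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}
```

&lt;p&gt;For example, &lt;code&gt;phase_overlap plan/phase-2a.files plan/phase-2b.files&lt;/code&gt; (hypothetical file names) prints the shared paths. Empty output suggests the two phases are safe to run in parallel.&lt;/p&gt;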
&lt;p&gt;A simple pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Phase A defines interfaces and contracts.&lt;/li&gt;
&lt;li&gt;Phase B implements against those interfaces.&lt;/li&gt;
&lt;li&gt;Phase C adds tests and validation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="repository-hygiene"&gt;Repository hygiene&lt;/h2&gt;
&lt;p&gt;Decide whether artifacts are local scaffolding or part of the repo.&lt;/p&gt;
&lt;p&gt;Common default:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep &lt;code&gt;plan/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt;, &lt;code&gt;work-notes/&lt;/code&gt; local (gitignored).&lt;/li&gt;
&lt;/ul&gt;
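&lt;p&gt;For that default, a &lt;code&gt;.gitignore&lt;/code&gt; fragment along these lines is usually enough. This is a sketch that assumes the three directories sit at the repository root; dropping the leading slashes would match them at any depth instead:&lt;/p&gt;

```text
# Local LLM workflow scaffolding (not committed by default)
/plan/
/prompts/
/work-notes/
```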
&lt;p&gt;Commit them deliberately when they become:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Long-lived docs.&lt;/li&gt;
&lt;li&gt;Reusable templates.&lt;/li&gt;
&lt;li&gt;Onboarding material.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you commit them, consider moving to &lt;code&gt;docs/&lt;/code&gt; and editing for humans.&lt;/p&gt;
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;These checks help you catch &amp;ldquo;phases are too big&amp;rdquo; early:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Find phase files that are getting too large.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# (This uses line count as a blunt proxy.)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;find plan -type f -name &lt;span class="s1"&gt;&amp;#39;*.md&amp;#39;&lt;/span&gt; -maxdepth &lt;span class="m"&gt;2&lt;/span&gt; -print0 &lt;span class="p"&gt;|&lt;/span&gt; xargs -0 wc -l &lt;span class="p"&gt;|&lt;/span&gt; sort -n
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Find prompts that do not reference work notes.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;work-notes/&amp;#34;&lt;/span&gt; prompts &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Find phases that might overlap on files (manual review).&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Start by listing deliverables per phase in each prompt doc.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;^## Deliverables&amp;#34;&lt;/span&gt; -n prompts
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="gotchas"&gt;Gotchas&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Parallelization without clean boundaries just creates merge conflicts faster.&lt;/li&gt;
&lt;li&gt;If you don&amp;rsquo;t define interfaces early, later phases stall.&lt;/li&gt;
&lt;li&gt;If artifacts are committed, treat them like code: review, version, maintain.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/07-phase-documents-implementation-prompts/"&gt;Chapter 7: Large Projects with Phase Documents + Implementation Prompts&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>When Management Layers Become Latency</title><link>https://roygabriel.dev/blog/when-management-layers-become-latency/</link><pubDate>Sat, 24 Jan 2026 10:30:00 -0500</pubDate><guid>https://roygabriel.dev/blog/when-management-layers-become-latency/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Note on examples:&lt;/strong&gt; The scenarios below are &lt;strong&gt;anonymized composites&lt;/strong&gt;. This isn&amp;rsquo;t &amp;ldquo;management bad.&amp;rdquo;
Good management is an accelerator. The problem is when management becomes &lt;strong&gt;layers of translation&lt;/strong&gt; between reality and decisions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;In production systems, adding hops between a request and a response increases latency, failure modes, and debugging time.&lt;/p&gt;
&lt;p&gt;Organizations behave the same way.&lt;/p&gt;
&lt;p&gt;When engineering work flows through too many intermediary layers - tech leads, scrum masters, managers, senior managers, project managers, directors, senior directors, VPs, and beyond - the organization starts to exhibit the same symptoms as an over-proxied network:&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Note on examples:&lt;/strong&gt; The scenarios below are &lt;strong&gt;anonymized composites&lt;/strong&gt;. This isn&amp;rsquo;t &amp;ldquo;management bad.&amp;rdquo;
Good management is an accelerator. The problem is when management becomes &lt;strong&gt;layers of translation&lt;/strong&gt; between reality and decisions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;In production systems, adding hops between a request and a response increases latency, failure modes, and debugging time.&lt;/p&gt;
&lt;p&gt;Organizations behave the same way.&lt;/p&gt;
&lt;p&gt;When engineering work flows through too many intermediary layers - tech leads, scrum masters, managers, senior managers, project managers, directors, senior directors, VPs, and beyond - the organization starts to exhibit the same symptoms as an over-proxied network:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;long lead times&lt;/li&gt;
&lt;li&gt;lost context (&amp;ldquo;telephone game&amp;rdquo; requirements)&lt;/li&gt;
&lt;li&gt;local optimization (everyone looks busy; value doesn&amp;rsquo;t move)&lt;/li&gt;
&lt;li&gt;coordination overhead that scales faster than delivery&lt;/li&gt;
&lt;li&gt;engineers feeling like nothing they build reaches production&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The painful part is that the org can look &lt;strong&gt;healthy&lt;/strong&gt; on paper (status is green, roadmaps are full) while the product fails to meet real expectations.&lt;/p&gt;
&lt;p&gt;This article is about the mechanics behind that failure - and the replacement patterns that restore flow.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Layers create handoffs.&lt;/strong&gt; Handoffs create queues. Queues create lead time.&lt;/li&gt;
&lt;li&gt;More roles don&amp;rsquo;t automatically increase throughput; coordination cost can dominate (Brooks&amp;rsquo;s Law). [6]&lt;/li&gt;
&lt;li&gt;Fast flow requires &lt;strong&gt;end-to-end ownership&lt;/strong&gt; with minimal handoffs (stream-aligned teams). [3][4]&lt;/li&gt;
&lt;li&gt;Measure outcomes at the system level (DORA metrics), not &amp;ldquo;activity&amp;rdquo; (story points, number of meetings). [1]&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t turn metrics into targets (Goodhart&amp;rsquo;s Law). [7]&lt;/li&gt;
&lt;li&gt;Burnout often rises when delivery is painful and risky; improving delivery capability predicts lower burnout. [2][8]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#pattern-1-translation-layers-replace-direct-truth"&gt;Pattern 1: Translation layers replace direct truth&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-2-status-becomes-the-work"&gt;Pattern 2: Status becomes the work&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-3-more-people-is-treated-like-a-throughput-solution"&gt;Pattern 3: &amp;ldquo;More people&amp;rdquo; is treated like a throughput solution&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-4-projectization-and-temporary-teams"&gt;Pattern 4: Projectization and temporary teams&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-5-governance-by-meeting-instead-of-guardrail"&gt;Pattern 5: Governance by meeting instead of guardrail&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-6-metrics-as-targets"&gt;Pattern 6: Metrics as targets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-7-engineers-are-abstracted-away-from-production"&gt;Pattern 7: Engineers are abstracted away from production&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#replacement-patterns-that-work"&gt;Replacement patterns that work&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification-how-you-know-the-org-is-healing"&gt;Verification: how you know the org is healing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-practical-checklist"&gt;A practical checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-1-translation-layers-replace-direct-truth"&gt;Pattern 1: Translation layers replace direct truth&lt;/h2&gt;
&lt;h3 id="what-it-looks-like"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;A customer need or operational pain moves through a chain:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;customer -&amp;gt; product -&amp;gt; program -&amp;gt; project -&amp;gt; delivery manager -&amp;gt; engineering manager -&amp;gt; tech lead -&amp;gt; engineers&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;By the time it arrives at the team, it&amp;rsquo;s been translated multiple times and often loses:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the actual user story&lt;/li&gt;
&lt;li&gt;the constraints&lt;/li&gt;
&lt;li&gt;the real priority&lt;/li&gt;
&lt;li&gt;the &amp;ldquo;why&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-exists"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;Layering feels safe:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;fewer people &amp;ldquo;bother&amp;rdquo; engineers&lt;/li&gt;
&lt;li&gt;leaders get curated information&lt;/li&gt;
&lt;li&gt;decision makers see clean narratives&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-hidden-tax"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Misalignment becomes normal.&lt;/li&gt;
&lt;li&gt;Engineers build the wrong thing &lt;em&gt;efficiently&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Product expectations aren&amp;rsquo;t met, not because engineers can&amp;rsquo;t build - but because the input signal is degraded.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Shorten the feedback loop.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ensure teams have direct access to:
&lt;ul&gt;
&lt;li&gt;customer signals (support tickets, usage, interviews)&lt;/li&gt;
&lt;li&gt;operational signals (incidents, latency, error budgets)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Make the &amp;ldquo;why&amp;rdquo; non-optional: put it in the ticket, the PRD, and the kickoff.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;If a team can&amp;rsquo;t explain &amp;ldquo;why this exists,&amp;rdquo; it shouldn&amp;rsquo;t ship yet.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pattern-2-status-becomes-the-work"&gt;Pattern 2: Status becomes the work&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-1"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;Organizations that struggle to ship often compensate with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;more meetings&lt;/li&gt;
&lt;li&gt;more dashboards&lt;/li&gt;
&lt;li&gt;more decks&lt;/li&gt;
&lt;li&gt;more &amp;ldquo;alignment sessions&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The output looks like progress, but the production system doesn&amp;rsquo;t change.&lt;/p&gt;
&lt;h3 id="why-it-exists-1"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;When uncertainty is high, visibility is comforting.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-1"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Attention becomes scarce.&lt;/li&gt;
&lt;li&gt;Engineers fragment into &amp;ldquo;meeting responders.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Work becomes multi-tasked across too many initiatives (WIP explosion).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-1"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Reduce status overhead by making &lt;strong&gt;the system visible&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CI/CD dashboards&lt;/li&gt;
&lt;li&gt;production telemetry&lt;/li&gt;
&lt;li&gt;an engineering scorecard based on system outcomes (not activity)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;DORA&amp;rsquo;s metrics are widely used as system-level indicators for delivery performance: deployment frequency, lead time, change failure rate, and time to restore service. [1]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pattern-3-more-people-is-treated-like-a-throughput-solution"&gt;Pattern 3: &amp;ldquo;More people&amp;rdquo; is treated like a throughput solution&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-2"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;A late initiative triggers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;new managers&lt;/li&gt;
&lt;li&gt;new project managers&lt;/li&gt;
&lt;li&gt;new engineers&lt;/li&gt;
&lt;li&gt;more coordination rituals&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-exists-2"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s intuitive: more people should mean more output.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-2"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;Software delivery has a coordination component. Adding people increases communication paths, onboarding, and synchronization.&lt;/p&gt;
&lt;p&gt;Brooks&amp;rsquo;s Law captures this succinctly: adding manpower to a late software project can make it later. [6]&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-2"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Before adding headcount, reduce coordination load:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;clarify ownership&lt;/li&gt;
&lt;li&gt;shrink scope to a thin vertical slice&lt;/li&gt;
&lt;li&gt;eliminate handoffs&lt;/li&gt;
&lt;li&gt;stabilize requirements long enough to ship&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then scale with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;duplication (more teams owning similar streams)&lt;/li&gt;
&lt;li&gt;platform leverage (paved roads), not more meetings&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-4-projectization-and-temporary-teams"&gt;Pattern 4: Projectization and temporary teams&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-3"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;Engineers are repeatedly reorganized into short-lived &amp;ldquo;project teams,&amp;rdquo; and after delivery they are moved again.&lt;/p&gt;
&lt;h3 id="why-it-exists-3"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;Projects are easy to budget, track, and narrate.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-3"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;Temporary teams produce:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;fragile ownership&lt;/li&gt;
&lt;li&gt;weak operability&lt;/li&gt;
&lt;li&gt;&amp;ldquo;throw it over the wall&amp;rdquo; incentives&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fast flow requires teams that own outcomes end-to-end with minimal handoffs.&lt;/p&gt;
&lt;p&gt;Team Topologies describes &lt;strong&gt;stream-aligned teams&lt;/strong&gt; as owning a slice of value end-to-end with no handoffs. [3][4]&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-3"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Prefer &lt;strong&gt;stable teams&lt;/strong&gt; aligned to a value stream (product/service), with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;clear ownership&lt;/li&gt;
&lt;li&gt;operational responsibility (&amp;ldquo;you build it, you run it&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;direct feedback from users and production&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-5-governance-by-meeting-instead-of-guardrail"&gt;Pattern 5: Governance by meeting instead of guardrail&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-4"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;Instead of &amp;ldquo;how do we make safe delivery easy,&amp;rdquo; governance becomes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;approval steps&lt;/li&gt;
&lt;li&gt;committees&lt;/li&gt;
&lt;li&gt;sign-off chains&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-exists-4"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;Risk is real, and leaders want control.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-4"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;Humans are expensive control planes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;slow&lt;/li&gt;
&lt;li&gt;inconsistent&lt;/li&gt;
&lt;li&gt;difficult to audit at scale&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-4"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Convert rules into guardrails:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;policy-as-code&lt;/li&gt;
&lt;li&gt;templates&lt;/li&gt;
&lt;li&gt;paved paths&lt;/li&gt;
&lt;li&gt;automated checks in CI/CD&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is how you scale safety without scaling meetings.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pattern-6-metrics-as-targets"&gt;Pattern 6: Metrics as targets&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-5"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;Teams are pressured to hit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;story points&lt;/li&gt;
&lt;li&gt;&amp;ldquo;velocity&amp;rdquo;&lt;/li&gt;
&lt;li&gt;number of deployments&lt;/li&gt;
&lt;li&gt;&amp;ldquo;percent complete&amp;rdquo;&lt;/li&gt;
&lt;li&gt;tickets closed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then behavior adapts to the metric.&lt;/p&gt;
&lt;h3 id="why-it-exists-5"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;Leaders need a dashboard.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-5"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;When a measure becomes a target, it can stop being a good measure (Goodhart&amp;rsquo;s Law). [7]&lt;/p&gt;
&lt;p&gt;Examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;inflate points&lt;/li&gt;
&lt;li&gt;ship low-value changes to increase deploy count&lt;/li&gt;
&lt;li&gt;avoid hard work because it hurts &amp;ldquo;throughput&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-5"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Use metrics diagnostically at the system level (not as individual KPIs).&lt;/p&gt;
&lt;p&gt;If you adopt DORA metrics, use them to identify constraints and improve flow - not as quarterly targets for teams. [1][9]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pattern-7-engineers-are-abstracted-away-from-production"&gt;Pattern 7: Engineers are abstracted away from production&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-6"&gt;What it looks like&lt;/h3&gt;
&lt;p&gt;A team builds a system, but:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;another team deploys it&lt;/li&gt;
&lt;li&gt;another team runs it&lt;/li&gt;
&lt;li&gt;another team handles incidents&lt;/li&gt;
&lt;li&gt;another team owns the roadmap&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Engineers eventually conclude: &amp;ldquo;Nothing I build actually ships.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="why-it-exists-6"&gt;Why it exists&lt;/h3&gt;
&lt;p&gt;Specialization can be useful, but excessive separation breaks feedback loops.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-6"&gt;The hidden tax&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;teams don&amp;rsquo;t learn from production&lt;/li&gt;
&lt;li&gt;quality declines because consequences are indirect&lt;/li&gt;
&lt;li&gt;&amp;ldquo;deployment pain&amp;rdquo; rises: shipping becomes stressful and disruptive&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;DORA describes &lt;em&gt;deployment pain&lt;/em&gt; as fear/anxiety around deploying and links it to poorer delivery performance and culture. [8] DORA also notes continuous delivery predicts lower levels of burnout and reduces deployment pain. [2]&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-6"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Re-connect engineers to production:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;give teams operational ownership for what they build&lt;/li&gt;
&lt;li&gt;make telemetry and incident review part of engineering&lt;/li&gt;
&lt;li&gt;reduce fear by making releases small, frequent, and observable&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="replacement-patterns-that-work"&gt;Replacement patterns that work&lt;/h2&gt;
&lt;p&gt;These are the patterns I&amp;rsquo;ve seen consistently restore delivery flow without chaos.&lt;/p&gt;
&lt;h3 id="1-clarify-decision-rights-and-keep-them-close-to-the-work"&gt;1) Clarify decision rights (and keep them close to the work)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;One accountable owner per initiative (not &amp;ldquo;everyone is accountable&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;Engineers participate in tradeoff decisions early (scope, sequencing, risk)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="2-design-teams-for-flow-not-for-org-charts"&gt;2) Design teams for flow (not for org charts)&lt;/h3&gt;
&lt;p&gt;Organizations build systems that mirror their communication structures (Conway&amp;rsquo;s Law). [5]
If your org is siloed and layered, your architecture often becomes siloed and layered too.&lt;/p&gt;
&lt;p&gt;Design teams so the desired architecture is the &lt;em&gt;path of least resistance&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id="3-prefer-stream-aligned-teams--platform-leverage"&gt;3) Prefer stream-aligned teams + platform leverage&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Stream-aligned teams own outcomes end-to-end (no handoffs). [3][4]&lt;/li&gt;
&lt;li&gt;Platform teams reduce cognitive load by providing paved roads (auth, telemetry, CI/CD). [4]&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="4-replace-alignment-meetings-with-shared-artifacts"&gt;4) Replace &amp;ldquo;alignment meetings&amp;rdquo; with shared artifacts&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;one-page decision records&lt;/li&gt;
&lt;li&gt;clear &amp;ldquo;definition of done&amp;rdquo;&lt;/li&gt;
&lt;li&gt;demos that show working software in a real environment&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="5-turn-delivery-into-a-calm-repeatable-process"&gt;5) Turn delivery into a calm, repeatable process&lt;/h3&gt;
&lt;p&gt;When delivery is painful, people add layers to manage fear.
Fix the source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tests&lt;/li&gt;
&lt;li&gt;automation&lt;/li&gt;
&lt;li&gt;progressive delivery&lt;/li&gt;
&lt;li&gt;observable releases&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&amp;rsquo;s how you reduce burnout sustainably. [2][8]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="verification-how-you-know-the-org-is-healing"&gt;Verification: how you know the org is healing&lt;/h2&gt;
&lt;p&gt;Don&amp;rsquo;t rely on vibes. Use evidence.&lt;/p&gt;
&lt;h3 id="delivery-outcomes-system-level"&gt;Delivery outcomes (system-level)&lt;/h3&gt;
&lt;p&gt;Start with DORA metrics to track flow and stability. [1]&lt;/p&gt;
&lt;h3 id="product-outcomes"&gt;Product outcomes&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;adoption (are users actually using the thing?)&lt;/li&gt;
&lt;li&gt;retention (does usage persist?)&lt;/li&gt;
&lt;li&gt;reduced operational toil (do incidents go down?)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="team-outcomes"&gt;Team outcomes&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;fewer emergency escalations&lt;/li&gt;
&lt;li&gt;fewer &amp;ldquo;status-only&amp;rdquo; meetings&lt;/li&gt;
&lt;li&gt;improved on-call experience (lower deployment pain) [8]&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If lead time drops but burnout rises, you probably &amp;ldquo;optimized the dashboard&amp;rdquo; instead of the system (see Goodhart). [7]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-practical-checklist"&gt;A practical checklist&lt;/h2&gt;
&lt;p&gt;If your org feels &amp;ldquo;management-heavy,&amp;rdquo; try this in order:&lt;/p&gt;
&lt;h3 id="reduce-translation-layers"&gt;Reduce translation layers&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Put engineers in the room (or thread) with real users/operators at least weekly.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Require the &amp;ldquo;why&amp;rdquo; to be written and reviewed before build starts.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="reduce-handoffs"&gt;Reduce handoffs&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Map the value stream and count handoffs.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Remove one handoff per quarter; make it a goal.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="reduce-wip"&gt;Reduce WIP&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Limit concurrent initiatives per team.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Finish before starting.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="convert-meetings-into-guardrails"&gt;Convert meetings into guardrails&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Replace approvals with automated checks where possible.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Create paved paths so the safe way is the easy way.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="reconnect-teams-to-production"&gt;Reconnect teams to production&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Teams own what they ship.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tie incident learning back to design decisions.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Make releases smaller and more frequent.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] DORA - &amp;ldquo;DORA&amp;rsquo;s software delivery performance metrics (guide)&amp;rdquo;. &lt;a href="https://dora.dev/guides/dora-metrics/" target="_blank" rel="noopener noreferrer"&gt;https://dora.dev/guides/dora-metrics/&lt;/a&gt;
[2] DORA - &amp;ldquo;Capabilities: Continuous delivery&amp;rdquo; (notes relationship to burnout and deployment pain). &lt;a href="https://dora.dev/capabilities/continuous-delivery/" target="_blank" rel="noopener noreferrer"&gt;https://dora.dev/capabilities/continuous-delivery/&lt;/a&gt;
[3] Team Topologies - &amp;ldquo;Key Concepts&amp;rdquo; (stream-aligned teams; no handoffs). &lt;a href="https://teamtopologies.com/key-concepts" target="_blank" rel="noopener noreferrer"&gt;https://teamtopologies.com/key-concepts&lt;/a&gt;
[4] IT Revolution - &amp;ldquo;The Four Team Types from Team Topologies&amp;rdquo; (stream-aligned teams own end-to-end). &lt;a href="https://itrevolution.com/articles/four-team-types/" target="_blank" rel="noopener noreferrer"&gt;https://itrevolution.com/articles/four-team-types/&lt;/a&gt;
[5] Splunk - &amp;ldquo;Conway&amp;rsquo;s Law Explained&amp;rdquo; (systems mirror communication structures; includes original quote). &lt;a href="https://www.splunk.com/en_us/blog/learn/conways-law.html" target="_blank" rel="noopener noreferrer"&gt;https://www.splunk.com/en_us/blog/learn/conways-law.html&lt;/a&gt;
[6] Brooks&amp;rsquo;s Law (coined in &lt;em&gt;The Mythical Man-Month&lt;/em&gt;): &amp;ldquo;Adding manpower to a late software project makes it later.&amp;rdquo; &lt;a href="https://en.wikipedia.org/wiki/Brooks%27s_law" target="_blank" rel="noopener noreferrer"&gt;https://en.wikipedia.org/wiki/Brooks%27s_law&lt;/a&gt;
[7] CNA - &amp;ldquo;Goodhart&amp;rsquo;s Law&amp;rdquo; (when a measure becomes a target, it ceases to be a good measure). &lt;a href="https://www.cna.org/analyses/2022/09/goodharts-law" target="_blank" rel="noopener noreferrer"&gt;https://www.cna.org/analyses/2022/09/goodharts-law&lt;/a&gt;
[8] DORA - &amp;ldquo;Capabilities: Well-being&amp;rdquo; (deployment pain and its relationship to performance/culture). &lt;a href="https://dora.dev/capabilities/well-being/" target="_blank" rel="noopener noreferrer"&gt;https://dora.dev/capabilities/well-being/&lt;/a&gt;
[9] SEI (CMU) - &amp;ldquo;How to Misuse and Abuse DORA Metrics&amp;rdquo; (metric anti-patterns). &lt;a href="https://www.sei.cmu.edu/library/how-to-misuse-and-abuse-dora-metrics/" target="_blank" rel="noopener noreferrer"&gt;https://www.sei.cmu.edu/library/how-to-misuse-and-abuse-dora-metrics/&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 5: The Execution Loop: Review Discipline + Commit Discipline</title><link>https://roygabriel.dev/blog/llm-development-guide/05-execution-loop-commit-discipline/</link><pubDate>Fri, 23 Jan 2026 16:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/05-execution-loop-commit-discipline/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 5 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/04-work-notes-session-memory/"&gt;Chapter 4: Work Notes: External Memory + Running Log&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/06-scaling-the-workflow/"&gt;Chapter 6: Scaling the Workflow: Phases, Parallelism, Hygiene&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to run an LLM through implementation work in a way that stays reviewable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One logical unit at a time.&lt;/li&gt;
&lt;li&gt;Verification before claiming &amp;ldquo;done&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Atomic commits with clear intent.&lt;/li&gt;
&lt;li&gt;Notes updated as part of the loop.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Never skip: update notes, verify, propose commit, review.&lt;/li&gt;
&lt;li&gt;A &amp;ldquo;logical unit&amp;rdquo; is the smallest change that is independently reviewable.&lt;/li&gt;
&lt;li&gt;Treat LLM output like junior output: it needs review.&lt;/li&gt;
&lt;li&gt;Keep commits small to make rollback cheap.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-execution-loop"&gt;The execution loop&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-counts-as-a-logical-unit"&gt;What counts as a logical unit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#commit-discipline"&gt;Commit discipline&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#failure-modes"&gt;Failure modes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-execution-loop"&gt;The execution loop&lt;/h2&gt;
&lt;p&gt;This loop is intentionally repetitive:&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 5 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/04-work-notes-session-memory/"&gt;Chapter 4: Work Notes: External Memory + Running Log&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/06-scaling-the-workflow/"&gt;Chapter 6: Scaling the Workflow: Phases, Parallelism, Hygiene&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to run an LLM through implementation work in a way that stays reviewable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One logical unit at a time.&lt;/li&gt;
&lt;li&gt;Verification before claiming &amp;ldquo;done&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Atomic commits with clear intent.&lt;/li&gt;
&lt;li&gt;Notes updated as part of the loop.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Never skip: update notes, verify, propose commit, review.&lt;/li&gt;
&lt;li&gt;A &amp;ldquo;logical unit&amp;rdquo; is the smallest change that is independently reviewable.&lt;/li&gt;
&lt;li&gt;Treat LLM output like junior output: it needs review.&lt;/li&gt;
&lt;li&gt;Keep commits small to make rollback cheap.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-execution-loop"&gt;The execution loop&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-counts-as-a-logical-unit"&gt;What counts as a logical unit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#commit-discipline"&gt;Commit discipline&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#failure-modes"&gt;Failure modes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-execution-loop"&gt;The execution loop&lt;/h2&gt;
&lt;p&gt;This loop is intentionally repetitive:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Load prompt + work-notes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;-&amp;gt; implement one logical unit
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;-&amp;gt; update work-notes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;-&amp;gt; verify
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;-&amp;gt; propose commit
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;-&amp;gt; you review
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;-&amp;gt; commit
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;-&amp;gt; repeat
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you skip the middle steps, your &amp;ldquo;agent&amp;rdquo; becomes a vibes-based code generator.&lt;/p&gt;
&lt;h2 id="what-counts-as-a-logical-unit"&gt;What counts as a logical unit&lt;/h2&gt;
&lt;p&gt;A logical unit is a change that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Has a clear purpose.&lt;/li&gt;
&lt;li&gt;Can be verified.&lt;/li&gt;
&lt;li&gt;Can be reviewed in isolation.&lt;/li&gt;
&lt;li&gt;Does not leave the repo in a half-broken state.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add one Kubernetes template file.&lt;/li&gt;
&lt;li&gt;Add one Go type + its tests.&lt;/li&gt;
&lt;li&gt;Add one API endpoint handler.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Non-examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Implement everything for phase 2&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Half a refactor&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Quick fixes&amp;rdquo;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="commit-discipline"&gt;Commit discipline&lt;/h2&gt;
&lt;p&gt;Use a consistent commit format so you can scan history later.&lt;/p&gt;
&lt;p&gt;Example commit message:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;feat(helm): add service template for metrics-gateway
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Exposes port 9090 as ClusterIP
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Follows event-processor naming and labels
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- No ingress in this commit
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Refs: work-notes/phase-2b-core-resources.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;In prompts, require the model to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Summarize what changed.&lt;/li&gt;
&lt;li&gt;Explain why.&lt;/li&gt;
&lt;li&gt;Propose a message.&lt;/li&gt;
&lt;li&gt;Wait for approval.&lt;/li&gt;
&lt;/ul&gt;
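&lt;p&gt;One way to keep the format consistent is to check a proposed message before you approve it. The sketch below is a minimal illustration, not a standard this guide defines: the header pattern, the list of allowed types, and the rule that a &lt;code&gt;Refs:&lt;/code&gt; trailer must be present are assumptions modeled on the example message, and the name &lt;code&gt;check_commit_message&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import re

# Assumed header shape: type(scope): subject, modeled on the example message.
HEADER = re.compile(r"^(feat|fix|docs|refactor|test|chore)\([a-z0-9-]+\): \S.*$")


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lines = message.strip().splitlines()
    if not lines or not HEADER.match(lines[0]):
        problems.append("header must look like 'type(scope): subject'")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line must be blank")
    if not any(line.startswith("Refs: ") for line in lines):
        problems.append("missing 'Refs:' trailer pointing at the work notes")
    return problems


# The example message from this chapter.
example = """feat(helm): add service template for metrics-gateway

- Exposes port 9090 as ClusterIP
- Follows event-processor naming and labels
- No ingress in this commit

Refs: work-notes/phase-2b-core-resources.md"""
```

&lt;p&gt;Calling &lt;code&gt;check_commit_message(example)&lt;/code&gt; returns an empty list, while a message such as &lt;code&gt;quick fixes&lt;/code&gt; is reported for both its header and its missing trailer.&lt;/p&gt;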
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;Verification is context-specific, but the shape is consistent:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Build.&lt;/li&gt;
&lt;li&gt;Test.&lt;/li&gt;
&lt;li&gt;Lint.&lt;/li&gt;
&lt;li&gt;Render config.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Generic verification commands you can adapt:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Make sure you know what is staged and what is not.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git status --porcelain
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git diff
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git diff --staged
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Run the repo&amp;#39;s verification gates.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Replace these with your actual commands.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; ./...
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# After the commit, ensure you&amp;#39;re clean.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git status --porcelain
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Before committing, &lt;code&gt;git diff&lt;/code&gt; matches the logical unit scope.&lt;/li&gt;
&lt;li&gt;After committing, &lt;code&gt;git status --porcelain&lt;/code&gt; is empty.&lt;/li&gt;
&lt;li&gt;Tests and other gates exit 0.&lt;/li&gt;
&lt;/ul&gt;
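&lt;p&gt;If you want the gates to run as a single step, a small wrapper can stop at the first failure and report exit codes. This is a sketch under stated assumptions: the names &lt;code&gt;run_gates&lt;/code&gt; and &lt;code&gt;all_passed&lt;/code&gt; are hypothetical, and the placeholder gates only invoke the Python interpreter so the example runs anywhere. Replace them with the commands your repo actually uses.&lt;/p&gt;

```python
import subprocess
import sys


def run_gates(gates: list[list[str]]) -> list[tuple[str, int]]:
    """Run each gate in order and stop at the first non-zero exit code."""
    results = []
    for command in gates:
        completed = subprocess.run(command, capture_output=True, text=True)
        results.append((" ".join(command), completed.returncode))
        if completed.returncode != 0:
            break  # a failed gate means the logical unit is not done
    return results


def all_passed(results: list[tuple[str, int]], expected: int) -> bool:
    """True only when every expected gate ran and exited 0."""
    return len(results) == expected and all(code == 0 for _, code in results)


# Placeholder gates (assumption); swap in real ones such as ["go", "test", "./..."].
GATES = [
    [sys.executable, "-c", "print('build ok')"],
    [sys.executable, "-c", "print('tests ok')"],
]
```

&lt;p&gt;Stopping at the first failure keeps the report short: the last entry in the result list is always the gate that needs attention.&lt;/p&gt;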
&lt;h2 id="failure-modes"&gt;Failure modes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The model keeps &amp;ldquo;just one more change&amp;rdquo;-ing.
&lt;ul&gt;
&lt;li&gt;Fix: put explicit stop points in the prompt.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;You don&amp;rsquo;t verify.
&lt;ul&gt;
&lt;li&gt;Fix: add verification commands to the plan and prompt docs.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Commits include unrelated changes.
&lt;ul&gt;
&lt;li&gt;Fix: shrink the logical unit and split the work.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;You approve output you can&amp;rsquo;t review.
&lt;ul&gt;
&lt;li&gt;Fix: stop and do it manually or bring in a reviewer.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
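&lt;p&gt;The &amp;ldquo;unrelated changes&amp;rdquo; failure mode can be caught mechanically by comparing changed paths against the paths agreed for the logical unit. A minimal sketch follows; the function name &lt;code&gt;out_of_scope&lt;/code&gt;, the prefix-matching rule, and the sample paths are all hypothetical. In practice the changed paths could come from &lt;code&gt;git diff --name-only&lt;/code&gt;.&lt;/p&gt;

```python
def out_of_scope(changed_paths: list[str], allowed_prefixes: list[str]) -> list[str]:
    """Return changed paths that fall outside the prefixes agreed for this unit."""
    return [
        path
        for path in changed_paths
        if not any(path.startswith(prefix) for prefix in allowed_prefixes)
    ]


# Hypothetical inputs for illustration only.
changed = [
    "charts/metrics-gateway/templates/service.yaml",
    "work-notes/phase-2b-core-resources.md",
    "cmd/server/main.go",
]
allowed = ["charts/metrics-gateway/", "work-notes/"]
stray = out_of_scope(changed, allowed)
```

&lt;p&gt;Here &lt;code&gt;stray&lt;/code&gt; contains only &lt;code&gt;cmd/server/main.go&lt;/code&gt;, which is the signal to split that change into its own logical unit.&lt;/p&gt;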
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/06-scaling-the-workflow/"&gt;Chapter 6: Scaling the Workflow: Phases, Parallelism, Hygiene&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 4: Work Notes: External Memory + Running Log</title><link>https://roygabriel.dev/blog/llm-development-guide/04-work-notes-session-memory/</link><pubDate>Wed, 21 Jan 2026 14:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/04-work-notes-session-memory/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 4 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/03-prompt-documents/"&gt;Chapter 3: Prompt Documents: Prompts That Survive Sessions&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/05-execution-loop-commit-discipline/"&gt;Chapter 5: The Execution Loop: Review Discipline + Commit Discipline&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to keep multi-session work consistent by maintaining work notes that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Preserve the model&amp;rsquo;s working state outside the chat.&lt;/li&gt;
&lt;li&gt;Capture decisions and rationale for later review.&lt;/li&gt;
&lt;li&gt;Make handoffs possible.&lt;/li&gt;
&lt;li&gt;Provide a deterministic &amp;ldquo;resume&amp;rdquo; prompt.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;LLMs have no durable memory. If it&amp;rsquo;s not written down, it doesn&amp;rsquo;t exist next session.&lt;/li&gt;
&lt;li&gt;Mirror &lt;code&gt;work-notes/&lt;/code&gt; files to your phases exactly.&lt;/li&gt;
&lt;li&gt;Track: status, decisions, assumptions, open questions, session log, commits.&lt;/li&gt;
&lt;li&gt;In your prompts, require the model to update notes before moving forward.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#directory-alignment"&gt;Directory alignment&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-work-notes-template-you-can-paste"&gt;A work-notes template you can paste&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#session-start-and-session-end-prompts"&gt;Session start and session end prompts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gotchas"&gt;Gotchas&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="directory-alignment"&gt;Directory alignment&lt;/h2&gt;
&lt;p&gt;Keep your three directories aligned so you can load one phase without dragging unrelated context into the session:&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 4 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/03-prompt-documents/"&gt;Chapter 3: Prompt Documents: Prompts That Survive Sessions&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/05-execution-loop-commit-discipline/"&gt;Chapter 5: The Execution Loop: Review Discipline + Commit Discipline&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to keep multi-session work consistent by maintaining work notes that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Preserve the model&amp;rsquo;s working state outside the chat.&lt;/li&gt;
&lt;li&gt;Capture decisions and rationale for later review.&lt;/li&gt;
&lt;li&gt;Make handoffs possible.&lt;/li&gt;
&lt;li&gt;Provide a deterministic &amp;ldquo;resume&amp;rdquo; prompt.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;LLMs have no durable memory. If it&amp;rsquo;s not written down, it doesn&amp;rsquo;t exist next session.&lt;/li&gt;
&lt;li&gt;Mirror &lt;code&gt;work-notes/&lt;/code&gt; files to your phases exactly.&lt;/li&gt;
&lt;li&gt;Track: status, decisions, assumptions, open questions, session log, commits.&lt;/li&gt;
&lt;li&gt;In your prompts, require the model to update notes before moving forward.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#directory-alignment"&gt;Directory alignment&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-work-notes-template-you-can-paste"&gt;A work-notes template you can paste&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#session-start-and-session-end-prompts"&gt;Session start and session end prompts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gotchas"&gt;Gotchas&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="directory-alignment"&gt;Directory alignment&lt;/h2&gt;
&lt;p&gt;Keep your three directories aligned so you can load one phase without dragging unrelated context into the session:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;plan/phase-2a-scaffolding.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;prompts/phase-2a-scaffolding.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;work-notes/phase-2a-scaffolding.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This makes sessions resumable and makes parallel work possible.&lt;/p&gt;
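&lt;p&gt;Alignment is easy to check mechanically. The sketch below assumes the three directories are named &lt;code&gt;plan&lt;/code&gt;, &lt;code&gt;prompts&lt;/code&gt;, and &lt;code&gt;work-notes&lt;/code&gt; as in the listing above and that every phase is a single &lt;code&gt;.md&lt;/code&gt; file; the function names are hypothetical.&lt;/p&gt;

```python
from pathlib import Path

DIRECTORIES = ("plan", "prompts", "work-notes")


def list_phase_files(root: Path) -> dict[str, set[str]]:
    """Collect the Markdown file names found in each of the three directories."""
    return {d: {p.name for p in (root / d).glob("*.md")} for d in DIRECTORIES}


def misaligned_phases(names_by_dir: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each phase file name to the directories where it is missing."""
    every_name = set().union(*names_by_dir.values())
    gaps = {}
    for file_name in sorted(every_name):
        missing = [d for d in DIRECTORIES if file_name not in names_by_dir.get(d, set())]
        if missing:
            gaps[file_name] = missing
    return gaps
```

&lt;p&gt;Running &lt;code&gt;misaligned_phases(list_phase_files(Path(&amp;#34;.&amp;#34;)))&lt;/code&gt; from the repo root returns an empty dict when every phase has its plan, prompt, and work-notes file.&lt;/p&gt;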
&lt;h2 id="a-work-notes-template-you-can-paste"&gt;A work-notes template you can paste&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase &amp;lt;X&amp;gt; - &amp;lt;Phase Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Not started
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; In progress
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Blocked
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; Complete
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Decisions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Decision&amp;gt;: &amp;lt;Rationale&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Assumptions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Assumption&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Open questions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;Question&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Session log
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### &amp;lt;YYYY-MM-DD HH:MM&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; What changed:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Why:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Blockers:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Next:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Commits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;hash&amp;gt; - &amp;lt;message&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;You can keep it simple. The win is consistency.&lt;/p&gt;
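&lt;p&gt;Consistency is easier when the session-log entry is generated rather than typed. The helper below is a hedged sketch that renders one entry in the template&amp;rsquo;s layout; the function name &lt;code&gt;format_session_entry&lt;/code&gt; and the sample values are hypothetical.&lt;/p&gt;

```python
from datetime import datetime


def format_session_entry(when: datetime, changed: str, why: str,
                         blockers: str, next_step: str) -> str:
    """Render one session-log entry using the template's heading and bullets."""
    return "\n".join([
        f"### {when:%Y-%m-%d %H:%M}",
        f"- What changed: {changed}",
        f"- Why: {why}",
        f"- Blockers: {blockers}",
        f"- Next: {next_step}",
    ])


# Sample values for illustration only.
entry = format_session_entry(
    datetime(2026, 1, 21, 14, 0),
    changed="added service template",
    why="metrics-gateway needs a ClusterIP service",
    blockers="none",
    next_step="add deployment template",
)
```

&lt;p&gt;Append the returned text under the &lt;code&gt;## Session log&lt;/code&gt; heading of the phase&amp;rsquo;s work-notes file.&lt;/p&gt;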
&lt;h2 id="session-start-and-session-end-prompts"&gt;Session start and session end prompts&lt;/h2&gt;
&lt;h3 id="start-a-session"&gt;Start a session&lt;/h3&gt;
&lt;p&gt;Paste your phase prompt and current work notes, and tell the model to continue from the last session.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;I&amp;#39;m continuing work on Phase &amp;lt;X&amp;gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Prompt:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&amp;lt;paste prompts/phase-X.md&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Current state:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&amp;lt;paste work-notes/phase-X.md&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Please:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;1. Summarize where we are (3 to 4 sentences).
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2. List blockers and open questions.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;3. Confirm the next logical unit.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;4. Proceed with the next logical unit.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="end-a-session"&gt;End a session&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Before we stop:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;1. Update the session log with what we did and what is next.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2. Ensure decisions, assumptions, and open questions are current.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;3. Propose a commit message for any completed logical unit.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;4. Show the updated work-notes file.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;You can verify your notes are doing their job by forcing a cold start:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Start a new chat.&lt;/li&gt;
&lt;li&gt;Paste only the phase prompt and the work-notes file.&lt;/li&gt;
&lt;li&gt;See if you can resume without re-explaining anything.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Mechanical checks:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# List any empty work-notes files. Expect no output.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;find work-notes -maxdepth &lt;span class="m"&gt;2&lt;/span&gt; -type f -name &lt;span class="s1"&gt;&amp;#39;*.md&amp;#39;&lt;/span&gt; -empty -print
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Work notes have at least the core sections.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;^## (Status|Decisions|Assumptions|Open questions|Session log|Commits)&amp;#34;&lt;/span&gt; work-notes
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="gotchas"&gt;Gotchas&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Notes without rationale are not useful later.&lt;/li&gt;
&lt;li&gt;If you let the model continue without updating notes, the next session will drift.&lt;/li&gt;
&lt;li&gt;Avoid dumping raw logs with sensitive data. Sanitize first.&lt;/li&gt;
&lt;/ul&gt;
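&lt;p&gt;For the sanitizing step, a simple pattern-based pass can catch the most obvious leaks before log text goes into the notes. This is a rough sketch, not a complete secret scanner: the key names in the pattern and the &lt;code&gt;[REDACTED]&lt;/code&gt; marker are assumptions, and anything the pattern does not anticipate will pass through unchanged, so a manual read is still needed.&lt;/p&gt;

```python
import re

# Assumed key names; extend the alternation for the secrets your logs contain.
SECRET_ASSIGNMENT = re.compile(
    r"(?i)\b(password|token|api[_-]?key|secret)(\s*[=:]\s*)\S+"
)


def redact(text: str) -> str:
    """Replace the value after a known secret key with a fixed marker."""
    return SECRET_ASSIGNMENT.sub(r"\1\2[REDACTED]", text)
```

&lt;p&gt;For example, &lt;code&gt;redact(&amp;#34;token=abc123 user=roy&amp;#34;)&lt;/code&gt; keeps the key and the unrelated field but drops the token value.&lt;/p&gt;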
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/05-execution-loop-commit-discipline/"&gt;Chapter 5: The Execution Loop: Review Discipline + Commit Discipline&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 3: Prompt Documents: Prompts That Survive Sessions</title><link>https://roygabriel.dev/blog/llm-development-guide/03-prompt-documents/</link><pubDate>Mon, 19 Jan 2026 12:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/03-prompt-documents/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 3 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/02-planning-artifacts/"&gt;Chapter 2: Planning: Plan Artifacts, Constraints, Definition of Done&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/04-work-notes-session-memory/"&gt;Chapter 4: Work Notes: External Memory + Running Log&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to create prompt documents that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Encode intent precisely so you don&amp;rsquo;t re-explain yourself.&lt;/li&gt;
&lt;li&gt;Align to plan phases so scope stays tight.&lt;/li&gt;
&lt;li&gt;Include verification steps and explicit stop points.&lt;/li&gt;
&lt;li&gt;Tell the model how to update work notes and propose commits.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A prompt doc is an artifact, not a chat message.&lt;/li&gt;
&lt;li&gt;Use one prompt file per phase.&lt;/li&gt;
&lt;li&gt;Always include: role, context, task, constraints, deliverables, verification, session management, commit discipline.&lt;/li&gt;
&lt;li&gt;Put negative constraints in writing (&amp;ldquo;MUST NOT&amp;rdquo;).&lt;/li&gt;
&lt;li&gt;Keep prompts copy/pasteable and path-specific.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-prompt-docs-matter"&gt;Why prompt docs matter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#anatomy-of-a-good-prompt"&gt;Anatomy of a good prompt&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-template-you-can-copy"&gt;A template you can copy&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#failure-modes"&gt;Failure modes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-prompt-docs-matter"&gt;Why prompt docs matter&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re doing anything bigger than a one-off snippet, the prompt itself becomes part of the system.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 3 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/02-planning-artifacts/"&gt;Chapter 2: Planning: Plan Artifacts, Constraints, Definition of Done&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/04-work-notes-session-memory/"&gt;Chapter 4: Work Notes: External Memory + Running Log&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to create prompt documents that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Encode intent precisely so you don&amp;rsquo;t re-explain yourself.&lt;/li&gt;
&lt;li&gt;Align to plan phases so scope stays tight.&lt;/li&gt;
&lt;li&gt;Include verification steps and explicit stop points.&lt;/li&gt;
&lt;li&gt;Tell the model how to update work notes and propose commits.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A prompt doc is an artifact, not a chat message.&lt;/li&gt;
&lt;li&gt;Use one prompt file per phase.&lt;/li&gt;
&lt;li&gt;Always include: role, context, task, constraints, deliverables, verification, session management, commit discipline.&lt;/li&gt;
&lt;li&gt;Put negative constraints in writing (&amp;ldquo;MUST NOT&amp;rdquo;).&lt;/li&gt;
&lt;li&gt;Keep prompts copy/pasteable and path-specific.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-prompt-docs-matter"&gt;Why prompt docs matter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#anatomy-of-a-good-prompt"&gt;Anatomy of a good prompt&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-template-you-can-copy"&gt;A template you can copy&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#failure-modes"&gt;Failure modes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-prompt-docs-matter"&gt;Why prompt docs matter&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re doing anything bigger than a one-off snippet, the prompt itself becomes part of the system.&lt;/p&gt;
&lt;p&gt;Prompt docs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduce &amp;ldquo;prompt drift&amp;rdquo; across sessions.&lt;/li&gt;
&lt;li&gt;Make handoffs possible.&lt;/li&gt;
&lt;li&gt;Create an audit trail of what was asked.&lt;/li&gt;
&lt;li&gt;Force you to pin down deliverables and done-ness.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="anatomy-of-a-good-prompt"&gt;Anatomy of a good prompt&lt;/h2&gt;
&lt;p&gt;At minimum, include these sections:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Role: what expertise you&amp;rsquo;re invoking.&lt;/li&gt;
&lt;li&gt;Context: plan path, work-notes path, reference implementation paths.&lt;/li&gt;
&lt;li&gt;Task: what to do now.&lt;/li&gt;
&lt;li&gt;Constraints: what must and must not happen.&lt;/li&gt;
&lt;li&gt;Deliverables: exact files and outputs expected.&lt;/li&gt;
&lt;li&gt;Verification: commands and expected results.&lt;/li&gt;
&lt;li&gt;Session management: how to update work notes.&lt;/li&gt;
&lt;li&gt;Commit discipline: atomic commits, propose messages, wait for approval.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your prompt is missing verification and stop rules, you&amp;rsquo;re inviting &amp;ldquo;looks right&amp;rdquo; output.&lt;/p&gt;
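&lt;p&gt;A prompt doc can be checked for these sections before it is used. The sketch below assumes each section is a level-2 Markdown heading (&lt;code&gt;## Role&lt;/code&gt;, &lt;code&gt;## Context&lt;/code&gt;, and so on) and matches by prefix so a heading such as &lt;code&gt;## Constraints (follow exactly)&lt;/code&gt; still counts; the function name &lt;code&gt;missing_sections&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import re

# Section names taken from the list above.
REQUIRED_SECTIONS = (
    "Role", "Context", "Task", "Constraints", "Deliverables",
    "Verification", "Session management", "Commit discipline",
)


def missing_sections(prompt_text: str) -> list[str]:
    """Return required sections that have no matching level-2 heading."""
    headings = re.findall(r"^## (.+)$", prompt_text, flags=re.MULTILINE)
    return [
        name
        for name in REQUIRED_SECTIONS
        if not any(h.strip().startswith(name) for h in headings)
    ]
```

&lt;p&gt;An empty result means the skeleton is complete. It says nothing about whether the content of each section is good, so it complements review rather than replacing it.&lt;/p&gt;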
&lt;h2 id="a-template-you-can-copy"&gt;A template you can copy&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# Phase &amp;lt;X&amp;gt; - &amp;lt;Phase Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Role
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;You are a senior software engineer.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Context
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; Plan: plan/&amp;lt;phase&amp;gt;.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Work notes: work-notes/&amp;lt;phase&amp;gt;.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Reference implementations:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;path 1&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;path 2&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Task
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;Implement the next logical unit for this phase.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Constraints (follow exactly)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; MUST follow patterns in the reference implementations.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; MUST keep changes scoped to this phase.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; MUST include tests when applicable.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; MUST propose verification commands.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; MUST NOT add new dependencies unless explicitly approved.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Deliverables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;1.&lt;/span&gt; &amp;lt;file path&amp;gt; - &amp;lt;what it contains&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;2.&lt;/span&gt; &amp;lt;file path&amp;gt; - &amp;lt;what it contains&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Session management
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;As you work:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Update work-notes/&amp;lt;phase&amp;gt;.md:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;-&lt;/span&gt; Decisions (with rationale)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;-&lt;/span&gt; Assumptions
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;-&lt;/span&gt; Open questions
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;-&lt;/span&gt; Session log entry
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; After each logical unit, pause and show the updated notes section.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Verification
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;After implementing the logical unit, run or propose:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;command&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Expected: &amp;lt;exit 0 / output contains X&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Commit discipline
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;After verification:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;1.&lt;/span&gt; Summarize what changed and why.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;2.&lt;/span&gt; Propose a conventional commit message.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;3.&lt;/span&gt; Wait for approval before continuing.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Refs: work-notes/&amp;lt;phase&amp;gt;.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use real paths.&lt;/li&gt;
&lt;li&gt;Put constraints in a dedicated section.&lt;/li&gt;
&lt;li&gt;Repeat the most important constraints near the end.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;You can verify prompt docs are usable by checking two things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You can paste the entire file verbatim into a new session.&lt;/li&gt;
&lt;li&gt;A new session produces the same behavior because paths and constraints are explicit.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Concrete checks:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Prompts exist and are non-empty.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;find prompts -type f -name &lt;span class="s1"&gt;&amp;#39;*.md&amp;#39;&lt;/span&gt; -maxdepth &lt;span class="m"&gt;2&lt;/span&gt; -print -exec &lt;span class="nb"&gt;test&lt;/span&gt; -s &lt;span class="o"&gt;{}&lt;/span&gt; &lt;span class="se"&gt;\;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Prompts mention work-notes paths.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;rg -n &lt;span class="s2"&gt;&amp;#34;work-notes/&amp;#34;&lt;/span&gt; prompts
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each prompt file is non-empty.&lt;/li&gt;
&lt;li&gt;Each prompt references a work-notes file.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="failure-modes"&gt;Failure modes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Prompts that say &amp;ldquo;use the config file&amp;rdquo; without a path.&lt;/li&gt;
&lt;li&gt;Constraints buried in prose instead of a dedicated section.&lt;/li&gt;
&lt;li&gt;Prompts that do not mention verification.&lt;/li&gt;
&lt;li&gt;Prompts that do not tell the model to stop after a logical unit.&lt;/li&gt;
&lt;/ul&gt;
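&lt;p&gt;Some of these failure modes can be caught mechanically. The sketch below is one possible lint, not part of the original workflow: the function names &lt;code&gt;lint_prompt&lt;/code&gt; and &lt;code&gt;lint_all_prompts&lt;/code&gt; are hypothetical, and the patterns it greps for (a heading starting with &lt;code&gt;## Constraints&lt;/code&gt;, a heading starting with &lt;code&gt;## Verification&lt;/code&gt;, and the string &lt;code&gt;work-notes/&lt;/code&gt;) are assumptions based on the template above.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: report which expected sections a prompt doc is missing.
# The patterns are assumptions taken from the template in this chapter.
lint_prompt() {
  file=$1
  missing=0
  grep -q '^## Constraints' "$file" || { echo "$file: no dedicated Constraints section"; missing=1; }
  grep -q '^## Verification' "$file" || { echo "$file: no Verification section"; missing=1; }
  grep -q 'work-notes/' "$file" || { echo "$file: no work-notes path"; missing=1; }
  return "$missing"
}

# Lint every prompt doc directly under prompts/ and fail if any doc fails.
lint_all_prompts() {
  rc=0
  for f in prompts/*.md; do
    [ -f "$f" ] || continue
    lint_prompt "$f" || rc=1
  done
  return "$rc"
}
```

&lt;p&gt;Run &lt;code&gt;lint_all_prompts&lt;/code&gt; from the repo root after sourcing the functions. It cannot tell whether a prompt instructs the model to stop after a logical unit, so that check still needs a human read.&lt;/p&gt;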
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/04-work-notes-session-memory/"&gt;Chapter 4: Work Notes: External Memory + Running Log&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 2: Planning: Plan Artifacts, Constraints, Definition of Done</title><link>https://roygabriel.dev/blog/llm-development-guide/02-planning-artifacts/</link><pubDate>Sat, 17 Jan 2026 11:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/02-planning-artifacts/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 2 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/01-practical-workflow-day-2/"&gt;Chapter 1: A Practical Workflow for LLM-Assisted Development That Doesn&amp;rsquo;t Collapse After Day 2&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/03-prompt-documents/"&gt;Chapter 3: Prompt Documents: Prompts That Survive Sessions&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to write a plan artifact that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Forces clarity on scope, constraints, and references.&lt;/li&gt;
&lt;li&gt;Produces verification steps (not just a task list).&lt;/li&gt;
&lt;li&gt;Is sized so an LLM can execute it phase-by-phase without drifting.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A plan is a shared source of truth between you and the model.&lt;/li&gt;
&lt;li&gt;Keep plans at the &amp;ldquo;what&amp;rdquo; level; keep &amp;ldquo;how&amp;rdquo; in prompt docs.&lt;/li&gt;
&lt;li&gt;Every phase needs verification and a definition of done.&lt;/li&gt;
&lt;li&gt;If a plan file would exceed ~200 lines, split it.&lt;/li&gt;
&lt;li&gt;Always point to reference implementations by path.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-belongs-in-a-plan-and-what-doesnt"&gt;What belongs in a plan (and what doesn&amp;rsquo;t)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-plan-template-you-can-paste"&gt;A plan template you can paste&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sizing-rules"&gt;Sizing rules&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification-and-definition-of-done"&gt;Verification and definition of done&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gotchas"&gt;Gotchas&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-belongs-in-a-plan-and-what-doesnt"&gt;What belongs in a plan (and what doesn&amp;rsquo;t)&lt;/h2&gt;
&lt;p&gt;Plans work when they are explicit and boring.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 2 of 16&lt;/p&gt;
&lt;p&gt;Previous: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/01-practical-workflow-day-2/"&gt;Chapter 1: A Practical Workflow for LLM-Assisted Development That Doesn&amp;rsquo;t Collapse After Day 2&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/03-prompt-documents/"&gt;Chapter 3: Prompt Documents: Prompts That Survive Sessions&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to write a plan artifact that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Forces clarity on scope, constraints, and references.&lt;/li&gt;
&lt;li&gt;Produces verification steps (not just a task list).&lt;/li&gt;
&lt;li&gt;Is sized so an LLM can execute it phase-by-phase without drifting.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A plan is a shared source of truth between you and the model.&lt;/li&gt;
&lt;li&gt;Keep plans at the &amp;ldquo;what&amp;rdquo; level; keep &amp;ldquo;how&amp;rdquo; in prompt docs.&lt;/li&gt;
&lt;li&gt;Every phase needs verification and a definition of done.&lt;/li&gt;
&lt;li&gt;If a plan file would exceed ~200 lines, split it.&lt;/li&gt;
&lt;li&gt;Always point to reference implementations by path.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-belongs-in-a-plan-and-what-doesnt"&gt;What belongs in a plan (and what doesn&amp;rsquo;t)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-plan-template-you-can-paste"&gt;A plan template you can paste&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#sizing-rules"&gt;Sizing rules&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification-and-definition-of-done"&gt;Verification and definition of done&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#gotchas"&gt;Gotchas&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-belongs-in-a-plan-and-what-doesnt"&gt;What belongs in a plan (and what doesn&amp;rsquo;t)&lt;/h2&gt;
&lt;p&gt;Plans work when they are explicit and boring.&lt;/p&gt;
&lt;p&gt;Include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Goals and non-goals.&lt;/li&gt;
&lt;li&gt;Constraints and invariants.&lt;/li&gt;
&lt;li&gt;Reference implementations (by path).&lt;/li&gt;
&lt;li&gt;Phases in dependency order.&lt;/li&gt;
&lt;li&gt;Verification for each phase.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Avoid:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Full code blocks.&lt;/li&gt;
&lt;li&gt;Deep implementation detail.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Make it better&amp;rdquo; language.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you want the LLM to do a thing consistently across sessions, you need the thing written down.&lt;/p&gt;
&lt;h2 id="a-plan-template-you-can-paste"&gt;A plan template you can paste&lt;/h2&gt;
&lt;p&gt;For larger work, create one plan file per phase; for smaller work, a single plan file with phase sections (as in this template) is enough.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;# &amp;lt;Project&amp;gt; Plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gh"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Overview
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&amp;lt;1 to 2 sentences about what we are building&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Goals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Goal 1&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Goal 2&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Non-goals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Explicitly out of scope&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Constraints
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Must follow reference style X&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Must not add dependencies&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Must keep backward compatibility&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## References
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Path to reference implementation 1&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Path to reference implementation 2&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Phase 1: &amp;lt;Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;Task 1&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;Task 2&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Verification:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Command&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Expected: &amp;lt;Exit 0 / output contains X&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Phase 2: &amp;lt;Name&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;Task 1&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Verification:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Command&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Expected: &amp;lt;...&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Definition of done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;All phases verified&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;Tests pass&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;Docs updated as needed&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;- [ ]&lt;/span&gt; &amp;lt;No TODOs left behind&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;## Risks / open questions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;&lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Open question 1&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; &amp;lt;Risk 1&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="sizing-rules"&gt;Sizing rules&lt;/h2&gt;
&lt;p&gt;Size the plan so the LLM can execute it without mixing unrelated changes.&lt;/p&gt;
&lt;p&gt;Use these rules of thumb:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Small (hours to 1 to 2 days): one &lt;code&gt;PLAN.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Medium (1 to 2 weeks): one &lt;code&gt;PLAN.md&lt;/code&gt; with explicit phases.&lt;/li&gt;
&lt;li&gt;Large (multi-week): &lt;code&gt;plan/phase-1a-...md&lt;/code&gt;, &lt;code&gt;plan/phase-1b-...md&lt;/code&gt;, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When in doubt:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Split by file ownership (phases should avoid editing the same files).&lt;/li&gt;
&lt;li&gt;Split by interface boundaries (one phase defines types/contracts; later phases implement).&lt;/li&gt;
&lt;/ul&gt;
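&lt;p&gt;The ~200-line limit from the TL;DR can be checked mechanically. A minimal sketch, assuming plan files are &lt;code&gt;.md&lt;/code&gt; files in a single directory; the function name &lt;code&gt;oversized_plans&lt;/code&gt; is hypothetical, and the threshold is a rule of thumb rather than a hard limit:&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: print plan files that exceed a line budget.
# The default budget of 200 comes from the TL;DR; adjust it to taste.
oversized_plans() {
  dir=$1
  limit=${2:-200}
  for f in "$dir"/*.md; do
    [ -f "$f" ] || continue
    # awk prints the number of lines read; 0 for an empty file.
    lines=$(awk 'END { print NR }' "$f")
    if [ "$lines" -gt "$limit" ]; then
      echo "$f: $lines lines (limit $limit), consider splitting"
    fi
  done
}
```

&lt;p&gt;For example, &lt;code&gt;oversized_plans plan&lt;/code&gt; prints nothing when every plan file is within the budget.&lt;/p&gt;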
&lt;h2 id="verification-and-definition-of-done"&gt;Verification and definition of done&lt;/h2&gt;
&lt;p&gt;Make verification explicit in the plan so you don&amp;rsquo;t have to negotiate it mid-session.&lt;/p&gt;
&lt;p&gt;Bad:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Add tests&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Better:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Add unit tests for &lt;code&gt;Foo&lt;/code&gt; and run &lt;code&gt;go test ./...&lt;/code&gt; (expected: exit 0).&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If your phase can&amp;rsquo;t be verified, it probably isn&amp;rsquo;t a phase yet.&lt;/p&gt;
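&lt;p&gt;One way to enforce this is to check that every &lt;code&gt;## Phase&lt;/code&gt; heading in a plan file is followed by a &lt;code&gt;Verification:&lt;/code&gt; line before the next heading. A minimal sketch with awk, assuming the heading and label formats from the template above; the function name &lt;code&gt;phases_without_verification&lt;/code&gt; is hypothetical:&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: print each "## Phase ..." heading that has no
# "Verification:" line before the next "## " heading or the end of the file.
phases_without_verification() {
  awk '
    # Print the phase seen so far if it never got a Verification line.
    function report() { if (phase != "") { if (verified == 0) print phase } }
    /^## /            { report(); phase = ""; verified = 0 }
    /^## Phase /      { phase = $0 }
    /^Verification:/  { verified = 1 }
    END               { report() }
  ' "$1"
}
```

&lt;p&gt;An empty result for a plan file means every phase in it names at least one verification step. It does not judge whether the step is meaningful.&lt;/p&gt;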
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;If you follow the template above, you should be able to run something like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Example: lint and test gates.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Replace with your repo&amp;#39;s actual commands.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; ./...
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;git diff --stat
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;go test&lt;/code&gt; exits 0.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;git diff --stat&lt;/code&gt; shows only the files you intended to touch in this phase.&lt;/li&gt;
&lt;/ul&gt;
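&lt;p&gt;The second check can be partly automated by comparing changed paths against the paths a phase is allowed to touch. The sketch below assumes a hypothetical allow-list of path prefixes passed as arguments; &lt;code&gt;out_of_scope&lt;/code&gt; is an illustrative name, and the prefixes in the usage comment are placeholders. It relies on &lt;code&gt;git diff --name-only&lt;/code&gt;, which prints only the names of changed files.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: read file paths on stdin and print those that do not
# start with any of the allowed prefixes given as arguments.
out_of_scope() {
  while IFS= read -r path; do
    allowed=no
    for prefix in "$@"; do
      case $path in
        "$prefix"*) allowed=yes ;;
      esac
    done
    [ "$allowed" = yes ] || echo "$path"
  done
}

# Usage (prefixes are placeholders for the files this phase owns):
#   git diff --name-only | out_of_scope internal/foo/ work-notes/
```

&lt;p&gt;Any path it prints is a change outside the phase scope and a candidate for its own phase or a revert.&lt;/p&gt;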
&lt;h2 id="gotchas"&gt;Gotchas&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Plans that mix &amp;ldquo;what&amp;rdquo; and &amp;ldquo;how&amp;rdquo; become unreadable quickly.&lt;/li&gt;
&lt;li&gt;If you don&amp;rsquo;t write down constraints, the LLM will invent defaults.&lt;/li&gt;
&lt;li&gt;A &amp;ldquo;phase&amp;rdquo; that touches 30 files is usually multiple phases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/03-prompt-documents/"&gt;Chapter 3: Prompt Documents: Prompts That Survive Sessions&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Chapter 1: A Practical Workflow for LLM-Assisted Development That Doesn't Collapse After Day 2</title><link>https://roygabriel.dev/blog/llm-development-guide/01-practical-workflow-day-2/</link><pubDate>Thu, 15 Jan 2026 09:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/llm-development-guide/01-practical-workflow-day-2/</guid><description>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 1 of 16&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/02-planning-artifacts/"&gt;Chapter 2: Planning: Plan Artifacts, Constraints, Definition of Done&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to take a real development task and run an LLM through a repeatable loop that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Survives breaks and multi-day work.&lt;/li&gt;
&lt;li&gt;Produces output you can actually review.&lt;/li&gt;
&lt;li&gt;Includes verification steps, not just code.&lt;/li&gt;
&lt;li&gt;Creates a paper trail of decisions and assumptions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat the LLM like a senior engineer who can execute quickly, but has no durable memory.&lt;/li&gt;
&lt;li&gt;Externalize memory into three artifacts: &lt;code&gt;plan/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt;, and &lt;code&gt;work-notes/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For large projects, add phase specification docs and phase implementation prompt docs (see Chapter 7).&lt;/li&gt;
&lt;li&gt;Execute in small logical units, with verification and atomic commits.&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re fighting output quality, upgrade the model or shrink the scope.&lt;/li&gt;
&lt;li&gt;Never paste secrets, PII, or production data.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="trust-contract-read-this-before-you-paste-anything"&gt;Trust contract (read this before you paste anything)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Security: do not paste secrets, tokens, customer data, or anything you would not publish in a public repo.&lt;/li&gt;
&lt;li&gt;Staleness: model names, pricing, and vendor policies change frequently. Treat examples as illustrative as of 2026-02-14.&lt;/li&gt;
&lt;li&gt;Prereqs: you can run tests, review diffs, and explain the change in a code review.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-most-llm-assisted-development-fails"&gt;Why most LLM-assisted development fails&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-workflow"&gt;The workflow&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#quick-start-copypaste-kit"&gt;Quick start: copy/paste kit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#worked-example-helm-chart-from-a-reference-chart"&gt;Worked example: Helm chart from a reference chart&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#failure-modes"&gt;Failure modes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-most-llm-assisted-development-fails"&gt;Why most LLM-assisted development fails&lt;/h2&gt;
&lt;p&gt;Most failures are workflow failures, not &amp;ldquo;prompting&amp;rdquo; failures:&lt;/p&gt;</description><content:encoded>&lt;p&gt;Series: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/"&gt;LLM Development Guide&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;Chapter 1 of 16&lt;/p&gt;
&lt;p&gt;Next: &lt;a href="https://roygabriel.dev/blog/llm-development-guide/02-planning-artifacts/"&gt;Chapter 2: Planning: Plan Artifacts, Constraints, Definition of Done&lt;/a&gt;
&lt;/p&gt;
&lt;h2 id="what-youll-be-able-to-do"&gt;What you&amp;rsquo;ll be able to do&lt;/h2&gt;
&lt;p&gt;You&amp;rsquo;ll be able to take a real development task and run an LLM through a repeatable loop that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Survives breaks and multi-day work.&lt;/li&gt;
&lt;li&gt;Produces output you can actually review.&lt;/li&gt;
&lt;li&gt;Includes verification steps, not just code.&lt;/li&gt;
&lt;li&gt;Creates a paper trail of decisions and assumptions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat the LLM like a senior engineer who can execute quickly, but has no durable memory.&lt;/li&gt;
&lt;li&gt;Externalize memory into three artifacts: &lt;code&gt;plan/&lt;/code&gt;, &lt;code&gt;prompts/&lt;/code&gt;, and &lt;code&gt;work-notes/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For large projects, add phase specification docs and phase implementation prompt docs (see Chapter 7).&lt;/li&gt;
&lt;li&gt;Execute in small logical units, with verification and atomic commits.&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re fighting output quality, upgrade the model or shrink the scope.&lt;/li&gt;
&lt;li&gt;Never paste secrets, PII, or production data.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="trust-contract-read-this-before-you-paste-anything"&gt;Trust contract (read this before you paste anything)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Security: do not paste secrets, tokens, customer data, or anything you would not publish in a public repo.&lt;/li&gt;
&lt;li&gt;Staleness: model names, pricing, and vendor policies change frequently. Treat examples as illustrative as of 2026-02-14.&lt;/li&gt;
&lt;li&gt;Prereqs: you can run tests, review diffs, and explain the change in a code review.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="table-of-contents"&gt;Table of contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-most-llm-assisted-development-fails"&gt;Why most LLM-assisted development fails&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-workflow"&gt;The workflow&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#quick-start-copypaste-kit"&gt;Quick start: copy/paste kit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#worked-example-helm-chart-from-a-reference-chart"&gt;Worked example: Helm chart from a reference chart&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification"&gt;Verification&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#failure-modes"&gt;Failure modes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-most-llm-assisted-development-fails"&gt;Why most LLM-assisted development fails&lt;/h2&gt;
&lt;p&gt;Most failures are workflow failures, not &amp;ldquo;prompting&amp;rdquo; failures:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You jump straight to implementation without a plan.&lt;/li&gt;
&lt;li&gt;You don&amp;rsquo;t provide reference implementations, so you get generic output.&lt;/li&gt;
&lt;li&gt;You lose context across sessions.&lt;/li&gt;
&lt;li&gt;You don&amp;rsquo;t verify output.&lt;/li&gt;
&lt;li&gt;You batch changes into giant commits that are hard to review or revert.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-workflow"&gt;The workflow&lt;/h2&gt;
&lt;p&gt;This is the smallest loop I&amp;rsquo;ve found that stays stable after day 2:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Plan -&amp;gt; Prompt docs -&amp;gt; Work notes -&amp;gt; Execute -&amp;gt; Verify -&amp;gt; Commit
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The artifacts are simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;plan/&lt;/code&gt;: what we&amp;rsquo;re doing and how we&amp;rsquo;ll know it&amp;rsquo;s done.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;prompts/&lt;/code&gt;: the reusable prompts aligned to phases.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;work-notes/&lt;/code&gt;: state, decisions, assumptions, open questions, and a running session log.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When work scales to multi-week delivery, promote this into explicit phase specification docs plus phase implementation prompt files so scope and verification stay deterministic across sessions.&lt;/p&gt;
&lt;h2 id="quick-start-copypaste-kit"&gt;Quick start: copy/paste kit&lt;/h2&gt;
&lt;p&gt;This is intentionally minimal. It&amp;rsquo;s enough to make sessions resumable.&lt;/p&gt;
&lt;h3 id="1-create-the-artifact-directories"&gt;1) Create the artifact directories&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mkdir -p plan prompts work-notes
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="2-create-a-minimal-plan"&gt;2) Create a minimal plan&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; plan/phase-1.md &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;MD&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;# Phase 1: Plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Overview
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;&amp;lt;One sentence: what we are building&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Goals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- &amp;lt;Goal 1&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- &amp;lt;Goal 2&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Constraints
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- &amp;lt;Constraint 1&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- &amp;lt;Constraint 2&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Definition of done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- [ ] &amp;lt;Verification command + expected outcome&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- [ ] &amp;lt;Verification command + expected outcome&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Out of scope
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- &amp;lt;Thing we will not do in this phase&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;MD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="3-create-a-phase-prompt-doc"&gt;3) Create a phase prompt doc&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; prompts/phase-1.md &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;MD&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;# Phase 1 - Execution Prompt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Role
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;You are a senior software engineer.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Context
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Plan: plan/phase-1.md
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Work notes: work-notes/phase-1.md
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Reference implementation(s): &amp;lt;paths&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Task
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;Implement the smallest logical unit that moves this phase forward.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Constraints (follow exactly)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- MUST follow patterns in the reference implementation.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- MUST propose verification commands.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- MUST NOT change files outside this phase scope.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Session management
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;As you work, update work-notes/phase-1.md:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Decisions (with rationale)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Assumptions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Open questions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- Session log entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Commit discipline
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;After each logical unit:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;1. Stop and summarize what changed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;2. Propose a commit message.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;3. Wait for approval.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;MD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="4-create-work-notes"&gt;4) Create work notes&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat &amp;gt; work-notes/phase-1.md &lt;span class="s"&gt;&amp;lt;&amp;lt;&amp;#39;MD&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;# Phase 1 - Work Notes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- [ ] Not started
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- [ ] In progress
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- [ ] Blocked
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;- [ ] Complete
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Decisions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Assumptions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Open questions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Session log
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;## Commits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="s"&gt;MD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="worked-example-helm-chart-from-a-reference-chart"&gt;Worked example: Helm chart from a reference chart&lt;/h2&gt;
&lt;p&gt;This example is about correctness and maintainability, not &amp;ldquo;Helm tricks&amp;rdquo;.&lt;/p&gt;
&lt;h3 id="scenario"&gt;Scenario&lt;/h3&gt;
&lt;p&gt;Goal: create a new chart (for example, &lt;code&gt;metrics-gateway&lt;/code&gt;) by following a reference chart (for example, &lt;code&gt;event-processor&lt;/code&gt;) that already works in your environment.&lt;/p&gt;
&lt;p&gt;The important part is the inputs you give the model. Don&amp;rsquo;t describe the reference chart. Paste it.&lt;/p&gt;
&lt;h3 id="reference-inputs-what-to-paste"&gt;Reference inputs (what to paste)&lt;/h3&gt;
&lt;p&gt;Run these commands in your repo and paste their output into your planning prompt:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;tree charts/event-processor/
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sed -n &lt;span class="s1"&gt;&amp;#39;1,200p&amp;#39;&lt;/span&gt; charts/event-processor/Chart.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sed -n &lt;span class="s1"&gt;&amp;#39;1,200p&amp;#39;&lt;/span&gt; charts/event-processor/values.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sed -n &lt;span class="s1"&gt;&amp;#39;1,200p&amp;#39;&lt;/span&gt; charts/event-processor/templates/_helpers.tpl
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="plan-prompt-high-signal"&gt;Plan prompt (high-signal)&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;I want to create a new Helm chart for a service called `metrics-gateway`.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Reference implementation: charts/event-processor/ (this is our standard).
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;The new chart MUST follow the same structure and conventions.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Here are the reference inputs:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- tree output: ...
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Chart.yaml: ...
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- values.yaml: ...
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- templates/_helpers.tpl: ...
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Please:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Analyze the reference chart patterns.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Produce a phased plan with verification steps.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;- Call out any open questions you need answered (ports, probes, resources).
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="execution-prompt-phase-aligned"&gt;Execution prompt (phase-aligned)&lt;/h3&gt;
&lt;p&gt;Once you have the plan, generate prompt docs aligned to phases (scaffold, core templates, env overrides, validation). Each prompt should:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Name the deliverables.&lt;/li&gt;
&lt;li&gt;Repeat constraints.&lt;/li&gt;
&lt;li&gt;Include &amp;ldquo;update work notes&amp;rdquo; instructions.&lt;/li&gt;
&lt;li&gt;Include verification commands.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="what-done-looks-like"&gt;What &amp;ldquo;done&amp;rdquo; looks like&lt;/h3&gt;
&lt;p&gt;A good end state is boring:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The new chart is structurally identical to the reference chart.&lt;/li&gt;
&lt;li&gt;The values structure matches (so operators don&amp;rsquo;t re-learn config surfaces).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;helm lint&lt;/code&gt; and &lt;code&gt;helm template&lt;/code&gt; succeed.&lt;/li&gt;
&lt;li&gt;Changes are split into reviewable commits.&lt;/li&gt;
&lt;/ul&gt;
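&lt;p&gt;One rough way to check the &amp;ldquo;structurally identical&amp;rdquo; point is to compare the file layout of the two charts. The sketch below assumes both charts live under &lt;code&gt;charts/&lt;/code&gt;; &lt;code&gt;chart_layout&lt;/code&gt; and &lt;code&gt;same_layout&lt;/code&gt; are hypothetical helper names, not Helm commands, and matching file paths say nothing about whether the templates themselves follow the reference conventions.&lt;/p&gt;

```shell
# Sketch: check that a new chart has the same file layout as the reference chart.
# chart_layout and same_layout are hypothetical helpers, not part of Helm.

# Print the sorted list of file paths under a chart directory, relative to it.
chart_layout() {
  (cd "$1" || exit 1; find . -type f | sort)
}

# Return 0 only when both directories exist and contain the same relative file paths.
same_layout() {
  [ -d "$1" ] || return 1
  [ -d "$2" ] || return 1
  [ "$(chart_layout "$1")" = "$(chart_layout "$2")" ]
}

# Example usage with the chart names from this post (paths are assumptions):
# same_layout charts/event-processor charts/metrics-gateway
```

&lt;p&gt;A non-zero exit status means the layouts differ; reviewing the two &lt;code&gt;chart_layout&lt;/code&gt; listings side by side shows which files are missing or extra.&lt;/p&gt;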
&lt;h2 id="verification"&gt;Verification&lt;/h2&gt;
&lt;p&gt;You can verify you&amp;rsquo;re actually following the workflow, not just producing text:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Artifacts exist.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;test&lt;/span&gt; -d plan &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt; -d prompts &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt; -d work-notes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# A plan exists and is not empty.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;test&lt;/span&gt; -s plan/phase-1.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# A prompt doc exists.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;test&lt;/span&gt; -s prompts/phase-1.md
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Work notes exist.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;test&lt;/span&gt; -s work-notes/phase-1.md
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If you are doing the Helm chart example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;helm lint charts/metrics-gateway
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;helm template charts/metrics-gateway &amp;gt;/tmp/metrics-gateway.rendered.yaml
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;test&lt;/span&gt; -s /tmp/metrics-gateway.rendered.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Expected results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The commands exit with code 0.&lt;/li&gt;
&lt;li&gt;The rendered YAML file is non-empty.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="failure-modes"&gt;Failure modes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Skipping references: you get generic output that doesn&amp;rsquo;t match your repo.&lt;/li&gt;
&lt;li&gt;Skipping verification: you ship code that &amp;ldquo;looked right.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Letting sessions run too long: context drifts and you lose earlier constraints.&lt;/li&gt;
&lt;li&gt;Batching commits: review slows down and rollback gets painful.&lt;/li&gt;
&lt;li&gt;Using the wrong model: cheap models are fine for boilerplate, but can burn hours on complex reasoning.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Continue -&amp;gt; &lt;a href="https://roygabriel.dev/blog/llm-development-guide/02-planning-artifacts/"&gt;Chapter 2: Planning: Plan Artifacts, Constraints, Definition of Done&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Agile Isn't Dead. Agile Compliance Is.</title><link>https://roygabriel.dev/blog/agile-compliance-is-dead/</link><pubDate>Wed, 31 Dec 2025 12:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/agile-compliance-is-dead/</guid><description>&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Note on examples:&lt;/strong&gt; The scenarios below are &lt;strong&gt;anonymized composites&lt;/strong&gt;.
This isn&amp;rsquo;t &amp;ldquo;Agile bad.&amp;rdquo; It&amp;rsquo;s &amp;ldquo;Agile the brand is often used to justify systems that do the opposite of Agile&amp;rsquo;s intent.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Agile isn&amp;rsquo;t a set of meetings. It&amp;rsquo;s a physics statement:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Shorter feedback loops reduce risk.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Most enterprises didn&amp;rsquo;t fail Agile. They replaced Agile with a bureaucracy that uses Agile vocabulary:&lt;/p&gt;</description><content:encoded>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Note on examples:&lt;/strong&gt; The scenarios below are &lt;strong&gt;anonymized composites&lt;/strong&gt;.
This isn&amp;rsquo;t &amp;ldquo;Agile bad.&amp;rdquo; It&amp;rsquo;s &amp;ldquo;Agile the brand is often used to justify systems that do the opposite of Agile&amp;rsquo;s intent.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Agile isn&amp;rsquo;t a set of meetings. It&amp;rsquo;s a physics statement:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Shorter feedback loops reduce risk.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Most enterprises didn&amp;rsquo;t fail Agile. They replaced Agile with a bureaucracy that uses Agile vocabulary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Sprint&amp;rdquo; becomes a reporting interval&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Velocity&amp;rdquo; becomes a performance metric&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Planning&amp;rdquo; becomes a negotiation&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Definition of done&amp;rdquo; becomes a checklist&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Agile transformation&amp;rdquo; becomes a multi-year program&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result is predictable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;delivery slows&lt;/li&gt;
&lt;li&gt;quality degrades&lt;/li&gt;
&lt;li&gt;reliability suffers&lt;/li&gt;
&lt;li&gt;engineers burn out&lt;/li&gt;
&lt;li&gt;product expectations aren&amp;rsquo;t met&lt;/li&gt;
&lt;li&gt;leadership gets more dashboards and fewer outcomes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This post is a production-first teardown of Agile theater - and a replacement model that actually ships.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Agile is about &lt;strong&gt;learning quickly&lt;/strong&gt;, not &lt;strong&gt;predicting perfectly&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Scrum is useful when it reduces uncertainty. It&amp;rsquo;s harmful when it becomes a compliance system.&lt;/li&gt;
&lt;li&gt;If you treat sprints as contracts, you&amp;rsquo;ll get &lt;strong&gt;scrumfall&lt;/strong&gt;: waterfall dependencies with sprint-shaped reporting.&lt;/li&gt;
&lt;li&gt;Replace &amp;ldquo;Agile compliance&amp;rdquo; with:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Flow&lt;/strong&gt; (small batches, limit WIP)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Continuous delivery&lt;/strong&gt; (safe, frequent releases) [4]&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Evidence-based planning&lt;/strong&gt; (measure outcomes; adjust quickly) [5]&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Use system metrics (DORA) to verify improvement: lead time, deploy frequency, change failure rate, MTTR. [6]&lt;/li&gt;
&lt;li&gt;Beware Goodhart&amp;rsquo;s Law: metrics used as targets will be gamed. [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#agile-the-physics-vs-agile-the-bureaucracy"&gt;Agile the physics vs Agile the bureaucracy&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-1-sprints-as-contracts"&gt;Pattern 1: Sprints as contracts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-2-velocity-as-a-performance-metric"&gt;Pattern 2: Velocity as a performance metric&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-3-backlog-bloat-as-a-museum-of-anxiety"&gt;Pattern 3: Backlog bloat as a museum of anxiety&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-4-ceremonies-become-the-work"&gt;Pattern 4: Ceremonies become the work&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-5-dependencies-turn-scrum-into-fiction"&gt;Pattern 5: Dependencies turn Scrum into fiction&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-6-definition-of-done-without-production"&gt;Pattern 6: Definition of done without production&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#pattern-7-product-ownership-by-proxy"&gt;Pattern 7: Product ownership by proxy&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#whats-better-flow--cd--evidence"&gt;What&amp;rsquo;s better: Flow + CD + evidence&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#transition-plan-30-days-without-a-revolution"&gt;Transition plan: 30 days without a revolution&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#verification-how-you-know-its-working"&gt;Verification: how you know it&amp;rsquo;s working&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-practical-checklist"&gt;A practical checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="agile-the-physics-vs-agile-the-bureaucracy"&gt;Agile the physics vs Agile the bureaucracy&lt;/h2&gt;
&lt;p&gt;The Agile Manifesto values working software over comprehensive documentation and emphasizes collaboration and responding to change. [1] One of its principles states that &lt;strong&gt;working software is the primary measure of progress&lt;/strong&gt;. [2]&lt;/p&gt;
&lt;p&gt;Those ideas are still correct.&lt;/p&gt;
&lt;p&gt;What broke in enterprises is implementation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agile became &lt;strong&gt;process&lt;/strong&gt; instead of &lt;strong&gt;feedback&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Agile artifacts became &lt;strong&gt;deliverables&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;teams were optimized for &lt;strong&gt;predictability theater&lt;/strong&gt; instead of throughput and learning&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short: Agile got turned into compliance.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pattern-1-sprints-as-contracts"&gt;Pattern 1: Sprints as contracts&lt;/h2&gt;
&lt;h3 id="what-it-looks-like"&gt;What it looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Sprint planning is treated as a commitment contract.&lt;/li&gt;
&lt;li&gt;Changing scope is seen as failure, even when reality changes.&lt;/li&gt;
&lt;li&gt;Teams avoid surfacing unknowns because unknowns disrupt &amp;ldquo;commitment.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-happens"&gt;Why it happens&lt;/h3&gt;
&lt;p&gt;Leaders want predictability. Sprints feel like a way to buy it.&lt;/p&gt;
&lt;h3 id="the-hidden-tax"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;When you turn sprints into contracts, teams adapt:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reduce exploration&lt;/li&gt;
&lt;li&gt;defer integration&lt;/li&gt;
&lt;li&gt;accept low-quality shortcuts&lt;/li&gt;
&lt;li&gt;split work into artificial &amp;ldquo;done-looking&amp;rdquo; chunks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You don&amp;rsquo;t eliminate uncertainty. You hide it until the end.&lt;/p&gt;
&lt;h3 id="the-replacement-pattern"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Use cadence as a heartbeat, not as a contract:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Plan in small chunks.&lt;/li&gt;
&lt;li&gt;Commit to &lt;strong&gt;outcomes and constraints&lt;/strong&gt;, not a stack of tickets.&lt;/li&gt;
&lt;li&gt;Treat scope as a lever; treat time as a constraint.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-2-velocity-as-a-performance-metric"&gt;Pattern 2: Velocity as a performance metric&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-1"&gt;What it looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Story points are treated as a measure of productivity.&lt;/li&gt;
&lt;li&gt;Velocity is compared across teams.&lt;/li&gt;
&lt;li&gt;Teams feel pressure to &amp;ldquo;go faster&amp;rdquo; by increasing points delivered.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-happens-1"&gt;Why it happens&lt;/h3&gt;
&lt;p&gt;Velocity is a number. Numbers are tempting.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-1"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;Story points are a local measure with no consistent meaning across teams. When you attach incentives, teams optimize for the metric:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;inflate estimates&lt;/li&gt;
&lt;li&gt;split work to maximize points&lt;/li&gt;
&lt;li&gt;avoid hard, high-leverage work&lt;/li&gt;
&lt;li&gt;ship low-value changes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a textbook Goodhart&amp;rsquo;s Law failure mode: when a measure becomes a target, it ceases to be a good measure. [7]&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-1"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Measure the system, not the story:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;lead time&lt;/li&gt;
&lt;li&gt;cycle time&lt;/li&gt;
&lt;li&gt;deploy frequency&lt;/li&gt;
&lt;li&gt;change failure rate&lt;/li&gt;
&lt;li&gt;MTTR&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use metrics diagnostically, not as quarterly targets.&lt;/p&gt;
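&lt;p&gt;As a minimal sketch of measuring the system, the function below summarizes lead time and deploy frequency from a deploy log. The input format (one line per deploy with the commit time and the deploy time as epoch seconds) and the name &lt;code&gt;flow_summary&lt;/code&gt; are assumptions for illustration, not part of any DORA tooling. It reports a mean for simplicity; a median is less sensitive to outliers.&lt;/p&gt;

```shell
# Sketch: summarize lead time and deploy frequency from a deploy log.
# Assumed (hypothetical) input on stdin, one deploy per line:
#   commit_epoch_seconds deploy_epoch_seconds
# flow_summary is an illustrative name, not an existing tool.
flow_summary() {
  sort -n -k2,2 | awk '
    NR == 1 { first = $2 }              # earliest deploy time (input is sorted)
    { total += $2 - $1; last = $2 }     # accumulate commit-to-deploy seconds
    END {
      if (NR == 0) { print "no deploys"; exit }
      days = (last - first) / 86400     # observed window in days
      if (days == 0) days = 1           # all deploys share one timestamp
      printf "deploys=%d mean_lead_time_hours=%.1f deploys_per_day=%.2f\n", NR, total / NR / 3600, NR / days
    }'
}

# Example usage (deploys.log is a hypothetical file):
# cat deploys.log | flow_summary
```

&lt;p&gt;Run it on the same log each week and compare the trend rather than setting a target for any single number.&lt;/p&gt;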
&lt;hr&gt;
&lt;h2 id="pattern-3-backlog-bloat-as-a-museum-of-anxiety"&gt;Pattern 3: Backlog bloat as a museum of anxiety&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-2"&gt;What it looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Thousands of backlog items exist &amp;ldquo;for visibility.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Nothing gets deleted.&lt;/li&gt;
&lt;li&gt;Refinement happens continuously, but priorities change weekly.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-happens-2"&gt;Why it happens&lt;/h3&gt;
&lt;p&gt;Backlogs feel like control: &amp;ldquo;We haven&amp;rsquo;t forgotten.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="the-hidden-tax-2"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;A giant backlog increases planning cost and reduces focus. Teams stop trusting priorities and operate on side-channel requests.&lt;/p&gt;
&lt;p&gt;My favorite framing:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;If everything is in the backlog, nothing is prioritized. It&amp;rsquo;s just a museum of anxiety.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="the-replacement-pattern-2"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Adopt a tight horizon model:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Now:&lt;/strong&gt; what we&amp;rsquo;re building&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Next:&lt;/strong&gt; what&amp;rsquo;s likely next&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Later:&lt;/strong&gt; ideas (low-investment capture)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Refine Now/Next. Archive the rest.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pattern-4-ceremonies-become-the-work"&gt;Pattern 4: Ceremonies become the work&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-3"&gt;What it looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Standups become status meetings for managers.&lt;/li&gt;
&lt;li&gt;Planning takes hours.&lt;/li&gt;
&lt;li&gt;Refinement is endless.&lt;/li&gt;
&lt;li&gt;Retrospectives generate action items that never get resourced.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-happens-3"&gt;Why it happens&lt;/h3&gt;
&lt;p&gt;Ceremonies are easy to schedule. Delivery capability is harder to build.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-3"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;Attention becomes fragmented. Engineers become &amp;ldquo;meeting responders.&amp;rdquo; Work gets multi-tasked across initiatives.&lt;/p&gt;
&lt;p&gt;This is how you get:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;slow delivery&lt;/li&gt;
&lt;li&gt;low quality&lt;/li&gt;
&lt;li&gt;burnout&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="the-replacement-pattern-3"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Keep only the meetings that reduce uncertainty:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;shorter planning&lt;/li&gt;
&lt;li&gt;true async refinement&lt;/li&gt;
&lt;li&gt;standup for coordination within the team (not reporting)&lt;/li&gt;
&lt;li&gt;retros with real ownership and budget&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then invest in the thing ceremonies can&amp;rsquo;t replace: &lt;strong&gt;engineering capability&lt;/strong&gt; (tests, pipelines, observability, automation).&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pattern-5-dependencies-turn-scrum-into-fiction"&gt;Pattern 5: Dependencies turn Scrum into fiction&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-4"&gt;What it looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Every story depends on another team.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Blocked&amp;rdquo; is normal.&lt;/li&gt;
&lt;li&gt;Integration is deferred to later sprints.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-happens-4"&gt;Why it happens&lt;/h3&gt;
&lt;p&gt;Organizations are siloed. Systems mirror communication structures (Conway&amp;rsquo;s Law). [8]&lt;/p&gt;
&lt;h3 id="the-hidden-tax-4"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;You get scrumfall: waterfall dependencies, sprint-shaped reporting.&lt;/p&gt;
&lt;p&gt;A two-week sprint can&amp;rsquo;t save a three-month dependency queue.&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-4"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Design for end-to-end ownership and flow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reduce handoffs&lt;/li&gt;
&lt;li&gt;remove or automate cross-team gates&lt;/li&gt;
&lt;li&gt;create platform paved roads so teams can self-serve [9]&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When dependencies can&amp;rsquo;t be eliminated, make them explicit and manage them like risk, not like hope.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="pattern-6-definition-of-done-without-production"&gt;Pattern 6: Definition of done without production&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-5"&gt;What it looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Done&amp;rdquo; means &amp;ldquo;merged.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;QA is a phase.&lt;/li&gt;
&lt;li&gt;Observability is optional.&lt;/li&gt;
&lt;li&gt;Releases happen &amp;ldquo;later.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-happens-5"&gt;Why it happens&lt;/h3&gt;
&lt;p&gt;Shipping is painful. So teams avoid it.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-5"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;If &amp;ldquo;done&amp;rdquo; doesn&amp;rsquo;t include production, you accumulate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;integration debt&lt;/li&gt;
&lt;li&gt;release debt&lt;/li&gt;
&lt;li&gt;incident debt&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Reliability declines because feedback arrives late.&lt;/p&gt;
&lt;p&gt;Continuous delivery&amp;rsquo;s core argument is that keeping software deployable and releasing frequently reduces risk and enables faster feedback. [4]&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-5"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Upgrade your definition of done:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;deployed to a real environment&lt;/li&gt;
&lt;li&gt;observable (metrics/logs/traces)&lt;/li&gt;
&lt;li&gt;rollback path exists&lt;/li&gt;
&lt;li&gt;runbook exists for major failure modes&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="pattern-7-product-ownership-by-proxy"&gt;Pattern 7: Product ownership by proxy&lt;/h2&gt;
&lt;h3 id="what-it-looks-like-6"&gt;What it looks like&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Engineers rarely talk to users/operators.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Product&amp;rdquo; is a chain of intermediaries.&lt;/li&gt;
&lt;li&gt;Requirements arrive as polished tickets without the &amp;ldquo;why.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="why-it-happens-6"&gt;Why it happens&lt;/h3&gt;
&lt;p&gt;The organization tries to protect engineers from churn.&lt;/p&gt;
&lt;h3 id="the-hidden-tax-6"&gt;The hidden tax&lt;/h3&gt;
&lt;p&gt;This degrades the input signal. Engineers build the wrong thing efficiently - and then everyone is surprised it didn&amp;rsquo;t land.&lt;/p&gt;
&lt;h3 id="the-replacement-pattern-6"&gt;The replacement pattern&lt;/h3&gt;
&lt;p&gt;Bring engineers closer to reality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;listen to customer calls&lt;/li&gt;
&lt;li&gt;review usage telemetry&lt;/li&gt;
&lt;li&gt;participate in discovery&lt;/li&gt;
&lt;li&gt;keep the &amp;ldquo;why&amp;rdquo; attached to every build&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;No one should ship something they can&amp;rsquo;t explain.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="whats-better-flow--cd--evidence"&gt;What&amp;rsquo;s better: Flow + CD + evidence&lt;/h2&gt;
&lt;p&gt;If Agile compliance is the disease, what&amp;rsquo;s the cure?&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not &amp;ldquo;a different framework.&amp;rdquo; It&amp;rsquo;s an operating model:&lt;/p&gt;
&lt;h3 id="1-flow-small-batches-limited-wip"&gt;1) Flow: small batches, limited WIP&lt;/h3&gt;
&lt;p&gt;Lean/Kanban concepts focus on limiting work in progress and optimizing for flow. [3]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Finish work in progress before starting new work.&lt;/li&gt;
&lt;li&gt;Reduce batch size.&lt;/li&gt;
&lt;li&gt;Make queues visible.&lt;/li&gt;
&lt;/ul&gt;
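&lt;p&gt;A WIP limit only helps if someone can see when it is exceeded. The sketch below assumes a hypothetical board export with one item per line in the form &lt;code&gt;ID STATUS&lt;/code&gt;; the &lt;code&gt;in-progress&lt;/code&gt; label and the name &lt;code&gt;wip_check&lt;/code&gt; are illustrative and would need to match whatever your tracker emits.&lt;/p&gt;

```shell
# Sketch: flag when work in progress exceeds an agreed limit.
# Assumed (hypothetical) board export on stdin, one item per line: "ID STATUS".
# wip_check and the "in-progress" status label are illustrative names.
wip_check() {
  wip_limit="$1"
  # Count items whose second field is the in-progress status; n + 0 prints 0 when none match.
  wip_count=$(awk '$2 == "in-progress" { n++ } END { print n + 0 }')
  echo "wip=$wip_count limit=$wip_limit"
  # Succeed only when the count is within the limit.
  [ "$wip_count" -le "$wip_limit" ]
}

# Example usage (board.txt is a hypothetical export):
# cat board.txt | wip_check 3
```

&lt;p&gt;The non-zero exit status when the limit is exceeded makes it easy to wire into a scheduled job that prompts the team to finish something before pulling new work.&lt;/p&gt;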
&lt;h3 id="2-continuous-delivery-make-change-safe"&gt;2) Continuous Delivery: make change safe&lt;/h3&gt;
&lt;p&gt;Continuous delivery is a capability: keep changes small, deployable, and observable so you can release frequently with lower risk. [4]&lt;/p&gt;
&lt;p&gt;This includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CI&lt;/li&gt;
&lt;li&gt;automated testing&lt;/li&gt;
&lt;li&gt;progressive delivery (when needed)&lt;/li&gt;
&lt;li&gt;rollback/roll-forward discipline&lt;/li&gt;
&lt;li&gt;telemetry tied to releases&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="3-evidence-based-planning-bets-not-contracts"&gt;3) Evidence-based planning: bets, not contracts&lt;/h3&gt;
&lt;p&gt;Lean Startup&amp;rsquo;s build-measure-learn loop emphasizes validated learning - ship something real, measure, and adjust. [5]&lt;/p&gt;
&lt;p&gt;For enterprises, the translation is simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Plan in small bets&lt;/li&gt;
&lt;li&gt;Validate early&lt;/li&gt;
&lt;li&gt;Use evidence to re-plan, not politics&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="transition-plan-30-days-without-a-revolution"&gt;Transition plan: 30 days without a revolution&lt;/h2&gt;
&lt;p&gt;You don&amp;rsquo;t need to burn the framework down. You need to change what you reward and what you ship.&lt;/p&gt;
&lt;h3 id="week-1-make-work-visible-as-flow"&gt;Week 1: Make work visible as flow&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Map the value stream from idea -&amp;gt; production.&lt;/li&gt;
&lt;li&gt;Count handoffs.&lt;/li&gt;
&lt;li&gt;Measure current lead time.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="week-2-reduce-batch-size"&gt;Week 2: Reduce batch size&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Pick one initiative.&lt;/li&gt;
&lt;li&gt;Cut it to a thin vertical slice that can ship.&lt;/li&gt;
&lt;li&gt;Define &amp;ldquo;done&amp;rdquo; as &amp;ldquo;in production, measurable.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="week-3-reduce-wip"&gt;Week 3: Reduce WIP&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Stop starting new work.&lt;/li&gt;
&lt;li&gt;Finish the slice.&lt;/li&gt;
&lt;li&gt;Remove one blocking dependency with a paved path or automation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="week-4-close-the-feedback-loop"&gt;Week 4: Close the feedback loop&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Ship.&lt;/li&gt;
&lt;li&gt;Measure.&lt;/li&gt;
&lt;li&gt;Run a retro focused on system constraints (not blame).&lt;/li&gt;
&lt;li&gt;Repeat.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you do this and nothing improves, you learned something valuable: the constraint is elsewhere.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="verification-how-you-know-its-working"&gt;Verification: how you know it&amp;rsquo;s working&lt;/h2&gt;
&lt;p&gt;You should see movement in system outcomes:&lt;/p&gt;
&lt;p&gt;DORA describes four key delivery performance metrics: lead time for changes, deployment frequency, change failure rate, and time to restore service. [6]&lt;/p&gt;
&lt;p&gt;Signs of real improvement:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;lead time drops (less queueing and fewer handoffs)&lt;/li&gt;
&lt;li&gt;deploy frequency rises (smaller batches, calmer releases)&lt;/li&gt;
&lt;li&gt;change failure rate drops (better tests and safer rollouts)&lt;/li&gt;
&lt;li&gt;MTTR drops (better observability and operability)&lt;/li&gt;
&lt;/ul&gt;
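&lt;p&gt;As a rough illustration, the four metrics can be derived from a list of deployment records. This is a minimal sketch, not DORA tooling: the &lt;code&gt;Deployment&lt;/code&gt; fields and the &lt;code&gt;dora_summary&lt;/code&gt; helper are hypothetical names, and it assumes at least one deployment in the window.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deploys: list, window_days: int) -> dict:
    """Summarize the four delivery metrics for a non-empty list of deployments."""
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deploys]
    failures = [d for d in deploys if d.failed]
    restores = [hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        "median_lead_time_hours": median(lead_times),
        "deploys_per_day": len(deploys) / window_days,
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore_hours": median(restores) if restores else 0.0,
    }
```

&lt;p&gt;Track the trend of these numbers over time rather than any single reading.&lt;/p&gt;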
&lt;p&gt;And importantly: teams report less &amp;ldquo;deployment pain&amp;rdquo; and less burnout as delivery becomes calmer and more reliable. [10]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-practical-checklist"&gt;A practical checklist&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re stuck in Agile theater, try this:&lt;/p&gt;
&lt;h3 id="stop-measuring-activity"&gt;Stop measuring activity&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Stop comparing velocity across teams.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Stop treating story points as productivity.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="shrink-feedback-loops"&gt;Shrink feedback loops&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Ship a thin slice to production early (behind a flag if needed).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Put engineers closer to users/operators.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="reduce-handoffs-and-wip"&gt;Reduce handoffs and WIP&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Limit concurrent initiatives.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Remove one handoff per quarter.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="invest-in-delivery-capability"&gt;Invest in delivery capability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; CI, tests, deployment automation&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; observability tied to releases&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; safer rollouts and rollback paths&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="use-metrics-as-signals-not-targets"&gt;Use metrics as signals, not targets&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Track DORA metrics at the system level. [6]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Avoid metric gaming (Goodhart). [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] Manifesto for Agile Software Development (values). &lt;a href="https://agilemanifesto.org/" target="_blank" rel="noopener noreferrer"&gt;https://agilemanifesto.org/&lt;/a&gt;
[2] Principles behind the Agile Manifesto (&amp;ldquo;Working software is the primary measure of progress&amp;rdquo;). &lt;a href="https://agilemanifesto.org/principles.html" target="_blank" rel="noopener noreferrer"&gt;https://agilemanifesto.org/principles.html&lt;/a&gt;
[3] Kanban Guide (principles and practices oriented around flow and WIP). &lt;a href="https://kanbanguides.org/english/" target="_blank" rel="noopener noreferrer"&gt;https://kanbanguides.org/english/&lt;/a&gt;
[4] Continuous Delivery (concepts; keep software deployable, release frequently). &lt;a href="https://continuousdelivery.com/" target="_blank" rel="noopener noreferrer"&gt;https://continuousdelivery.com/&lt;/a&gt;
[5] The Lean Startup - Principles (Build-Measure-Learn; validated learning). &lt;a href="https://theleanstartup.com/principles" target="_blank" rel="noopener noreferrer"&gt;https://theleanstartup.com/principles&lt;/a&gt;
[6] DORA - &amp;ldquo;DORA&amp;rsquo;s software delivery performance metrics (guide)&amp;rdquo;. &lt;a href="https://dora.dev/guides/dora-metrics/" target="_blank" rel="noopener noreferrer"&gt;https://dora.dev/guides/dora-metrics/&lt;/a&gt;
[7] CNA - &amp;ldquo;Goodhart&amp;rsquo;s Law&amp;rdquo; (when a measure becomes a target, it ceases to be a good measure). &lt;a href="https://www.cna.org/analyses/2022/09/goodharts-law" target="_blank" rel="noopener noreferrer"&gt;https://www.cna.org/analyses/2022/09/goodharts-law&lt;/a&gt;
[8] Splunk - &amp;ldquo;Conway&amp;rsquo;s Law Explained&amp;rdquo; (systems mirror communication structures; includes original quote). &lt;a href="https://www.splunk.com/en_us/blog/learn/conways-law.html" target="_blank" rel="noopener noreferrer"&gt;https://www.splunk.com/en_us/blog/learn/conways-law.html&lt;/a&gt;
[9] Microsoft Engineering Blog - &amp;ldquo;Building paved paths: the journey to platform engineering&amp;rdquo;. &lt;a href="https://devblogs.microsoft.com/engineering-at-microsoft/building-paved-paths-the-journey-to-platform-engineering/" target="_blank" rel="noopener noreferrer"&gt;https://devblogs.microsoft.com/engineering-at-microsoft/building-paved-paths-the-journey-to-platform-engineering/&lt;/a&gt;
[10] DORA - &amp;ldquo;Capabilities: Well-being&amp;rdquo; (deployment pain and relationship to performance/culture). &lt;a href="https://dora.dev/capabilities/well-being/" target="_blank" rel="noopener noreferrer"&gt;https://dora.dev/capabilities/well-being/&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Agent Observability That Doesn't Lie</title><link>https://roygabriel.dev/blog/agent-observability-that-doesnt-lie/</link><pubDate>Sat, 20 Dec 2025 12:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/agent-observability-that-doesnt-lie/</guid><description>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most &amp;ldquo;agent observability&amp;rdquo; is either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;too shallow&lt;/strong&gt; (a chat transcript and a couple logs), or&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;too noisy&lt;/strong&gt; (every token logged, every tool payload stored, no signal)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither works in production.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re serious about operating agents, you need observability that answers three questions quickly:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What happened?&lt;/strong&gt; (forensics)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why did it happen?&lt;/strong&gt; (debuggability)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How often does it happen?&lt;/strong&gt; (reliability)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;OpenTelemetry exists to standardize how you instrument, generate, and export telemetry across traces, metrics, and logs. [1] W3C Trace Context defines how trace context propagates across service boundaries. [2]&lt;/p&gt;</description><content:encoded>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Most &amp;ldquo;agent observability&amp;rdquo; is either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;too shallow&lt;/strong&gt; (a chat transcript and a couple logs), or&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;too noisy&lt;/strong&gt; (every token logged, every tool payload stored, no signal)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Neither works in production.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re serious about operating agents, you need observability that answers three questions quickly:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;What happened?&lt;/strong&gt; (forensics)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why did it happen?&lt;/strong&gt; (debuggability)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How often does it happen?&lt;/strong&gt; (reliability)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;OpenTelemetry exists to standardize how you instrument, generate, and export telemetry across traces, metrics, and logs. [1] W3C Trace Context defines how trace context propagates across service boundaries. [2]&lt;/p&gt;
&lt;p&gt;Agents add two new requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool calls are part of your &amp;ldquo;distributed trace&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;decisioning&amp;rdquo; is a first-class component (not just business logic)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This article is a practical blueprint.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Instrument agents like distributed systems:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;traces&lt;/strong&gt; for causality (what triggered what)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;metrics&lt;/strong&gt; for health (p95 latency, error rates)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;logs&lt;/strong&gt; for human context (but redacted)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Propagate a single trace across agent runtime -&amp;gt; MCP gateway -&amp;gt; MCP tool servers -&amp;gt; upstream APIs.&lt;/li&gt;
&lt;li&gt;Capture &lt;strong&gt;decision summaries&lt;/strong&gt;, not chain-of-thought.&lt;/li&gt;
&lt;li&gt;Treat cost as a production signal: emit per-run and per-tool cost metrics.&lt;/li&gt;
&lt;li&gt;Use semantic conventions where possible to keep telemetry queryable. [3]&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t turn observability into a data breach: OWASP highlights sensitive info disclosure and prompt injection as key risks. [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-to-observe-in-an-agent-system"&gt;What to observe in an agent system&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-trace-model-for-agents"&gt;A trace model for agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#metrics-that-matter"&gt;Metrics that matter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#logs-and-redaction"&gt;Logs and redaction&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#audit-events-vs-debug-logs"&gt;Audit events vs debug logs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#dashboards-and-alerts"&gt;Dashboards and alerts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="what-to-observe-in-an-agent-system"&gt;What to observe in an agent system&lt;/h2&gt;
&lt;p&gt;Agents have four observable subsystems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Planner/Reasoner&lt;/strong&gt; (creates the plan, chooses tools)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool execution&lt;/strong&gt; (calls MCP tools and interprets results)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory/state&lt;/strong&gt; (what was stored or retrieved)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Policy/budget&lt;/strong&gt; (what was allowed or blocked)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you only observe #2, you&amp;rsquo;ll miss why the agent chose the wrong tool.
If you only observe #1, you&amp;rsquo;ll miss production failures.&lt;/p&gt;
&lt;p&gt;You need the full chain.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-trace-model-for-agents"&gt;A trace model for agents&lt;/h2&gt;
&lt;h3 id="the-core-idea"&gt;The core idea&lt;/h3&gt;
&lt;p&gt;A single &amp;ldquo;agent run&amp;rdquo; is a distributed trace that spans:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;model calls&lt;/li&gt;
&lt;li&gt;tool calls&lt;/li&gt;
&lt;li&gt;downstream system calls&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Use W3C Trace Context (&lt;code&gt;traceparent&lt;/code&gt;, &lt;code&gt;tracestate&lt;/code&gt;) to propagate the trace across boundaries. [2]&lt;/p&gt;
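&lt;p&gt;To make the propagation concrete, here is a simplified sketch of how a &lt;code&gt;traceparent&lt;/code&gt; value is built and carried to the next hop. The helper names are hypothetical, and the sketch skips parts of the spec (it does not reject all-zero IDs or handle future versions). In practice an OpenTelemetry SDK propagator would normally do this for you.&lt;/p&gt;

```python
import re
import secrets

# W3C traceparent layout: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace ID and 8-byte span ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Keep the caller's trace ID and mint a new span ID for the outgoing hop."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        return new_traceparent()  # invalid header: restart the trace
    _version, trace_id, _parent_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# Agent runtime starts the trace; the gateway and tool server each derive a child.
root = new_traceparent()
gateway_hop = child_traceparent(root)
tool_hop = child_traceparent(gateway_hop)
```

&lt;p&gt;All three values share the same trace ID, which is what lets a backend stitch the agent run, the gateway, and the tool server into one trace.&lt;/p&gt;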
&lt;h3 id="suggested-spans-minimum-viable"&gt;Suggested spans (minimum viable)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Root span&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent.run&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;agent.name&lt;/code&gt;, &lt;code&gt;tenant&lt;/code&gt;, &lt;code&gt;user&lt;/code&gt;, &lt;code&gt;session&lt;/code&gt;, &lt;code&gt;goal_hash&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Planner&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent.plan&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;planner.model&lt;/code&gt;, &lt;code&gt;plan.step_count&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model calls&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm.call&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;prompt_tokens&lt;/code&gt;, &lt;code&gt;completion_tokens&lt;/code&gt;, &lt;code&gt;latency_ms&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Tool selection&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent.tool_select&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;selector.version&lt;/code&gt;, &lt;code&gt;candidate_count&lt;/code&gt;, &lt;code&gt;selected_count&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Tool call&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tool.call&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;tool.name&lt;/code&gt;, &lt;code&gt;tool.class&lt;/code&gt; (read/write/danger), &lt;code&gt;tool.server&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Policy&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;policy.check&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;policy.rule_id&lt;/code&gt;, &lt;code&gt;decision&lt;/code&gt; (allow/deny), &lt;code&gt;reason_code&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;memory.read&lt;/code&gt; / &lt;code&gt;memory.write&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;attributes: &lt;code&gt;store&lt;/code&gt;, &lt;code&gt;keys&lt;/code&gt;, &lt;code&gt;bytes&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
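&lt;p&gt;The span list above can be sketched as a small parent/child tree. This uses a hypothetical in-memory recorder (&lt;code&gt;RunTrace&lt;/code&gt;) purely to show the shape; a real system would use an OpenTelemetry tracer, and the agent and tool names are made up.&lt;/p&gt;

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]
    attributes: dict = field(default_factory=dict)
    duration_ms: float = 0.0


class RunTrace:
    """Hypothetical recorder that tracks span nesting with a stack."""

    def __init__(self) -> None:
        self.spans = []
        self._stack = []

    @contextmanager
    def span(self, name: str, attributes: Optional[dict] = None):
        parent = self._stack[-1].name if self._stack else None
        current = Span(name, parent, dict(attributes or {}))
        self.spans.append(current)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()


trace = RunTrace()
with trace.span("agent.run", {"agent.name": "triage"}):
    with trace.span("agent.plan", {"plan.step_count": 2}):
        pass
    with trace.span("policy.check", {"decision": "allow"}):
        pass
    with trace.span("tool.call", {"tool.name": "search", "tool.class": "read"}) as call:
        call.attributes["status"] = "ok"
```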
&lt;h3 id="why-spans--logs"&gt;Why spans &amp;gt; logs&lt;/h3&gt;
&lt;p&gt;Spans give you causality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;which tool call caused a failure&lt;/li&gt;
&lt;li&gt;which step blew the budget&lt;/li&gt;
&lt;li&gt;which upstream dependency was slow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With OpenTelemetry, you can emit traces and metrics using the same SDK approach. [1][4]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="metrics-that-matter"&gt;Metrics that matter&lt;/h2&gt;
&lt;h3 id="tool-health-metrics"&gt;Tool health metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tool_calls_total{tool,status}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_latency_ms_bucket{tool}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_timeouts_total{tool}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_retries_total{tool}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="agent-run-health-metrics"&gt;Agent run health metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;agent_runs_total{status}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agent_run_latency_ms_bucket{agent}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agent_steps_total_bucket{agent}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="cost-metrics-treat-cost-like-reliability"&gt;Cost metrics (treat cost like reliability)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;llm_tokens_total{model,type=prompt|completion}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;llm_cost_usd_total{model}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;run_cost_usd_bucket{agent}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="policy-metrics"&gt;Policy metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;policy_denied_total{rule_id}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;danger_tool_attempt_total{tool}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Semantic conventions help your metrics stay queryable and consistent across systems. OpenTelemetry documents semantic conventions for HTTP spans/metrics, for example. [3][5]&lt;/p&gt;
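&lt;p&gt;A minimal sketch of how the tool health metrics could be recorded, using plain in-memory counters. The bucket bounds are assumptions, and the buckets here are non-cumulative for brevity (Prometheus-style histograms are cumulative); a metrics SDK would replace all of this in production.&lt;/p&gt;

```python
from bisect import bisect_left
from collections import Counter

LATENCY_BUCKETS_MS = [50, 100, 250, 500, 1000, 5000]  # assumed upper bounds

tool_calls_total = Counter()        # keyed by (tool, status)
tool_latency_ms_bucket = Counter()  # keyed by (tool, upper bound)


def record_tool_call(tool: str, status: str, latency_ms: float) -> None:
    """Count the call and place its latency in the first bucket that fits."""
    tool_calls_total[(tool, status)] += 1
    idx = bisect_left(LATENCY_BUCKETS_MS, latency_ms)
    bound = LATENCY_BUCKETS_MS[idx] if len(LATENCY_BUCKETS_MS) > idx else float("inf")
    tool_latency_ms_bucket[(tool, bound)] += 1


def error_rate(tool: str) -> float:
    """Share of calls to this tool that ended with status 'error'."""
    errors = tool_calls_total[(tool, "error")]
    total = sum(n for (name, _status), n in tool_calls_total.items() if name == tool)
    return errors / total if total else 0.0
```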
&lt;hr&gt;
&lt;h2 id="logs-and-redaction"&gt;Logs and redaction&lt;/h2&gt;
&lt;p&gt;Logs should add human context, not become a data lake of secrets.&lt;/p&gt;
&lt;p&gt;Rules I like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Do not log prompts by default.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Do not log tool payloads by default.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Log summaries and hashes: &lt;code&gt;goal_hash&lt;/code&gt;, &lt;code&gt;plan_hash&lt;/code&gt;, &lt;code&gt;tool_args_hash&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Log &lt;strong&gt;structured error reasons&lt;/strong&gt;: &lt;code&gt;validation_error&lt;/code&gt;, &lt;code&gt;upstream_rate_limited&lt;/code&gt;, &lt;code&gt;auth_failed&lt;/code&gt;, &lt;code&gt;policy_denied&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For agent systems, OWASP highlights sensitive information disclosure and insecure output handling. Logging is one of the easiest ways to accidentally create both. [7]&lt;/p&gt;
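&lt;p&gt;A sketch of what &amp;ldquo;log hashes, not payloads&amp;rdquo; might look like. The function names and the 16-character truncation are assumptions. Note that a plain hash of low-entropy input can be guessed by brute force, so a keyed hash (HMAC) may be a better fit for sensitive fields.&lt;/p&gt;

```python
import hashlib
import json
from typing import Optional

ERROR_REASONS = {"validation_error", "upstream_rate_limited", "auth_failed", "policy_denied"}


def short_hash(value: object) -> str:
    """Stable fingerprint of a goal, plan, or tool arguments (key order ignored)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def tool_log_record(tool: str, args: dict, error_reason: Optional[str] = None) -> dict:
    """Build a log record that carries a hash of the arguments, never the arguments."""
    if error_reason is not None and error_reason not in ERROR_REASONS:
        error_reason = "unknown_error"  # keep the reason vocabulary closed
    return {"tool": tool, "tool_args_hash": short_hash(args), "error_reason": error_reason}
```

&lt;p&gt;Two calls with the same arguments produce the same hash, so you can still correlate retries and duplicates without storing the payload.&lt;/p&gt;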
&lt;h3 id="debug-mode-that-isnt-dangerous"&gt;&amp;ldquo;Debug mode&amp;rdquo; that isn&amp;rsquo;t dangerous&lt;/h3&gt;
&lt;p&gt;If you must support deeper logs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;only enable per tenant/user for a limited window&lt;/li&gt;
&lt;li&gt;auto-expire&lt;/li&gt;
&lt;li&gt;redact aggressively&lt;/li&gt;
&lt;li&gt;never store raw secrets&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="audit-events-vs-debug-logs"&gt;Audit events vs debug logs&lt;/h2&gt;
&lt;p&gt;Treat them as different products:&lt;/p&gt;
&lt;h3 id="audit-events-for-governance"&gt;Audit events (for governance)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;immutable-ish records of side effects&lt;/li&gt;
&lt;li&gt;minimal sensitive data&lt;/li&gt;
&lt;li&gt;always on&lt;/li&gt;
&lt;li&gt;long retention&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example audit fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;who: tenant/user/client&lt;/li&gt;
&lt;li&gt;what: tool + action class (create/update/delete)&lt;/li&gt;
&lt;li&gt;when: timestamp&lt;/li&gt;
&lt;li&gt;where: environment&lt;/li&gt;
&lt;li&gt;result: success/failure&lt;/li&gt;
&lt;li&gt;resource IDs (safe identifiers)&lt;/li&gt;
&lt;li&gt;idempotency keys / plan IDs&lt;/li&gt;
&lt;/ul&gt;
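&lt;p&gt;The audit fields above could map to a small immutable record serialized as one JSON line. The field names and the &lt;code&gt;audit_event&lt;/code&gt; helper are illustrative assumptions, not a standard schema.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    tenant: str            # who
    user: str              # who
    tool: str              # what
    action: str            # what: create / update / delete
    environment: str       # where
    result: str            # success / failure
    resource_ids: tuple    # safe identifiers only
    idempotency_key: str
    timestamp: str         # when (ISO 8601, UTC)


def audit_event(**fields) -> str:
    """Validate the fields through the dataclass and emit one JSON line."""
    fields.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    return json.dumps(asdict(AuditEvent(**fields)), sort_keys=True)
```

&lt;p&gt;Because the dataclass has no free-form payload field, there is no obvious place for prompts or secrets to leak into the audit trail.&lt;/p&gt;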
&lt;h3 id="debug-logs-for-engineers"&gt;Debug logs (for engineers)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;short retention&lt;/li&gt;
&lt;li&gt;more context&lt;/li&gt;
&lt;li&gt;highly controlled access&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mixing these two is how you end up with &amp;ldquo;SharePoint logs full of PII&amp;rdquo; that no one wants to touch.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="dashboards-and-alerts"&gt;Dashboards and alerts&lt;/h2&gt;
&lt;h3 id="dashboards-start-simple"&gt;Dashboards (start simple)&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Tool reliability&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;top tools by error rate&lt;/li&gt;
&lt;li&gt;top tools by p95 latency&lt;/li&gt;
&lt;li&gt;timeouts per tool&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent success&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;success rate by agent type&lt;/li&gt;
&lt;li&gt;&amp;ldquo;stuck runs&amp;rdquo; (runs exceeding max duration)&lt;/li&gt;
&lt;li&gt;average steps per run&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;cost per run&lt;/li&gt;
&lt;li&gt;cost per tenant&lt;/li&gt;
&lt;li&gt;top drivers (which tools/model calls)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="alerts-avoid-noise"&gt;Alerts (avoid noise)&lt;/h3&gt;
&lt;p&gt;Alert on what is actionable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;tool error rate spikes for critical tools&lt;/li&gt;
&lt;li&gt;tool latency p95 spikes beyond SLO&lt;/li&gt;
&lt;li&gt;a spike in budget-exceeded events (runaway behavior)&lt;/li&gt;
&lt;li&gt;a spike in policy denials (possible prompt injection attempt)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you use SLOs and error budgets, Google&amp;rsquo;s SRE material is a practical reference for turning SLOs into alerting strategies. [6]&lt;/p&gt;
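&lt;p&gt;One common way to turn an SLO into an alert is the burn rate: the observed error rate divided by the error budget (1 minus the SLO target). A minimal sketch, where the function names and the paging threshold are assumptions you would tune per tool:&lt;/p&gt;

```python
def burn_rate(errors: int, total: int, slo: float) -> float:
    """How fast the error budget is being consumed; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    return (errors / total) / (1.0 - slo)


def should_page(errors: int, total: int, slo: float, threshold: float = 10.0) -> bool:
    """Page only when the budget is burning much faster than planned."""
    return burn_rate(errors, total, slo) >= threshold


# A critical tool with a 99.9% success SLO over the alerting window:
calm = should_page(errors=5, total=10_000, slo=0.999)     # burn rate about 0.5
urgent = should_page(errors=200, total=10_000, slo=0.999)  # burn rate about 20
```

&lt;p&gt;Alerting on burn rate rather than raw error counts keeps low-traffic tools from paging on a single failure.&lt;/p&gt;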
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;h3 id="tracing"&gt;Tracing&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Every agent run has a trace ID.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Trace context propagates across MCP boundaries (W3C Trace Context). [2]&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool calls are spans with stable tool identifiers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="metrics"&gt;Metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Tool success/error/latency metrics exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Agent run success/latency/steps metrics exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Cost metrics exist and are monitored.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="logging"&gt;Logging&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Default logs are redacted summaries, not raw payloads.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Debug logging is time-bounded and access-controlled.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="audit"&gt;Audit&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit events exist for all side-effecting tools.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Audit records include &amp;ldquo;who/what/when/result&amp;rdquo; without leaking secrets.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="security"&gt;Security&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Observability does not become a secret exfil path (OWASP risks considered). [7]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] OpenTelemetry - Documentation (overview): &lt;a href="https://opentelemetry.io/docs/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/&lt;/a&gt;
[2] W3C - Trace Context: &lt;a href="https://www.w3.org/TR/trace-context/" target="_blank" rel="noopener noreferrer"&gt;https://www.w3.org/TR/trace-context/&lt;/a&gt;
[3] OpenTelemetry - Semantic conventions for HTTP (spans/metrics/logs): &lt;a href="https://opentelemetry.io/docs/specs/semconv/http/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/specs/semconv/http/&lt;/a&gt;
[4] OpenTelemetry Go - Instrumentation docs: &lt;a href="https://opentelemetry.io/docs/languages/go/instrumentation/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/languages/go/instrumentation/&lt;/a&gt;
[5] OpenTelemetry - Semantic conventions for HTTP metrics: &lt;a href="https://opentelemetry.io/docs/specs/semconv/http/http-metrics/" target="_blank" rel="noopener noreferrer"&gt;https://opentelemetry.io/docs/specs/semconv/http/http-metrics/&lt;/a&gt;
[6] Google SRE Workbook - Alerting on SLOs: &lt;a href="https://sre.google/workbook/alerting-on-slos/" target="_blank" rel="noopener noreferrer"&gt;https://sre.google/workbook/alerting-on-slos/&lt;/a&gt;
[7] OWASP - Top 10 for Large Language Model Applications: &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/" target="_blank" rel="noopener noreferrer"&gt;https://owasp.org/www-project-top-10-for-large-language-model-applications/&lt;/a&gt;
&lt;/p&gt;</content:encoded></item><item><title>Cost Is a Reliability Problem</title><link>https://roygabriel.dev/blog/cost-is-a-reliability-problem/</link><pubDate>Sat, 13 Dec 2025 12:00:00 -0500</pubDate><guid>https://roygabriel.dev/blog/cost-is-a-reliability-problem/</guid><description>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Traditional reliability focuses on uptime. AI systems add a second axis:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Your system can be &amp;ldquo;up&amp;rdquo; while your budget is on fire.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A runaway agent doesn&amp;rsquo;t always crash services. Sometimes it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;loops tool calls&lt;/li&gt;
&lt;li&gt;retries incorrectly&lt;/li&gt;
&lt;li&gt;escalates to larger models repeatedly&lt;/li&gt;
&lt;li&gt;expands context windows unnecessarily&lt;/li&gt;
&lt;li&gt;performs expensive searches without stopping&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result: surprise bills, throttling, and eventually hard outages when quotas are hit.&lt;/p&gt;</description><content:encoded>&lt;h2 id="why-this-matters"&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;Traditional reliability focuses on uptime. AI systems add a second axis:&lt;/p&gt;
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;&lt;strong&gt;Your system can be &amp;ldquo;up&amp;rdquo; while your budget is on fire.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A runaway agent doesn&amp;rsquo;t always crash services. Sometimes it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;loops tool calls&lt;/li&gt;
&lt;li&gt;retries incorrectly&lt;/li&gt;
&lt;li&gt;escalates to larger models repeatedly&lt;/li&gt;
&lt;li&gt;expands context windows unnecessarily&lt;/li&gt;
&lt;li&gt;performs expensive searches without stopping&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result: surprise bills, throttling, and eventually hard outages when quotas are hit.&lt;/p&gt;
&lt;p&gt;Google&amp;rsquo;s SRE framing around &lt;strong&gt;error budgets&lt;/strong&gt; is a useful mental model: budgets create a control mechanism that balances stability with velocity. [1][2]
FinOps frames cost management as a collaboration practice between engineering, finance, and business. [3]&lt;/p&gt;
&lt;p&gt;This article is the practical bridge: &lt;strong&gt;use budgets and guardrails like you would for reliability.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat cost as an SLO: define acceptable spend per run / per tenant / per day.&lt;/li&gt;
&lt;li&gt;Enforce budgets at multiple layers:
&lt;ul&gt;
&lt;li&gt;per request/run&lt;/li&gt;
&lt;li&gt;per tool&lt;/li&gt;
&lt;li&gt;per tenant&lt;/li&gt;
&lt;li&gt;per environment&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Use hard limits + soft limits:
&lt;ul&gt;
&lt;li&gt;soft: degrade model/tool choices&lt;/li&gt;
&lt;li&gt;hard: stop the run and ask for approval&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Add cost circuit breakers:
&lt;ul&gt;
&lt;li&gt;abort on runaway loops&lt;/li&gt;
&lt;li&gt;quarantine tools causing repeated retries&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Make cost visible (metrics + dashboards) so teams can improve it.&lt;/li&gt;
&lt;li&gt;Align with FinOps: shared accountability, not &amp;ldquo;billing surprises.&amp;rdquo; [3]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="contents"&gt;Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#cost-failure-modes-in-agent-systems"&gt;Cost failure modes in agent systems&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#define-cost-slos-and-budgets"&gt;Define cost SLOs and budgets&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#budget-layers-run-tool-tenant-environment"&gt;Budget layers: run, tool, tenant, environment&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#soft-limits-vs-hard-limits"&gt;Soft limits vs hard limits&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#circuit-breakers-for-runaway-behavior"&gt;Circuit breakers for runaway behavior&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#cost-aware-tool-and-model-selection"&gt;Cost-aware tool and model selection&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#dashboards-and-alerts"&gt;Dashboards and alerts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#a-production-checklist"&gt;A production checklist&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#references"&gt;References&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="cost-failure-modes-in-agent-systems"&gt;Cost failure modes in agent systems&lt;/h2&gt;
&lt;h3 id="1-infinite-or-long-loops"&gt;1) Infinite or long loops&lt;/h3&gt;
&lt;p&gt;Common triggers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ambiguous tool outputs&lt;/li&gt;
&lt;li&gt;brittle parsing&lt;/li&gt;
&lt;li&gt;&amp;ldquo;try again&amp;rdquo; reflexes&lt;/li&gt;
&lt;li&gt;non-idempotent retries&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="2-tool-spam"&gt;2) Tool spam&lt;/h3&gt;
&lt;p&gt;Agents sometimes &amp;ldquo;search until confident.&amp;rdquo;
If you don&amp;rsquo;t cap it, you get 20+ tool calls on a single request.&lt;/p&gt;
&lt;h3 id="3-model-escalation-cascades"&gt;3) Model escalation cascades&lt;/h3&gt;
&lt;p&gt;If your policy says &amp;ldquo;if uncertain, use a better model,&amp;rdquo; you can create a cost escalator:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;cheap model -&amp;gt; &amp;ldquo;uncertain&amp;rdquo; -&amp;gt; expensive model&lt;/li&gt;
&lt;li&gt;expensive model -&amp;gt; still uncertain -&amp;gt; more calls&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="4-context-growth"&gt;4) Context growth&lt;/h3&gt;
&lt;p&gt;If you keep appending tool outputs to the prompt, costs grow superlinearly and performance can degrade.&lt;/p&gt;
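&lt;p&gt;A quick back-of-the-envelope model shows why. It assumes every model call re-sends the whole context, each tool output adds a fixed number of tokens, and nothing is cached or summarized; the function name and the numbers are illustrative only.&lt;/p&gt;

```python
def cumulative_prompt_tokens(base_tokens: int, tokens_per_tool_output: int, steps: int) -> int:
    """Total prompt tokens billed across a run that keeps appending tool outputs."""
    total = 0
    context = base_tokens
    for _ in range(steps):
        total += context                   # the whole context is re-sent on every call
        context += tokens_per_tool_output  # and each tool output is appended to it
    return total


# Closed form: steps * base + output * steps * (steps - 1) / 2
five = cumulative_prompt_tokens(1_000, 2_000, steps=5)
ten = cumulative_prompt_tokens(1_000, 2_000, steps=10)
twenty = cumulative_prompt_tokens(1_000, 2_000, steps=20)
```

&lt;p&gt;Under these assumptions, doubling the step count from 10 to 20 quadruples the prompt tokens (100,000 to 400,000), which is why capping steps or summarizing tool outputs pays off quickly.&lt;/p&gt;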
&lt;h3 id="5-external-quotas-become-outages"&gt;5) External quotas become outages&lt;/h3&gt;
&lt;p&gt;Even if cost is acceptable, external services (email APIs, GitHub, calendars) can rate limit you.
Cost and reliability are coupled.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="define-cost-slos-and-budgets"&gt;Define cost SLOs and budgets&lt;/h2&gt;
&lt;p&gt;Start with simple &amp;ldquo;production truths&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How much is one agent run allowed to cost?&lt;/li&gt;
&lt;li&gt;What is an acceptable daily spend per tenant?&lt;/li&gt;
&lt;li&gt;What is the max &amp;ldquo;blast radius&amp;rdquo; of a single request?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This maps cleanly to SRE&amp;rsquo;s error budget concept: budgets constrain unsafe behavior while preserving velocity. [2]&lt;/p&gt;
&lt;h3 id="example-cost-slos-pragmatic"&gt;Example cost SLOs (pragmatic)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Per run:&lt;/strong&gt; &amp;lt;= $0.10 (p95), &amp;lt;= $0.50 (max)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Per tenant/day:&lt;/strong&gt; &amp;lt;= $50/day&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Per user/day:&lt;/strong&gt; &amp;lt;= $5/day&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Per run (expensive tools):&lt;/strong&gt; &amp;lt;= 3 calls&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These aren&amp;rsquo;t universal. They&amp;rsquo;re explicit. That&amp;rsquo;s what matters.&lt;/p&gt;
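&lt;p&gt;One way to make limits like these explicit is to write them down as configuration and check a window of run costs against them. This is a minimal sketch: &lt;code&gt;CostSLO&lt;/code&gt;, &lt;code&gt;exampleSLO&lt;/code&gt;, and &lt;code&gt;p95&lt;/code&gt; are hypothetical names, and the choice of integer micro-dollars and a nearest-rank percentile are assumptions.&lt;/p&gt;

```go
package main

import "sort"

// CostSLO is a hypothetical config type for the example limits above.
// Amounts are integer micro-dollars so comparisons avoid float rounding.
type CostSLO struct {
	RunP95Micros      int64
	RunMaxMicros      int64
	TenantDailyMicros int64
	UserDailyMicros   int64
	MaxExpensiveCalls int
}

// exampleSLO mirrors the pragmatic numbers listed in this section.
func exampleSLO() CostSLO {
	return CostSLO{
		RunP95Micros:      100_000,    // $0.10
		RunMaxMicros:      500_000,    // $0.50
		TenantDailyMicros: 50_000_000, // $50
		UserDailyMicros:   5_000_000,  // $5
		MaxExpensiveCalls: 3,
	}
}

// p95 returns the nearest-rank 95th percentile of run costs (0 if empty).
func p95(costs []int64) int64 {
	if len(costs) == 0 {
		return 0
	}
	sorted := append([]int64(nil), costs...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[j] > sorted[i] }) // ascending
	rank := (95*len(sorted) + 99) / 100 // ceil(0.95 * n)
	return sorted[rank-1]
}

// Met reports whether a window of run costs satisfies the per-run SLO:
// no run above the max, and the p95 at or below the p95 target.
func (s CostSLO) Met(costs []int64) bool {
	for _, c := range costs {
		if c > s.RunMaxMicros {
			return false
		}
	}
	return s.RunP95Micros >= p95(costs)
}
```

&lt;p&gt;Calling &lt;code&gt;exampleSLO().Met(costs)&lt;/code&gt; over the last hour or day of runs gives a single yes/no signal that can feed a dashboard or an alert.&lt;/p&gt;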
&lt;hr&gt;
&lt;h2 id="budget-layers-run-tool-tenant-environment"&gt;Budget layers: run, tool, tenant, environment&lt;/h2&gt;
&lt;h3 id="1-per-run-budget"&gt;1) Per-run budget&lt;/h3&gt;
&lt;p&gt;Tracks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;max model tokens&lt;/li&gt;
&lt;li&gt;max tool calls&lt;/li&gt;
&lt;li&gt;max wall-clock time&lt;/li&gt;
&lt;li&gt;max &amp;ldquo;expensive operations&amp;rdquo; count&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Most important budget.&lt;/strong&gt; This is where you stop runaway behavior early.&lt;/p&gt;
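&lt;p&gt;The four limits above can be tracked by one small guard that the orchestrator consults before every model or tool call. This is a sketch under assumptions: &lt;code&gt;RunBudget&lt;/code&gt;, its fields, and &lt;code&gt;ErrRunBudget&lt;/code&gt; are hypothetical names, and it is not safe for concurrent use as written.&lt;/p&gt;

```go
package main

import (
	"errors"
	"time"
)

// ErrRunBudget is returned once any per-run limit has been passed.
var ErrRunBudget = errors.New("run budget exceeded")

// RunBudget is a hypothetical per-run guard; field names are illustrative.
type RunBudget struct {
	MaxTokens       int
	MaxToolCalls    int
	MaxExpensiveOps int
	MaxWallClock    time.Duration

	started      time.Time
	tokens       int
	toolCalls    int
	expensiveOps int
}

// Start marks the beginning of the run for the wall-clock limit.
// Without it the zero start time makes every check fail.
func (b *RunBudget) Start() { b.started = time.Now() }

// ChargeTokens records model tokens used by one call.
func (b *RunBudget) ChargeTokens(n int) error {
	b.tokens += n
	return b.check()
}

// ChargeToolCall records one tool call, flagged if the tool is expensive.
func (b *RunBudget) ChargeToolCall(expensive bool) error {
	b.toolCalls++
	if expensive {
		b.expensiveOps++
	}
	return b.check()
}

// check returns ErrRunBudget if any of the four limits has been passed.
func (b *RunBudget) check() error {
	switch {
	case b.tokens > b.MaxTokens,
		b.toolCalls > b.MaxToolCalls,
		b.expensiveOps > b.MaxExpensiveOps,
		time.Since(b.started) > b.MaxWallClock:
		return ErrRunBudget
	}
	return nil
}
```

&lt;p&gt;The agent loop would call &lt;code&gt;Start&lt;/code&gt; once, then treat any &lt;code&gt;ErrRunBudget&lt;/code&gt; as the signal to stop and return a partial answer.&lt;/p&gt;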
&lt;h3 id="2-per-tool-budget"&gt;2) Per-tool budget&lt;/h3&gt;
&lt;p&gt;Some tools are inherently expensive:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;large searches&lt;/li&gt;
&lt;li&gt;long-running jobs&lt;/li&gt;
&lt;li&gt;heavy data exports&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Budget these separately:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;max calls&lt;/li&gt;
&lt;li&gt;max payload size&lt;/li&gt;
&lt;li&gt;max time range&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="3-per-tenant-budget"&gt;3) Per-tenant budget&lt;/h3&gt;
&lt;p&gt;Without this, your best customers can melt your infra.&lt;/p&gt;
&lt;p&gt;Per-tenant limits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;requests/min&lt;/li&gt;
&lt;li&gt;concurrent runs&lt;/li&gt;
&lt;li&gt;daily cost cap&lt;/li&gt;
&lt;/ul&gt;
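&lt;p&gt;A minimal in-memory sketch of two of these limits, concurrent runs and the daily cost cap, might look like the following. &lt;code&gt;TenantGate&lt;/code&gt; and its methods are hypothetical names; a real deployment would likely keep this state in a shared store, and requests/min would need a separate rate limiter.&lt;/p&gt;

```go
package main

import "sync"

// TenantGate is a hypothetical in-memory limiter for concurrent runs and
// daily spend per tenant.
type TenantGate struct {
	mu             sync.Mutex
	maxConcurrent  int
	dailyCapMicros int64
	running        map[string]int
	spentMicros    map[string]int64
}

// NewTenantGate builds a gate that applies the same limits to every tenant.
func NewTenantGate(maxConcurrent int, dailyCapMicros int64) *TenantGate {
	g := new(TenantGate)
	g.maxConcurrent = maxConcurrent
	g.dailyCapMicros = dailyCapMicros
	g.running = make(map[string]int)
	g.spentMicros = make(map[string]int64)
	return g
}

// TryStart admits a run unless the tenant is at its concurrency limit or
// has already used up the daily cost cap.
func (g *TenantGate) TryStart(tenant string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.running[tenant] >= g.maxConcurrent || g.spentMicros[tenant] >= g.dailyCapMicros {
		return false
	}
	g.running[tenant]++
	return true
}

// Finish releases the run slot and adds the run cost to the daily spend.
func (g *TenantGate) Finish(tenant string, costMicros int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.running[tenant] > 0 {
		g.running[tenant]--
	}
	g.spentMicros[tenant] += costMicros
}

// ResetDay clears all recorded spend; call it from a daily scheduler.
func (g *TenantGate) ResetDay() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.spentMicros = make(map[string]int64)
}
```

&lt;p&gt;Because cost is only known when a run finishes, a tenant can overshoot the cap by the cost of its in-flight runs. Pairing this gate with a per-run budget keeps that overshoot bounded.&lt;/p&gt;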
&lt;h3 id="4-per-environment-budget"&gt;4) Per-environment budget&lt;/h3&gt;
&lt;p&gt;Environments have different rules:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;dev: cheap, permissive, more logging&lt;/li&gt;
&lt;li&gt;prod: bounded, gated, auditable&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is where you implement &amp;ldquo;read-only mode&amp;rdquo; during incidents.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="soft-limits-vs-hard-limits"&gt;Soft limits vs hard limits&lt;/h2&gt;
&lt;h3 id="soft-limits-degrade-gracefully"&gt;Soft limits (degrade gracefully)&lt;/h3&gt;
&lt;p&gt;When approaching budget:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;switch to cheaper models&lt;/li&gt;
&lt;li&gt;reduce context size (summarize)&lt;/li&gt;
&lt;li&gt;narrow tool search range&lt;/li&gt;
&lt;li&gt;skip non-essential steps&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="hard-limits-stop-the-run"&gt;Hard limits (stop the run)&lt;/h3&gt;
&lt;p&gt;When budget is exceeded:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;stop tool calls&lt;/li&gt;
&lt;li&gt;stop escalation&lt;/li&gt;
&lt;li&gt;request user confirmation / approval&lt;/li&gt;
&lt;li&gt;produce a partial answer with an explanation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is exactly the &amp;ldquo;control mechanism&amp;rdquo; idea behind error budgets: it gives the system permission to shift focus when constraints are exceeded. [1]&lt;/p&gt;
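&lt;p&gt;The soft/hard split reduces to a small decision made before each step. The sketch below assumes a single spend figure per run and a soft threshold expressed as a percentage of the budget; &lt;code&gt;nextAction&lt;/code&gt; and &lt;code&gt;softPercent&lt;/code&gt; are hypothetical names, and it treats reaching the budget as the hard stop.&lt;/p&gt;

```go
package main

// Action is what the orchestrator should do before the next step.
type Action int

const (
	Proceed Action = iota // under the soft limit: carry on as planned
	Degrade               // soft limit reached: cheaper model, smaller context
	Stop                  // hard limit reached: halt and ask for approval
)

// nextAction compares spend against the budget. softPercent is the share of
// the budget (for example 80) at which degradation starts; it is an assumed
// tuning knob, not a value from the text.
func nextAction(spent, budget, softPercent int64) Action {
	switch {
	case spent >= budget:
		return Stop
	case spent*100 >= budget*softPercent:
		return Degrade
	default:
		return Proceed
	}
}
```

&lt;p&gt;Keeping the decision in one pure function makes it easy to test and to log which limit changed the behavior of a run.&lt;/p&gt;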
&lt;hr&gt;
&lt;h2 id="circuit-breakers-for-runaway-behavior"&gt;Circuit breakers for runaway behavior&lt;/h2&gt;
&lt;p&gt;Add circuit breakers that detect &amp;ldquo;this is going bad&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;loop detector&lt;/strong&gt;: same tool called with similar args repeatedly&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;retry storm&lt;/strong&gt;: high retry count for a tool within a run&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;no progress&lt;/strong&gt;: plan step count increases without new evidence&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;latency breaker&lt;/strong&gt;: tool p95 spikes beyond threshold&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When triggered:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;stop the run&lt;/li&gt;
&lt;li&gt;quarantine the tool for this run&lt;/li&gt;
&lt;li&gt;degrade to safe alternatives&lt;/li&gt;
&lt;li&gt;emit high-signal telemetry&lt;/li&gt;
&lt;/ul&gt;
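&lt;p&gt;As an illustration of the first breaker, here is a minimal loop detector. It is a simplification: it trips on exact repeats of the same tool and argument string, so catching merely similar arguments would require normalizing them first. &lt;code&gt;LoopDetector&lt;/code&gt; and &lt;code&gt;Observe&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```go
package main

// callKey identifies one tool invocation by tool name and raw arguments.
type callKey struct{ tool, args string }

// LoopDetector trips when the same tool is called with the same arguments
// too many times in one run. Exact matching is a simplification: "similar"
// arguments would need normalization first (sorted keys, trimmed whitespace).
type LoopDetector struct {
	threshold int
	seen      map[callKey]int
}

// NewLoopDetector builds a detector that trips on the threshold-th repeat.
func NewLoopDetector(threshold int) *LoopDetector {
	d := new(LoopDetector)
	d.threshold = threshold
	d.seen = make(map[callKey]int)
	return d
}

// Observe records one call and reports whether the breaker should trip.
func (d *LoopDetector) Observe(tool, args string) bool {
	k := callKey{tool, args}
	d.seen[k]++
	return d.seen[k] >= d.threshold
}
```

&lt;p&gt;One detector per run keeps state small and ensures the counts reset naturally when the run ends.&lt;/p&gt;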
&lt;hr&gt;
&lt;h2 id="cost-aware-tool-and-model-selection"&gt;Cost-aware tool and model selection&lt;/h2&gt;
&lt;p&gt;Cost control is easier if it&amp;rsquo;s designed into selection:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rank tools with a &amp;ldquo;cost weight&amp;rdquo; (latency + upstream cost + risk)&lt;/li&gt;
&lt;li&gt;Prefer read-only tools unless a write is required&lt;/li&gt;
&lt;li&gt;Use caches for common retrieval results&lt;/li&gt;
&lt;li&gt;Use deterministic summarization boundaries for tool outputs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you already implement a tool selector (see &amp;ldquo;Million Tool Problem&amp;rdquo;), cost becomes another rerank feature.&lt;/p&gt;
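&lt;p&gt;A sketch of cost as a rerank feature, assuming an existing selector already produces a relevance score and that latency, upstream cost, and risk are normalized to the range 0..1. The &lt;code&gt;Tool&lt;/code&gt; type, the equal weighting, and the &lt;code&gt;penalty&lt;/code&gt; parameter are all assumptions for illustration.&lt;/p&gt;

```go
package main

import "sort"

// Tool carries the relevance score from an existing selector plus cost
// signals. All numeric fields are assumed to be normalized to 0..1.
type Tool struct {
	Name      string
	Relevance float64
	Latency   float64
	Upstream  float64
	Risk      float64
}

// costWeight sums the three cost signals with equal weight (an assumption).
func costWeight(t Tool) float64 { return t.Latency + t.Upstream + t.Risk }

// rerank orders tools by relevance minus a cost penalty, best first.
// penalty controls how strongly cost counts against relevance.
func rerank(tools []Tool, penalty float64) []Tool {
	out := append([]Tool(nil), tools...) // copy so the input order is preserved
	score := func(t Tool) float64 { return t.Relevance - penalty*costWeight(t) }
	sort.SliceStable(out, func(i, j int) bool { return score(out[i]) > score(out[j]) })
	return out
}
```

&lt;p&gt;With &lt;code&gt;penalty&lt;/code&gt; set to 0 the original relevance order is kept, so the cost term can be introduced gradually.&lt;/p&gt;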
&lt;hr&gt;
&lt;h2 id="dashboards-and-alerts"&gt;Dashboards and alerts&lt;/h2&gt;
&lt;p&gt;This is where FinOps and SRE meet: cost is an operational signal.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dashboards&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;spend/day by tenant&lt;/li&gt;
&lt;li&gt;cost per run distribution&lt;/li&gt;
&lt;li&gt;top cost drivers (tools and models)&lt;/li&gt;
&lt;li&gt;runaway breaker triggers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Alerts&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;daily spend exceeded&lt;/li&gt;
&lt;li&gt;sudden spend spikes (slope alerts)&lt;/li&gt;
&lt;li&gt;high frequency of loop breaker events&lt;/li&gt;
&lt;li&gt;high fraction of runs hitting hard limits&lt;/li&gt;
&lt;/ul&gt;
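&lt;p&gt;A slope alert can start out very simple: compare spend in the current window with the previous one. The sketch below is one possible rule, with the multiplier and the noise floor as assumed tuning parameters; &lt;code&gt;spendSpike&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```go
package main

// spendSpike reports whether spend in the current window is at least
// factor times the previous window and also above a noise floor.
// Window size, factor, and floor are assumed tuning parameters.
func spendSpike(prevMicros, currMicros, factor, floorMicros int64) bool {
	if floorMicros > currMicros {
		return false // too small to page anyone about
	}
	return currMicros >= prevMicros*factor
}
```

&lt;p&gt;The floor matters for small tenants, where going from a few cents to a few more cents is a large ratio but not an incident.&lt;/p&gt;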
&lt;p&gt;AWS&amp;rsquo;s Well-Architected Cost Optimization pillar frames cost optimization as a continual process across the workload lifecycle. That mindset applies here too. [4]&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="a-production-checklist"&gt;A production checklist&lt;/h2&gt;
&lt;h3 id="budgets"&gt;Budgets&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Per-run cost and tool-call budgets exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Per-tenant daily caps exist.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Per-tool &amp;ldquo;expensive operation&amp;rdquo; caps exist.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="enforcement"&gt;Enforcement&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Soft limits degrade gracefully (cheaper models, narrower queries).&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Hard limits stop and request approval.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Circuit breakers detect loops/retry storms.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="telemetry"&gt;Telemetry&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Cost metrics emitted per run and per tenant.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Breaker events recorded and alertable.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="culture"&gt;Culture&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Cost management is a shared practice (FinOps), not a surprise invoice. [3]&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;p&gt;[1] Google SRE Workbook - Example Error Budget Policy: &lt;a href="https://sre.google/workbook/error-budget-policy/" target="_blank" rel="noopener noreferrer"&gt;https://sre.google/workbook/error-budget-policy/&lt;/a&gt;&lt;br&gt;
[2] Google SRE Book - Embracing Risk (error budgets as control mechanism): &lt;a href="https://sre.google/sre-book/embracing-risk/" target="_blank" rel="noopener noreferrer"&gt;https://sre.google/sre-book/embracing-risk/&lt;/a&gt;&lt;br&gt;
[3] FinOps Foundation - What is FinOps? (definition and principles): &lt;a href="https://www.finops.org/introduction/what-is-finops/" target="_blank" rel="noopener noreferrer"&gt;https://www.finops.org/introduction/what-is-finops/&lt;/a&gt;&lt;br&gt;
[4] AWS Well-Architected Framework - Cost Optimization pillar: &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/framework/cost-optimization.html" target="_blank" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/wellarchitected/latest/framework/cost-optimization.html&lt;/a&gt;
&lt;/p&gt;</content:encoded></item></channel></rss>