Projects | Roy Gabriel

Cruvero - AI Agent Ecosystem Platform

Thu, 12 Feb 2026 19:25:00 -0500

Summary

Cruvero is a production-grade AI agent orchestration platform I designed and built from the ground up in Go. It treats durability, observability, and operational control as infrastructure guarantees, not library afterthoughts.

Where frameworks like LangGraph bolt checkpointing onto a graph abstraction, Cruvero inverts the model: Temporal’s battle-tested workflow engine is the foundation, and the agent abstraction compiles down to it. The result is a platform where retry logic, failure recovery, human-in-the-loop approval, and multi-agent coordination aren’t library features; they’re infrastructure guarantees backed by the same technology that runs Uber’s and Stripe’s most critical workflows.

The system currently spans 90,000+ lines of Go and TypeScript, with a comprehensive React UI, Kubernetes deployment via Helm and ArgoCD, and an enterprise MCP gateway architecture designed to support 1,000+ concurrent agents across 150+ integrations.

The Problem

Every major agent framework optimizes for the same thing: time-to-demo. Spin up a LangGraph chain, wire a few tools, get a result in 30 seconds. Impressive on a slide. Catastrophic in production.

The failure modes are predictable. An agent workflow running for 40 minutes crashes mid-execution; state is gone. A tool call to an external API times out; the entire run fails with no recovery. A billing-sensitive agent hallucinates a $50,000 API call; no cost guardrails existed to stop it. An agent enters a reasoning loop, calling the same tool 15 times with near-identical arguments; nothing detects the degeneration.

These aren’t edge cases. They’re the baseline reality of running AI agents at enterprise scale. Cruvero was built to make them structurally impossible.

Architecture

Cruvero’s architecture is layered around a single principle: every agent action is a Temporal activity, and every workflow survives infrastructure failure by default.

Core Runtime: The agent loop follows a deterministic decide → act → observe → repeat state machine. Each cycle produces an immutable DecisionRecord with content-addressed hashes of the prompt, state, tool schemas, and model config. This gives you complete forensic capability: for any decision an agent made, you can see the exact inputs, replay the decision with a different model, or run counterfactual analysis (“what if it had chosen differently at step 4?”).
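As a rough illustration of the content-addressed record described above, the sketch below hashes each decision input separately with SHA-256. The `DecisionRecord` fields, `contentHash`, and `NewDecisionRecord` are hypothetical names chosen for this example, and JSON is assumed as the canonical encoding; the real schema may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// DecisionRecord is a hypothetical shape for one decide/act/observe cycle.
// Field names are illustrative, not Cruvero's actual schema.
type DecisionRecord struct {
	Step        int
	PromptHash  string
	StateHash   string
	SchemasHash string
	ConfigHash  string
}

// contentHash returns the hex SHA-256 of v's JSON encoding.
// encoding/json writes map keys in sorted order, so equal maps hash equally.
func contentHash(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// NewDecisionRecord hashes each input independently so a later comparison
// can tell which input changed between two decisions.
func NewDecisionRecord(step int, prompt string, state, schemas, config any) (DecisionRecord, error) {
	rec := DecisionRecord{Step: step}
	targets := []struct {
		dst *string
		src any
	}{
		{&rec.PromptHash, prompt},
		{&rec.StateHash, state},
		{&rec.SchemasHash, schemas},
		{&rec.ConfigHash, config},
	}
	for _, t := range targets {
		h, err := contentHash(t.src)
		if err != nil {
			return DecisionRecord{}, err
		}
		*t.dst = h
	}
	return rec, nil
}

func main() {
	rec, err := NewDecisionRecord(4, "Summarize the incident",
		map[string]int{"retries": 1},
		[]string{"search", "fetch"},
		map[string]string{"model": "example-model"})
	if err != nil {
		panic(err)
	}
	fmt.Println(rec.Step, rec.ConfigHash[:12])
}
```

Hashing the four inputs separately means a replay with a different model changes only `ConfigHash`, which makes "same prompt and state, different model" comparisons cheap.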

Durable Execution: Temporal manages all workflow state. Agent runs survive process crashes, worker restarts, and infrastructure failures transparently. Long-running workflows (minutes to hours) use continue-as-new with automatic state compaction. There is zero data loss on agent failure, guaranteed by Temporal’s event sourcing, not by application-level retry logic.

Multi-Agent Coordination: A first-class supervisor pattern supports seven coordination strategies: delegate, broadcast, debate, pipeline, map-reduce, voting, and saga with compensation. Agents communicate through signals, shared blackboard state, and pub/sub events. A supervisor can launch child agents, aggregate their results, and handle partial failures, all as durable Temporal workflows with full replay capability.

Graph DSL & Workflow Engine: A custom graph DSL compiles structured execution plans (steps, conditional routes, parallel branches, join semantics, subgraphs) into Temporal workflows. Join modes include all, any, N-of-M, and voting. The visual workflow builder (React Flow) provides bidirectional serialization between the visual canvas and the underlying graph definition.
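The join semantics can be pictured with a small predicate. This is a sketch only: `JoinMode`, `Join`, and `Satisfied` are hypothetical names, and the voting mode is left out because the text does not say how votes are tallied.

```go
package main

import "fmt"

// JoinMode selects when a join node may proceed (illustrative enum).
type JoinMode int

const (
	JoinAll JoinMode = iota
	JoinAny
	JoinNOfM
)

// Join describes a join node. N is only consulted for JoinNOfM.
type Join struct {
	Mode JoinMode
	N    int
}

// Satisfied reports whether the join can fire once `done` of `total`
// incoming branches have completed.
func (j Join) Satisfied(done, total int) bool {
	switch j.Mode {
	case JoinAll:
		return done >= total
	case JoinAny:
		return done >= 1
	case JoinNOfM:
		return done >= j.N
	default:
		return false
	}
}

func main() {
	quorum := Join{Mode: JoinNOfM, N: 2}
	fmt.Println(quorum.Satisfied(1, 3), quorum.Satisfied(2, 3))
}
```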

Neuro-Inspired Intelligence

This is a feature set I have not found in any other agent framework. Drawing from neuroscience and cognitive architecture research, this layer introduces eight subsystems that change how agents reason, learn, and self-correct. Seven of them are described below.

Metacognitive Monitoring: Modeled on prefrontal cortex performance monitoring. The system tracks tool call hashes, observation hashes, progress deltas, confidence entropy, and goal-drift scores (via embedding cosine similarity against the original prompt). When it detects degradation, such as repetition loops, stalled progress, drifting goals, or collapsing confidence, it triggers graduated backpressure: forced reflection, model escalation (swap to a more capable model mid-run), context reset, mandatory strategy pivots, or human escalation. No more agents spinning their wheels for 200 steps.
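Two of the signals above, repeated tool calls and goal drift, reduce to small primitives. The sketch below assumes exact-match hashing of tool name plus raw arguments (so it catches identical calls, not near-identical ones) and plain cosine similarity over embedding vectors. `LoopDetector`, `Observe`, and `Cosine` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math"
)

// callHash fingerprints a tool call by name and raw arguments.
func callHash(tool, args string) [32]byte {
	return sha256.Sum256([]byte(tool + "\x00" + args))
}

// LoopDetector counts identical tool calls within one run.
type LoopDetector struct {
	seen  map[[32]byte]int
	Limit int
}

func NewLoopDetector(limit int) *LoopDetector {
	return &LoopDetector{seen: make(map[[32]byte]int), Limit: limit}
}

// Observe records a call and reports whether the same call has now been
// seen Limit or more times.
func (d *LoopDetector) Observe(tool, args string) bool {
	h := callHash(tool, args)
	d.seen[h]++
	return d.seen[h] >= d.Limit
}

// Cosine returns the cosine similarity of a and b. It returns 0 when the
// lengths differ or either vector is all zeros.
func Cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	d := NewLoopDetector(3)
	for i := 0; i < 3; i++ {
		fmt.Println(d.Observe("search", `{"q":"status"}`))
	}
	// A drift score could be 1 minus the similarity to the original prompt.
	fmt.Printf("%.2f\n", 1-Cosine([]float64{1, 0}, []float64{1, 1}))
}
```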

Attention-Weighted Context Windows: Inspired by hippocampal memory replay. Instead of dumping context linearly into the prompt, a multi-factor salience scorer (relevance, recency, confidence, usage frequency) re-ranks all memory before assembly. A dynamic token budget allocator shifts allocation by task phase. Planning phases boost semantic/procedural memory, execution phases boost tool schemas, and review phases boost episodic memory. An interference detector flags contradictory facts explicitly in the prompt rather than letting the LLM silently pick one.
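A minimal sketch of multi-factor salience scoring, assuming a simple weighted sum over the four factors named above, each normalized to [0, 1]. The `Memory` and `Weights` types and the weight values are illustrative assumptions, not the actual scorer.

```go
package main

import (
	"fmt"
	"sort"
)

// Memory is one candidate item for the prompt. Factor values are assumed
// to be normalized to [0, 1] upstream.
type Memory struct {
	Text                                  string
	Relevance, Recency, Confidence, Usage float64
}

// Weights holds one coefficient per salience factor.
type Weights struct{ Relevance, Recency, Confidence, Usage float64 }

// Score is a plain weighted sum of the four factors.
func (w Weights) Score(m Memory) float64 {
	return w.Relevance*m.Relevance + w.Recency*m.Recency +
		w.Confidence*m.Confidence + w.Usage*m.Usage
}

// Rerank returns a copy of ms sorted by descending salience.
func Rerank(ms []Memory, w Weights) []Memory {
	out := append([]Memory(nil), ms...)
	sort.SliceStable(out, func(i, j int) bool {
		return w.Score(out[i]) > w.Score(out[j])
	})
	return out
}

func main() {
	w := Weights{Relevance: 0.5, Recency: 0.2, Confidence: 0.2, Usage: 0.1}
	ranked := Rerank([]Memory{
		{Text: "old but on-topic", Relevance: 0.9, Recency: 0.1, Confidence: 0.5},
		{Text: "recent but off-topic", Relevance: 0.2, Recency: 1, Confidence: 0.5, Usage: 1},
	}, w)
	for _, m := range ranked {
		fmt.Printf("%.2f %s\n", w.Score(m), m.Text)
	}
}
```

A phase-aware allocator could swap in a different `Weights` value per task phase before assembling the prompt.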

Temporal Reasoning: Deadline-aware execution with soft and hard deadlines, graduated pressure levels (relaxed through critical), automatic model switching under time pressure, and structured time context injection into every prompt.
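One way to picture graduated pressure is a function from elapsed time to a level. The sketch assumes four levels and a halfway-to-soft-deadline threshold. Only "relaxed" and "critical" come from the description above; the intermediate level names and thresholds are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// Pressure is an ordered scale from Relaxed to Critical.
type Pressure int

const (
	Relaxed Pressure = iota
	Elevated
	High
	Critical
)

// PressureAt maps elapsed run time onto a pressure level given a soft and
// a hard deadline, both measured from the start of the run.
func PressureAt(elapsed, soft, hard time.Duration) Pressure {
	switch {
	case elapsed >= hard:
		return Critical
	case elapsed >= soft:
		return High
	case elapsed >= soft/2:
		return Elevated
	default:
		return Relaxed
	}
}

func main() {
	level := PressureAt(12*time.Minute, 10*time.Minute, 15*time.Minute)
	// A runtime could switch to a faster model once level >= High.
	fmt.Println(level >= High)
}
```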

Agent Immune System: Anomaly signature tracking with automatic tool quarantine. When a tool’s behavior degrades or produces anomalous outputs, the immune system hashes the failure pattern, tracks hit counts, and quarantines the tool after a configurable threshold. A vaccination CLI injects procedural memory to teach agents how to work around quarantined capabilities.
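The quarantine mechanics can be sketched as a counter keyed by a failure signature. Here the signature is assumed to be a truncated SHA-256 of tool name plus a failure description. `ImmuneSystem`, `Signature`, and `Report` are hypothetical names, and the sketch is not safe for concurrent use.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ImmuneSystem tracks failure signatures and quarantines tools.
type ImmuneSystem struct {
	threshold   int
	hits        map[string]int    // failure signature -> hit count
	quarantined map[string]string // tool -> signature that triggered quarantine
}

func NewImmuneSystem(threshold int) *ImmuneSystem {
	return &ImmuneSystem{
		threshold:   threshold,
		hits:        make(map[string]int),
		quarantined: make(map[string]string),
	}
}

// Signature hashes a tool name and failure description into a short ID.
func Signature(tool, failure string) string {
	sum := sha256.Sum256([]byte(tool + "\x00" + failure))
	return hex.EncodeToString(sum[:8])
}

// Report records one anomalous result and returns true if the tool is
// quarantined after this report.
func (s *ImmuneSystem) Report(tool, failure string) bool {
	sig := Signature(tool, failure)
	s.hits[sig]++
	if s.hits[sig] >= s.threshold {
		s.quarantined[tool] = sig
	}
	return s.Quarantined(tool)
}

// Quarantined reports whether the tool is currently quarantined.
func (s *ImmuneSystem) Quarantined(tool string) bool {
	_, ok := s.quarantined[tool]
	return ok
}

func main() {
	im := NewImmuneSystem(2)
	im.Report("crm_lookup", "malformed JSON")
	fmt.Println(im.Report("crm_lookup", "malformed JSON"))
}
```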

Compositional Tool Synthesis: Meta-tools that chain multiple tool calls into atomic pipelines with pre/postcondition contracts, typed argument mapping, and enforcement of non-retryable errors on contract violations.
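A sketch of the contract idea: each step carries optional pre/postcondition checks, and a violation surfaces as a distinct error type that callers treat as non-retryable. All names here (`Step`, `ContractError`, `RunPipeline`, `RequireKey`, `IsRetryable`) are hypothetical, and typed argument mapping is reduced to passing one step's output map to the next.

```go
package main

import (
	"errors"
	"fmt"
)

// Args is the untyped payload passed between steps in this sketch.
type Args map[string]any

// Step is one tool call in a pipeline. Pre and Post may be nil.
type Step struct {
	Name string
	Pre  func(Args) error
	Run  func(Args) (Args, error)
	Post func(Args) error
}

// ContractError marks a violated pre/postcondition.
type ContractError struct {
	Step, Phase string
	Err         error
}

func (e *ContractError) Error() string { return e.Step + " " + e.Phase + ": " + e.Err.Error() }
func (e *ContractError) Unwrap() error { return e.Err }

// RunPipeline executes steps in order and stops at the first failure.
func RunPipeline(steps []Step, in Args) (Args, error) {
	cur := in
	for _, s := range steps {
		if s.Pre != nil {
			if err := s.Pre(cur); err != nil {
				return nil, &ContractError{Step: s.Name, Phase: "precondition", Err: err}
			}
		}
		out, err := s.Run(cur)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", s.Name, err)
		}
		if s.Post != nil {
			if err := s.Post(out); err != nil {
				return nil, &ContractError{Step: s.Name, Phase: "postcondition", Err: err}
			}
		}
		cur = out
	}
	return cur, nil
}

// RequireKey builds a contract check that a key is present.
func RequireKey(key string) func(Args) error {
	return func(a Args) error {
		if _, ok := a[key]; !ok {
			return fmt.Errorf("missing key %q", key)
		}
		return nil
	}
}

// IsRetryable is false for contract violations and true for other errors.
func IsRetryable(err error) bool {
	var ce *ContractError
	return err != nil && !errors.As(err, &ce)
}

func main() {
	lookup := Step{
		Name: "lookup",
		Pre:  RequireKey("id"),
		Run:  func(a Args) (Args, error) { return Args{"name": "example"}, nil },
		Post: RequireKey("name"),
	}
	_, err := RunPipeline([]Step{lookup}, Args{})
	fmt.Println(err, IsRetryable(err))
}
```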

Federated Trust & Delegation: Trust scoring for multi-agent delegation. Agents build trust through successful task completion; supervisors automatically select agents based on capability manifests and accumulated trust scores. Delegation chains provide full accountability tracking for post-mortem analysis.

Execution Provenance Graph: A tamper-evident DAG tracking every action, decision, and data dependency in an agent run. Supports ancestor/descendant queries, subgraph extraction, and run diffing to compare two executions and identify the exact point of divergence.
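Tamper evidence in a DAG is commonly achieved by deriving each node's ID from its content and its parents' IDs, so any change propagates to every descendant. The sketch below assumes that approach; `Graph`, `Add`, `Ancestors`, and `FirstDivergence` are hypothetical names, and run diffing is simplified to comparing two ordered lists of node IDs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Node is one recorded action with links to the nodes it depends on.
type Node struct {
	ID      string
	Action  string
	Parents []string
}

// Graph stores nodes by content-derived ID.
type Graph struct{ nodes map[string]Node }

func NewGraph() *Graph { return &Graph{nodes: make(map[string]Node)} }

// Add inserts a node whose ID is the SHA-256 of its action and parent IDs.
func (g *Graph) Add(action string, parents ...string) string {
	h := sha256.New()
	h.Write([]byte(action))
	for _, p := range parents {
		h.Write([]byte{0})
		h.Write([]byte(p))
	}
	id := hex.EncodeToString(h.Sum(nil))
	g.nodes[id] = Node{ID: id, Action: action, Parents: parents}
	return id
}

// Ancestors returns the sorted IDs of every node reachable via parent links.
func (g *Graph) Ancestors(id string) []string {
	seen := map[string]bool{}
	stack := append([]string(nil), g.nodes[id].Parents...)
	for len(stack) > 0 {
		cur := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[cur] {
			continue
		}
		seen[cur] = true
		stack = append(stack, g.nodes[cur].Parents...)
	}
	out := make([]string, 0, len(seen))
	for k := range seen {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// FirstDivergence returns the first index where two runs' node IDs differ,
// or -1 if the runs are identical.
func FirstDivergence(x, y []string) int {
	n := len(x)
	if len(y) < n {
		n = len(y)
	}
	for i := 0; i < n; i++ {
		if x[i] != y[i] {
			return i
		}
	}
	if len(x) != len(y) {
		return n
	}
	return -1
}

func main() {
	g := NewGraph()
	plan := g.Add("plan")
	call := g.Add("call:search", plan)
	obs := g.Add("observe", call)
	fmt.Println(len(g.Ancestors(obs)))
}
```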

Enterprise Governance

Cruvero’s enterprise hardening philosophy is “tenant isolation is a property of the architecture, not a feature.” Every boundary is enforced at the infrastructure layer.

Multi-Tenancy & Namespace Isolation: Temporal namespaces, Postgres row-level security, and network policies enforce tenant boundaries. Per-tenant model selection, tool access control, and resource quotas are infrastructure-level guarantees that cannot be bypassed by application code.

Rate Limiting, Quotas & Cost Guardrails: Per-decision cost tracking (estimated and actual) with configurable policies: max cost per run, max cost per step, prefer-cheaper-model flags. Budget enforcement halts runs before they exceed limits. A model catalog with pricing metadata enables real-time cost optimization across providers.
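Halting a run before it exceeds its limit implies checking the estimate before the step and recording the actual cost after it. A minimal sketch under that assumption, with hypothetical names (`Policy`, `Budget`, `Authorize`, `Record`) and costs in integer cents to avoid floating-point drift:

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrStepLimit = errors.New("estimated step cost exceeds per-step limit")
	ErrRunLimit  = errors.New("step would exceed per-run budget")
)

// Policy holds the configured limits for one run.
type Policy struct {
	MaxPerRunCents  int64
	MaxPerStepCents int64
}

// Budget tracks spend against a Policy. Not safe for concurrent use.
type Budget struct {
	policy     Policy
	spentCents int64
}

func NewBudget(p Policy) *Budget { return &Budget{policy: p} }

// Authorize checks an estimated cost before the step executes.
func (b *Budget) Authorize(estimateCents int64) error {
	if estimateCents > b.policy.MaxPerStepCents {
		return ErrStepLimit
	}
	if b.spentCents+estimateCents > b.policy.MaxPerRunCents {
		return ErrRunLimit
	}
	return nil
}

// Record adds the actual cost once the step has completed.
func (b *Budget) Record(actualCents int64) { b.spentCents += actualCents }

// Spent returns the total recorded so far.
func (b *Budget) Spent() int64 { return b.spentCents }

func main() {
	b := NewBudget(Policy{MaxPerRunCents: 100, MaxPerStepCents: 50})
	fmt.Println(b.Authorize(60)) // rejected before any money is spent
	if err := b.Authorize(40); err == nil {
		b.Record(38)
	}
	fmt.Println(b.Spent())
}
```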

Audit Logging & Compliance: Every tool call, LLM invocation, and state mutation is authenticated, authorized, and recorded in a tamper-evident audit trail. SOC 2-ready export formats. PII detection across five enforcement boundaries (audit, output, tool I/O, memory, events) with 12 PII types, unified secret detection, Shannon entropy analysis, HMAC-based stable tokenization, and a risk scoring engine.
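Two of the building blocks named above are small enough to sketch with the standard library: HMAC-based tokenization (the same value under the same key always maps to the same token, so redacted logs stay joinable) and Shannon entropy as a secret-detection signal. `Tokenize`, `ShannonEntropy`, the token format, and any entropy threshold are assumptions for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math"
)

// Tokenize replaces a sensitive value with a stable, keyed token.
// kind labels the PII type (for example "email") and is mixed into the MAC.
func Tokenize(key []byte, kind, value string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(kind))
	mac.Write([]byte{0})
	mac.Write([]byte(value))
	return "<" + kind + ":" + hex.EncodeToString(mac.Sum(nil))[:16] + ">"
}

// ShannonEntropy returns the entropy of s in bits per byte.
func ShannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	var h float64
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	key := []byte("demo-key-not-for-production")
	fmt.Println(Tokenize(key, "email", "user@example.com"))
	// Random-looking strings score higher than natural-language words.
	fmt.Printf("%.2f %.2f\n", ShannonEntropy("password"), ShannonEntropy("q8Zr1LxV0pTn"))
}
```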

Security Hardening: OWASP Top 10 mitigations, RBAC with four role levels (Viewer, Editor, Admin, Super Admin), OIDC authentication, CSRF protection, input sanitization, and CSP headers.

Tool Ecosystem & MCP Integration

Semantic Tool Discovery: A three-stage pipeline (keyword search → embedding similarity → quality-weighted reranking) selects tools dynamically rather than dumping all tool schemas into every prompt. Tool quality tracking quarantines degraded tools automatically.
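The three stages can be sketched as filter, score, and rerank. This example assumes substring matching for the keyword stage, cosine similarity for the embedding stage, and similarity multiplied by a quality score in [0, 1] for the final rerank. `Tool`, `Discover`, and the scoring formula are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// Tool is a registry entry with a precomputed embedding and quality score.
type Tool struct {
	Name, Description string
	Embedding         []float64
	Quality           float64 // assumed to be in [0, 1]
}

func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Discover returns up to k tools for the query.
func Discover(tools []Tool, query string, queryVec []float64, k int) []Tool {
	terms := strings.Fields(strings.ToLower(query))
	type scored struct {
		tool  Tool
		score float64
	}
	var cands []scored
	for _, t := range tools {
		// Stage 1: keep tools whose name or description mentions a query term.
		text := strings.ToLower(t.Name + " " + t.Description)
		match := false
		for _, term := range terms {
			if strings.Contains(text, term) {
				match = true
				break
			}
		}
		if !match {
			continue
		}
		// Stage 2: embedding similarity. Stage 3: weight by tool quality.
		cands = append(cands, scored{t, cosine(t.Embedding, queryVec) * t.Quality})
	}
	sort.SliceStable(cands, func(i, j int) bool { return cands[i].score > cands[j].score })
	if k < 0 {
		k = 0
	}
	if k > len(cands) {
		k = len(cands)
	}
	out := make([]Tool, 0, k)
	for _, c := range cands[:k] {
		out = append(out, c.tool)
	}
	return out
}

func main() {
	tools := []Tool{
		{"github_search", "search GitHub code", []float64{1, 0}, 0.9},
		{"slack_post", "post a message", []float64{1, 0}, 1.0},
	}
	for _, t := range Discover(tools, "github issues", []float64{1, 0}, 3) {
		fmt.Println(t.Name)
	}
}
```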

MCP Protocol: 150+ Model Context Protocol integrations (Notion, GitHub, AWS, Azure, O365, ServiceNow, Slack, and more) with standardized tool interfaces. The current architecture uses stdio subprocesses; the enterprise target architecture introduces a gateway-mediated Streamable HTTP model with per-integration scaling, Dragonfly response caching, circuit breakers, Vault-backed credential isolation, and KEDA autoscaling, designed for 1,000+ concurrent agents.

Event-Driven Architecture: NATS provides async event fan-out alongside Temporal’s durable execution. MCP server lifecycle management, embedding pipeline intake, audit/telemetry buffering, and external consumer subscriptions (Teams/Telegram bots, dashboards, webhook relays) all flow through NATS without ever entering the deterministic workflow path.

Observability & Operations

Distributed Tracing: OpenTelemetry spans per decision cycle, tool call, memory operation, and MCP invocation. Full correlation IDs from workflow entry through every activity.

Structured Logging: Zap-based structured logging with per-tenant, per-run, and per-step context propagation.

Production API: RESTful API with automatic OpenAPI 3.1 documentation, SSE streaming for live run updates, and comprehensive endpoints for run management, approval workflows, replay, tracing, cost queries, and tool management.

React Operational UI: A full-featured React 18 / TypeScript interface replacing the original htmx console. Surfaces every runtime capability: run management with live SSE streaming, approval queues, replay console with counterfactual analysis, causal trace explorer, tool registry browser, memory explorer with salience scores, cost dashboards (ECharts), supervisor multi-agent visualization, visual workflow builder (React Flow), live workflow inspection, speculative execution, and differential model testing.

Kubernetes Deployment: Helm chart with environment-aware value overlays, ArgoCD ApplicationSet for GitOps promotion (dev/staging/prod), ServiceMonitor templates, and ingress configuration.

Key Decisions

Go over Python: Single-binary deploys, predictable latency, deterministic resource usage, and a strong concurrency model for managing hundreds of concurrent agent sessions. No GIL, no dependency hell, no runtime surprises.

Temporal over custom durability: Rather than implementing checkpointing, retry logic, and state recovery as library features, Cruvero delegates all of it to Temporal’s battle-tested workflow engine. This is the same infrastructure that runs mission-critical systems at companies processing millions of transactions per day.

Neuroscience-grounded intelligence: The cognitive architecture isn’t marketing. Each subsystem maps to a specific neuroscience principle (prefrontal monitoring, hippocampal salience, temporal reasoning, immune response). The result is agents that self-correct, learn from failures, and degrade gracefully, capabilities no other framework offers.

Context management as a competitive advantage: Most frameworks dump everything into the context window and pray. Cruvero’s context pipeline includes phase-aware budget allocation, five-component salience scoring, semantic tool search, interference detection, observation masking, and proactive compression triggers. The competitive analysis shows clear advantages over LangChain/LangGraph across every dimension.

Outcome

Cruvero runs production agent workloads with infrastructure-grade reliability guarantees. The platform handles long-running workflows (minutes to hours), survives arbitrary infrastructure failures without data loss, enforces per-tenant cost and security policies, and provides complete observability from workflow entry through every LLM decision and tool call.

The codebase represents 90,000+ lines of production code, 80%+ test coverage, comprehensive documentation published via Hugo, and a development methodology designed for systematic LLM-assisted engineering at scale.

Stack

Go · Temporal · PostgreSQL · NATS · React 18 · TypeScript · Vite · React Flow · ECharts · Tailwind CSS · Kubernetes · Helm · ArgoCD · Qdrant · Dragonfly · Ollama · OpenTelemetry · Zap · Keycloak · Docker

Go MCP Server Ecosystem

Sun, 01 Sep 2024 00:00:00 +0000

Summary

This project is a growing ecosystem of Model Context Protocol (MCP) servers written in Go. Each server wraps a real service (calendar, email, task management, knowledge base, etc.) and exposes it as a typed, tool-based interface for MCP clients (e.g., Claude Desktop / Claude Code). [1][2]

The theme is simple: agents are only as useful as the tools they can call, and “tooling” needs the same production bar as any other integration layer: security boundaries, backpressure, observability, and predictable failure modes.

Open-source MCP servers

iCloud Calendar MCP Server (CalDAV): list calendars, search events, create/update/delete events; includes recurring event expansion, multi-account support, rate limiting, retries, audit logs, and Prometheus/health endpoints. [4]
iCloud Email MCP Server (IMAP/SMTP): search and read mail, send/reply, manage folders, handle attachments, and apply safety annotations (read-only vs destructive) with strict input validation. [5]
Todoist MCP Server (REST API v2 + Sync batching): manage tasks/projects/labels/comments; supports bulk operations with rate-limit-aware batching and Todoist filter syntax. [6]
Notion MCP Server (Notion REST API): pages, databases, blocks, comments, users; includes templates, exports (Markdown/CSV), smart queries, and built-in throttling/retries. [7]

Private / not-yet-open-sourced connectors

I’ve also built MCP connectors for enterprise systems that aren’t ready to open-source yet (due to org-specific assumptions, credentials, or hard-coded domain models):

Kubernetes
Argo CD
SonarQube
GitHub
Temporal
OpenText Octane

(These follow the same design patterns described below.)

Problem

Agents need to interact with real systems: calendars, email, task systems, and internal developer platforms. Without a standard interface, every tool integration becomes a one-off, and reliability/guardrails drift between projects.

MCP solves the “standard interface” problem by defining how a host/client can discover and call server-exposed tools over a consistent protocol. [1][2] This ecosystem focuses on solving the remaining hard part: making those integrations production-grade.

Constraints

Local-first security boundary: credentials live on the host where the server runs (env vars, secret mounts, keychain tooling); the server talks directly to the upstream service with no proxy SaaS. [4][5]
Safety by design: explicit tool schemas, input validation, and tool classification (read-only vs mutating) so clients can apply guardrails. [4][5]
Fast & predictable: low startup time and bounded tool-call latency (timeouts + backpressure). [4][5][7]
Operable like a real service: logs that correlate per request, rate limiting, retries/backoff where appropriate, and health/metrics where it matters. [4][6][7]
Portable distribution: ship as single Go binaries (and containers where useful), so the “tool layer” is easy to deploy alongside agents. [4][5]
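To make the read-only vs mutating classification concrete, here is a sketch of how a client might act on it. The JSON field names `readOnlyHint` and `destructiveHint` follow the MCP tool annotations; the `ToolInfo` struct, the `Decide` policy, and the tool names are hypothetical, and the spec's default values for omitted hints are not modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotations mirrors two of the MCP tool annotation hints.
type Annotations struct {
	ReadOnlyHint    bool `json:"readOnlyHint"`
	DestructiveHint bool `json:"destructiveHint"`
}

// ToolInfo is a trimmed-down tool descriptor for this sketch.
type ToolInfo struct {
	Name        string      `json:"name"`
	Annotations Annotations `json:"annotations"`
}

// Action is what a client does before invoking a tool.
type Action int

const (
	Allow Action = iota
	Confirm
)

// Decide is a conservative client-side policy: only tools that are marked
// read-only and not destructive run without confirmation.
func Decide(t ToolInfo) Action {
	if t.Annotations.ReadOnlyHint && !t.Annotations.DestructiveHint {
		return Allow
	}
	return Confirm
}

func main() {
	tools := []ToolInfo{
		{Name: "search_events", Annotations: Annotations{ReadOnlyHint: true}},
		{Name: "delete_event", Annotations: Annotations{DestructiveHint: true}},
	}
	for _, t := range tools {
		b, err := json.Marshal(t)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(b), Decide(t) == Confirm)
	}
}
```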

Architecture

At a high level, every server follows the same pattern:

The MCP client (hosted by Claude or an agent runtime) communicates with the server, typically over the stdio transport.
The server validates inputs, applies middleware (timeouts, logging, rate limits), and calls the upstream API/protocol.
Results are mapped into safe, typed tool outputs (and errors are normalized for the client).

┌───────────────────────────┐         MCP (tools)          ┌────────────────────────────┐
│ Claude Desktop / Code     │ ───────────────────────────▶ │ mcp-<service> (Go binary)  │
│ (MCP host + client)       │ ◀─────────────────────────── │ - tool schemas + handlers  │
└───────────────────────────┘       JSON-RPC/session       │ - auth + validation        │
                                                           │ - rate limit + retries     │
                                                           └───────────┬────────────────┘
                                                                       │
                                                                       │ service protocol / API
                                                                       ▼
                                                       ┌───────────────────────────────┐
                                                       │ iCloud / Todoist / Notion ... │
                                                       └───────────────────────────────┘
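The middleware step in the request path above can be sketched as a handler wrapper. This is not the mark3labs/mcp-go API: the `Handler` type is a simplified stand-in, and `WithTimeout`, `Sleepy`, and `Call` are hypothetical helpers written for this example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Handler is a simplified tool handler signature for this sketch.
type Handler func(ctx context.Context, args map[string]any) (string, error)

// WithTimeout bounds every call to next. The goroutine plus buffered channel
// lets the wrapper return on time even if next ignores cancellation.
func WithTimeout(d time.Duration, next Handler) Handler {
	return func(ctx context.Context, args map[string]any) (string, error) {
		ctx, cancel := context.WithTimeout(ctx, d)
		defer cancel()
		type result struct {
			out string
			err error
		}
		ch := make(chan result, 1)
		go func() {
			out, err := next(ctx, args)
			ch <- result{out, err}
		}()
		select {
		case r := <-ch:
			return r.out, r.err
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// Sleepy simulates an upstream call with the given latency.
func Sleepy(latency time.Duration) Handler {
	return func(ctx context.Context, _ map[string]any) (string, error) {
		select {
		case <-time.After(latency):
			return "done", nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// Call invokes a handler with a background context.
func Call(h Handler, args map[string]any) (string, error) {
	return h(context.Background(), args)
}

func main() {
	slow := WithTimeout(50*time.Millisecond, Sleepy(time.Second))
	_, err := Call(slow, nil)
	fmt.Println(err)
}
```

Logging and rate-limit wrappers can share the same `func(Handler) Handler` shape, so they compose by nesting.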

Cross-cutting “production traits”

Instead of building one-off scripts, these servers implement common production patterns:

Timeout middleware on every tool call (so agents don’t hang forever). [4][5][7]
Request correlation IDs and structured logs (debuggable across multi-step agent runs). [4][5]
Rate limiting + backoff when upstream services throttle (e.g., iCloud and Notion). [4][7]
Bulk operation strategies that reduce API calls (e.g., Todoist Sync API batching for bulk changes). [6]
Health + metrics endpoints where running in containers makes sense (notably the iCloud Calendar server). [4]
Automated CI (race detector, linting, vulnerability checks) to keep “tool servers” from becoming unreviewed glue. [4][5]
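The throttling pattern in the list above can be sketched as capped exponential backoff around a retry loop. `Backoff`, `Retry`, and `ErrThrottled` are hypothetical names; a real server would map upstream responses (for example an HTTP 429) onto the throttled error and would likely add jitter.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrThrottled is the sentinel this sketch uses for upstream throttling.
var ErrThrottled = errors.New("upstream throttled")

// Backoff returns the delay before retry number `attempt` (0-based):
// base doubled `attempt` times, capped at max.
func Backoff(base, max time.Duration, attempt int) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

// Retry calls op up to `attempts` times (attempts should be at least 1).
// It retries only on ErrThrottled and stops early if ctx is cancelled.
func Retry(ctx context.Context, attempts int, base, max time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if err == nil || !errors.Is(err, ErrThrottled) {
			return err
		}
		if i == attempts-1 {
			break
		}
		select {
		case <-time.After(Backoff(base, max, i)):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}

func main() {
	calls := 0
	err := Retry(context.Background(), 4, 10*time.Millisecond, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return ErrThrottled
		}
		return nil
	})
	fmt.Println(calls, err)
}
```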

Key decisions

Go for tool servers: predictable concurrency, easy cross-platform builds, and the “single static-ish binary” deployment model fits MCP servers well, especially when they’re launched per-session or run as small sidecars. [4][5]
Independent binaries per integration: calendar ≠ email ≠ tasks. Separate processes isolate failures, limit blast radius, and make upgrades/rollbacks straightforward.
Local-first auth: app-specific passwords (iCloud), API tokens (Todoist), integration tokens (Notion). The servers are designed so secrets stay on your machine / in your cluster secrets manager, not copied into prompts. [4][5][6][7]
Use an MCP SDK, focus on semantics: the implementations use a community Go MCP SDK (mark3labs/mcp-go) so most effort goes into tool behavior, validation, and safety. [8][9]

Outcome

This ecosystem has produced multiple MCP servers that are:

useful (real workflows: schedule management, inbox operations, task execution, knowledge base automation),
operationally hardened (timeouts, retries, rate limits, observability),
portable (binaries + releases for easy distribution),
and structured enough to be safe (typed schemas, validation, tool annotations).

Concrete examples from the current repos:

The iCloud Calendar server exposes 5 tools, supports multiple accounts, and includes health + Prometheus metrics, audit logging without PII, retries/backoff, and rate limiting. [4]
The iCloud Email server exposes 14 tools and includes thread-safe IMAP access, request correlation IDs, strict validation, and “read-only vs destructive” tool annotations. [5]
Tagged releases exist across the servers (e.g., iCloud Calendar v1.1.0, iCloud Email v0.6.0, Todoist v1.0.0, Notion v0.8.0 published on Feb 7, 2026). [4][5][6][7]

Stack

Go, MCP, mark3labs/mcp-go, CalDAV, IMAP/SMTP, REST APIs (Todoist/Notion), Docker (distroless where applicable), Prometheus metrics (where applicable).

References

[1] Model Context Protocol (MCP): Specification (Protocol Revision 2025-11-25). https://modelcontextprotocol.io/specification/2025-11-25
[2] Model Context Protocol (MCP): Architecture (Protocol Revision 2025-06-18). https://modelcontextprotocol.io/specification/2025-06-18/architecture
[3] Roy Gabriel: “Go MCP Server Ecosystem” (original portfolio page). https://www.roygabriel.dev/projects/mcp-servers/
[4] GitHub: roygabriel/mcp-icloud-calendar. https://github.com/roygabriel/mcp-icloud-calendar
[5] GitHub: roygabriel/mcp-icloud-email. https://github.com/roygabriel/mcp-icloud-email
[6] GitHub: roygabriel/mcp-todoist. https://github.com/roygabriel/mcp-todoist
[7] GitHub: roygabriel/mcp-notion. https://github.com/roygabriel/mcp-notion
[8] GitHub: mark3labs/mcp-go. https://github.com/mark3labs/mcp-go
[9] go.mod (module dependencies) for the MCP servers (e.g., mark3labs/mcp-go used in this ecosystem).
 - https://raw.githubusercontent.com/roygabriel/mcp-icloud-calendar/main/go.mod
 - https://raw.githubusercontent.com/roygabriel/mcp-icloud-email/main/go.mod
 - https://raw.githubusercontent.com/roygabriel/mcp-todoist/main/go.mod
 - https://raw.githubusercontent.com/roygabriel/mcp-notion/main/go.mod