At AWS re:Invent 2025, one theme was impossible to miss: AI for prod is breaking out as its own category, not a side note to AI for coding.
From the moment the doors opened, our booth was wrapped in lines of engineers and executives asking one question: how do we use AI to actually run production systems, not just write more code?
For us, it confirmed what we see in the field: AI for prod is becoming one of the fastest-forming categories in enterprise software because running production systems is still brutally hard.
AI coding assistants are now table stakes. As Coinbase's Angelo Marletta put it in his re:Invent session, “writing code is no longer the bottleneck.”
At scale, the bottleneck is everything that happens after code is written: resolving incidents, optimizing cost, managing multi-cloud environments, enforcing controls, and shipping changes safely. Engineers spend only a fraction of their time actually writing code. The rest goes into managing production, chasing incidents, reconciling data across tools, and aligning on risk. As models generate more code, this production toil gets worse unless you attack it directly.
That is the gap AI for prod is starting to close.
We ran two technical sessions that demonstrated how Resolve AI works in real-world environments. The talks used AI SRE as the entry point, but what resonated was broader: an AI system that can reason across the full production surface, from incidents and reliability to efficiency and change management.
We focused on three ideas.
First, Resolve AI integrates with your existing stack: AWS, Kubernetes, Datadog, Grafana, Git, ticketing systems, and wikis. It builds a continuously updated knowledge graph of your system, mapping services, data flows, infrastructure, and how failures and changes propagate.
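To make that concrete, here is a minimal sketch of the kind of structure a dependency-aware knowledge graph captures. The services and edges below are invented for illustration; Resolve AI's internal model is richer and built automatically from your integrations.

```python
from collections import deque

# Illustrative only: a tiny dependency graph of services.
# Edges point from a service to the services that depend on it,
# so a walk from a node approximates the blast radius of its failure.
DEPENDENTS = {
    "postgres-primary": ["orders-api", "billing-worker"],
    "orders-api": ["checkout-web", "mobile-gateway"],
    "billing-worker": ["invoice-service"],
    "checkout-web": [],
    "mobile-gateway": [],
    "invoice-service": [],
}

def blast_radius(failing_service: str) -> list[str]:
    """Return every service that could be impacted if `failing_service` degrades."""
    impacted: list[str] = []
    seen: set[str] = set()
    queue = deque(DEPENDENTS.get(failing_service, []))
    while queue:
        svc = queue.popleft()
        if svc in seen:
            continue
        seen.add(svc)
        impacted.append(svc)
        queue.extend(DEPENDENTS.get(svc, []))
    return impacted

print(blast_radius("postgres-primary"))
# ['orders-api', 'billing-worker', 'checkout-web', 'mobile-gateway', 'invoice-service']
```

Walking the graph like this is what turns "the database is unhealthy" into a concrete list of services and customer-facing paths that are about to hurt.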
In one session, Mayank walked through a real Postgres deadlock in our own production. Resolve AI validated the alert, refined log queries, navigated the dependency graph, correlated behavior with a recent scaling operation, and traced the problem to a specific batch update path with non-deterministic lock ordering in code.
By the time the human on call acknowledged the alert, Resolve AI had already produced a detailed root cause analysis and suggested fixes. The same understanding also powers impact analysis for risky deployments and identifies high-leverage cost optimizations.
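The bug class from that investigation is worth a small sketch. The code below is hypothetical, not our production code: it assumes an accounts table and a psycopg2 connection, and shows the general pattern, where two workers updating overlapping rows in different orders can deadlock, while sorting the batch by primary key gives every worker the same lock acquisition order.

```python
import psycopg2

def apply_batch_updates(dsn: str, updates: dict[int, float]) -> None:
    """Apply balance deltas for many accounts in a single transaction.

    Iterating `updates` in arbitrary order is deadlock-prone: two concurrent
    batches can lock the same rows in opposite orders. Sorting by primary key
    makes lock acquisition deterministic across workers.
    """
    conn = psycopg2.connect(dsn)
    try:
        with conn, conn.cursor() as cur:  # commits on success, rolls back on error
            for account_id in sorted(updates):  # deterministic lock ordering
                cur.execute(
                    "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                    (updates[account_id], account_id),
                )
    finally:
        conn.close()
```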
Second, real production work touches logs, metrics, traces, infra events, deployment history, cost data, tickets, and code. Different teams own each layer.
Resolve AI uses specialized agents that work in parallel, coordinated by a central planner. Metrics agents explore dashboards; log agents iterate on queries; infra agents inspect topology and capacity; code agents read diffs and blame; knowledge agents pull in runbooks and learnings from past incidents and human interactions.
The planner decides what to do next based on evidence, not hard-coded playbooks. For incident resolution, that means faster root cause identification, fewer cascading war rooms, and fewer people on the bridge. In change and cost workflows, it means automatically checking blast radius, capacity headroom, and historical behavior before you ship or turn a knob.
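As a rough sketch of that orchestration pattern (the agents, their findings, and the stopping rule below are all illustrative, not Resolve AI's implementation), a central planner fans work out to specialized agents in parallel and decides the next step from the evidence they return.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative specialized agents; each returns a list of findings (evidence).
def metrics_agent(alert):
    return [f"latency spike on {alert['service']} starting {alert['time']}"]

def logs_agent(alert):
    return [f"error rate climbing in {alert['service']} logs after {alert['time']}"]

def infra_agent(alert):
    return [f"node pool for {alert['service']} scaled down 10 minutes earlier"]

AGENTS = [metrics_agent, logs_agent, infra_agent]

def investigate(alert: dict, max_rounds: int = 3) -> list[str]:
    """Toy planner loop: gather evidence in parallel, then decide what to do next."""
    evidence: list[str] = []
    for _ in range(max_rounds):
        with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
            for findings in pool.map(lambda agent: agent(alert), AGENTS):
                evidence.extend(findings)
        # A real planner would use the model to pick the next tools and queries;
        # this toy stops as soon as infra evidence suggests a likely cause.
        if any("scaled down" in finding for finding in evidence):
            break
    return evidence

print(investigate({"service": "orders-api", "time": "14:02 UTC"}))
```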
Third, production knowledge mostly lives in people and Slack, not in perfect runbooks. Resolve AI learns the way a new engineer does: it watches which dashboards get used and how queries are refined, and it incorporates feedback when humans confirm or correct its findings.
It quickly refines its approach to stop making the same mistakes. It applies patterns that actually work in your environment, whether the task is debugging a regression or suggesting a safer rollout plan.
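One way to picture that loop, purely as a toy and not how Resolve AI actually stores what it learns, is a memory of investigation tactics whose rankings shift as humans confirm or correct findings.

```python
from collections import defaultdict

class TacticMemory:
    """Toy feedback loop: tactics that humans confirm get tried earlier next time."""

    def __init__(self) -> None:
        self.scores: dict[str, float] = defaultdict(float)

    def record_feedback(self, tactic: str, confirmed: bool) -> None:
        # Confirmed findings reinforce a tactic; corrections push it down.
        self.scores[tactic] += 1.0 if confirmed else -1.0

    def ranked_tactics(self) -> list[str]:
        return sorted(self.scores, key=self.scores.get, reverse=True)

memory = TacticMemory()
memory.record_feedback("check recent deploys before dashboards", confirmed=True)
memory.record_feedback("grep all logs for 'error'", confirmed=False)
print(memory.ranked_tactics())
# ['check recent deploys before dashboards', "grep all logs for 'error'"]
```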
From building and deploying Resolve AI with customers, a few lessons emerged.
It is not just a model problem. You need architecture that encodes how production actually works, including dependencies, tools, environments, and organizational boundaries.
Tool use is hard. Every logging, metrics, tracing, ticketing, and change system has its own query language and quirks. Agents must learn to operate these tools like experts, not just call generic APIs.
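A tiny illustration of the point, with made-up adapter classes and metric names: the same question has to be phrased in each backend's own query language, and the quirks (like wrapping Prometheus counters in rate()) have to live somewhere.

```python
class PrometheusAdapter:
    """Knows PromQL idioms, e.g. counters must be wrapped in rate()."""

    def request_rate_query(self, service: str, window: str = "5m") -> str:
        return f'rate(http_requests_total{{service="{service}"}}[{window}])'

class DatadogAdapter:
    """Knows Datadog's metric query syntax and tag conventions."""

    def request_rate_query(self, service: str) -> str:
        return f"sum:http.requests{{service:{service}}}.as_rate()"

# The same intent, "how busy is orders-api", becomes two very different queries.
for adapter in (PrometheusAdapter(), DatadogAdapter()):
    print(adapter.request_rate_query("orders-api"))
```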
Context windows do not save you. Production data is effectively infinite. Intelligence comes from starting narrow and pulling the right data at the right time, whether you are debugging a spike or evaluating rollout risk.
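A minimal sketch of the "start narrow" idea, with a hypothetical query_logs helper standing in for a real log backend: begin with a tight window around the alert and widen only when the evidence is inconclusive.

```python
from datetime import datetime, timedelta

def query_logs(service: str, start: datetime, end: datetime, pattern: str) -> list[str]:
    """Hypothetical stand-in for a call to your actual log store."""
    return []  # imagine this runs a real query and returns matching lines

def narrow_then_widen(service: str, alert_time: datetime, pattern: str = "ERROR"):
    """Start with a 5-minute window around the alert; widen only if nothing is found."""
    for minutes in (5, 30, 120):
        window = timedelta(minutes=minutes)
        hits = query_logs(service, alert_time - window, alert_time + window, pattern)
        if hits:
            return minutes, hits  # enough signal to reason about; stop pulling data
    return None, []
```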
Evals are hard because each customer environment is different. There is no standard dataset for AI in production. Every app and tool stack is unique, so you have to evaluate an AI for prod system on your own messy incidents and workflows before you can trust it.
Security is non-negotiable. The system needs to sit next to your telemetry and tools, respect least privilege, and avoid exfiltrating sensitive data.
Our north star is simple: Resolve AI should feel like an always-on, senior production engineer that knows your environment as well as your best people combined, and never gets tired.
We plug into your existing stack, build and maintain a rich model of your system, orchestrate specialized agents across logs, metrics, traces, infra, cost, code, and knowledge, and learn from every interaction.
We make investigations and reviews faster, more rigorous, and less painful. As we quickly learn your environment, the system becomes more effective, approaching the level of your most seasoned engineers across the production lifecycle.
The highlight of re:Invent was hearing customers describe what Resolve AI is doing across reliability, operations, and day-to-day engineering workflows.
Zscaler runs a global zero-trust security cloud that stops billions of threats per day across hundreds of thousands of systems. Each month, Chris, a lead SRE, deals with about 150,000 alerts and around 120 incidents, and a single incident often pulls 30 or more engineers onto a bridge.
His bar for AI is simple: “I do not need more numbers or more data. What I need is a root cause.”
Chris described why they chose Resolve AI. It navigates both standard observability stacks and bespoke internal tools, runs iterative log investigations, and works in parallel while humans focus on decisions. It also helps them see where human time is being spent in operations and where automation can safely take over.
Zscaler's immediate goals: cut time to resolve mid-severity incidents from about an hour to roughly 15 minutes, reduce the number of people paged by at least 30 percent, and free up engineering time for higher-leverage work. Early results were strong enough that Zscaler is now using Resolve AI in production for autonomous alert investigations and on-demand debugging. Watch the talk.
Coinbase runs a highly regulated platform serving more than 120 million users and handling tens of thousands of crypto assets across thousands of services.
To validate AI for prod, the Coinbase team started with the hardest measurable workflow: real incidents. They trained Resolve AI on past incidents using Kubernetes and Datadog, added limited operational knowledge and team practices, and compared its output with what actually happened and with other third-party solutions.
Resolve AI was the only solution that exceeded its strict baseline accuracy requirement for identifying the true root cause. When confidence scores were low, the blocker was almost always missing data, not flawed reasoning or execution.
Angelo shared two examples: a poorly tuned load test that Resolve AI traced through the dependency graph in minutes, and an out-of-memory error in a sidecar that Resolve AI found and linked to cascading failures in about 12 minutes, a 73% faster time to root cause than their traditional methods, without an army of engineers on the bridge. The same capabilities are now being extended to give developers production context directly in chat when planning changes or diagnosing regressions, with a longer-term goal of applying the system across more of the production lifecycle. Watch the talk.
Four patterns stood out.
The category is real, and demand is ahead of the supply of mature products.
Spiros, our Founder and CEO, shared a view we strongly believe in: the productivity gains from AI in production will exceed those from AI in coding. Agents will work around the clock, taking on longer, more complex tasks, while humans operate at a higher level of abstraction, with better context and greater leverage.
If the energy at re:Invent is any indication, the market is ready. Teams are tired of living in incident channels and treating production work as a tax on innovation. They want to ship faster with confidence.
AI for prod is how we get there.
You can try Resolve AI today with a free preview of our self-service product, including a ready-to-use sandbox and the ability to connect your own tools, so you can see how much faster and easier your production workflows can be.
Special thanks to Angelo from Coinbase and Chris from Zscaler for sharing their journeys on stage, and to everyone who stopped by our booth. The demand at re:Invent fuels us as we keep building.
