The last mile in AI takes a lot of effort to get right
"With AI you get what seems like an amazing outcome very quickly for the space that you're working in, but then when you actually are going to a production-grade version of it, you realize that there's a lot of cracks."
Key takeaways
- Start small with AI adoption. Don't expect magic on day one. Build up your skills and understanding gradually.
- Pair programming over replacement. AI works best when it enables human exploration. Don’t expect it to replace human judgement.
- Evaluation frameworks are critical. You can't improve what you can't measure. Good evals prevent you from fixing one problem while creating others.
- AI excels at pattern detection. Large-scale code migrations are a sweet spot where that pattern recognition really shines.
- Tribal knowledge lives outside documentation, often in Slack. Instead of forcing documentation, mine the conversations people are already having.
- Navigating brownfield with AI is hard. Existing, complex codebases remain a major challenge for AI tools.
About
I currently work as a Sr. Director of Engineering at Yelp, but my journey here is pretty unique. I spent about a decade building developer productivity tools, and then recently moved over to the consumer product side. This means I got to be both the person creating tools to make engineers more productive AND the person actually using those tools day-to-day.
My background spans software engineering and systems engineering, and I've been a big fan of Linux and systems in general since I was a teenager. This gives me a really practical, hands-on perspective on what actually works versus what sounds good in theory.
How would you define developer productivity?
Most people think developer productivity is just DevOps, but I see it differently. My team focused more on application infrastructure - things like building web frameworks and deployment platforms that help people push code to production quickly and seamlessly.
The whole idea was to centralize common functions so you could do more with fewer people. Instead of needing 500 engineers to support everything, maybe you could get away with 300 if you had some dedicated engineers building really good shared platforms for everyone else to use.
When should a company build a developer productivity team?
This is a question I get a lot, and my answer is pretty straightforward: don't do it too early. If you're still trying to find product-market fit, you definitely shouldn't be thinking about developer productivity teams yet.
I suggest starting to consider it around the 200-engineer mark, and even then, start small - maybe six engineers pulled from existing teams. The key question to ask yourself is whether you're slowing down because you're not scaling effectively as you launch your product.
"If you're a company that's pre-product-market fit, you shouldn't even be thinking about a developer productivity team."
What has been your experience with AI adoption?
I've seen AI adoption from two different angles - both in developer tools (which is like SaaS adopting AI for the product) and in consumer products. In both cases, I noticed the same pattern: you get amazing results really quickly, but then when you try to make it production-ready, you realize there are a lot of cracks.
The key insight I've learned is that the last mile takes a really long time to get right. It's not enough to have a cool demo - getting something that actually works reliably in production is much harder.
"You get what seems like an amazing outcome very quickly for the space that you're working in, but then when you actually are going to a production-grade version of it, you realize that there's a lot of cracks."
What's the biggest challenge in moving from demo to production?
In my experience, the main issue is evaluation frameworks. I call them "evals" - basically comprehensive test sets that benchmark how well your AI is performing across different scenarios.
Without good evals, you can improve one thing and accidentally make something else worse without even realizing it. So teams end up going in circles, fixing one problem but creating new ones.
"If you don't have a good framework for iterating with your evals, then you could be looping on things. You could improve one thing and then regress on something else that you may not realize."
How do cost considerations affect AI adoption?
This is where my experience with consumer products really helps. When you're thinking about millions of users hitting your AI system, cost becomes a huge factor. You might not be able to use the most expensive, best-performing model because it would be too costly at scale.
This creates a tricky optimization problem. As you try to reduce costs by using cheaper models, you might hurt performance in ways you don't expect. That's why having those evaluation frameworks becomes even more important.
"If I'm thinking about cost, I'm thinking about millions of users hitting that LLM, so maybe I can't use the most expensive model out there."
Where has AI worked really well for your teams?
One of the coolest applications I've seen is using AI for large-scale code migrations. My team calculated it would take four years of developer time to migrate from Flow to TypeScript. That's not four years sequentially - that's adding up all the time from all the teams.
AI turns out to be really good at pattern detection, which is exactly what you need for these big migrations. It can handle the fuzziness and variations that traditional automated tools miss, while still having validation loops to make sure the output is correct.
"We added up four years of developer time that it was gonna take to go from Flow to TypeScript. So then you start asking the question of, can I speed up this process? AI is a really good pattern detector."
How can AI help engineering teams capture tribal knowledge?
This is one of my most interesting discoveries. Every company has experienced people with lots of tribal knowledge, and the usual way to capture that is through documentation. But documentation is a lot of work, and honestly, most people don't want to do it.
I realized that Slack messages end up being a kind of living documentation. People ask the same questions over and over in Slack. So we started ingesting all our Slack messages into a data store and using that for AI-powered Q&A bots.
"The more experienced people that we have have this tribal knowledge and the way that we try to get that tribal knowledge out of those people is through documentation. But we know that documentation is a pretty heavy lift for most people."
How should we think about AI helping us during production incidents?
I see a lot of potential here, especially for helping new team members. When someone new joins a team, there's usually that first week where they're getting used to the systems and figuring out how on-call works.
AI could help by using pattern matching to suggest what might be wrong and help reduce that on-call burden, especially for infrastructure teams.
"That's a great thing about using AI in this space is that it can start to suggest what might be wrong and actually help cure some of that on-call burden for those infrastructure teams."
I see three phases in how we get information during incidents. The first was "more information, more data" - basically throwing everything at the problem. Then came "more information in less time" - getting the same amount of info, but faster.
The future I envision is "the right information at the right time" - not a flood of data, but the specific, relevant information you actually need. AI isn't quite there yet for giving perfect answers, so engineers need to be able to explore alongside the AI.
"Instead of a proliferation of information, it's getting down to a concise answer. In some cases AI isn't quite there. So if it gives the wrong answer, we need an opportunity to be able to explore beyond that point."
I'm pretty clear on this: AI should be a co-pilot, not a replacement. At least for now, engineers and SREs need to be able to explore alongside the AI to understand what's going on, rather than just trusting AI to give them the right answer.
"We probably need some sort of form for the engineer or the SRE to be able to explore alongside the AI to understand what's going on and use it as more of a co-pilot, rather than looking at AI as the place where you're gonna get your question answered."
What’s the biggest challenge with AI systems today?
I've been experimenting with AI coding tools and found that for simple tasks, they're fantastic. But for harder, more complex tasks, I often get stuck in loops where the AI keeps coming back with wrong answers.
I think this comes down to issues with planning (AI struggling with complex multi-step processes), context (not surfacing the right information at the right time), and general model capabilities.
"For the simple tasks, it's fantastic because I can just send off a simple task to it, do what it needs to do, even if it gets it wrong. But there were some tasks where they were a little bit harder, where I just kept on looping and it just kept on coming back with the wrong answer."
“Brownfield” is one of the biggest challenges AI faces in real companies. Most AI examples you see are for greenfield projects - starting from scratch. But the tough reality is dealing with brownfield projects - existing codebases with millions of lines of code.
In these situations, people have done things in ad hoc ways over the years. There are unexpected patterns, unobservable behaviors, and even if you could load the whole codebase into an AI system, you still need to know where to look.
"The toughest space is the Brownfield space where you have so much information, especially when you have multiple millions of lines of code. People do things in ad hoc ways, unexpected ways, there's unobservable things that are happening in your code."
How far should we lean into AI?
This, to me, is the fundamental question facing the industry: how far should we lean into AI? At one extreme, AI does everything and humans become obsolete. At the other, AI is a co-pilot that helps us do our jobs better, more efficiently, and maybe more enjoyably by taking away the boring stuff.
What’s your most memorable on-call experience?
I will share a story that probably resonates with a lot of engineers. I was the fourth escalation level for a consumer product incident, meaning several people had already failed to respond before it got to me. I woke up to handle the incident while others were still sleeping.
The stressful part wasn't just the technical challenge - it was having to quickly build context on systems I wasn't familiar with while also managing communication across the company during an outage. It's the kind of experience that makes you appreciate why we need better tools for incident response.
"You know, it always ends up being a stressful experience... probably want to do less of that going forward."