Don’t expect AI to do magic on large, abstract tasks
"Don't expect these things to do magic on a very abstract task out of the box. Think of them as senior engineers, but I think today they are more like interns. So think of allocating tasks in a very defined form."
Key takeaways
- Business value must drive AI adoption, not the other way around. Successful AI implementation requires clear business metrics and ROI rather than technology enthusiasm alone.
- Protect customer trust above all. Never risk customer relationships for AI experimentation. Internal testing provides a safe learning environment.
- When you think of AI, think "intern" and not "senior engineer". Current AI works best with well-defined, specific tasks rather than abstract, high-level requests.
- Don’t choose between quality and speed. Companies serving millions of users must achieve both reliability and velocity, not choose between them.
- Layer your quality controls. Multiple validation layers (human oversight, business logic, automated testing) are essential for production AI.
- Historical experience with AI provides patience. Having witnessed previous AI evolution cycles gives realistic expectations about current capabilities and future potential.
About
I’m Mandar, and I’m the Director of the Machine Learning group at DoorDash. I have a unique perspective on AI, shaped by a 20-year journey through machine learning. I've been fortunate enough to witness and participate in four major AI inflection points: tree-based models, deep neural networks, attention mechanisms, and now foundational models.
This historical perspective gives me a really different view of the current AI hype cycle. I've seen multiple waves of AI breakthroughs, so I have a sense of what's genuinely transformative versus what's just incremental progress.
How did AI evolve as a field from the time you started working on it?
I started my career working on speech recognition, where even basic word recognition was incredibly challenging. I remember being at Microsoft Research and thinking that continuous speech recognition was still just a sci-fi dream.
Fast forward to today, and we can generate videos from text descriptions. It's such a dramatic leap that I still find it hard to believe. Going from struggling with basic speech recognition to generating entire videos feels almost magical.
"Even doing word recognition was incredibly challenging. I remember being at Microsoft Research, working on some aspects of ASR, and then this whole dream that you could just have continuous speech being recognized was still a dream, was a sci-fi movie realization. But given where we are today, we can actually generate videos from texts. It's just absolutely mind blowing."
What are your experiences with adopting AI for your engineering teams?
We take a very business-centric approach to AI adoption at DoorDash. We don't jump on new technologies just because they're cool or trendy. Instead, we ask: does this help us achieve a step function improvement in our business, or does it accelerate some portion of our efficiency metrics?
Our criteria are pretty specific: the AI has to either help with cost metrics (reducing expenses), ML velocity (developing machine learning solutions faster), or general efficiency gains. If it doesn't clearly help with one of these areas, we're not interested.
"The way our teams have approached using some of these new technologies has been very business centric. We want to jump onto this tooling or these ideas if they help us achieve a step function in our business or help accelerate some portions of our efficiency metrics."
What guardrails have you implemented in AI adoption to improve trust?
We're really careful about exposing AI directly to customers, especially given the potential for hallucinations and other issues. Our strategy is to start with internal-facing use cases first.
The logic is simple: if there are problems with hallucinations or if the AI doesn't meet service level agreements, it's better for internal users to be affected rather than paying customers. This protects customer trust while still allowing us to experiment and learn.
"We have been very careful in approaching some of these ideas, especially when it comes to exposing them directly to our customers. The first approach was – let's look at the use cases where we expose these LLMs to technology that is internal facing."
Customer trust is absolutely paramount for us at DoorDash. We've been very deliberate about not doing anything that could damage the trust that advertisers, merchants, or consumers have placed in our platform.
This conservative approach might slow down AI rollouts, but it reflects the maturity of an organization that understands the long-term value of customer relationships. Losing trust for the sake of a cool AI feature just isn't worth it.
"We have been very careful to make sure that we don't hamper the trust that our advertisers or our merchants have placed on DoorDash or our consumers have placed on DoorDash."
We implement multiple layers of quality control for our AI systems. The first layer is human-in-the-loop oversight: having humans review what the AI generates before it goes live.
The second layer is business logic that acts as an intermediary between AI output and user interaction. This ensures that business rules are respected and prevents hallucinations from affecting user experiences or exposing us to adversarial attacks.
"Having human in the loop, as a first phase of deployment. Having a human looking at the output that the AI has generated is a very important step. We always ensure that the business logic is respected and doesn't lead to hallucinations affecting user experiences."
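The layering described above can be sketched as a small pipeline. Everything in this sketch is hypothetical (the `Draft` type, the banned-terms rule, the function names are invented for illustration, not DoorDash's actual system); it only shows the idea that model output must clear automated checks and business rules, and then explicit human approval, before reaching a user.

```python
from dataclasses import dataclass

@dataclass
class Draft:
    """A piece of AI-generated content awaiting release (illustrative only)."""
    text: str
    human_approved: bool = False

# Stand-in business rules: real systems encode far richer policy here.
BANNED_TERMS = {"guarantee", "free forever"}
MAX_LENGTH = 200

def automated_checks(draft: Draft) -> bool:
    """Layer 1: cheap automated tests (non-empty, within length limits)."""
    return 0 < len(draft.text) <= MAX_LENGTH

def business_rules(draft: Draft) -> bool:
    """Layer 2: business logic sits between the model and the user."""
    lowered = draft.text.lower()
    return not any(term in lowered for term in BANNED_TERMS)

def release(draft: Draft) -> bool:
    """Layer 3: a human must explicitly approve before anything ships."""
    return automated_checks(draft) and business_rules(draft) and draft.human_approved

ok = Draft("20% off pizza near you this weekend.", human_approved=True)
bad = Draft("We guarantee delivery in 5 minutes, free forever!", human_approved=True)
print(release(ok), release(bad))  # True False
```

The point of the structure is that a hallucination has to slip past every layer, not just one, before it can damage a user experience.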
What are some areas where you are using AI for software engineering?
One interesting application is using ML models to evaluate the relevance of items on DoorDash. For example, if someone searches for "cookies," the AI helps determine whether the items shown are actually relevant to that query.
This application demonstrates how AI can improve both quality and cost efficiency in our core business functions. It's not flashy, but it directly impacts the user experience and business metrics.
"We use ML models to evaluate the relevance of items on DoorDash. For example, if the query is 'cookies' and we show you a bunch of items, are these items relevant to this query or not?"
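As a toy illustration of what a relevance judgment looks like as an interface, a simple token-overlap score can stand in for the model (the real system is a learned ML model; this stand-in is only a sketch of the query-item-score shape of the problem):

```python
# Stand-in for a relevance model: scores how well an item matches a query.
def relevance(query: str, item_name: str) -> float:
    """Fraction of query tokens that appear in the item name."""
    q = set(query.lower().split())
    item = set(item_name.lower().split())
    return len(q & item) / len(q) if q else 0.0

items = ["chocolate chip cookies", "oatmeal raisin cookies", "garden salad"]
relevant = [i for i in items if relevance("cookies", i) >= 0.5]
print(relevant)  # ['chocolate chip cookies', 'oatmeal raisin cookies']
```

A learned model replaces the scoring function, but the surrounding evaluation loop looks the same: score each candidate against the query and filter by a threshold.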
We've seen really positive adoption of coding agents among our engineering community. We started with GitHub Copilot and have expanded to tools like Cursor and Windsurf.
The response from engineers has been strong and positive, which suggests these tools are actually providing value rather than just being novelties that people try once and forget about.
"We started with Copilot from GitHub, and now we are looking at some of the tools like Cursor and WindSurf. We are seeing a very strong positive response from the engineering community"
One of the most compelling applications I've seen is using AI for large-scale code migrations. When we wanted to migrate from Flow to TypeScript, we calculated it would take four years of total developer time across all teams.
AI excels at pattern detection, which is exactly what you need for these massive migrations. It can handle the variations and edge cases that traditional automated tools miss, while still having validation mechanisms to ensure correctness.
"Say we want to move to TypeScript... we added up four years of developer time that it was gonna take to go from Flow to TypeScript. AI is a really good pattern detector. So can you start to have some of that fuzziness come in and facilitate some of the code migrations as well?"
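To make the migration idea concrete, here is a deliberately tiny, rule-based sketch that handles just two Flow patterns: the `// @flow` pragma and `?Type` maybe-types. Real codebases have far more variation, which is exactly where an AI pattern-matcher plus a validation pass (type-checking, tests) earns its keep. Nothing here reflects DoorDash's actual tooling.

```python
import re

def flow_to_ts(source: str) -> str:
    """Mechanically rewrite two simple Flow patterns into TypeScript."""
    source = re.sub(r"^// @flow\n", "", source)  # drop the Flow pragma line
    # Flow's maybe-type ?T means T | null | undefined in TypeScript.
    source = re.sub(r"\?(\w+)", r"\1 | null | undefined", source)
    return source

flow_code = (
    "// @flow\n"
    "function greet(name: ?string): string {\n"
    "  return name ?? 'hi';\n"
    "}\n"
)
print(flow_to_ts(flow_code))
```

A rule-based script like this breaks on the first pattern its author didn't anticipate; the appeal of AI-assisted migration is handling that long tail of variations, with the type checker and test suite acting as the correctness backstop.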
We're also exploring AI in production monitoring, though it's still experimental. We're seeing promise in terms of improving developer quality of life and helping teams move fast while maintaining reliability. The focus so far is on making engineers' lives easier during production issues rather than on fully automated incident response.
"The third area I would say is areas like production monitoring, but it's still experimental. But it's showing a lot of promise in terms of improving the developer quality of life as well as helping us move fast with some reliability."
What mental models would help engineers while adopting AI capabilities?
I offer a really practical mental model: think of current AI agents like interns rather than senior engineers. You need to give them well-defined, specific tasks rather than high-level abstractions.
This insight came from observing junior developers struggle when they tried to give AI agents very abstract, high-level tasks. When you break those abstractions down into four or five specific tasks, the agents perform much better.
"Don't expect these things to do magic on a very abstract task out of the box. Think of them as senior engineers, but I think today they are more like interns. So think of allocating tasks in a very defined form."
I observed that when junior developers tried to assign very high-level, abstract tasks to AI agents, the agents weren't great at performing those abstractions. But when you break the same work down into smaller, more concrete tasks, each individual task becomes much easier for the agent to handle.
The key insight is that success with AI requires the same planning and task decomposition skills you'd use when managing human team members. You can't just throw a big, vague problem at AI and expect it to figure everything out.
"Some of our juniors started using it. They had a very abstract or very high level task which they assigned to some of these agents, and it turned out the agents were not great at performing these high level abstractions. But when you actually break that abstraction down into four or five tasks, each of those tasks are much easier to perform."
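The decomposition advice can be sketched as follows. `run_agent` is a stub standing in for whatever coding-agent API you use, and the subtasks are invented for illustration; the point is the shape of the workflow, not any specific tool.

```python
def run_agent(task: str) -> str:
    """Stub: a real implementation would call a coding agent here."""
    return f"done: {task}"

# One abstract request an "intern"-level agent handles poorly...
abstract_task = "Migrate the payments service to the new API"

# ...versus the same work broken into small, individually verifiable steps.
subtasks = [
    "List every call site of the old payments client",
    "Write an adapter matching the new API's signatures",
    "Replace call sites one module at a time",
    "Run the payments integration tests and report failures",
]

results = [run_agent(t) for t in subtasks]  # each step is small and checkable
for r in results:
    print(r)
```

Each subtask has a concrete, checkable outcome, so you can review the agent's work step by step instead of auditing one opaque, sprawling change.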
My main advice is to set realistic expectations and think about task decomposition. Don't expect AI to do magic on abstract tasks right out of the box. Instead, tailor your expectations based on how you plan to use these tools.
The approach I recommend is similar to managing human engineers: break down high-level abstractions into specific, well-defined tasks that the AI can actually handle effectively.
"Don't expect these things to do magic on a very abstract task out of the box. Tailor your expectations in terms of how you are going to be leveraging these tools."
I believe that AI agents will become smarter over time and eventually function more like senior engineers. But today, they're more like interns who need clear, specific instructions and careful oversight.
The key is to work with their current capabilities while they improve, rather than expecting them to already be at the level they'll eventually reach.
"Over the course of time, the agents would become smarter. So think of them as senior engineers, but I think today they are more like interns."
How do you balance speed and quality while adopting AI?
We reject the typical trade-off between speed and quality. There is no way we can sacrifice either one: we want strong reliability and strong quality while moving fast.
This demanding standard reflects what it takes to operate at scale with millions of users. When you're that big, you can't afford to compromise on either dimension.
"There is no way we can sacrifice quality or speed. We want to go fast with strong reliability and strong quality. There is no way we can sacrifice one for the other."
What's your most memorable on-call experience?
Fifteen years ago, early in my career, I was writing a search algorithm in C (a language with no try-catch error handling), and we deployed a change that caused segmentation faults in our search engine.
It took us a couple of days to track down the memory bug in the C code. The fact that I still remember this incident after 15 years shows how brutal debugging challenges can be, especially in lower-level languages.
"This was my first foray into the industry. I was writing a search algorithm in C. So for those of you who are too young to know what that means, there is no try-catch in C. And so we had an error where we deployed a change and there was a segfault happening in our search engine."