The future of fast food was supposed to arrive in a Chicago McDonald's parking lot. In 2021, the world's largest burger chain partnered with IBM to test voice AI at the drive-thru speaker—a technology that promised to take orders faster, more accurately, and without the labor shortage headaches plaguing the industry.
Three years later, in July 2024, McDonald's pulled the plug.
The problem wasn't that the technology didn't work. It did—about 85% of the time. The problem was that 85% isn't good enough when you're processing millions of transactions per day, each one representing a customer who expects their order to be right, and a franchisee who needs to protect razor-thin margins in an industry where order accuracy directly impacts profitability.
The Great Voice AI Experiment: Who's In and Who's Out
The quick-service restaurant industry is in the middle of a high-stakes race to automate the drive-thru, the channel that generates an estimated 70% of revenue for major chains. But the scoreboard reveals a technology still finding its footing.
McDonald's announced in June 2024 that it would end its Automated Order Taker (AOT) partnership with IBM, shutting down the technology at more than 100 test locations no later than July 26, 2024. The three-year pilot had struggled with what the industry politely calls "order accuracy incidents"—viral TikTok videos showed the system adding hundreds of chicken nuggets to orders, mishearing "ice cream" as "bacon," and failing to process cancellation requests.
But while McDonald's was exiting, others were doubling down.
Wendy's announced in February 2025 that it would deploy its FreshAI system—built in partnership with Google Cloud—to between 500 and 600 locations by year's end. The company had started with a single Columbus, Ohio restaurant in 2023, expanded to 36 company-operated stores across Ohio and Florida in 2024, and is now scaling nationwide. Wendy's has claimed a "success rate of nearly 99%"—though that metric counts any order started by the AI and submitted to the point-of-sale system, even if a human had to intervene mid-conversation to fix errors.
Taco Bell parent company Yum Brands announced in July 2024 that it would bring voice AI to hundreds of U.S. drive-thrus by the end of the year. The pilot had started with five California locations and had grown to more than 100 Taco Bell restaurants by early 2024. The company has since announced a broader partnership with Nvidia to accelerate its AI deployments across all Yum brands, including KFC and Pizza Hut.
Meanwhile, SoundHound AI—one of the industry's most aggressive voice AI providers—has quietly built what may be the largest footprint in the sector. The company reported powering over 14,000 restaurant locations by Q2 2025, processing more than 100 million customer interactions by October 2024. Its client roster includes White Castle, Chipotle, Jersey Mike's, Church's Texas Chicken, Applebee's, and numerous regional chains.
The Accuracy Problem: Why Every Percentage Point Matters
The industry's dirty secret is that even human order-takers aren't perfect. Traditional drive-thrus achieve order accuracy rates of 80-85% during peak hours, according to multiple industry sources. Some studies peg human accuracy as high as 89-92% under optimal conditions.
Voice AI systems have reached comparable—sometimes superior—levels in controlled deployments. SoundHound claims its AI can complete more than 90% of orders without requiring human intervention. Presto Automation reported in December 2024 that its "non-interaction rate" (NIR)—the percentage of orders taken entirely by AI—averaged 85% across all Presto Voice-enabled restaurants, with certain locations hitting 95%.
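The gap between those headline numbers comes down to how each metric is defined. A toy calculation makes the difference concrete; the order data is entirely hypothetical, and `Order`, `success_rate`, and `non_interaction_rate` are illustrative names, not any vendor's API:

```python
from dataclasses import dataclass

@dataclass
class Order:
    submitted: bool        # order reached the point-of-sale system
    human_assisted: bool   # a staff member intervened mid-conversation

def success_rate(orders):
    """Wendy's-style metric: any AI-started order that reaches the POS
    counts as a success, even if a human stepped in to fix it."""
    return sum(o.submitted for o in orders) / len(orders)

def non_interaction_rate(orders):
    """Presto-style NIR: only orders completed with zero human help count."""
    return sum(o.submitted and not o.human_assisted for o in orders) / len(orders)

# Hypothetical day of 100 orders: 99 submitted, but 14 needed a human assist.
orders = [Order(True, False)] * 85 + [Order(True, True)] * 14 + [Order(False, False)]
print(f"success rate: {success_rate(orders):.0%}")          # 99%
print(f"non-interaction rate: {non_interaction_rate(orders):.0%}")  # 85%
```

The same day of orders yields a 99% "success rate" and an 85% NIR, which is why headline figures from different vendors can't be compared directly.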
But raw accuracy numbers obscure the real challenge: AI and humans fail differently.
A 2025 study by customer experience firm Intouch Insight found that traditional drive-thrus achieved 89% order accuracy, while AI-powered systems dropped to 83%. The critical finding: 65% of AI errors involved customizations—the "no pickles," "extra sauce," "make it a meal" modifications that represent the most profitable upsell opportunities and the most frustrating customer experience failures.
Humans might mishear an order during a lunch rush, but they excel at the conversational inference that makes drive-thru ordering work. When a customer says "I'll take a large chocolate milkshake," a human knows they probably mean a large chocolate Frosty at Wendy's, or a McCafé shake at McDonald's. AI systems require extensive training on brand-specific menu terminology—and they still struggle.
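That brand-vocabulary problem can be pictured as a normalization layer sitting between the speech recognizer and the menu. The lookup table below is a hypothetical toy; real systems learn these mappings from training data rather than hand-written dictionaries:

```python
# Hypothetical mapping of generic customer phrases to brand-specific menu
# items. Brand keys and synonym entries are illustrative, not real data.
BRAND_SYNONYMS = {
    "wendys": {"milkshake": "Frosty", "shake": "Frosty"},
    "mcdonalds": {"milkshake": "McCafé Shake", "shake": "McCafé Shake"},
}

def normalize_item(brand: str, spoken: str) -> str:
    """Replace the first matching generic term with the brand's menu name."""
    table = BRAND_SYNONYMS.get(brand, {})
    for generic, branded in table.items():
        if generic in spoken.lower():
            return spoken.lower().replace(generic, branded)
    return spoken

print(normalize_item("wendys", "large chocolate milkshake"))
# → large chocolate Frosty
```

A human order-taker performs this substitution effortlessly; an AI system has to be explicitly trained on it for every brand and every regional synonym.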
The IBM system that McDonald's abandoned reportedly achieved 85% accuracy in its Chicago test market. But McDonald's serves roughly 70 million customers per day globally. An 85% accuracy rate would mean 10.5 million incorrect orders daily—an unacceptable customer service and cost burden, even accounting for the fact that not all of those customers use drive-thrus.
The Triple Challenge: Accents, Dialects, and Ambient Noise
Drive-thrus are among the most acoustically hostile environments imaginable for speech recognition technology. Forbes reported in February 2026 that McDonald's IBM partnership ended largely because "the system struggled with interpreting different accents, dialects and background noise."
The challenges are layered and specific:
Background noise fluctuates wildly. A drive-thru microphone must filter out idling engines, honking horns, construction noise from nearby roads, weather conditions, and multiple voices inside the vehicle—all while isolating the speaker's voice clearly enough to distinguish "Coke" from "Diet Coke."
Accent and dialect variation creates a moving target. The same menu item can be pronounced dozens of ways across regional accents. "Large" sounds different in Boston than in Birmingham. "Fountain drink" might be called "soda," "pop," or "Coke" (as a generic term for all soft drinks) depending on geography. SoundHound acknowledged in 2022 that "the complexities of language differences make creating accent-agnostic speech recognition systems nearly as challenging as offering distinct languages."
Real-time conversational complexity means AI must process interruptions, mid-order changes, simultaneous speakers (a parent ordering for kids in the back seat), slang, incomplete sentences, and the rapid-fire customization requests that define modern fast food ordering.
Technology providers have made significant progress on these challenges. Presto Automation highlighted in June 2025 that its system "passed multiple background noise challenges during the order-taking process," designed specifically for restaurants located on busy roads. Amazon's Nova Sonic model, announced in late 2025, promises "accurate recognition of streaming speech across accents with robustness to background noise."
But the gap remains measurable—and costly.
The Hybrid Model: AI's Actual Future in the Drive-Thru
No major chain is pursuing fully autonomous voice AI at scale. Instead, the industry is converging on a hybrid model: AI handles routine transactions and escalates complex orders to human staff.
This isn't an admission of failure—it's practical economics. The average drive-thru order is simple: a combo meal, maybe a drink, minimal customization. For those transactions, AI is fast, consistent, never calls in sick, and can upsell ("Would you like to add a dessert?") without the social awkwardness that makes human workers hesitate.
Presto Automation's hybrid approach explicitly combines AI with human backup. The company's "Presto Voice" system is designed to hand off seamlessly to a human operator when the AI detects uncertainty—a customer repeating themselves, requesting an unusual modification, or expressing frustration.
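That kind of handoff can be sketched as a simple routing rule. The thresholds, signal names, and function names below are illustrative assumptions, not Presto's actual implementation:

```python
# Minimal sketch of AI-to-human handoff logic in a hybrid drive-thru system.
# All signal names and the 0.80 confidence floor are hypothetical.
ESCALATION_SIGNALS = {"customer_repeated", "unusual_modification", "frustration_detected"}

def should_escalate(asr_confidence: float, signals: set,
                    confidence_floor: float = 0.80) -> bool:
    """Hand off to a human when transcription confidence is low or any
    uncertainty signal fires."""
    return asr_confidence < confidence_floor or bool(signals & ESCALATION_SIGNALS)

def route_order(asr_confidence: float, signals: set) -> str:
    """Route each utterance to the AI pipeline or a human operator."""
    return "human_operator" if should_escalate(asr_confidence, signals) else "ai_pipeline"

print(route_order(0.95, set()))                     # ai_pipeline
print(route_order(0.95, {"frustration_detected"}))  # human_operator
print(route_order(0.62, set()))                     # human_operator
```

The design choice worth noticing: escalation is triggered by uncertainty, not by failure. The system hands off before the customer experiences a broken interaction, which is what makes the hybrid model viable.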
Wendy's FreshAI pairs the voice system with a digital menu board showing a visual confirmation of the order, allowing customers to catch errors in real-time and reducing the burden on the AI to achieve perfect transcription.
The economics work when AI handles 70-80% of orders completely, reducing the number of human order-takers needed per shift while keeping experienced staff available for the 20-30% of transactions that require human judgment. This isn't full automation—it's labor reallocation.
The ROI Question: When Does 85% Beat 92%?
Despite lower accuracy rates, voice AI deployments are accelerating. The reason is simple: cost.
Labor represents one of the largest variable expenses in QSR operations, typically 25-30% of revenue. An AI system that can handle even 70% of drive-thru orders without human intervention generates immediate labor savings, particularly during overnight and off-peak hours when staffing is most challenging.
The technology also addresses the industry's chronic staffing shortage. U.S. restaurant operators have struggled to fill positions since the pandemic, with turnover rates in QSR often exceeding 100% annually. An AI system that never quits, requires no per-hire onboarding, and works 24/7 has inherent value even if it occasionally adds unwanted McNuggets to an order.
Revenue impact extends beyond labor savings. AI systems consistently outperform humans at upselling: they don't forget to suggest adding fries, they're never too busy to offer a dessert, and they never hesitate to push additional items. Industry estimates suggest AI-driven upselling can increase average check size by 10-15%.
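Put together, the labor and upsell arguments amount to a back-of-the-envelope calculation. Every input below is a hypothetical illustration drawn from the ranges quoted in this article; the order-taker share of labor is an added assumption, and none of these are measured figures for any chain:

```python
# Back-of-envelope ROI sketch for voice AI at a single store.
# All parameter values are hypothetical illustrations.
def drive_thru_roi(revenue: float,
                   labor_share: float = 0.28,       # labor as share of revenue (25-30% range)
                   drive_thru_share: float = 0.70,  # revenue through the drive-thru
                   order_taker_share: float = 0.15, # slice of labor spent taking orders (assumption)
                   ai_autonomy: float = 0.70,       # orders the AI completes alone
                   upsell_lift: float = 0.10):      # check-size increase on AI orders (10-15% range)
    """Return (annual labor saved, annual extra upsell revenue)."""
    labor_saved = revenue * labor_share * order_taker_share * ai_autonomy
    extra_revenue = revenue * drive_thru_share * ai_autonomy * upsell_lift
    return labor_saved, extra_revenue

saved, lifted = drive_thru_roi(revenue=2_000_000)  # a hypothetical $2M/year store
print(f"labor saved: ${saved:,.0f}, upsell revenue: ${lifted:,.0f}")
# labor saved: $58,800, upsell revenue: $98,000
```

Even with conservative inputs, the two effects together are a six-figure annual swing per store, which is why deployments keep accelerating despite the accuracy gap.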
But the flip side is measurable too: incorrect orders drive customer dissatisfaction, generate waste, and require costly remakes. A single viral video of an AI system failing spectacularly can damage brand reputation in ways that are hard to quantify but impossible to ignore.
The Timeline: When Will Voice AI Be "Good Enough"?
Industry insiders suggest we're two to four years away from voice AI systems that can match or exceed human performance across all common drive-thru scenarios—not just average accuracy, but reliable handling of customizations, accents, and noisy environments.
The path forward depends on three converging trends:
Larger training datasets from real-world deployments. Every order processed by a Wendy's FreshAI system or a White Castle SoundHound installation generates training data that improves the underlying models. The current generation of systems has processed hundreds of millions of real transactions—an advantage earlier pilots lacked.
Improved natural language models from the broader AI revolution. The same large language model advances powering ChatGPT and Claude are being adapted for real-time voice applications. Amazon's Nova Sonic, Google Cloud's contributions to Wendy's FreshAI, and Nvidia's partnership with Yum Brands all leverage cutting-edge speech-to-speech models that didn't exist three years ago.
Better hardware and infrastructure at the edge. Microphone arrays, noise-canceling technology, and on-device processing are improving rapidly. The physical equipment capturing customer voices in 2025 is orders of magnitude better than what McDonald's deployed in 2021.
By 2027-2028, industry observers predict that voice AI will reliably handle 90%+ of drive-thru orders without human intervention, achieving or exceeding human-level accuracy on customizations and performing well across diverse accents and noise conditions.
But "good enough" is a moving target. As AI improves, customer expectations rise. The viral failures of early systems have left a mark: many customers now explicitly ask whether they're speaking to a human or a machine, and some refuse to engage with AI systems at all.
What McDonald's Walking Away Really Means
McDonald's decision to end its IBM partnership wasn't a rejection of voice AI—it was a rejection of that particular implementation. The company explicitly stated it remains committed to exploring voice ordering technology with alternative vendors.
The failure taught the industry several critical lessons:
First, brand risk matters more than cost savings. McDonald's couldn't afford the reputational damage from viral videos of malfunctioning AI, even if the technology saved money on average.
Second, pilot success doesn't guarantee scalability. The IBM system worked adequately in controlled tests but struggled when deployed across diverse locations, customer demographics, and operating conditions.
Third, the technology provider ecosystem matters. IBM, despite its AI credentials, wasn't a natural fit for the real-time, customer-facing demands of drive-thru ordering. The vendors seeing success—SoundHound, Google Cloud, specialized startups like Presto—have deep expertise in voice interfaces and restaurant operations specifically.
The 85% Threshold
Voice AI in the drive-thru has reached a critical inflection point. The technology works well enough to be useful, but not well enough to be invisible. It's accurate enough to save money in many deployments, but not reliable enough to replace human workers entirely.
That 85% accuracy rate—whether it's Presto's non-interaction rate, McDonald's IBM performance, or the industry average—represents both remarkable progress and a stubborn plateau. The final 10-15 percentage points, the gap between "mostly right" and "reliably excellent," is where the real challenge lies.
The chains betting billions on this technology aren't waiting for perfection. They're deploying hybrid systems that combine AI efficiency with human judgment, banking on incremental improvements while managing customer expectations.
For now, that means your next drive-thru order might be taken by AI—but there's still a human listening in, ready to step in when the robot can't quite understand that you want "no onions, extra pickles, and make it a large."
Because in an industry built on speed, consistency, and customer satisfaction, 85% accuracy isn't a failure. But it's not good enough yet, either.
Marcus Chen
Former multi-unit franchise operations director with 15+ years managing QSR technology rollouts. Specializes in operational efficiency, kitchen systems, and workforce management technology.