A visual metaphor for the rise of voice AI — where technology is learning to listen, respond, and sound more human than ever before. Image Source: ChatGPT

Venture in the Age of AI

By Alastair Goldfisher
Veteran journalist and creator of The Venture Lens newsletter and The Venture Variety Show podcast. Alastair covers the intersection of AI, startups, and storytelling with over 30 years of experience reporting on venture capital and emerging technologies.

AI Finds Its Voice

Key Takeaways: Voice AI’s Race Toward Human-Level Interaction

  • How investors view voice AI — from those betting on rapid adoption to those urging a longer view

  • How startups like Maven AGI, Whippy and Aircall are redefining customer conversations

  • What the coming “Voice Turing Test” could mean for business and society

Voice AI Demand Surges as Businesses Turn to Real-Time Automation

On the Fourth of July, fireworks companies faced a predictable problem: phones ringing nonstop with customers asking where to buy fireworks, how to use the products safely and whether certain items were in stock.

For most of the year, these businesses don’t need 24/7 phone support. But on this one weekend, call volume spikes so high that no amount of temporary staffing or outsourced answering services can keep pace. And once the holiday ends, the phones go silent again.

Instead of scrambling for short-term call centers, many of these companies this past July turned to Aircall. Within minutes, they deployed AI voice agents that routed calls, provided updates and reassured customers trying to find locations or check on orders.

For Tom Chen, Aircall’s chief product officer, it was proof that real-time voice is making a comeback.

“Historically, voice was the most expensive channel to support. With AI, that changes,” Chen told me. “Companies that never even offered phone support are carving out efficiencies to bring voice back, and customers love it.”

Across industries, the shift is accelerating. Businesses are under economic pressure to adopt new efficiencies. And legacy phone systems, long blamed for robotic and frustrating interactions, are being replaced by conversational AI.

Also, customers no longer tolerate rigid scripts or hold music. They expect humanlike conversations that solve problems immediately. And in such sectors as healthcare, retail and food service, where speed and personalization matter most, voice-driven systems are becoming the preferred interface.

Investors Weigh Timing, Not Just Hype

Investors are pouring record amounts of capital into voice AI. Total equity funding for voice AI startups reached $2.1 billion in 2024, nearly seven times higher than the $315 million raised in 2022, according to CB Insights. That pace has likely kept up through mid-2025, as multiple sources report funding in AI voice tech has already outpaced 2024’s record.

Notable rounds like ElevenLabs’ $180 million Series C in January and, more recently, Assort Health’s $76 million Series B in late September illustrate how quickly the sector is maturing.

Also, in an indicator of early-stage founder activity, 22% of the companies in a recent Y Combinator class were building with voice, according to Cartesia.

The surge explains why Astasia Myers of Felicis Ventures is so emphatic about voice AI. She blogged about Felicis’ investment in Assort Health, a healthcare startup using agentic AI to transform patient communication. The company reports it has cut call wait times by 89% and reached 98% resolution rates across millions of patient interactions.

For Myers, this isn’t just about one company. It’s a signal that customers are demanding AI voice today. As she told the Startup Grind AI Summit:

“With the immeasurable improvements in AI voice models, you can not only automate the task but often the NPS of the experience is better.”

Tony Wang of 500 Global echoed the excitement at the same Startup Grind event. But he urged a broader lens. He said the winners in text and image models are largely set, while voice remains an open frontier: early, imperfect and full of room for founders to explore.

“The experience is still relatively early and still relatively broken,” Wang said. “But the surface area is huge. Voice agents are a really good interim approach, and as a founder, you want to start thinking about how to go from that to something durable.”

The two investors see the same trajectory, and the opportunity is real. But maturity will take time. The difference is tempo: Myers is betting that the breakout moment is now, while Wang suggests the payoff will come as the technology evolves.

The Approach of Maven AGI and Whippy

While some investors say voice AI is still maturing, Maven AGI Co-Founder and CEO Jonathan Corbin disagrees. He said his company’s AI agents have already saved customers more than $200 million annually, cut support costs by 50%, and achieved satisfaction scores above 90%.

“What we’re really building is one brain that powers the entire customer journey,” Corbin said.

For clients like TripAdvisor, that means resolving 93% of customer inquiries through AI voice, a scale that he noted few human call centers could ever achieve. Corbin framed it less as replacing agents and more as amplifying them. “When you give someone the right context, they can do the work of five people. The AI makes that possible.”

Despite high expectations, Corbin knows adoption isn’t automatic. Most vendors are still early in deploying commercially viable products, and scaling enterprise-grade performance remains challenging. His view is that those who combine data context, voice nuance and automation speed will separate themselves from the noise.

While Maven tackles enterprise CX, Whippy focuses on small and mid-sized businesses. Co-founder and CTO Jack Kennedy calls the company’s product “UI-less software,” because once installed, it automates phone-based tasks that companies struggle to staff.

“Whippy just operates in the background almost as a real employee,” Kennedy said.

For pharmacies and staffing agencies, that means screening calls, scheduling and handling repetitive customer queries. In industries like recruiting — where phone-based roles can have 125% annual turnover — Kennedy argues AI is doing the work people don’t want to do.

Still, he doesn’t oversell. “AI is very good at some things, very bad at others,” he said. Whippy can handle high-volume, low-stakes conversations, but nuanced customer issues still require humans. It’s a reminder that while adoption is growing, the technology’s limitations remain part of the story.

Aircall: Technical Hurdles and the Road to Voice-to-Voice

Back at Aircall, Chen sees the challenges and the long-term potential of voice AI.

Voice agents are harder to deploy than text-based AI because every step — from speech-to-text to generating answers, then converting them back into natural-sounding voice — introduces friction and delay. A few milliseconds of latency can destroy the illusion of a natural conversation. Add in poor telecom connections or noisy environments, and the reliability gap widens.
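To make the latency point concrete, here is a minimal sketch of a cascaded voice-agent turn. The three stage functions are hypothetical stand-ins (real deployments would call STT, LLM and TTS services over the network); the point is that each hop adds measurable delay, and the total is what the caller experiences as response time.

```python
import time

# Hypothetical stubs standing in for the three stages of a cascaded
# voice agent. In production each would be a network call to an STT
# service, a language model, and a TTS engine respectively.
def transcribe(audio: bytes) -> str:
    """Speech-to-text: convert caller audio into a transcript."""
    return "where can I buy fireworks near me"

def generate_reply(text: str) -> str:
    """Language model: produce a response to the transcript."""
    return "Our nearest stand is at 5th and Main, open until 10 p.m."

def synthesize(text: str) -> bytes:
    """Text-to-speech: convert the reply back into audio."""
    return text.encode("utf-8")

def handle_turn(audio: bytes) -> tuple[bytes, dict]:
    """Run one conversational turn, recording per-stage latency in ms."""
    timings = {}
    t0 = time.perf_counter()

    text = transcribe(audio)
    timings["stt_ms"] = (time.perf_counter() - t0) * 1000

    t1 = time.perf_counter()
    reply = generate_reply(text)
    timings["llm_ms"] = (time.perf_counter() - t1) * 1000

    t2 = time.perf_counter()
    audio_out = synthesize(reply)
    timings["tts_ms"] = (time.perf_counter() - t2) * 1000

    # The caller only perceives the sum of all three stages.
    timings["total_ms"] = (time.perf_counter() - t0) * 1000
    return audio_out, timings
```

With stubs the stages are near-instant, but swap in real services and each adds tens to hundreds of milliseconds — which is why the voice-to-voice models Chen describes, which skip the transcribe/re-synthesize hops, are attractive.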

Still, Chen believes voice AI is nearing a turning point.

He describes an emerging shift toward voice-to-voice models that interpret tone and emotion directly rather than transcribing and re-synthesizing. “You can feel when someone’s frustrated or rushed,” he told me during a recent podcast recording. “Once AI can sense that in real time, we’ll have something closer to real conversation.”

That evolution, he added, will make AI voice more than a customer-service tool. It could become a universal interface for how people interact with software, letting anyone “talk” to systems as naturally as they do with each other.

“If you don’t have always-on customer communication in the next five years,” Chen said, “you’ll be at a disadvantage.”

Jobs and the Turing Question

But voice AI isn’t just an efficiency play. It’s reshaping jobs and raising new questions about its impact on society.

Corbin at Maven AGI describes AI as a force multiplier, giving agents the context to “do the work of five people.” Kennedy at Whippy sees it filling high-turnover roles that humans don’t want, while Aircall’s Chen focuses on transparency, warning that customers may not always realize they’re speaking to AI.

Going forward, companies may need new safeguards to keep customer trust intact.

That tension — between speed and sincerity — sets up the next AI inflection point. Industry researchers now talk openly about a coming “Voice Turing Test,” the moment when AI speech becomes indistinguishable from a human’s. Some analysts expect parity within two years, driven by advances in prosody, breathing simulation, and emotional tone modeling.

When that happens, the conversation won’t be about whether customers notice, but whether they should be told.

Companies in healthcare, finance, and education are already testing new disclosure language, such as declaring: “This call may be monitored or conducted by AI.”

For others, especially in entertainment and marketing, human-level voice synthesis may become a selling point rather than a concern.

The line between empathy and mimicry is narrowing. The same technology that makes support more personal could also blur the boundary between genuine emotion and algorithmic tone.

It’s a debate that few in tech want to lead, but it’s coming fast.

Where Voice AI Goes Next

The fireworks story shows voice AI already delivering practical value. Investors like Astasia Myers and Tony Wang see the same momentum, but differ on when the technology truly hits its stride.

The operators — from Jonathan Corbin’s enterprise-scale “brain” at Maven AGI to Jack Kennedy’s SMB automation at Whippy to Tom Chen’s pursuit of emotional voice models at Aircall — show a technology that is promising and imperfect.

Whether voice AI becomes the command layer for business interactions or remains an interim bridge to something else, one thing is certain: after years of chatbots, voice is back in the conversation.

Q&A Section: Understanding the Voice AI Shift

Q1: Why are companies adopting voice AI now?
A: Rising customer expectations and labor constraints are pushing businesses to adopt AI voice agents that deliver faster, more personalized support while reducing staffing costs.

Q2: Which industries benefit most from voice AI?
A: Healthcare, retail, food service, recruiting, and SMB operations where high call volume and fast resolution are critical.

Q3: What role are investors playing?
A: Firms like Felicis Ventures, 500 Global, and Y Combinator are accelerating the market, funding startups such as Assort Health, ElevenLabs, and Cartesia.

Q4: What are the biggest technical hurdles?
A: Reducing latency, improving speech-to-speech emotion handling, and delivering natural, reliable real-time conversations across telecom environments.

Q5: Will AI replace human agents?
A: Not fully. Leaders like Maven AGI, Whippy, and Aircall emphasize AI as a force multiplier, handling routine calls while humans focus on nuanced issues.

🎙️ Stay informed by subscribing to The Venture Lens for the latest insights and updates from Alastair.

Editor’s Note: This article was written by Alastair Goldfisher and originally appeared in The Venture Lens. Republished here with permission.
