AI Voice Optimization: Faster Responses, New Models, and Vision

Speed and intelligence are everything in voice AI. We've optimized response times, added TTS caching, introduced new voice models, and enabled vision capabilities during live voice conversations — so your AI agents sound better, respond faster, and see what your callers see.

What's New

  • Voice Speed Optimization: Two rounds of voice latency optimization make AI responses noticeably faster. Your AI agents now respond with near-human speed, keeping conversations natural and callers engaged.

  • TTS Caching: Frequently used text-to-speech phrases are now cached, eliminating synthesis delay on repeat playback. Greetings, hold messages, and common responses play back instantly.

  • Vision on Voice Chat: AI agents can now process images shared during voice conversations. A caller can send a photo of a product, document, or issue, and the AI agent sees and responds to it in real time — no switching to a separate chat channel.

  • Call Screening: New call screening functionality lets your AI agent qualify incoming calls before routing them to your team. Screen for intent, urgency, or specific criteria, and only connect the calls that matter.

  • New Voice Models: Added the latest AI voice models with full voice configuration settings. Your AI agents can use the most natural-sounding voices available, and the agent keeps to the configured voice settings for the entire conversation.

  • My Voice Setting: A personal voice selection option in the agent voice dropdown lets you assign specific voices to specific agents. Each agent gets its own identity and personality through its voice.

  • AI Call Summaries: Every AI call, inbound or outbound, now generates an automatic summary that is available as a workflow variable and displayed on the contact's activity timeline. Your team gets instant context without listening to recordings.
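
For readers curious how the TTS caching described above can work in principle, here is a minimal sketch. It is not the platform's actual implementation: the `TTSCache` class, the `fake_synthesize` stand-in, and the voice names are all hypothetical, and a real system would call a speech engine and likely bound the cache size. The idea is simply that audio is stored under a key built from the voice and the phrase, so a repeated greeting skips synthesis.

```python
import hashlib
from typing import Callable, Dict


class TTSCache:
    """Hypothetical in-memory cache of synthesized audio, keyed by voice and phrase."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        # `synthesize(voice, text)` is an assumed callable that returns audio bytes.
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(voice: str, text: str) -> str:
        # Collapse runs of whitespace so trivially different strings share one entry.
        normalized = " ".join(text.split())
        return hashlib.sha256(f"{voice}\x00{normalized}".encode("utf-8")).hexdigest()

    def get_audio(self, voice: str, text: str) -> bytes:
        key = self._key(voice, text)
        if key in self._store:
            # Repeat phrase: return stored audio without calling the synthesizer.
            self.hits += 1
            return self._store[key]
        # First time this phrase is seen for this voice: synthesize and remember it.
        self.misses += 1
        audio = self._synthesize(voice, text)
        self._store[key] = audio
        return audio


def fake_synthesize(voice: str, text: str) -> bytes:
    # Stand-in for a real TTS engine so the sketch runs on its own.
    return f"{voice}:{text}".encode("utf-8")


cache = TTSCache(fake_synthesize)
first = cache.get_audio("voice-a", "Thanks for calling!")
repeat = cache.get_audio("voice-a", "Thanks  for calling!")
```

Including the voice in the key matters: the same greeting spoken by two different agent voices must be cached as two separate entries.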

Why This Matters

Speed Wins Conversations: Faster AI responses mean callers stay engaged instead of wondering if the line went dead. The difference between 2 seconds and 0.5 seconds of response time is the difference between a natural conversation and an awkward pause.

Visual AI Conversations: Vision on voice chat opens up new use cases, such as insurance claims, tech support, and product identification, where the agent can only help if it sees what the caller sees.

Instant Context for Your Team: AI call summaries on every contact mean your team never walks into a follow-up call blind. Read the summary, know the situation, and pick up right where the AI left off.