Microsoft Unveils MAI-Voice-1: Hyper-Realistic Speech Generation from Just One Minute of Audio
Microsoft launches three new foundational AI models including MAI-Voice-1, which delivers hyper-realistic voice synthesis and custom brand voice creation, marking a major leap in enterprise TTS capabilities.
ElevenLabs Reaches $11 Billion Valuation, Eyes IPO as Voice AI Becomes Enterprise Standard
AI voice startup ElevenLabs raises $500 million at an $11 billion valuation, tripling its worth in just over a year while forging major partnerships with IBM and planning a potential IPO.
Global AI Voice Regulation Tightens: EU AI Act Deepfake Rules Take Effect as Voice Cloning Crosses 'Indistinguishable Threshold'
As voice cloning technology reaches human-level quality, regulators worldwide respond with new laws — the EU AI Act's deepfake labeling rules, the US ELVIS Act, and emerging biometric voice data protections reshape the industry landscape.
OpenAI Launches Voice Engine to the Public: Real-Time Conversational TTS Now Available to All Developers
After over a year of limited preview, OpenAI opens Voice Engine to all API developers, introducing real-time streaming TTS with emotional awareness and 40+ language support at significantly reduced pricing.
Google DeepMind Brings Studio-Quality TTS to Smartphones with SoundStorm 2 Edge: No Internet Required
Google DeepMind announces SoundStorm 2 Edge, a compact on-device TTS model that runs entirely on mobile hardware, delivering studio-quality voice synthesis without cloud connectivity and opening new possibilities for offline accessibility.
AI Dubbing Market Surges Past $2 Billion as Hollywood, Streaming Giants, and Game Studios Embrace Automated Localization
The AI-powered dubbing and localization market crosses the $2 billion mark in Q1 2026, driven by adoption from Netflix, Disney+, and major game publishers seeking to reach global audiences at a fraction of traditional costs.
Apple Unveils 'Personal Voice 2.0' in iOS 20: On-Device Voice Cloning Creates Your Digital Twin in 3 Minutes
Apple announces Personal Voice 2.0 at its spring event, allowing users to create a highly realistic clone of their own voice in just 3 minutes of recording — all processed entirely on-device with Apple Silicon, positioning it as the privacy-first alternative to cloud-based voice AI.
Spotify Rolls Out AI Voice Translation for Podcasts Globally: Your Favorite Hosts Now Speak 40 Languages in Their Own Voice
Spotify launches its AI-powered podcast translation feature worldwide, using voice cloning technology to automatically dub podcasts into 40 languages while preserving each host's unique voice characteristics — opening 100,000+ shows to global audiences overnight.
FDA Clears First AI Voice Assistant for Clinical Use: Voice-Based Patient Screening Enters the Hospital
The FDA grants its first clearance for an AI voice assistant designed for clinical patient interaction, allowing automated voice-based symptom screening and triage in emergency departments — marking a historic milestone for voice AI in healthcare.
Meta Releases Llama-Voice: First Fully Open-Source TTS Model to Match Commercial Giants in 50+ Languages
Meta drops Llama-Voice under an Apache 2.0 license, delivering near state-of-the-art voice synthesis, zero-shot voice cloning from 10 seconds of audio, and 52-language coverage — all runnable on a single consumer GPU.
NVIDIA Launches Voice Foundry NIM: Blackwell-Optimized Microservices Cut Real-Time TTS Costs by 70%
NVIDIA unveils Voice Foundry, a dedicated suite of NIM inference microservices for TTS and STT optimized for Blackwell GB200 hardware, promising sub-80ms first-token latency and 70% lower per-character costs for enterprise voice applications.
Audible Opens AI-Narrated Audiobook Catalog to 400,000 Backlist Titles; Narrators Split on Landmark Royalty Model
Amazon's Audible launches the industry's largest AI-narrated audiobook catalog, adding 400,000 previously unnarrated titles using voice clones of consenting narrators, with a first-of-its-kind per-listen residual model that splits the narration community.
Google Launches Gemini 3.1 Flash TTS: 70+ Languages, Multi-Speaker Dialogue, and a Top Spot on the Artificial Analysis Leaderboard
Google introduces Gemini 3.1 Flash TTS, a new text-to-speech model with audio tags for fine-grained vocal control, native multi-speaker dialogue, and 70+ language support — landing in the 'most attractive quadrant' of the Artificial Analysis TTS leaderboard with an Elo of 1,211.
OpenAI Launches GPT-Realtime-2: Voice Models with GPT-5-Class Reasoning, Live Translation, and Streaming Transcription
OpenAI introduces three new Realtime API voice models — GPT-Realtime-2 with GPT-5-class reasoning, GPT-Realtime-Translate covering 70+ input languages, and GPT-Realtime-Whisper for live transcription — quadrupling the context window to 128K tokens and bringing voice agents closer to production-ready workflows.
Microsoft Launches MAI-Voice-2 at Build 2026: Expressive Speech and Zero-Shot Voice Cloning Across 15 Languages
Microsoft unveils MAI-Voice-2, calling it the most expressive and natural-sounding text-to-speech model it has built, expanding from English-only to 15 languages with granular emotion control, code-switching, and zero-shot voice prompting from a few seconds of audio.
Wispr Hits ~$2 Billion Valuation as AI Voice Dictation Becomes a Workplace Standard
Wispr, the startup behind the AI dictation tool Wispr Flow, is raising roughly $260 million at a near-$2 billion valuation led by Menlo Ventures, nearly tripling its worth in six months as voice-to-text moves from a novelty to an everyday workplace productivity tool.
FTC Begins Enforcing the TAKE IT DOWN Act: Platforms Face $53,088-Per-Violation Penalties for AI Deepfakes
The FTC's civil enforcement of the TAKE IT DOWN Act took effect on May 19, 2026, requiring platforms to remove nonconsensual intimate imagery — including AI-generated deepfakes — within 48 hours, with penalties of $53,088 per violation. The agency promptly sent warning letters to major platforms and 'nudify' websites.
Polish Government Takes Stake in ElevenLabs, Launches AI Lab to Build Voice AI from Europe
The Government of Poland invests in ElevenLabs through its Vinci/BGK Group, joining Andreessen Horowitz and Sequoia as a strategic backer, while launching AI Lab Poland to nurture the next generation of voice AI companies with global ambition.
ElevenLabs Launches Dubbing v2: Emotion-Preserving AI Dubbing Across 90+ Languages
ElevenLabs releases Dubbing v2, a breakthrough AI dubbing model that preserves the original speaker's emotion, tone, and pacing across 90+ languages by conditioning directly on the performance rather than just transcripts.
ElevenLabs Partners with UK Government to Bring Voice AI to Public Services, Doubles London Headquarters
ElevenLabs signs a Memorandum of Understanding with the UK's Department for Science, Innovation and Technology to deploy voice AI in public services, focusing on accessibility for the visually impaired, elderly, and linguistically diverse communities.
Rumik Launches Silk Mulberry 1.5: 'Describe a Voice Into Existence' with Plain-Language Prompts, Matching Commercial TTS Giants at 95% Lower Cost
Indian AI startup Rumik releases Silk Mulberry 1.5, a text-to-speech model that replaces preset voice menus with plain-language voice descriptions, achieving MOS scores competitive with ElevenLabs and Google at roughly $0.0046 per minute.
Michael Caine's AI Voice Narrates 13-Hour 'The Odyssey' Audiobook: 20 AI Characters, Original Score, Built by 4 Producers in 6 Weeks
ElevenLabs releases a cinematic audiobook of Homer's The Odyssey narrated by an authorized AI replica of Sir Michael Caine's voice, featuring ~20 AI-generated character voices, original music, and sound design — all produced by a four-person team in six weeks.
Five9 Launches Voice AI Agents with ElevenLabs, Deepgram, and OpenAI Under the Hood, Targeting Legacy IVR Replacement
Five9 unveils Voice AI Agents at Customer Contact Week 2026, combining ElevenLabs TTS, Deepgram ASR, and OpenAI reasoning in a proprietary three-model architecture built to replace scripted IVR systems with natural, human-like voice self-service.