microsoft/VibeVoice — Open-Source Voice AI That Synthesizes 90-Minute, 4-Speaker Audio (44.7k stars)

What you can do with it
Microsoft's open-source voice AI uses an ultra-low 7.5 Hz speech tokenizer to synthesize 90-minute, 4-speaker conversational audio in one pass. The companion ASR transcribes 60-minute audio in a single shot with speaker diarization. 9 multilingual voices + 11 English style voices. MIT-licensed for commercial use — podcasts, audiobooks, dubbing.
The repo has been on GitHub trending for the past few days and added 320 stars yesterday alone, for a total of 44,707.
Why it exists
VibeVoice took off because it fills a gap that closed-source vendors charge for. A friendly README, a permissive license, and local execution: that combination still wins on GitHub trending in 2026.
Core Features
- 90-minute long-form synthesis
- Up to 4 simultaneous speakers
- 60-minute ASR in a single pass
- Automatic speaker diarization
- 9 multilingual voices
- 11 English style voices
Stack & Architecture
Languages/frameworks: Python, PyTorch, Hugging Face Transformers. Core component: a continuous speech tokenizer running at 7.5 Hz.
License: MIT.
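To see why the 7.5 Hz frame rate matters for long-form audio, it helps to count frames. The sketch below is only back-of-the-envelope arithmetic, not code from the repo: `frames_needed` is a hypothetical helper, and the 50 Hz baseline is an assumed comparison point for a higher-rate codec, not a figure from this article.

```python
def frames_needed(duration_minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover the given duration."""
    return round(duration_minutes * 60 * frame_rate_hz)


VIBEVOICE_HZ = 7.5   # frame rate quoted in this article
BASELINE_HZ = 50.0   # assumed higher-rate codec, for comparison only

tts_frames = frames_needed(90, VIBEVOICE_HZ)
baseline_frames = frames_needed(90, BASELINE_HZ)

print(tts_frames)       # 40500
print(baseline_frames)  # 270000
```

At 7.5 Hz, a 90-minute session is about 40,500 frames, a sequence length a long-context model can plausibly handle in one pass. At the assumed 50 Hz rate, the same audio would need 270,000 frames.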
Compared to neighbors
| Repo | Strength | Trade-off |
|---|---|---|
| This repo | 90-minute long-form synthesis | Young |
| ElevenLabs (closed) | Mature | Proprietary hosted service |
| Suno (closed) | Mature | Proprietary hosted service |
| Coqui TTS (open) | Mature | Heavier or restrictive license |
Why now
Open alternatives to closed vendors are hot this quarter. Enterprises avoid lock-in, hobbyists want hackable code, and GitHub trending rewards both. This repo also has fast PR review cycles.
Quick start
git clone https://github.com/microsoft/VibeVoice && cd VibeVoice && pip install -e . && python demo.py --text 'hello' --voice en_alice
Limits & Roadmap
Limits: English-first training, GPU memory use that grows with long contexts, and occasional cuDNN issues. Roadmap: quantization, a hosted version, and more locales.
Tomorrow morning
- Devs: clone, run quickstart, file an issue with first impression.
- PMs: spend 30 minutes auditing license + deps for internal use.
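For the license and dependency audit, one possible starting point is to list what each installed package declares about its own license. This is a minimal sketch using only the standard-library `importlib.metadata` module. `license_of` and `license_report` are hypothetical helper names, and the sketch assumes the project has already been installed into the current environment. Declared metadata can be missing or wrong, so treat the output as a first pass rather than a legal review.

```python
from importlib import metadata


def license_of(dist: metadata.Distribution) -> str:
    """Best-effort license string taken from a distribution's metadata."""
    meta = dist.metadata
    # Newer packages may declare License-Expression; older ones use License.
    declared = meta.get("License-Expression") or meta.get("License")
    if declared and declared.strip() and declared.strip().upper() != "UNKNOWN":
        # Some packages paste the full license text; keep the first line only.
        return declared.strip().splitlines()[0]
    # Fall back to trove classifiers, e.g. "License :: OSI Approved :: MIT License".
    classifiers = [
        c for c in (meta.get_all("Classifier") or []) if c.startswith("License ::")
    ]
    if classifiers:
        return classifiers[0].split("::")[-1].strip()
    return "unknown"


def license_report() -> dict[str, str]:
    """Map every installed distribution name to its declared license."""
    report = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            report[name] = license_of(dist)
    return report
```

Calling `license_report()` inside the virtual environment used for the quickstart returns a name-to-license mapping. Anything reported as "unknown" is worth checking by hand.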
