microsoft/VibeVoice — Open-Source Voice AI That Synthesizes 90-Minute, 4-Speaker Audio (44.7k stars)

What you can do with it

Microsoft's open-source voice AI uses an ultra-low 7.5 Hz speech tokenizer to synthesize 90-minute, 4-speaker conversational audio in one pass. The companion ASR model transcribes 60 minutes of audio in a single shot, with speaker diarization. It ships with 9 multilingual voices and 11 English style voices, and it is MIT-licensed for commercial use such as podcasts, audiobooks, and dubbing.
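
To see why the 7.5 Hz frame rate matters for long-form audio, here is a back-of-the-envelope calculation of how many tokenizer frames a 90-minute session needs. This is only a sketch: the 50 Hz comparison rate is an assumption chosen for contrast, not a figure from the repo, and `frames_for` is a hypothetical helper.

```python
# Rough frame budget for long-form audio at a given tokenizer frame rate.
FRAME_RATE_HZ = 7.5        # from the article
COMPARISON_RATE_HZ = 50.0  # assumption: a hypothetical higher-rate tokenizer


def frames_for(minutes: float, rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * rate_hz)


tts_frames = frames_for(90, FRAME_RATE_HZ)            # 90 min at 7.5 Hz
baseline_frames = frames_for(90, COMPARISON_RATE_HZ)  # 90 min at 50 Hz

print(tts_frames, baseline_frames)  # prints: 40500 270000
```

Under these assumptions the low frame rate cuts the sequence length by roughly 6.7x, which is what makes a single-pass 90-minute context plausible.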

The repo has been on GitHub trending for the past few days and added 320 stars yesterday alone, bringing the total to 44,707.

Why it exists

It hit because it filled a gap that closed-source vendors charge for. Friendly README, permissive license, runs locally — that combination still wins on GitHub trending in 2026.

Core Features

90-minute long-form synthesis
Up to 4 simultaneous speakers
60-minute ASR in a single pass
Automatic speaker diarization
Multilingual support (9 languages)
11 English style voices
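
Before handing a long script to the model, it can help to check it against the two headline limits above (4 speakers, 90 minutes). The sketch below is illustrative only: the `Speaker N:` line format, the 150 words-per-minute speaking rate, and the `check_script` helper are all assumptions, not part of the VibeVoice API, so check the repo's own examples for the real input format.

```python
import re
from collections import Counter

MAX_SPEAKERS = 4        # from the article
MAX_MINUTES = 90        # from the article
WORDS_PER_MINUTE = 150  # assumption: rough conversational speaking rate

# Assumed line format "Speaker 1: text" (hypothetical, for illustration only).
LINE_RE = re.compile(r"^(Speaker \d+):\s*(.+)$")


def check_script(script: str) -> dict:
    """Count speakers and estimate spoken duration for a multi-speaker script."""
    words_per_speaker = Counter()
    for line in script.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            speaker, text = match.groups()
            words_per_speaker[speaker] += len(text.split())
    estimated_minutes = sum(words_per_speaker.values()) / WORDS_PER_MINUTE
    return {
        "speakers": sorted(words_per_speaker),
        "estimated_minutes": estimated_minutes,
        "within_limits": (
            len(words_per_speaker) <= MAX_SPEAKERS
            and estimated_minutes <= MAX_MINUTES
        ),
    }


report = check_script("Speaker 1: Hello there.\nSpeaker 2: Hi, how are you?")
print(report["speakers"], report["within_limits"])
```

The duration estimate is deliberately crude; actual output length depends on the voice and pacing the model chooses.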

[Screenshot: demo view. Source: github.com (company OG image, news fair use)]

Stack & Architecture

Languages/frameworks: Python, PyTorch, Hugging Face Transformers, Continuous Speech Tokenizer (7.5 Hz).

License: MIT.

Compared to neighbors

Repo	Strength	Trade-off
This repo	90-minute long-form synthesis	Young
ElevenLabs (closed)	Mature	Heavier or restrictive license
Suno (closed)	Mature	Heavier or restrictive license
Coqui TTS (open)	Mature	Heavier or restrictive license

Why now

Open alternatives to closed vendors are hot this quarter. Enterprises avoid lock-in, hobbyists want hackable code, and GitHub trending rewards both. This repo also has fast PR review cycles.

Quick start

git clone https://github.com/microsoft/VibeVoice && cd VibeVoice && pip install -e . && python demo.py --text 'hello' --voice en_alice

[Figure: README diagram. Source: github.com (company OG image, news fair use)]

Limits & Roadmap

Training is English-first, GPU memory use climbs with long contexts, and cuDNN issues crop up occasionally. Roadmap: quantization, a hosted version, and more locales.

Sources

Tomorrow morning

Devs: clone the repo, run the quick start, and file an issue with your first impressions.
PMs: spend 30 minutes auditing license + deps for internal use.
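
For the license and dependency audit, one possible starting point is to list the declared license of every package installed in the environment after the quick start. This is a minimal sketch using only the Python standard library; `installed_licenses` is a hypothetical helper, and declared metadata is no substitute for reading the actual license files.

```python
from importlib.metadata import distributions


def installed_licenses() -> dict:
    """Map each installed distribution name to the first line of its declared license."""
    licenses = {}
    for dist in distributions():
        meta = dist.metadata
        if meta is None:  # skip distributions with unreadable metadata
            continue
        name = meta.get("Name")
        if not name:
            continue
        declared = (meta.get("License-Expression") or meta.get("License") or "").strip()
        # Some packages embed the full license text; keep only the first line.
        licenses[name] = declared.splitlines()[0] if declared else "UNKNOWN"
    return licenses


if __name__ == "__main__":
    for name, declared in sorted(installed_licenses().items()):
        print(f"{name}: {declared}")
```

Packages reported as UNKNOWN often declare their license only through classifiers or a bundled file, so they are the ones to inspect by hand.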
