spoonai

microsoft/VibeVoice — Open-Source Voice AI That Synthesizes 90-Minute, 4-Speaker Audio (44.7k stars)

What you can do with it

Microsoft's open-source voice AI uses an ultra-low-rate 7.5 Hz speech tokenizer to synthesize 90 minutes of conversational audio with up to 4 speakers in one pass. The companion ASR model transcribes 60 minutes of audio in a single pass, with speaker diarization. It ships with 9 multilingual voices and 11 English style voices, and it is MIT-licensed for commercial use in podcasts, audiobooks, and dubbing.

The repo has been on GitHub trending for the past few days. It added 320 stars yesterday alone, bringing the total to 44,707.

Why it exists

It took off because it fills a gap that closed-source vendors charge for. A friendly README, a permissive license, and local execution: that combination still wins on GitHub trending in 2026.

Core Features

  • 90-minute long-form synthesis
  • Up to 4 simultaneous speakers
  • 60-minute ASR in a single pass
  • Automatic speaker diarization
  • Multilingual across 9 languages
  • 11 English style voices

Screenshot: demo view. Source: github.com.

Stack & Architecture

Languages/frameworks: Python, PyTorch, Hugging Face Transformers. Core component: a continuous speech tokenizer running at 7.5 Hz.
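
To see why the low frame rate matters for long-form audio, a back-of-the-envelope calculation helps. The sketch below assumes the tokenizer emits one frame per 1/7.5 s of audio, and compares that with a hypothetical 50 Hz codec picked only for contrast. `frames_for` is an illustrative helper, not part of the repository.

```python
# Rough sequence-length arithmetic for a fixed-rate speech tokenizer.
# Assumption: one frame is emitted per 1/rate_hz seconds of audio.

def frames_for(minutes: float, rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * rate_hz)

VIBEVOICE_RATE_HZ = 7.5   # frame rate quoted in the article
CONTRAST_RATE_HZ = 50.0   # hypothetical higher-rate codec, for comparison only

for minutes in (60, 90):
    low = frames_for(minutes, VIBEVOICE_RATE_HZ)
    high = frames_for(minutes, CONTRAST_RATE_HZ)
    print(f"{minutes} min: {low} frames at 7.5 Hz vs {high} frames at 50 Hz")
```

Under that assumption, 90 minutes of audio comes to 40,500 frames at 7.5 Hz, against 270,000 at the 50 Hz contrast rate.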

License: MIT.

Compared to neighbors

Repo                  Strength                         Trade-off
This repo             90-minute long-form synthesis    Young
ElevenLabs (closed)   Mature                           Heavier or restrictive license
Suno (closed)         Mature                           Heavier or restrictive license
Coqui TTS (open)      Mature                           Heavier or restrictive license

Why now

Open alternatives to closed vendors are hot this quarter. Enterprises avoid lock-in, hobbyists want hackable code, and GitHub trending rewards both. This repo also has fast PR review cycles.

Quick start

git clone https://github.com/microsoft/VibeVoice && cd VibeVoice && pip install -e . && python demo.py --text 'hello' --voice en_alice

Figure: README diagram. Source: github.com.

Limits & Roadmap

Training is English-first, GPU memory use climbs with long contexts, and cuDNN issues come up occasionally. Roadmap: quantization, a hosted version, and more locales.

Tomorrow morning

  • Devs: clone the repo, run the quick start, and file an issue with your first impressions.
  • PMs: spend 30 minutes auditing license + deps for internal use.
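
For the license and dependency audit, one starting point is to dump the declared license of every package installed in the project's environment. This is a minimal sketch using only the standard library's `importlib.metadata`. `license_report` is a hypothetical helper, and declared metadata can be missing or stale, so the output is a first pass rather than a legal review.

```python
# List installed distributions with their declared license metadata.
from importlib.metadata import distributions


def license_report() -> list[tuple[str, str]]:
    """Return sorted (package, license) pairs for the current environment."""
    rows = []
    for dist in distributions():
        meta = dist.metadata
        if meta is None:  # skip distributions with unreadable metadata
            continue
        name = meta.get("Name") or "unknown"
        # Newer packages declare License-Expression; older ones use License.
        declared = meta.get("License-Expression") or meta.get("License")
        if not declared:
            # Fall back to trove classifiers such as "License :: OSI Approved :: MIT License".
            classifiers = meta.get_all("Classifier") or []
            found = [c.split("::")[-1].strip()
                     for c in classifiers if c.startswith("License ::")]
            declared = ", ".join(found) or "UNKNOWN"
        # Some packages paste the full license text; keep only the first line.
        rows.append((name, declared.splitlines()[0]))
    return sorted(set(rows), key=lambda row: row[0].lower())


if __name__ == "__main__":
    for name, declared in license_report():
        print(f"{name:30} {declared}")
```

Run it inside the virtual environment where the project is installed, then look first at any package reporting `UNKNOWN` or a non-permissive license.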
