Model | Speech & Audio | Production

MAI-Transcribe-1

Multilingual Speech Recognition

About MAI-Transcribe-1

MAI-Transcribe-1 is a Microsoft AI multilingual speech recognition model supporting up to 25 languages with enterprise-grade transcription accuracy at roughly half the GPU cost of leading competing systems. It is engineered for accessibility tools, automated captioning, content production workflows, and voice agents, and is robust across varied acoustic conditions and speaker characteristics. The cost-efficiency derives from architectural optimization and training on representative multilingual corpora rather than from sacrificing language coverage.

The model addresses a real economic constraint in deploying ASR at scale: high-volume transcription has historically been priced out of reach for many accessibility and inclusion use cases. By delivering competitive accuracy at a substantially lower compute footprint, MAI-Transcribe-1 widens the addressable market for live captioning, multilingual voice agents, and content indexing. It complements MAI-Voice-1, giving Microsoft a complete neural speech stack (recognition and synthesis) on first-party infrastructure.
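To make the "roughly half the GPU cost" claim concrete for budgeting, the arithmetic can be sketched as below. This is an illustration only: the function name `estimate_gpu_cost`, the audio volume, and the baseline cost per audio hour are hypothetical placeholders, not published figures. The only input taken from this page is the approximate 0.5 cost ratio.

```python
def estimate_gpu_cost(audio_hours, baseline_cost_per_audio_hour, cost_ratio=0.5):
    """Return (baseline_cost, reduced_cost, savings) for a transcription workload.

    cost_ratio=0.5 reflects the page's "roughly half the GPU cost" claim.
    All other inputs are supplied by the caller and are assumptions.
    """
    if audio_hours < 0 or baseline_cost_per_audio_hour < 0:
        raise ValueError("audio_hours and baseline cost must be non-negative")
    if not 0 <= cost_ratio <= 1:
        raise ValueError("cost_ratio must be between 0 and 1")
    baseline = audio_hours * baseline_cost_per_audio_hour
    reduced = baseline * cost_ratio
    return baseline, reduced, baseline - reduced


# Hypothetical workload: 10,000 audio hours at a placeholder baseline of
# 0.25 currency units of GPU cost per audio hour.
baseline, reduced, savings = estimate_gpu_cost(10_000, 0.25)
print(f"baseline={baseline:.2f} reduced={reduced:.2f} savings={savings:.2f}")
```

Because the ratio is approximate, treat the result as a rough planning estimate and replace the placeholder inputs with measured values from your own deployment.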

Key capabilities

Competitive accuracy at roughly 50% of the GPU cost of leading systems
Supports up to 25 languages with enterprise-grade accuracy
Engineered for captioning, accessibility, and voice-agent pipelines
Real-time integration through the Azure Speech SDK
Tuned for content workflows at scale
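For the captioning and real-time integration capabilities listed above, a minimal sketch using the Azure Speech SDK for Python (`azure-cognitiveservices-speech`) might look like the following. It uses the SDK's general continuous-recognition pattern. This page does not say how MAI-Transcribe-1 is selected through the SDK, so the sketch assumes that any model selection is handled by the Speech resource or deployment behind the placeholder `key` and `region`. The helper names `ticks_to_timestamp`, `caption_line`, and `transcribe_file` are hypothetical.

```python
import threading


def ticks_to_timestamp(ticks: int) -> str:
    """Convert a Speech SDK offset (100-nanosecond ticks) to HH:MM:SS.mmm."""
    total_ms = ticks // 10_000
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


def caption_line(offset_ticks: int, duration_ticks: int, text: str) -> str:
    """Format one recognized phrase as a simple timed caption line."""
    start = ticks_to_timestamp(offset_ticks)
    end = ticks_to_timestamp(offset_ticks + duration_ticks)
    return f"{start} --> {end} {text}"


def transcribe_file(wav_path: str, key: str, region: str, language: str = "en-US"):
    """Transcribe a WAV file with continuous recognition; return caption lines.

    key and region are placeholders for your own Speech resource credentials.
    """
    # Imported lazily so the formatting helpers work without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = language
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    captions = []
    done = threading.Event()

    def on_recognized(evt):
        # Final results only; interim hypotheses arrive on `recognizing`.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            captions.append(
                caption_line(evt.result.offset, evt.result.duration, evt.result.text)
            )

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()  # Block until the file is fully processed or recognition is canceled.
    recognizer.stop_continuous_recognition()
    return captions
```

For live captioning, the same pattern would typically read from a microphone or audio stream and also subscribe to the `recognizing` event, so that interim text can be displayed before each phrase is finalized.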

Technology Stack

Neural ASR
Azure Speech SDK

Ready to Explore?

Dive into platform integrations, source code, research papers, and announcements.

Platform: Microsoft Foundry
Try MAI-Transcribe-1 in the Microsoft Foundry model catalog.

Blog: Microsoft Blog
See the latest updates from Microsoft Research.