MAI-Image-2
State-of-the-Art Text-to-Image
Try MAI-Image-2 on Microsoft Foundry →
About MAI-Image-2
MAI-Image-2 is Microsoft AI’s flagship text-to-image generation model, which debuted at #3 on the Arena.ai community leaderboard. The model produces photorealistic imagery with reliable in-image text rendering and an expressive style range, and it was developed in close collaboration with professional photographers, designers, and visual storytellers to ensure aesthetic quality and faithful prompt adherence. It supports high-resolution output, nuanced style transfer, and compositional control across complex scenes.
MAI-Image-2 anchors Microsoft AI’s first-party generative-imagery stack and demonstrates the company’s intent to compete head-on with the leading text-to-image systems on quality rather than scale alone. Its leaderboard standing validates a training and data-curation strategy emphasizing human-centered design, and its integration across Microsoft surfaces — from Copilot to Designer — gives professional and consumer users a Microsoft-owned alternative to third-party generators. The model is also the architectural foundation for MAI-Image-2-Efficient, which adapts it for high-volume production.
Key capabilities
- Photorealistic outputs with reliable text rendering
- Debuted at #3 on the Arena.ai leaderboard
- Designed in collaboration with photographers, designers, and visual storytellers
- Expressive stylistic range from cinematic to graphic design
- First fully in-house Microsoft AI text-to-image model