Creative & Generative Media Model Vision Experimental

MAI-Image-2

State-of-the-Art Text-to-Image

Try MAI-Image-2 on Microsoft Foundry →

About MAI-Image-2

MAI-Image-2 is Microsoft AI’s flagship text-to-image generation model, which debuted at #3 on the Arena.ai community leaderboard. The model produces photorealistic imagery with reliable in-image text rendering and an expressive style range, and it was developed in close collaboration with professional photographers, designers, and visual storytellers to ensure aesthetic quality and faithful prompt adherence. It supports high-resolution output, nuanced style transfer, and compositional control across complex scenes.

MAI-Image-2 anchors Microsoft AI’s first-party generative-imagery stack and demonstrates the company’s intent to compete head-on with the leading text-to-image systems on quality rather than scale alone. Its leaderboard standing validates a training and data-curation strategy emphasizing human-centered design, and its integration across Microsoft surfaces, from Copilot to Designer, gives professional and consumer users a Microsoft-owned alternative to third-party generators. The model is also the architectural foundation for MAI-Image-2-Efficient, which adapts it for high-volume production.

Key capabilities

Photorealistic outputs with reliable text rendering
Debuted at #3 on the Arena.ai leaderboard
Designed in collaboration with photographers, designers, and visual storytellers
Expressive stylistic range from cinematic to graphic design
First fully in-house Microsoft AI text-to-image model

Technology Stack

Diffusion Models, CUDA

Ready to Explore?

Dive into platform integrations, source code, research papers, and announcements.

Platform: Microsoft Foundry. Try MAI-Image-2 in the Microsoft Foundry model catalog.
Blog: Microsoft Blog. See the latest updates from Microsoft Research.