Magma

Multimodal Foundation Model for AI Agents

About Magma

Magma is a multimodal foundation model for AI agents that perceives text and visual input and generates grounded actions in both digital and physical environments — navigating user interfaces and manipulating real-world tools. It is pretrained on a heterogeneous mix of images, videos, and robotics demonstrations and introduces two novel supervision techniques: Set-of-Mark (SoM), which grounds actions in space using numeric markers on interactive elements, and Trace-of-Mark (ToM), which captures temporal action plans from unlabeled video. With a relatively modest pretraining budget, Magma reaches state-of-the-art results on UI navigation, robotic manipulation, and spatial reasoning benchmarks while remaining competitive on standard vision-language tasks.
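The Set-of-Mark idea described above can be illustrated with a minimal sketch: candidate elements receive numeric marks, the model answers with a mark number rather than raw coordinates, and that number is mapped back to a location. This is an illustration under stated assumptions, not Magma's actual code. The `Box` type, the function names, and the reading-order numbering are all hypothetical, and the candidate boxes are assumed to come from some external detector or UI parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one candidate element, in pixels (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple[float, float]:
        # Midpoint of the box, used here as the click target.
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def assign_marks(boxes: list[Box]) -> dict[int, Box]:
    """Number candidates 1..N in reading order (top to bottom, then left to right).

    The ordering rule is an assumption made for this sketch.
    """
    ordered = sorted(boxes, key=lambda b: (b.y0, b.x0))
    return {mark: box for mark, box in enumerate(ordered, start=1)}


def mark_to_click(marks: dict[int, Box], chosen_mark: int) -> tuple[float, float]:
    """Translate the mark number picked by a model into pixel coordinates."""
    if chosen_mark not in marks:
        raise ValueError(f"unknown mark {chosen_mark}; valid marks are 1..{len(marks)}")
    return marks[chosen_mark].center()
```

With marks drawn on a screenshot, the model's output reduces to a choice among a handful of integers, and `mark_to_click` converts that choice into an actionable position.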

Magma’s two core innovations target the central weaknesses of agentic AI: spatial grounding (where to act) and temporal reasoning (what sequence of actions to take). The shared SoM/ToM representation lets the model transfer skills across surfaces that look superficially different — insights from instructional video carry over into robotics control, and vice versa. As a foundation model, Magma reduces the need for expensive task-specific training and gives developers a single starting point for building assistive robots, GUI agents, and other embodied systems.
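For the temporal side, a Trace-of-Mark style training target can be sketched as the future path of each marked point over the next few frames. The sketch below is hedged: it assumes per-frame positions for each mark have already been produced by some external point tracker, and the function names, the `horizon` parameter, and the displacement filter are hypothetical choices for illustration rather than Magma's published pipeline.

```python
import math

Point = tuple[float, float]


def future_traces(
    tracks: dict[int, list[Point]], t: int, horizon: int
) -> dict[int, list[Point]]:
    """For each mark, return its positions at frames t+1 .. t+horizon.

    Marks whose track is too short to fill the whole window are skipped.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    traces = {}
    for mark, points in tracks.items():
        window = points[t + 1 : t + 1 + horizon]
        if len(window) == horizon:
            traces[mark] = window
    return traces


def moving_traces(
    tracks: dict[int, list[Point]], t: int, horizon: int, min_displacement: float
) -> dict[int, list[Point]]:
    """Keep only traces whose endpoint lies at least min_displacement from the position at frame t.

    Dropping near-static marks is an assumption of this sketch; it keeps the
    supervision focused on points that take part in the action.
    """
    kept = {}
    for mark, trace in future_traces(tracks, t, horizon).items():
        if math.dist(tracks[mark][t], trace[-1]) >= min_displacement:
            kept[mark] = trace
    return kept
```

Because the targets come from tracked positions alone, no action labels are needed, which is what makes unlabeled video usable as supervision in this framing.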

Key capabilities

A single vision-language-action (VLA) model that achieves state-of-the-art results on both UI navigation and robot manipulation
Set-of-Mark grounding for action selection
Trace-of-Mark training on unlabeled video at scale
Perceives text and visuals; emits digital and physical actions
Built on a LLaMA-3 backbone with a CLIP-ConvNeXt-XXLarge vision encoder

Technology Stack

PyTorch
LLaMA-3 backbone
CLIP-ConvNeXt-XXLarge

Ready to Explore?

Dive into platform integrations, source code, research papers, and announcements.

Platform: Microsoft Foundry. Try Magma in the Microsoft Foundry model catalog.
Code: GitHub Repository. Browse the open-source codebase and contribute.
Academic: Research Paper. Read the peer-reviewed publication.
Blog: Microsoft Blog. See the latest updates from Microsoft Research.