Phi-4-Reasoning-Vision-15B

Compact Multimodal Reasoning Model

140,401 USERS

Try Phi-4-Reasoning-Vision-15B on Microsoft Foundry →

About Phi-4-Reasoning-Vision-15B

Phi-4-Reasoning-Vision-15B is a compact open-weight multimodal reasoning model that pairs the Phi-4-Reasoning language backbone with a SigLIP-2 vision encoder via a mid-fusion architecture. It supports dynamic-resolution input of up to 3,600 visual tokens for high-fidelity document and GUI analysis. It also introduces a hybrid reasoning design with two modes: THINK (chain-of-thought for complex math and scientific tasks) and NOTHINK (direct inference for perception tasks). Trained on 240 B200 GPUs over four days using carefully curated mixed datasets, the model reaches 84.8% on AI2D, 83.3% on ChartQA, 75.2% on MathVista-MINI, and 88.2% on ScreenSpot-V2 GUI localization, results that are competitive with models roughly ten times its size.

The model addresses a critical gap in compact multimodal systems by engaging deliberate reasoning only when task complexity warrants it, cutting inference latency without sacrificing accuracy on demanding workloads. The mid-fusion design over pretrained components and its emphasis on data-centric training make it well suited for computer-use agents (GUI grounding), visual math problem solving, and OCR-intensive document workflows. Its strong performance on specialized reasoning benchmarks underscores Microsoft’s focus on practical multimodal intelligence that can run within typical enterprise and developer compute budgets.
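The idea of reasoning only when the task warrants it can be illustrated with a small routing sketch. This is not the model's actual interface: the page does not say how a mode is selected at inference time, so the task labels, their grouping, the fallback choice, and the `choose_mode` helper below are all hypothetical and serve only to show the decision being described.

```python
from enum import Enum


class Mode(Enum):
    THINK = "think"      # chain-of-thought, for complex math and science
    NOTHINK = "nothink"  # direct inference, for perception tasks


# Hypothetical task labels. The grouping follows the description above
# (reasoning-heavy vs. perception-heavy) but is an assumption, not a spec.
REASONING_TASKS = {"visual_math", "science_qa"}
PERCEPTION_TASKS = {"ocr", "gui_grounding", "captioning"}


def choose_mode(task: str) -> Mode:
    """Pick a reasoning mode for a task label (illustrative only)."""
    if task in REASONING_TASKS:
        return Mode.THINK
    if task in PERCEPTION_TASKS:
        return Mode.NOTHINK
    # Assumption: unknown tasks fall back to the slower, deliberate mode.
    return Mode.THINK


if __name__ == "__main__":
    for task in ("visual_math", "ocr", "gui_grounding", "unknown_task"):
        print(f"{task}: {choose_mode(task).name}")
```

In a real deployment the fallback is a latency-versus-accuracy trade-off: defaulting to THINK favors accuracy, while defaulting to NOTHINK favors response time.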

Key capabilities

Hybrid reasoning (THINK/NOTHINK) within a single 15B model
Mid-fusion of Phi-4-Reasoning with the SigLIP-2 vision encoder
Up to 3,600 visual tokens for high-resolution perception
Open-weight checkpoint competitive with models 10× its size
Optimized for vLLM and Transformers inference
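To give a feel for what a 3,600-token visual budget means in practice, the sketch below estimates how many tokens an image would need and shrinks it to fit. Only the 3,600 figure comes from this page. The 16-pixel patch size, the one-patch-per-token mapping, and the helper names are assumptions for illustration, and the model's real preprocessing may differ.

```python
import math

MAX_VISUAL_TOKENS = 3600  # stated on this page
PATCH_SIZE = 16           # assumption: pixels per patch side (not stated here)


def visual_tokens(width: int, height: int, patch: int = PATCH_SIZE) -> int:
    """Estimate tokens, assuming one token per (patch x patch) tile."""
    return math.ceil(width / patch) * math.ceil(height / patch)


def fit_to_budget(width: int, height: int,
                  budget: int = MAX_VISUAL_TOKENS,
                  patch: int = PATCH_SIZE) -> tuple[int, int]:
    """Return a (width, height) that roughly keeps the aspect ratio
    and needs at most `budget` tokens under the assumptions above."""
    cols = math.ceil(width / patch)
    rows = math.ceil(height / patch)
    if cols * rows <= budget:
        return width, height  # already within budget; leave untouched

    # Scale both grid dimensions by the same factor, rounding down.
    scale = math.sqrt(budget / (cols * rows))
    new_cols = max(1, math.floor(cols * scale))
    new_rows = max(1, math.floor(rows * scale))

    # Guard against floating-point rounding pushing the grid over budget.
    while new_cols * new_rows > budget:
        if new_cols >= new_rows:
            new_cols -= 1
        else:
            new_rows -= 1

    return new_cols * patch, new_rows * patch


if __name__ == "__main__":
    for size in [(960, 960), (1920, 1080), (3840, 2160)]:
        w, h = fit_to_budget(*size)
        print(size, "->", (w, h), "tokens:", visual_tokens(w, h))
```

Under these assumptions, a 960×960 image fills the budget exactly (a 60×60 patch grid), while larger screenshots or scanned pages would have to be downscaled.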

Technology Stack

PyTorch, Transformers, vLLM, SigLIP-2

Ready to Explore?

Dive into platform integrations, source code, research papers, and announcements.

Platform: Microsoft Foundry. Try Phi-4-Reasoning-Vision-15B in the Microsoft Foundry model catalog.
Code: GitHub Repository. Browse the open-source codebase and contribute.
Academic: Research Paper. Read the peer-reviewed publication.
Blog: Microsoft Blog. See the latest updates from Microsoft Research.