← Back to Innovations
Creative & Generative Media Model Multimodal Production

Microsoft Phi-4

Phi-4 Family (Multimodal, Mini)

9,223,372,036,854,775,808 USERS
Try Microsoft Phi-4 on Microsoft Foundry → Try on Microsoft Foundry →
Microsoft Phi-4

About Microsoft Phi-4

Phi-4 is Microsoft’s 14-billion-parameter small language model trained on roughly 9.8 trillion tokens of synthetic “textbook-style” data combined with filtered public documents and curated academic material. The model is engineered for reasoning, mathematics, and code generation in memory-constrained environments, achieving 84.8% on MMLU, 80.4% on MATH, and 82.6% on HumanEval. Phi-4 was aligned using supervised fine-tuning and direct preference optimization, with a 16K-token context window. The Phi-4 family also includes Phi-4-multimodal (5.6B parameters, unifying speech, vision, and text in a single architecture) and Phi-4-mini (3.8B parameters, optimized for reasoning and 128K-context tasks).

Phi-4 advances Microsoft’s strategy of scaling down rather than up — delivering capabilities competitive with much larger models while remaining cheap enough to deploy on commodity hardware and at the edge. By treating training-data quality as the primary design lever and applying targeted post-training techniques, the Phi-4 family demonstrates that small, intentionally-trained models can match frontier performance on reasoning-heavy tasks. The family underpins applications in enterprise reasoning, accessibility, on-device assistants, and education, and serves as the base for several downstream Microsoft models including Phi-4-Reasoning-Vision and Rho-Alpha.

Key capabilities

  • Single multimodal model with mixture-of-LoRAs across speech, vision, and text
  • Phi-4-multimodal (5.6B) unifies speech, vision, and text
  • Phi-4-mini (3.8B) with 128K context for long-context tasks
  • Strong reasoning, coding, and math performance at SLM scale
  • Open weights with vLLM and Flash Attention support
Technology Stack
PyTorch Transformers Flash Attention vLLM
Technology Stack
PyTorch Transformers Flash Attention vLLM