Model Router, a single deployment that intelligently routes prompts across 18 leading LLMs in Microsoft Foundry
Production

Model Router

Model Router is a trained language model in Microsoft Foundry that, in real time, routes each prompt to the most suitable underlying LLM — purpose‑built for developers and platform teams who don’t want to hard‑code a single model into their apps. The latest version supports 18 models across OpenAI, Anthropic, DeepSeek, Meta, and xAI — including GPT‑5, GPT‑5.2, o4‑mini, Claude Opus 4.6, Claude Sonnet 4.5, DeepSeek‑V3.2, Llama‑4 Maverick, Grok 4, and more — under one deployment and one chat experience. Three routing modes (Balanced, Cost, Quality) tune the routing logic for your use case, while model subsets, automatic failover, prompt caching, and tool‑use support round out the production picture. In short, one endpoint, the right model for every prompt.
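Because the router sits behind a standard chat-completions deployment, calling it looks like calling any single model: the request body's `model` field names the router deployment rather than a specific LLM. A minimal sketch of such a request body, assuming a deployment named `model-router` (the deployment name is yours to choose; endpoint, auth, and transport are omitted):

```python
import json

def build_chat_request(prompt: str, deployment: str = "model-router") -> dict:
    """Build a standard chat-completions request body.

    With Model Router, `model` names the router deployment, not an
    underlying LLM; the router selects the model per request.
    """
    return {
        "model": deployment,  # the router deployment name (assumed here)
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_chat_request("Summarize this support ticket in two bullets.")
print(json.dumps(body, indent=2))
```

Nothing else in the app changes when the underlying model mix changes; the deployment name stays the same across all 18 models.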

This is what changes when “pick a model” stops being an architectural decision. High‑volume product teams running chat copilots get the cheapest sufficient model for each user turn, with built‑in failover when something hiccups. Reasoning‑heavy workloads route to the strongest model only when the prompt demands it. Cost‑sensitive deployments use Cost mode to stay inside a tight budget, while production teams running agentic scenarios get tool‑use support directly from a single deployment in the Foundry Agent Service. The net result: Model Router gives you the ceiling of frontier models with the floor of small‑model economics — selected per request, not per app.
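Since selection happens per request, it is worth logging which model actually served each prompt. A hedged sketch of reading that from a response: the sample below assumes the response follows the standard chat-completions schema, where the `model` field reports the underlying model the router chose (the sample values are illustrative, not real output):

```python
import json

# Illustrative response in standard chat-completions shape; with Model
# Router, `model` reports the underlying LLM chosen for this request.
sample_response = json.loads("""
{
  "model": "o4-mini",
  "choices": [{"message": {"role": "assistant", "content": "Done."}}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15}
}
""")

def routed_model(response: dict) -> str:
    """Return the name of the underlying model that served the request."""
    return response["model"]

print(routed_model(sample_response))  # -> o4-mini
```

Tracking this field over time shows how often each mode (Balanced, Cost, Quality) escalates to a frontier model versus a cheaper one.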

Bring it into your stack — try Model Router on GitHub.