Rho-Alpha
Robotics VLA+ Model from Phi
Try Rho-Alpha on Microsoft Foundry →
About Rho-Alpha
Rho-Alpha (ρα) is the first robotics vision-language model derived from Microsoft’s Phi series. It translates natural-language commands into control signals for bimanual robotic manipulation, integrating tactile sensing alongside vision and inheriting the efficiency and grounding characteristics of the Phi vision-language backbone. Crucially, the model learns continually from human teleoperation feedback during deployment, allowing the robot to adapt to new environments and tasks without an explicit retraining cycle.
Rho-Alpha addresses the central pain point of physical robotics: task-specific programming dominates development cost, and policies trained in simulation often degrade in the real world. By combining language understanding with multimodal sensing (vision, proprioception, touch) and continuous teleoperation-driven learning, the model gives deployed systems a path to improve over time rather than ship frozen. As Microsoft’s first Phi-derived robotics model, it signals a deliberate move from purely digital agents into Physical AI, where the same family of small, efficient models drives both screens and arms.
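This page does not document Rho-Alpha's interface, so what follows is only a toy sketch of the correction-driven learning idea described above. It is not the model's actual method or API. It assumes a hypothetical `Observation` record that concatenates vision, proprioception, and touch features, and a hypothetical `LinearPolicy` whose `correct` method takes one gradient step toward an action supplied by a human operator. Every name, dimension, and the learning rate here is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """Hypothetical multimodal observation, already reduced to feature lists."""
    vision: List[float]
    proprioception: List[float]
    touch: List[float]

    def features(self) -> List[float]:
        # Concatenate the three modalities into one feature vector.
        return self.vision + self.proprioception + self.touch


@dataclass
class LinearPolicy:
    """Toy stand-in for a learned policy: action = weights @ features."""
    n_features: int
    n_actions: int
    lr: float = 0.1  # assumed learning rate, chosen only for this example
    weights: List[List[float]] = field(init=False)

    def __post_init__(self) -> None:
        self.weights = [[0.0] * self.n_features for _ in range(self.n_actions)]

    def act(self, obs: Observation) -> List[float]:
        x = obs.features()
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def correct(self, obs: Observation, operator_action: List[float]) -> None:
        """One gradient step on squared error toward the operator's action."""
        x = obs.features()
        predicted = self.act(obs)
        for row, pred, target in zip(self.weights, predicted, operator_action):
            err = pred - target
            for i, xi in enumerate(x):
                row[i] -= self.lr * err * xi


# Usage: two outputs stand in for one command per arm (an assumption).
policy = LinearPolicy(n_features=4, n_actions=2)
obs = Observation(vision=[1.0, 0.5], proprioception=[0.0], touch=[1.0])
for _ in range(50):
    policy.correct(obs, operator_action=[0.2, -0.4])
print(policy.act(obs))
```

In this sketch the policy is updated in place each time the operator intervenes, so its output for the same observation drifts toward the demonstrated action without a separate retraining pass. A real system would involve a far larger model and safeguards that this example omits.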
Key capabilities
- VLA+ with tactile sensing and online learning from corrections
- First robotics model derived from Microsoft's Phi VLM series
- Translates natural language directly into bimanual control signals
- Continual learning from human teleoperation during deployment
- Early-access research model for physical-AI experimentation
Ready to Explore?
Dive into platform integrations, source code, research papers, and announcements.