
Rho-Alpha

Rho-alpha (ρα) is the first robotics model derived from Microsoft’s Phi series of vision-language models.

Rho-alpha translates natural language commands into control signals for robotic systems performing bimanual manipulation tasks. It can be described as a VLA+ model in that it expands the set of perceptual and learning modalities beyond those typically used by vision-language-action (VLA) models. For perception, Rho-alpha adds tactile sensing, with work underway to accommodate additional modalities such as force. For learning, our approach enables Rho-alpha to continually improve during deployment by learning from feedback provided by people.
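To make the VLA+ idea concrete, the sketch below shows one way a control step could combine camera images, a language instruction, and tactile readings into bimanual joint commands. This is a minimal illustration under assumed interfaces; the class and method names (RhoAlphaPolicy, Observation, act) are hypothetical and do not describe a published Rho-alpha API.

```python
# Hypothetical sketch of a VLA+ control step. All names here are
# illustrative assumptions, not Rho-alpha's actual interface.
import time
from dataclasses import dataclass

import numpy as np


@dataclass
class Observation:
    """One multimodal observation for a bimanual robot."""
    rgb_images: list[np.ndarray]   # e.g. wrist and scene cameras, HxWx3
    tactile: np.ndarray            # per-fingertip tactile readings
    joint_positions: np.ndarray    # current joint state for both arms
    instruction: str               # natural language command


class RhoAlphaPolicy:
    """Stand-in for a vision-language-action policy with tactile input."""

    def act(self, obs: Observation) -> np.ndarray:
        # A real VLA+ model would encode images, text, and tactile signals
        # with a pretrained vision-language backbone and decode a chunk of
        # joint or end-effector commands. This stub just returns zeros.
        return np.zeros(14)  # e.g. 7 degrees of freedom per arm


def control_loop(policy, get_observation, send_command, hz: float = 10.0):
    """Run the policy at a fixed rate, mapping observations to commands."""
    period = 1.0 / hz
    while True:
        obs = get_observation()
        command = policy.act(obs)
        send_command(command)
        time.sleep(period)
```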

Rho-alpha achieves tactile-aware behaviors infused with vision-language understanding by co-training on trajectories from physical demonstrations and simulated tasks, together with web-scale visual question answering data.
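One plausible form such co-training could take is a gradient step over a mixture of action supervision (from real and simulated trajectories) and visual question answering supervision. The batch structure, loss weights, and model methods (predict_actions, answer) below are assumptions made for this sketch, not details of Rho-alpha's actual training recipe.

```python
# Illustrative co-training step mixing robot trajectories with VQA data.
# Interfaces and weights are hypothetical.
import torch
import torch.nn.functional as F


def co_training_step(model, optimizer, robot_batch, vqa_batch,
                     action_weight: float = 1.0, vqa_weight: float = 0.1):
    """One gradient step over a mixture of action and VQA supervision."""
    optimizer.zero_grad()

    # Action loss: predict demonstrated actions from images, language,
    # and tactile inputs (trajectories from physical demos and simulation).
    pred_actions = model.predict_actions(
        robot_batch["images"], robot_batch["instruction"], robot_batch["tactile"]
    )
    action_loss = F.mse_loss(pred_actions, robot_batch["actions"])

    # VQA loss: token-level prediction on web-scale visual question
    # answering data, preserving vision-language understanding.
    logits = model.answer(vqa_batch["images"], vqa_batch["question"])
    vqa_loss = F.cross_entropy(
        logits.flatten(0, 1), vqa_batch["answer_tokens"].flatten()
    )

    loss = action_weight * action_loss + vqa_weight * vqa_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```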

While extending perception capabilities can enable Rho-alpha to adjust a robot's course of action during operation, robots can still make mistakes that are hard to recover from. Human operators can set a robot back on track using intuitive teleoperation devices such as a 3D mouse. Rho-alpha can continue learning from this kind of corrective feedback online during system operation.
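A simple way to picture this kind of online learning from corrections is an intervention-style loop: whenever the operator takes over, the corrected segment is logged and periodically used to fine-tune the policy. The interfaces below (teleop.is_active, policy.fine_tune, and so on) are hypothetical and shown only to illustrate the pattern, not how Rho-alpha implements it.

```python
# Sketch of learning from online corrective feedback. All interfaces are
# hypothetical; the pattern is intervention-style imitation learning.
def run_with_corrections(policy, teleop, get_observation, send_command,
                         buffer, update_every: int = 500):
    step = 0
    while True:
        obs = get_observation()

        if teleop.is_active():
            # Operator takes over (e.g. with a 3D mouse); execute their
            # command and record it as a corrective label for this state.
            command = teleop.read_command()
            buffer.append((obs, command))
        else:
            command = policy.act(obs)

        send_command(command)

        # Periodically fine-tune the policy on accumulated corrections.
        step += 1
        if step % update_every == 0 and len(buffer) > 0:
            policy.fine_tune(buffer)
```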

Robotics manufacturers, integrators, and end-users have unique insights into the use-cases and scenarios where emerging physical AI technologies offer transformative potential. To empower these stakeholders, we are working toward foundational technologies like Rho-alpha, along with associated tooling, that will enable them to train, deploy, and continuously adapt their own cloud-hosted physical AI using their own data for their own robots and scenarios.

If you’re interested in experimenting with and helping shape the future of our Physical AI foundations and tools, express your interest in our Research Early Access Program. The model will also be made available on Microsoft Foundry at a future date.