OmniParser V2

Pure-Vision GUI Screen Parser


Try OmniParser V2 on Microsoft Foundry →

About OmniParser V2

OmniParser V2 is a pure-vision GUI screen-parsing module that converts UI screenshots into structured, actionable elements without invoking a language model. It pairs a fine-tuned YOLOv8 icon detector with a Florence-2-based caption model: the detector localizes interactive regions and the captioner describes their function. V2 cuts inference latency by 60% relative to V1 and, paired with GPT-4o, reaches 39.6 average accuracy on the ScreenSpot Pro benchmark. Both gains matter for real-time agent execution. The module handles both interactive and non-interactive elements and generalizes across arbitrary screen layouts without domain-specific training.
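
The detect-then-caption split described above can be sketched in a few lines. This is a minimal illustration, not OmniParser's actual API: `parse_screen`, `ParsedElement`, `stub_detect`, and `stub_caption` are hypothetical names, the image is a toy grid of grayscale values, and the two stubs stand in for the YOLOv8 detector and the Florence-2 captioner.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]        # (x1, y1, x2, y2) in pixels
Image = Sequence[Sequence[int]]        # toy grayscale image: rows of pixel values


@dataclass
class ParsedElement:
    box: Box
    caption: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region covered by `box` out of the image."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]


def parse_screen(
    image: Image,
    detect: Callable[[Image], List[Box]],
    caption: Callable[[List[List[int]]], str],
) -> List[ParsedElement]:
    """Stage 1 localizes regions; stage 2 describes each cropped region."""
    return [ParsedElement(box=b, caption=caption(crop(image, b))) for b in detect(image)]


# Hypothetical stand-ins for the real models (assumptions, for illustration only).
def stub_detect(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 0, 4, 2)]


def stub_caption(region: List[List[int]]) -> str:
    mean = sum(map(sum, region)) / (len(region) * len(region[0]))
    return "bright icon" if mean > 127 else "dark icon"


screen = [[255, 255, 0, 0],
          [255, 255, 0, 0]]
elements = parse_screen(screen, stub_detect, stub_caption)
for el in elements:
    print(el.box, el.caption)
```

Passing the two stages in as callables keeps the parsing loop independent of any particular model, which mirrors the specialization the page describes: either component could be swapped without touching the other.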

Screen understanding is fundamental infrastructure for any agent that drives a GUI, but earlier approaches leaned on expensive vision-language models or hand-labeled UI schemas. By splitting detection from captioning into specialized components and skipping the LLM in the parsing loop, OmniParser keeps both accuracy and latency in the range required by computer-use agents. Its structured output (element coordinates plus descriptions) slots directly into action-prediction pipelines, and it now powers a number of Microsoft and third-party agent stacks, including Fara-7B.
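
To make the "coordinates plus descriptions" output concrete, here is a hedged sketch of how a downstream agent might consume it. The field names (`element_id`, `bbox`, `description`, `interactive`), the normalized-coordinate convention, and the helpers `to_prompt` and `click_point` are assumptions chosen for illustration; the real output schema may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UIElement:
    element_id: int
    bbox: Tuple[float, float, float, float]  # assumed normalized (x1, y1, x2, y2) in [0, 1]
    description: str
    interactive: bool


def to_prompt(elements: List[UIElement]) -> str:
    """Render parsed elements as a numbered list an action-prediction model can cite by ID."""
    lines = []
    for el in elements:
        kind = "interactive" if el.interactive else "static"
        lines.append(f"[{el.element_id}] ({kind}) {el.description}")
    return "\n".join(lines)


def click_point(el: UIElement, width: int, height: int) -> Tuple[int, int]:
    """Map an element's normalized box to the pixel at its center."""
    x1, y1, x2, y2 = el.bbox
    return (round((x1 + x2) / 2 * width), round((y1 + y2) / 2 * height))


parsed = [
    UIElement(0, (0.25, 0.5, 0.75, 0.75), "Save button", True),
    UIElement(1, (0.0, 0.0, 1.0, 0.125), "Window title text", False),
]

print(to_prompt(parsed))
# Suppose the model answers "click element 0" on a 1920x1080 screen.
print(click_point(parsed[0], 1920, 1080))
```

The agent only has to pick an element ID; turning that choice into screen coordinates is plain arithmetic on the parser's output, which is why no DOM access is needed.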

Key capabilities

Turns any LLM into a computer-use agent
Average latency of 0.6 s per frame on an A100 GPU
60% lower latency than V1
39.6 average accuracy on ScreenSpot Pro
Fine-tuned YOLOv8 icon detector paired with Florence-2 captioning
Pure-vision GUI parsing without DOM access
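
As a rough worked example of what the latency figures above imply: the arithmetic below assumes the 0.6 s/frame number and the 60% reduction refer to the same hardware and workload, which the page does not state. The function names are illustrative only, and the implied V1 figure is a derived estimate, not a reported measurement.

```python
def throughput_fps(latency_s: float) -> float:
    """Frames parsed per second at a given per-frame latency."""
    return 1.0 / latency_s


def baseline_latency(new_latency_s: float, reduction: float) -> float:
    """Latency before a fractional reduction: new = old * (1 - reduction)."""
    return new_latency_s / (1.0 - reduction)


V2_LATENCY_S = 0.6   # from the capability list (A100)
REDUCTION = 0.60     # "60% lower latency than V1"

fps = throughput_fps(V2_LATENCY_S)
implied_v1 = baseline_latency(V2_LATENCY_S, REDUCTION)  # assumption: same setup for both numbers

print(f"V2 throughput: {fps:.2f} frames/s")
print(f"Implied V1 latency: {implied_v1:.2f} s/frame")
print(f"Parsing time over a 10-step task: {10 * V2_LATENCY_S:.1f} s")
```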

Technology Stack

PyTorch, YOLOv8, Florence-2

Ready to Explore?

Dive into platform integrations, source code, research papers, and announcements.

Platform: Microsoft Foundry. Try OmniParser V2 in the Microsoft Foundry model catalog.
Code: GitHub Repository. Browse the open-source codebase and contribute.
Academic: Research Paper. Read the peer-reviewed publication.
Blog: Microsoft Blog. See the latest updates from Microsoft Research.