BugPilot
Synthetic Bug Generation for SWE Agents
About BugPilot
BugPilot is a synthetic bug-generation pipeline that creates realistic bugs by having LLM agents implement new features and, in the process, unintentionally break existing tests. Traditional synthetic-bug generation instead perturbs code deliberately, which tends to produce out-of-distribution bugs unlike those humans write. BugPilot uses Claude Sonnet 4 agents to generate these feature-addition bugs, then combines them with real-world bugs from R2E-Gym and synthetic bugs from SWE-Smith to assemble training datasets. Training on this data produced FrogBoss (32B) and FrogMini (14B), Qwen3-based coding agents specialized in bug fixing and evaluated on SWE-Bench Verified.
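The feature-addition mechanism described above can be illustrated with a minimal sketch. It assumes test outcomes are available as simple name-to-pass/fail mappings recorded before and after an agent's feature patch; the function names (`broken_tests`, `is_feature_addition_bug`) and the rule that a test missing after the patch counts as broken are hypothetical choices for illustration, not BugPilot's actual implementation.

```python
def broken_tests(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Return tests that passed before the feature patch but not after it.

    Assumption: a test that is missing from `after` is treated as broken.
    """
    return sorted(
        name
        for name, passed in before.items()
        if passed and not after.get(name, False)
    )


def is_feature_addition_bug(before: dict[str, bool], after: dict[str, bool]) -> bool:
    """Keep a patch as a candidate bug only if it broke a previously passing test."""
    return len(broken_tests(before, after)) > 0


# Hypothetical test outcomes around one agent-written feature patch.
before_patch = {"test_parse": True, "test_render": True, "test_legacy": False}
after_patch = {
    "test_parse": True,
    "test_render": False,      # regression introduced by the new feature
    "test_legacy": False,      # already failing, so not attributed to the patch
    "test_new_feature": True,  # test added with the feature
}

regressions = broken_tests(before_patch, after_patch)
keep = is_feature_addition_bug(before_patch, after_patch)
```

Under these assumptions, only tests that regress because of the patch count toward the bug; tests that were already failing are ignored, which keeps the candidate tied to the feature change itself.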
Qualitative analysis shows that BugPilot-generated bugs are closer in character to human-produced bugs than those from prior synthetic pipelines, which improves training-data quality for software-engineering agents. Combining real and synthetic sources with the agentic generation mechanism lets teams scale up the collection of high-quality debugging trajectories without manual annotation. FrogBoss and FrogMini are available in the Microsoft Foundry catalog for research and experimentation, alongside other code-focused models such as NextCoder.
Key capabilities
- State-of-the-art Pass@1 of 54.6% on SWE-Bench Verified at the 32B scale (FrogBoss)
- Two specialized agents: FrogBoss (32B) and FrogMini (14B)
- Synthetic bug pipeline where LLM agents unintentionally break tests
- Built on the Qwen3 backbone via the R2E-Gym training environment
- Produces more realistic, human-like bugs than rule-based generators
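For readers unfamiliar with the Pass@1 metric cited in the list above, the sketch below shows the arithmetic under the common single-attempt definition: the percentage of benchmark tasks resolved when the agent gets one attempt per task. The function name `pass_at_1` and the toy outcomes are illustrative assumptions, not the evaluation harness used for FrogBoss or FrogMini.

```python
def pass_at_1(resolved: list[bool]) -> float:
    """Percentage of tasks resolved, given one attempt per task."""
    if not resolved:
        raise ValueError("at least one task outcome is required")
    return 100.0 * sum(resolved) / len(resolved)


# Toy outcomes for five hypothetical tasks (True means the fix passed the tests).
outcomes = [True, False, True, True, False]
score = pass_at_1(outcomes)  # 3 of 5 tasks resolved
```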