Code & Software Engineering Model Language Experimental

BugPilot

Synthetic Bug Generation for SWE Agents


About BugPilot

BugPilot is a synthetic bug-generation pipeline that creates realistic bugs by having LLM agents implement new features and, in doing so, inadvertently break existing tests. This contrasts with traditional synthetic-bug generation, which deliberately perturbs code and tends to produce out-of-distribution bugs unlike those humans write. BugPilot uses Claude Sonnet 4 agents to generate feature-addition bugs and combines them with real-world bugs from R2E-Gym and synthetic bugs from SWE-Smith to assemble comprehensive training datasets. The pipeline produced FrogBoss (32B) and FrogMini (14B), Qwen3-based coding agents specialized in bug fixing and evaluated on SWE-Bench Verified.
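
As a rough illustration of the idea, the sketch below flags an agent's feature change as a candidate bug when tests that passed before the change fail afterwards. It is not BugPilot's implementation: the function names, the test-result dictionaries, and the choice to count a missing test as broken are all assumptions made for this example.

```python
# Hypothetical sketch: detect a "feature-addition bug" by comparing
# test outcomes before and after an agent's change.
# Each run is a mapping of test name -> True (passed) / False (failed).


def broken_tests(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Return tests that passed before the change but do not pass after it.

    Assumption: a test that disappears from the "after" run counts as broken.
    """
    return sorted(
        name
        for name, passed in before.items()
        if passed and not after.get(name, False)
    )


def is_candidate_bug(before: dict[str, bool], after: dict[str, bool]) -> bool:
    """A change is a candidate bug if it breaks at least one passing test."""
    return bool(broken_tests(before, after))


# Example: the agent added a feature (test_new passes) but broke test_b.
before = {"test_a": True, "test_b": True, "test_c": False}
after = {"test_a": True, "test_b": False, "test_c": False, "test_new": True}

print(broken_tests(before, after))      # ['test_b']
print(is_candidate_bug(before, after))  # True
```

Tests that were already failing before the change (test_c here) are ignored, so only regressions introduced by the new feature count toward the bug.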

Qualitative analysis shows BugPilot-generated bugs exhibit characteristics closer to human-produced bugs than prior synthetic pipelines, materially improving training-data quality for software-engineering agents. The combination of real and synthetic sources, plus the agentic generation mechanism, lets teams scale up high-quality debugging trajectories without manual annotation. FrogBoss and FrogMini are available in the Microsoft Foundry catalog for research and experimentation, alongside other code-focused models such as NextCoder.

Key capabilities

  • State-of-the-art Pass@1 of 54.6% on SWE-Bench Verified at the 32B scale
  • Two specialized agents: FrogBoss (32B) and FrogMini (14B)
  • Synthetic bug pipeline where LLM agents unintentionally break tests
  • Built on the Qwen3 backbone via the R2E-Gym training environment
  • Produces more realistic, human-like bugs than rule-based generators
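
For readers unfamiliar with the metric in the first bullet, Pass@1 is the share of benchmark tasks an agent resolves with a single attempt per task. The sketch below computes it from a list of per-task outcomes. The function name and the outcome list are hypothetical, and the counts are illustrative only: SWE-Bench Verified contains 500 tasks, so 54.6% would correspond to 273 resolved tasks in a single run, though reported scores may be averaged over several runs.

```python
# Hypothetical sketch: compute Pass@1 as a percentage from per-task outcomes.
# Each entry is True if the agent's single attempt resolved that task.


def pass_at_1(resolved: list[bool]) -> float:
    """Percentage of tasks resolved with one attempt per task."""
    if not resolved:
        raise ValueError("need at least one task outcome")
    return 100.0 * sum(resolved) / len(resolved)


# Illustrative counts only: 273 resolved out of 500 tasks.
outcomes = [True] * 273 + [False] * 227
print(f"Pass@1 = {pass_at_1(outcomes):.1f}%")  # Pass@1 = 54.6%
```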

Technology Stack

  • PyTorch
  • R2E-Gym
  • Qwen3 backbone
  • vLLM