BugPilot
High-quality bug generation is key to training the next iteration of language-model-based software engineering (SWE) agents. Current synthetic bug pipelines intentionally perturb code to cause failures, which produces bugs that are out of distribution relative to the way bugs arise in real-world development.
BugPilot introduces a method in which LLM agents attempt to implement new features and, in doing so, unintentionally break existing tests. Through qualitative analysis, the researchers found that these bugs more closely resemble those introduced by humans.
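The selection idea described above can be illustrated with a minimal sketch: compare which tests pass before and after an agent's feature attempt, and keep the attempt as a bug candidate only if it broke a test that previously passed. This is not BugPilot's actual implementation. The `FeatureAttempt` record, the function names, and the sample test names are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureAttempt:
    """One agent attempt at adding a feature (hypothetical record type)."""
    attempt_id: str
    passed_before: frozenset[str]  # tests passing before the agent's change
    passed_after: frozenset[str]   # tests passing after the agent's change


def broken_tests(attempt: FeatureAttempt) -> frozenset[str]:
    """Return tests that passed before the change but no longer pass after it."""
    return attempt.passed_before - attempt.passed_after


def keep_as_bug(attempt: FeatureAttempt) -> bool:
    """Keep an attempt as a bug candidate if it broke at least one existing test."""
    return len(broken_tests(attempt)) > 0


# Illustrative data only: a1 adds a feature cleanly, a2 breaks an existing test.
attempts = [
    FeatureAttempt(
        "a1",
        frozenset({"test_parse", "test_save"}),
        frozenset({"test_parse", "test_save", "test_new_feature"}),
    ),
    FeatureAttempt(
        "a2",
        frozenset({"test_parse", "test_save"}),
        frozenset({"test_parse", "test_new_feature"}),
    ),
]

bug_ids = [a.attempt_id for a in attempts if keep_as_bug(a)]
print(bug_ids)
```

In this sketch only the second attempt is kept, because it is the only one where a previously passing test stops passing. A real pipeline would also need to run the test suite in an isolated environment and record the resulting diff, which is outside the scope of this example.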
BugPilot was used to produce FrogBoss and FrogMini, 32B- and 14B-parameter coding agents (respectively) that specialize in fixing bugs in code. FrogBoss was obtained by fine-tuning a Qwen3-32B language model on debugging trajectories generated by Claude Sonnet 4 within the BugPilot framework. The training data combines real-world bugs from R2E-Gym, synthetic bugs from SWE-Smith, and novel “FeatAdd” bugs. Both models are available for experimentation today in the Microsoft Foundry catalog.