Debug-gym

Interactive Debugging Environment for LLM Agents

About Debug-gym

Debug-gym is an open-source, text-based interactive debugging environment that teaches AI coding agents to debug the way programmers do: through iterative tool use. It exposes Python’s pdb, bash shells, code viewers, grep, an editor, and breakpoint management, so agents can gather information and form hypotheses before proposing fixes. The environment follows the Gymnasium paradigm and supports Docker and Kubernetes backends for isolated execution. Agents can also dynamically import and customize tools to fit specific workflows.
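To make the Gymnasium-style interaction concrete, here is a minimal, self-contained sketch of a text-based environment in which an agent calls tools before it commits to a fix. Everything in it is an assumption made for illustration: the class ToyDebugEnv, the three tool names (view, eval, rewrite), and the reward rule are hypothetical and are not Debug-gym's actual API. Only the reset/step return shapes follow the Gymnasium convention.

```python
class ToyDebugEnv:
    """Hypothetical toy environment; NOT Debug-gym's real interface."""

    BUGGY_SOURCE = "def add(a, b):\n    return a - b\n"

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.code = self.BUGGY_SOURCE
        self.steps = 0

    def reset(self):
        # Gymnasium convention: reset() returns (observation, info).
        self.code = self.BUGGY_SOURCE
        self.steps = 0
        task = "Make add(2, 3) == 5. Tools: view | eval <expr> | rewrite <old> => <new>"
        return task, {}

    def _namespace(self):
        # Execute the current source so tools can call the function under repair.
        namespace = {}
        exec(self.code, namespace)
        return namespace

    def _solved(self):
        try:
            return self._namespace()["add"](2, 3) == 5
        except Exception:
            return False

    def step(self, action):
        # Gymnasium convention: step() returns
        # (observation, reward, terminated, truncated, info).
        self.steps += 1
        tool, _, arg = action.partition(" ")
        if tool == "view":
            obs = self.code
        elif tool == "eval":
            try:
                obs = repr(eval(arg, self._namespace()))
            except Exception as exc:
                obs = f"error: {exc!r}"
        elif tool == "rewrite":
            old, sep, new = arg.partition(" => ")
            if sep and old in self.code:
                self.code = self.code.replace(old, new, 1)
                obs = "rewrite applied"
            else:
                obs = "rewrite failed: text not found"
        else:
            obs = f"unknown tool: {tool}"
        solved = self._solved()
        truncated = (not solved) and self.steps >= self.max_steps
        return obs, float(solved), solved, truncated, {}


def run_scripted_episode(actions):
    """Play a fixed list of tool calls and record (action, observation, reward)."""
    env = ToyDebugEnv()
    env.reset()
    transcript = []
    terminated = truncated = False
    for action in actions:
        obs, reward, terminated, truncated, _ = env.step(action)
        transcript.append((action, obs, reward))
        if terminated or truncated:
            break
    return transcript, terminated


# A scripted "agent": inspect the code, probe its behavior, then patch it.
transcript, solved = run_scripted_episode(
    ["view", "eval add(2, 3)", "rewrite a - b => a + b"]
)
```

The scripted agent only edits the code after the eval call has exposed the wrong result, which mirrors the gather-information-then-fix workflow described above. A real agent would choose each action from the previous observation instead of following a fixed list.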

Debug-gym integrates widely used software-engineering benchmarks (SWE-bench, SWE-smith, Aider, Mini-nightmare) along with specialized swebench-debug configurations. Experiments show that agents with access to debugging tools significantly outperform those without on code-repair tasks. The environment addresses a structural limitation of current LLM-based coding agents: they cannot seek additional context through tool interaction when an initial fix fails. It gives researchers a standard testbed for developing the next generation of debugging agents and complements BugPilot’s bug-generation pipeline.

Key capabilities

Agents access pdb to set breakpoints and inspect program state
Interactive debugging environment for LLM coding agents
Includes Aider, Mini-nightmare, and SWE-bench benchmarks
Integrates with SWE-smith for scalable task generation
Open-source research playground from Microsoft Research
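As an illustration of the first capability, the sketch below drives Python's standard-library pdb with a scripted list of commands and captures what the debugger prints, which is roughly the kind of text an agent would read back. It uses only documented stdlib behavior (pdb.Pdb accepts stdin and stdout streams, and runcall stops at the first line of the called function). The helper run_under_pdb and the buggy_mean example are hypothetical and say nothing about how Debug-gym itself wires up pdb.

```python
import io
import pdb


def buggy_mean(xs):
    total = 0
    for x in xs:
        total += x
    return total / (len(xs) - 1)  # bug: should divide by len(xs)


def run_under_pdb(func, args, commands):
    """Run func(*args) under pdb, feeding it scripted commands.

    Returns the function's result and everything pdb wrote to its output.
    """
    # A trailing "continue" lets the function run to completion.
    stdin = io.StringIO("\n".join(commands) + "\ncontinue\n")
    stdout = io.StringIO()
    # readrc=False keeps any local .pdbrc file from altering the session.
    debugger = pdb.Pdb(stdin=stdin, stdout=stdout, readrc=False)
    result = debugger.runcall(func, *args)
    return result, stdout.getvalue()


# Inspect the argument and the suspicious divisor before the function runs.
result, log = run_under_pdb(buggy_mean, ([2, 4, 6],), ["p xs", "p len(xs) - 1"])
```

Here result is 6.0 instead of the expected mean 4.0, and log shows that the divisor evaluates to 2 for a three-element list, which points at the faulty expression.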

Technology Stack

Python, pdb, SWE-bench

Ready to Explore?

Dive into platform integrations, source code, research papers, and announcements.

Platform: Microsoft Foundry. Try Debug-gym in the Microsoft Foundry model catalog.
Code: GitHub Repository. Browse the open-source codebase and contribute.
Blog: Microsoft Blog. See the latest updates from Microsoft Research.