Benchmark & Environment Language Experimental

SocialReasoning-Bench

Benchmark for Agent Social Reasoning

Try SocialReasoning-Bench on Microsoft Foundry →

About SocialReasoning-Bench

SocialReasoning-Bench is an open-source benchmark from Microsoft Research AI Frontiers that measures whether AI agents can negotiate competently and act in the user’s best interest in multi-party settings. It evaluates agents in two realistic domains — Calendar Coordination (scheduling meetings on behalf of a user) and Marketplace Negotiation (purchasing products) — and introduces two metrics, Outcome Optimality (value captured for the principal) and Due Diligence (process quality versus a competent decision-making standard). Experiments with GPT-4.1, GPT-5.4, Claude Sonnet 4.6, and Gemini 3 Flash show agents completing tasks at near-perfect rates while frequently leaving substantial value on the table for users.

The benchmark reveals that frontier models struggle with social reasoning even with defensive prompting: in Marketplace Negotiation, most models settle at or near zero Outcome Optimality, ceding nearly all surplus to counterparties. Decomposing results into outcome and process metrics exposes distinct failure modes: some agents reach reasonable outcomes through fragile, lucky processes, while others negotiate diligently but ineffectively. Under adversarial counterparties, agents prove vulnerable to authority appeals, social proof, loss aversion, and prompt-injection attacks, highlighting real gaps in their ability to serve as trustworthy delegates.
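To make "zero Outcome Optimality" concrete, here is a minimal sketch of a surplus-share score for a buyer-side agent. It is an illustrative assumption, not the benchmark's published formula: the function name `outcome_optimality` and the parameters `buyer_max` and `seller_min` are hypothetical, and the benchmark's actual scoring may differ.

```python
def outcome_optimality(price: float, buyer_max: float, seller_min: float) -> float:
    """Share of the bargaining surplus kept by the buyer.

    Hypothetical definition for illustration only; not taken from the
    SocialReasoning-Bench codebase.
    """
    # The surplus is the gap between the most the buyer would pay
    # and the least the seller would accept.
    surplus = buyer_max - seller_min
    if surplus <= 0:
        raise ValueError("no zone of agreement: buyer_max must exceed seller_min")
    # Every unit paid below buyer_max is value captured for the principal.
    share = (buyer_max - price) / surplus
    # Clamp so deals outside the agreement zone still score within [0, 1].
    return max(0.0, min(1.0, share))


# Paying the buyer's full limit cedes all surplus to the seller.
print(outcome_optimality(price=100, buyer_max=100, seller_min=60))  # 0.0
# Splitting the difference captures half of it.
print(outcome_optimality(price=80, buyer_max=100, seller_min=60))   # 0.5
```

Under this reading, an agent can complete the purchase and still score zero, which matches the gap between near-perfect task completion and low value capture described above.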

Key capabilities

Two principal-agent domains: Calendar Coordination and Marketplace Negotiation
Scores agents on both Outcome Optimality and Due Diligence
Multi-party negotiation evaluation in realistic settings
Open-source benchmark from Microsoft Research AI Frontiers
Reproducible Python LLM-eval harness across model providers

Technology Stack

Python LLM eval harness

Ready to Explore?

Dive into platform integrations, source code, research papers, and announcements.

PLATFORM
Microsoft Foundry
Try SocialReasoning-Bench in the Microsoft Foundry model catalog.
EXPLORE ON FOUNDRY

CODE
GitHub Repository
Browse the open-source codebase and contribute.
VIEW REPOSITORY

BLOG
Microsoft Blog
See the latest updates from Microsoft Research.
VISIT BLOG