LLM Multi-Agent Simulations of Research Teams

Under review, EMNLP 2026

Built LLM multi-agent simulations of five real research teams (~24,000 utterances, 6 sessions per team), using a four-layer fidelity framework. Grounding agent personas in six observed behavioral traits closes 64% of the naive-to-human outcome gap, with the result holding on all five teams (5/5). Evaluation relies on 55 engineered NLP interaction metrics and a two-judge LLM ensemble that reaches 0.96 Spearman agreement on process quality. Everything is wired into an end-to-end Python pipeline covering transcript parsing, feature extraction, simulation runs, and bootstrap inference.
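
To make the "gap closed" and "bootstrap inference" terms concrete, here is a minimal sketch under stated assumptions. It assumes the gap-closure fraction is defined as (grounded - naive) / (human - naive) per team and that uncertainty is estimated with a percentile bootstrap over teams; neither definition is confirmed by the summary above. The function names `gap_closed` and `bootstrap_ci` and all scores are hypothetical placeholders, not the paper's code or data.

```python
import random
from statistics import mean


def gap_closed(naive, grounded, human):
    """Fraction of the naive-to-human gap closed by the grounded simulation.

    Assumed definition: (grounded - naive) / (human - naive).
    """
    gap = human - naive
    if gap == 0:
        raise ValueError("naive and human scores are equal; gap is undefined")
    return (grounded - naive) / gap


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, record the mean of each resample, then sort.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical per-team outcome scores: (naive, grounded, human).
# These numbers are illustrative only and are NOT results from the paper.
teams = [
    (0.20, 0.52, 0.70),
    (0.30, 0.55, 0.75),
    (0.25, 0.60, 0.80),
    (0.10, 0.45, 0.65),
    (0.35, 0.58, 0.72),
]

fractions = [gap_closed(*t) for t in teams]
point = mean(fractions)
lo, hi = bootstrap_ci(fractions)
print(f"mean gap closed: {point:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With only five teams, a percentile interval like this is coarse; resampling at the session or utterance level would be an alternative, though the summary does not say which unit the actual pipeline resamples.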

Under review at EMNLP 2026.

Recommended citation: Sungjin Choi. (2026). "LLM Multi-Agent Simulations of Research Teams." Under review, EMNLP 2026.