Papers
arxiv:2601.22859

MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering

Published on Jan 30
· Submitted by
Sijun He
on Feb 5
Authors:
,
,
,
,
,
,
,
,
,
,

Abstract

MEnvAgent is a multi-language framework that automates environment construction for software engineering tasks using a planning-execution-verification architecture and environment reuse mechanism, achieving improved performance on a new benchmark and creating the largest open-source polyglot dataset of verifiable Docker environments.

AI-generated summary

The evolution of Large Language Model (LLM) agents for software engineering (SWE) is constrained by the scarcity of verifiable datasets, a bottleneck stemming from the complexity of constructing executable environments across diverse languages. To address this, we introduce MEnvAgent, a Multi-language framework for automated Environment construction that facilitates scalable generation of verifiable task instances. MEnvAgent employs a multi-agent Planning-Execution-Verification architecture to autonomously resolve construction failures and integrates a novel Environment Reuse Mechanism that reduces computational overhead by incrementally patching historical environments. Evaluations on MEnvBench, a new benchmark comprising 1,000 tasks across 10 languages, demonstrate that MEnvAgent outperforms baselines, improving Fail-to-Pass (F2P) rates by 8.6% while reducing time costs by 43%. Additionally, we demonstrate the utility of MEnvAgent by constructing MEnvData-SWE, the largest open-source polyglot dataset of realistic verifiable Docker environments to date, alongside solution trajectories that enable consistent performance gains on SWE tasks across a wide range of models. Our code, benchmark, and dataset are available at https://github.com/ernie-research/MEnvAgent.

Community

Paper author Paper submitter

Check out this verifiable environment for SWE! Open-sourced dataset, Docker images, and evals!

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/menvagent-scalable-polyglot-environment-construction-for-verifiable-software-engineering-9685-7964dccd

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2601.22859 in a model README.md to link it from this page.

Datasets citing this paper 3

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2601.22859 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.