HeadlinesBriefing.com

Qwen-Agent World Models Advance General Agent Simulation

Hacker News

Qwen-Agent World introduces the first language world models capable of simulating agentic environments across 7 domains via long chain-of-thought reasoning. The 35B-A3B and 397B-A17B models are trained on more than 10M real-world interaction trajectories through a three-stage pipeline: continued pre-training (CPT) for foundational world modeling, supervised fine-tuning (SFT) for next-state prediction, and reinforcement learning (RL) for fidelity refinement.

The training data consists of more than 10M environment interaction trajectories collected in real-world environments across the 7 domains. Each stage of the pipeline has a distinct role: CPT injects general-purpose world modeling capabilities from state transition dynamics and augmented professional corpora, SFT activates next-state-prediction reasoning, and RL sharpens simulation fidelity through a tailored framework with hybrid rubric-and-rule rewards. The work builds on the idea of a language world model, which predicts environment dynamics from observations and actions.
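
To make the idea of a hybrid rubric-and-rule reward more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Transition` type, the exact-match rule, the predicate-based rubric, and the `rule_weight` parameter are all hypothetical. In practice a rubric would more plausibly be scored by a judge model than by simple predicates.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Transition:
    """One step: what the agent observed and did, plus two candidate next states."""
    observation: str
    action: str
    predicted_next: str   # next state produced by the world model
    reference_next: str   # next state recorded in the real environment


# Hypothetical rubric criterion: a yes/no check on a transition.
Criterion = Callable[[Transition], bool]


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so formatting noise does not affect matching."""
    return " ".join(text.split())


def rule_reward(t: Transition) -> float:
    """Rule-based reward (assumed form): 1.0 on exact match after normalization."""
    return 1.0 if _normalize(t.predicted_next) == _normalize(t.reference_next) else 0.0


def rubric_reward(t: Transition, rubric: Sequence[Criterion]) -> float:
    """Rubric-based reward: fraction of rubric criteria the prediction satisfies."""
    if not rubric:
        return 0.0
    passed = sum(1 for criterion in rubric if criterion(t))
    return passed / len(rubric)


def hybrid_reward(t: Transition, rubric: Sequence[Criterion],
                  rule_weight: float = 0.5) -> float:
    """Weighted mix of the rule score and the rubric score (weighting is an assumption)."""
    return rule_weight * rule_reward(t) + (1.0 - rule_weight) * rubric_reward(t, rubric)
```

The design point the sketch tries to capture is that a strict rule alone gives a sparse signal, while a rubric can award partial credit for predictions that are close to the recorded next state without matching it exactly.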

Agent World Bench evaluates these models using real-world interactions from 5 frontier models across 9 established benchmarks. Qwen-Agent World enables scalable simulation of thousands of environments for agentic RL, with results that surpass training in real environments alone, and it also serves as a unified foundation model that improves downstream performance across 7 agentic benchmarks. The technical significance lies in establishing language models as a core cognitive mechanism for reasoning and planning in general agents.
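
As a rough illustration of how a language world model could stand in for a real environment during agentic RL, the Python sketch below wraps a next-state predictor in an environment-style `step` interface and collects a rollout. All names here (`SimulatedEnv`, `rollout`, `toy_world_model`) are hypothetical, and the toy predictor is a deterministic stand-in for a call to an actual model.

```python
from typing import Callable, List, Tuple

# Assumed interfaces: a world model maps (state, action) to a predicted next
# state, and a policy maps a state to an action. Both are plain text here.
WorldModel = Callable[[str, str], str]
Policy = Callable[[str], str]


class SimulatedEnv:
    """Environment whose transitions are predicted by a world model."""

    def __init__(self, world_model: WorldModel, initial_state: str) -> None:
        self.world_model = world_model
        self.state = initial_state

    def step(self, action: str) -> str:
        """Advance the simulated state by one predicted transition."""
        self.state = self.world_model(self.state, action)
        return self.state


def rollout(env: SimulatedEnv, policy: Policy,
            max_steps: int) -> List[Tuple[str, str, str]]:
    """Collect (state, action, next_state) triples without a real environment."""
    trajectory = []
    state = env.state
    for _ in range(max_steps):
        action = policy(state)
        next_state = env.step(action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory


def toy_world_model(state: str, action: str) -> str:
    """Stand-in predictor: a counter that responds only to 'increment'."""
    count = int(state.split("=")[1])
    return f"count={count + 1}" if action == "increment" else state


if __name__ == "__main__":
    env = SimulatedEnv(toy_world_model, "count=0")
    for triple in rollout(env, lambda state: "increment", max_steps=3):
        print(triple)
```

Because the environment is only a function call, many such simulated environments can be created cheaply and run in parallel, which is the property that makes simulation attractive for scaling agentic RL.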