Paper · arxiv cs.LG · 2mo ago · Worth watching

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

Category: Academic paper / Technical report

TL;DR

arXiv:2605.13935v1 (Announce Type: new). Abstract: Diffusion language models are a promising alternative to autoregressive models, yet post-training methods for them largely adapt reward-maximizing objectives. We identify a central failure mode in this setting we call trajectory locking: sampled reward-driven updates over-concentrate probability mass onto a narrow set of denoising paths, reducing coverage of alternative correct solutions under repeated sampling. To address this, we propose TraFL (T…
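The title refers to trajectory-balance post-training, but the truncated abstract does not state TraFL's objective. For orientation only, here is the standard trajectory-balance condition from the GFlowNet literature next to the reward-maximizing objective. Writing it for a denoising chain x_T → … → x_0, with P_F as the learned denoiser, P_B as the fixed noising process, and R as a reward on the final output, is our assumed notation and not necessarily the paper's formulation.

```latex
% Reward maximization: with a binary correctness reward, any policy that puts
% all its mass on correct outputs is optimal, including one that collapses
% onto a single denoising path.
J(\theta) = \mathbb{E}_{\tau \sim P_F^{\theta}}\left[ R(x_0) \right]

% Trajectory balance (GFlowNet form): the loss is zero on every trajectory only
% when the terminal distribution satisfies P_F^{\theta}(x_0) \propto R(x_0),
% so mass is spread over all rewarded outputs in proportion to their reward.
\mathcal{L}_{\mathrm{TB}}(\tau) = \left( \log Z_\theta
  + \sum_{t=1}^{T} \log P_F^{\theta}(x_{t-1} \mid x_t)
  - \log R(x_0)
  - \sum_{t=1}^{T} \log P_B(x_t \mid x_{t-1}) \right)^2
```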

Key points

01 Diffusion language models are a promising alternative to autoregressive models, yet post-training methods for them largely adapt reward-maximizing objectives.
02 The authors identify a central failure mode in this setting, which they call trajectory locking: sampled reward-driven updates over-concentrate probability mass onto a narrow set of denoising paths.
03 As a result, repeated sampling covers fewer of the alternative correct solutions.
04 To address this, the authors propose TraFL (T…
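The trajectory-locking mechanism described above can be illustrated with a toy sketch. All of the following are our assumptions, not the paper's setup: four whole "denoising paths" are modeled as one softmax over logits; paths 0 and 1 are both correct (reward 1.0); the others get a small reward of 1e-3 so that log R stays finite; and the base model slightly prefers path 0. The sketch compares exact gradient ascent on expected reward with gradient descent on a single-step trajectory-balance loss. That loss is the GFlowNet objective, where the backward-policy term drops out for one-step trajectories, and it is trained here over all paths instead of sampled ones. This is not TraFL, and the exact expected gradient replaces the sampled updates named in the abstract so that the result is deterministic.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_sum_exp(logits):
    """Stable log(sum(exp(z))) used for log-probabilities."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def reward_max_step(logits, rewards, lr):
    """One exact gradient-ascent step on J = sum_x p(x) * R(x)."""
    p = softmax(logits)
    j = sum(pi * ri for pi, ri in zip(p, rewards))
    # dJ/dlogit_i = p_i * (R_i - J): already-likely correct paths get the biggest push.
    return [z + lr * pi * (ri - j) for z, pi, ri in zip(logits, p, rewards)]


def trajectory_balance_step(logits, log_z, rewards, lr):
    """One gradient-descent step on mean_x (log Z + log p(x) - log R(x))**2."""
    k = len(logits)
    p = softmax(logits)
    lse = log_sum_exp(logits)
    # Per-path residual of the balance condition log Z + log p(x) = log R(x).
    deltas = [log_z + z - lse - math.log(r) for z, r in zip(logits, rewards)]
    s = sum(deltas)
    # d/dlogit_j of the mean squared residual: (2/k) * (delta_j - p_j * sum(deltas)).
    grad_logits = [(2.0 / k) * (d - pj * s) for d, pj in zip(deltas, p)]
    grad_log_z = (2.0 / k) * s
    new_logits = [z - lr * g for z, g in zip(logits, grad_logits)]
    return new_logits, log_z - lr * grad_log_z


REWARDS = [1.0, 1.0, 1e-3, 1e-3]  # hypothetical: paths 0 and 1 are both correct
INIT = [0.5, 0.0, 0.0, 0.0]       # hypothetical: base model slightly prefers path 0


def train(steps=5000, lr=0.1):
    """Run both objectives from the same initial logits; return final distributions."""
    rm = list(INIT)
    tb, log_z = list(INIT), 0.0
    for _ in range(steps):
        rm = reward_max_step(rm, REWARDS, lr)
        tb, log_z = trajectory_balance_step(tb, log_z, REWARDS, lr)
    return softmax(rm), softmax(tb)


if __name__ == "__main__":
    p_start = softmax(INIT)
    p_rm, p_tb = train()
    print(f"init        p0/p1 = {p_start[0] / p_start[1]:.3f}")
    print(f"reward-max  p0/p1 = {p_rm[0] / p_rm[1]:.3f}")
    print(f"traj-bal    p0/p1 = {p_tb[0] / p_tb[1]:.3f}")
```

Under the reward-maximizing update, the logit gap between the two correct paths grows by lr * (p0 - p1) * (1 - J) on every step, so the base model's initial preference is amplified rather than corrected. Under trajectory balance, both correct paths converge to roughly half the mass each, because the loss is only minimized when p(x) is proportional to R(x). Sampled updates, as in the abstract, would add sampling noise on top of this drift.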

Why it matters

What this means for your engineering practice

Generated in real time by an LLM (MiniMax-M2.7, cache hit)

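Role	What you should do
Tech Lead	When choosing between diffusion language models and autoregressive models, use this study to judge how stable post-training is for the diffusion route
Application engineer	No direct impact yet; follow emerging best practices for sample diversity in diffusion language models
Ops / Platform	No direct impact yet; awareness only
Product / Business	No direct impact yet; keep track of this line of research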

Read the original ↗ · Source: arxiv cs.LG


Related news

Sympathetic Framing: Evaluating AI Alignment across Sociodemographic Groups

Recursive transformers for semiconductor thermo-mechanical reliability

LayerRAG-Bench: A Cross-Layer Reliability Benchmark for Agentic Retrieval-Augmented Generation