

To Be or Not to Be, That's a Token: Paper Reading Notes on Beyond the 80/20 Rule and R2R
This post collects reading notes on two papers that focus on the key tokens in the chain-of-thought (CoT) generated by LLMs. The first is Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning. The second is R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routin
