This paper introduces an adaptive token-ordering approach for masked diffusion models (MDMs) and autoregressive models (ARMs), aiming to improve inference efficiency by dynamically adjusting the token generation order. The core idea is to use reinforcement learning (RL) to learn a policy that prioritizes the generation of easier tokens, thereby reducing computational cost and improving overall performance. Concretely, the authors propose a framework in which a π-learner, trained via entropy-regularized soft Q-learning, dynamically determines the token generation sequence based on a cumulative predictive V-information objective. The approach is motivated by the observation that tokens vary in prediction difficulty: by resolving easier tokens first, the model can achieve better performance with less computation.

The method is evaluated across a range of tasks, including structured reasoning puzzles, text generation, and downstream benchmarks such as HumanEval and Math, with reported improvements in perplexity, negative log-likelihood (NLL), and pass@1 accuracy. The authors also introduce three adaptive inference oracles (vanilla, Top-K, and Margin) to further refine the token selection process. The empirical results suggest that the adaptive ordering strategy yields significant reductions in perplexity and gains in puzzle-solving accuracy compared to fixed or random token orderings.

The paper's main contribution is the novel application of reinforcement learning to the token-ordering problem, providing a dynamic generation procedure that is sensitive to the difficulty of individual tokens. The detailed formulation of the cumulative predictive V-information objective, which links token difficulty to inference order, adds welcome clarity to the proposed method.
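To make the oracle idea concrete, the Margin criterion (as I understand it from the paper) selects the masked position where the model is most "decided", i.e. where the gap between the top two predicted probabilities is largest. The sketch below is my paraphrase, with illustrative function names and signatures rather than the authors' actual API:

```python
import numpy as np

def margin_oracle(probs: np.ndarray, masked_positions: list[int]) -> int:
    """Pick the masked position with the largest top-1 / top-2
    probability margin (the reviewer's paraphrase of the Margin oracle).

    probs: array of shape (seq_len, vocab_size) holding the model's
           per-position predictive distributions.
    masked_positions: indices of tokens not yet generated.
    """
    margins = []
    for pos in masked_positions:
        top2 = np.sort(probs[pos])[-2:]       # two largest probabilities
        margins.append(top2[1] - top2[0])     # top-1 minus top-2
    # Generate the token the model is most confident about first.
    return masked_positions[int(np.argmax(margins))]

# Toy example: position 0 has a sharp distribution (large margin),
# position 1 is nearly uniform (small margin).
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.50, 0.30, 0.20]])
print(margin_oracle(probs, [0, 1, 2]))  # selects position 0
```

A Top-K variant would instead return the K positions with the highest margins (or highest top-1 confidence) and generate them jointly; the "vanilla" oracle presumably picks by top-1 confidence alone.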
Overall, the paper presents a promising approach to optimizing inference in generative models, with potential implications for a wide range of applications. However, as I will discuss in detail, there are several areas where further investigation and clarification are needed to fully assess the method's practical applicability and robustness.