Papers
🌟 arXiv Spotlight
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
Variational Reasoning for Language Models
Towards Efficient Online Exploration for Reinforcement Learning with Human Feedback
StateX: Enhancing RNN Recall via Post-training State Expansion
Learning Admissible Heuristics for A*: Theory and Practice
A Theoretical Analysis of Discrete Flow Matching Generative Models
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning
Vision-Language Alignment from Compressed Image Representations using 2D Gaussian Splatting
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspect
Quantile Advantage Estimation for Entropy-Safe Reasoning
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinf
Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration
From Parameters to Behavior: Unsupervised Compression of the Policy Space
Retrieval-Augmented Guardrails for AI-Drafted Patient-Portal Messages: Error Taxonomy Construction
Activation Function Design Sustains Plasticity in Continual Learning
StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Lan
ConQuER: Modular Architectures for Control and Bias Mitigation in IQP Quantum Generative Models
Does AI Coaching Prepare us for Workplace Negotiations?
The Emergence of Altruism in Large-Language-Model Agents Society