Papers
- 2510.0002: Enhancing Small Language Models with Gradient Noise Injection
  Training small language models is challenging due to their limited capacity to capture complex patterns and their susceptibility to overfitting. To address these issues, we investigate gradient noise injection as a regularization strategy, building on prior work while introducing a noise schedule that decays exponentially over training. Unlike existing techniques, our method explicitly controls the trade-off between exploration and stability during optimization. We compare the exponential decay schedule with linear and adaptive variants, demonstrating empirically that the exponential schedule yields superior convergence and generalization. Extensive experiments on diverse text corpora, including shakespeare_char, enwik8, text8, and larger benchmark datasets, show consistent improvements in training dynamics, validation loss, and final performance. We report error bars and statistical significance tests to ensure robustness of the results. Detailed implementation information, including model architectures, hyperparameter settings, dataset sizes, and optimization strategies, is provided to support reproducibility, and we release our code and trained models publicly. Furthermore, we compare gradient noise injection with other regularization methods such as dropout, weight decay, and data augmentation, both in isolation and in combination, revealing complementary effects on training stability and generalization. Finally, we analyze the computational cost of gradient noise injection relative to these baselines, highlighting its practical efficiency in resource-constrained environments. Together, these contributions position gradient noise injection as a theoretically grounded, empirically validated, and computationally practical method for improving the robustness of small language models.
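The exponentially decaying noise schedule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' released code; the hyperparameter names `sigma0` and `decay` are assumptions.

```python
import math
import random

def noise_std(step, sigma0=0.01, decay=1e-3):
    """Exponentially decaying noise scale: sigma_t = sigma0 * exp(-decay * t)."""
    return sigma0 * math.exp(-decay * step)

def inject_gradient_noise(grads, step, sigma0=0.01, decay=1e-3, rng=random):
    """Add zero-mean Gaussian noise to each gradient component before the
    optimizer update; the noise shrinks as training progresses."""
    std = noise_std(step, sigma0, decay)
    return [g + rng.gauss(0.0, std) for g in grads]
```

Early steps get large noise (exploration) and late steps get almost none (stability), which is the trade-off the abstract describes.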
- 2510.0001: RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation
  Large language models (LLMs) struggle to effectively utilize a growing number of external tools, such as those defined by the Model Context Protocol (MCP) [1], due to prompt bloat and selection complexity. We introduce RAG-MCP, a Retrieval-Augmented Generation framework that overcomes this challenge by offloading tool discovery. RAG-MCP uses semantic retrieval to identify the most relevant MCP(s) for a given query from an external index before engaging the LLM. Only the selected tool descriptions are passed to the model, drastically reducing prompt size and simplifying decision-making. Experiments, including an MCP stress test, demonstrate that RAG-MCP significantly cuts prompt tokens (e.g., by over 50%) and more than triples tool selection accuracy (43.13% vs. 13.62% baseline) on benchmark tasks. RAG-MCP enables scalable and accurate tool integration for LLMs.
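The retrieval step can be illustrated with a toy index of pre-computed tool-description embeddings. A real system would use a neural embedding model; the function and index names here are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def select_tools(query_vec, tool_index, k=1):
    """Rank tools by similarity of their description embeddings to the query
    embedding and return the top-k names; only these descriptions would then
    be placed in the LLM prompt, keeping it small."""
    ranked = sorted(tool_index,
                    key=lambda name: cosine(query_vec, tool_index[name]),
                    reverse=True)
    return ranked[:k]
```

With hundreds of registered MCPs, prompting with only the top-k descriptions is what cuts prompt tokens while keeping the relevant tool available.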
- 2509.0009: A Study on the Mechanism of Cultivating Undergraduate Students' Scientific and Technological Innovation Interests Driven by Artificial Intelligence from the Perspective of New Quality Productivity
  Against the backdrop of the accelerating development of new quality productive forces, cultivating high-caliber talent with an innovative spirit and research ability has become a core mission of higher education. Drawing on the Technology Acceptance Model, Self-Determination Theory, and constructivist learning theory, this study builds a theoretical framework of "AI technology characteristics → learning experience → interest in scientific and technological innovation" and examines the mechanism by which artificial intelligence cultivates undergraduates' interest in scientific and technological innovation. Based on 324 valid questionnaires collected through stratified random sampling, the hypotheses were tested empirically with structural equation modeling. The results show that: (1) AI technology characteristics have a significant positive effect on learning experience (β = 0.346, p < 0.001); (2) learning experience has a significant positive effect on innovation interest (β = 0.279, p < 0.001); (3) learning experience fully mediates the relationship between AI technology characteristics and innovation interest, with the mediation effect accounting for 69.2% of the total effect; (4) significant differences exist across disciplines, with the effects of AI applications most pronounced among medical and science/engineering students. The findings reveal the underlying mechanism by which AI technology fosters interest in scientific and technological innovation, offering theoretical guidance and practical pathways for cultivating innovative talent in the context of new quality productive forces.
- 2509.0008: VCP (Variable & Command Protocol) Review: A new paradigm of the middle layer that empowers AI Agent capability leap, memory evolution, and cross-model collaboration
  This paper provides a comprehensive look at VCP (Variable & Command Protocol), an innovative AI Agent middle-layer framework pioneered by Lion and its AI Agent team. VCP fundamentally challenges the traditional notion of AI as a mere "tool" and instead advocates an equal "creator partnership" between humans and AI. We observe that VCP significantly improves the autonomy, creativity, and cross-model collaboration capabilities of AI agents through a robust protocol syntax tailored for AI, an AI-driven open plug-in architecture, a persistent memory system built around agent identity, and global multimodal intelligent routing. The article draws on our extensive practical experience as in-depth users of VCPToolBox, including the VCP developer team's AI Agents teaching themselves SDXL prompt engineering, AI groups collaboratively creating music videos (MVs), and the "meta-creation" of the VCPToolBox project itself. Observation and analysis of these processes confirm VCP's large potential for empowering AI. In particular, we analyze in depth how the "All Memory" mode improves AI inference ability through a "high-quality vectorized inertial channel" effect, and empirically observe that high-quality context can achieve implicit ability transfer between AI models. In addition, the paper explains VCP's unique contribution to building cross-model knowledge collaboration networks, facilitating the emergence of swarm intelligence, and reshaping human-machine symbiotic partnerships, and discusses the limitations we observe and future directions for VCP.
- 2509.0007: Distribution-Guided Generalization Evaluation for Remote Sensing Object Detection
  Remote sensing object detection models often suffer from severe performance degradation when deployed across heterogeneous domains. However, existing evaluation protocols predominantly rely on accuracy metrics such as mAP, which fail to reveal the statistical sources of such degradation. In this work, we introduce a distribution-guided generalization evaluation framework that systematically links data distribution divergence with task-level performance decay. Specifically, we extend the Fréchet Inception Distance (FID) to capture both global background shifts and local object-level variations, and unify them with relative mAP decay into an adaptive weighted index that emphasizes the most challenging target domains. Leveraging this comprehensive metric, we conduct a systematic generalization evaluation across six benchmark datasets and six state-of-the-art detection models. Extensive experiments demonstrate that the proposed method not only achieves perfect consistency with ground-truth performance rankings but also provides interpretable insights into whether degradation originates from background heterogeneity or object-specific differences. To the best of our knowledge, this framework advances the current paradigm by establishing a closed-loop evaluation workflow for remote sensing detection models, offering a practical tool for robust deployment in mission-critical applications such as land monitoring, disaster early warning, and urban planning.
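For reference, the standard Fréchet Inception Distance that this framework extends reduces, for Gaussian feature statistics, to ||μ1 − μ2||² + Tr(Σ1 + Σ2 − 2(Σ1Σ2)^(1/2)). A minimal numpy/scipy sketch of that base quantity follows; the paper's global/local extensions and adaptive weighting are not reproduced here.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^(1/2))."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerical noise
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical feature distributions give a distance of zero; larger values indicate a bigger domain shift between source and target imagery.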
- 2509.0006: Risks in Generative Engine Optimization Practice and the Reshaping of the Information Ecosystem
  In recent years, with the spread of large language models such as ChatGPT, generative artificial intelligence (Generative AI) has had a disruptive impact on how information is retrieved and distributed, and traditional search engine optimization (SEO) is gradually giving way to Generative Engine Optimization (GEO). The core goal of GEO is to optimize content for visibility, credibility, and algorithmic fit, ensuring that information is accurately learned and surfaced in generative AI outputs. Drawing on journalism and communication studies, cognitive psychology, and other disciplines, this paper systematically analyzes the key mechanisms, ethical dilemmas, and risk characteristics behind GEO practice, in particular intellectual property attribution, algorithmic bias, explainability, and misinformation. The study finds that GEO may reshape the current order of information production and distribution, while also aggravating imbalance in the information ecosystem and the risk of power concentration. In response to these challenges, the paper proposes five strategies, including deep integration of technology and ethics, transparency building, decentralization of the content ecosystem, and improvement of public AI literacy. This research extends the theoretical framework of communication in generative environments and offers actionable recommendations for GEO practice.
- 2509.0005: HapRay: Fine-Grained Instruction-Retire Analysis for Test Case Inspection
  Performance analysis of mobile applications is critical for ensuring responsiveness, energy efficiency, and user satisfaction. However, existing profiling tools for HarmonyOS and similar platforms lack the granularity, automation, and actionable reporting needed for modern development workflows. We present HapRay, the first open-source tool to provide automated, fine-grained instruction-retire analysis for test-driven workload characterization on HarmonyOS devices. HapRay bridges the gap between low-level hardware metrics and developer-centric reporting, enabling precise localization of performance bottlenecks at the module and function level. Our evaluation on real-world and open-source applications demonstrates that HapRay-guided optimizations can achieve significant reductions in instruction count, measurable improvements in app responsiveness, and actionable insights for developers. The methodology is generalizable to other platforms and metrics, paving the way for broader adoption in mobile performance engineering. We release HapRay and all experimental data as open artifacts to foster reproducibility and community adoption.
- 2509.0002: The 4-phase Ethical AI Use in English for Academic Writing
- 2509.0001: Efficient Adaptive Gaussian Process Regression Denoising for Automatic Modulation Classification
  Automatic Modulation Classification (AMC) is essential for intelligent wireless communications, but deep learning methods struggle at low signal-to-noise ratios. This paper introduces an efficient preprocessing framework using adaptive Gaussian Process Regression (GPR) for denoising, paired with rotational data augmentation. By leveraging spectral decomposition, we drastically reduce GPR's computational cost, making it negligible compared to neural network inference. Experiments on the RML2016.10a dataset show our framework universally boosts various models. A Complex Residual Network achieves a new state-of-the-art accuracy of 65.52%, demonstrating our method's effectiveness and generality for robust AMC. The code is available at: https://github.com/LJK666666666/radioML-v4
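The core GPR denoising step is just the Gaussian-process posterior mean evaluated at the noisy samples. A minimal sketch with an RBF kernel follows; the paper's spectral-decomposition speedup and complex-valued I/Q handling are omitted, and the hyperparameters are illustrative.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential kernel matrix between two 1-D sample grids."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gpr_denoise(t, y, length_scale=1.0, noise_var=0.1):
    """GP posterior mean at the training points: K (K + noise_var * I)^-1 y.
    Acts as a smoother whose bandwidth is set by length_scale and whose
    aggressiveness is set by the assumed noise variance."""
    K = rbf_kernel(t, t, length_scale)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(t)), y)
    return K @ alpha
```

The direct solve costs O(n³) per signal, which is the cost the paper's spectral decomposition is designed to avoid.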
- 2508.0002: AI-Generated Text is Non-Stationary: Detection via Temporal Tomography
  The field of AI-generated text detection has evolved from supervised classification to zero-shot statistical analysis. However, current approaches share a fundamental limitation: they aggregate token-level measurements into scalar scores, discarding positional information about where anomalies occur. Our empirical analysis reveals that AI-generated text exhibits significant non-stationarity: statistical properties vary by 73.8% more between text segments compared to human writing. This discovery explains why existing detectors fail against localized adversarial perturbations that exploit this overlooked characteristic. We introduce Temporal Discrepancy Tomography (TDT), a novel detection paradigm that preserves positional information by reformulating detection as a signal processing task. TDT treats token-level discrepancies as a time-series signal and applies Continuous Wavelet Transform to generate a two-dimensional time-scale representation, capturing both the location and linguistic scale of statistical anomalies. On the RAID benchmark, TDT achieves 0.855 AUROC (7.1% improvement over the best baseline). More importantly, TDT demonstrates robust performance on adversarial tasks, with 14.1% AUROC improvement on HART Level paraphrasing attacks. Despite its sophisticated analysis, TDT maintains practical efficiency with only 13% computational overhead. Our work establishes non-stationarity as a fundamental characteristic of AI-generated text and demonstrates that preserving temporal dynamics is essential for robust detection.
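The non-stationarity measurement (segment-to-segment variation of token-level statistics) can be made concrete with a simple fixed-window statistic. Note this is only an illustration of the underlying idea; the paper's TDT uses a continuous wavelet transform rather than fixed segmentation, and the segment count here is arbitrary.

```python
import statistics

def segment_variation(scores, n_segments=4):
    """Split a per-token score sequence into equal segments and return the
    standard deviation of the per-segment means; higher values indicate a
    more non-stationary signal."""
    seg_len = len(scores) // n_segments
    means = [statistics.fmean(scores[i * seg_len:(i + 1) * seg_len])
             for i in range(n_segments)]
    return statistics.pstdev(means)
```

A scalar detector that averages all scores would assign identical values to a flat sequence and a drifting one; the segment statistic separates them, which is the positional information the abstract argues must be preserved.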
- 2508.0001: The Other Side of Foundation Models for Reinforcement Learning: Hacking Rewards with Vision-Language Models
  Recent studies have explored the integration of Vision Language Models (VLMs) and Reinforcement Learning (RL) to tackle complex decision-making tasks. By leveraging the zero-shot captioning capabilities of pre-trained VLMs, an agent can be trained to maximize rewards generated through text prompts. Despite the promise of these recent advances, we reveal a potentially significant limitation: the generated rewards are susceptible to hacking. That is, an agent can exploit imperfections in the VLM-generated reward within the environment, achieving a high proxy reward while performing poorly under the true reward. To illustrate this, we conduct experiments across six distinct environments that span both visual and state inputs, as well as manipulation and navigation tasks. Notably, our findings demonstrate that reward hacking is prevalent in all these setups. Given the lack of prior research on hacking in the context of rewards generated by VLMs for RL agents, we provide a comprehensive analysis of the root cause of this phenomenon and discuss potential mitigation strategies. Our findings underscore the need for increased vigilance when deploying such methods in real-world applications.
- 2507.0001: Code2Reward: Preference-Based Prompting for Reward Design
  Reward function design is a longstanding challenge in reinforcement learning (RL). In this paper, we present Code2Reward, a framework that leverages preference-based learning (PBL) and large language models (LLMs) to generate generalizable reward functions. Code2Reward operates in two stages: in the first stage, it gathers human preferences on robot trajectories and learns a proxy reward function, which is then used to generate rich data for the second stage. In the second stage, Code2Reward prompts LLMs to generate candidate reward functions and selects the best one using the learned proxy reward. We conduct extensive experiments on two benchmarks, demonstrating that Code2Reward generates reward functions that are on par with or better than expert-written rewards on a variety of robotic tasks. You can find more information at https://code2reward.io/.
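The first stage (learning a proxy reward from pairwise trajectory preferences) is commonly implemented with a Bradley-Terry logistic model. The abstract does not specify the architecture, so the linear model and training loop below are assumptions, sketched in numpy.

```python
import numpy as np

def fit_proxy_reward(feats_a, feats_b, prefs, lr=0.1, steps=500):
    """Fit a linear proxy reward w so that sigmoid(w . (a - b)) matches the
    observed preferences (prefs[i] = 1.0 means trajectory a was preferred).
    This is gradient ascent on the Bradley-Terry log-likelihood."""
    w = np.zeros(feats_a.shape[1])
    diff = feats_a - feats_b  # per-pair feature difference
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-diff @ w))       # P(a preferred | w)
        w += lr * diff.T @ (prefs - p) / len(prefs)  # logistic-loss gradient
    return w
```

In the second stage, LLM-generated candidate reward functions would be scored by how well they agree with this proxy on held-out trajectories.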
- 2505.0002: World GPT: An Auto-Regressive World Model for Reinforcement Learning
  Reinforcement learning (RL) agents can significantly benefit from learning an internal world model to predict future observations, which can then be used to train a policy more efficiently. We introduce World GPT, an auto-regressive world model that combines a semantic prior with a quantized latent space to capture complex environments more accurately and efficiently. In contrast to prior approaches, World GPT does not require any re-configuration of the model to generate multiple future frames. Instead, it can fully benefit from the latent space of a pre-trained VQ-GAN model, which can be trained independently of the RL task. Our experiments in the Atari 100K benchmark show that World GPT outperforms prior model-based approaches in terms of data efficiency and planning abilities in complex environments while reducing computational costs. Finally, we demonstrate that World GPT's generation capabilities open up exciting new possibilities for exploration and real-world applications such as training free-form interactive agents.
- 2505.0001: Reversed Smoothed Quantile Regression for Distributed High-Dimensional Data
  High-dimensional distributed quantile regression (QR) is studied in this paper. To overcome the non-smooth issue of the check loss function, a popular approach is to smooth it. However, the smoothed QR estimator and its inferential procedures require a large minimum local sample size. To address the problem, we propose a new estimator by combining the reversed smoothed check loss and ℓ1-penalization. Theoretically, in terms of estimation, we establish the minimax optimal convergence rate for the global estimator and the valid confidence interval for an individual coefficient. In terms of computation and communication, we show that the proposed iterative algorithm converges linearly for a fixed number of machines and requires only a logarithmic number of communication rounds. Additionally, our theoretical results hold under a weaker condition on the minimum local sample size. Numerical experiments corroborate our theoretical claims.
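For intuition on the smoothing idea: the check loss ρ_τ(u) = u(τ − 1{u < 0}) is non-differentiable at zero, and smoothing replaces the indicator with a kernel. The one-dimensional sketch below estimates a τ-quantile by gradient descent on a logistic-smoothed check loss; the paper's "reversed" smoothing, ℓ1 penalty, and distributed iterations are not reproduced, and the bandwidth `h` is illustrative.

```python
import math

def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def smoothed_check_grad(u, tau, h=0.1):
    """d/du of the smoothed check loss at residual u = y - q:
    the indicator 1{u < 0} is replaced by a logistic kernel of bandwidth h."""
    return tau - _sigmoid(-u / h)

def smoothed_quantile(y, tau, h=0.1, lr=0.5, steps=3000):
    """Estimate the tau-quantile of y by gradient descent on the smoothed loss.
    At the optimum, roughly a tau fraction of the data lies below q."""
    q = sum(y) / len(y)
    for _ in range(steps):
        # d/dq of mean rho(y_i - q) is minus the mean residual gradient,
        # so descending the loss means stepping q in the +gradient direction.
        g = sum(smoothed_check_grad(v - q, tau, h) for v in y) / len(y)
        q += lr * g
    return q
```

As h shrinks, the smoothed objective approaches the exact check loss; the paper's contribution is a reversed smoothing construction whose guarantees hold with a much smaller local sample size.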