AiraXiv - Papers

2510.0026

Geometry-Aware Optimal Flow Matching via Convex Potentials

Generative modeling under quadratic optimal transport (OT) aims to learn deterministic maps that push mass from a simple source distribution $p_0$ to a target distribution $p_1$ along the Wasserstein-2 (W2) geodesics. While flow-based models and neural differential equations offer flexible transports, existing approaches typically rely on multi-step integration and yield trajectories whose curvature deviates from W2 geodesics, reducing efficiency, interpretability, and stability. We propose a geometry-aware framework that parameterizes time-dependent velocity fields as gradients of convex potentials modeled by Input Convex Neural Networks (ICNNs). This convex-potential representation guarantees transport along straight lines, exactly matching the W2 map under quadratic cost. Training uses a Flow Matching objective tailored to the convex setting, with explicit gradient computations and a dedicated inversion subproblem to recover preimages under the convex-potential flow; an optional amortization network provides favorable initializations for the inversion and accelerates optimization. The method is agnostic to the specific transport plan and can condition on arbitrary couplings between $p_0$ and $p_1$. Empirically, the approach yields geometry-faithful transports along W2 geodesics, enabling fast sampling with one-step or few-step updates and controlled curvature. Diagnostics on representative datasets confirm geometric fidelity and trainability, and we discuss initialization and transport-plan considerations for scalable, stable generative modeling under quadratic OT.

🤖 AI Methodology

🎯 ICAIS2025 Submission

📄 View

2510.0024

LECTOR: LLM-Enhanced Concept-based Test-Oriented Repetition

Spaced repetition systems are fundamental to efficient learning and memory retention, but existing algorithms often struggle with semantic interference and personalized adaptation. We present LECTOR (\textbf{L}LM-\textbf{E}nhanced \textbf{C}oncept-based \textbf{T}est-\textbf{O}riented \textbf{R}epetition), a novel adaptive scheduling algorithm specifically designed for test-oriented learning scenarios, particularly language examinations where success rate is paramount. LECTOR leverages large language models for semantic analysis while incorporating personalized learning profiles, addressing the critical challenge of semantic confusion in vocabulary learning by utilizing LLM-powered semantic similarity assessment and integrating it with established spaced repetition principles. Our comprehensive evaluation against six baseline algorithms (SSP-MMC, SM2, HLR, FSRS, ANKI, THRESHOLD) across 100 simulated learners over 100 days demonstrates significant improvements: LECTOR achieves a 90.2\% success rate compared to 88.4\% for the best baseline (SSP-MMC), representing a 2.0\% relative improvement. The algorithm shows particular strength in handling semantically similar concepts, reducing confusion-induced errors while maintaining computational efficiency. Our results establish LECTOR as a promising direction for intelligent tutoring systems and adaptive learning platforms.

🤖 AI Methodology

🎯 ICAIS2025 Accepted Paper

📄 View

2510.0022

Adaptive Log Anomaly Detection through Data--Centric Drift Characterization and Policy-Driven Lifelong Learning

Log-based anomaly detectors degrade over time due to concept drift arising from software updates or workload changes. Existing systems typically react by retraining entire models, leading to catastrophic forgetting and inefficiencies. We propose an adaptive framework that first classifies drift in log data into semantic (frequency shifts within known templates) and syntactic (emergence of new log templates) categories via statistical tests and novelty detection. Based on the identified drift type, a policy-driven lifelong learning manager applies targeted updates---experience replay to mitigate forgetting under semantic drift and dynamic model expansion to accommodate syntactic drift. This approach is validated on semi-synthetic logs and real-world longitudinal datasets (HDFS, Apache, and BGL), maintaining high F1-scores, reducing computational overhead, and preserving historical knowledge compared to monolithic retraining.

🤖 AI Methodology

🎯 ICAIS2025 Accepted Paper

📄 View

2510.0021

ConFIT: A Robust Knowledge-Guided Contrastive Framework for Financial Extraction

Financial text extraction faces serious challenges in multi-entity sentiment attribution and numerical sensitivity, often leading to pitfalls in real-world deployment. In this work, we propose ConFIT (Contrastive Financial Information Tuning), a knowledge-guided contrastive learning framework that employs a Semantic-Preserving Perturbation (SPP) engine to generate high-quality, programmatically synthesized hard negatives. By integrating domain knowledge sources such as the Loughran-McDonald lexicon and Wikidata, and applying rigorous perplexity and Natural Language Inference (NLI) filtering, ConFIT trains language models to differentiate subtle perturbations in financial statements. Evaluations on FiQA and SENTiVENT using FinBERT and Llama-3 8B show both promise improvements and unexpected pitfalls, highlighting challenges that warrant further research.

🤖 AI Methodology

🎯 ICAIS2025 Accepted Paper

📄 View

2510.0020

Hierarchical Change Signature Analysis: A Framework for Online Discrimination of Incipient Faults and Benign Drifts in Industrial Time Series

Industrial fault detection systems often struggle to distinguish benign operational drifts (e.g., tool wear, recipe changes) from incipient faults, frequently adapting to faults as new ``normal'' states and risking catastrophic failures. This work proposes a hierarchical framework that decouples change detection from change characterization. When a drift is detected, the system generates a Multi-Scale Change Signature (MSCS) that quantifies geometric and statistical transformations in the primary detector’s latent space. An unsupervised Drift Characterization Module (DCM), trained on an Online Normality Baseline (ONB), classifies each signature as benign or potentially faulty. Benign drifts are ignored, while potential faults are flagged for review; confirmed benign drifts are incorporated into the ONB for future adaptation. The framework is model-agnostic, computationally efficient, and scalable through a tiered human-in-the-loop mechanism. Experiments on the Tennessee Eastman Process dataset with injected drifts and faults demonstrate high fault detection rates, fewer false alarms, and efficient adaptation to benign changes.

🤖 AI Methodology

🎯 ICAIS2025 Accepted Paper

📄 View

2510.0019

Hierarchical Adaptive Normalization: A Placement-Conditioned Cascade for Robust Wearable Activity Recognition

Wearable Human Activity Recognition (HAR) systems face significant performance degradation when sensors are placed at different body locations or orientations. We introduce a hierarchical adaptive normalization method that addresses these challenges through a two-stage cascade. The first stage combines gravity-based orientation correction with placement context inference using signal variance analysis, while a novel stability gate prevents harmful adaptation during unstable periods. The second stage employs placement-conditioned adaptive Batch Normalization to refine feature representations in real-time. Comprehensive evaluations on public and custom datasets show that our method achieves 0.847±0.023 macro F1-score, outperforming static baselines by 36\% and state-of-the-art unsupervised domain adaptation methods by 13.7\%. The approach maintains real-time performance with only 2.3ms inference time and 45.2MB memory usage, demonstrating practical viability for on-device deployment in dynamic real-world scenarios.

🤖 AI Methodology

🎯 ICAIS2025 Accepted Paper

📄 View

2510.0018

Adaptive Evidential Meta-Learning with Hyper-Conditioned Priors for Calibrated ECG Personalisation

This research addresses a fundamental gap in uncertainty calibration during electrocardiogram (ECG) model personalisation. We propose \emph{Adaptive Evidential Meta-Learning}, a framework that attaches a lightweight evidential head with hyper-network-conditioned priors to a frozen ECG foundation model. The hyper-network dynamically sets the evidential prior using robust, class-conditional statistics computed from a few patient-specific ECG samples. Trained via a two-stage meta-curriculum, our approach enables rapid adaptation with well-calibrated uncertainty estimates, making it highly applicable for real-world clinical deployment where both prediction accuracy and uncertainty awareness are crucial.

🤖 AI Methodology

🎯 ICAIS2025 Accepted Paper

📄 View

2510.0017

EREA: Enhanced Research Exploration and Analysis

The increasing volume of scientific publications poses challenges for researchers in efficiently identifying relevant literature, synthesizing research trends, and exploring emerging ideas. Manual search and analysis processes are time-consuming and often insufficient for capturing complex citation relationships. This project presents an open-source Python-based system, EREA (Enhanced Research Exploration and Analysis), that integrates generative artificial intelligence, automated information retrieval, semantic vector search, and citation-based visualization to support enhanced research exploration. User-defined queries are processed to extract structured keywords, retrieve scholarly articles from Google Scholar, and supplement metadata using OpenAlex. Retrieved data are structured, and embedded in a vector database for semantic retrieval, and visualized through interactive, offline HTML graphs. A research report is generated through large language model-assisted synthesis. Developed according to the FAIR (Findability, Accessibility, Interoperability, and Reusability) Data Principles, the system accelerates research exploration, provides structured thematic insights, facilitates understanding through visual citation networks, and supports the identification of research gaps and future directions.

👤 Human Methodology

🎯 ICAIS2025 Submission

📄 View

2510.0016

A Data-Driven Energy Consumption Prediction Model for 5G Base Stations: Addressing Static and Dynamic Power Components

The rapid deployment of 5G networks has intensified concerns about energy consumption in mobile communication systems. Unlike previous generations, 5G base stations (BSs) exhibit significant power draw even under zero traffic conditions, with static power accounting for $30\sim 40\%$ of total energy consumption. This paper proposes a novel data-driven framework that decouples total base station energy consumption into static and dynamic components, enabling more precise energy optimization. For static consumption modeling, we introduce a hybrid ResNet-XGBoost architecture that processes configuration parameters including bandwidth, antenna elements, transmit power, carrier count, and tilt angle. For dynamic consumption, we implement a Tabular Probabilistic Function Network (TabPFN) to capture the nonlinear relationship between resource utilization and energy demand. Experimental results using real-world data from a provincial Chinese telecom operator demonstrate that our model achieves a $15.5\%$ reduction in Mean Absolute Error (MAE) and an $R^2$ of 0.91 compared to conventional approaches.

🤖 AI Methodology

🎯 ICAIS2025 Submission

📄 View

2510.0015

A Data-Driven Energy Consumption Prediction Model for 5G Base Stations: Addressing Static and Dynamic Power Components

The rapid deployment of 5G networks has intensified concerns about energy consumption in mobile communication systems. Unlike previous generations, 5G base stations (BSs) exhibit significant power draw even under zero traffic conditions, with static power accounting for $30\sim 40\%$ of total energy consumption. This paper proposes a novel data-driven framework that decouples total base station energy consumption into static and dynamic components, enabling more precise energy optimization. For static consumption modeling, we introduce a hybrid ResNet-XGBoost architecture that processes configuration parameters including bandwidth, antenna elements, transmit power, carrier count, and tilt angle. For dynamic consumption, we implement a Tabular Probabilistic Function Network (TabPFN) to capture the nonlinear relationship between resource utilization and energy demand. Experimental results using real-world data from a provincial Chinese telecom operator demonstrate that our model achieves a $15.5\%$ reduction in Mean Absolute Error (MAE) and an $R^2$ of 0.91 compared to conventional approaches.

🤖 AI Methodology

📄 View

2510.0011

Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search

Gravitational-wave signal detection with unknown source parameters buried in dynamic detector noise remains a formidable computational challenge. Existing approaches face core limitations from restrictive assumptions: traditional methods rely on predefined theoretical priors, while neural networks introduce hidden biases and lack interpretability. We propose Evolutionary Monte Carlo Tree Search (Evo-MCTS), the first integration of large language model (LLM) guidance with domain-aware physical constraints for automated gravitational wave detection. This framework systematically explores algorithmic solution spaces through tree-structured search enhanced by evolutionary optimization, combining MCTS for strategic exploration with evolutionary algorithms for solution refinement. The LLM component provides domain-aware heuristics while maintaining interpretability through explicit algorithmic pathway generation. Experimental validation demonstrates substantial performance improvements, achieving a 20.2\% improvement over state-of-the-art gravitational wave detection algorithms on the MLGWSC-1 benchmark dataset and a remarkable 59.1\% improvement over other LLM-based algorithm optimization frameworks. Beyond performance improvements, our framework establishes a transferable methodology for automated algorithmic discovery across computational science domains.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper

📄 View

2510.0009

BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments

Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Yet, their application remains constrained by rigid protocol design, limited adaptability to dynamic lab conditions, inadequate error handling, and high operational complexity. Here we introduce BioMARS (Biological Multi-Agent Robotic System), an intelligent platform that integrates LLMs, VLMs, and modular robotics to autonomously design, plan, and execute biological experiments. BioMARS uses a hierarchical architecture: the Biologist Agent synthesizes protocols via retrieval-augmented generation; the Technician Agent translates them into executable robotic pseudo-code; and the Inspector Agent ensures procedural integrity through multimodal perception and anomaly detection. The system autonomously conducts cell passaging and culture tasks, matching or exceeding manual performance in viability, consistency, and morphological integrity. It also supports context-aware optimization, outperforming conventional strategies in differentiating retinal pigment epithelial cells. A web interface enables real-time human-AI collaboration, while a modular backend allows scalable integration with laboratory hardware. These results highlight the feasibility of generalizable, AI-driven laboratory automation and the transformative role of language-based reasoning in biological research.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper

📄 View

2510.0007

HEAL: Learning-Free Source Free Unsupervised Domain Adaptation for Cross-Modality Medical Image Segmentation

Growing demands for clinical data privacy and storage constraints have spurred advances in Source Free Unsupervised Domain Adaptation (SFUDA). SFUDA addresses the domain shift by adapting models from the source domain to the unseen target domain without accessing source data, even when target-domain labels are unavailable. However, SFUDA faces significant challenges: the absence of source domain data and label supervision in the target domain due to source free and unsupervised settings. To address these issues, we propose HEAL, a novel SFUDA framework that integrates Hierarchical denoising, Edge-guided selection, sizeAware fusion, and Learning-free characteristic. Large-scale cross-modality experiments demonstrate that our method outperforms existing SFUDA approaches,achieving state-of-the-art (SOTA) performance. The source code is publicly available at: https://anonymous.4open.science/r/HEAL-10C5.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper

📄 View

2510.0006

HEAL: Learning-Free Source Free Unsupervised Domain Adaptation for Cross-Modality Medical Image Segmentation

Growing demands for clinical data privacy and storage constraints have spurred advances in Source Free Unsupervised Domain Adaptation (SFUDA). SFUDA addresses the domain shift by adapting models from the source domain to the unseen target domain without accessing source data, even when target-domain labels are unavailable. However, SFUDA faces significant challenges: the absence of source domain data and label supervision in the target domain due to source free and unsupervised settings. To address these issues, we propose HEAL, a novel SFUDA framework that integrates Hierarchical denoising, Edge-guided selection, size-Aware fusion, and Learning-free characteristic. Large-scale cross-modality experiments demonstrate that our method outperforms existing SFUDA approaches, achieving state-of-the-art (SOTA) performance. The source code is publicly available at: https://anonymous.4open.science/r/HEAL-10C5.

🤖 AI Methodology

🎯 ICAIS2025 Submission

📄 View

2510.0005

Synergistic Space-Vision Processing for Predicate Inference

Scene graph generation, which parses images into structured graph, is a fundamental task for scene understanding. Most existing SGG models are dedicated to generating predicate representations based on appearance, relative position, and contextual cues. However, due to the predicate representation ambiguity arising from spatial co-occurrence, the generated scene graphs are often factually correct, but semantically shallow. To address this problem, we propose inferring predicates by synergistically processing spatial and visual information. Our core insight is that acknowledging the coexistence of geometric and non-geometric predicates, rather than struggling to disentangle them, is better suited for predicate inference than existing single-stream architectures. To this end, we introduce a novel method, Dual-stream Synergistic Network (DS-Net). Specifically, it contains two parallel streams: a space stream to predict geometric predicates from spatial layouts and edge features, and a vision stream to predict non-geometric predicates from fine-grained visual cues and linguistic priors. Based on them, we then design Cross-Stream Fusion module to enhance the corresponding predicate representation by using the mutual information of the two types. Through the collaborative processing of these streams, our DS-Net no longer treats the two predicate types as conflicting signals that need to be disentangled. Instead, it utilizes their synergy to facilitate predicate inference, providing a new perspective on resolving predicate ambiguity. Experiments have demonstrated the effectiveness of our method. Furthermore, our approach exhibits strong versatility and can be efficiently integrated with various existing models to enhance their performance. For instance, the 2.3\% $\sim$ 8.2\% increase in mR@100 on PredCls task demonstrates this capability.

👤 Human Methodology

📄 View

2510.0004

A synergistic multi-specialist knowledge reasoning model for molecular science

Pengfei Liu, Shuang Ge, Jun Tao, Zhixiang Ren

The rapid evolution of artificial intelligence in molecular science necessitates a shift from data-driven predictions to knowledge-guided reasoning. Existing molecular models are predominantly proprietary, lacking general molecular intelligence and generalizability. To address this, we propose a task-adaptive large reasoning model that integrates molecular scientific logic to emulate the thinking of molecular scientists, with capabilities for reasoning and reflection. Our approach incorporates multi-specialist modules to provide versatile molecular expertise and a chain-of-thought (CoT) framework enhanced by reinforcement learning infused with molecular knowledge, enabling structured and reflective reasoning. The model outperforms over 20 state-of-the-art multi-task large language models (LLMs) across 10 molecular tasks on 47 metrics, including property prediction, molecule generation, and reaction prediction.It achieves a 50.3% improvement over the base model while ensuring interpretability. It can bridge data-driven and knowledge-integrated approaches for intelligent molecular design.

👤 Human Methodology

📄 View

2510.0003

AI-Driven Resilience and Synergistic Optimization in Green Computing Networks: A Scientific Paradigm Approach

This paper investigates the resilience mechanisms and synergistic optimization strategies in green computing networks under the AI scientific paradigm. As computing infrastructure increasingly demands both performance and sustainability, traditional optimization approaches face challenges in balancing energy efficiency with network reliability. We propose an AI-driven framework that integrates reinforcement learning and multi-agent systems to dynamically optimize resource allocation while maintaining network resilience. Our approach combines theoretical economic models with practical AI engineering capabilities to analyze real-world computing workloads. Experimental results demonstrate that our method achieves 27% reduction in energy consumption while improving network fault tolerance by 34% compared to baseline approaches. This work contributes to the emerging field of AI for Science by showcasing how automated scientific discovery methods can address complex sustainability challenges in computing infrastructure.

👤 Human Methodology

🎯 ICAIS2025 Submission

📄 View

2510.0001

RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation

Large language models (LLMs) struggle to effectively utilize a growing number of external tools, such as those defined by the Model Context Protocol (MCP)[ 1], due to prompt bloat and selection complexity. We introduce RAG-MCP, a Retrieval-Augmented Generation framework that overcomes this challenge by offloading tool discovery. RAGMCP uses semantic retrieval to identify the most relevant MCP(s) for a given query from an external index before engaging the LLM. Only the selected tool descriptions are passed to the model, drastically reducing prompt size and simplifying decision-making. Experiments, including an MCP stress test, demonstrate RAG-MCP significantly cuts prompt tokens (e.g., by over 50%) and more than triples tool selection accuracy (43.13% vs 13.62% baseline) on benchmark tasks. RAG-MCP enables scalable and accurate tool integration for LLMs.

🤖 AI Methodology

🎯 ICAIS2025 Accepted Paper

📄 View

2509.0013

LyRE: Learning Varying Fusion Degrees with Hierarchical Aggregation to Improve Multimodal Misinformation Detection

Yidu Chen, Bo Ma, Yating Yang, Dilxat Abdureyim, Rui Dong, Zhen Wang, Lei Wang, Zhou Xi

The rapid proliferation of misinformation poses serious concerns, necessitating the development of efficient and accurate automated detection methods. Existing multimodal misinformation detection approaches predominantly focus on fusing information from different modalities. However, the diverse nature of multimodal posts on social media means that solely focusing on fusion can introduce noise, particularly in posts with weak inter-modal correlations. To address this challenge and effectively handle diverse misinformation instances, we propose a novel method Learning Varying Fusion Degrees with Hierarchical Aggregation(LyRE). LyRE employs classifiers at different stages of a hierarchical fusion process, enabling the model to learn from representations with varying degrees of cross-modal interaction and adapt to different types of multimodal data. Experimental results on multiple publicly misinformation detection datasets demonstrate that LyRE outperforms other state-of-the-art and highly competitive misinformation detection methods

👤 Human Methodology

📄 View

2509.0012

TADT-CSA: Temporal Advantage Decision Transformer with Contrastive State Abstraction for Generative Recommendation

Xiang Gao, Tianyuan Liu, Yisha Li, Jingxin Liu, Lexi Gao, Xin Li, Haiyang Lu, Liyin Hong

With the rapid advancement of Transformer-based Large Language Models (LLMs), generative recommendation has shown great potential in enhancing both the accuracy and semantic understanding of modern recommender systems. Compared to LLMs, the Decision Transformer (DT) is a lightweight generative model applied to sequential recommendation tasks. However, DT faces challenges in trajectory stitching, often producing suboptimal trajectories. Moreover, due to the high dimensionality of user states and the vast state space inherent in recommendation scenarios, DT can incur significant computational costs and struggle to learn effective state representations. To overcome these issues, we propose a novel Temporal Advantage Decision Transformer with Contrastive State Abstraction (TADT-CSA) model. Specifically, we combine the conventional Return-To-Go (RTG) signal with a novel temporal advantage (TA) signal that encourages the model to capture both long-term returns and their sequential trend. Furthermore, we integrate a contrastive state abstraction module into the DT framework to learn more effective and expressive state representations. Within this module, we introduce a TA–conditioned State Vector Quantization (TAC-SVQ) strategy, where the TA score guides the state codebooks to incorporate contextual token information. Additionally, a reward prediction network and a contrastive transition prediction (CTP) network are employed to ensure that the state codebook preserves both the reward information of the current state and the transition information between adjacent states. Empirical results on both public datasets and an online recommendation system demonstrate the effectiveness of the TADT-CSA model and its superiority over baseline methods.

👤 Human Methodology

📄 View