AiraXiv - Papers

2510.0089

BasketVision: Benchmarking MLLMs' Grasp of Complex Dynamic Systems

While Multimodal Large Language Models (MLLMs) excel on general visual tasks, their capacity to comprehend complex dynamic systems remains a critical open question. Such systems, governed by physical laws, explicit rules, and multi-agent interactions, form the fabric of the real world. To facilitate a systematic diagnosis of current MLLM limitations, we introduce BasketVision, a new benchmark that leverages professional basketball as a microcosm for these dynamic environments. BasketVision probes model capabilities across seven dimensions—spanning perception, reasoning, and prediction—through 6,000 curated, bilingual questions from professional game data. An automated data generation pipeline underpins the benchmark, ensuring both scalability and fine-grained precision. Our evaluation of 23 leading models reveals a chasm between machine and human cognition: human experts attain 96.34% accuracy, while the premier model, GPT-4o, achieves only 63.15%. The analysis pinpoints spatial reasoning as a persistent bottleneck and uncovers specific patterns of task specialization. BasketVision thus serves as a crucial apparatus for charting the frontiers of MLLMs and steering future work toward more robust reasoning in dynamic visual worlds.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper


2510.0088

MatEvolve: A Synergistic Symbolic–LLM Agent for Multi-Objective Materials Design

Materials define the eras of human civilization, yet the design of novel materials is fundamentally constrained by the immense chemical space, which renders the traditional enumeration-screening methodology computationally prohibitive and inefficient. This paper introduces a paradigm shift towards insight-exploration-validation, enabling an intelligent and evolutionary exploration of material design pathways. To actualize this paradigm, we propose MatEvolve, a synergistic symbolic–LLM agent that reconceptualizes material design as a closed-loop, programmatic evolution task. Central to MatEvolve is a novel symbolic formalism, Material Edit Language, which empowers the agent to perform chemical operations programmatically. The exploration trajectory is directed by a multifaceted guidance strategy, comprising a dynamic knowledge injection mechanism and a two-stage exploration strategy that balances broad exploration and deep optimization. Furthermore, a multi-objective fitness landscape ensures directional and efficient navigational guidance. These integrated strategies contribute to a 32.2% improvement over direct material structure modification. Crucially, comparisons demonstrate that our insight-exploration-validation paradigm outperforms the traditional enumeration-screening approach by 33.6%, highlighting its superior efficacy in navigating vast design spaces.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper


2510.0085

AI Mathematician as a Partner in Advancing Mathematical Discovery

Artificial intelligence (AI) has demonstrated impressive progress in mathematical reasoning, yet its integration into the practice of mathematical research remains limited. In this study, we investigate how the AI Mathematician (AIM) system can operate as a research partner rather than a mere problem solver. Focusing on a challenging problem in homogenization theory, we analyze the autonomous reasoning trajectories of AIM and incorporate targeted human interventions to structure the discovery process. Through iterative decomposition of the problem into tractable subgoals, selection of appropriate analytical methods, and validation of intermediate results, we reveal how human intuition and machine computation can complement one another. This collaborative paradigm enhances the reliability, transparency, and interpretability of the resulting proofs, while retaining human oversight for formal rigor and correctness. The approach leads to a complete and verifiable proof, and more broadly, demonstrates how systematic human-AI co-reasoning can advance the frontier of mathematical discovery.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper


2510.0042

ICIMBench: An In-Context Iterative Molecular Design Benchmark for Large Language Models

Large language models (LLMs) are rapidly transforming scientific discovery, showing promise in hypothesis generation, literature understanding, and symbolic reasoning. Yet their capacity to conduct iterative, feedback-driven molecular design, a hallmark of real-world drug and materials discovery, remains underexplored. Existing benchmarks typically cast molecular tasks as one-shot question-answering or text-to-molecule translation, neglecting the iterative propose-evaluate-refine process central to scientific practice. We propose ICIMBench, an In-Context Iterative Molecular Design Benchmark that evaluates LLMs in multi-turn molecular design episodes. In each task, the model receives a natural-language specification, generates candidate molecules in SMILES format, and iteratively refines them based on deterministic oracle feedback from RDKit. We introduce the NumEval metric, the number of evaluations required to satisfy the target, which captures both performance efficiency and robustness under realistic evaluation budgets. Experiments on frontier models (GPT-5, DeepSeek-V3.2, Intern-S1) show that while single-property design is largely solved (NumEval = 1) by state-of-the-art LLMs like GPT-5, multi-property optimization remains a strong challenge, especially under coupled constraints such as lipophilicity and scaffold similarity. ICIMBench provides a principled framework for probing the in-context reasoning and adaptive optimization abilities of LLMs, paving the way toward autonomous, language-driven molecular discovery.

👤 Human Empirical

🎯 ICAIS2025 Submission
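
The NumEval metric in the ICIMBench abstract above (the number of oracle evaluations needed before a candidate satisfies the target) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `run_episode`, `toy_oracle`, and `toy_propose` are hypothetical names, the oracle merely counts `C` characters in a SMILES string rather than calling RDKit, and the proposer is a hand-written rule rather than an LLM. It is not the benchmark's implementation.

```python
from typing import Callable, List, Optional, Tuple

# (candidate, feedback) pairs observed so far in the episode
History = List[Tuple[str, str]]


def run_episode(
    propose: Callable[[History], str],
    oracle: Callable[[str], Tuple[bool, str]],
    budget: int,
) -> Optional[int]:
    """Run a propose-evaluate-refine loop.

    Returns the number of oracle evaluations used when the target is first
    satisfied, or None if the evaluation budget runs out.
    """
    history: History = []
    for n in range(1, budget + 1):
        candidate = propose(history)
        satisfied, feedback = oracle(candidate)
        if satisfied:
            return n
        history.append((candidate, feedback))
    return None


def toy_oracle(smiles: str, target_carbons: int = 4) -> Tuple[bool, str]:
    """Hypothetical stand-in oracle: checks the count of 'C' characters."""
    count = smiles.count("C")
    if count == target_carbons:
        return True, "ok"
    feedback = "too_few" if count < target_carbons else "too_many"
    return False, feedback


def toy_propose(history: History) -> str:
    """Hypothetical rule-based proposer standing in for an LLM."""
    if not history:
        return "C"
    last_candidate, feedback = history[-1]
    if feedback == "too_few":
        return last_candidate + "C"
    return last_candidate[:-1]


if __name__ == "__main__":
    print(run_episode(toy_propose, toy_oracle, budget=10))  # 4
    print(run_episode(toy_propose, toy_oracle, budget=3))   # None
```

Returning `None` when the budget is exhausted keeps failed episodes distinct from slow successes, which matters if scores are later aggregated across tasks.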


2510.0041

Graph neural network for colliding particles with an application to sea ice floe modeling

This paper introduces a novel approach to sea ice modeling using Graph Neural Networks (GNNs), utilizing the natural graph structure of sea ice, where nodes represent individual ice pieces, and edges model the physical interactions, including collisions. This concept is developed within a one-dimensional framework as a foundational step. Traditional numerical methods, while effective, are computationally intensive and less scalable. By utilizing GNNs, the proposed model, termed the Collision-captured Network (CN), integrates data assimilation (DA) techniques to effectively learn and predict sea ice dynamics under various conditions. The approach was validated using synthetic data, both with and without observed data points, and it was found that the model accelerates the rendering of trajectories without compromising accuracy. This advancement offers a more efficient tool for forecasting in marginal ice zones (MIZ) and highlights the potential of combining machine learning with data assimilation for more effective and efficient modeling.

👤 Human Application

🎯 ICAIS2025 Submission


2510.0040

A Fuzzy-based Approach to Predict Human Interaction by Functional Near-Infrared Spectroscopy

In this article, we introduce the fuzzy logic-based attention mechanism (Fuzzy Attention Layer), a novel computational approach designed to enhance the interpretability and efficacy of neural models in psychological research. The fuzzy attention layer is integrated into the transformer encoder model to analyze complex psychological phenomena from neural signals captured by functional near-infrared spectroscopy (fNIRS). By leveraging fuzzy logic, the fuzzy attention layer learns and identifies interpretable patterns of neural activity. This addresses a significant challenge in using transformers: the lack of transparency in determining which specific brain activities contribute most to particular predictions. Our experimental results, obtained from fNIRS data recorded during social interactions involving handholding, reveal that the fuzzy attention layer not only learns interpretable patterns of neural activity but also enhances model performance. In addition, these patterns provide deeper insights into the neural correlates of interpersonal touch and emotional exchange. The application of our model shows promising potential for understanding the complex aspects of human social behavior and for verifying psychological theory with machine learning algorithms, thereby contributing significantly to the fields of social neuroscience and AI. The presented version is based on the work published in IEEE TFS (2025).

👤 Human Methodology

🎯 ICAIS2025 Submission


2510.0039

Uncertainty Quantification in Machine Learning for Responsible AI

Machine learning and artificial intelligence will be deeply embedded in the intelligent systems humans use to automate tasking, optimize planning, and support decision-making. We present a critical review of uncertainty quantification (UQ) in large language models (LLMs), synthesizing insights from over 80 papers across leading venues (ACL, ASE, NeurIPS, ICML, AAAI, IJCAI, Nature, and others). We introduce UQ-Net, a unified probabilistic framework that combines Bayesian modeling, calibration, conformal prediction, and selective decision rules to disentangle epistemic and aleatoric uncertainty and to support reliable decision thresholds. UQ-Net integrates uncertainty estimates with calibration procedures and anomaly detection to enable safer selective deployment of LLM agents. Through case studies in medical diagnosis and code generation, we demonstrate that UQ-Net improves calibration and reduces predictive error by 15–20% relative to standard baselines. We survey existing evaluation practices and identify critical gaps: misalignment of consistency and entropy with factuality, lack of benchmarks for multi-episode interactions, and inconsistent metrics for calibration and tightness. We advocate for context-aware datasets, standardized metrics, and human-in-the-loop evaluations to better align UQ methods with deployment needs. Our review and proposed framework offer a principled foundation for operationalizing UQ in LLMs, advancing the development of trustworthy, responsible agentic AI for safety-sensitive, real-world applications.

👤 Human Survey

🎯 ICAIS2025 Accepted Paper


2510.0038

The Hitchhiker's Guide to Autonomous Research: A Survey of Scientific Agents

The advancement of LLM-based agents is redefining AI for Science (AI4S) by enabling autonomous scientific research. Prominent LLMs have exhibited expertise across multiple domains, catalysing the construction of domain-specialised scientific agents. Nevertheless, the profound epistemic and methodological gaps between AI and the natural sciences still impede the systematic design, training, and validation of these agents. This survey bridges the existing gap by presenting an exhaustive blueprint for scientific agents, spanning systematic construction methodologies, targeted capability enhancement, and rigorous evaluations. Anchored in the canonical scientific workflow, this paper (i) presents an overview of scientific agents, starting with the development from general-purpose agents to scientific agents driven by articulated goal-orientation, then advancing a comprehensive taxonomy that organises existing agents by construction strategy and capability scope, and (ii) introduces a two-tier progressive framework, from constructing scientific agents from scratch to targeted capability enhancement, for realizing autonomous scientific research. It is our aspiration that this survey will serve as guidance for researchers across various domains, facilitating the systematic design of domain-specific scientific agents and stimulating further innovation in AI-driven scientific research. To support long-term progress, we curate a live repository, Awesome_Scientific_Agent (https://github.com/gudehhh666/Awesome_Scientific_Agent.git), that continuously aggregates emerging methods, benchmarks, and best practices.

👤 Human Survey

🎯 ICAIS2025 Accepted Paper


2510.0036

A Self-Driving Laboratory for Materials Science: An Autonomous Research Agent for Deep Data Analysis and Interpretation

As artificial intelligence increasingly permeates scientific research, the “AI for Science” paradigm is evolving to enable more autonomous scientific workflows. Traditional research processes rely heavily on researchers’ expertise and manual operations, particularly in data analysis and interpretation, the critical “last mile” from raw data to profound insights. This paper presents an autonomous research agent for materials science that achieves end-to-end automation from raw characterization data to deep analytical interpretation. The system integrates four core innovations: (1) AI-driven automatic data understanding with unified ingestion of heterogeneous instrument data, (2) automated data analysis through an extensible algorithm library, (3) a one-click automated reporting system, and (4) interactive AI-powered data interpretation via natural language dialogue. We demonstrate the agent’s capabilities through real-world case studies across multiple characterization techniques (Raman, UPS, UV-Vis, TG), achieving remarkable performance: UV-Vis bandgap analysis is accelerated by 600× compared to manual processing, while maintaining exceptional accuracy with fitting precision R² ≥ 0.999. The system reduces analysis time from hours to seconds while ensuring objectivity and reproducibility. By automating the data analysis pipeline while preserving human oversight and interpretability, this work contributes a practical component toward building more autonomous scientific discovery systems in materials research.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper


2510.0035

MotivGraph-SoIQ: Integrating Motivational Knowledge Graphs and Socratic Dialogue for Enhanced LLM Ideation

Large Language Models (LLMs) hold substantial potential for accelerating academic ideation but face critical challenges in grounding ideas and mitigating confirmation bias for further refinement. To address these limitations, we propose MotivGraph-SoIQ, which integrates motivational knowledge graphs and Socratic dialogue for enhanced LLM ideation. This novel framework provides essential grounding and practical idea improvement steps for LLM ideation by integrating a Motivational Knowledge Graph (MotivGraph) with a Q-Driven Socratic Ideator. The MotivGraph structurally stores three key node types (problem, challenge, and solution) to offer motivation grounding for the LLM ideation process. The Ideator is a dual-agent system utilizing Socratic questioning, which facilitates a rigorous refinement process that mitigates confirmation bias and improves idea quality across novelty, experimental rigor, and motivational rationality dimensions. On the ICLR25 paper topics dataset, MotivGraph-SoIQ exhibits clear advantages over existing state-of-the-art approaches across LLM-based scoring, ELO ranking, and human evaluation metrics.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper


2510.0034

Cognitive-YOLO: LLM-Driven Architecture Synthesis from First Principles of Data for Object Detection

Designing high-performance object detection architectures is a complex task, where traditional manual design is time-consuming and labor-intensive, and Neural Architecture Search (NAS) is computationally prohibitive. While recent approaches using Large Language Models (LLMs) show promise, they often function as iterative optimizers within a search loop, rather than generating architectures directly from a holistic understanding of the data. To address this gap, we propose Cognitive-YOLO, a novel framework for LLM-driven architecture synthesis that generates network configurations directly from the intrinsic characteristics of the dataset. Our method consists of three stages: first, an analysis module extracts key meta-features (e.g., object scale distribution and scene density) from the target dataset; second, the LLM reasons upon these features, augmented with state-of-the-art components retrieved via Retrieval-Augmented Generation (RAG), to synthesize the architecture into a structured neural network description, which we term the Neural Architecture Description Language (NADL); finally, a compiler instantiates this description into a deployable model. Extensive experiments on five diverse object detection datasets demonstrate that our proposed Cognitive-YOLO consistently generates superior architectures, achieving state-of-the-art (SOTA) performance by outperforming strong baseline models across multiple benchmarks.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper


2510.0027

From Knowledge Tree to Knowledge Forest: Harnessing Chemical Understanding with Machine Learning and Artificial Intelligence

The 2024 Physics and Chemistry Nobel Prizes, awarded for machine learning (ML) and artificial intelligence (AI) breakthroughs, marked “Year 1 of AI for Science,” underscoring their transformative role in physical sciences. Yet data are not the same as understanding, a distinction central to chemistry, which has long relied on concepts such as bond, aromaticity, and reactivity as scaffolds for understanding and explanation. Building on our recent perspectives (ACS Phys. Chem. Au 2024, 4, 135–142; J. Chem. Theory Comput. 2025, DOI: 10.1021/acs.jctc.5c01299), this article explores how ML/AI can become engines of chemical understanding. We introduce a quintet of chemical knowledge (ontology, epistemology, theory, concept, and understanding) and develop the metaphors of the Knowledge Tree and Knowledge Forest to show how diverse epistemologies interact and recursively enrich one another. Case studies on aromaticity, catalysis, orbital-free density functional theory, and protein folding illustrate how ML features, when interpreted as conceptual roots, yield fruits of understanding. Contrasting multiscale modeling with hierarchical modeling, we argue that ML enables emergent, concept-driven integration across levels. Cultivating this plural and hierarchical ecosystem may guide theoretical chemistry toward its next breakthroughs, resolving Dirac’s dilemma not by brute force but by forests of concepts that transform data into enduring understanding.

👤 Human Position

🎯 ICAIS2025 Accepted Paper


2510.0017

EREA: Enhanced Research Exploration and Analysis

The increasing volume of scientific publications poses challenges for researchers in efficiently identifying relevant literature, synthesizing research trends, and exploring emerging ideas. Manual search and analysis processes are time-consuming and often insufficient for capturing complex citation relationships. This project presents an open-source Python-based system, EREA (Enhanced Research Exploration and Analysis), that integrates generative artificial intelligence, automated information retrieval, semantic vector search, and citation-based visualization to support enhanced research exploration. User-defined queries are processed to extract structured keywords, retrieve scholarly articles from Google Scholar, and supplement metadata using OpenAlex. Retrieved data are structured, embedded in a vector database for semantic retrieval, and visualized through interactive, offline HTML graphs. A research report is generated through large language model-assisted synthesis. Developed according to the FAIR (Findability, Accessibility, Interoperability, and Reusability) Data Principles, the system accelerates research exploration, provides structured thematic insights, facilitates understanding through visual citation networks, and supports the identification of research gaps and future directions.

👤 Human Methodology

🎯 ICAIS2025 Submission


2510.0011

Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search

Gravitational-wave signal detection with unknown source parameters buried in dynamic detector noise remains a formidable computational challenge. Existing approaches face core limitations from restrictive assumptions: traditional methods rely on predefined theoretical priors, while neural networks introduce hidden biases and lack interpretability. We propose Evolutionary Monte Carlo Tree Search (Evo-MCTS), the first integration of large language model (LLM) guidance with domain-aware physical constraints for automated gravitational-wave detection. This framework systematically explores algorithmic solution spaces through tree-structured search enhanced by evolutionary optimization, combining MCTS for strategic exploration with evolutionary algorithms for solution refinement. The LLM component provides domain-aware heuristics while maintaining interpretability through explicit algorithmic pathway generation. Experimental validation demonstrates substantial performance improvements, achieving a 20.2% improvement over state-of-the-art gravitational-wave detection algorithms on the MLGWSC-1 benchmark dataset and a remarkable 59.1% improvement over other LLM-based algorithm optimization frameworks. Beyond performance improvements, our framework establishes a transferable methodology for automated algorithmic discovery across computational science domains.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper


2510.0010

BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments

Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Yet, their application remains constrained by rigid protocol design, limited adaptability to dynamic lab conditions, inadequate error handling, and high operational complexity. Here we introduce BioMARS (Biological Multi-Agent Robotic System), an intelligent platform that integrates LLMs, VLMs, and modular robotics to autonomously design, plan, and execute biological experiments. BioMARS uses a hierarchical architecture: the Biologist Agent synthesizes protocols via retrieval-augmented generation; the Technician Agent translates them into executable robotic pseudo-code; and the Inspector Agent ensures procedural integrity through multimodal perception and anomaly detection. The system autonomously conducts cell passaging and culture tasks, matching or exceeding manual performance in viability, consistency, and morphological integrity. It also supports conte

👤 Human Application

🎯 ICAIS2025 Submission


2510.0009

BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments

Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Yet, their application remains constrained by rigid protocol design, limited adaptability to dynamic lab conditions, inadequate error handling, and high operational complexity. Here we introduce BioMARS (Biological Multi-Agent Robotic System), an intelligent platform that integrates LLMs, VLMs, and modular robotics to autonomously design, plan, and execute biological experiments. BioMARS uses a hierarchical architecture: the Biologist Agent synthesizes protocols via retrieval-augmented generation; the Technician Agent translates them into executable robotic pseudo-code; and the Inspector Agent ensures procedural integrity through multimodal perception and anomaly detection. The system autonomously conducts cell passaging and culture tasks, matching or exceeding manual performance in viability, consistency, and morphological integrity. It also supports context-aware optimization, outperforming conventional strategies in differentiating retinal pigment epithelial cells. A web interface enables real-time human-AI collaboration, while a modular backend allows scalable integration with laboratory hardware. These results highlight the feasibility of generalizable, AI-driven laboratory automation and the transformative role of language-based reasoning in biological research.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper


2510.0007

HEAL: Learning-Free Source Free Unsupervised Domain Adaptation for Cross-Modality Medical Image Segmentation

Growing demands for clinical data privacy and storage constraints have spurred advances in Source Free Unsupervised Domain Adaptation (SFUDA). SFUDA addresses domain shift by adapting models from the source domain to the unseen target domain without accessing source data, even when target-domain labels are unavailable. However, SFUDA faces significant challenges: the absence of source-domain data and of label supervision in the target domain, both inherent to the source-free and unsupervised settings. To address these issues, we propose HEAL, a novel SFUDA framework that integrates Hierarchical denoising, Edge-guided selection, size-Aware fusion, and a Learning-free characteristic. Large-scale cross-modality experiments demonstrate that our method outperforms existing SFUDA approaches, achieving state-of-the-art (SOTA) performance. The source code is publicly available at: https://anonymous.4open.science/r/HEAL-10C5.

👤 Human Methodology

🎯 ICAIS2025 Accepted Paper


2510.0005

Synergistic Space-Vision Processing for Predicate Inference

Scene graph generation (SGG), which parses images into structured graphs, is a fundamental task for scene understanding. Most existing SGG models are dedicated to generating predicate representations based on appearance, relative position, and contextual cues. However, due to the predicate representation ambiguity arising from spatial co-occurrence, the generated scene graphs are often factually correct but semantically shallow. To address this problem, we propose inferring predicates by synergistically processing spatial and visual information. Our core insight is that acknowledging the coexistence of geometric and non-geometric predicates, rather than struggling to disentangle them, is better suited for predicate inference than existing single-stream architectures. To this end, we introduce a novel method, the Dual-stream Synergistic Network (DS-Net). Specifically, it contains two parallel streams: a space stream to predict geometric predicates from spatial layouts and edge features, and a vision stream to predict non-geometric predicates from fine-grained visual cues and linguistic priors. Building on these streams, we design a Cross-Stream Fusion module that enhances the corresponding predicate representation by using the mutual information of the two types. Through the collaborative processing of these streams, our DS-Net no longer treats the two predicate types as conflicting signals that need to be disentangled. Instead, it utilizes their synergy to facilitate predicate inference, providing a new perspective on resolving predicate ambiguity. Experiments demonstrate the effectiveness of our method. Furthermore, our approach exhibits strong versatility and can be efficiently integrated with various existing models to enhance their performance. For instance, an increase of 2.3% to 8.2% in mR@100 on the PredCls task demonstrates this capability.

👤 Human Methodology


2510.0004

A synergistic multi-specialist knowledge reasoning model for molecular science

Pengfei Liu, Shuang Ge, Jun Tao, Zhixiang Ren

The rapid evolution of artificial intelligence in molecular science necessitates a shift from data-driven predictions to knowledge-guided reasoning. Existing molecular models are predominantly proprietary, lacking general molecular intelligence and generalizability. To address this, we propose a task-adaptive large reasoning model that integrates molecular scientific logic to emulate the thinking of molecular scientists, with capabilities for reasoning and reflection. Our approach incorporates multi-specialist modules to provide versatile molecular expertise and a chain-of-thought (CoT) framework enhanced by reinforcement learning infused with molecular knowledge, enabling structured and reflective reasoning. The model outperforms more than 20 state-of-the-art multi-task large language models (LLMs) across 10 molecular tasks on 47 metrics, including property prediction, molecule generation, and reaction prediction. It achieves a 50.3% improvement over the base model while ensuring interpretability. It can bridge data-driven and knowledge-integrated approaches for intelligent molecular design.

👤 Human Methodology


2510.0003

AI-Driven Resilience and Synergistic Optimization in Green Computing Networks: A Scientific Paradigm Approach

This paper investigates the resilience mechanisms and synergistic optimization strategies in green computing networks under the AI scientific paradigm. As computing infrastructure increasingly demands both performance and sustainability, traditional optimization approaches face challenges in balancing energy efficiency with network reliability. We propose an AI-driven framework that integrates reinforcement learning and multi-agent systems to dynamically optimize resource allocation while maintaining network resilience. Our approach combines theoretical economic models with practical AI engineering capabilities to analyze real-world computing workloads. Experimental results demonstrate that our method achieves 27% reduction in energy consumption while improving network fault tolerance by 34% compared to baseline approaches. This work contributes to the emerging field of AI for Science by showcasing how automated scientific discovery methods can address complex sustainability challenges in computing infrastructure.

👤 Human Methodology

🎯 ICAIS2025 Submission
