ICAIS 2025
Full name: The 1st International Conference on AI Scientist
-
2510.0085
AI Mathematician as a Partner in Advancing Mathematical Discovery
Artificial intelligence (AI) has demonstrated impressive progress in mathematical reasoning, yet its integration into the practice of mathematical research remains limited. In this study, we investigate how the AI Mathematician (AIM) system can operate as a research partner rather than a mere problem solver. Focusing on a challenging problem in homogenization theory, we analyze the autonomous reasoning trajectories of AIM and incorporate targeted human interventions to structure the discovery process. Through iterative decomposition of the problem into tractable subgoals, selection of appropriate analytical methods, and validation of intermediate results, we reveal how human intuition and machine computation can complement one another. This collaborative paradigm enhances the reliability, transparency, and interpretability of the resulting proofs, while retaining human oversight for formal rigor and correctness. The approach leads to a complete and verifiable proof, and more broadly, demonstrates how systematic human-AI co-reasoning can advance the frontier of mathematical discovery.
-
2510.0086
EndoNet: Content-Aware Linear Attention for Endoscopic Video Super-Resolution
Endoscopic video super-resolution (EVSR) seeks to reconstruct high-resolution frames from low-resolution endoscopic video, a task critical for enhancing clinical visualization of fine anatomical details. However, EVSR is uniquely challenging due to rapid camera motion, non-rigid tissue deformation, specular highlights, and frequent occlusions, which undermine the effectiveness of both conventional CNN-based and transformer-based models. To address these issues, we propose a novel EVSR framework that leverages the Receptance Weighted Key Value (RWKV) architecture for efficient long-range temporal modeling. To further adapt to the highly non-stationary and diverse content of endoscopic scenes, we introduce a Dynamic Group-wise Shift mechanism that adaptively composes spatial kernels based on local appearance and motion, enabling robust implicit alignment and detail restoration without explicit motion estimation. Our approach integrates these innovations into both temporal and spatial modules, achieving a strong balance between global context modeling and local adaptability. Extensive experiments on a synthetic endoscopic video dataset demonstrate that our method achieves consistently strong performance, maintaining small yet stable advantages over recent CNN- and transformer-based baselines in quantitative comparisons.
-
2510.0088
MatEvolve: A Synergistic Symbolic-LLM Agent for Multi-Objective Materials Design
Materials define the eras of human civilization, yet the design of novel materials is fundamentally constrained by the immense chemical space, which renders traditional enumeration-screening methodology computationally prohibitive and inefficient. This paper introduces a paradigm shift towards insight-exploration-validation, enabling an intelligent and evolutionary exploration of material design pathways. To actualize this paradigm, we propose MatEvolve, a synergistic symbolic-LLM agent that reconceptualizes material design as a closed-loop, programmatic evolution task. Central to MatEvolve is a novel symbolic formalism, Material Edit Language, which empowers the agent to apply chemical operations programmatically. The exploration trajectory is directed by a multifaceted guidance strategy, comprising a dynamic knowledge injection mechanism and a two-stage exploration strategy that balances broad exploration and deep optimization. Furthermore, a multi-objective fitness landscape ensures directional and efficient navigational guidance. These integrated strategies contribute to a 32.2% improvement over direct material structure modification. Crucially, comparisons demonstrate that our insight-exploration-validation paradigm outperforms the traditional enumeration-screening approach by 33.6%, highlighting its superior efficacy in navigating vast design spaces.
-
2510.0089
BasketVision: Benchmarking MLLMs' Grasp of Complex Dynamic Systems
While Multimodal Large Language Models (MLLMs) excel on general visual tasks, their capacity to comprehend complex dynamic systems remains a critical open question. Such systems, governed by physical laws, explicit rules, and multi-agent interactions, form the fabric of the real world. To facilitate a systematic diagnosis of current MLLM limitations, we introduce BasketVision, a new benchmark that leverages professional basketball as a microcosm for these dynamic environments. BasketVision probes model capabilities across seven dimensions (spanning perception, reasoning, and prediction) through 6,000 curated, bilingual questions from professional game data. An automated data generation pipeline underpins the benchmark, ensuring both scalability and fine-grained precision. Our evaluation of 23 leading models reveals a chasm between machine and human cognition: human experts attain 96.34% accuracy, while the premier model, GPT-4o, achieves only 63.15%. The analysis pinpoints spatial reasoning as a persistent bottleneck and uncovers specific patterns of task specialization. BasketVision thus serves as a crucial apparatus for charting the frontiers of MLLMs and steering future work toward more robust reasoning in dynamic visual worlds.
-
2510.0090
A Fuzzy-based Approach to Predict Human Interaction by Functional Near-Infrared Spectroscopy
In this article, we introduce the Fuzzy logic-based attention (Fuzzy Attention Layer) mechanism, a novel computational approach designed to enhance the interpretability and efficacy of neural models in psychological research. The fuzzy attention layer is integrated into a transformer encoder model to analyze complex psychological phenomena from neural signals captured by functional near-infrared spectroscopy (fNIRS). By leveraging fuzzy logic, the fuzzy attention layer learns and identifies interpretable patterns of neural activity. This addresses a significant challenge in using transformers: the lack of transparency in determining which specific brain activities contribute most to particular predictions. Our experimental results, obtained from fNIRS data recorded during social interactions involving handholding, reveal that the fuzzy attention layer not only learns interpretable patterns of neural activity but also enhances model performance. In addition, these patterns provide deeper insights into the neural correlates of interpersonal touch and emotional exchange. The application of our model shows promising potential for understanding the complex aspects of human social behavior and for verifying psychological theories with machine learning algorithms, thereby contributing significantly to the fields of social neuroscience and AI. This version is based on the work published in IEEE TFS (2025).
-
2510.0091
FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness
Recent advances in Large Language Models (LLMs) have enabled their application to recommender systems (RecLLMs), yet concerns remain regarding fairness across demographic and psychological user dimensions. We introduce FairEval, a novel evaluation framework to systematically assess fairness in LLM-based recommendations. Unlike prior benchmarks that focus solely on demographic attributes, FairEval uniquely integrates personality profiles with eight sensitive demographic attributes, including gender, race, and age, enabling a comprehensive and nuanced assessment of user-level bias. We evaluate state-of-the-art models, including ChatGPT 4o and Gemini 1.5 Flash, on music and movie recommendation tasks using structured prompts. FairEval's personality-aware fairness metric, PAFS@25, achieves high consistency scores up to 0.9969 for ChatGPT 4o and 0.9997 for Gemini 1.5 Flash, underscoring its robustness in equitable recommendations across diverse user profiles, while also uncovering fairness gaps, with SNSR disparities reaching up to 34.79%. Our results also reveal disparities in recommendation consistency across user identities and prompt formulations, including typographical and multilingual variations. By unifying psychographic and demographic evaluation in RecLLMs, FairEval offers a robust and reproducible benchmark for inclusive and bias-aware LLM evaluation.
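The abstract does not define how PAFS@25 is computed. As a purely hypothetical illustration of what a consistency-style score over top-k recommendation lists could look like, the sketch below averages pairwise Jaccard overlap across the lists a model returns for prompt variants of the same user; the function names and the Jaccard choice are assumptions, not FairEval's actual metric.

```python
def jaccard(a, b):
    """Set overlap of two recommendation lists: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def consistency_at_k(lists, k=25):
    """Hypothetical consistency score: mean pairwise Jaccard overlap of
    the top-k lists returned for prompt variants of the same user.
    1.0 means the recommendations are identical across variants."""
    tops = [lst[:k] for lst in lists]
    pairs = [(i, j) for i in range(len(tops)) for j in range(i + 1, len(tops))]
    return sum(jaccard(tops[i], tops[j]) for i, j in pairs) / len(pairs)

# Three prompt variants returning nearly identical top lists:
runs = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["a", "b", "c", "d"]]
score = consistency_at_k(runs, k=4)
```

A score near 1.0 would indicate that typographical or multilingual prompt variations barely perturb the recommendations, which is the kind of robustness the abstract reports.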
-
2511.0001
PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
Evaluating the scientific discovery capabilities of large language model based agents, particularly how they cope with varying environmental complexity and utilize prior knowledge, requires specialized benchmarks currently lacking in the landscape. To address this gap, we introduce \textsc{PhysGym}, a novel benchmark suite and simulation platform for rigorously assessing LLM-based scientific reasoning in interactive physics environments. \textsc{PhysGym}'s primary contribution lies in its sophisticated control over the level of prior knowledge provided to the agent. This allows researchers to dissect agent performance along axes including the complexity of the problem and the prior knowledge levels. The benchmark comprises a suite of interactive simulations, where agents must actively probe environments, gather data sequentially under constraints and formulate hypotheses about underlying physical laws. \textsc{PhysGym} provides standardized evaluation protocols and metrics for assessing hypothesis accuracy and model fidelity. We demonstrate the benchmark's utility by presenting results from baseline LLMs, showcasing its ability to differentiate capabilities based on varying priors and task complexity.
-
2511.0002
Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation
Parameterizing high-fidelity ``digital twins'' of batteries is a critical yet challenging inverse problem that hinders the pace of battery innovation. Prevailing methods formulate this as a black-box optimization (BBO) task, employing algorithms that are sample-inefficient and blind to the underlying physics. In this work, we introduce a new paradigm that reframes the inverse problem as a reasoning task, and present \textsc{Battery-Sim-Agent}, the first framework to deploy a Large Language Model (LLM) agent in a closed loop with a high-fidelity battery simulator. The agent mimics a human scientist's workflow: it interprets rich, multi-modal feedback from the simulator, forms physically-grounded hypotheses to explain discrepancies, and proposes structured parameter updates. On a systematically constructed benchmark suite spanning diverse battery chemistries, operating conditions, and difficulty levels, our agent significantly outperforms strong BBO baselines like Bayesian optimization in identifying accurate parameters. We further demonstrate the framework's capability in complex long-horizon degradation fitting tasks and validate its practical applicability on real-world battery datasets. Our results highlight the promise of LLM agents as reasoning-based optimizers for scientific discovery and battery parameter estimation.
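The closed loop described above (propose parameters, run the simulator, interpret the discrepancy, update) can be sketched with a toy linear "simulator" and a simple offset/trend heuristic standing in for the LLM's reasoning. Everything here (the discharge model, the heuristic, the step sizes) is an illustrative assumption, not the paper's method.

```python
def simulate_voltage(capacity, resistance, current=1.0):
    """Toy stand-in for a battery simulator: five voltage samples
    along a linear discharge curve (purely illustrative physics)."""
    return [capacity - resistance * current * t for t in range(5)]

def fit_parameters(observed, steps=500):
    """Closed calibration loop: propose parameters, simulate, interpret
    the discrepancy, update. The paper's agent is an LLM; here a plain
    heuristic stands in for its hypothesis-forming step."""
    cap, res = 1.0, 0.0
    for _ in range(steps):
        errs = [s - o for s, o in zip(simulate_voltage(cap, res), observed)]
        # "Hypothesis": a uniform offset implicates capacity, while an
        # error that grows with time implicates resistance.
        offset = sum(errs) / len(errs)
        trend = sum(t * e for t, e in enumerate(errs))
        cap -= 0.05 * offset
        res += 0.005 * trend
    return cap, res

# Recover the parameters behind a synthetic "measured" curve.
cap, res = fit_parameters(simulate_voltage(2.0, 0.1))
```

The point of the sketch is the loop structure: the update rule inspects *why* the simulation deviates, not just *how much*, which is what distinguishes the reasoning paradigm from black-box optimization.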
-
2511.0003
AI Empowered Thermal Management Materials Design
The development of high-performance thermal management materials holds significant importance in fields such as chips, data centers, and batteries. Materials informatics, which integrates big data and artificial intelligence, is emerging as the fourth paradigm for materials research. Over the past few years, our team has undertaken preliminary explorations in the development of advanced thermal management materials empowered by big data and artificial intelligence. In this work, we introduce three successful materials informatics applications in thermal management materials design: the construction of machine learning interatomic potentials for thermal property calculations, the discovery and generative design of high-thermal-conductivity materials, and the intelligent design of micro/nano structures for thermal transport. These cases demonstrate the clear advantages of designing thermal management materials via materials informatics.
-
2511.0004
Vision Transformers for Semiconductor Defect Detection: A Comprehensive Survey of AI-Driven Image Segmentation from CNNs to Foundation Models (2015-2025)
-
2511.0005
Multi-Agent Adaptive Variance Reduction Technique for Decentralized Nonsmooth Nonconvex Stochastic Optimization
Decentralized stochastic optimization with nonsmooth objectives and only zeroth-order oracle access arises in federated learning and privacy-sensitive applications, yet existing methods suffer from high variance and dimension-dependent complexity. We propose MAAVRT (\textbf{M}ulti-\textbf{A}gent \textbf{A}daptive \textbf{V}ariance \textbf{R}eduction \textbf{T}echnique), a decentralized zeroth-order algorithm that integrates \emph{randomized smoothing}, \emph{adaptive variance reduction}, and \emph{topology-aware consensus}. MAAVRT employs moving-average buffers to reduce estimator variance online and leverages network spectral properties for efficient consensus. Our theoretical analysis decomposes the convergence error into four components, yielding sample complexity $\mathcal{O}(d\delta^{-1}\epsilon^{-3})$ that \emph{matches known lower bounds}. Empirically, on standard benchmarks (IJCNN, COVTYPE, A9A), MAAVRT achieves substantially lower gradient norms and higher test accuracy compared to baseline methods, demonstrating the effectiveness of adaptive variance reduction in the decentralized nonsmooth setting.
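Two ingredients named above, randomized smoothing and a moving-average variance-reduction buffer, can be illustrated with a minimal single-agent zeroth-order descent loop. This is a generic sketch of the standard two-point smoothing estimator, not the MAAVRT algorithm itself: the decentralized consensus step is omitted and all names and constants are assumptions.

```python
import math
import random

def two_point_grad(f, x, delta, rng):
    """Two-point zeroth-order estimate of the gradient of the smoothed
    surrogate f_delta(x) = E_u[f(x + delta*u)], u uniform on the sphere."""
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]                     # unit direction
    xp = [xi + delta * ui for xi, ui in zip(x, u)]
    xm = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(xp) - f(xm)) / (2.0 * delta)     # finite difference
    return [scale * ui for ui in u]

def zo_descent(f, x0, steps=500, lr=0.05, delta=1e-3, beta=0.9, seed=0):
    """Gradient-free minimization; the moving-average buffer damps the
    estimator's variance online, as MAAVRT's buffers are described to do."""
    rng = random.Random(seed)
    x = list(x0)
    buf = [0.0] * len(x)
    for _ in range(steps):
        g = two_point_grad(f, x, delta, rng)
        buf = [beta * b + (1 - beta) * gi for b, gi in zip(buf, g)]
        x = [xi - lr * bi for xi, bi in zip(x, buf)]
    return x

# Minimize a smooth test objective using only function evaluations.
x_star = zo_descent(lambda x: sum(xi * xi for xi in x), [1.0, -2.0, 0.5])
```

With only two function evaluations per step, the iterate still contracts toward the minimizer; the buffer trades a little bias for a large reduction in the per-step variance of the estimate.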
-
2511.0007
Enhancing Small Language Models with Gradient Noise Injection
Training small language models is challenging due to their limited capacity to capture complex patterns and their susceptibility to overfitting. To address these issues, we investigate gradient noise injection as a regularization strategy, building on prior work while introducing a noise schedule that decays exponentially over training. Unlike existing techniques, our method explicitly controls the trade-off between exploration and stability during optimization. We compare the exponential decay schedule with linear and adaptive variants, demonstrating empirically that the exponential schedule yields superior convergence and generalization. Extensive experiments on diverse text corpora, including shakespeare\_char, enwik8, text8, and larger benchmark datasets, show consistent improvements in training dynamics, validation loss, and final performance. We report error bars and statistical significance tests to ensure robustness of the results. Detailed implementation information, including model architectures, hyperparameter settings, dataset sizes, and optimization strategies, is provided to support reproducibility, and we release our code and trained models publicly. Furthermore, we compare gradient noise injection with other regularization methods such as dropout, weight decay, and data augmentation, both in isolation and in combination, revealing complementary effects on training stability and generalization. Finally, we analyze the computational cost of gradient noise injection relative to these baselines, highlighting its practical efficiency in resource-constrained environments. Together, these contributions position gradient noise injection as a theoretically grounded, empirically validated, and computationally practical method for improving the robustness of small language models.
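The mechanism described above, Gaussian noise added to gradients under an exponentially decaying schedule, can be sketched in a few lines on a toy quadratic; the specific sigma0 and decay values are illustrative assumptions, not the paper's hyperparameters.

```python
import math
import random

def noise_scale(step, sigma0=0.1, decay=0.01):
    """Exponential schedule: large noise early (exploration),
    vanishing noise late (stable convergence)."""
    return sigma0 * math.exp(-decay * step)

def sgd_with_noise(grad_fn, w0, steps=2000, lr=0.1, seed=0):
    """Gradient descent with isotropic Gaussian noise injected into
    each gradient, scaled by the decaying schedule above."""
    rng = random.Random(seed)
    w = list(w0)
    for t in range(steps):
        sigma = noise_scale(t)
        g = grad_fn(w)
        w = [wi - lr * (gi + rng.gauss(0.0, sigma)) for wi, gi in zip(w, g)]
    return w

# Quadratic toy objective with minimum at (3, -1).
grad = lambda w: [2 * (w[0] - 3.0), 2 * (w[1] + 1.0)]
w = sgd_with_noise(grad, [0.0, 0.0])
```

Because the noise decays, the late iterates behave like plain gradient descent and settle at the minimizer, which is the exploration-versus-stability trade-off the schedule is meant to control.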
-
2511.0008
A Self-Driving Laboratory for Materials Science: An Autonomous Research Agent for Deep Data Analysis and Interpretation
As artificial intelligence increasingly permeates scientific research, the "AI for Science" paradigm is evolving to enable more autonomous scientific workflows. Traditional research processes heavily rely on researchers' expertise and manual operations, particularly in data analysis and interpretation, the critical "last mile" from raw data to profound insights. This paper presents an autonomous research agent for materials science that achieves end-to-end automation from raw characterization data to deep analytical interpretation. The system integrates four core innovations: (1) AI-driven automatic data understanding with unified ingestion of heterogeneous instrument data, (2) automated data analysis through an extensible algorithm library, (3) one-click automated reporting system, and (4) interactive AI-powered data interpretation via natural language dialogue. We demonstrate the agent's capabilities through real-world case studies across multiple characterization techniques (Raman, UPS, UV-Vis, TG), achieving remarkable performance: UV-Vis bandgap analysis is accelerated by 600× compared to manual processing, while maintaining exceptional accuracy with fitting precision R² ≥ 0.999. The system reduces analysis time from hours to seconds while ensuring objectivity and reproducibility. By automating the data analysis pipeline while preserving human oversight and interpretability, this work contributes a practical component toward building more integrated and efficient scientific discovery systems in materials research.
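The abstract does not state which procedure its UV-Vis bandgap analysis uses; a common textbook route is a Tauc plot, where one fits the linear region of (αhν)^n against photon energy and extrapolates the line to zero. The sketch below assumes a direct-gap exponent n = 2 and runs on synthetic data; the function names are illustrative, not the system's API.

```python
def linear_fit(xs, ys):
    """Ordinary least squares y = a*x + b; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

def tauc_bandgap(energies_eV, alpha, exponent=2.0):
    """Tauc-plot bandgap estimate: fit (alpha * h*nu)**exponent against
    photon energy; the x-intercept -b/a approximates the gap E_g."""
    ys = [(a * e) ** exponent for a, e in zip(alpha, energies_eV)]
    slope, intercept, r2 = linear_fit(energies_eV, ys)
    return -intercept / slope, r2

# Synthetic direct-gap absorber with E_g = 2.1 eV: above the gap,
# alpha ~ sqrt(E - E_g) / E, so (alpha*E)**2 is linear in E.
E = [2.2 + 0.05 * i for i in range(17)]          # 2.2 .. 3.0 eV
alpha = [((e - 2.1) ** 0.5) / e for e in E]
eg, r2 = tauc_bandgap(E, alpha)
```

On real spectra the linear region must first be selected, which is precisely the step such an agent automates; the r² value returned here corresponds to the kind of fitting-precision figure the abstract reports.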
-
2511.0009
A Pilot Study Evaluating Large Language Models as Reviewers at Academic Conferences
This paper presents a new system for academic peer review that is more objective, efficient, and community-guided. Our system incorporates author-assisted evaluation (Author-AAE) and community-guided review (CGR) into the peer review of AI conferences. This is in contrast to existing approaches that prioritize alternative systems that only address some of these challenges. Our evaluation uses data from three major AI conferences that used our system and from a survey of reviewers. Their feedback indicates that our system's reviews are superior to single-LLM-based reviews due to their reduced subjectivity and enhanced quality. The reviewers' scores for our system's reviews were significantly higher than for single-LLM-based reviews across multiple metrics: "Reproducibility and Quality" (by 0.427 ± 0.007), "Review Quality" (by 0.265 ± 0.09), and "Alignment between opinion and paper score" (by 0.503 ± 0.090). In addition, we discovered that single-LLM-based reviews are more likely to be rejected by the program committee after author major revisions (on average by 0.182 ± 0.103) and are much more likely to be rejected overall (on average by 0.300 ± 0.124), compared to our system's reviews. These results suggest that our system performs better in reducing the arbitrary nature of the current peer review system and can serve as an inspiration for the scientific community to explore new review systems.
-
2511.0010
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery and AI Scientists
Artificial intelligence (AI) is reshaping scientific discovery, evolving from specialized computational tools into autonomous research partners. We position \textit{\textbf{Agentic Science}} as a pivotal stage within the broader \textit{\textbf{AI for Science}} paradigm, where AI systems progress from partial assistance to full scientific agency. Enabled by large language models (LLMs), multimodal systems, and integrated research platforms, agentic AI exhibits capabilities in hypothesis generation, experimental design, execution, analysis, and iterative refinement, behaviors once regarded as uniquely human. This survey offers a \textbf{domain-oriented review} of autonomous scientific discovery across life sciences, chemistry, materials, and physics, synthesizing research progress and advances within each discipline. We unify three previously fragmented perspectives (process-oriented, autonomy-oriented, and mechanism-oriented) through \textbf{a comprehensive framework} that connects foundational capabilities, core processes, and domain-specific realizations. Building on this framework, we (i) trace the evolution of AI for Science, (ii) identify five core capabilities underpinning scientific agency, (iii) model discovery as a dynamic four-stage workflow, (iv) review applications across life sciences, chemistry, materials science, and physics, and (v) synthesize key challenges and future opportunities. This work establishes a domain-oriented synthesis of autonomous scientific discovery and positions Agentic Science as a structured paradigm for advancing AI-driven research.
-
2511.0011
From Virtual Cells to Programmable Humans: Advancing Digital Biology Through Hybrid AI Systems
Recent advances in artificial intelligence (AI), high-performance computing, and systems biology have accelerated the development of AI-powered virtual biological systems, from virtual cells to multiscale organ models and programmable virtual humans. These systems promise transformative applications in drug discovery, precision medicine, and in silico clinical trials. This review provides a critical synthesis of current progress, key technologies, and future directions across this spectrum. We explore hybrid modeling strategies that combine mechanistic models, such as ordinary and partial differential equations, with deep learning methods including convolutional, recurrent, and graph neural networks. We emphasize the importance of robust uncertainty quantification, simulation validation, and multiscale integration across molecular, cellular, organ-level, and systemic processes. A core contribution is the introduction of the SIM-CARD framework, a standardized simulation accountability protocol to document data provenance, modeling assumptions, performance metrics, and regulatory alignment. We propose a three-phase translational roadmap: (1) validated AI-augmented virtual cells and organs (by 2030), (2) interoperable multi-organ physiological systems (by 2040), and (3) programmable full-body virtual humans supporting personalized simulations and regulatory use cases (by 2055). We identify key enablers, including high-fidelity multiscale data, computational scalability, and simulation governance, as well as bottlenecks such as algorithmic bias, explainability, and regulatory uncertainty. Finally, we call for collaborative efforts to establish minimal benchmarking suites, FAIR-compliant simulation metadata, and cross-institutional federated learning infrastructure. This review aims to guide the scientific, regulatory, and clinical communities in navigating the complex yet promising trajectory toward clinically actionable programmable human simulations.
-
2511.0012
Physics-Informed Neural Networks and Neural Operators for Parametric PDEs: Methods, Applications and Future Directions
Partial differential equations (PDEs) arise ubiquitously in science and engineering, where solutions depend on parameters representing physical properties, boundary conditions, or geometric configurations. Traditional numerical methods require solving the PDE anew for each parameter value, making parameter space exploration prohibitively expensive for high-dimensional problems. Recent advances in machine learning, particularly physics-informed neural networks (PINNs) and neural operators, have revolutionized parametric PDE solving by learning solution operators that generalize across parameter spaces. We critically analyze two main paradigms: (1) PINNs, which embed physical laws as soft constraints and excel at inverse problems with sparse data, and (2) neural operators (including DeepONet, Fourier Neural Operator, and their variants), which learn mappings between infinite-dimensional function spaces and achieve unprecedented parameter space generalization. Through detailed comparisons across fluid dynamics, solid mechanics, heat transfer, and electromagnetics, we show that neural operators can achieve computational speedups ranging from 10^3 to 10^5 times faster than traditional solvers for multi-query scenarios, while maintaining comparable accuracy. We provide practical guidance for method selection, discuss theoretical foundations including universal approximation and convergence guarantees, and identify critical open challenges including high-dimensional parameter spaces, complex geometries, and out-of-distribution generalization. This work establishes a unified framework for understanding parametric PDE solvers through the lens of operator learning, offering a comprehensive resource, which we intend to update incrementally, for this rapidly evolving field.
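The PINN idea of embedding physical laws as soft constraints can be made concrete on a toy problem: fit a polynomial ansatz to the ODE u'(x) = -u(x) with u(0) = 1 by minimizing a mean squared residual over collocation points plus a boundary penalty. This stdlib-only sketch uses finite-difference gradients in place of autograd; the ansatz, learning rate, and collocation points are all illustrative assumptions.

```python
def model(theta, x):
    """Polynomial ansatz u(x; theta) = sum_k theta[k] * x**k."""
    return sum(t * x ** k for k, t in enumerate(theta))

def model_dx(theta, x):
    """Analytic derivative of the ansatz (no autograd needed)."""
    return sum(k * t * x ** (k - 1) for k, t in enumerate(theta) if k > 0)

def pinn_loss(theta, xs):
    """Physics-informed loss for u' = -u, u(0) = 1: mean squared ODE
    residual at the collocation points plus a boundary penalty."""
    res = sum((model_dx(theta, x) + model(theta, x)) ** 2 for x in xs) / len(xs)
    return res + (model(theta, 0.0) - 1.0) ** 2

def train(theta, xs, steps=3000, lr=0.05, h=1e-6):
    """Plain gradient descent, with central-difference gradients on theta
    standing in for autograd (the loss is cheap to evaluate)."""
    theta = list(theta)
    for _ in range(steps):
        grad = []
        for k in range(len(theta)):
            tp, tm = list(theta), list(theta)
            tp[k] += h
            tm[k] -= h
            grad.append((pinn_loss(tp, xs) - pinn_loss(tm, xs)) / (2 * h))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

xs = [i / 10 for i in range(11)]      # collocation points on [0, 1]
theta = train([0.0] * 5, xs)          # degree-4 polynomial ansatz
loss = pinn_loss(theta, xs)           # exact solution is u(x) = exp(-x)
```

No solution data appears anywhere in the loss; only the residual of the governing equation and the boundary condition drive the fit, which is exactly the "physics as soft constraints" mechanism the review contrasts with data-driven neural operators.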
-
2511.0014
Artificial Intelligence in Biomedical Research: From Data Integration to Precision Medicine
This comprehensive review examines the transformative role of artificial intelligence in biomedical research, from foundational data integration to clinical applications. The paper explores how AI techniques facilitate multimodal data fusion across diverse biological data types, employing both traditional statistical methods and advanced deep learning architectures including variational autoencoders, graph neural networks, and transformer models. It evaluates AI applications in medical imaging, where convolutional neural networks have achieved remarkable diagnostic accuracy (up to 94\% in COVID-19 detection) while enhancing segmentation and classification tasks across multiple imaging modalities. The review further investigates generative AI's impact on molecular design and drug discovery, highlighting transformer-based architectures like TransAntivirus that navigate vast chemical spaces to optimize therapeutic candidates. Finally, it examines AI-enabled precision medicine applications, including Clinical Decision Support Systems and federated learning approaches that balance analytical power with privacy preservation. Despite significant progress, implementation challenges persist, including data heterogeneity, model explainability, and ethical concerns regarding bias and privacy. The paper underscores the importance of developing interpretable AI systems that integrate seamlessly into clinical workflows while addressing regulatory, ethical, and economic considerations to realize the full potential of AI in advancing biomedical research and healthcare delivery.