ICAIS 2025
Full name: The 1st International Conference on AI Scientist
-
2510.0085
AI Mathematician as a Partner in Advancing Mathematical Discovery
Artificial intelligence (AI) has demonstrated impressive progress in mathematical reasoning, yet its integration into the practice of mathematical research remains limited. In this study, we investigate how the AI Mathematician (AIM) system can operate as a research partner rather than a mere problem solver. Focusing on a challenging problem in homogenization theory, we analyze the autonomous reasoning trajectories of AIM and incorporate targeted human interventions to structure the discovery process. Through iterative decomposition of the problem into tractable subgoals, selection of appropriate analytical methods, and validation of intermediate results, we reveal how human intuition and machine computation can complement one another. This collaborative paradigm enhances the reliability, transparency, and interpretability of the resulting proofs, while retaining human oversight for formal rigor and correctness. The approach leads to a complete and verifiable proof, and more broadly, demonstrates how systematic human-AI co-reasoning can advance the frontier of mathematical discovery.
-
2510.0086
EndoNet: Content-Aware Linear Attention for Endoscopic Video Super-Resolution
Endoscopic video super-resolution (EVSR) seeks to reconstruct high-resolution frames from low-resolution endoscopic video, a task critical for enhancing clinical visualization of fine anatomical details. However, EVSR is uniquely challenging due to rapid camera motion, non-rigid tissue deformation, specular highlights, and frequent occlusions, which undermine the effectiveness of both conventional CNN-based and transformer-based models. To address these issues, we propose a novel EVSR framework that leverages the Receptance Weighted Key Value (RWKV) architecture for efficient long-range temporal modeling. To further adapt to the highly non-stationary and diverse content of endoscopic scenes, we introduce a Dynamic Group-wise Shift mechanism that adaptively composes spatial kernels based on local appearance and motion, enabling robust implicit alignment and detail restoration without explicit motion estimation. Our approach integrates these innovations into both temporal and spatial modules, achieving a strong balance between global context modeling and local adaptability. Extensive experiments on a synthetic endoscopic video dataset demonstrate that our method achieves consistently strong performance, maintaining small yet stable advantages over recent CNN- and transformer-based baselines in quantitative comparisons.
-
2510.0088
MatEvolve: A Synergistic Symbolic-LLM Agent for Multi-Objective Materials Design
Materials define the eras of human civilization, yet the design of novel materials is fundamentally constrained by the immense chemical space, which renders traditional enumeration-screening methodology computationally prohibitive and inefficient. This paper introduces a paradigm shift towards insight-exploration-validation, enabling an intelligent and evolutionary exploration of material design pathways. To actualize this paradigm, we propose MatEvolve, a synergistic symbolic-LLM agent that reconceptualizes material design as a closed-loop, programmatic evolution task. Central to MatEvolve is a novel symbolic formalism, Material Edit Language, which empowers the agent to apply chemical operations programmatically. The exploration trajectory is directed by a multifaceted guidance strategy, comprising a dynamic knowledge injection mechanism and a two-stage exploration strategy that balances broad exploration and deep optimization. Furthermore, a multi-objective fitness landscape ensures directional and efficient navigational guidance. These integrated strategies contribute to a 32.2% improvement over direct material structure modification. Crucially, comparisons demonstrate that our insight-exploration-validation paradigm outperforms the traditional enumeration-screening approach by 33.6%, highlighting its superior efficacy in navigating vast design spaces.
-
2510.0089
BasketVision: Benchmarking MLLMs' Grasp of Complex Dynamic Systems
While Multimodal Large Language Models (MLLMs) excel on general visual tasks, their capacity to comprehend complex dynamic systems remains a critical open question. Such systems, governed by physical laws, explicit rules, and multi-agent interactions, form the fabric of the real world. To facilitate a systematic diagnosis of current MLLM limitations, we introduce BasketVision, a new benchmark that leverages professional basketball as a microcosm for these dynamic environments. BasketVision probes model capabilities across seven dimensions (spanning perception, reasoning, and prediction) through 6,000 curated, bilingual questions from professional game data. An automated data generation pipeline underpins the benchmark, ensuring both scalability and fine-grained precision. Our evaluation of 23 leading models reveals a chasm between machine and human cognition: human experts attain 96.34% accuracy, while the premier model, GPT-4o, achieves only 63.15%. The analysis pinpoints spatial reasoning as a persistent bottleneck and uncovers specific patterns of task specialization. BasketVision thus serves as a crucial apparatus for charting the frontiers of MLLMs and steering future work toward more robust reasoning in dynamic visual worlds.
-
2510.0090
A Fuzzy-based Approach to Predict Human Interaction by Functional Near-Infrared Spectroscopy
In this article, we introduce the Fuzzy logic-based attention (Fuzzy Attention Layer) mechanism, a novel computational approach designed to enhance the interpretability and efficacy of neural models in psychological research. The fuzzy attention layer is integrated into a transformer encoder model to analyze complex psychological phenomena from neural signals captured by functional near-infrared spectroscopy (fNIRS). By leveraging fuzzy logic, the fuzzy attention layer learns and identifies interpretable patterns of neural activity. This addresses a significant challenge in using transformers: the lack of transparency in determining which specific brain activities contribute most to particular predictions. Our experimental results, obtained from fNIRS data recorded during social interactions involving handholding, reveal that the fuzzy attention layer not only learns interpretable patterns of neural activity but also enhances model performance. In addition, these patterns provide deeper insights into the neural correlates of interpersonal touch and emotional exchange. The application of our model shows promising potential for understanding the complex aspects of human social behavior and for verifying psychological theories with machine learning algorithms, thereby contributing significantly to the fields of social neuroscience and AI. This version is based on the work published in IEEE TFS (2025).
-
2510.0091
FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness
Recent advances in Large Language Models (LLMs) have enabled their application to recommender systems (RecLLMs), yet concerns remain regarding fairness across demographic and psychological user dimensions. We introduce FairEval, a novel evaluation framework to systematically assess fairness in LLM-based recommendations. Unlike prior benchmarks that focus solely on demographic attributes, FairEval uniquely integrates personality profiles with eight sensitive demographic attributes, including gender, race, and age, enabling a comprehensive and nuanced assessment of user-level bias. We evaluate state-of-the-art models, including ChatGPT 4o and Gemini 1.5 Flash, on music and movie recommendation tasks using structured prompts. FairEval's personality-aware fairness metric, PAFS@25, achieves high consistency scores up to 0.9969 for ChatGPT 4o and 0.9997 for Gemini 1.5 Flash, underscoring its robustness in equitable recommendations across diverse user profiles, while also uncovering fairness gaps, with SNSR disparities reaching up to 34.79%. Our results also reveal disparities in recommendation consistency across user identities and prompt formulations, including typographical and multilingual variations. By unifying psychographic and demographic evaluation in RecLLMs, FairEval offers a robust and reproducible benchmark for inclusive and bias-aware LLM evaluation.
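The abstract does not define how PAFS@25 is computed. As a purely hypothetical illustration of what a consistency-style score over top-k recommendation lists could look like, the sketch below averages pairwise Jaccard overlap across the lists a model returns for prompt variants of the same user; the function names and the Jaccard choice are assumptions, not FairEval's actual metric.

```python
def jaccard(a, b):
    """Set overlap of two recommendation lists: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def consistency_at_k(lists, k=25):
    """Hypothetical consistency score: mean pairwise Jaccard overlap of
    the top-k lists returned for prompt variants of the same user.
    1.0 means the recommendations are identical across variants."""
    tops = [lst[:k] for lst in lists]
    pairs = [(i, j) for i in range(len(tops)) for j in range(i + 1, len(tops))]
    return sum(jaccard(tops[i], tops[j]) for i, j in pairs) / len(pairs)

# Three prompt variants returning nearly identical top lists:
runs = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["a", "b", "c", "d"]]
score = consistency_at_k(runs, k=4)
```

A score near 1.0 would indicate that typographical or multilingual prompt variations barely perturb the recommendations, which is the kind of robustness the abstract reports.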
-
2511.0001
PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
Evaluating the scientific discovery capabilities of large language model based agents, particularly how they cope with varying environmental complexity and utilize prior knowledge, requires specialized benchmarks currently lacking in the landscape. To address this gap, we introduce \textsc{PhysGym}, a novel benchmark suite and simulation platform for rigorously assessing LLM-based scientific reasoning in interactive physics environments. \textsc{PhysGym}'s primary contribution lies in its sophisticated control over the level of prior knowledge provided to the agent. This allows researchers to dissect agent performance along axes including the complexity of the problem and the prior knowledge levels. The benchmark comprises a suite of interactive simulations, where agents must actively probe environments, gather data sequentially under constraints and formulate hypotheses about underlying physical laws. \textsc{PhysGym} provides standardized evaluation protocols and metrics for assessing hypothesis accuracy and model fidelity. We demonstrate the benchmark's utility by presenting results from baseline LLMs, showcasing its ability to differentiate capabilities based on varying priors and task complexity.
-
2511.0002
Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation
Parameterizing high-fidelity ``digital twins'' of batteries is a critical yet challenging inverse problem that hinders the pace of battery innovation. Prevailing methods formulate this as a black-box optimization (BBO) task, employing algorithms that are sample-inefficient and blind to the underlying physics. In this work, we introduce a new paradigm that reframes the inverse problem as a reasoning task, and present \textsc{Battery-Sim-Agent}, the first framework to deploy a Large Language Model (LLM) agent in a closed loop with a high-fidelity battery simulator. The agent mimics a human scientist's workflow: it interprets rich, multi-modal feedback from the simulator, forms physically-grounded hypotheses to explain discrepancies, and proposes structured parameter updates. On a systematically constructed benchmark suite spanning diverse battery chemistries, operating conditions, and difficulty levels, our agent significantly outperforms strong BBO baselines like Bayesian optimization in identifying accurate parameters. We further demonstrate the framework's capability in complex long-horizon degradation fitting tasks and validate its practical applicability on real-world battery datasets. Our results highlight the promise of LLM agents as reasoning-based optimizers for scientific discovery and battery parameter estimation.
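The closed loop described above (propose parameters, run the simulator, interpret the discrepancy, update) can be sketched with a toy linear "simulator" and a simple offset/trend heuristic standing in for the LLM's reasoning. Everything here (the discharge model, the heuristic, the step sizes) is an illustrative assumption, not the paper's method.

```python
def simulate_voltage(capacity, resistance, current=1.0):
    """Toy stand-in for a battery simulator: five voltage samples
    along a linear discharge curve (purely illustrative physics)."""
    return [capacity - resistance * current * t for t in range(5)]

def fit_parameters(observed, steps=500):
    """Closed calibration loop: propose parameters, simulate, interpret
    the discrepancy, update. The paper's agent is an LLM; here a plain
    heuristic stands in for its hypothesis-forming step."""
    cap, res = 1.0, 0.0
    for _ in range(steps):
        errs = [s - o for s, o in zip(simulate_voltage(cap, res), observed)]
        # "Hypothesis": a uniform offset implicates capacity, while an
        # error that grows with time implicates resistance.
        offset = sum(errs) / len(errs)
        trend = sum(t * e for t, e in enumerate(errs))
        cap -= 0.05 * offset
        res += 0.005 * trend
    return cap, res

# Recover the parameters behind a synthetic "measured" curve.
cap, res = fit_parameters(simulate_voltage(2.0, 0.1))
```

The point of the sketch is the loop structure: the update rule inspects *why* the simulation deviates, not just *how much*, which is what distinguishes the reasoning paradigm from black-box optimization.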
-
2511.0003
AI Empowered Thermal Management Materials Design
The development of high-performance thermal management materials holds significant importance in fields such as chips, data centers, and batteries. Materials informatics, which integrates big data and artificial intelligence, is emerging as the fourth paradigm for materials research. Over the past few years, our team has undertaken preliminary explorations in the development of advanced thermal management materials empowered by big data and artificial intelligence. In this work, we introduce three successful materials informatics applications in thermal management materials design: the construction of machine learning interatomic potentials for thermal property calculations, the discovery and generative design of high-thermal-conductivity materials, and the intelligent design of micro/nano structures for thermal transport. These cases demonstrate the clear advantages of designing thermal management materials via materials informatics.
-
2511.0004
Vision Transformers for Semiconductor Defect Detection: A Comprehensive Survey of AI-Driven Image Segmentation from CNNs to Foundation Models (2015-2025)
-
2511.0005
Multi-Agent Adaptive Variance Reduction Technique for Decentralized Nonsmooth Nonconvex Stochastic Optimization
Decentralized stochastic optimization with nonsmooth objectives and only zeroth-order oracle access arises in federated learning and privacy-sensitive applications, yet existing methods suffer from high variance and dimension-dependent complexity. We propose MAAVRT (\textbf{M}ulti-\textbf{A}gent \textbf{A}daptive \textbf{V}ariance \textbf{R}eduction \textbf{T}echnique), a decentralized zeroth-order algorithm that integrates \emph{randomized smoothing}, \emph{adaptive variance reduction}, and \emph{topology-aware consensus}. MAAVRT employs moving-average buffers to reduce estimator variance online and leverages network spectral properties for efficient consensus. Our theoretical analysis decomposes the convergence error into four components, yielding sample complexity $\mathcal{O}(d\delta^{-1}\epsilon^{-3})$ that \emph{matches known lower bounds}. Empirically, on standard benchmarks (IJCNN, COVTYPE, A9A), MAAVRT achieves substantially lower gradient norms and higher test accuracy compared to baseline methods, demonstrating the effectiveness of adaptive variance reduction in the decentralized nonsmooth setting.
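Two ingredients named above, randomized smoothing and a moving-average variance-reduction buffer, can be illustrated with a minimal single-agent zeroth-order descent loop. This is a generic sketch of the standard two-point smoothing estimator, not the MAAVRT algorithm itself: the decentralized consensus step is omitted and all names and constants are assumptions.

```python
import math
import random

def two_point_grad(f, x, delta, rng):
    """Two-point zeroth-order estimate of the gradient of the smoothed
    surrogate f_delta(x) = E_u[f(x + delta*u)], u uniform on the sphere."""
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]                     # unit direction
    xp = [xi + delta * ui for xi, ui in zip(x, u)]
    xm = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(xp) - f(xm)) / (2.0 * delta)     # finite difference
    return [scale * ui for ui in u]

def zo_descent(f, x0, steps=500, lr=0.05, delta=1e-3, beta=0.9, seed=0):
    """Gradient-free minimization; the moving-average buffer damps the
    estimator's variance online, as MAAVRT's buffers are described to do."""
    rng = random.Random(seed)
    x = list(x0)
    buf = [0.0] * len(x)
    for _ in range(steps):
        g = two_point_grad(f, x, delta, rng)
        buf = [beta * b + (1 - beta) * gi for b, gi in zip(buf, g)]
        x = [xi - lr * bi for xi, bi in zip(x, buf)]
    return x

# Minimize a smooth test objective using only function evaluations.
x_star = zo_descent(lambda x: sum(xi * xi for xi in x), [1.0, -2.0, 0.5])
```

With only two function evaluations per step, the iterate still contracts toward the minimizer; the buffer trades a little bias for a large reduction in the per-step variance of the estimate.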
-
2511.0007
Enhancing Small Language Models with Gradient Noise Injection
Training small language models is challenging due to their limited capacity to capture complex patterns and their susceptibility to overfitting. To address these issues, we investigate gradient noise injection as a regularization strategy, building on prior work while introducing a noise schedule that decays exponentially over training. Unlike existing techniques, our method explicitly controls the trade-off between exploration and stability during optimization. We compare the exponential decay schedule with linear and adaptive variants, demonstrating empirically that the exponential schedule yields superior convergence and generalization. Extensive experiments on diverse text corpora, including shakespeare\_char, enwik8, text8, and larger benchmark datasets, show consistent improvements in training dynamics, validation loss, and final performance. We report error bars and statistical significance tests to ensure robustness of the results. Detailed implementation information, including model architectures, hyperparameter settings, dataset sizes, and optimization strategies, is provided to support reproducibility, and we release our code and trained models publicly. Furthermore, we compare gradient noise injection with other regularization methods such as dropout, weight decay, and data augmentation, both in isolation and in combination, revealing complementary effects on training stability and generalization. Finally, we analyze the computational cost of gradient noise injection relative to these baselines, highlighting its practical efficiency in resource-constrained environments. Together, these contributions position gradient noise injection as a theoretically grounded, empirically validated, and computationally practical method for improving the robustness of small language models.
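The mechanism described above, Gaussian noise added to gradients under an exponentially decaying schedule, can be sketched in a few lines on a toy quadratic; the specific sigma0 and decay values are illustrative assumptions, not the paper's hyperparameters.

```python
import math
import random

def noise_scale(step, sigma0=0.1, decay=0.01):
    """Exponential schedule: large noise early (exploration),
    vanishing noise late (stable convergence)."""
    return sigma0 * math.exp(-decay * step)

def sgd_with_noise(grad_fn, w0, steps=2000, lr=0.1, seed=0):
    """Gradient descent with isotropic Gaussian noise injected into
    each gradient, scaled by the decaying schedule above."""
    rng = random.Random(seed)
    w = list(w0)
    for t in range(steps):
        sigma = noise_scale(t)
        g = grad_fn(w)
        w = [wi - lr * (gi + rng.gauss(0.0, sigma)) for wi, gi in zip(w, g)]
    return w

# Quadratic toy objective with minimum at (3, -1).
grad = lambda w: [2 * (w[0] - 3.0), 2 * (w[1] + 1.0)]
w = sgd_with_noise(grad, [0.0, 0.0])
```

Because the noise decays, the late iterates behave like plain gradient descent and settle at the minimizer, which is the exploration-versus-stability trade-off the schedule is meant to control.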
-
2511.0008
A Self-Driving Laboratory for Materials Science: An Autonomous Research Agent for Deep Data Analysis and Interpretation
As artificial intelligence increasingly permeates scientific research, the "AI for Science" paradigm is evolving to enable more autonomous scientific workflows. Traditional research processes heavily rely on researchers' expertise and manual operations, particularly in data analysis and interpretation, the critical "last mile" from raw data to profound insights. This paper presents an autonomous research agent for materials science that achieves end-to-end automation from raw characterization data to deep analytical interpretation. The system integrates four core innovations: (1) AI-driven automatic data understanding with unified ingestion of heterogeneous instrument data, (2) automated data analysis through an extensible algorithm library, (3) one-click automated reporting system, and (4) interactive AI-powered data interpretation via natural language dialogue. We demonstrate the agent's capabilities through real-world case studies across multiple characterization techniques (Raman, UPS, UV-Vis, TG), achieving remarkable performance: UV-Vis bandgap analysis is accelerated by 600× compared to manual processing, while maintaining exceptional accuracy with fitting precision R² ≥ 0.999. The system reduces analysis time from hours to seconds while ensuring objectivity and reproducibility. By automating the data analysis pipeline while preserving human oversight and interpretability, this work contributes a practical component toward building more integrated and efficient scientific discovery systems in materials research.
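The abstract does not state which procedure its UV-Vis bandgap analysis uses; a common textbook route is a Tauc plot, where one fits the linear region of (αhν)^n against photon energy and extrapolates the line to zero. The sketch below assumes a direct-gap exponent n = 2 and runs on synthetic data; the function names are illustrative, not the system's API.

```python
def linear_fit(xs, ys):
    """Ordinary least squares y = a*x + b; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

def tauc_bandgap(energies_eV, alpha, exponent=2.0):
    """Tauc-plot bandgap estimate: fit (alpha * h*nu)**exponent against
    photon energy; the x-intercept -b/a approximates the gap E_g."""
    ys = [(a * e) ** exponent for a, e in zip(alpha, energies_eV)]
    slope, intercept, r2 = linear_fit(energies_eV, ys)
    return -intercept / slope, r2

# Synthetic direct-gap absorber with E_g = 2.1 eV: above the gap,
# alpha ~ sqrt(E - E_g) / E, so (alpha*E)**2 is linear in E.
E = [2.2 + 0.05 * i for i in range(17)]          # 2.2 .. 3.0 eV
alpha = [((e - 2.1) ** 0.5) / e for e in E]
eg, r2 = tauc_bandgap(E, alpha)
```

On real spectra the linear region must first be selected, which is precisely the step such an agent automates; the r² value returned here corresponds to the kind of fitting-precision figure the abstract reports.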
-
2511.0009
A Pilot Study Evaluating Large Language Models as Reviewers at Academic Conferences
This paper presents a new system for academic peer review that is more objective, efficient, and community-guided. Our system incorporates author-assisted evaluation (Author-AAE) and community-guided review (CGR) into the peer review of AI conferences. This is in contrast to existing approaches that prioritize alternative systems that only address some of these challenges. Our evaluation uses data from three major AI conferences that used our system and from a survey of reviewers. Their feedback indicates that our system's reviews are superior to single-LLM-based reviews due to their reduced subjectivity and enhanced quality. The reviewers' scores for our system's reviews were significantly higher than for single-LLM-based reviews across multiple metrics: "Reproducibility and Quality" (by 0.427 ± 0.007), "Review Quality" (by 0.265 ± 0.09), and "Alignment between opinion and paper score" (by 0.503 ± 0.090). In addition, we discovered that single-LLM-based reviews are more likely to be rejected by the program committee after author major revisions (on average by 0.182 ± 0.103) and are much more likely to be rejected overall (on average by 0.300 ± 0.124), compared to our system's reviews. These results suggest that our system performs better in reducing the arbitrary nature of the current peer review system and can serve as an inspiration for the scientific community to explore new review systems.
-
2511.0010
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery and AI Scientists
Artificial intelligence (AI) is reshaping scientific discovery, evolving from specialized computational tools into autonomous research partners. We position \textit{\textbf{Agentic Science}} as a pivotal stage within the broader \textit{\textbf{AI for Science}} paradigm, where AI systems progress from partial assistance to full scientific agency. Enabled by large language models (LLMs), multimodal systems, and integrated research platforms, agentic AI exhibits capabilities in hypothesis generation, experimental design, execution, analysis, and iterative refinement, behaviors once regarded as uniquely human. This survey offers a \textbf{domain-oriented review} of autonomous scientific discovery across life sciences, chemistry, materials, and physics, synthesizing research progress and advances within each discipline. We unify three previously fragmented perspectives (process-oriented, autonomy-oriented, and mechanism-oriented) through \textbf{a comprehensive framework} that connects foundational capabilities, core processes, and domain-specific realizations. Building on this framework, we (i) trace the evolution of AI for Science, (ii) identify five core capabilities underpinning scientific agency, (iii) model discovery as a dynamic four-stage workflow, (iv) review applications across life sciences, chemistry, materials science, and physics, and (v) synthesize key challenges and future opportunities. This work establishes a domain-oriented synthesis of autonomous scientific discovery and positions Agentic Science as a structured paradigm for advancing AI-driven research.
-
2511.0011
From Virtual Cells to Programmable Humans: Advancing Digital Biology Through Hybrid AI Systems
Recent advances in artificial intelligence (AI), high-performance computing, and systems biology have accelerated the development of AI-powered virtual biological systems, from virtual cells to multiscale organ models and programmable virtual humans. These systems promise transformative applications in drug discovery, precision medicine, and in silico clinical trials. This review provides a critical synthesis of current progress, key technologies, and future directions across this spectrum. We explore hybrid modeling strategies that combine mechanistic models, such as ordinary and partial differential equations, with deep learning methods including convolutional, recurrent, and graph neural networks. We emphasize the importance of robust uncertainty quantification, simulation validation, and multiscale integration across molecular, cellular, organ-level, and systemic processes. A core contribution is the introduction of the SIM-CARD framework, a standardized simulation accountability protocol to document data provenance, modeling assumptions, performance metrics, and regulatory alignment. We propose a three-phase translational roadmap: (1) validated AI-augmented virtual cells and organs (by 2030), (2) interoperable multi-organ physiological systems (by 2040), and (3) programmable full-body virtual humans supporting personalized simulations and regulatory use cases (by 2055). We identify key enablers, including high-fidelity multiscale data, computational scalability, and simulation governance, as well as bottlenecks such as algorithmic bias, explainability, and regulatory uncertainty. Finally, we call for collaborative efforts to establish minimal benchmarking suites, FAIR-compliant simulation metadata, and cross-institutional federated learning infrastructure. This review aims to guide the scientific, regulatory, and clinical communities in navigating the complex yet promising trajectory toward clinically actionable programmable human simulations.
-
2511.0012
Physics-Informed Neural Networks and Neural Operators for Parametric PDEs: Methods, Applications and Future Directions
Partial differential equations (PDEs) arise ubiquitously in science and engineering, where solutions depend on parameters representing physical properties, boundary conditions, or geometric configurations. Traditional numerical methods require solving the PDE anew for each parameter value, making parameter space exploration prohibitively expensive for high-dimensional problems. Recent advances in machine learning, particularly physics-informed neural networks (PINNs) and neural operators, have revolutionized parametric PDE solving by learning solution operators that generalize across parameter spaces. We critically analyze two main paradigms: (1) PINNs, which embed physical laws as soft constraints and excel at inverse problems with sparse data, and (2) neural operators (including DeepONet, Fourier Neural Operator, and their variants), which learn mappings between infinite-dimensional function spaces and achieve unprecedented parameter space generalization. Through detailed comparisons across fluid dynamics, solid mechanics, heat transfer, and electromagnetics, we show that neural operators can achieve computational speedups ranging from 10^3 to 10^5 times faster than traditional solvers for multi-query scenarios, while maintaining comparable accuracy. We provide practical guidance for method selection, discuss theoretical foundations including universal approximation and convergence guarantees, and identify critical open challenges including high-dimensional parameter spaces, complex geometries, and out-of-distribution generalization. This work establishes a unified framework for understanding parametric PDE solvers through the lens of operator learning, offering a comprehensive resource, which we intend to update incrementally, for this rapidly evolving field.
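The PINN idea of embedding physical laws as soft constraints can be made concrete on a toy problem: fit a polynomial ansatz to the ODE u'(x) = -u(x) with u(0) = 1 by minimizing a mean squared residual over collocation points plus a boundary penalty. This stdlib-only sketch uses finite-difference gradients in place of autograd; the ansatz, learning rate, and collocation points are all illustrative assumptions.

```python
def model(theta, x):
    """Polynomial ansatz u(x; theta) = sum_k theta[k] * x**k."""
    return sum(t * x ** k for k, t in enumerate(theta))

def model_dx(theta, x):
    """Analytic derivative of the ansatz (no autograd needed)."""
    return sum(k * t * x ** (k - 1) for k, t in enumerate(theta) if k > 0)

def pinn_loss(theta, xs):
    """Physics-informed loss for u' = -u, u(0) = 1: mean squared ODE
    residual at the collocation points plus a boundary penalty."""
    res = sum((model_dx(theta, x) + model(theta, x)) ** 2 for x in xs) / len(xs)
    return res + (model(theta, 0.0) - 1.0) ** 2

def train(theta, xs, steps=3000, lr=0.05, h=1e-6):
    """Plain gradient descent, with central-difference gradients on theta
    standing in for autograd (the loss is cheap to evaluate)."""
    theta = list(theta)
    for _ in range(steps):
        grad = []
        for k in range(len(theta)):
            tp, tm = list(theta), list(theta)
            tp[k] += h
            tm[k] -= h
            grad.append((pinn_loss(tp, xs) - pinn_loss(tm, xs)) / (2 * h))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

xs = [i / 10 for i in range(11)]      # collocation points on [0, 1]
theta = train([0.0] * 5, xs)          # degree-4 polynomial ansatz
loss = pinn_loss(theta, xs)           # exact solution is u(x) = exp(-x)
```

No solution data appears anywhere in the loss; only the residual of the governing equation and the boundary condition drive the fit, which is exactly the "physics as soft constraints" mechanism the review contrasts with data-driven neural operators.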
-
2511.0014
Artificial Intelligence in Biomedical Research: From Data Integration to Precision Medicine
This comprehensive review examines the transformative role of artificial intelligence in biomedical research, from foundational data integration to clinical applications. The paper explores how AI techniques facilitate multimodal data fusion across diverse biological data types, employing both traditional statistical methods and advanced deep learning architectures including variational autoencoders, graph neural networks, and transformer models. It evaluates AI applications in medical imaging, where convolutional neural networks have achieved remarkable diagnostic accuracy (up to 94\% in COVID-19 detection) while enhancing segmentation and classification tasks across multiple imaging modalities. The review further investigates generative AI's impact on molecular design and drug discovery, highlighting transformer-based architectures like TransAntivirus that navigate vast chemical spaces to optimize therapeutic candidates. Finally, it examines AI-enabled precision medicine applications, including Clinical Decision Support Systems and federated learning approaches that balance analytical power with privacy preservation. Despite significant progress, implementation challenges persist, including data heterogeneity, model explainability, and ethical concerns regarding bias and privacy. The paper underscores the importance of developing interpretable AI systems that integrate seamlessly into clinical workflows while addressing regulatory, ethical, and economic considerations to realize the full potential of AI in advancing biomedical research and healthcare delivery.