Papers
-
2511.0005 Multi-Agent Adaptive Variance Reduction Technique for Decentralized Nonsmooth Nonconvex Stochastic Optimization
Decentralized stochastic optimization with nonsmooth objectives and only zeroth-order oracle access arises in federated learning and privacy-sensitive applications, yet existing methods suffer from high variance and dimension-dependent complexity. We propose MAAVRT (Multi-Agent Adaptive Variance Reduction Technique), a decentralized zeroth-order algorithm that integrates randomized smoothing, adaptive variance reduction, and topology-aware consensus. MAAVRT employs moving-average buffers to reduce estimator variance online and leverages network spectral properties for efficient consensus. Our theoretical analysis decomposes the convergence error into four components, yielding sample complexity $\mathcal{O}(d\delta^{-1}\epsilon^{-3})$ that matches known lower bounds. Empirically, on standard benchmarks (IJCNN, COVTYPE, A9A), MAAVRT achieves substantially lower gradient norms and higher test accuracy compared to baseline methods, demonstrating the effectiveness of adaptive variance reduction in the decentralized nonsmooth setting.
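As a rough illustration of two ideas named in this abstract (randomized smoothing with a zeroth-order oracle, and a moving-average buffer that damps estimator variance), the sketch below uses a standard two-point sphere-sampling estimator and an exponential moving average on a single agent. It is not the MAAVRT algorithm: the abstract does not specify the estimator, the buffer update, or the consensus step, so the function names, the `beta` weight, and the L1 test objective are all assumptions chosen for illustration.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def two_point_estimate(f, x, delta, rng):
    """Two-point zeroth-order estimate of the gradient of the
    delta-smoothed surrogate of f, using function values only."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    f_plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (f_plus - f_minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def moving_average(buffer, g, beta):
    """Exponential moving average: (1 - beta) * buffer + beta * g.
    (Assumed form of the buffer; the abstract gives no formula.)"""
    return [(1.0 - beta) * b + beta * gi for b, gi in zip(buffer, g)]


def l1_norm(x):
    """A simple nonsmooth test objective (hypothetical choice)."""
    return sum(abs(xi) for xi in x)


rng = random.Random(0)
x = [1.0, -2.0, 0.5]          # fixed query point, away from the kinks
buffer = [0.0] * len(x)
for _ in range(2000):
    g = two_point_estimate(l1_norm, x, delta=0.1, rng=rng)
    buffer = moving_average(buffer, g, beta=0.01)
```

Each single estimate `g` is noisy, while `buffer` settles near the sign pattern of `x`, which is the gradient of the L1 norm at this point. A decentralized method would additionally mix such buffers across neighboring agents; that step is omitted here.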
-
2511.0004 Vision Transformers for Semiconductor Defect Detection: A Comprehensive Survey of AI-Driven Image Segmentation from CNNs to Foundation Models (2015-2025)
-
2511.0003 AI Empowered Thermal Management Materials Design
The development of high-performance thermal management materials holds significant importance in fields such as chips, data centers and batteries. Materials informatics, which integrates big data and artificial intelligence, is emerging as the fourth paradigm for materials research. Over the past few years, our team has undertaken preliminary explorations in the development of advanced thermal management materials empowered by big data and artificial intelligence. In this work, we introduce three successful applications of materials informatics to thermal management materials design: the construction of machine learning interatomic potentials for thermal property calculations, the discovery and generative design of high-thermal-conductivity materials, and the intelligent design of micro/nano structures for thermal transport. These cases show the clear advantages of materials informatics for thermal management materials design.
-
2511.0002 Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation
Parameterizing high-fidelity "digital twins" of batteries is a critical yet challenging inverse problem that hinders the pace of battery innovation. Prevailing methods formulate this as a black-box optimization (BBO) task, employing algorithms that are sample-inefficient and blind to the underlying physics. In this work, we introduce a new paradigm that reframes the inverse problem as a reasoning task, and present Battery-Sim-Agent, the first framework to deploy a Large Language Model (LLM) agent in a closed loop with a high-fidelity battery simulator. The agent mimics a human scientist's workflow: it interprets rich, multi-modal feedback from the simulator, forms physically-grounded hypotheses to explain discrepancies, and proposes structured parameter updates. On a systematically constructed benchmark suite spanning diverse battery chemistries, operating conditions, and difficulty levels, our agent significantly outperforms strong BBO baselines like Bayesian optimization in identifying accurate parameters. We further demonstrate the framework's capability in complex long-horizon degradation fitting tasks and validate its practical applicability on real-world battery datasets. Our results highlight the promise of LLM-agents as reasoning-based optimizers for scientific discovery and battery parameter estimation.
-
2511.0001 PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors
Evaluating the scientific discovery capabilities of large language model-based agents, particularly how they cope with varying environmental complexity and utilize prior knowledge, requires specialized benchmarks that are currently lacking. To address this gap, we introduce PhysGym, a novel benchmark suite and simulation platform for rigorously assessing LLM-based scientific reasoning in interactive physics environments. PhysGym's primary contribution lies in its sophisticated control over the level of prior knowledge provided to the agent. This allows researchers to dissect agent performance along axes including the complexity of the problem and the prior knowledge levels. The benchmark comprises a suite of interactive simulations, where agents must actively probe environments, gather data sequentially under constraints, and formulate hypotheses about underlying physical laws. PhysGym provides standardized evaluation protocols and metrics for assessing hypothesis accuracy and model fidelity. We demonstrate the benchmark's utility by presenting results from baseline LLMs, showcasing its ability to differentiate capabilities based on varying priors and task complexity.
-
2510.0091 FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness
Recent advances in Large Language Models (LLMs) have enabled their application to recommender systems (RecLLMs), yet concerns remain regarding fairness across demographic and psychological user dimensions. We introduce FairEval, a novel evaluation framework to systematically assess fairness in LLM-based recommendations. Unlike prior benchmarks that focus solely on demographic attributes, FairEval uniquely integrates personality profiles with eight sensitive demographic attributes, including gender, race, and age, enabling a comprehensive and nuanced assessment of user-level bias. We evaluate state-of-the-art models, including ChatGPT 4o and Gemini 1.5 Flash, on music and movie recommendation tasks using structured prompts. FairEval's personality-aware fairness metric, PAFS@25, achieves high consistency scores up to 0.9969 for ChatGPT 4o and 0.9997 for Gemini 1.5 Flash, underscoring its robustness in equitable recommendations across diverse user profiles, while also uncovering fairness gaps, with SNSR disparities reaching up to 34.79%. Our results also reveal disparities in recommendation consistency across user identities and prompt formulations, including typographical and multilingual variations. By unifying psychographic and demographic evaluation in RecLLMs, FairEval offers a robust and reproducible benchmark for inclusive and bias-aware LLM evaluation.
-
2510.0090 A Fuzzy-based Approach to Predict Human Interaction by Functional Near-Infrared Spectroscopy
In this article, we introduce the fuzzy logic-based attention mechanism (Fuzzy Attention Layer), a novel computational approach designed to enhance the interpretability and efficacy of neural models in psychological research. The fuzzy attention layer is integrated into the transformer encoder model to analyze complex psychological phenomena from neural signals captured by functional near-infrared spectroscopy (fNIRS). By leveraging fuzzy logic, the fuzzy attention layer learns and identifies interpretable patterns of neural activity. This addresses a significant challenge in using transformers: the lack of transparency in determining which specific brain activities contribute most to particular predictions. Our experimental results, obtained from fNIRS data recorded during social interactions involving handholding, reveal that the fuzzy attention layer not only learns interpretable patterns of neural activity but also enhances model performance. In addition, these patterns provide deeper insights into the neural correlates of interpersonal touch and emotional exchange. The application of our model shows promising potential for understanding the complex aspects of human social behavior and for verifying psychological theory with machine learning algorithms, thereby contributing significantly to the fields of social neuroscience and AI. The presented version is based on the work published in IEEE TFS (2025).
-
2510.0089 BasketVision: Benchmarking MLLMs' Grasp of Complex Dynamic Systems
While Multimodal Large Language Models (MLLMs) excel on general visual tasks, their capacity to comprehend complex dynamic systems remains a critical open question. Such systems, governed by physical laws, explicit rules, and multi-agent interactions, form the fabric of the real world. To facilitate a systematic diagnosis of current MLLM limitations, we introduce BasketVision, a new benchmark that leverages professional basketball as a microcosm for these dynamic environments. BasketVision probes model capabilities across seven dimensions (spanning perception, reasoning, and prediction) through 6,000 curated, bilingual questions from professional game data. An automated data generation pipeline underpins the benchmark, ensuring both scalability and fine-grained precision. Our evaluation of 23 leading models reveals a chasm between machine and human cognition: human experts attain 96.34% accuracy, while the premier model, GPT-4o, achieves only 63.15%. The analysis pinpoints spatial reasoning as a persistent bottleneck and uncovers specific patterns of task specialization. BasketVision thus serves as a crucial apparatus for charting the frontiers of MLLMs and steering future work toward more robust reasoning in dynamic visual worlds.
-
2510.0088 MatEvolve: A Synergistic Symbolic-LLM Agent for Multi-Objective Materials Design
Materials define the eras of human civilization, yet the design of novel materials is fundamentally constrained by the immense chemical space, which renders the traditional enumeration-screening methodology computationally prohibitive and inefficient. This paper introduces a paradigm shift towards insight-exploration-validation, enabling an intelligent and evolutionary exploration of material design pathways. To actualize this paradigm, we propose MatEvolve, a synergistic symbolic-LLM agent that reconceptualizes material design as a closed-loop, programmatic evolution task. Central to MatEvolve is a novel symbolic formalism, Material Edit Language, which empowers the agent to perform chemical operations programmatically. The exploration trajectory is directed by a multifaceted guidance strategy, comprising a dynamic knowledge injection mechanism and a two-stage exploration strategy that balances broad exploration and deep optimization. Furthermore, a multi-objective fitness landscape ensures directional and efficient navigational guidance. These integrated strategies contribute to a 32.2% improvement over direct material structure modification. Crucially, comparisons demonstrate that our insight-exploration-validation paradigm outperforms the traditional enumeration-screening approach by 33.6%, highlighting its superior efficacy in navigating vast design spaces.
-
2510.0087 EndoNet: Content-Aware Linear Attention for Endoscopic Video Super-Resolution
Endoscopic video super-resolution (EVSR) seeks to reconstruct high-resolution frames from low-resolution endoscopic video, a task critical for enhancing clinical visualization of fine anatomical details. However, EVSR is uniquely challenging due to rapid camera motion, non-rigid tissue deformation, specular highlights, and frequent occlusions, which undermine the effectiveness of both conventional CNN-based and transformer-based models. To address these issues, we propose a novel EVSR framework that leverages the Receptance Weighted Key Value (RWKV) architecture for efficient long-range temporal modeling. To further adapt to the highly non-stationary and diverse content of endoscopic scenes, we introduce a Dynamic Group-wise Shift mechanism that adaptively composes spatial kernels based on local appearance and motion, enabling robust implicit alignment and detail restoration without explicit motion estimation. Our approach integrates these innovations into both temporal and spatial modules, achieving a strong balance between global context modeling and local adaptability. Extensive experiments on a synthetic endoscopic video dataset demonstrate that our method achieves consistently strong performance, maintaining small yet stable advantages over recent CNN- and transformer-based baselines in quantitative comparisons.
-
2510.0086 EndoNet: Content-Aware Linear Attention for Endoscopic Video Super-Resolution
Endoscopic video super-resolution (EVSR) seeks to reconstruct high-resolution frames from low-resolution endoscopic video, a task critical for enhancing clinical visualization of fine anatomical details. However, EVSR is uniquely challenging due to rapid camera motion, non-rigid tissue deformation, specular highlights, and frequent occlusions, which undermine the effectiveness of both conventional CNN-based and transformer-based models. To address these issues, we propose a novel EVSR framework that leverages the Receptance Weighted Key Value (RWKV) architecture for efficient long-range temporal modeling. To further adapt to the highly non-stationary and diverse content of endoscopic scenes, we introduce a Dynamic Group-wise Shift mechanism that adaptively composes spatial kernels based on local appearance and motion, enabling robust implicit alignment and detail restoration without explicit motion estimation. Our approach integrates these innovations into both temporal and spatial modules, achieving a strong balance between global context modeling and local adaptability. Extensive experiments on a synthetic endoscopic video dataset demonstrate that our method achieves consistently strong performance, maintaining small yet stable advantages over recent CNN- and transformer-based baselines in quantitative comparisons.
-
2510.0085 AI Mathematician as a Partner in Advancing Mathematical Discovery
Artificial intelligence (AI) has demonstrated impressive progress in mathematical reasoning, yet its integration into the practice of mathematical research remains limited. In this study, we investigate how the AI Mathematician (AIM) system can operate as a research partner rather than a mere problem solver. Focusing on a challenging problem in homogenization theory, we analyze the autonomous reasoning trajectories of AIM and incorporate targeted human interventions to structure the discovery process. Through iterative decomposition of the problem into tractable subgoals, selection of appropriate analytical methods, and validation of intermediate results, we reveal how human intuition and machine computation can complement one another. This collaborative paradigm enhances the reliability, transparency, and interpretability of the resulting proofs, while retaining human oversight for formal rigor and correctness. The approach leads to a complete and verifiable proof, and more broadly, demonstrates how systematic human-AI co-reasoning can advance the frontier of mathematical discovery.
-
2510.0084 PST-AUTO-AGENT: A Multi-Agent Ensemble Framework for Paper Source Tracing
The escalating volume of scientific literature necessitates efficient methods for identifying foundational works that significantly inform new research. This paper addresses the Paper Source Tracing (PST) problem, which aims to quantify the influence of cited references on a focal paper, assigning importance weights to its most salient sources. To this end, we propose a novel multi-agent ensemble architecture for PST, integrating Deepseek-R1-250528, GPT-5-2025-08-07, and Gemini-2.5-pro. Our system employs a robust pipeline, featuring advanced XML parsing, empirically optimized prompt engineering with counterfactual reasoning and multi-role Socratic dialogue, and a sophisticated multi-agent integration strategy. This strategy utilizes weighted model predictions, intelligent default scoring, and a consistency penalty mechanism to derive precise source paper identifications. Our method serves as a strong tuning-free baseline for the PST problem that does not require feature engineering. It also achieves top-ranked results when combined with feature engineering techniques. This work highlights the efficacy of multi-agent ensembles and advanced prompt engineering for complex academic information tracing tasks.
-
2510.0083 Enhancing AI Conference Peer Review Quality through Anonymized Feedback and Adaptive Reward Systems
This paper addresses the critical issue of enhancing peer review quality at AI conferences by implementing anonymized feedback and adaptive reward systems. The growing volume of conference submissions and limited reviewer accountability result in inconsistent review quality, bias, and a lack of transparency, posing significant challenges to the integrity of AI research. Our proposed solution involves a dynamic feedback loop that anonymizes and aggregates feedback to minimize biases, coupled with an adaptive reward system to motivate reviewers while preserving the integrity of the review process. Utilizing sentiment analysis, feedback is processed to detect and mitigate potential biases, enhancing the fairness and efficacy of peer reviews. Experiments conducted using a logistic regression model on the Yelp Polarity dataset demonstrate a significant improvement in sentiment classification accuracy, from 54.1% to 83.4%, indicating the effectiveness of our anonymized feedback loop. However, the bias detection score of 0.0 across all runs highlights the need for further refinement in bias mitigation. Our method's scalability and adaptability across various conference settings are supported by its successful implementation in sentiment analysis tasks. Overall, this study provides a robust framework for enhancing the accountability and quality of peer reviews, with implications for future research aimed at integrating advanced bias detection and mitigation techniques.
-
2510.0082 Reinforced Adaptive Diffusion Networks for Enhanced Image Synthesis
The field of generative modeling in computer vision has been propelled significantly forward by methods such as Generative Adversarial Networks (GANs) and diffusion models; however, challenges like balancing image fidelity and diversity alongside incorporating class-specific details persist. These traditional approaches often exhibit limitations in adaptability and computational efficiency. This paper introduces Reinforced Adaptive Diffusion Networks (RAD-Nets), a novel generative framework that synergizes diffusion processes with reinforcement learning to enhance image synthesis through dynamic parameter optimization. The core innovation lies in integrating a Reinforced Learning Layer and an Adaptive Feedback Mechanism, which employ real-time feedback to iteratively refine outputs. The Multi-Objective Optimization module within RAD-Nets specifically targets the concurrent enhancement of image quality, diversity, and class fidelity, addressing the issues found in static optimization techniques. Empirical evaluations demonstrate that RAD-Nets outperform existing generative models on standard benchmarks like CIFAR-10 and CelebA, achieving superior metrics in quality and diversity without compromising fidelity. By focusing on class-conditional image synthesis, RAD-Nets also demonstrate significant improvements in class-specific feature representation, marking a substantial advancement over conventional generative modeling frameworks.
-
2510.0081 Adaptive and Fair Cross-Domain Recommendations with Meta-Reinforcement Learning
The research focuses on the development of a novel hierarchical and adaptive recommendation system that addresses the dual challenge of personalization and fairness in cross-domain environments. Traditional recommendation systems have struggled to effectively integrate diverse user interactions and adapt to rapidly evolving user preferences while maintaining fairness. The proposed solution leverages three core innovations: cross-domain collaborative filtering, meta-reinforcement learning, and fairness-aware mechanisms. By synthesizing data from multiple domains, the system constructs enriched user profiles that inform a meta-reinforcement learning framework, enhancing adaptability to user behavior changes. Additionally, fairness-aware mechanisms are incorporated to mitigate biases and ensure equitable content distribution. This integrated approach aims to resolve key challenges in recommendation systems, namely the precise prediction of preferences and the equitable treatment of diverse user groups. Empirical evaluations demonstrate that the proposed methodology not only improves recommendation accuracy but also enhances fairness metrics, thereby fostering a balanced and inclusive recommendation landscape.
-
2510.0080 Enhancing Image Generation with Multi-Modal VQ-VAE and Self-Supervised Learning
This paper addresses challenges in unsupervised representation learning, particularly in high-fidelity image generation and domain adaptability across diverse data modalities. Current frameworks such as GANs and VQ-VAE have shown promise but face limitations in maintaining consistent performance across variable data distributions without significant supervision. To overcome these challenges, we propose a Multi-Modal Vector Quantized Variational AutoEncoder (VQ-VAE) integrated with Self-Supervised Learning (SSL). Our innovative approach incorporates a harmonizer module within the VQ-VAE architecture, which aligns and transforms data representations across multiple modalities. By leveraging self-supervised learning techniques, the model iteratively refines its parameters, enhancing both image reconstruction quality and adaptability to new domains with minimal supervision. The proposed framework processes CIFAR-10 datasets to facilitate structured data integration, employing advanced standardization and batching techniques for optimal performance. Empirical evaluations reveal substantial improvements in image reconstruction fidelity and domain adaptability compared to standard VQ-VAE models, corroborated by metrics such as PSNR, SSIM, and FID. The seamless integration of modality-specific feature extraction and embedding generalization within our framework demonstrates the potential to advance unsupervised learning paradigms. Our contribution establishes a robust solution, optimizing the generative process, and expanding applicability in real-world scenarios characterized by unlabeled, multi-modal datasets.
-
2510.0079 Causal-Informed Adaptive Learning for Contextual Personalization in Recommendation Systems
In recent years, personalized recommendation systems have become integral to enhancing user experiences on digital platforms, yet challenges remain in effectively integrating causal inference with adaptive learning mechanisms and semantic alignment. Traditional systems predominantly rely on correlation-based models, often overlooking the dynamic causal relationships within user interaction data that could enhance recommendation precision and contextual relevance. This paper addresses these gaps by presenting a novel framework that synergizes causal inference using structural equation models and causal diagrams, adaptive learning algorithms via a refined hybrid multi-armed bandit strategy, and semantic content mapping with advanced natural language processing techniques such as Latent Dirichlet Allocation and BERT-based embeddings. Through this integrated approach, our method dynamically adjusts recommendations to align with user preferences and adapt to context changes. Empirical evaluation demonstrates our method's superiority in achieving higher accuracy and relevance in personalized content delivery compared to existing models. The findings underscore the potential of our framework to significantly improve recommendation cohesion and user satisfaction, marking a substantial advancement in the field of contextual personalization.
-
2510.0078 Adaptive Diffusion-Latent Flow Model: Enhancing Image Synthesis Fidelity and Stability
In the domain of neural architectures for generative models, the emergence of diffusion processes and flow-based transformations has revolutionized image synthesis, traditionally dominated by Generative Adversarial Networks and Variational Autoencoders. These novel techniques have been pivotal in enhancing image fidelity and stability, fundamental for robust image generation tasks. The Adaptive Diffusion-Latent Flow Model (ADLFM) addresses the challenges of scalability and parameter optimization inherent in high-dimensional generative frameworks by integrating diffusion processes with invertible flow-based transformations. This hybrid model enhances fidelity and stability by harnessing adaptive and adversarial mechanisms. ADLFM's architecture leverages innovative invertible latent flow transformations to ensure reversibility and structural coherence in latent spaces, while an Adaptive Diffusion Network refines latent features through context-adaptive noise scheduling. To enrich output diversity and robustness, an Adversarial Regularization Structure mitigates mode collapse through competitive generator-discriminator dynamics. Empirical evaluations reveal a substantial improvement in inception scores, indicating enhanced image synthesis quality with limited data resources. Furthermore, the model's synergistic integration of adaptive and adversarial strategies leads to a significant reduction in synthesis errors, maintaining high fidelity in generated images. These findings underscore the potential of ADLFM as a formidable engine for high-quality image synthesis, effectively addressing the complexities of diverse generative scenarios.
-
2510.0077 Trust-Enhanced Graph Neural Networks for Transparent Recommendations
In the evolving landscape of digital platforms, the demand for robust recommendation systems is paramount to manage the deluge of user-generated data. Graph Neural Networks (GNNs) have emerged as a potent strategy in recognizing intricate user-item interactions due to their ability to leverage structural data insights. However, existing GNN-based models often overlook trust dynamics, a critical factor in ensuring recommendation reliability and transparency. Despite recognition of trust's potential to address biases and enhance models' interpretability, its integration with sophisticated network-based techniques remains underexplored. Responding to this gap, we propose the Trust-Enhanced Graph-Based Recommendation Model (GTERM), which seamlessly incorporates trust metrics within the GNN framework. GTERM transforms raw interaction data into a trust-augmented graph, employing graph convolutional and attention mechanisms to emphasize trust-enriched interactions, thereby refining recommendation accuracy and transparency. The proposed model achieves notable improvements over baseline methods, as evidenced in diverse experimental evaluations, demonstrating its capacity to deliver more accurate, trustworthy, and interpretable recommendations. Through the integration of trust factors, GTERM fosters user acceptance and enhances system performance by resolving key challenges related to the lack of interpretability and trustworthiness in traditional GNN-based systems.