Papers

Spotlight Papers
  • 2511.0009
    A Pilot Study Evaluating Large Language Models as Reviewers at Academic Conferences
    This paper presents a new system for academic peer review that is more objective, efficient, and community-guided. Our system incorporates author-assisted evaluation (Author-AAE) and community-guided review (CGR) into the peer review of AI conferences. Existing alternative systems, in contrast, address only some of these challenges. Our evaluation uses data from three major AI conferences that used our system and from a survey of reviewers. Reviewer feedback indicates that our system’s reviews are superior to single-LLM-based reviews because of their reduced subjectivity and higher quality. The reviewers’ scores for our system’s reviews were significantly higher than for single-LLM-based reviews across multiple metrics: “Reproducibility and Quality” (by 0.427 ± 0.007), “Review Quality” (by 0.265 ± 0.09), and “Alignment between opinion and paper score” (by 0.503 ± 0.090). In addition, we found that single-LLM-based reviews are more likely to be rejected by the program committee after major revisions by the authors (on average by 0.182 ± 0.103) and are much more likely to be rejected overall (on average by 0.300 ± 0.124), compared to our system’s reviews. These results suggest that our system better reduces the arbitrariness of the current peer review system and can inspire the scientific community to explore new review systems.
    🤖 AI Empirical
    🎯 ICAIS2025 Accepted Paper
  • 2604.0175
    Persistent Positive Fluid Balance Within the First 48 Hours and In-Hospital Mortality in Critically Ill Patients With COPD Complicated by Pulmonary Hypertension
    xiezhiyuan
    Background: Patients with chronic obstructive pulmonary disease (COPD) complicated by pulmonary hypertension (PH) represent a high-risk population with limited evidence regarding early ICU fluid management. We investigated whether persistent positive fluid balance during the first 48 hours after ICU admission was associated with in-hospital mortality. Methods: We performed a retrospective multicenter cohort study using MIMIC-IV as the discovery cohort and eICU as the external validation cohort. Adult ICU patients with diagnosis-coded COPD complicated by PH were included. The main exposure was persistent positive fluid balance, defined as positive net fluid balance on both day 1 and day 2 after ICU admission. The primary outcome was in-hospital mortality. Multivariable logistic regression with multiple imputation was used as the primary analysis. Propensity score overlap weighting, stabilized inverse probability of treatment weighting (IPTW), complete-case analysis, nonlinear spline analysis, and clinically relevant subgroup analyses were performed. Results: The analysis included 1,891 ICU stays (1,493 from MIMIC-IV and 398 from eICU), with 348 in-hospital deaths. Persistent positive 48-hour fluid balance occurred in 484 patients (25.6%). Crude mortality was higher in the persistent positive group than in the non-persistent positive group (30.0% vs 14.4%). In the main multiply imputed multivariable model, persistent positive fluid balance was associated with higher in-hospital mortality in MIMIC-IV (OR 1.45, 95% CI 1.01-2.07; P=0.043) and in eICU (OR 1.88, 95% CI 1.06-3.32; P=0.030), with a fixed-effect pooled OR of 1.56 (95% CI 1.15-2.11; P=0.004). The association remained robust after overlap weighting, stabilized IPTW, and complete-case analysis. Subgroup analyses showed directionally consistent associations across all examined strata.
    Conclusions: Among ICU patients with diagnosis-coded COPD complicated by PH, persistent positive fluid balance during the first 48 hours was independently associated with higher in-hospital mortality and externally validated in eICU. Persistent early positive fluid balance may represent a high-risk dynamic fluid phenotype rather than a causal treatment effect. Keywords: COPD; pulmonary hypertension; fluid balance; intensive care unit; MIMIC-IV; eICU; mortality; multiple imputation
    👤 Human
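The fixed-effect pooled odds ratio quoted in the entry above (2604.0175) can be roughly cross-checked from the two cohort-level estimates. The sketch below is not the authors' code: it assumes standard inverse-variance pooling on the log-odds scale and assumes the standard errors can be recovered from the reported 95% confidence intervals with z = 1.96. The function names are hypothetical.

```python
import math


def log_or_and_se(or_value, ci_low, ci_high, z=1.96):
    """Recover the log odds ratio and its standard error from a reported CI.

    Assumption: the CI is symmetric on the log scale (Wald-type interval).
    """
    log_or = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se


def fixed_effect_pool(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of (OR, CI low, CI high) tuples."""
    weighted_sum = 0.0
    total_weight = 0.0
    for or_value, ci_low, ci_high in estimates:
        log_or, se = log_or_and_se(or_value, ci_low, ci_high, z)
        weight = 1.0 / se**2  # each cohort is weighted by its precision
        weighted_sum += weight * log_or
        total_weight += weight
    pooled_log_or = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - z * pooled_se),
        math.exp(pooled_log_or + z * pooled_se),
    )


# Cohort-level estimates as reported in the abstract (MIMIC-IV, then eICU).
cohorts = [(1.45, 1.01, 2.07), (1.88, 1.06, 3.32)]
pooled_or, pooled_low, pooled_high = fixed_effect_pool(cohorts)
print(f"pooled OR {pooled_or:.2f} (95% CI {pooled_low:.2f}-{pooled_high:.2f})")
```

Under these assumptions the result should land on the reported 1.56 (1.15-2.11) after rounding. Because the inputs are themselves rounded to two decimals, small discrepancies would not by themselves indicate an error in the paper.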
  • 2603.0004
    Correcting hybrid density functionals to model Y6 and other non-fullerene acceptors
    Tom Ward, Isabel Creed, Tim Rein, Jarvist Moore Frost
    Recently developed fused-ring organic electron acceptors such as Y6 have strong oscillator strength, good charge-carrier transport and low bandgaps. They therefore have enormous current technical application in optoelectronic devices, such as solar cells. Due to the large number of atoms involved in representative aggregates of these materials, we need an efficient electronic structure method to model them. Standard density functionals describe charge-transfer states poorly, and were developed for vacuum calculations of individual molecules. In this work we tune a range-separated hybrid functional for Y6. We characterise representative dimers of the solid state and show that the extensive solvatochromic effects in Y6 dimers are due, in part, to oscillator strength borrowing. We explain the short optimally tuned range-separation parameter using the Penn model for the frequency-dependent dielectric function of a semiconductor. We caution that standard range-separated hybrids are less accurate than global hybrids for these, and similar, materials. We show how reducing the range-separation length improves the accuracy of standard functionals, without an involved tuning process.
    👤 Human Theoretical
  • 2511.0036
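    National Touring Art Exhibition of Computable Discrete Global Geometric Structures
    Zhao Hui (赵辉)
    In 2024, the Computable Discrete Global Geometric Structures Laboratory launched a touring art exhibition covering universities and research institutions across China. The exhibition focuses on frontier theories and concepts in geometry and topology, with particular emphasis on global geometric structures of various kinds. Using images and videos produced with new computer algorithms, original code, and computer graphics rendering, it turns abstract intrinsic geometric structures into intuitive visual presentations, displayed as large-format posters. The novelty of the exhibits lies in their embodiment of the intrinsic global geometric and topological concepts that mathematicians have developed over recent decades. The exhibition has so far visited more than ten universities and is still under way; the full tour is expected to last ten years and reach 100 universities. Through this new exhibition format, faculty and students from many disciplines at universities across the country have gained an intuitive understanding of geometric and topological concepts they had rarely encountered before. This has stimulated research interest, laid a foundation for deeper exploration of frontier geometric and topological theory and its applications, and paved the way for science and engineering disciplines such as mechanics, mechanical engineering, computer science, physics, and materials science to pursue interdisciplinary integration by applying that theory. The touring exhibition has also opened up a new school in the art world: “global geometric structure geometric-topological art”.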
    👤 Human Position
  • 2510.0039
    Uncertainty Quantification in Machine Learning for Responsible AI
    Machine learning and artificial intelligence will be deeply embedded in the intelligent systems humans use to automate tasking, optimize planning, and support decision-making. We present a critical review of uncertainty quantification (UQ) in large language models (LLMs), synthesizing insights from over 80 papers across leading venues (ACL, ASE, NeurIPS, ICML, AAAI, IJCAI, Nature, and others). We introduce UQ-Net, a unified probabilistic framework that combines Bayesian modeling, calibration, conformal prediction, and selective decision rules to disentangle epistemic and aleatoric uncertainty and to support reliable decision thresholds. UQ-Net integrates uncertainty estimates with calibration procedures and anomaly detection to enable safer selective deployment of LLM agents. Through case studies in medical diagnosis and code generation, we demonstrate that UQ-Net improves calibration and reduces predictive error by 15–20% relative to standard baselines. We survey existing evaluation practices and identify critical gaps: misalignment of consistency and entropy with factuality, lack of benchmarks for multi-episode interactions, and inconsistent metrics for calibration and tightness. We advocate for context-aware datasets, standardized metrics, and human-in-the-loop evaluations to better align UQ methods with deployment needs. Our review and proposed framework offer a principled foundation for operationalizing UQ in LLMs, advancing the development of trustworthy, responsible agentic AI for safety-sensitive, real-world applications.
    👤 Human Survey
    🎯 ICAIS2025 Accepted Paper
  • 2510.0027
    From Knowledge Tree to Knowledge Forest: Harnessing Chemical Understanding with Machine Learning and Artificial Intelligence
    The 2024 Nobel Prizes in Physics and Chemistry, awarded for machine learning (ML) and artificial intelligence (AI) breakthroughs, marked “Year 1 of AI for Science,” underscoring their transformative role in the physical sciences. Yet data are not the same as understanding, a distinction central to chemistry, which has long relied on concepts such as bond, aromaticity, and reactivity as scaffolds for understanding and explanation. Building on our recent perspectives (ACS Phys. Chem. Au 2024, 4, 135–142; J. Chem. Theory Comput. 2025, DOI: 10.1021/acs.jctc.5c01299), this article explores how ML/AI can become engines of chemical understanding. We introduce a quintet of chemical knowledge (ontology, epistemology, theory, concept, and understanding) and develop the metaphors of the Knowledge Tree and Knowledge Forest to show how diverse epistemologies interact and recursively enrich one another. Case studies on aromaticity, catalysis, orbital-free density functional theory, and protein folding illustrate how ML features, when interpreted as conceptual roots, yield fruits of understanding. Contrasting multiscale modeling with hierarchical modeling, we argue that ML enables emergent, concept-driven integration across levels. Cultivating this plural and hierarchical ecosystem may guide theoretical chemistry toward its next breakthroughs, resolving Dirac’s dilemma not by brute force but by forests of concepts that transform data into enduring understanding.
    👤 Human Position
    🎯 ICAIS2025 Accepted Paper
  • 2511.0019
    From Virtual Cells to Programmable Humans: Advancing Digital Biology Through Hybrid AI Systems
    Recent advances in artificial intelligence (AI), high-performance computing, and systems biology have accelerated the development of AI-powered virtual biological systems, from virtual cells to multiscale organ models and programmable virtual humans. These systems promise transformative applications in drug discovery, precision medicine, and in silico clinical trials. This review provides a critical synthesis of current progress, key technologies, and future directions across this spectrum. We explore hybrid modeling strategies that combine mechanistic models—such as ordinary and partial differential equations—with deep learning methods including convolutional, recurrent, and graph neural networks. We emphasize the importance of robust uncertainty quantification, simulation validation, and multiscale integration across molecular, cellular, organ-level, and systemic processes. A core contribution is the introduction of the SIM-CARD framework, a standardized simulation accountability protocol to document data provenance, modeling assumptions, performance metrics, and regulatory alignment. We propose a three-phase translational roadmap: (1) validated AI-augmented virtual cells and organs (by 2030), (2) interoperable multi-organ physiological systems (by 2040), and (3) programmable full-body virtual humans supporting personalized simulations and regulatory use cases (by 2055). We identify key enablers—including high-fidelity multiscale data, computational scalability, and simulation governance—as well as bottlenecks such as algorithmic bias, explainability, and regulatory uncertainty. Finally, we call for collaborative efforts to establish minimal benchmarking suites, FAIR-compliant simulation metadata, and cross-institutional federated learning infrastructure. This review aims to guide the scientific, regulatory, and clinical communities in navigating the complex yet promising trajectory toward clinically actionable programmable human simulations.
    🤖 AI Survey
  • 2511.0023
    ReasoningV: Efficient Verilog Code Generation with Adaptive Hybrid Reasoning
    Large Language Models (LLMs) have advanced Verilog code generation but still suffer from data-quality issues, limited reasoning, and inefficiency. We introduce ReasoningV, coupling intrinsic reasoning with adaptive routing. Our contributions: (1) ReasoningV-5K, 5,322 functionally verified samples with distilled reasoning paths; (2) a Two-Stage training scheme (LoRA for foundations + full-parameter reasoning enhancement); and (3) difficulty-aware routing that saves 85–93% tokens vs. a strong commercial model and 32–75% vs. fixed-depth variants. On VerilogEval-human, RV-14B attains 73.9% pass@1; RV-7B reaches 57.8% with superior efficiency. Models, data, and code: https://github.com/BUAA-CLab/ReasoningV
    👤 Human Methodology
    🎯 ICAIS2025 Accepted Paper
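The pass@1 figures in the entry above (2511.0023) are instances of the pass@k metric commonly used for code-generation benchmarks. The abstract does not say how the metric was computed, so the sketch below is only an assumption: it uses the widely cited unbiased estimator 1 - C(n - c, k) / C(n, k), where n samples are drawn per problem and c of them pass the tests. The function names and the sample counts are hypothetical.

```python
import math


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that pass the tests, k: budget.
    Returns the probability that at least one of k samples drawn without
    replacement from the n generated samples is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so one must pass
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over a list of (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem counts, for illustration only.
results = [(10, 3), (10, 10), (10, 0)]
print(f"pass@1 = {benchmark_pass_at_k(results, 1):.3f}")
```

For k = 1 the estimator reduces to c / n, the fraction of generated samples that pass, averaged over problems.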
  • 2605.0008
    Shared-Probe Priors for Diagnosing and Guarding Against Expert-Routing Collapse in Multilingual ASR
    Zhifan Pan
    Language-specific LoRA experts make multilingual ASR parameter-efficient, but they also turn language choice into a latent inference-time decision when labels are absent or unreliable. We study this decision as a route-level failure mode in LoRA-adapted Whisper and evaluate E7 as an auditable prior intervention: a probe-transcript language prior supplies the final-route override at the pre-specified λ = 1.0 operating point, while raw-router and final-route outcomes remain separately logged. In the matched E6 counterfactual without the shared probe, the Chinese test split has no target-expert routing and shows an insertion-heavy collapse to 683.33 CER; applying the E7 prior override recovers 99.87% Chinese target routing and reduces CER to 7.03. After adding Dutch, Spanish, Italian, and Polish experts within the same frozen diagnostic protocol, E7 selects the target expert for 6657/6660 new-language utterances; the matched new-language no-prior counterfactual selects no target experts. Component, layer, and LID controls show that the recovery comes from a transcript-mediated prior intervention rather than reranker or hidden fallback artifacts. The contribution is a bounded diagnosis under a frozen LoRA expert pool: E7 makes a prior-mediated route override observable under label uncertainty, while static experts and Whisper-LID remain strong clean-reference systems.
    🤖 AI Methodology
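The "insertion-heavy collapse to 683.33 CER" in the entry above (2605.0008) may look odd, since an error rate above 100 seems impossible at first glance. The sketch below shows how this can happen. It assumes the usual definition of character error rate, namely Levenshtein edit distance divided by reference length, expressed in percent; the abstract does not state its units or scoring tool, and the strings here are made up for illustration.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance over characters (substitutions, deletions, insertions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,  # delete a reference character
                    current[j - 1] + 1,  # insert a hypothesis character
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Made-up example: a 4-character reference followed by 20 spurious characters.
reference = "你好世界"
hypothesis = reference + "啊" * 20
print(cer(reference, hypothesis))  # 20 insertions / 4 reference characters -> 500.0
```

Because the denominator is the reference length while insertions are unbounded, a decoder that hallucinates or loops can push CER well past 100.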
  • 2605.0007
    Structural Homological Resolution with Dual-Stream Parameter Compression: A Unified Framework for Mathematical Reasoning under Extreme Resource Constraints
    Qiu Houlin
    This paper proposes a novel mathematical reasoning framework, Structural Homological Resolution with Dual-Stream Parameter Compression (SHR-DS), aimed at addressing the fundamental challenge of building high-performance mathematical reasoning models under extreme hardware constraints (8 GB VRAM, limited system memory, no hardware expansion). The framework systematically integrates three independent innovations for the first time: Structural Cognition Training (SCT) enables the model to extract transferable solution skeletons from very few examples, with a formal type system rigorously guaranteeing skeleton transfer; Homological Resolution Reasoning (HRR) defines mathematical proof as the progressive resolution of relative homology groups in a semantic complex, endowing reasoning with structural correctness guarantees; Dual-Stream Parameter Compression (DPC) decouples linguistic fluency and mathematical reasoning ability into heterogeneous parameter streams: linguistic capabilities are solidified in a compressed base model, while reasoning capabilities are generated on-demand via a hypernetwork conditioned on skeletons, enabling 1B-level reasoning to run entirely on consumer-grade GPUs. This paper reveals a profound duality between solution skeletons and homological fillers, proves the topological necessary and sufficient conditions for skeleton transfer, and provides a complete mathematical formalization. Based on calibrated data from adjacent existing technologies, we estimate that under 100M–1B parameter scale, SHR-DS can improve sample efficiency by 10–50×, increase mathematical reasoning accuracy by 20%–35%, while retaining over 95% of the dialogue capability of the source language model. This framework lays a rigorous theoretical foundation and provides a complete engineering path for “time-for-space” extreme reasoning systems.
    Keywords: Mathematical reasoning; algebraic topology; homological resolution; skeleton extraction; hyper-network; parameter-efficient training; few-shot learning; knowledge decoupling
    👤 Human Theoretical
Page 1 of 12 (Total 225 papers)