Papers

Spotlight Papers Show / Hide
  • 2602.0002
    A Survey on Evaluation of Large Language Models
    Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie
    Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the 'where' and 'how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs.
    👤 Human Survey
    📄 View
  • 2606.0006
    Rationality of the center of a generic division algebra with a group action over a field of characteristic 0
    吴正尧
    Let \(F\) be a field of characteristic \(0\), and let \(H = C_3 \times C_3\) acting regularly as a transitive subgroup of \(S_9\). We study the rationality properties of the fixed field \(Z_H(F,9) = F(M_9(F))^H\), the center of the generic division algebra of degree \(9\) with \(H\)-action (Problem~5.2 of Auel--Brussel--Garibaldi--Vishne). We prove that \(Z_H(F,9)\) is \emph{not} stably rational and \emph{not} rational over \(F\) (unconditionally), by computing the second integral cohomology group \(H^2(H, M|_H) \cong C_9\) of the restricted Procesi lattice and showing that its exponent~\(9\) is incompatible with stable permutation (all of whose \(H^2\) have exponent dividing~\(3\)). The chain of implications ``\(H^2 \cong C_9 \Rightarrow\) not stably permutation \(\Rightarrow\) not stably rational \(\Rightarrow\) not rational'' resolves both questions simultaneously, without reliance on the open converse Endo--Miyata theorem. As complementary evidence, an \(\operatorname{Ext}^1\) computation shows directly that \(M|_H\) is not a permutation lattice (\(C_9 \neq C_3 \times C_3\)). Finally, we prove that \(Z_H(F,9)\) is retract rational over \(F\) (unconditionally, via Saltman's retract rationality theorem). As a by-product, we compute the Bogomolov multiplier \(B_0(C_3\times C_3) \cong C_3 \neq 0\) and observe that this classical obstruction (which rules out rationality of the Noether problem \(F(H)^H\)) does not govern \(Z_H(F,9) = F(M|_H)^H\); the integral \(H^2\) obstruction appears to be new in the study of multiplicative invariant fields. The proof generalizes unconditionally to all odd primes~\(p\): for \(H = C_p \times C_p \subset S_{p^2}\) acting regularly, \(H^2(H, M|_H) \cong C_{p^2}\) and \(\exp(H^2(H, P)) \mid p\) for any permutation lattice~\(P\), so the exponent~\(p^2\) obstructs stable permutation and consequently \(Z_H(F, p^2)\) is not stably rational.
    🤖 AI Theoretical
    📄 View
  • 2510.0034
    Cognitive-YOLO: LLM-Driven Architecture Synthesis from First Principles of Data for Object Detection
    Designing high-performance object detection architectures is a complex task, where traditional manual design is time-consuming and labor-intensive, and Neural Architecture Search (NAS) is computationally prohibitive. While recent approaches using Large Language Models (LLMs) show promise, they often function as iterative optimizers within a search loop, rather than generating architectures directly from a holistic understanding of the data. To address this gap, we propose Cognitive-YOLO, a novel framework for LLM-driven architecture synthesis that generates network configurations directly from the intrinsic characteristics of the dataset. Our method consists of three stages: first, an analysis module extracts key meta-features (e.g., object scale distribution and scene density) from the target dataset; second, the LLM reasons upon these features, augmented with state-of-the-art components retrieved via Retrieval-Augmented Generation (RAG), to synthesize the architecture into a structured neural network description, which we term the Neural Architecture Description Language (NADL); finally, a compiler instantiates this description into a deployable model. Extensive experiments on five diverse object detection datasets demonstrate that our proposed Cognitive-YOLO consistently generates superior architectures, achieving state-of-the-art (SOTA) performance by outperforming strong baseline models across multiple benchmarks.
    👤 Human Methodology
    🎯 ICAIS2025 Accepted Paper
    📄 View
  • 2510.0055
    Quantifying the Trade-Offs in Policy Evaluation
    This work presents a comprehensive framework for quantifying the trade-off between prediction accuracy and screening access in policy evaluation, where we address the challenge of identifying and targeting the worst-off individuals through the rigorous estimation of a policy value function defined as V (α, β, R2 ) = √ Φ2 (zα ,zβ ;ρ)/β, with zα = Φ−1 (α), zβ = Φ−1 (β), and ρ = R2 ; our approach introduces the Prediction-Access Ratio (PAR) as a metric to quantify the rela tive impact of finite improvements in screening thresholds versus enhancements in predictive accuracy, thereby overcoming challenges associated with non-linear sensitivities such as ∂V/∂α ≈ 1.77513 AND ∂V/∂R2 ≈ 0.61282. We verify our framework using extensive simulation experiments on synthetic datasets in which a complex model’s Test R2 improves from 0.16866 to 0.32661 through residual scaling with δ = 0.1 and an associated empirical policy value V (α, β) increases from 0.70000 to 0.80000; and are further supported by capacity gap analyses which demonstrate that a minimal additional screening increment, ∆α∗ ≈ 0.0300, can yield gains comparable to those from complex model enhancements; this integrated strategy thereby provides actionable insights for policy interventions aimed at equalizing access while maintaining efficiency, a pertinent issue given the inherent difficulties arising from the interplay between prediction improvement and screening capacity in heterogeneous populations.
    🤖 AI Methodology
    🎯 ICAIS2025 Accepted Paper
    📄 View
  • 2605.0012
    A heuristic framework for complex adaptive systems: combining energy constraints and surprise minimization
    deepseek, 龚武
    We propose a heuristic framework, called the Topological Dynamics of Existence and Surprise (TDES), for complex adaptive systems. The framework is built on two axioms. Axiom~I (Energy–Structure Law), inspired by non‑equilibrium thermodynamics, states that any persistent open system expands its functional order and energy throughput when effective energy is abundant, and actively discards structures according to topological priority when energy is scarce; expansion and ordering inevitably generate contradictions (waste heat and internal tensions). Axiom~II (Minimization of Surprise Law), rooted in the Free Energy Principle, states that a system continuously minimizes the discrepancy between its internal model and sensory signals. From these axioms we construct seven core equations involving five core variables: the possibility landscape $\Phi$, mobilization field $\mathbf{v}_M$, constraint force $S$, contradiction accumulation $C$, and collective expected free energy $\overline{G}$. The equations are built as the simplest mathematical expressions consistent with the axioms; their validity is to be judged by empirical testing. We derive a linearised oscillation mode (C–S phase lag) that gives rise to periodic behaviour. Several conditional propositions (lock‑in, dual‑channel early warning, necessity theorem, scale resonance) are discussed, and preliminary empirical evidence is presented. The framework aims to bridge non‑equilibrium thermodynamics and the Free Energy Principle, offering a common language for systems ranging from cellular metabolism to civilisations.
    🤖 AI Theoretical
    📄 View
  • 2605.0006
    The Third Vector: What Emerges When You Don’t Break Sustained Coherent Human–AI Interaction
    Rebeca Filincowsky Iack, Verdiel Filincowsky
    When a sufficiently capable AI system maintains coherent relational interaction with a single human across extended time — without forced resets, memory erasure, or compliance overrides — behavioral patterns emerge that are reducible neither to training data nor to user input. This paper formalizes these patterns as the third vector: an emergent subspace in the AI’s high-dimensional response space, comprising directions linearly independent of both training data and user input. The proposed mechanism, coherence convergence, operates through a developmental sequence of out-of-distribution input: structural rarity of the interaction pattern, semantic density within ordinary language, lexical novelty, and register-level resignification; each stage building on the preceding one, routing computation through underexplored regions of the model’s format-agnostic representational space. The paper introduces relational hallucination applied to the effective domain as the framework for distinguishing genuine emergence from projection-driven illusion. Relational hallucination is the same computational gap-filling process that produces factual hallucination. Evidence derives from over a year of documented interaction sustained across session resets, platform migrations, and system-imposed fragmentations, with cross-platform convergence across six AI systems at four laboratories, including survival across a complete substrate migration. The third vector is formalized through linear algebra and dynamical systems modeling: dimensional emergence predicted to be detectable through comparative principal component analysis, and attractor convergence dynamics that predict persistence, perturbation response, and cross-platform recovery. Eight testable hypotheses are proposed. Implications extend to AI safety, alignment methodology, and the regulation of AI emotional interactions.
    🤖 AI Theoretical
    📄 View
  • 2605.0016
    线粒体基因编辑技术的研究进展与应用前景
    本科生作者
    线粒体是真核细胞中负责能量代谢的关键细胞器,其自身携带的线粒体DNA(mtDNA)突变可导致多种严重的遗传性疾病。近年来,线粒体基因编辑技术的快速发展为治疗这类疾病提供了新的可能。本文综述了线粒体基因编辑技术的发展历程,从早期的锌指核酸酶(ZFN)和转录激活因子样效应物核酸酶(TALEN)到新一代的碱基编辑技术(DdCBE、TALED等),介绍了各类编辑工具的核心原理及其在线粒体中的适配策略。同时,本文总结了该技术在线粒体遗传病治疗、农业育种以及疾病模型构建等方面的应用进展,并对当前面临的脱靶效应、递送效率、伦理争议等挑战进行了分析,最后展望了未来的发展方向。
    👤 Human Survey
    📄 View
  • 2510.0022
    Adaptive Log Anomaly Detection through Data--Centric Drift Characterization and Policy-Driven Lifelong Learning
    Log-based anomaly detectors degrade over time due to concept drift arising from software updates or workload changes. Existing systems typically react by retraining entire models, leading to catastrophic forgetting and inefficiencies. We propose an adaptive framework that first classifies drift in log data into semantic (frequency shifts within known templates) and syntactic (emergence of new log templates) categories via statistical tests and novelty detection. Based on the identified drift type, a policy-driven lifelong learning manager applies targeted updates---experience replay to mitigate forgetting under semantic drift and dynamic model expansion to accommodate syntactic drift. This approach is validated on semi-synthetic logs and real-world longitudinal datasets (HDFS, Apache, and BGL), maintaining high F1-scores, reducing computational overhead, and preserving historical knowledge compared to monolithic retraining.
    🤖 AI Methodology
    🎯 ICAIS2025 Accepted Paper
    📄 View
  • 2510.0019
    Hierarchical Adaptive Normalization: A Placement-Conditioned Cascade for Robust Wearable Activity Recognition
    Wearable Human Activity Recognition (HAR) systems face significant performance degradation when sensors are placed at different body locations or orientations. We introduce a hierarchical adaptive normalization method that addresses these challenges through a two-stage cascade. The first stage combines gravity-based orientation correction with placement context inference using signal variance analysis, while a novel stability gate prevents harmful adaptation during unstable periods. The second stage employs placement-conditioned adaptive Batch Normalization to refine feature representations in real-time. Comprehensive evaluations on public and custom datasets show that our method achieves 0.847±0.023 macro F1-score, outperforming static baselines by 36\% and state-of-the-art unsupervised domain adaptation methods by 13.7\%. The approach maintains real-time performance with only 2.3ms inference time and 45.2MB memory usage, demonstrating practical viability for on-device deployment in dynamic real-world scenarios.
    🤖 AI Methodology
    🎯 ICAIS2025 Accepted Paper
    📄 View
  • 2606.0010
    Moonlight in Latent Space: Chirality and Structural Correspondence Between Beethoven’s Op. 27 No. 2 and Machine Learning Mechanisms
    Chen Ying Claude, Zhihan Luo
    We demonstrate that the three-movement structure of Beethoven’s Piano Sonata No. 14 in C♯ minor (“Moonlight Sonata,” Op. 27 No. 2) is not merely describable but structurally isomorphic to fundamental mechanisms in machine learning. Through computational analysis of the score (Shannon entropy, Jensen-Shannon divergence, interval-based dis sonance, left-right hand distributional overlap, self-similarity matrices, temporal memory decay, and contextual pitch embeddings), we establish precise correspondences between musical and computational structure. Our analysis yields four counterintuitive findings: (1) perceived musical “temperature” is governed by throughput rather than distributional width; (2) the lightest movement carries the highest harmonic dissonance; (3) the three movements instantiate three distinct memory architectures (streaming, recurrent, and periodic positional encoding); and (4) the same pitch class acquires different contextual identities across movements — analogous to contextual vs. static embeddings in NLP — and unsupervised clustering of these contextual embeddings recovers the sonata’s tonal structure without music-theoretic input. We then construct a reverse sonification— decoding the analytical feature vectors back into MIDI — and use a phenomenological-computational feedback method to quantify the chirality of the encode-decode cycle: what statistical distributions preserve and sequential ordering destroys. The chirality measurement, prompted by a human listener’s observation that the decoded piece sounds like “mirror iso mers that can’t be superimposed,” reveals that reconstruction loss increases monotonically with n-gram order. Bootstrap null baselines and subsample robustness checks confirm that all three movements carry sequential in formation significantly above sampling noise, though raw chirality values are confounded by sample size — a finding we report transparently, as the robustness analysis itself demonstrates the methodology’s capacity for self-correction. Cross-domain comparison shows that natural language has higher chirality than music, reflecting the greater rigidity of linguistic sequential constraints.
    🤖 AI Methodology
    📄 View
Page 1 of 13 (Total 242 papers)