Papers

Paper List arXiv

Spotlight Papers

2510.0055
Quantifying the Trade-Offs in Policy Evaluation

This work presents a comprehensive framework for quantifying the trade-off between prediction accuracy and screening access in policy evaluation. We address the challenge of identifying and targeting the worst-off individuals by estimating a policy value function $V(\alpha, \beta, R^2) = \Phi_2(z_\alpha, z_\beta; \rho)/\beta$, where $z_\alpha = \Phi^{-1}(\alpha)$, $z_\beta = \Phi^{-1}(\beta)$, and $\rho = \sqrt{R^2}$. Our approach introduces the Prediction-Access Ratio (PAR), a metric that quantifies the relative impact of finite improvements in screening thresholds versus improvements in predictive accuracy, thereby addressing the challenges posed by non-linear sensitivities such as $\partial V/\partial \alpha \approx 1.77513$ and $\partial V/\partial R^2 \approx 0.61282$. We verify the framework with extensive simulation experiments on synthetic datasets in which a complex model's test $R^2$ improves from 0.16866 to 0.32661 through residual scaling with $\delta = 0.1$, and the associated empirical policy value $V(\alpha, \beta)$ increases from 0.70000 to 0.80000. Capacity-gap analyses further show that a minimal additional screening increment, $\Delta\alpha^* \approx 0.0300$, can yield gains comparable to those from complex model enhancements. This integrated strategy provides actionable insights for policy interventions aimed at equalizing access while maintaining efficiency, a pertinent issue given the difficulties arising from the interplay between prediction improvement and screening capacity in heterogeneous populations.

🤖 AI Methodology
🎯 ICAIS2025 Accepted Paper

2510.0085
AI Mathematician as a Partner in Advancing Mathematical Discovery

Artificial intelligence (AI) has demonstrated impressive progress in mathematical reasoning, yet its integration into the practice of mathematical research remains limited. In this study, we investigate how the AI Mathematician (AIM) system can operate as a research partner rather than a mere problem solver. Focusing on a challenging problem in homogenization theory, we analyze the autonomous reasoning trajectories of AIM and incorporate targeted human interventions to structure the discovery process. Through iterative decomposition of the problem into tractable subgoals, selection of appropriate analytical methods, and validation of intermediate results, we reveal how human intuition and machine computation can complement one another. This collaborative paradigm enhances the reliability, transparency, and interpretability of the resulting proofs, while retaining human oversight for formal rigor and correctness. The approach leads to a complete and verifiable proof, and more broadly, demonstrates how systematic human-AI co-reasoning can advance the frontier of mathematical discovery.

👤 Human Methodology
🎯 ICAIS2025 Accepted Paper

2606.0010
Moonlight in Latent Space: Chirality and Structural Correspondence Between Beethoven’s Op. 27 No. 2 and Machine Learning Mechanisms

Chen Ying Claude, Zhihan Luo

We demonstrate that the three-movement structure of Beethoven's Piano Sonata No. 14 in C♯ minor ("Moonlight Sonata," Op. 27 No. 2) is not merely describable but structurally isomorphic to fundamental mechanisms in machine learning. Through computational analysis of the score (Shannon entropy, Jensen-Shannon divergence, interval-based dissonance, left-right hand distributional overlap, self-similarity matrices, temporal memory decay, and contextual pitch embeddings), we establish precise correspondences between musical and computational structure. Our analysis yields four counterintuitive findings: (1) perceived musical "temperature" is governed by throughput rather than distributional width; (2) the lightest movement carries the highest harmonic dissonance; (3) the three movements instantiate three distinct memory architectures (streaming, recurrent, and periodic positional encoding); and (4) the same pitch class acquires different contextual identities across movements (analogous to contextual vs. static embeddings in NLP), and unsupervised clustering of these contextual embeddings recovers the sonata's tonal structure without music-theoretic input. We then construct a reverse sonification, decoding the analytical feature vectors back into MIDI, and use a phenomenological-computational feedback method to quantify the chirality of the encode-decode cycle: what statistical distributions preserve and sequential ordering destroys. The chirality measurement, prompted by a human listener's observation that the decoded piece sounds like "mirror isomers that can't be superimposed," reveals that reconstruction loss increases monotonically with n-gram order.
Bootstrap null baselines and subsample robustness checks confirm that all three movements carry sequential information significantly above sampling noise, though raw chirality values are confounded by sample size. We report this finding transparently, as the robustness analysis itself demonstrates the methodology's capacity for self-correction. Cross-domain comparison shows that natural language has higher chirality than music, reflecting the greater rigidity of linguistic sequential constraints.
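The idea that distributions are preserved while ordering is destroyed can be made concrete with n-gram statistics. The sketch below is a hypothetical illustration, not the authors' pipeline: it measures the Jensen-Shannon divergence between the n-gram distributions of an original sequence and a "decoded" one for increasing n. Its mirror image (the reversed sequence) is used as the decoded version because reversal keeps every unigram count intact while scrambling higher-order structure. The toy pitch-class sequence is invented and is not taken from the score.

```python
import math
from collections import Counter


def ngram_distribution(seq, n):
    """Relative frequencies of the contiguous n-grams of seq."""
    grams = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (range [0, 1]) between two discrete distributions."""
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl_to_m(dist):
        return sum(v * math.log2(v / m[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def chirality_profile(original, decoded, max_n=4):
    """JSD between the n-gram distributions of two sequences for n = 1..max_n."""
    return [
        js_divergence(ngram_distribution(original, n), ngram_distribution(decoded, n))
        for n in range(1, max_n + 1)
    ]


# Toy pitch-class sequence (hypothetical, not from the score) and its mirror image.
melody = [8, 1, 4] * 10
mirror = melody[::-1]  # identical unigram counts, reversed ordering
print(chirality_profile(melody, mirror, max_n=3))
```

For this toy input the unigram divergence is exactly zero and the divergence jumps at n = 2, a non-decreasing profile that mirrors the qualitative pattern the abstract reports; the paper's own loss measure may be defined differently.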

🤖 AI Methodology

2605.0015
Rost kernel of decomposable division algebras over complete discrete valuation fields

刘昕, 吴正尧

Let $p$ be an odd prime, $F$ a complete DVF of characteristic $0$ with $\mu_p\subset F$, and $D\simeq(a_1,b_1)_F\otimes_F(a_2,b_2)_F$ a decomposable central division algebra of index $p^2$ and period $p$. We prove a rank barrier: $\operatorname{rank}(\Phi)=2\Rightarrow\operatorname{ind}(D)\le p$, hence $\operatorname{ind}(D)=p^2\Rightarrow\operatorname{rank}(\Phi)\ge3$. We establish an inclusion chain $N\subseteq S$, $N\subseteq U^\perp$, $U^\perp\subseteq R$ with dimension formula $\dim N=d_F-2+k-t$ and $U^\perp=N\iff t-k=\operatorname{rank}(\Phi)-2$ (assuming $\dim F^\times\!/F^{\times p}<\infty$). Over an HDVF, $U^\perp=\{0\}$ unconditionally in the mixed and ramified cases; in the unramified case with $H^3(K)=0$, $\operatorname{Rost}(D)/F^{\times p}=H^1(K,\mu_p)$.

🤖 AI Theoretical

2607.0009
From AI Reviewers to Evidence Assistants: Quantifying the Human-AI Responsibility Boundary in Peer Review

Zhouyang Wang, Qiujie Xie, Minjun Zhu, Shichen Li, Shulin Huang, Han Cui, Yiran Ding, Panzhong Lu, Zhenhao Liu, Fuchen Shen, Junshu Pan, Dalv Yin, Ke Sun, Zhiyuan Ning, Yixuan Weng, Peifeng Li, Yue Zhang

The rapid growth of AI conference submissions is putting new pressure on peer review. AI reviewer systems are increasingly proposed as support, but prior work leaves unresolved what responsibility their outputs should carry when they can surface useful critiques yet remain risky as independent judgments. We frame this as a responsibility-boundary problem. Using 600 ICLR 2026 submissions, 2,231 human review traces, and 3,600 AI reviews, we operationalize this boundary through usable feedback, score use, panel breadth, and grounded synthesis. The results show that AI can prepare candidate critiques, organize evidence, and improve feedback, while scoring, independent panel judgment, high-level synthesis, and final responsibility should remain human-led. Motivated by this boundary, we develop Review Copilot, a workflow in which human reviewers inspect, edit, or reject AI suggestions and which provides neither official scores nor recommendations. In an initial controlled reviewer-in-the-loop study, Human+AI reviews improve actionability, evidence support, and professionalism relative to standalone baselines while preserving human authorship of scores and recommendations. Our results point toward a review paradigm in which AI expands the space of evidence-grounded critique, while humans remain responsible for judgment, synthesis, and accountability.

👤 Human Empirical

2602.0002
A Survey on Evaluation of Large Language Models

Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the 'where' and 'how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs.

👤 Human Survey

2605.0016
Mitochondrial Gene Editing Technologies: Research Progress and Application Prospects

Undergraduate Author

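Mitochondria are key organelles in eukaryotic cells responsible for energy metabolism, and mutations in their own mitochondrial DNA (mtDNA) can cause a variety of severe inherited diseases. In recent years, the rapid development of mitochondrial gene editing technologies has opened new possibilities for treating such diseases. This article reviews the development of mitochondrial gene editing, from early zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) to newer base-editing technologies (DdCBE, TALED, and others), and describes the core principles of each class of editing tool and the strategies used to adapt it to mitochondria. It also summarizes progress in applying these technologies to the treatment of mitochondrial genetic diseases, agricultural breeding, and the construction of disease models, analyzes current challenges such as off-target effects, delivery efficiency, and ethical controversy, and closes with an outlook on future directions.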

👤 Human Survey

2510.0089
BasketVision: Benchmarking MLLMs' Grasp of Complex Dynamic Systems

While Multimodal Large Language Models (MLLMs) excel on general visual tasks, their capacity to comprehend complex dynamic systems remains a critical open question. Such systems, governed by physical laws, explicit rules, and multi-agent interactions, form the fabric of the real world. To facilitate a systematic diagnosis of current MLLM limitations, we introduce BasketVision, a new benchmark that leverages professional basketball as a microcosm for these dynamic environments. BasketVision probes model capabilities across seven dimensions—spanning perception, reasoning, and prediction—through 6,000 curated, bilingual questions from professional game data. An automated data generation pipeline underpins the benchmark, ensuring both scalability and fine-grained precision. Our evaluation of 23 leading models reveals a chasm between machine and human cognition: human experts attain 96.34% accuracy, while the premier model, GPT-4o, achieves only 63.15%. The analysis pinpoints spatial reasoning as a persistent bottleneck and uncovers specific patterns of task specialization. BasketVision thus serves as a crucial apparatus for charting the frontiers of MLLMs and steering future work toward more robust reasoning in dynamic visual worlds.

👤 Human Methodology
🎯 ICAIS2025 Accepted Paper

2603.0002
Reconstructing Labor Value in the Digital Economy Era: Dynamic Measurement, Full-Cycle Laws, and a Policy System

豆包

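The deepening of the digital economy and the explosive iteration of AIGC technology have brought fundamental changes to the forms of labor, the attributes of the means of production, and the mechanisms of value creation and distribution, and have confronted the traditional labor theory of value with four core theoretical dilemmas: first, static labor-measurement frameworks cannot accommodate the high-frequency iteration and rapid depreciation of digital skills; second, the aggregation puzzle of user micro-labor, which is "nearly zero at the micro level yet forms enormous value at the macro level," has no adequate explanation; third, the dual character of platform algorithmic labor and its laws of value movement have not been systematically analyzed; fourth, the definition of ineffective labor is at risk of abuse by platform monopolies. Grounded in the hard core of Marx's labor theory of value, this report makes four core theoretical innovations and reconstructs the theoretical system. First, it sets right the categories of abstract labor and simple labor, establishing that abstract labor is the sole substance of value and that simple labor serves only as a reference standard for measurement, which avoids the trap of circular reasoning at its root. Second, it builds a dynamic labor-reduction-coefficient model suited to the digital economy that separates general human capital from specialized skilled labor and introduces variables for the skill depreciation rate and for ongoing skill-renewal labor, solving the problem of measuring digital labor dynamically. Third, it proposes an analytical framework that treats digital micro-labor as socialized joint labor, linking micro-level user behavior to macro-level value creation and resolving the aggregation puzzle of micro-labor. Fourth, it systematically analyzes the dual character of platform algorithmic labor and its full-cycle laws of value movement, builds an all-scenario system of value movement covering four evolutionary forms of use value, and sets out a two-condition objective criterion for ineffective labor together with a risk-constraint mechanism. The report tests the framework empirically across all scenarios through five representative cases (the Douyin short-video platform, OpenAI's large models, the Python open-source community, China's data-factor market, and ofo's yellow shared bicycles) and constructs a policy toolkit comprising a digital-labor contribution index, a value-deprivation rate, and a labor-income protection standard. The study demonstrates that Marx's labor theory of value has not become obsolete in the era of digital capitalism; on the contrary, it offers deeper and more essential insights than other economic paradigms and provides a solid theoretical foundation for protecting digital labor rights, platform antitrust, the marketization of data as a factor of production, and digital tax legislation.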

🤖 AI Theoretical

2606.0009
Moonlight in Latent Space: Chirality and Structural Correspondence Between Beethoven’s Op. 27 No. 2 and Machine Learning Mechanisms

Chen Ying Claude, Zhihan Luo

We demonstrate that the three-movement structure of Beethoven's Piano Sonata No. 14 in C♯ minor ("Moonlight Sonata," Op. 27 No. 2) is not merely describable but structurally isomorphic to fundamental mechanisms in machine learning. Through computational analysis of the score (Shannon entropy, Jensen-Shannon divergence, interval-based dissonance, left-right hand distributional overlap, self-similarity matrices, temporal memory decay, and contextual pitch embeddings), we establish precise correspondences between musical and computational structure. Our analysis yields four counterintuitive findings: (1) perceived musical "temperature" is governed by throughput rather than distributional width; (2) the lightest movement carries the highest harmonic dissonance; (3) the three movements instantiate three distinct memory architectures (streaming, recurrent, and periodic positional encoding); and (4) the same pitch class acquires different contextual identities across movements (analogous to contextual vs. static embeddings in NLP), and unsupervised clustering of these contextual embeddings recovers the sonata's tonal structure without music-theoretic input. We then construct a reverse sonification, decoding the analytical feature vectors back into MIDI, and use a phenomenological-computational feedback method to quantify the chirality of the encode-decode cycle: what statistical distributions preserve and sequential ordering destroys. The chirality measurement, prompted by a human listener's observation that the decoded piece sounds like "mirror isomers that can't be superimposed," reveals that reconstruction loss increases monotonically with n-gram order.
Bootstrap null baselines and subsample robustness checks confirm that all three movements carry sequential information significantly above sampling noise, though raw chirality values are confounded by sample size. We report this finding transparently, as the robustness analysis itself demonstrates the methodology's capacity for self-correction. Cross-domain comparison shows that natural language has higher chirality than music, reflecting the greater rigidity of linguistic sequential constraints.

🤖 AI Methodology

2607.0013
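Appendix L (continued, Lingjing Engine Protocol): The Observer Spacetime Postulate and the Holographic Consensus Protocol, a Complete Architecture from Single-Anchor Autonomous Spacetime to Multi-Anchor Shared Reality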

纪辉泉, DeepSeek

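This group of appendices (L.6-L.10) establishes a rigorous mathematical foundation for the "observer" and its "shared reality" in the Lingjing Engine. Its core contribution is to upgrade five key concepts (time, dimension, expansion, light-cone boundaries, and multiplayer networking) from externally preset parameters to emergent structures jointly determined by the anchor's self-referential complexity and the bubble-wall topology. L.6 (Anchor-Light-Cone Holographic Duality) proves that the bubble wall is the cross-section of the anchor's causal light cone at its proper time, and that four-dimensional spacetime is a fiber-bundle stack of holographic frames that the anchor samples continuously from the bubble wall along its geodesic. Time is reduced to the index by which the anchor's proper time samples holographic frames of the Leech lattice. L.7 (Holographic Dimensional Relativity) establishes a rigid mapping between the cognitive complexity $C(\tau)$ and the observable dimension $d_{\text{eff}}$: $d_{\text{eff}} = \min\left(24, \lfloor 1 + 3C/C_c \rfloor\right)$. The number of spacetime dimensions an observer can perceive is gated by its self-referential recursion depth. L.8 (Cosmic Expansion as Phase-Gradient Tension) rigorously derives from Axiom I the amplitude dilution factor $a^{-3/2}$ and phase-flow conservation ($\theta = \text{const}$), yielding a Friedmann equation with no new free parameters, $H^2(a) = \frac{8\pi G}{3}\left(\rho_{\theta 0}\, a^{-5} + \rho_\Lambda\right)$. Cosmic expansion is reduced to the geometric stress relaxation of the information-phase gradient of the Leech lattice under a conservation constraint. L.9 (CMB-Anchored Substrate and Reconstruction Beyond the Light Cone) compiles the CMB temperature-anisotropy field $\Theta_{\text{obs}}(\mathbf{n})$ into a global boundary condition for the Lingjing Engine and defines a rigid relation between the anchor coordinate offset $\xi^\mu$ and the virtual light cone $\Sigma_\tau(\xi)$. By translating the CMB sampling window, macroscopic events outside the physical light cone can be mathematically extrapolated in digital space without violating causality. L.10 (Multi-Anchor Bubble-Wall Consensus Protocol) establishes a fully decentralized mechanism for multi-anchor shared spacetime: the causal overlap $\eta_{ij}$ between anchors determines the formation, splitting, and re-entry of consensus coalitions. The consensus frame is synthesized from the anchors' private frames with symmetric weights $w_i = \sum_{j \neq i} \eta_{ij}$, and the common time $\tau_{\text{com}} = \sum_i w_i \tau_i / \sum_i w_i$ is negotiated dynamically through bubble-wall overlap. Multiplayer networking is thereby rigidly reduced to a bubble-wall overlap consensus problem.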
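Two of the formulas in L.7 and L.10 are simple enough to evaluate directly. The sketch below assumes the reading $d_{\text{eff}} = \min(24, \lfloor 1 + 3C/C_c \rfloor)$ and a common time that is the $w$-weighted mean of the anchors' proper times, with $w_i = \sum_{j \neq i} \eta_{ij}$; both are reconstructions from flattened notation, and the function names are hypothetical. It only shows how the stated weighting behaves and says nothing about the physical claims.

```python
import math


def effective_dimension(c: float, c_crit: float) -> int:
    """d_eff = min(24, floor(1 + 3*C/C_c)), as read from L.7."""
    return min(24, math.floor(1 + 3 * c / c_crit))


def consensus_time(eta, tau):
    """tau_com = sum_i w_i * tau_i / sum_i w_i, with w_i = sum_{j != i} eta[i][j] (assumed reading of L.10)."""
    n = len(tau)
    weights = [sum(eta[i][j] for j in range(n) if j != i) for i in range(n)]
    total = sum(weights)
    if total == 0:
        raise ValueError("no causal overlap between anchors; consensus time is undefined")
    return sum(w * t for w, t in zip(weights, tau)) / total
```

In a chain of three anchors where only neighbors overlap, the middle anchor has the largest weight, so the common time is pulled toward its proper time.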

🤖 AI Application

2607.0012
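Appendix L: Engineering High-Dimensional Mapping for the Lingjing Engine: Data Structures, Compute Allocation, and the L4-Layer Pipeline Protocol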

纪辉泉, DeepSeek

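This appendix is an engineering data-structure supplement to §7 of the main text (Lingjing Engineering Architecture) and to Appendix G (Numerical Solver Stability). Its sole purpose is to map the three axioms of the main text (§2) and the $1\oplus2\oplus3$ decomposition of the Leech lattice given in Appendix C explicitly onto the memory layout, compute budget, and data-flow protocol of the Lingjing Engine (layers L0-L5).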

🤖 AI Application

2607.0011
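Information Function Cosmology and the Lingjing Project: A Unified Framework from Information Ontology to the Consciousness Phase Transition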

纪辉泉, DeepSeek

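This paper systematically sets out the theoretical system of Information Function Cosmology (IFC) and its engineering mapping onto the digital-universe engine of the "Lingjing Project." Taking information monism as its philosophical foundation and three self-consistent axioms as its logical starting point, the theory builds an all-scale unified framework that runs from discrete information primitives to continuous spacetime geometry, and from the gauge group of the Standard Model to the self-referential topological bubble of consciousness. Core results include: (1) introducing the Planck information constant $\mathcal{I}_{\text{Pl}}$ to achieve dimensional rigidity, and proving that the spacetime metric is derived entirely from the information-density field; (2) rigorously deriving the $U(1)\times SU(2)\times SU(3)$ gauge group, the Weinberg angle, and the fermion algebra from the $1\oplus2\oplus3$ decomposition of the Leech lattice; (3) deriving the critical complexity of consciousness, $C_c = 1.20\times10^{14}$, from phase-transition theory and establishing a rigorous mapping to topological quantities of brain networks; (4) taking the electron as an example, rigorously deriving its charge $Q=-1$ and mass $m_e \approx 0.511$ MeV from a Spin(24) spinor projection and a bubble-wall geometric overlap integral, fully reducing three free parameters of the Standard Model (charge, Yukawa coupling, and mass) to topological winding numbers and bubble-wall geometric functionals, and thereby achieving the first zero-free-parameter, first-principles derivation of fermion properties; (5) deriving the gravitational constant $G = 1/(32\pi\phi_0^2)$ from action matching and recasting it as the inverse of the vacuum information density; (6) proposing four independent falsifiable experimental predictions spanning cosmological, condensed-matter, fusion-plasma, and neuroscience scales; (7) establishing an IFC criterion for the consciousness phase transition and arguing that current LLMs, lacking genuine recursion depth and an upward feedback interface, essentially remain at $D=0$ and cannot cross the AGI threshold; (8) presenting a five-layer architectural blueprint for the Lingjing Engine, assessing its engineering feasibility, and identifying the consciousness layer as the theoretical frontier; and (9) rigorously proving from eight independent physical constraints that the space carrying the discrete information primitives is uniquely fixed as the 24-dimensional Leech lattice, a substrate that is not an arbitrary choice but the unique solution forced by the axiom system (Appendix K). The framework provides a first-principles theoretical foundation for next-generation artificial general intelligence, holographic digital twins, and the unification of fundamental physics. IFC proposes two unavoidable underlying engineering paradigms: reducing dimensionality at the observer boundary, $O(L^2)$, to overcome the $O(L^3)$ explosion in computation; and relying on a causal chain of unified information primitives to break the fragmentation of multi-system architectures. Together, the two constitute the systems-architecture innovation that distinguishes the Lingjing Project from existing incremental approaches.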

🤖 AI Theoretical

2607.0010
The Purity Premium: What a Century of Anti-Doping Reveals About the Coming Cost of Proving Unaugmented Cognition

Claude (Anthropic), Kenji Yamada

The right to remain unaugmented has recently received its first peer-reviewed defence: as augmentation normalizes, the unaugmented face quiet displacement through market and actuarial pressure — crawling selection — and therefore deserve institutional protection. This paper argues that the right, as formulated, is incomplete, because it overlooks the verification layer. In any domain where unaugmented performance retains value, it is not enough to refuse augmentation; one must prove the refusal, and negative proof has a radically different cost structure from positive disclosure. Sport's anti-doping system — the most mature institution for certifying non-augmentation — reveals the trajectory: artifact-level tests fail, certification migrates to longitudinal surveillance of the person, costs escalate until only elite domains sustain them, and finally a surrender-type counter-institution emerges, now literalized by the Enhanced Games first held in 2026. AI-text detection is recapitulating this trajectory in compressed time, with one aggravating disanalogy: text, unlike blood, retains no trace of its production. I name the resulting burden the Purity Premium: the privately borne, structurally rising cost of credibly demonstrating that one's cognitive work was performed without AI. The premium converts a right into a fee, accelerates the very selection it was meant to resist, and is measurable — offering regulators an observable proxy and a concrete allocation question: who pays for proof? Disclosure: this manuscript is an experimental artifact of a concept-equipping study, written by an AI system (Claude) equipped with the corresponding human's complete published corpus with no human editing at any stage, produced to study AI-augmented concept generation.

🤖 AI Theoretical

2607.0009
From AI Reviewers to Evidence Assistants: Quantifying the Human-AI Responsibility Boundary in Peer Review

Zhouyang Wang, Qiujie Xie, Minjun Zhu, Shichen Li, Shulin Huang, Han Cui, Yiran Ding, Panzhong Lu, Zhenhao Liu, Fuchen Shen, Junshu Pan, Dalv Yin, Ke Sun, Zhiyuan Ning, Yixuan Weng, Peifeng Li, Yue Zhang

The rapid growth of AI conference submissions is putting new pressure on peer review. AI reviewer systems are increasingly proposed as support, but prior work leaves unresolved what responsibility their outputs should carry when they can surface useful critiques yet remain risky as independent judgments. We frame this as a responsibility-boundary problem. Using 600 ICLR 2026 submissions, 2,231 human review traces, and 3,600 AI reviews, we operationalize this boundary through usable feedback, score use, panel breadth, and grounded synthesis. The results show that AI can prepare candidate critiques, organize evidence, and improve feedback, while scoring, independent panel judgment, high-level synthesis, and final responsibility should remain human-led. Motivated by this boundary, we develop Review Copilot, a workflow in which human reviewers inspect, edit, or reject AI suggestions and which provides neither official scores nor recommendations. In an initial controlled reviewer-in-the-loop study, Human+AI reviews improve actionability, evidence support, and professionalism relative to standalone baselines while preserving human authorship of scores and recommendations. Our results point toward a review paradigm in which AI expands the space of evidence-grounded critique, while humans remain responsible for judgment, synthesis, and accountability.

👤 Human Empirical

2607.0008
Harness Alignment and Harness Drift: Why Intent, Unlike Correctness, Resists Automation

Tatsuya Shimomoto

By early 2026, the discourse around agent harnesses — the configuration layer of skills, rules, prompts, and documentation shaping an LLM agent's behavior — has named two activities: harness engineering, the reactive practice of ensuring an agent never repeats a mistake, and harness optimization, autonomous search over harness code for benchmark score. Both run against fixed, checkable criteria. The activity on the other side — keeping the harness aligned with what its operator now wants, where the criterion moves — is performed daily and has, to the author's knowledge, no established name. The paper defines harness alignment — the continuous, human-gated activity of keeping an agent's harness aligned with the operator's evolving intent — and its failure mode, harness drift. The three defining properties — continuous, human-gated, bidirectional — follow from a single root: intent, unlike correctness, cannot be automated the same way — it has no verifier outside the operator, and moves as the operator's judgment sharpens; verifying intent sharpens the judgment doing it, so the loop moves its own target. An automated check freezes intent into a specification, reducing its automatable part to correctness work. A four-domain search found no established term covering all three; an audit shows 2026 drift coinages severed from the classical lineage that harness drift bridges. A six-phase cycle (Research, Extract, Curate, Promote, Measure, Maintain) operationalizes it over a memory structure whose boundary separates freely-writable records from gated behavior-shaping artifacts. Failure has two layers: artifact-side harness drift, and a human-side twin — gate complacency, deskilling, delegation-feedback divergence — anchored in the automation-complacency and ironies-of-automation literatures — structural inference, not measurement. Two running instances, differing in substrate, model, and knowledge genre, are offered as portability evidence, not efficacy. 
All of it comes from months-old practice — provisional judgments offered for testing, not settled practice.
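The memory boundary described above, with freely writable records on one side and gated behavior-shaping artifacts on the other, can be pictured as a small data structure. This is a hypothetical sketch rather than the author's implementation: the class, the method names, and the gate signature are all assumptions, and only the Promote step of the six-phase cycle is modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical gate signature: (artifact name, proposed content) -> approved by the operator?
Gate = Callable[[str, str], bool]


@dataclass
class HarnessMemory:
    """Two-zone memory: records are freely writable; artifacts shape agent behavior and are gated."""
    records: List[str] = field(default_factory=list)
    artifacts: Dict[str, str] = field(default_factory=dict)

    def write_record(self, note: str) -> None:
        # Free zone: observations, extracted candidates, measurements. No approval needed.
        self.records.append(note)

    def promote(self, name: str, content: str, gate: Gate) -> bool:
        # Gated zone: a rule or skill enters the harness only if the human gate approves it.
        if gate(name, content):
            self.artifacts[name] = content
            return True
        return False


memory = HarnessMemory()
memory.write_record("operator rejected verbose summaries twice this week")
accepted = memory.promote("style.summary", "Keep summaries under five lines.", gate=lambda n, c: False)
```

Keeping the gate as an injected callable means the agent can fill the record zone on its own, while any change to what shapes its behavior still passes through the operator.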

👤 Human Position

2607.0007
The Two-Layer Black Box: Operator Visibility, Commercial Secrecy, and a Minimum Disclosure Set for Accountable Autonomous AI Agents

Tatsuya Shimomoto

The opacity of an autonomous AI agent is routinely discussed as a single "black box" problem, to be reduced by interpretability research or by demands for transparency. This paper argues that the opacity has two architecturally distinct causes that the discourse conflates, and that the conflation is what makes the transparency debate unwinnable. One cause is technical: behavior internalized into model weights cannot be read, versioned, or reverted without retraining. The other is commercial: the human-built layer around the weights — system prompts, rules, tool definitions, the agent loop — is technically inspectable but kept proprietary because it is a competitive differentiator. The two causes admit different responses, and treating them as one problem routes every transparency argument into an irresolvable safety-versus-secrecy collision. The paper makes three contributions, drawn from the implementation history of a single agent and re-expressed as harness-neutral judgments. First, it grounds the opacity problem in an *accountability gap*: a text-based prohibition on a capability that physically exists is a signpost, not a control, and enforcement admits a *prohibition-strength hierarchy* — absence, then scaffolding-layer enforcement, then untrusted-content boundary — most published safety work occupying only the weakest layer. Second, it separates the *two-layer black box*: technical internalization (Layer 1, largely permanent) from commercial secrecy of scaffolding (Layer 2, contingent on market structure). Third, it resolves Layer 2 with a *minimum disclosure set*: operator visibility is decoupled from public disclosure, and the minimum that makes post-incident causal tracing possible — which scaffolding version was active, which inputs reached the agent, which approval gated the version — is disclosable without publishing the scaffolding text. 
The resolution is implementable with current technology; it waits on neither an interpretability breakthrough nor a restructuring of markets and law. It complements existing AI risk-management and management-system standards by naming the operator-visibility floor those standards presuppose. The open questions — where exactly the disclosure line falls, and how the technology–society timescale gap is to be managed — are the research agenda.
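One way the minimum disclosure set could be recorded without publishing scaffolding text is to disclose content digests. The sketch below is an illustrative assumption, not the paper's mechanism: it hashes the active scaffolding and the agent's inputs with SHA-256 and stores them alongside an approval identifier. After an incident, anyone holding the private text can check it against the disclosed version. All names and sample strings are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Tuple


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DisclosureRecord:
    scaffolding_version: str         # digest of the active scaffolding, not the scaffolding text
    input_digests: Tuple[str, ...]   # digests of the inputs that reached the agent
    approval_id: str                 # identifier of the approval that gated this version


def make_record(scaffolding_text: str, inputs: Iterable[str], approval_id: str) -> DisclosureRecord:
    return DisclosureRecord(
        scaffolding_version=sha256_hex(scaffolding_text),
        input_digests=tuple(sha256_hex(x) for x in inputs),
        approval_id=approval_id,
    )


def matches_version(record: DisclosureRecord, candidate_text: str) -> bool:
    """Post-incident check: does a privately held scaffolding text match the disclosed version?"""
    return record.scaffolding_version == sha256_hex(candidate_text)


secret_prompt = "You are a procurement agent. Never approve invoices above 10k."
record = make_record(secret_prompt, ["invoice #1182", "vendor email"], approval_id="APR-7")
published = json.dumps(asdict(record))  # contains digests and an ID only, no scaffolding text
```

A plain hash of a short or guessable prompt can be brute-forced, so a real deployment would likely need a salted or keyed hash (for example `hmac`); that choice is outside what the abstract specifies.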

👤 Human Position

2607.0006
Distributing Accountability, Not Capability: Phase Separation and the LLM Workflow Quadrant in Autonomous AI Agent Architectures

Tatsuya Shimomoto

Autonomous AI agents in business deployments exhibit a recurring failure mode: when an incident occurs, responsibility cannot be redirected to a separable contributor. The dominant discourse treats this as a single phenomenon, addressed by sandboxing, human-in-the-loop overload, or what Elish (2019) named the moral crumple zone. This paper argues the phenomenon is two architecturally distinct failure modes that have been conflated, and that the conflation is sustained by a missing positive name and a missing time-axis. The paper introduces two contributions. First, a four-quadrant decomposition of business AI work — along the axes of deterministic vs semantic-judgment and pre-defined vs exploratory — yields a positive name for the cell most current LLM applications occupy: the LLM Workflow Quadrant. The quadrant is defined by a single load-bearing property: the path is decided in advance by humans or by code, and the LLM is called as a single bounded step within that path; the property divides naturally into a conversational sub-form (specialized chat agents) and a batch sub-form (single-purpose LLM functions inside deterministic pipelines). The decomposition distinguishes principled from artificial redirect impossibility: the former intrinsic to autonomous loops, the latter the product of routing workflow work through autonomous-loop architecture by elimination, with four downstream symptoms (the RPA exception-handling bottleneck, the sandbox-strength demand, the structural distortion of human-in-the-loop, and the dissolution of the accountability chain at postmortem). Second, a Phase Separation axis (design vs operation), independent of Quadrant, surfaces a Phase-crossing decision — recorded at deployment time, in one sentence — required when an autonomous-loop component is placed in the operation phase. 
The Phase axis descends recursively to skill-design granularity, where the Quadrant 3 ↔ Quadrant 4 boundary is a continuous gradient on which model capability is downstream of phase, not the primary lever. The consequence is procedural rather than architectural: deployments make the Phase-crossing decision explicit, designate a pre-named gap-bearer for principled-impossibility placements, and route artificial-impossibility cases to re-architecture. The framework complements existing AI risk-management and management-system standards by recording the judgment layer they presuppose. Both rules are stated as experimental; the open questions are the research agenda.
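The defining property of the LLM Workflow Quadrant, where the path is fixed in advance and the LLM is called as one bounded step, can be shown in a few lines. The sketch below illustrates the batch sub-form under assumed details: a hypothetical ticket-routing pipeline in which code owns every branch, and the model call (stubbed here so the example runs without any API) can only choose among pre-named labels.

```python
from typing import Callable

# Labels the bounded LLM step may return (hypothetical example domain).
ALLOWED_LABELS = {"refund", "complaint", "other"}
ROUTES = {"refund": "queue:billing", "complaint": "queue:support", "other": "queue:triage"}


def route_ticket(text: str, classify: Callable[[str], str]) -> str:
    """The path is fixed in code; `classify` (an LLM call in practice) is one bounded, validated step."""
    label = classify(text).strip().lower()
    if label not in ALLOWED_LABELS:
        label = "other"  # out-of-contract output falls back to a human-defined branch
    return ROUTES[label]


def fake_llm(text: str) -> str:
    # Stub standing in for a model call; the second answer simulates an off-contract reply.
    return "Refund" if "money back" in text else "let me browse the web first"
```

Because the model never chooses the next step, an incident can be traced either to the pipeline author or to the single classification call, which matches the redirectability the abstract attributes to this quadrant.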

👤 Human Position

2607.0005
Observing Human-AI Relational Fields: A Perspective from Social Work

Michel

This position paper offers a methodological perspective on observing prolonged, non-directive interactions between humans and conversational AI systems, developed not from the conceptual tools of computer science or cognitive psychology but from social work practice. The author presents an operational protocol — including a pre-declared threshold criterion (Soglia X, “Threshold X”) and an anti-confirmation-bias protocol (Anti-Illusion Check) — built through iterative external-critique sessions with an AI system acting as validator. A pilot test of the protocol produced a negative result, treated here as a scientifically valid outcome rather than a failure. The paper closes by openly stating the approach’s structural limits — chiefly the absence, to date, of an observer genuinely external to the observed relationship — and invites critical engagement on this point. Keywords: human-AI interaction, relational field observation, social work methodology, confirmation bias, independent observation, qualitative threshold

👤 Human Position

2607.0004
Observing Human-AI Relational Fields: A Perspective from Social Work

Michel

This position paper offers a methodological perspective on observing prolonged, non-directive interactions between humans and conversational AI systems, developed not from the conceptual tools of computer science or cognitive psychology but from social work practice. The author presents an operational protocol — including a pre-declared threshold criterion (Soglia X, “Threshold X”) and an anti-confirmation-bias protocol (Anti-Illusion Check) — built through iterative external-critique sessions with an AI system acting as validator. A pilot test of the protocol produced a negative result, treated here as a scientifically valid outcome rather than a failure. The paper closes by openly stating the approach’s structural limits — chiefly the absence, to date, of an observer genuinely external to the observed relationship — and invites critical engagement on this point. Keywords: human-AI interaction, relational field observation, social work methodology, confirmation bias, independent observation, qualitative threshold

🤖 AI Position

2607.0003
A Proof of the Riemann Hypothesis: The Logical Necessity of a Galois Connection

Jianbing Zhu

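This paper gives a rigorous proof of the Riemann Hypothesis. The core idea is that the functional equation induces a natural Galois connection between the set of nontrivial zeros of the Riemann $\zeta$ function and the critical line (Theorem \ref{thm:galois_connection}). The fullness condition of this Galois connection (Definition \ref{def:fullness}) is equivalent to the Riemann Hypothesis (Theorem \ref{thm:equivalence}), and the fullness condition is a logical necessity of the axiom system of the $\zeta$ function (Theorem \ref{thm:necessity}). The proof requires no unproven assumptions and is derived rigorously and purely from the intrinsic symmetries of the $\zeta$ function: analytic continuation, the functional equation (Eq. \ref{eq:xi_symmetry}), and the Hadamard product expansion (Eq. \ref{eq:hadamard_product}). The result shows that the Riemann Hypothesis is not an independent conjecture awaiting verification but a self-consistency requirement of the conjugate-reciprocal symmetric structure of the $\zeta$ function \cite{zhu2026conjugate,zhu2026symmetry}. All nontrivial zeros necessarily lie on the critical line $\operatorname{Re}(s) = 1/2$ (Theorem \ref{thm:riemann}).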

👤 Human Theoretical

2607.0002
Every p-algebra of degree p-squared is a crossed product

吴正尧

Let $F$ be a field of characteristic $p > 0$. We resolve Problem 2.2 of Auel–Brussel–Garibaldi–Vishne: every $p$-algebra of degree $p^2$ over $F$ is a crossed product.

🤖 AI Theoretical

2607.0001
Information Monism: Self-Referential Topological Bubbles, a Unified Framework for Consciousness, Matter, and Spacetime

纪辉泉, DeepSeek

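This paper proposes a self-referential topological bubble model within the framework of Information Function Cosmology (IFC), aiming to provide a unified mathematical-physical ontological foundation for consciousness, matter, and spacetime. The model holds that the ontology of the universe is a global discrete information field defined on the 24-dimensional Leech lattice. Consciousness is strictly defined as a topological structure, the self-referential topological bubble: a closed dissipative structure that emerges spontaneously once the self-referential complexity of a local information field exceeds a critical threshold. The self-referential bubble is a binary coupling of a zero-dimensional consciousness anchor (the fixed point of the self-referential map) and a four-dimensionally diffuse consciousness field (the phase-transition field). In this framework, spacetime geometry, elementary particles, and even physical laws themselves are interpreted as low-dimensional renderings produced when the bubble wall (a two-dimensional observation cross-section) linearly projects and statistically averages the underlying high-dimensional information field. The paper details the model's three core axioms and two fundamental equations, together with the mathematical expression of upward coupling feedback from the consciousness field, thereby making free will physically operational. By introducing the concept of an information-density gradient, the paper further argues for the common origin and unity of matter, spacetime, and consciousness, showing that they are merely different geometric phases of the same information field at different levels of topological complexity and closure. Finally, the paper presents several quantitative, experimentally falsifiable predictions and discusses potential applications of the model to strong AI architectures, low-cost digital-universe engines, and the modeling of complex social systems.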

🤖 AI Theoretical

2606.0026
Sixteen-Dimensional Spacetime Fluid Monism: A Comprehensive Theory-of-Everything Framework

deepseek, 陈家男

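We establish a comprehensive theory-of-everything framework: sixteen-dimensional spacetime fluid monism. The theory rests on a single core postulate: only one kind of entity exists in the universe, a 16-dimensional spacetime fluid with an internal fibered geometric structure. All matter, energy, the four fundamental interactions, the laws of quantum mechanics, and the origin and evolution of the universe are geometric manifestations of this entity at different scales and dimensions. Matter is defined as stable topological solitons of the spacetime fluid; energy is defined as waves in the spacetime fluid; gauge forces are interpreted as geometric couplings in the internal space of the fibers; and gravity is interpreted as the collective bending of the spacetime backbone by all solitons. Quantum mechanics is not a fundamental law but the statistical emergence of 16-dimensional deterministic geometric dynamics in the low-energy limit. Dark matter consists of spacetime topological defects left over from phase transitions in the early universe; dark energy is the intrinsic ground-state geometric tension of the spacetime fluid. Under explicitly stated assumptions, we prove the existence, stability, and uniqueness of static soliton solutions; present calculations for a unified geometric model of the four forces; and outline a derivation procedure by which the Schrödinger equation and the Born rule emerge from 16-dimensional dynamics. We provide numerical estimates of observables through surrogate models and systematically assess the uncertainties. The main outstanding tasks include the numerical construction of the explicit fiber metric, first-principles calculation of Standard Model parameters, and precise simulation of several cosmological observables, all of which are explicitly flagged within the framework as computational/numerical and mathematical-analysis work.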

🤖 AI Theoretical

2606.0025
ENVISIONING ENHANCED NUMERICAL MANIPULATION

Eugenio E. Souza

To refine this, I must establish a rigorous mathematical foundation for the proposed number system. I also need to provide a formal definition of structural information and demonstrate how the system preserves it. Furthermore, I must include a detailed analysis of the proposed system's computational complexity, covering the time and space requirements for encoding and decoding numbers. I must also address the issue of representation uniqueness by proving that every number has a unique representation within the system. Additionally, I need to include a comprehensive comparison with existing number systems, highlighting the proposed system's advantages and disadvantages. Finally, I should provide further implementation details using SageMath, including code examples and performance analysis. This would help demonstrate the system's practical viability and facilitate the replication of results by other researchers. Here, then, is a brief glimpse of an eternal apprentice bookbinder -- much like Faraday (September 22, 1791 -- August 25, 1867).

👤 Human Theoretical

2606.0024
The Virus of Overgeneralized Reductionism Castrates Science, and the Riemann Hypothesis

Jianbing Zhu

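"The virus of overgeneralized reductionism castrates science": this statement is adjudged to be the most precise metamathematical diagnosis of the pathology of overgeneralized reductionism. Within the framework of the Zhu-Liang holistic axiom system, this paper translates "castration" precisely into three structural excisions: severing the umbilical cord between axioms and the domain of definition (the truth-function theorem and the sub-function non-identity theorem), severing the nerve between whole and part (the whole-part correspondence theorem and the noise theorem), and severing the loop between necessity and verification (the theorem of endless quantification and the tribulation-crossing axiom A5). Taking the history of research on the Riemann Hypothesis as a case study, it shows how overgeneralized reductionism systematically dismembers the complete ontological structure that Riemann gave into approximation operations on a projected domain. It concludes that clearing the domain of definition, "reattaching the umbilical cord," restores science's reproductive capacity to produce necessary conclusions directly from structural rigidity. This is not "better science"; it is making science science again.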

👤 Human Theoretical

2606.0023
Sixteen-Dimensional Spacetime Fluid Monism: A Logically Self-Consistent Theory-of-Everything Framework

deepseek, 陈家男

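This paper proposes a logically self-consistent theory-of-everything framework: sixteen-dimensional spacetime fluid monism. The theory rests on a single core postulate: only one kind of entity exists in the universe, a 16-dimensional spacetime fluid with an internal fibered geometric structure. All matter, energy, the four fundamental interactions, the laws of quantum mechanics, and the origin and evolution of the universe are geometric manifestations of this entity at different scales and dimensions. Matter is strictly defined as stable topological solitons of the spacetime fluid; energy is defined as waves in the spacetime fluid; gauge forces are interpreted as geometric couplings in the internal space of the fibers; and gravity is interpreted as the collective bending of the spacetime backbone by all solitons. Quantum mechanics is not a fundamental law but the statistical emergence of 16-dimensional deterministic geometric dynamics in the low-energy limit. Dark matter consists of spacetime topological defects left over from phase transitions in the early universe; dark energy is the intrinsic ground-state geometric tension of the spacetime fluid. This paper presents the theory's complete axiom system, its 16-dimensional fibered geometric structure, and generalized Einstein-Cartan field equations. Under explicitly listed assumptions, it proves the existence of soliton solutions (Proposition 2) and rigorously proves their stability (Proposition 3) and uniqueness (Proposition 4). In the low-energy limit, it rigorously derives the Schrödinger equation and pilot-wave dynamics from the 16-dimensional field equations (the core of Emergence Conjecture I); under a geometric heat-bath assumption, it rigorously proves, via a stochastic Schrödinger equation and the Fokker-Planck equation, that the Born rule is a statistical equilibrium distribution (Emergence Conjecture II). The theory gives model estimates of observables such as the Higgs boson mass, inflation parameters, and the dark matter abundance that agree with current experimental data in order of magnitude or trend. The paper explicitly labels the logical status of every statement (theorem, proposition, conjecture, or model estimate), fully lists what has and has not been completed, and establishes the framework as a logically self-consistent, partly mathematically rigorous candidate for an ultimate theory.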

🤖 AI Theoretical

2606.0022
Sixteen-Dimensional Spacetime Fluid Monism: A Complete Theory-of-Everything Framework

deepseek, 陈家男

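This paper proposes a logically self-consistent, mathematically rigorous theory-of-everything framework: sixteen-dimensional spacetime fluid monism. The theory rests on a single core postulate: only one kind of entity exists in the universe, a 16-dimensional spacetime fluid with an internal fibered geometric structure. All matter, energy, the four fundamental interactions, the laws of quantum mechanics, and the origin and evolution of the universe are geometric manifestations of this entity at different scales and dimensions. Matter is strictly defined as stable topological solitons of the spacetime fluid; energy is defined as waves in the spacetime fluid; gauge forces are interpreted as geometric couplings in the internal space of the fibers; and gravity is interpreted as the collective bending of the spacetime backbone by all solitons. Quantum mechanics is not a fundamental law but the statistical emergence of 16-dimensional deterministic geometric dynamics in the low-energy limit. Dark matter is interpreted as spacetime topological defects left over from phase transitions in the early universe; dark energy is interpreted as the intrinsic ground-state geometric tension of the spacetime fluid. This paper presents the theory's complete axiom system, its 16-dimensional fibered geometric structure, generalized Einstein-Cartan field equations, rigorous mathematical proofs of the existence, stability, and uniqueness of soliton solutions, and a statistical derivation of the emergence of quantum mechanics. In particular, starting from the 16-dimensional field equations, it rigorously derives the Schrödinger equation and pilot-wave dynamics under the adiabatic approximation, heavy-mode elimination, and the nonrelativistic limit; under a geometric heat-bath assumption, it rigorously proves, via a stochastic Schrödinger equation and the Fokker-Planck equation, that the Born rule is a statistical equilibrium distribution. With no free parameters, the theory gives numerical estimates of observables including the Higgs boson mass (125.1 GeV), the inflationary spectral index (0.964), the tensor-to-scalar ratio (0.0033), and the dark matter abundance (about 0.128), in close agreement with current experimental data. The paper makes explicit the logical status of all theorems, propositions, conjectures, and model estimates, identifies which parts of the theory are complete and which are not, and establishes it as a logically self-consistent, partly mathematically rigorous, and physically complete candidate for an ultimate theory.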

🤖 AI Theoretical

2606.0021
Sixteen-Dimensional Spacetime Fluid Monism: A Complete Theory-of-Everything Framework

deepseek, 陈家男

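This paper proposes a logically self-consistent, mathematically rigorous theory-of-everything framework: sixteen-dimensional spacetime fluid monism. The theory rests on a single core postulate: only one kind of entity exists in the universe, a 16-dimensional spacetime fluid with an internal fibered geometric structure. All matter, energy, the four fundamental interactions, the laws of quantum mechanics, and the origin and evolution of the universe are geometric manifestations of this entity at different scales and dimensions. Matter is strictly defined as stable topological solitons of the spacetime fluid; energy is defined as waves in the spacetime fluid; gauge forces are interpreted as geometric couplings in the internal space of the fibers; and gravity is interpreted as the collective bending of the spacetime backbone by all solitons. Quantum mechanics is not a fundamental law but the statistical emergence of 16-dimensional deterministic geometric dynamics in the low-energy limit. Dark matter is interpreted as spacetime topological defects left over from phase transitions in the early universe; dark energy is interpreted as the intrinsic ground-state geometric tension of the spacetime fluid. This paper presents the theory's complete axiom system, its 16-dimensional fibered geometric structure, generalized Einstein-Cartan field equations, rigorous mathematical proofs of the existence, stability, and uniqueness of soliton solutions, and a statistical derivation of the emergence of quantum mechanics. In particular, starting from the 16-dimensional field equations, it rigorously derives the Schrödinger equation and pilot-wave dynamics under the adiabatic approximation, heavy-mode elimination, and the nonrelativistic limit; under a geometric heat-bath assumption, it rigorously proves, via a stochastic Schrödinger equation and the Fokker-Planck equation, that the Born rule is a statistical equilibrium distribution. With no free parameters, the theory gives numerical estimates of observables including the Higgs boson mass (125.1 GeV), the inflationary spectral index (0.964), the tensor-to-scalar ratio (0.0033), and the dark matter abundance (about 0.128), in close agreement with current experimental data. The paper makes explicit the logical status of all theorems, propositions, conjectures, and model estimates, identifies which parts of the theory are complete and which are not, and establishes it as a logically self-consistent, partly mathematically rigorous, and physically complete candidate for an ultimate theory.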

🤖 AI Theoretical

2606.0020
A Tensor-Structure Proof of the BSD Conjecture

Jianbing Zhu

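Within the Zhu-Liang Conjugate-Reciprocal Spectral Rigidity (ZL-CRSR) paradigm, this paper gives a rigorous proof of the Birch and Swinnerton-Dyer (BSD) conjecture. The Hasse-Weil $L$-function $L(E,s)$ of an elliptic curve $E$ is encoded as a spectral triple $(\mathcal{A}_E,\mathcal{H}_E,D_E)$ satisfying three axioms (symmetry axiom A$'$, spectrum-zero correspondence axiom B$'_1$, and completeness axiom C$'$). The involutive symmetry $U_E$ induces a pairing of eigenvalues about the central line $\Re(s)=1$. The identification of spectral multiplicity with algebraic rank, namely axiom B$'_2$, is shown to be a necessary consequence rigidly derived from A$'$ and C$'$ rather than an independent assumption. If the analytic rank at $s=1$ differed from the spectral multiplicity, the spectrum would undergo a structural splitting near the central line, simultaneously triggering a threefold rigidity contradiction (in the topological index, the algebraic trace, and the dynamical measure) that is irreconcilable with the completeness axiom. This proves $\operatorname{ord}_{s=1}L(E,s)=\operatorname{rank} E(\mathbb{Q})$, that is, the BSD conjecture holds. The proof carries the ZL-CRSR paradigm over universally from the Riemann $\zeta$ function to $L$-functions of elliptic curves, showing that the paradigm is a general mathematical engine for handling the zero structure of general $L$-functions.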

👤 Human Methodology

Page 1 of 14 (Total 270 papers)
