Papers
-
2510.0064 | Train for the Worst, Plan for the Best: Enhancing Token Ordering in Masked Diffusions
Masked diffusion models (MDMs) have emerged as a powerful paradigm for generative modeling over discrete domains. However, their training often involves solving computationally intractable problems, while their inference capabilities remain underutilized. In this work, we propose to enhance the performance of MDMs by introducing adaptive inference strategies that allow for dynamic token ordering during decoding. We demonstrate that by sidestepping computationally heavy subproblems, pretrained MDMs can achieve significant performance improvements on complex tasks such as logic puzzles. Our experiments show that adaptive inference boosts Sudoku solving accuracy from less than 7% to approximately 90%, even outperforming autoregressive models with significantly more parameters. This work opens new avenues for leveraging the strengths of MDMs in discrete generative tasks.
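The adaptive, easiest-first ordering described in this abstract can be sketched in a few lines; `decode_easiest_first` and the toy scorer below are hypothetical stand-ins for a pretrained MDM's per-position predictive confidences, not the paper's actual models.

```python
def decode_easiest_first(probs_fn, seq_len):
    """Greedy adaptive ordering: at each step, commit the masked position
    whose current prediction is most confident (the 'easiest' subproblem).

    probs_fn(filled) maps each still-masked position to (best_token, confidence);
    it stands in for a pretrained MDM's denoiser and is purely illustrative.
    """
    filled = {}   # position -> decoded token
    order = []    # positions in the order they were decoded
    while len(filled) < seq_len:
        candidates = probs_fn(filled)
        pos, (tok, _conf) = max(candidates.items(), key=lambda kv: kv[1][1])
        filled[pos] = tok
        order.append(pos)
    return [filled[i] for i in range(seq_len)], order

# Toy "model": a position is confident only once its right neighbor is known,
# so easiest-first naturally decodes right-to-left instead of left-to-right.
def toy_probs(filled):
    out = {}
    for i in range(4):
        if i not in filled:
            conf = 0.9 if (i + 1 in filled or i == 3) else 0.4
            out[i] = (i * 10, conf)
    return out

tokens, order = decode_easiest_first(toy_probs, 4)
print(order)   # [3, 2, 1, 0]
```

A real implementation would score positions with the model's masked-token posterior and could commit several tokens per step; the greedy loop only illustrates the ordering principle.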
-
2510.0063 | Dynamic Intent Adaptation for Long-Term Dialogue Systems Using Reinforcement Learning
This paper addresses the challenge of enabling large language models (LLMs) to dynamically discover and adapt to user intents during long-term interactions. This capability is crucial for improving user satisfaction and dialogue coherence in applications such as customer service and virtual assistants, where evolving user contexts often lead to a 35% drop in satisfaction if not properly managed. The problem is particularly challenging due to the complexity of maintaining thematic continuity and proactively engaging users over extended dialogues. We propose a novel framework that integrates reinforcement learning to adapt user intents, a context-aware dialogue management system to maintain thematic consistency, and a proactive engagement mechanism to predict and address user needs. Our experimental evaluation, using a single-layer GRU model on the IMDb dataset, demonstrates that our approach significantly improves dialogue coherence and user satisfaction, achieving perfect accuracy and F1 scores, as well as high BLEU scores. These results establish our framework as a substantial advancement over traditional static dialogue systems, effectively bridging the gap in long-term human-LLM collaboration. Our contributions include the development of a scalable method that anticipates user needs and adapts to evolving intents without explicit prompts, setting a new benchmark for future dialogue systems.
-
2510.0059 | Adaptive AI Governance: Mitigating Income Inequality through Predictive Analytics and Dynamic Policy Frameworks
The paper addresses the critical issue of AI-induced income inequality, focusing on developing an adaptive AI governance model that integrates real-time data analytics and local economic contexts to mitigate labor market disruptions. As AI technologies rapidly transform global labor markets, they pose a significant risk of job displacement and income disparity, necessitating adaptable governance frameworks. The challenge lies in creating a globally applicable model that accurately reflects diverse economic environments, predicts AI's long-term impacts, and balances innovation with worker protection. Our proposed solution is a sophisticated predictive analytics platform employing machine learning, Monte Carlo simulations, and agent-based modeling to simulate AI adoption scenarios and their effects on labor markets. Experiments utilizing a shallow MLP architecture on the ag_news dataset demonstrate consistent prediction accuracy, with Mean Absolute Error (MAE) values ranging from 0.2518 to 0.2849, although R-squared scores were negative, indicating limitations in data representation. The main contributions of this study include a novel governance model that anticipates and mitigates AI's socio-economic impacts, offering dynamic policy recommendations tailored to local conditions. This research provides a foundation for future work on enhancing model accuracy and applicability by incorporating more comprehensive datasets and complex architectures.
-
2510.0058 | Adaptive Inference Strategies for Token-Ordering
Adaptive token-ordering strategies for masked diffusion models (MDMs) and autoregressive models (ARMs) are critical for addressing the inherent imbalance in subproblem difficulties during sequence generation, which becomes increasingly relevant as models scale to complex reasoning tasks. In this work, we tackle the challenge of dynamically adjusting the token generation order via a reinforcement learning framework that optimizes the cumulative predictive V-information, formally defined as I_V(X → Y) = H_V(Y | ∅) − H_V(Y | X), to preferentially solve easier subproblems first. Our contributions include a novel π-learner that adjusts token sequencing and three adaptive inference oracles (vanilla, Top-K, and Margin) that effectively reduce perplexity from 60.0 to 52.0 while preserving token diversity (entropy shifting from 4.8 to 4.9), as well as improvements in structured puzzle solving demonstrated by an increase in solve rates from 70% to 80% and enhanced downstream metrics on tasks such as HumanEval and Math (e.g., pass@1 scores improving from 60% to 66%). Experimental validation spans scaling-law analyses, where validation NLL drops from approximately +3.0 at 10^9 FLOPs to −5.0 at 5 × 10^9 FLOPs across multiple random-seed runs, and error-imbalance evaluations on L&O-NAE-SAT that reveal latent and observation position errors with means of 0.7976 and 0.9724, respectively. Collectively, these results confirm that adaptive token ordering not only mitigates computational intractability in hard token predictions but also enhances both likelihood-based metrics and generalization performance over fixed ordering strategies.
-
2510.0057 | Adaptive Prompt-Enhanced Score Matching for Partially Observed Data
Adaptive prompt-enhanced score matching for partially observed data addresses the challenging problem of recovering score functions from datasets with significant missing entries, where traditional imputation methods or naïve score estimators often fail to achieve reliable parameter recovery and structural inference. In our work, we consider both marginal Importance-Weighted (Marg-IW) and marginal Variational (Marg-Var) approaches to estimate the score function, using a surrogate mean squared error loss, where s_θ(x) is the estimated score, computed as −P(x − µ), and s_true(x) = −P_true(x − µ_true), with P_true representing the true precision matrix. This formulation inherently accounts for the missingness mechanism, typically modeled as MCAR with a missing rate of 30%, and is further stabilized via techniques such as log-sum-exp and gradient clipping. Our contributions include the integration of a meta-learning prompt generator, which dynamically selects key hyperparameters (e.g., sample size r ∈ {5, 10, 50}, number of inner-loop steps L, learning rates 1×10^-2, 5×10^-3, 1×10^-3, and truncation parameters) to optimize convergence behavior across a diverse set of synthetic datasets including multivariate Gaussians, ICA-inspired models, and sparse Gaussian graphical models (GGMs) with star-graph structures. Experimental results demonstrate significant improvements: for instance, in the Gaussian experiment the loss reduced from 9.687 at iteration 50 to 0.094 at iteration 300 and the corresponding parameter error decreased from 3.033 to approximately 2.030, while in the GGM case the ROC AUC improved from 0.219 to 0.97, thereby confirming our method's efficacy in both parameter estimation and structure recovery under partial observations.
These empirical validations underscore the relevance of adaptive score matching in high-dimensional and complex data regimes, set against the inherent difficulties of handling missing data and ensuring numerical stability in the estimation process, and pave the way for future extensions to accommodate MNAR scenarios and diffusion-based denoising score matching frameworks.
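As a concrete, deliberately simplified reading of the surrogate loss, the sketch below compares an estimated Gaussian score s_theta(x) = -P(x - mu) to the true score on fully observed 2-D samples; the paper's Marg-IW/Marg-Var estimators, which additionally handle the 30% MCAR missingness, are not reproduced here.

```python
import random

def score(x, P, mu):
    # s(x) = -P (x - mu) for a Gaussian with 2x2 precision matrix P and mean mu
    d = [xi - mi for xi, mi in zip(x, mu)]
    return [-(P[i][0] * d[0] + P[i][1] * d[1]) for i in range(2)]

def surrogate_mse(P_est, mu_est, P_true, mu_true, samples):
    """Mean squared error between the estimated score and the true score,
    averaged over samples. Illustrative surrogate objective only: the paper's
    estimators also reweight for the missingness mechanism."""
    total = 0.0
    for x in samples:
        se = score(x, P_est, mu_est)
        st = score(x, P_true, mu_true)
        total += sum((a - b) ** 2 for a, b in zip(se, st))
    return total / len(samples)

random.seed(0)
P_true, mu_true = [[2.0, 0.5], [0.5, 1.0]], [0.0, 0.0]
xs = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
print(surrogate_mse(P_true, mu_true, P_true, mu_true, xs))               # 0.0 at the truth
print(surrogate_mse([[1.0, 0.0], [0.0, 1.0]], mu_true, P_true, mu_true, xs) > 0)  # True
```

Minimizing this loss over (P_est, mu_est) recovers the precision matrix, which is what enables the structure recovery (e.g., GGM edge detection) reported in the abstract.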
-
2510.0056 | Ensemble-Based Bayesian Aggregation with Uncertainty-Guided Clarifications for Multi-Turn Human-LLM Collaboration
Our work addresses the challenge of optimizing long-term multi-turn human-LLM collaboration by introducing an ensemble of Monte Carlo-based reward predictors, Bayesian meta-calibration, and an uncertainty-guided clarification module that dynamically triggers clarifying interactions. In particular, we estimate the conversation-level reward as R*(t | g) = R_ext(t, g) + R_int(t), where R_ext(t, g) quantifies task-specific success (e.g., BLEU scores reaching up to 80% in document editing and unit test pass rates near 70% in code generation) and R_int(t) incorporates an efficiency penalty defined as −min[λ · TokenCount(t), 1] with λ = 0.01, augmented by an LLM-based interactivity score. Our approach further employs Bayesian linear regression to aggregate the ensemble signals into a unified reward while simultaneously providing an uncertainty metric which, if it exceeds a predefined threshold (e.g., 0.15), triggers an auxiliary clarification round that improves the aggregated outcome. This mechanism is mathematically formulated and empirically validated through improvements such as an increase in accuracy from 73.9% to 79.9% in mathematical problem solving and a resolution of ambiguous dialogue from 80% to 100%, as reflected in our experiments. Challenges arise due to noisy reward estimations and the trade-off between immediate task performance and long-term conversational quality, which we address via extensive ablation studies on window sizes (w ∈ {1, 2, 3}) and Monte Carlo sample counts (e.g., S ∈ {3, 5}), as summarized in Table 1 (e.g., MediumDocEdit-Chat: BLEU 0.625 → 0.637, BigCodeBench-Chat: Unit Test Pass Rate 0.532 → 0.489, MATH-Chat: Accuracy 0.739 → 0.799, Abg-CoQA: Macro Accuracy/F1 0.8 → 1.0). Overall, this work contributes a robust framework that integrates ensemble learning, uncertainty estimation, and dynamic clarification to effectively enhance the collaborative potential between human users and language models in complex, multi-turn settings.
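The reward decomposition and the uncertainty-triggered clarification rule lend themselves to a short sketch; λ = 0.01 and the 0.15 threshold come from the abstract, while the function names and toy numbers are our own illustrative assumptions.

```python
import statistics

def internal_reward(token_count, lam=0.01):
    # efficiency penalty from the abstract: -min(lam * TokenCount(t), 1)
    return -min(lam * token_count, 1.0)

def conversation_reward(r_ext, token_count, lam=0.01):
    # R*(t|g) = R_ext(t, g) + R_int(t)
    return r_ext + internal_reward(token_count, lam)

def needs_clarification(reward_samples, threshold=0.15):
    # trigger an auxiliary clarification round when the ensemble's
    # reward estimates disagree too much (spread above the threshold)
    return statistics.pstdev(reward_samples) > threshold

print(conversation_reward(0.8, 50))           # R_ext 0.8 minus a 0.5 token penalty
print(conversation_reward(0.8, 500))          # penalty capped at 1, so the reward goes negative
print(needs_clarification([0.1, 0.6, 0.9]))   # high ensemble spread triggers clarification
```

In the paper the uncertainty metric comes from Bayesian linear regression over ensemble signals; the population standard deviation here is a simple proxy for that spread.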
-
2510.0055 | Quantifying the Trade-Offs in Policy Evaluation
This work presents a comprehensive framework for quantifying the trade-off between prediction accuracy and screening access in policy evaluation, where we address the challenge of identifying and targeting the worst-off individuals through the rigorous estimation of a policy value function defined as V(α, β, R²) = Φ₂(z_α, z_β; ρ)/β, with z_α = Φ⁻¹(α), z_β = Φ⁻¹(β), and ρ = √(R²). Our approach introduces the Prediction-Access Ratio (PAR) as a metric to quantify the relative impact of finite improvements in screening thresholds versus enhancements in predictive accuracy, thereby overcoming challenges associated with non-linear sensitivities such as ∂V/∂α ≈ 1.77513 and ∂V/∂R² ≈ 0.61282. We verify our framework using extensive simulation experiments on synthetic datasets in which a complex model's Test R² improves from 0.16866 to 0.32661 through residual scaling with δ = 0.1 and the associated empirical policy value V(α, β) increases from 0.70000 to 0.80000; these results are further supported by capacity-gap analyses which demonstrate that a minimal additional screening increment, Δα* ≈ 0.0300, can yield gains comparable to those from complex model enhancements. This integrated strategy thereby provides actionable insights for policy interventions aimed at equalizing access while maintaining efficiency, a pertinent issue given the inherent difficulties arising from the interplay between prediction improvement and screening capacity in heterogeneous populations.
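A minimal Monte Carlo sketch of the policy value, assuming ρ = √(R²) and using only the standard library; the estimator and names are our own, not the paper's code.

```python
import math
import random
import statistics

def policy_value(alpha, beta, r2, n=200_000, seed=0):
    """Estimate V(alpha, beta, R^2) = Phi_2(z_alpha, z_beta; rho) / beta,
    where Phi_2 is the bivariate standard normal CDF, via Monte Carlo.
    Assumes rho = sqrt(R^2), our reading of the abstract's definition.
    """
    nd = statistics.NormalDist()
    z_a, z_b = nd.inv_cdf(alpha), nd.inv_cdf(beta)
    rho = math.sqrt(r2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # correlated bivariate standard normal via a 2x2 Cholesky factor
        u = rng.gauss(0, 1)
        v = rho * u + math.sqrt(1 - rho * rho) * rng.gauss(0, 1)
        hits += (u <= z_a) and (v <= z_b)
    return (hits / n) / beta

# Sanity check: with rho = 0 the two events are independent,
# so V = (alpha * beta) / beta = alpha.
print(round(policy_value(0.3, 0.5, 0.0), 2))  # approximately 0.3
```

Sensitivities such as ∂V/∂α can then be approximated by finite differences of this estimator, which is how one would reproduce PAR-style comparisons numerically.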
-
2510.0054 | Explorations in Algorithmic Creativity via Next-Token and Multi-Token Approaches
Algorithmic creativity in text generation poses significant challenges in balancing coherence, diversity, and memorization, and our study addresses these challenges by systematically comparing traditional next-token prediction (NTP) with multi-token teacherless prediction (MTP) and discrete diffusion methods (SEDD) across minimal yet representative combinatorial tasks such as Sibling Discovery, Triangle Discovery, Circle Construction, and Line Construction. Our primary objective is to maximize creative output, defined as the fraction of generated samples that satisfy task-specific validity criteria and quantified as ĉ_r = #coherent / #total outputs, and to minimize memorization, observed to drop from 100% under deterministic conditions to near 0% when employing controlled stochasticity, while diversity is measured by D = |{unique outputs}| / #total outputs, with values reaching up to 1.00 in optimized settings. To achieve these ends, we introduce seed-conditioning and temperature scaling, modeled by the parameter T where T = 0 corresponds to greedy decoding and T > 0 introduces controlled noise following the relation p_noise = min(0.9, α × T) with α varying by method, to guide the output generation process, and we formulate an alignment loss to ensure semantic consistency between the restrictive and adaptive prompts. Extensive experimentation and rigorous ablation studies, as summarized in Table 1 (detailing coherence rates between 50% and 80%, memorization rates dropping from 100% to nearly 0%, and diversity metrics peaking at 1.00), validate that both MTP and SEDD outperform NTP under non-deterministic settings and when augmented with seed-conditioning, thereby demonstrating that our hybrid framework not only pushes the boundaries of algorithmic creativity on minimal open-ended tasks but also offers a scalable approach for more complex problem domains.
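The three quantities driving this evaluation (coherence rate, diversity, and the temperature-to-noise map) are simple enough to state directly; the sketch below is an illustrative reading of the definitions, with a toy coherence predicate standing in for the task-specific validity check.

```python
def creativity_metrics(outputs, is_coherent):
    """Coherence rate c_hat = #coherent / #total and diversity
    D = |{unique outputs}| / #total, per the abstract's definitions."""
    total = len(outputs)
    coherent = sum(1 for o in outputs if is_coherent(o))
    return coherent / total, len(set(outputs)) / total

def noise_prob(T, alpha):
    # temperature-controlled noise: p_noise = min(0.9, alpha * T);
    # T = 0 recovers greedy decoding (no noise)
    return min(0.9, alpha * T)

outs = ["ab", "cd", "ab", "zz"]                      # toy generated samples
c, d = creativity_metrics(outs, lambda s: s != "zz") # "zz" plays the incoherent sample
print(c, d)                # 0.75 0.75
print(noise_prob(2.0, 0.3))  # 0.6
print(noise_prob(5.0, 0.3))  # capped at 0.9
```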
-
2510.0052 | Conformal Prediction as Bayesian Quadrature for Risk Control
In this paper, we present a novel framework that leverages Bayesian quadrature for conformal prediction to achieve rigorous, data-conditional, and distribution-free risk guarantees, addressing the challenge of controlling predictive risk in high-stakes, black-box settings. Our approach constructs an upper bound on the expected loss by integrating over the quantile function of the loss distribution, where, given calibration losses ℓ_1, …, ℓ_n, we define the aggregated loss as L⁺ = Σ_{i=1}^{n+1} U_i ℓ_(i), with Dirichlet weights (U_1, …, U_{n+1}) ~ Dir(1, …, 1) and ℓ_(n+1) = B, thereby ensuring that the condition Pr(L⁺ ≤ α) ≥ β is met. Our contributions include a principled derivation that recovers well-known conformal methods such as Split Conformal Prediction (SCP) and Conformal Risk Control (CRC) as special cases, while introducing a novel high posterior density (HPD) rule that exploits the full posterior of L⁺. We rigorously validate our method on synthetic binomial loss and heteroskedastic regression tasks, where experimental results indicate that methods based solely on the posterior mean (CRC) or uniform concentration bounds (RCPS) often yield either overly optimistic or conservative decisions, whereas our HPD rule achieves risk control with zero empirical failure rate and improved utility. For example, in the binomial experiment, while SCP selects an average λ of 0.596 with a 61.6% failure rate, HPD selects λ ≈ 0.970 with a 0% failure rate, and a similar trend is observed in regression tasks with test risks decreasing from 0.512 for SCP to 0.067 for HPD. These findings, summarized in Table 1, confirm that our Bayesian quadrature reformulation not only provides a more interpretable statistical characterization of conformal risk but also adapts effectively to calibration sample size and confidence level tuning, thus offering a robust solution for high-stakes decision-making.
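The Dirichlet-weighted bound can be checked numerically with the standard library, sampling Dir(1, …, 1) as normalized exponentials; this is an illustrative verification of the Pr(L⁺ ≤ α) ≥ β condition, not the paper's HPD decision rule.

```python
import random

def bound_holds(losses, B, alpha, beta, draws=20_000, seed=0):
    """Estimate Pr(L+ <= alpha) by sampling
    L+ = sum_i U_i * l_(i), with U ~ Dir(1,...,1) over the sorted
    calibration losses augmented by the worst case l_(n+1) = B,
    and check it against the target confidence beta."""
    sorted_losses = sorted(losses) + [B]
    rng = random.Random(seed)
    k = len(sorted_losses)
    hits = 0
    for _ in range(draws):
        # Dirichlet(1,...,1) draw via normalized standard exponentials
        e = [rng.expovariate(1.0) for _ in range(k)]
        s = sum(e)
        l_plus = sum(ei / s * li for ei, li in zip(e, sorted_losses))
        hits += l_plus <= alpha
    return hits / draws >= beta

cal = [0.05] * 40   # small calibration losses; B = 1.0 bounds the unseen loss
print(bound_holds(cal, B=1.0, alpha=0.2, beta=0.9))  # expected: True, the Dirichlet
                                                     # weight on B is tiny here
```

Because the weight assigned to the worst case B is Beta(1, n)-distributed, it concentrates near zero as the calibration set grows, which is why the bound adapts to the calibration sample size as the abstract notes.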
-
2510.0051 | COMD: Coherent Masked Diffusion
Masked language models (MLMs) have shown promise in natural language processing, but struggle with generating coherent text. In this work, we present Coherent Masked Diffusion (CoMD), a novel framework that extends Masked Language Diffusion to more efficiently and effectively learn coherent and incoherent language. CoMD is built on Masked Language Diffusion (MLD), a recently proposed framework that models text generation as an inverse denoising diffusion process. Unlike MLD, CoMD uses a fixed mask matrix that is independent of the masked-out token and optimizes the probability of coherent generations with a novel coherence loss term, without requiring additional samples per training step. Additionally, CoMD uses a variable time parameter to guide the coherence probability towards the ground-truth coherence probability. Both inference and training computation are constant with respect to the length of the text. Empirically, CoMD outperforms previous methods on multiple coherence benchmarks. Furthermore, CoMD achieves an inference speedup of 7.3x and 10.5x over MLD and MDLM, respectively, and is significantly more compute- and parameter-efficient than autoregressive models.
-
2510.0050 | U-CAN: User-Guided Clarification for Asking Across Needs
It is still unclear if and how methods developed specifically for asking clarification in retrieval or problem-solving in the academic community can effectively address user needs during human-computer interaction (HCI). In this work, we first propose an Asking Across Needs (AAN) framework to explore the complexities of HCI, including user needs, interaction styles, and interaction types, by building an interaction graph (Pearl, 2009) containing user and LLM actions. Then, we create a new benchmark, User-guided Clarification for Asking Needs (U-CAN), containing task-oriented asking clarification and retrieval-related asking clarification that align with real-world HCI scenarios. Specifically, we design new interaction-graph designs and user-guided prompting techniques based on our AAN framework to address multiple user needs not met in existing HCI studies. We find that task-oriented needs are often left unmet, and that existing methods show performance gaps between simulated and real-world (enrolled students) settings. We also demonstrate that HCI can be facilitated by interaction graphs on retrieval-related asking clarification using our proposed interactive graph model.
-
2510.0049 | Learning Unnormalized Models with Missing Data via Adversarial Score Matching
Learning unnormalized model parameters is a challenging task that frequently arises in various scientific fields. Score matching is a promising method to learn unnormalized models by estimating the score function. However, score matching has several practical challenges in real-world applications, including the need for an auxiliary network to estimate the score function, the requirement for the model to support sampling, and the difficulty of estimating the score function for high-dimensional data. To address these challenges, we propose adversarial score matching (ASM), an adversarial learning algorithm for learning unnormalized models, which does not require an auxiliary network and can be applied to high-dimensional data. We also propose a multilevel Monte Carlo estimator for the score discrepancy, which is computationally more efficient than the traditional importance sampling estimator. In addition, we demonstrate that ASM is a mode-seeking algorithm, which has been observed empirically in a variety of adversarial learning methods. We evaluate the performance of ASM on various unnormalized models and missing data mechanisms, and demonstrate that ASM outperforms existing score matching methods.
-
2510.0045 | PST-AUTO-AGENT: A Multi-Agent Ensemble Framework for Paper Source Tracing
The escalating volume of scientific literature necessitates efficient methods for identifying foundational works that significantly inform new research. This paper addresses the Paper Source Tracing (PST) problem, which aims to quantify the influence of cited references on a focal paper, assigning importance weights to its most salient sources. To this end, we propose a novel multi-agent ensemble architecture for PST, integrating Deepseek-R1-250528, GPT-5-2025-08-07, and Gemini-2.5-pro. Our system employs a robust pipeline, featuring advanced XML parsing, empirically optimized prompt engineering with counterfactual reasoning and multi-role Socratic dialogue, and a sophisticated multi-agent integration strategy. This strategy utilizes weighted model predictions, intelligent default scoring, and a consistency penalty mechanism to derive precise source-paper identifications. Our method serves as a strong tuning-free baseline for the PST problem that does not require feature engineering, and it achieves top-ranked results when combined with feature engineering techniques. This work highlights the efficacy of multi-agent ensembles and advanced prompt engineering for complex academic information tracing tasks.
-
2510.0040 | A Fuzzy-based Approach to Predict Human Interaction by Functional Near-Infrared Spectroscopy
In this article, we introduce the fuzzy logic-based attention mechanism (Fuzzy Attention Layer), a novel computational approach designed to enhance the interpretability and efficacy of neural models in psychological research. The Fuzzy Attention Layer is integrated into a transformer encoder model to analyze complex psychological phenomena from neural signals captured by functional near-infrared spectroscopy (fNIRS). By leveraging fuzzy logic, the Fuzzy Attention Layer learns and identifies interpretable patterns of neural activity. This addresses a significant challenge in using transformers: the lack of transparency in determining which specific brain activities contribute most to particular predictions. Our experimental results, obtained from fNIRS data recorded during social interactions involving handholding, reveal that the Fuzzy Attention Layer not only learns interpretable patterns of neural activity but also enhances model performance. In addition, these patterns provide deeper insights into the neural correlates of interpersonal touch and emotional exchange. The application of our model shows promising potential for understanding the complex aspects of human social behavior and for verifying psychological theory with machine learning algorithms, thereby contributing significantly to the fields of social neuroscience and AI. This version is based on the work published in IEEE TFS (2025).
-
2510.0036 | A Self-Driving Laboratory for Materials Science: An Autonomous Research Agent for Deep Data Analysis and Interpretation
As artificial intelligence increasingly permeates scientific research, the "AI for Science" paradigm is evolving to enable more autonomous scientific workflows. Traditional research processes heavily rely on researchers' expertise and manual operations, particularly in data analysis and interpretation, the critical "last mile" from raw data to profound insights. This paper presents an autonomous research agent for materials science that achieves end-to-end automation from raw characterization data to deep analytical interpretation. The system integrates four core innovations: (1) AI-driven automatic data understanding with unified ingestion of heterogeneous instrument data, (2) automated data analysis through an extensible algorithm library, (3) a one-click automated reporting system, and (4) interactive AI-powered data interpretation via natural language dialogue. We demonstrate the agent's capabilities through real-world case studies across multiple characterization techniques (Raman, UPS, UV-Vis, TG), achieving remarkable performance: UV-Vis bandgap analysis is accelerated by 600× compared to manual processing, while maintaining exceptional accuracy with fitting precision R² ≥ 0.999. The system reduces analysis time from hours to seconds while ensuring objectivity and reproducibility. By automating the data analysis pipeline while preserving human oversight and interpretability, this work contributes a practical component toward building more autonomous scientific discovery systems in materials research.
-
2510.0035 | MotivGraph-SoIQ: Integrating Motivational Knowledge Graphs and Socratic Dialogue for Enhanced LLM Ideation
Large Language Models (LLMs) hold substantial potential for accelerating academic ideation but face critical challenges in grounding ideas and mitigating confirmation bias for further refinement. We propose integrating motivational knowledge graphs and Socratic dialogue to address these limitations in enhanced LLM ideation (MotivGraph-SoIQ). This novel framework provides essential grounding and practical idea-improvement steps for LLM ideation by integrating a Motivational Knowledge Graph (MotivGraph) with a Q-Driven Socratic Ideator. The MotivGraph structurally stores three key node types (problem, challenge, and solution) to offer motivational grounding for the LLM ideation process. The Ideator is a dual-agent system utilizing Socratic questioning, which facilitates a rigorous refinement process that mitigates confirmation bias and improves idea quality across the dimensions of novelty, experimental rigor, and motivational rationality. On the ICLR25 paper-topics dataset, MotivGraph-SoIQ exhibits clear advantages over existing state-of-the-art approaches across LLM-based scoring, ELO ranking, and human evaluation metrics.
-
2510.0034 | Cognitive-YOLO: LLM-Driven Architecture Synthesis from First Principles of Data for Object Detection
Designing high-performance object detection architectures is a complex task, where traditional manual design is time-consuming and labor-intensive, and Neural Architecture Search (NAS) is computationally prohibitive. While recent approaches using Large Language Models (LLMs) show promise, they often function as iterative optimizers within a search loop, rather than generating architectures directly from a holistic understanding of the data. To address this gap, we propose Cognitive-YOLO, a novel framework for LLM-driven architecture synthesis that generates network configurations directly from the intrinsic characteristics of the dataset. Our method consists of three stages: first, an analysis module extracts key meta-features (e.g., object scale distribution and scene density) from the target dataset; second, the LLM reasons upon these features, augmented with state-of-the-art components retrieved via Retrieval-Augmented Generation (RAG), to synthesize the architecture into a structured neural network description, which we term the Neural Architecture Description Language (NADL); finally, a compiler instantiates this description into a deployable model. Extensive experiments on five diverse object detection datasets demonstrate that our proposed Cognitive-YOLO consistently generates superior architectures, achieving state-of-the-art (SOTA) performance by outperforming strong baseline models across multiple benchmarks.
-
2510.0030 | Latent-Diffusion Guided Cross-View Alignment for Heterogeneous Graph Recommendation
Recommender systems operating on heterogeneous, multi-relational graphs contend with noise and incompleteness in auxiliary signals, which can destabilize learning and degrade ranking performance when targeting robust representations. Naive cross-view training risks propagating noise across views, and existing contrastive or augmentation-based schemes often hinge on design choices and can struggle to scale to large, complex graphs. We propose a latent-diffusion guided cross-view alignment framework for heterogeneous graph recommendation that jointly learns a relation-aware heterogeneous GNN encoder, producing paired target and auxiliary embeddings, and a compact, time-conditioned latent-space denoiser that maps noisy auxiliary latents toward target-view semantics. The denoiser provides principled supervision to disentangle structured noise, with its residual outputs fused into target embeddings to refine ranking-relevant representations. Training optimizes a joint denoising objective and a ranking objective, enabling scalable, robust cross-view alignment without ad-hoc augmentations. Empirical results on implicit-feedback data demonstrate improved robustness and ranking accuracy under noisy auxiliary signals, with flexible gradient-flow and fusion strategies supporting stable end-to-end training on large graphs. Ablations highlight the benefits of explicit noise modeling in auxiliary views, diffusion-based supervision for stability, and scalable, relation-aware encoding of practical significance for recommender systems.
-
2510.0026 | Geometry-Aware Optimal Flow Matching via Convex Potentials
Generative modeling under quadratic optimal transport (OT) aims to learn deterministic maps that push mass from a simple source distribution p_0 to a target distribution p_1 along the Wasserstein-2 (W2) geodesics. While flow-based models and neural differential equations offer flexible transports, existing approaches typically rely on multi-step integration and yield trajectories whose curvature deviates from W2 geodesics, reducing efficiency, interpretability, and stability. We propose a geometry-aware framework that parameterizes time-dependent velocity fields as gradients of convex potentials modeled by Input Convex Neural Networks (ICNNs). This convex-potential representation guarantees transport along straight lines, exactly matching the W2 map under quadratic cost. Training uses a Flow Matching objective tailored to the convex setting, with explicit gradient computations and a dedicated inversion subproblem to recover preimages under the convex-potential flow; an optional amortization network provides favorable initializations for the inversion and accelerates optimization. The method is agnostic to the specific transport plan and can condition on arbitrary couplings between p_0 and p_1. Empirically, the approach yields geometry-faithful transports along W2 geodesics, enabling fast sampling with one-step or few-step updates and controlled curvature. Diagnostics on representative datasets confirm geometric fidelity and trainability, and we discuss initialization and transport-plan considerations for scalable, stable generative modeling under quadratic OT.
-
2510.0024 | LECTOR: LLM-Enhanced Concept-based Test-Oriented Repetition
Spaced repetition systems are fundamental to efficient learning and memory retention, but existing algorithms often struggle with semantic interference and personalized adaptation. We present LECTOR (LLM-Enhanced Concept-based Test-Oriented Repetition), a novel adaptive scheduling algorithm specifically designed for test-oriented learning scenarios, particularly language examinations where success rate is paramount. LECTOR leverages large language models for semantic analysis while incorporating personalized learning profiles, addressing the critical challenge of semantic confusion in vocabulary learning by utilizing LLM-powered semantic similarity assessment and integrating it with established spaced repetition principles. Our comprehensive evaluation against six baseline algorithms (SSP-MMC, SM2, HLR, FSRS, ANKI, THRESHOLD) across 100 simulated learners over 100 days demonstrates significant improvements: LECTOR achieves a 90.2% success rate compared to 88.4% for the best baseline (SSP-MMC), representing a 2.0% relative improvement. The algorithm shows particular strength in handling semantically similar concepts, reducing confusion-induced errors while maintaining computational efficiency. Our results establish LECTOR as a promising direction for intelligent tutoring systems and adaptive learning platforms.