Papers
Event:
-
2510.0067ViewBayesian Quadrature-Conformal Prediction Framework for Enhanced Uncertainty Quantification in Spatio-Temporal ModelsIn high-stakes domains such as climate science and epidemiology, achieving robust uncertainty quantification (UQ) in spatio-temporal models is crucial due to the significant impact on public safety and resource management. Existing frequentist and Bayesian approaches often fall short in capturing the complex uncertainties inherent in high-dimensional, dynamic environments. This paper introduces a novel Bayesian Quadrature-Conformal Prediction framework that integrates the probabilistic richness of Bayesian quadrature with the distribution-free guarantees of conformal prediction, aiming to enhance both accuracy and interpretability of UQ. Our method employs hierarchical Bayesian modeling and advanced sampling techniques such as Hamiltonian Monte Carlo and variational inference to address the computational challenges posed by Bayesian approaches, ensuring efficiency without compromising accuracy. Empirical evaluation on the MNIST dataset demonstrates significant improvements in Conformal Prediction Error Rates across multiple runs, evidencing our framework's capability to provide more nuanced and reliable uncertainty estimates compared to traditional methods. This work sets a new benchmark for uncertainty quantification in spatio-temporal models, promising advancements in predictive accuracy and decision-making for critical applications.
-
2510.0066ViewOptimizing Masked Diffusion Models for Efficient Discrete Generative TasksThis paper addresses the computational challenges inherent in training Masked Diffusion Models (MDMs) for discrete generative tasks, which are crucial for applications like game development and biomedical modeling. The importance of this research lies in the need for efficient and scalable generative models across various AI applications. However, MDMs face significant difficulties due to computationally intractable subproblems that limit scalability, coupled with the challenge of optimizing the decoding process in non-causally ordered tasks without sacrificing performance. We propose a dual-pronged solution: an optimization framework using batch sampling to reduce the computational complexity during training and an adaptive learning mechanism that dynamically adjusts the decoding order during inference. This approach improves both training efficiency and inference flexibility. Our experimental evaluation on the MNIST dataset demonstrates a notable improvement in performance, achieving an average accuracy of 95.53\% and maintaining an average inference time of 7.39 seconds, surpassing the performance of traditional autoregressive models. These results validate that our method significantly reduces computational overhead while maintaining high accuracy, setting a new benchmark for MDMs in discrete generative tasks. The contributions of this study include the introduction of innovative optimization techniques and a comprehensive framework that enhances MDM applicability with fewer parameters and increased efficiency.
-
2510.0064ViewTrain for the Worst, Plan for the Best: Enhancing Token Ordering in Masked DiffusionsMasked diffusion models (MDMs) have emerged as a powerful paradigm for gen- erative modeling over discrete domains. However, their training often involves solving computationally intractable problems, while their inference capabilities remain underutilized. In this work, we propose to enhance the performance of MDMs by introducing adaptive inference strategies that allow for dynamic token ordering during decoding. We demonstrate that by sidestepping computationally heavy subproblems, pretrained MDMs can achieve significant performance im- provements on complex tasks such as logic puzzles. Our experiments show that adaptive inference boosts Sudoku solving accuracy from less than 7% to approx- imately 90%, even outperforming autoregressive models with significantly more parameters. This work opens new avenues for leveraging the strengths of MDMs in discrete generative tasks.
-
2510.0063ViewDynamic Intent Adaptation for Long-Term Dialogue Systems Using Reinforcement LearningThis paper addresses the challenge of enabling large language models (LLMs) to dynamically discover and adapt to user intents during long-term interactions. This capability is crucial for improving user satisfaction and dialogue coherence in applications such as customer service and virtual assistants, where evolving user contexts often lead to a 35\% drop in satisfaction if not properly managed. The problem is particularly challenging due to the complexity of maintaining thematic continuity and proactively engaging users over extended dialogues. We propose a novel framework that integrates reinforcement learning to adapt user intents, a context-aware dialogue management system to maintain thematic consistency, and a proactive engagement mechanism to predict and address user needs. Our experimental evaluation, using a single-layer GRU model on the IMDb dataset, demonstrates that our approach significantly improves dialogue coherence and user satisfaction, achieving perfect accuracy and F1 scores, as well as high BLEU scores. These results establish our framework as a substantial advancement over traditional static dialogue systems, effectively bridging the gap in long-term human-LLM collaboration. Our contributions include the development of a scalable method that anticipates user needs and adapts to evolving intents without explicit prompts, setting a new benchmark for future dialogue systems.
-
2510.0059ViewAdaptive AI Governance: Mitigating Income Inequality through Predictive Analytics and Dynamic Policy FrameworksThe paper addresses the critical issue of AI-induced income inequality, focusing on developing an adaptive AI governance model that integrates real-time data analytics and local economic contexts to mitigate labor market disruptions. As AI technologies rapidly transform global labor markets, they pose a significant risk of job displacement and income disparity, necessitating adaptable governance frameworks. The challenge lies in creating a globally applicable model that accurately reflects diverse economic environments, predicts AI's long-term impacts, and balances innovation with worker protection. Our proposed solution is a sophisticated predictive analytics platform employing machine learning, Monte Carlo simulations, and agent-based modeling to simulate AI adoption scenarios and their effects on labor markets. Experiments utilizing a shallow MLP architecture on the \texttt{ag\_news} dataset demonstrate consistent prediction accuracy, with Mean Absolute Error (MAE) values ranging from 0.2518 to 0.2849, although R-squared scores were negative, indicating limitations in data representation. The main contributions of this study include a novel governance model that anticipates and mitigates AI's socio-economic impacts, offering dynamic policy recommendations tailored to local conditions. This research provides a foundation for future work on enhancing model accuracy and applicability by incorporating more comprehensive datasets and complex architectures.
-
2510.0058ViewAdaptive Inference Strategies for Token-OrderingAAdaptive token-ordering strategies for masked diffusion models (MDMs) and autoregressive models (ARMs) are critical for addressing the inherent imbalance in subproblem difficulties during sequence generation, which becomes increasingly relevant as models scale to complex reasoning tasks. In this work, we tackle the challenge of dynamically adjusting the token generation order via a reinforce- ment learning framework that optimizes the cumulative predictive V-information,formally defined as I_V (X → Y ) = HV (Y |∅) − HV (Y |X), to preferentially solve easier subproblems first. Our contributions include a novel π-learner that adjusts token sequencing and three adaptive inference oracles—vanilla, Top-K, and Margin—that effectively reduce perplexity from 60.0 to 52.0 while preserving token diversity (entropy shifting from 4.8 to 4.9), as well as improvements in structured puzzle solving demonstrated by an increase in solve rates from 70% to 80% and enhanced downstream metrics on tasks such as HumanEval and Math (e.g., pass@1 scores improving from 60% to 66%). Experimental validation spans scaling law analyses, where validation NLL drops from approximately +3.0 at 109 FLOPs to −5.0 at 5 × 109 FLOPs across multiple random seed runs, and error imbalance evaluations on L&O-NAE-SAT that reveal latent and observation position errors with means of 0.7976 and 0.9724, respectively. Collectively, these results confirm that adaptive token ordering not only mitigates computational intractability in hard token predictions but also enhances both likelihood-based metrics and generalization performance over fixed ordering strategies.
-
2510.0057ViewAdaptive Prompt-Enhanced Score Matching for Partially Observed DataAdaptive prompt-enhanced score matching for partially observed data addresses the challenging problem of recovering score functions from datasets with significant missing entries, where traditional imputation methods or naı̈ve score estimators often fail to achieve reliable parameter recovery and structural inference. In our work, we consider both marginal Importance-Weighted (Marg-IW) and marginal Variational (Marg-Var) approaches to estimate the score function, using a surrogate mean squared error loss. here sθ (x) is the estimated score computed as −P(x − µ) and strue (x) = −Ptrue (x − µtrue) with Ptrue representing the true precision matrix. This formulation inherently accounts for the missingness mechanism, typically modeled as MCAR with a missing rate of 30%, and is further stabilized via techniques such as log-sum-exp and gradient clipping. Our contributions include the integration of a meta-learning prompt generator, which dynamically selects key hyperparameters (e.g., sample size r ∈ {5, 10, 50}, number of inner-loop steps L, learning rates 1×10−2 , 5×10−3 , 1×10−3 , and truncation parameters) to optimize convergence behavior across a diverse set of synthetic datasets including multivariate Gaussians, ICA-inspired models, and sparse Gaussian graphical models (GGMs) with star graph structures. Experimental results demonstrate significant improvements: for instance, in the Gaussian experiment the loss reduced from 9.687 at iteration 50 to 0.094 at iteration 300 and the corresponding parameter error decreased from 3.033 to approximately 2.030, while in the GGM case, the ROC AUC improved from 0.219 to 0.97, thereby confirming our method’s efficacy in both parameter estimation and structure recovery under partial observations. These empirical validations underscore the relevance of adaptive score matching in high-dimensional and complex data regimes, set against the inherent difficulties of handling missing data and ensuring numerical stability in the estimation process, and pave the way for future extensions to accommodate MNAR scenarios and diffusion-based denoising score matching frameworks.
-
2510.0056ViewEnsemble-Based Bayesian Aggregation with Uncertainty-Guided Clarifications for Multi-Turn Human-LLM CollaborationOur work addresses the challenge of optimizing long-term multiturn human–LLM collaboration by introducing an ensemble of Monte Carlo-based reward predictors, Bayesian meta-calibration, and an uncertainty-guided clarification module that dynamically triggers clarifying interactions; in particular, we estimate the conversation-level reward as R∗ (t|g) = Rext (t, g) + Rint (t), where Rext (t, g) quantifies task-specific success (e.g. BLEU scores reaching up to 80% in document editing and unit test pass rates near 70% in code generation) and Rint (t) incorporates an efficiency penalty defined as − min[λ · TokenCount(t), 1] with λ = 0.01, augmented by an LLM-based interactivity score; our approach further employs Bayesian linear regression to aggregate the ensemble signals into a unified reward while simultaneously providing an uncertainty metric which, if exceeding a predefined threshold (e.g., 0.15), triggers an auxiliary clarification round that improves the aggregated outcome—this mechanism is mathematically formulated and empirically validated through improvements such as an increase in accuracy from 73.9% to 79.9% in mathematical problem solving and a resolution of ambiguous dialogue from 80% to 100% as reflected in our experiments; challenges arise due to noisy reward estimations and the trade-off between immediate task performance and long-term conversational quality, which we address via extensive ablation studies on window sizes (with w ∈ {1, 2, 3}) and Monte Carlo sample counts (e.g. S ∈ {3, 5}), as summarized in Table 1 (e.g., MediumDocEdit-Chat: BLEU 0.625 → 0.637, BigCodeBench-Chat: Unit Test Pass Rate 0.532 → 0.489, MATH-Chat: Accuracy 0.739 → 0.799, Abg-CoQA: Macro Accuracy/F1 0.8 → 1.0); overall, this work contributes a robust framework that integrates ensemble learning, uncertainty estimation, and dynamic clarification to effectively enhance the collaborative potential between human users and language models in complex, multi-turn settings.
-
2510.0055ViewQuantifying the Trade-Offs in Policy EvaluationThis work presents a comprehensive framework for quantifying the trade-off between prediction accuracy and screening access in policy evaluation, where we address the challenge of identifying and targeting the worst-off individuals through the rigorous estimation of a policy value function defined as V (α, β, R2 ) = √ Φ2 (zα ,zβ ;ρ)/β, with zα = Φ−1 (α), zβ = Φ−1 (β), and ρ = R2 ; our approach introduces the Prediction-Access Ratio (PAR) as a metric to quantify the rela tive impact of finite improvements in screening thresholds versus enhancements in predictive accuracy, thereby overcoming challenges associated with non-linear sensitivities such as ∂V/∂α ≈ 1.77513 AND ∂V/∂R2 ≈ 0.61282. We verify our framework using extensive simulation experiments on synthetic datasets in which a complex model’s Test R2 improves from 0.16866 to 0.32661 through residual scaling with δ = 0.1 and an associated empirical policy value V (α, β) increases from 0.70000 to 0.80000; and are further supported by capacity gap analyses which demonstrate that a minimal additional screening increment, ∆α∗ ≈ 0.0300, can yield gains comparable to those from complex model enhancements; this integrated strategy thereby provides actionable insights for policy interventions aimed at equalizing access while maintaining efficiency, a pertinent issue given the inherent difficulties arising from the interplay between prediction improvement and screening capacity in heterogeneous populations.
-
2510.0054ViewExplorations in Algorithmic Creativity via Next-Token and Multi-Token ApproachesAlgorithmic creativity in text generation poses significant challenges in balancing coherence, diversity, and memorization, and our study addresses these challenges by systematically comparing traditional next-token prediction (NTP) with multi-token teacherless prediction (MTP) and discrete diffusion methods (SEDD) across minimal yet representative combinatorial tasks such as Sibling Discovery, Triangle Discovery, Circle Construction, and Line Construction; our primary objective is to maximize the creative output defined as the fraction of generated samples that satisfy task-specific outputs validity criteria, which we quantify as ĉr = #coherent/#total outputs, and to minimize memorization, observed to drop from 100% under deterministic conditions to near 0% when employing controlled stochastic, while diversity is measured by D = |{unique-outputs}|total outputs with values reaching up to 1.00 in optimized settings; to achieve these ends, we introduce seed-conditioning and temperature scaling—modeled by the parameter T where T = 0 corresponds to greedy decoding and T > 0 introduces controlled noise following the relation pnoise = min(0.9, α × T ) with α varying by method—to guide the output generation process, and we formulate an alignment loss to ensure semantic consistency between the restrictive and adaptive prompts; extensive experimentation and rigorous ablation studies, as summarized in Table 1 (detailing coherence rates between 50% and 80%, memorization rates dropping from 100% to nearly 0%, and diversity metrics peaking at 1.00), validate that both MTP and SEDD outperform NTP under non-deterministic settings and when augmented with seed-conditioning, thereby demonstrating that our hybrid framework not only pushes the boundaries of algorithmic creativity on minimal open-ended tasks but also offers a scalable approach for more complex problem domains.
-
2510.0052ViewConformal Prediction as Bayesian Quadrature for Risk ControlIn this paper, we present a novel framework that leverages Bayesian quadrature for conformal prediction to achieve rigorous, data-conditional, and distribution-free risk guarantees, addressing the challenge of controlling predictive risk in high-stakes, black-box settings. Our approach constructs an upper bound on the expected loss by integrating over the quantile function of the loss distribu- tion, where, given calibration losses ℓ1 , . . . , ℓn , we define the aggregated loss Pn+1 as L+ = i=1 Ui ℓ(i) with Dirichlet random variables Ui ∼ Dir(1, . . . , 1) and ℓ(n+1) = B, thereby ensuring that the condition Pr(L+ ≤ α) ≥ β is met. Our contributions include a principled derivation that recovers well-known conformal methods such as Split Conformal Prediction (SCP) and Conformal Risk Control (CRC) as special cases, while introducing a novel high posterior density (HPD) rule that exploits the full posterior of L+ . We rigorously validate our method on synthetic binomial loss and heteroskedastic regression tasks, where experimental results indicate that methods based solely on the posterior mean (CRC) or uniform concentration bounds (RCPS) often yield either overly optimistic or conservative decisions, whereas our HPD rule achieves risk control with zero empirical failure rate and improved utility. For example, in the binomial experiment, while SCP selects an average λ of 0.596 with a 61.6% failure rate, HPD selects λ ≈ 0.970 with a 0% failure rate, and a similar trend is observed in regression tasks with test risks decreasing from 0.512 for SCP to 0.067 for HPD. These findings, summarized in Table 1, confirm that our Bayesian quadrature reformulation not only provides a more interpretable statistical characterization of conformal risk but also adapts effectively to calibration sample size and confidence level tuning, thus offering a robust solution for high-stakes decision-making.
-
2510.0051ViewCOMD: Coherent Masked DiffusionMasked language models (MLMs) have shown promise in natural language processing, but struggle with generating coherent and coherent-sounding text. In this work, we present Coherent Masked Diffusion (CoMD), a novel framework that extends Masked Language Diffusion to more efficiently and more effectively learn coherent and incoherent language. CoMD is built on Masked Language Diffusion (MLD), a recently proposed framework that models text generation as an inverse denoising diffusion process. Unlike MLD, CoMD uses a fixed mask matrix that is independent of the masked-out token and optimizes the probability of coherent generations with a novel coherent loss term without requiring additional samples per training step. Additionally, CoMD uses a variable time parameter to guide the coherent probability towards the ground truth coherent probability. Both inference and training computation are constant with respect to the length of the text. Empirically, CoMD outperforms previous methods on multiple coherent benchmarks. Furthermore, CoMD achieves an inference speedup of 7.3x and 10.5x over MLD and MDLM, respectively, and is significantly more compute and parameter efficient than autoregressive models.
-
2510.0050ViewU-CAN: User-Guided Clarification for Asking Clarification in Asking Across Needs FrameworkIt is still unclear if and how methods developed specifically on asking clarification for retrieval or problem-solving in the academic community can effectively address user needs during human-computer interactions (HCI). In this work, we first propose an Asking Across Needs (AAN) framework to explore the complexities of HCI, including user needs, interaction styles, and interaction types, by building an interaction graph (Pearl, 2009) containing user and LLM actions. Then, we create a new benchmark, UsClarification for Asking Needs (U-CAN), containing task-oriented asking clarification and retrieval-related asking clarification which align with real-world HCI scenarios. Specifically, we design new interaction graph designs and user-guided prompting techniques based on our AAN framework to address multiple user needs not met in existing HCI studies. We find that task-oriented needs are often left unmet, and existing methods show performance gaps between simulated and real-world (enrolled students) settings. We also demonstrate that HCI can be facilitated by interaction graphs on retrieval-related asking clarification using our proposed interactive graph model.
-
2510.0049ViewLearning Unnormalized Models with Missing Data via Adversarial Score MatchingLearning unnormalized model parameters is a challenging task that frequently arises in various scientific fields. Score matching is a promising method to learn unnormalized models by estimating the score function. However, score matching has several practical challenges in real-world applications, including the need for an auxiliary network to estimate the score function, the requirement for the model to support sampling, and the difficulty of estimating the score function for high-dimensional data. To address these challenges, we propose adversarial score matching (ASM), an adversarial learning algorithm for learning unnormalized models, which does not require an auxiliary network and can be applied to high-dimensional data. We also propose a multilevel Monte Carlo estimator for the score discrepancy, which is computationally more efficient than the traditional importance sampling estimator. In addition, we demonstrate that ASM is a mode-seeking algorithm, which has been observed empirically in a variety of adversarial learning methods. We evaluate the performance of ASM on various unnormalized models and missing data mechanisms, and demonstrate that ASM outperforms existing score matching methods.
-
2510.0045ViewPST-AUTO-AGENT: A Multi-Agent Ensemble Framework for Paper Source TracingThe escalating volume of scientific literature necessitates efficient methods for identifying foundational works that significantly inform new research. This paper addresses the Paper Source Tracing (PST) problem, which aims to quantify the influence of cited references on a focal paper, assigning importance weights to its most salient sources. To this end, we propose a novel multi-agent ensemble architecture for PST, integrating Deepseek-R1-250528, GPT-5-2025-08-07, and Gemini-2.5-pro. Our system employs a robust pipeline, featuring advanced XML parsing, empirically optimized prompt engineering with counterfactual reasoning and multi-role Socratic dialogue, and a sophisticated multi-agent integration strat- egy. This strategy utilizes weighted model predictions, intelligent default scoring, and a consistency penalty mechanism to derive precise source paper identifica- tions. Our method becomes a strong tuning-free baseline for the PST problem that does not require feature engineering. Our method also achieves top-ranked results when combined with feature engineering techinques. This work highlights the efficacy of multi-agent ensembles and advanced prompt engineering for com- plex academic information tracing tasks.
-
2510.0030ViewLatent-Diffusion Guided Cross-View Alignment for Heterogeneous Graph RecommendationRecommender systems operating on heterogeneous, multi-relational graphs contend with noise and incompleteness in auxiliary signals, which can destabilize learning and degrade ranking performance when targeting robust representations. Naive cross-view training risks propagating noise across views, and existing contrastive or augmentation-based schemes often hinge on design choices and can struggle to scale to large, complex graphs. We propose a latent-diffusion guided cross-view alignment framework for heterogeneous graph recommendation that jointly learns a relation-aware heterogeneous GNN encoder, producing paired target and auxiliary embeddings, and a compact, time-conditioned latent-space denoiser that maps noisy auxiliary latents toward target-view semantics. The denoiser provides principled supervision to disentangle structured noise, with its residual outputs fused into target embeddings to refine ranking-relevant representations. Training optimizes a joint denoising objective and a ranking objective, enabling scalable, robust cross-view alignment without ad-hoc augmentations. Empirical results on implicit-feedback data demonstrate improved robustness and ranking accuracy under noisy auxiliary signals, with flexible gradient-flow and fusion strategies supporting stable end-to-end training on large graphs. Ablations highlight the benefits of explicit noise modeling in auxiliary views, diffusion-based supervision for stability, and scalable, relation-aware encoding of practical significance for recommender systems.
-
2510.0026ViewGeometry-Aware Optimal Flow Matching via Convex PotentialsGenerative modeling under quadratic optimal transport (OT) aims to learn deterministic maps that push mass from a simple source distribution \(p_0\) to a target distribution \(p_1\) along the Wasserstein-2 (W2) geodesics. While flow-based models and neural differential equations offer flexible transports, existing approaches typically rely on multi-step integration and yield trajectories whose curvature deviates from W2 geodesics, reducing efficiency, interpretability, and stability. We propose a geometry-aware framework that parameterizes time-dependent velocity fields as gradients of convex potentials modeled by Input Convex Neural Networks (ICNNs). This convex-potential representation guarantees transport along straight lines, exactly matching the W2 map under quadratic cost. Training uses a Flow Matching objective tailored to the convex setting, with explicit gradient computations and a dedicated inversion subproblem to recover preimages under the convex-potential flow; an optional amortization network provides favorable initializations for the inversion and accelerates optimization. The method is agnostic to the specific transport plan and can condition on arbitrary couplings between \(p_0\) and \(p_1\). Empirically, the approach yields geometry-faithful transports along W2 geodesics, enabling fast sampling with one-step or few-step updates and controlled curvature. Diagnostics on representative datasets confirm geometric fidelity and trainability, and we discuss initialization and transport-plan considerations for scalable, stable generative modeling under quadratic OT.
-
2510.0024ViewLECTOR: LLM-Enhanced Concept-based Test-Oriented RepetitionSpaced repetition systems are fundamental to efficient learning and memory retention, but existing algorithms often struggle with semantic interference and personalized adaptation. We present LECTOR (\textbf{L}LM-\textbf{E}nhanced \textbf{C}oncept-based \textbf{T}est-\textbf{O}riented \textbf{R}epetition), a novel adaptive scheduling algorithm specifically designed for test-oriented learning scenarios, particularly language examinations where success rate is paramount. LECTOR leverages large language models for semantic analysis while incorporating personalized learning profiles, addressing the critical challenge of semantic confusion in vocabulary learning by utilizing LLM-powered semantic similarity assessment and integrating it with established spaced repetition principles. Our comprehensive evaluation against six baseline algorithms (SSP-MMC, SM2, HLR, FSRS, ANKI, THRESHOLD) across 100 simulated learners over 100 days demonstrates significant improvements: LECTOR achieves a 90.2\% success rate compared to 88.4\% for the best baseline (SSP-MMC), representing a 2.0\% relative improvement. The algorithm shows particular strength in handling semantically similar concepts, reducing confusion-induced errors while maintaining computational efficiency. Our results establish LECTOR as a promising direction for intelligent tutoring systems and adaptive learning platforms.
-
2510.0022ViewAdaptive Log Anomaly Detection through Data--Centric Drift Characterization and Policy-Driven Lifelong LearningLog-based anomaly detectors degrade over time due to concept drift arising from software updates or workload changes. Existing systems typically react by retraining entire models, leading to catastrophic forgetting and inefficiencies. We propose an adaptive framework that first classifies drift in log data into semantic (frequency shifts within known templates) and syntactic (emergence of new log templates) categories via statistical tests and novelty detection. Based on the identified drift type, a policy-driven lifelong learning manager applies targeted updates---experience replay to mitigate forgetting under semantic drift and dynamic model expansion to accommodate syntactic drift. This approach is validated on semi-synthetic logs and real-world longitudinal datasets (HDFS, Apache, and BGL), maintaining high F1-scores, reducing computational overhead, and preserving historical knowledge compared to monolithic retraining.
-
2510.0021ViewConFIT: A Robust Knowledge-Guided Contrastive Framework for Financial ExtractionFinancial text extraction faces serious challenges in multi-entity sentiment attribution and numerical sensitivity, often leading to pitfalls in real-world deployment. In this work, we propose ConFIT (Contrastive Financial Information Tuning), a knowledge-guided contrastive learning framework that employs a Semantic-Preserving Perturbation (SPP) engine to generate high-quality, programmatically synthesized hard negatives. By integrating domain knowledge sources such as the Loughran-McDonald lexicon and Wikidata, and applying rigorous perplexity and Natural Language Inference (NLI) filtering, ConFIT trains language models to differentiate subtle perturbations in financial statements. Evaluations on FiQA and SENTiVENT using FinBERT and Llama-3 8B show both promise improvements and unexpected pitfalls, highlighting challenges that warrant further research.