This paper introduces UQ-Net, a unified probabilistic framework for uncertainty quantification (UQ) in large language models (LLMs). Its core contribution is the integration of Bayesian modeling, calibration techniques, conformal prediction, and selective decision rules within a single framework, with the aim of disentangling epistemic and aleatoric uncertainty and thereby supporting more reliable decision-making. The authors argue that current evaluation practices for UQ in LLMs are inadequate, citing the misalignment of consistency and entropy with factuality, the lack of benchmarks for multi-episode interactions, and inconsistent metrics for calibration and tightness. To address these shortcomings, they advocate context-aware datasets, standardized metrics, and human-in-the-loop evaluation. UQ-Net is presented as a means to improve calibration and reduce predictive error, with a focus on safety-sensitive applications. Two case studies, one in medical diagnosis and one in code generation, are used to demonstrate its effectiveness. The medical diagnosis study uses synthetic datasets to simulate diagnosis tasks, while the code generation study focuses on improving task sequencing accuracy in robotic software engineering workflows. The empirical results suggest that UQ-Net achieves better calibration and lower predictive error than standard deep neural network (DNN) baselines. Specifically, the Bayesian UQ component, which uses Monte Carlo Dropout, is reported to achieve a lower Expected Calibration Error (ECE) than the baseline DNNs, and the multi-episode modeling component is reported to improve task sequencing accuracy by 18% over single-episode approaches.
The authors also compare UQ performance metrics across methods, including baseline DNNs, Bayesian UQ with dropout, multi-episode UQ, and conformal prediction. Overall, the paper aims to provide a principled foundation for operationalizing UQ in LLMs and to advance trustworthy and responsible AI for real-world applications. The authors stress the need to close the identified gaps in current evaluation practice and position UQ-Net as a step towards more reliable and safer deployment of LLM agents.
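
For reference, the calibration claim above rests on ECE, which is commonly estimated by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The following is a minimal sketch of that standard equal-width estimator, not the authors' implementation; the function name `expected_calibration_error`, the default of 10 bins, and the toy inputs are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE (illustrative sketch, not the paper's code).

    confidences: predicted probabilities of the chosen class, each in [0, 1]
    correct:     booleans, True where the prediction was right
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin [k/n_bins, (k+1)/n_bins);
    # a confidence of exactly 1.0 is placed in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Weight the |accuracy - confidence| gap by the bin's share of samples.
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy data: two confident correct answers, two coin-flip answers.
toy_ece = expected_calibration_error(
    [0.95, 0.95, 0.55, 0.55], [True, True, True, False]
)
```

A lower value indicates that stated confidence tracks empirical accuracy more closely. The estimate depends on the binning scheme and bin count, which is one reason the inconsistent calibration metrics noted in the summary make cross-paper comparison difficult.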