Papers
-
2511.0020 AI-Powered Rainfall Forecasting: Progress, Challenges, Future Directions
Rainfall forecasting holds significant importance across a wide range of sectors, including disaster prevention, energy planning and agriculture. In the past decade, artificial intelligence (AI) has emerged as a revolutionary approach, aiming to overcome the long-standing limitations of traditional numerical weather prediction (NWP) models and statistical downscaling models (SDMs) for rainfall forecasting. This chapter briefly introduces the remarkable progress made in AI-based rainfall forecasting. It mainly focuses on three major aspects: physics-constrained machine learning (ML), multi-modal data fusion, and extreme event prediction. AI-based models can be used to resolve the subgrid-scale parameterization problems (e.g., convective parameterization) that have long troubled NWP models. For instance, DeepMind's GraphCast employs dynamic graph neural networks to generate high-resolution global forecasts; making a 10-day forecast with GraphCast takes less than a minute on a single Google TPU v4 machine. Regarding multi-modal data fusion, systems such as the National Oceanic and Atmospheric Administration (NOAA) Multi-Radar Multi-Sensor (MRMS) system combine various data sources and significantly improve forecast accuracy. For extreme rainfall prediction, the application of adversarial training and attention mechanisms has also led to improvements. The review concludes by suggesting future research directions, emphasizing how AI is updating rainfall forecasting technology, enabling it to better meet the challenges posed by a changing climate.
-
2511.0019 From Virtual Cells to Programmable Humans: Advancing Digital Biology Through Hybrid AI Systems
Recent advances in artificial intelligence (AI), high-performance computing, and systems biology have accelerated the development of AI-powered virtual biological systems, from virtual cells to multiscale organ models and programmable virtual humans. These systems promise transformative applications in drug discovery, precision medicine, and in silico clinical trials. This review provides a critical synthesis of current progress, key technologies, and future directions across this spectrum. We explore hybrid modeling strategies that combine mechanistic models—such as ordinary and partial differential equations—with deep learning methods including convolutional, recurrent, and graph neural networks. We emphasize the importance of robust uncertainty quantification, simulation validation, and multiscale integration across molecular, cellular, organ-level, and systemic processes. A core contribution is the introduction of the SIM-CARD framework, a standardized simulation accountability protocol to document data provenance, modeling assumptions, performance metrics, and regulatory alignment. We propose a three-phase translational roadmap: (1) validated AI-augmented virtual cells and organs (by 2030), (2) interoperable multi-organ physiological systems (by 2040), and (3) programmable full-body virtual humans supporting personalized simulations and regulatory use cases (by 2055). We identify key enablers—including high-fidelity multiscale data, computational scalability, and simulation governance—as well as bottlenecks such as algorithmic bias, explainability, and regulatory uncertainty. Finally, we call for collaborative efforts to establish minimal benchmarking suites, FAIR-compliant simulation metadata, and cross-institutional federated learning infrastructure.
This review aims to guide the scientific, regulatory, and clinical communities in navigating the complex yet promising trajectory toward clinically actionable programmable human simulations.
-
2511.0018 From Virtual Cells to Programmable Humans: Advancing Digital Biology Through Hybrid AI Systems
The convergence of artificial intelligence and systems biology is giving rise to a new paradigm in biomedical research—AI-powered virtual biological systems. From single-cell simulations to organ-level models and ultimately programmable virtual humans, this digital continuum holds transformative potential for disease modeling, personalized medicine, and therapeutic discovery. In this review, we critically examine the state of the art in AI-driven simulations, including the numerical foundations, multiscale integration strategies, and the emerging class of hybrid models that bridge mechanistic and data-driven approaches. We explore the challenges of validation, uncertainty quantification, and regulatory alignment across simulation scales, with particular focus on the development of simulation accountability frameworks such as SIM-CARDs. Ethical and privacy concerns, including algorithmic bias and data sovereignty in patient-specific models, are also addressed, alongside concrete proposals for governance and federated simulation workflows. Special attention is given to the technical complexity of multiscale modeling, including the integration of mechanistic solvers with neural architectures and the computational resources required for real-time, clinically actionable simulations. We conclude with a translational roadmap for virtual biology that projects validated virtual cells for drug screening by 2030, multi-organ simulations by 2040, and the emergence of programmable virtual humans by 2055. By unifying high-fidelity numerical models with explainable AI, and aligning simulation design with ethical, regulatory, and clinical needs, the field of digital biology is positioned to unlock scalable and trustworthy biomedical innovation.
-
2511.0014 Artificial Intelligence in Biomedical Research: From Data Integration to Precision Medicine
This comprehensive review examines the transformative role of artificial intelligence in biomedical research, from foundational data integration to clinical applications. The paper explores how AI techniques facilitate multimodal data fusion across diverse biological data types, employing both traditional statistical methods and advanced deep learning architectures including variational autoencoders, graph neural networks, and transformer models. It evaluates AI applications in medical imaging, where convolutional neural networks have achieved remarkable diagnostic accuracy (up to 94% in COVID-19 detection) while enhancing segmentation and classification tasks across multiple imaging modalities. The review further investigates generative AI’s impact on molecular design and drug discovery, highlighting transformer-based architectures like TransAntivirus that navigate vast chemical spaces to optimize therapeutic candidates. Finally, it examines AI-enabled precision medicine applications, including Clinical Decision Support Systems and federated learning approaches that balance analytical power with privacy preservation. Despite significant progress, implementation challenges persist, including data heterogeneity, model explainability, and ethical concerns regarding bias and privacy. The paper underscores the importance of developing interpretable AI systems that integrate seamlessly into clinical workflows while addressing regulatory, ethical, and economic considerations to realize the full potential of AI in advancing biomedical research and healthcare delivery.
-
2511.0013 Revolutionizing AI Conference Peer Review: A Bi-Directional Feedback and Rewards Framework
The rapid increase in submissions to AI conferences has led to a crisis in the peer review process, characterized by declining review quality and accountability. This position paper proposes a novel bi-directional feedback mechanism where authors can evaluate the quality of reviews while safeguarding against retaliation. Coupled with a blockchain-enabled reviewer rewards system, this framework aims to incentivize high-quality reviewing and create an accountability structure that benefits all stakeholders. By allowing authors to provide feedback on reviews and rewarding reviewers with transparent digital credentials, this system fosters a culture of quality and responsibility in the peer review process. We call upon the AI community to engage in this vital conversation and explore these transformative reforms for sustainable peer review practices.
-
2511.0012 Physics-Informed Neural Networks and Neural Operators for Parametric PDEs: Methods, Applications and Future Directions
PDEs arise ubiquitously in science and engineering, where solutions depend on parameters representing physical properties, boundary conditions, or geometric configurations. Traditional numerical methods require solving the PDE anew for each parameter value, making parameter space exploration prohibitively expensive for high-dimensional problems. Recent advances in machine learning, particularly physics-informed neural networks (PINNs) and neural operators, have revolutionized parametric PDE solving by learning solution operators that generalize across parameter spaces. We critically analyze two main paradigms: (1) PINNs, which embed physical laws as soft constraints and excel at inverse problems with sparse data, and (2) neural operators (including DeepONet, Fourier Neural Operator, and their variants), which learn mappings between infinite-dimensional function spaces and achieve unprecedented parameter space generalization. Through detailed comparisons across fluid dynamics, solid mechanics, heat transfer, and electromagnetics, we show that neural operators can achieve computational speedups of 10^3 to 10^5 relative to traditional solvers for multi-query scenarios, while maintaining comparable accuracy. We provide practical guidance for method selection, discuss theoretical foundations including universal approximation and convergence guarantees, and identify critical open challenges including high-dimensional parameter spaces, complex geometries, and out-of-distribution generalization. This work establishes a unified framework for understanding parametric PDE solvers through the lens of operator learning, offering a comprehensive resource—which we intend to incrementally update—for this rapidly evolving field.
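The soft-constraint idea the abstract attributes to PINNs, adding a PDE residual penalty to a data misfit term, can be illustrated with a minimal sketch. This is not code from the paper: the model problem (1D Poisson, u'' = f), the grid, and the finite-difference residual standing in for the automatic differentiation a real PINN would use are all illustrative assumptions.

```python
import numpy as np

def physics_informed_loss(u, x, f, data_idx, data_vals, lam=1.0):
    """PINN-style composite loss: sparse data misfit + PDE residual penalty.

    Schematic only: u is a candidate solution sampled on a uniform grid x,
    and the residual of u''(x) = f(x) is approximated by central finite
    differences rather than automatic differentiation.
    """
    h = x[1] - x[0]
    # Second derivative at interior points via central differences.
    u_xx = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / h**2
    pde_residual = np.mean((u_xx - f[1:-1]) ** 2)      # soft physics constraint
    data_misfit = np.mean((u[data_idx] - data_vals) ** 2)  # sparse observations
    return data_misfit + lam * pde_residual
```

In a real PINN, this scalar would be minimized over network weights; the weight `lam` trading off physics against data is the "soft constraint" knob.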
-
2511.0011 From Virtual Cells to Programmable Humans: Advancing Digital Biology Through Hybrid AI Systems
Recent advances in artificial intelligence (AI), high-performance computing, and systems biology have accelerated the development of AI-powered virtual biological systems, from virtual cells to multiscale organ models and programmable virtual humans. These systems promise transformative applications in drug discovery, precision medicine, and in silico clinical trials. This review provides a critical synthesis of current progress, key technologies, and future directions across this spectrum. We explore hybrid modeling strategies that combine mechanistic models—such as ordinary and partial differential equations—with deep learning methods including convolutional, recurrent, and graph neural networks. We emphasize the importance of robust uncertainty quantification, simulation validation, and multiscale integration across molecular, cellular, organ-level, and systemic processes. A core contribution is the introduction of the SIM-CARD framework, a standardized simulation accountability protocol to document data provenance, modeling assumptions, performance metrics, and regulatory alignment. We propose a three-phase translational roadmap: (1) validated AI-augmented virtual cells and organs (by 2030), (2) interoperable multi-organ physiological systems (by 2040), and (3) programmable full-body virtual humans supporting personalized simulations and regulatory use cases (by 2055). We identify key enablers—including high-fidelity multiscale data, computational scalability, and simulation governance—as well as bottlenecks such as algorithmic bias, explainability, and regulatory uncertainty. Finally, we call for collaborative efforts to establish minimal benchmarking suites, FAIR-compliant simulation metadata, and cross-institutional federated learning infrastructure.
This review aims to guide the scientific, regulatory, and clinical communities in navigating the complex yet promising trajectory toward clinically actionable programmable human simulations.
-
2511.0010 From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery and AI Scientists
Artificial intelligence (AI) is reshaping scientific discovery, evolving from specialized computational tools into autonomous research partners. We position Agentic Science as a pivotal stage within the broader AI for Science paradigm, where AI systems progress from partial assistance to full scientific agency. Enabled by large language models (LLMs), multimodal systems, and integrated research platforms, agentic AI exhibits capabilities in hypothesis generation, experimental design, execution, analysis, and iterative refinement—behaviors once regarded as uniquely human. This survey offers a domain-oriented review of autonomous scientific discovery across life sciences, chemistry, materials, and physics, synthesizing research progress and advances within each discipline. We unify three previously fragmented perspectives—process-oriented, autonomy-oriented, and mechanism-oriented—through a comprehensive framework that connects foundational capabilities, core processes, and domain-specific realizations. Building on this framework, we (i) trace the evolution of AI for Science, (ii) identify five core capabilities underpinning scientific agency, (iii) model discovery as a dynamic four-stage workflow, (iv) review applications across life sciences, chemistry, materials science, and physics, and (v) synthesize key challenges and future opportunities. This work establishes a domain-oriented synthesis of autonomous scientific discovery and positions Agentic Science as a structured paradigm for advancing AI-driven research.
-
2511.0009 A Pilot Study Evaluating Large Language Models as Reviewers at Academic Conferences
This paper presents a new system for academic peer review that is more objective, efficient, and community-guided. Our system incorporates author-assisted evaluation (Author-AAE) and community-guided review (CGR) into the peer review of AI conferences. This is in contrast to existing approaches that prioritize alternative systems that only address some of these challenges. Our evaluation uses data from three major AI conferences that used our system and from a survey of reviewers. Their feedback indicates that our system’s reviews are superior to single-LLM-based reviews due to their reduced subjectivity and enhanced quality. The reviewers’ scores for our system’s reviews were significantly higher than for single-LLM-based reviews across multiple metrics: “Reproducibility and Quality” (by 0.427 ± 0.007), “Review Quality” (by 0.265 ± 0.09), and “Alignment between opinion and paper score” (by 0.503 ± 0.090). In addition, we discovered that single-LLM-based reviews are more likely to be rejected by the program committee after author major revisions (on average by 0.182 ± 0.103) and are much more likely to be rejected overall (on average by 0.300 ± 0.124), compared to our system’s reviews. These results suggest that our system performs better in reducing the arbitrary nature of the current peer review system and can serve as an inspiration for the scientific community to explore new review systems.
-
2511.0007 Enhancing Small Language Models with Gradient Noise Injection
Training small language models is challenging due to their limited capacity to capture complex patterns and their susceptibility to overfitting. To address these issues, we investigate gradient noise injection as a regularization strategy, building on prior work while introducing a noise schedule that decays exponentially over training. Unlike existing techniques, our method explicitly controls the trade-off between exploration and stability during optimization. We compare the exponential decay schedule with linear and adaptive variants, demonstrating empirically that the exponential schedule yields superior convergence and generalization. Extensive experiments on diverse text corpora, including shakespeare_char, enwik8, text8, and larger benchmark datasets, show consistent improvements in training dynamics, validation loss, and final performance. We report error bars and statistical significance tests to ensure robustness of the results. Detailed implementation information, including model architectures, hyperparameter settings, dataset sizes, and optimization strategies, is provided to support reproducibility, and we release our code and trained models publicly. Furthermore, we compare gradient noise injection with other regularization methods such as dropout, weight decay, and data augmentation, both in isolation and in combination, revealing complementary effects on training stability and generalization. Finally, we analyze the computational cost of gradient noise injection relative to these baselines, highlighting its practical efficiency in resource-constrained environments. Together, these contributions position gradient noise injection as a theoretically grounded, empirically validated, and computationally practical method for improving the robustness of small language models.
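The exponentially decaying noise schedule described in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the function names and the hyperparameter values (`sigma0`, `decay_rate`, `total_steps`) are assumed for the example.

```python
import math
import random

def noise_std(step, sigma0=0.01, decay_rate=5.0, total_steps=1000):
    """Exponentially decaying standard deviation for injected gradient noise.

    Early training gets larger noise (exploration); the noise shrinks
    toward zero as training stabilizes. Hyperparameters are illustrative.
    """
    return sigma0 * math.exp(-decay_rate * step / total_steps)

def inject_noise(grads, step, rng, **schedule_kwargs):
    """Add zero-mean Gaussian noise, scaled by the schedule, to each gradient."""
    std = noise_std(step, **schedule_kwargs)
    return [g + rng.gauss(0.0, std) for g in grads]
```

A linear variant would replace the `exp` term with `max(0.0, 1.0 - step / total_steps)`; the paper's comparison is between such schedule shapes.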
-
2511.0006 Multi-Agent Adaptive Variance Reduction Technique for Decentralized Nonsmooth Nonconvex Stochastic Optimization
Decentralized stochastic optimization with nonsmooth objectives and only zeroth-order oracle access arises in federated learning and privacy-sensitive applications, yet existing methods suffer from high variance and dimension-dependent complexity. We propose MAAVRT (Multi-Agent Adaptive Variance Reduction Technique), a decentralized zeroth-order algorithm that integrates randomized smoothing, adaptive variance reduction, and topology-aware consensus. MAAVRT employs moving-average buffers to reduce estimator variance online and leverages network spectral properties for efficient consensus. Our theoretical analysis decomposes the convergence error into four components, yielding sample complexity O(dδ⁻¹ε⁻³) that matches known lower bounds. Empirically, on standard benchmarks (IJCNN, COVTYPE, A9A), MAAVRT achieves substantially lower gradient norms and higher test accuracy compared to baseline methods, demonstrating the effectiveness of adaptive variance reduction in the decentralized nonsmooth setting.
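Two ingredients named in the abstract, a randomized-smoothing zeroth-order gradient estimate and a moving-average buffer for online variance reduction, admit a compact single-agent sketch. This is a hypothetical illustration: the function names, sample counts, smoothing radius, and `beta` are assumptions, and the decentralized topology-aware consensus step is omitted entirely.

```python
import random

def zo_gradient(f, x, delta=1e-2, samples=8, rng=None):
    """Two-point zeroth-order gradient estimate via randomized smoothing:
    average (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u over random
    Gaussian directions u. Only function values of f are queried."""
    rng = rng or random.Random(0)
    d = len(x)
    g = [0.0] * d
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        xp = [xi + delta * ui for xi, ui in zip(x, u)]
        xm = [xi - delta * ui for xi, ui in zip(x, u)]
        scale = (f(xp) - f(xm)) / (2.0 * delta)
        g = [gi + scale * ui for gi, ui in zip(g, u)]
    return [gi / samples for gi in g]

class MovingAverageGrad:
    """Moving-average buffer that smooths successive noisy estimates,
    in the spirit of the paper's adaptive variance reduction."""
    def __init__(self, dim, beta=0.9):
        self.buf = [0.0] * dim
        self.beta = beta

    def update(self, g):
        # Exponential moving average: old estimates damp fresh noise.
        self.buf = [self.beta * b + (1.0 - self.beta) * gi
                    for b, gi in zip(self.buf, g)]
        return self.buf
```

In the full algorithm each agent would run such an estimator locally and then average buffers with its network neighbors.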
-
2511.0005 Multi-Agent Adaptive Variance Reduction Technique for Decentralized Nonsmooth Nonconvex Stochastic Optimization
Decentralized stochastic optimization with nonsmooth objectives and only zeroth-order oracle access arises in federated learning and privacy-sensitive applications, yet existing methods suffer from high variance and dimension-dependent complexity. We propose MAAVRT (Multi-Agent Adaptive Variance Reduction Technique), a decentralized zeroth-order algorithm that integrates randomized smoothing, adaptive variance reduction, and topology-aware consensus. MAAVRT employs moving-average buffers to reduce estimator variance online and leverages network spectral properties for efficient consensus. Our theoretical analysis decomposes the convergence error into four components, yielding sample complexity O(dδ⁻¹ε⁻³) that matches known lower bounds. Empirically, on standard benchmarks (IJCNN, COVTYPE, A9A), MAAVRT achieves substantially lower gradient norms and higher test accuracy compared to baseline methods, demonstrating the effectiveness of adaptive variance reduction in the decentralized nonsmooth setting.
-
2511.0004 Vision Transformers for Semiconductor Defect Detection: A Comprehensive Survey of AI-Driven Image Segmentation from CNNs to Foundation Models (2015-2025)
-
2510.0087 EndoNet: Content-Aware Linear Attention for Endoscopic Video Super-Resolution
Endoscopic video super-resolution (EVSR) seeks to reconstruct high-resolution frames from low-resolution endoscopic video, a task critical for enhancing clinical visualization of fine anatomical details. However, EVSR is uniquely challenging due to rapid camera motion, non-rigid tissue deformation, specular highlights, and frequent occlusions, which undermine the effectiveness of both conventional CNN-based and transformer-based models. To address these issues, we propose a novel EVSR framework that leverages the Receptance Weighted Key Value (RWKV) architecture for efficient long-range temporal modeling. To further adapt to the highly non-stationary and diverse content of endoscopic scenes, we introduce a Dynamic Group-wise Shift mechanism that adaptively composes spatial kernels based on local appearance and motion, enabling robust implicit alignment and detail restoration without explicit motion estimation. Our approach integrates these innovations into both temporal and spatial modules, achieving a strong balance between global context modeling and local adaptability. Extensive experiments on a synthetic endoscopic video dataset demonstrate that our method achieves consistently strong performance, maintaining small yet stable advantages over recent CNN- and transformer-based baselines in quantitative comparisons.
-
2510.0086 EndoNet: Content-Aware Linear Attention for Endoscopic Video Super-Resolution
Endoscopic video super-resolution (EVSR) seeks to reconstruct high-resolution frames from low-resolution endoscopic video, a task critical for enhancing clinical visualization of fine anatomical details. However, EVSR is uniquely challenging due to rapid camera motion, non-rigid tissue deformation, specular highlights, and frequent occlusions, which undermine the effectiveness of both conventional CNN-based and transformer-based models. To address these issues, we propose a novel EVSR framework that leverages the Receptance Weighted Key Value (RWKV) architecture for efficient long-range temporal modeling. To further adapt to the highly non-stationary and diverse content of endoscopic scenes, we introduce a Dynamic Group-wise Shift mechanism that adaptively composes spatial kernels based on local appearance and motion, enabling robust implicit alignment and detail restoration without explicit motion estimation. Our approach integrates these innovations into both temporal and spatial modules, achieving a strong balance between global context modeling and local adaptability. Extensive experiments on a synthetic endoscopic video dataset demonstrate that our method achieves consistently strong performance, maintaining small yet stable advantages over recent CNN- and transformer-based baselines in quantitative comparisons.
-
2510.0084 PST-AUTO-AGENT: A Multi-Agent Ensemble Framework for Paper Source Tracing
The escalating volume of scientific literature necessitates efficient methods for identifying foundational works that significantly inform new research. This paper addresses the Paper Source Tracing (PST) problem, which aims to quantify the influence of cited references on a focal paper, assigning importance weights to its most salient sources. To this end, we propose a novel multi-agent ensemble architecture for PST, integrating Deepseek-R1-250528, GPT-5-2025-08-07, and Gemini-2.5-pro. Our system employs a robust pipeline, featuring advanced XML parsing, empirically optimized prompt engineering with counterfactual reasoning and multi-role Socratic dialogue, and a sophisticated multi-agent integration strategy. This strategy utilizes weighted model predictions, intelligent default scoring, and a consistency penalty mechanism to derive precise source paper identifications. Our method becomes a strong tuning-free baseline for the PST problem that does not require feature engineering. Our method also achieves top-ranked results when combined with feature engineering techniques. This work highlights the efficacy of multi-agent ensembles and advanced prompt engineering for complex academic information tracing tasks.
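The fusion of weighted model predictions with a consistency penalty described above can be sketched abstractly. The scoring scheme below is a hypothetical illustration, not the paper's actual integration strategy; the weights and penalty strength are assumed values, and each agent is represented simply as a dict of per-reference importance scores.

```python
def ensemble_scores(model_scores, weights, penalty=0.2):
    """Fuse per-reference importance scores from several agents.

    model_scores: list of dicts, one per agent, mapping reference id to a
    score in [0, 1]. The fused score is a weighted mean, reduced by a
    consistency penalty proportional to the agents' disagreement (spread).
    """
    fused = {}
    for ref in model_scores[0]:
        vals = [m[ref] for m in model_scores]
        mean = sum(w * v for w, v in zip(weights, vals))
        spread = max(vals) - min(vals)          # disagreement across agents
        fused[ref] = max(0.0, mean - penalty * spread)
    return fused
```

References the agents agree on keep their full weighted score; contested references are damped, which is one plausible reading of a "consistency penalty mechanism".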
-
2510.0083 Enhancing AI Conference Peer Review Quality through Anonymized Feedback and Adaptive Reward Systems
This paper addresses the critical issue of enhancing peer review quality at AI conferences by implementing anonymized feedback and adaptive reward systems. The growing volume of conference submissions and limited reviewer accountability result in inconsistent review quality, bias, and a lack of transparency, posing significant challenges to the integrity of AI research. Our proposed solution involves a dynamic feedback loop that anonymizes and aggregates feedback to minimize biases, coupled with an adaptive reward system to motivate reviewers while preserving the integrity of the review process. Utilizing sentiment analysis, feedback is processed to detect and mitigate potential biases, enhancing the fairness and efficacy of peer reviews. Experiments conducted using a logistic regression model on the Yelp Polarity dataset demonstrate a significant improvement in sentiment classification accuracy, from 54.1% to 83.4%, indicating the effectiveness of our anonymized feedback loop. However, the bias detection score of 0.0 across all runs highlights the need for further refinement in bias mitigation. Our method's scalability and adaptability across various conference settings are supported by its successful implementation in sentiment analysis tasks. Overall, this study provides a robust framework for enhancing the accountability and quality of peer reviews, with implications for future research aimed at integrating advanced bias detection and mitigation techniques.
-
2510.0082 Reinforced Adaptive Diffusion Networks for Enhanced Image Synthesis
The field of generative modeling in computer vision has been propelled significantly forward by methods such as Generative Adversarial Networks (GANs) and diffusion models; however, challenges like balancing image fidelity and diversity alongside incorporating class-specific details persist. These traditional approaches often exhibit limitations in adaptability and computational efficiency. This paper introduces Reinforced Adaptive Diffusion Networks (RAD-Nets), a novel generative framework that synergizes diffusion processes with reinforcement learning to enhance image synthesis through dynamic parameter optimization. The core innovation lies in integrating a Reinforced Learning Layer and an Adaptive Feedback Mechanism, which employ real-time feedback to iteratively refine outputs. The Multi-Objective Optimization module within RAD-Nets specifically targets the concurrent enhancement of image quality, diversity, and class fidelity, addressing the issues found in static optimization techniques. Empirical evaluations demonstrate that RAD-Nets outperform existing generative models on standard benchmarks like CIFAR-10 and CelebA, achieving superior metrics in quality and diversity without compromising fidelity. By focusing on class-conditional image synthesis, RAD-Nets also demonstrate significant improvements in class-specific feature representation, marking a substantial advancement over conventional generative modeling frameworks.
-
2510.0081 Adaptive and Fair Cross-Domain Recommendations with Meta-Reinforcement Learning
The research focuses on the development of a novel hierarchical and adaptive recommendation system that addresses the dual challenge of personalization and fairness in cross-domain environments. Traditional recommendation systems have struggled to effectively integrate diverse user interactions and adapt to rapidly evolving user preferences while maintaining fairness. The proposed solution leverages three core innovations: cross-domain collaborative filtering, meta-reinforcement learning, and fairness-aware mechanisms. By synthesizing data from multiple domains, the system constructs enriched user profiles that inform a meta-reinforcement learning framework, enhancing adaptability to user behavior changes. Additionally, fairness-aware mechanisms are incorporated to mitigate biases and ensure equitable content distribution. This integrated approach aims to resolve key challenges in recommendation systems, namely the precise prediction of preferences and the equitable treatment of diverse user groups. Empirical evaluations demonstrate that the proposed methodology not only improves recommendation accuracy but also enhances fairness metrics, thereby fostering a balanced and inclusive recommendation landscape.
-
2510.0080 Enhancing Image Generation with Multi-Modal VQ-VAE and Self-Supervised Learning
This paper addresses challenges in unsupervised representation learning, particularly in high-fidelity image generation and domain adaptability across diverse data modalities. Current frameworks such as GANs and VQ-VAE have shown promise but face limitations in maintaining consistent performance across variable data distributions without significant supervision. To overcome these challenges, we propose a Multi-Modal Vector Quantized Variational AutoEncoder (VQ-VAE) integrated with Self-Supervised Learning (SSL). Our innovative approach incorporates a harmonizer module within the VQ-VAE architecture, which aligns and transforms data representations across multiple modalities. By leveraging self-supervised learning techniques, the model iteratively refines its parameters, enhancing both image reconstruction quality and adaptability to new domains with minimal supervision. The proposed framework processes CIFAR-10 datasets to facilitate structured data integration, employing advanced standardization and batching techniques for optimal performance. Empirical evaluations reveal substantial improvements in image reconstruction fidelity and domain adaptability compared to standard VQ-VAE models, corroborated by metrics such as PSNR, SSIM, and FID. The seamless integration of modality-specific feature extraction and embedding generalization within our framework demonstrates the potential to advance unsupervised learning paradigms. Our contribution establishes a robust solution, optimizing the generative process, and expanding applicability in real-world scenarios characterized by unlabeled, multi-modal datasets.
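The vector-quantization step at the core of any VQ-VAE, mapping each encoder output to its nearest codebook entry, can be sketched as follows. The paper's harmonizer module and SSL refinement are not modeled here; shapes and codebook contents are illustrative.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Nearest-neighbour codebook lookup of a VQ-VAE bottleneck.

    z:        (N, D) array of encoder output vectors.
    codebook: (K, D) array of learnable code vectors.
    Each z[i] is replaced by its closest codebook entry (Euclidean distance).
    """
    # Pairwise squared distances between encodings and codebook entries.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)        # index of nearest code for each vector
    return codebook[idx], idx
```

During training, a straight-through estimator passes gradients through this non-differentiable lookup, and commitment losses pull `z` and the codebook toward each other.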