📋 AI Review from ZGCA will be automatically processed
The paper proposes HEAL, a learning-free, inference-only source-free unsupervised domain adaptation (SFUDA) framework for cross-modality medical image segmentation. A source-trained segmentation model and a source-trained diffusion model are used without any parameter updates on the target domain. The pipeline comprises: (1) Hierarchical Denoising (HD), which refines pseudo-labels from the source model on the target images using a two-stage uncertainty filter (voxel-wise entropy then Normal-Inverse Gaussian variance), (2) Edge-Guided Selection (EGS), which generates multiple source-like samples via a diffusion model conditioned on refined pseudo-labels and selects the one with maximal edge consistency (Canny alignment) to the condition, and (3) Size-Aware Fusion (SAF), which fuses small structures from the HD-refined pseudo-labels with large structures from the segmentation of the selected source-like sample. Experiments on BraTS 2021 (T1→T1ce, T2→FLAIR) and two polyp datasets (Kvasir-SEG↔CVC-ClinicDB) report strong improvements over several SFUDA baselines. Ablations analyze HD, EGS, and SAF; qualitative visualizations and t-SNE plots aim to explain component contributions.
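To make the summarized pipeline concrete, here is a minimal NumPy sketch of the three stages as this review describes them. All function names, the entropy threshold, and the per-class size heuristic are illustrative assumptions, not the authors' implementation; the paper's HD stage adds a second NIG-variance filter, and EGS scores Canny-edge alignment over multiple diffusion-generated candidates, neither of which is reproduced here.

```python
import numpy as np

def entropy_mask(probs, tau=0.5):
    """Stage-1 HD filter (sketch): keep voxels whose predictive entropy is below tau.
    probs: (C, ...) array of per-class probabilities summing to 1 over axis 0."""
    ent = -np.sum(probs * np.log(probs + 1e-8), axis=0)
    return ent < tau

def edge_consistency(candidate_edges, condition_edges):
    """EGS score stand-in: fraction of condition edge pixels that the candidate's
    edge map reproduces (the paper scores Canny-edge alignment; edge maps are
    assumed to be precomputed boolean arrays here)."""
    overlap = np.logical_and(candidate_edges, condition_edges).sum()
    return overlap / max(condition_edges.sum(), 1)

def size_aware_fusion(refined_pl, sourcelike_seg, num_classes, size_thresh=100):
    """SAF stand-in: take small structures from the HD-refined pseudo-label and
    large structures from the segmentation of the selected source-like sample."""
    fused = np.zeros_like(refined_pl)
    for c in range(1, num_classes):
        small = (refined_pl == c).sum() < size_thresh
        mask = (refined_pl == c) if small else (sourcelike_seg == c)
        fused[mask] = c
    return fused
```

In use, one would run `entropy_mask` on the source model's softmax output, pick the diffusion candidate maximizing `edge_consistency`, segment it, and pass both label maps to `size_aware_fusion`.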
Cross-Modal Consistency: [36]/50
Textual Logical Soundness: [22]/30
Visual Aesthetics & Clarity: [15]/20
Overall Score: [73]/100
Detailed Evaluation (≤500 words):
1. Cross-Modal Consistency
• Minor 1: Visual ground truth (inventory of figure content, for reference)
– Figure 1/(a–e): (a) Diffusion pre‑training pipeline; (b) nnUNet pre‑training; (c) HD pipeline with entropy→NIG→Var(NIG) mask; (d) EGS with Canny edges and S-score; (e) SAF fusing M_S(I_B) with Y_T*. Overall: end‑to‑end workflow from source pre‑training to target inference.
– Figure 2/(a–d): Bar charts (Dice) and line plots (ASD) for T1→T1ce and T2→FLAIR; configs Baseline/HD/HD+EGS/HEAL.
– Figure 3: Qualitative HD ablation per case: target image, No‑Adapt, entropy map, NIG uncertainty, HD‑refined.
– Figure 4/(a–b): t‑SNE showing Source, Source‑like, Target clusters.
• Major 1: Direction inconsistency. Sec. 3.3 mentions “FLAIR→T2” while the paper uses “T2→FLAIR.” Evidence: “Figure 2 (c)… in the FLAIR → T2 direction.”
• Major 2: Reported ASD values are ambiguous relative to the figures/tables (e.g., “further diminishes ASD to 2.8 mm and 2.6 mm, respectively”). It is unclear which classes/directions these refer to, and Table 1 lists mean ASD values of 2.0 and 2.6. Evidence: Sec. 3.3 sentence containing “2.8 mm and 2.6 mm.”
• Minor 2: Equation (1) text says P(v|c) but formula uses P(c|v).
• Minor 3: Figure 3 column labels (e.g., “T1”, “T1ce”) are not explained in caption; may confuse with direction.
2. Text Logic
• Major 1: “NIG” is referred to as Normal‑Inverse Gaussian, but equations and usage match the Normal‑Inverse‑Gamma prior. This affects the HD derivation and Var(NIG). Evidence: Sec 2.1.2 phrase “Normal-Inverse Gaussian (NIG)” with Eqs. (3–5).
• Minor 1: Missing or awkward punctuation in multiple numeric comparisons (e.g., “16.4% 25.7%” lacks a separator between the values).
• Minor 2: The “learning‑free” claim is used consistently, but the diffusion model does run at target time; the paper should state explicitly that no target‑time fine‑tuning occurs anywhere in the pipeline.
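For reference on the Text Logic Major 1 point: in evidential deep learning (e.g., deep evidential regression), the prior usually abbreviated NIG is the Normal-Inverse-Gamma, whose density and variance decomposition are commonly written as follows. The symbols below are the standard ones and may not match the paper's notation:

```latex
p(\mu, \sigma^2 \mid \gamma, \nu, \alpha, \beta)
  = \mathcal{N}\!\left(\mu \mid \gamma, \sigma^2/\nu\right)\,
    \Gamma^{-1}\!\left(\sigma^2 \mid \alpha, \beta\right),
\qquad
\mathbb{E}[\sigma^2] = \frac{\beta}{\alpha - 1},
\qquad
\mathrm{Var}[\mu] = \frac{\beta}{\nu(\alpha - 1)}.
```

If the paper's Eqs. (3–5) and Var(NIG) match this form, the correct name is Normal-Inverse-Gamma; the Normal-Inverse Gaussian is a different, heavy-tailed distribution, and the naming should be corrected throughout.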
3. Figure Quality
• Major 1: Several critical labels are tiny (Figure 1 icons/text “Reverse Diffusion Process,” “Model Frozen”; Figure 3 colorbars/labels). Risk of illegibility at print size. Evidence: Fig. 1 panels (c–e) dense pipelines with small annotations.
• Minor 1: Figure 2 bars/lines lack numeric labels; hard to verify stated deltas quickly.
• Minor 2: Figure 3 needs clearer column headers and a legend explaining heatmaps/uncertainty units.
📋 AI Review from SafeReviewer will be automatically processed
This paper introduces HEAL, a novel source-free unsupervised domain adaptation (SFUDA) framework designed for cross-modality medical image segmentation. The core contribution of HEAL lies in its ability to adapt a pre-trained segmentation model to a target domain without requiring any target-specific training or parameter updates. This is achieved through a combination of hierarchical denoising, edge-guided selection, and size-aware fusion techniques. The method leverages a diffusion model to generate source-like samples from the target domain, which are then used to refine the initial pseudo-labels obtained from the pre-trained model.

The hierarchical denoising process employs both entropy and Normal-Inverse Gaussian (NIG) uncertainty measures to refine these pseudo-labels. Edge-guided selection is used to choose the most reliable generated sample based on structural consistency. Finally, size-aware fusion dynamically combines the refined pseudo-labels and the selected generated sample to produce the final segmentation.

The authors evaluate HEAL on three public datasets, including one brain tumor segmentation dataset (evaluated in two modality directions) and two polyp datasets, demonstrating its effectiveness in cross-modality medical image segmentation. The experimental results show that HEAL achieves competitive performance compared to existing SFUDA methods. The key idea is that the method operates solely through inference, without any learning or fine-tuning on the target domain, which enhances computational efficiency and preserves the integrity of the pre-trained source model. This approach aims to address the challenges of domain shift and the lack of labeled data in medical imaging, offering a practical solution for adapting models to new, unseen target domains. The paper's emphasis on a 'learning-free' adaptation process, where the pre-trained model's parameters remain fixed, is a central theme.
However, this aspect also raises questions about the true novelty and the scope of the method's applicability, particularly in scenarios with significant domain shifts.
I find several aspects of this paper to be commendable. The core idea of a source-free unsupervised domain adaptation method that operates without any target-specific training is both practically relevant and technically interesting. The 'learning-free' characteristic, where the pre-trained model's parameters remain fixed, is a significant advantage in terms of computational efficiency and ease of deployment. This approach avoids the need for fine-tuning or self-training on the target domain, which can be computationally expensive and potentially introduce privacy concerns.

The proposed method, HEAL, integrates several innovative components, including hierarchical denoising, edge-guided selection, and size-aware fusion. The hierarchical denoising process, which combines entropy and NIG uncertainty measures, is a novel approach to refining pseudo-labels. The edge-guided selection mechanism, which uses structural consistency to choose the most reliable generated sample, is also a valuable contribution. The size-aware fusion technique, which dynamically combines the refined pseudo-labels and the selected generated sample, further enhances the segmentation performance.

The experimental results presented in the paper are compelling. The authors demonstrate the effectiveness of HEAL on three public datasets, including one brain tumor segmentation dataset and two polyp datasets. The results show that HEAL achieves competitive performance compared to existing SFUDA methods, indicating the practical utility of the proposed approach. The paper is generally well-organized and easy to follow, making it accessible to a broad audience. The authors provide a clear description of the proposed method and the experimental setup. The inclusion of ablation studies further strengthens the paper by demonstrating the contribution of each component of HEAL.
Overall, the paper presents a novel and effective approach to source-free unsupervised domain adaptation for medical image segmentation, with a focus on computational efficiency and ease of deployment. The combination of hierarchical denoising, edge-guided selection, and size-aware fusion represents a significant technical contribution, and the experimental results demonstrate the practical utility of the proposed method.
Despite the strengths, I have identified several weaknesses that warrant careful consideration.

First, the paper's claim of being 'learning-free' is somewhat misleading. While the method does not update the *segmentation model's* parameters on the target domain, it relies on a pre-trained *diffusion model* to generate source-like samples. This diffusion model, while not explicitly trained on the target data, is a crucial component of the adaptation process, and the paper lacks detail on how it is trained, specifically whether it is trained solely on source data or whether target data is used in any way. This ambiguity undermines the 'learning-free' claim and raises concerns about potential information leakage from the target domain. The paper states, "In HEAL, the model is exclusively pre-trained on the source domain, and no further training, fine-tuning, or parameter updates are performed during domain adaptation to the target domain." (Introduction). However, the method description states, "First, during the pre-training stage, we train a segmentation model Ms and a diffusion model using source domain data {Xs, Ys}, where Xs are the source domain data and Ys are the labels." (Method). This discrepancy highlights the need for clarification of the diffusion model's training process. I have high confidence in this assessment.

Second, the experimental evaluation is limited in scope. The authors evaluate HEAL on three datasets, but two of them (Kvasir-SEG and CVC-ClinicDB) are endoscopic datasets. Given that the paper targets medical image segmentation broadly, more diverse imaging modalities should be included; the absence of a commonly used CT dataset such as the LUNA16 challenge is a notable omission. The paper states, "We validate our method on the BraTS 2021 dataset Menze et al. (2014), Kvasir-SEG Jha et al. (2020), and CVC-ClinicDB Bernal et al. (2015)." (Experiments and Results). Restricting the evaluation to MRI and endoscopy limits the generalizability of the findings.

Third, the paper lacks sufficient detail on the computational cost of the proposed method. While the authors claim that HEAL is computationally efficient due to its 'learning-free' nature, they provide no quantitative analysis, such as inference time or memory usage. The paper states, "By eliminating the need for such training, HEAL not only enhances computational efficiency and simplifies deployment..." (Introduction), but offers no specific metrics to support this claim, making it difficult to assess the method's practical efficiency.

Fourth, the ablation study is not comprehensive enough. While the authors demonstrate the contribution of each component of HEAL, they do not explore the impact of varying the diffusion model's hyperparameters. The paper states, "We used Med-DDPM Dorjsembe et al. (2024) with a noise schedule of 250 time steps t..." (Implementation Details), yet no experiments vary the number of time steps or other diffusion parameters that could affect performance.

Finally, the paper does not adequately address the limitations of the proposed method. The authors acknowledge that HEAL's effectiveness is linked to the generalization capability of the pre-trained segmentation model and that low-quality initial pseudo-labels can propagate errors, but they do not discuss potential failure cases or scenarios where the method might underperform. The paper states, "The effectiveness of HEAL is inherently linked to the generalization capability of the pre-trained segmentation model." (Limitations), yet provides no detailed analysis of failure cases beyond this dependency.
Based on the identified weaknesses, I propose several concrete suggestions for improving the paper.

First, the authors should clarify the training process of the diffusion model. They should explicitly state whether the diffusion model is trained solely on source data or if target data is used in any way. If target data is used, they should explain how this is done without violating the source-free constraint. This clarification is crucial for validating the 'learning-free' claim and addressing concerns about potential information leakage.

Second, the authors should expand the experimental evaluation to include more diverse medical imaging modalities. They should include a commonly used dataset like the CT-based LUNA16 challenge to demonstrate the generalizability of the proposed method. This would strengthen the paper's claims and make it more relevant to the broader medical imaging community.

Third, the authors should provide a detailed analysis of the computational cost of the proposed method. They should report metrics such as inference time and memory usage for different datasets and segmentation tasks. This would allow readers to assess the practical efficiency of the method and compare it with other SFUDA approaches.

Fourth, the authors should conduct a more comprehensive ablation study that includes varying the hyperparameters of the diffusion model. They should explore the impact of different numbers of time steps and other relevant parameters on the performance of HEAL. This would provide a better understanding of the method's sensitivity to these parameters and help identify the optimal settings.

Fifth, the authors should provide a more detailed discussion of the limitations of the proposed method. They should discuss potential failure cases and scenarios where the method might not perform well. This would provide a more balanced view of the method's capabilities and limitations.
Finally, the authors should consider adding a section on ethical considerations, particularly regarding the use of medical data and the potential for bias in the model's predictions. This would demonstrate a commitment to responsible research practices. By addressing these suggestions, the authors can significantly strengthen the paper and make it more impactful.
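The computational-cost suggestion above could be addressed with a very small measurement harness. The sketch below is purely illustrative (the function names and the use of Python-heap tracking are my assumptions; GPU memory and model-specific profiling would require framework tooling the paper would need to specify):

```python
import time
import tracemalloc

def benchmark(infer_fn, volumes, warmup=1, repeats=3):
    """Report mean per-volume wall-clock latency (seconds) and peak
    Python-heap usage (bytes) for a hypothetical inference callable."""
    for v in volumes[:warmup]:          # warm up caches / lazy initialization
        infer_fn(v)
    tracemalloc.start()
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        for v in volumes:
            infer_fn(v)
        latencies.append((time.perf_counter() - start) / len(volumes))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return sum(latencies) / len(latencies), peak
```

Reporting such numbers per dataset, alongside the same figures for the SFUDA baselines, would directly substantiate the paper's efficiency claim.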
I have several questions that I believe would help clarify some of the key aspects of the paper. First, could the authors provide more details on the training process of the diffusion model? Specifically, is the diffusion model trained solely on source data, or is there any implicit use of target data? If target data is used, how is this done without violating the source-free constraint? Second, what is the computational cost of the proposed method in terms of inference time and memory usage? How does this compare to other SFUDA methods? Third, how sensitive is the performance of HEAL to the hyperparameters of the diffusion model, such as the number of time steps? What is the optimal setting for these parameters? Fourth, what are the potential failure cases for HEAL? Are there specific types of images or segmentation tasks where the method might not perform well? Fifth, how does the performance of HEAL compare to other state-of-the-art SFUDA methods on a wider range of medical imaging datasets, including CT scans? Finally, what are the ethical considerations associated with the use of the proposed method, particularly regarding the use of medical data and the potential for bias in the model's predictions? Addressing these questions would provide a more complete understanding of the proposed method and its limitations.