📋 AI Review from DeepReviewer will be automatically processed
📋 AI Review from ZGCA will be automatically processed
The paper proposes Hierarchical Adaptive Normalization (HAN), a two-stage cascade for robust wearable HAR under sensor placement and orientation variability. Stage 1 performs gravity-based orientation normalization and infers coarse placement context via signal variance features; a stability gate then decides whether to allow adaptation based on the input norm. Stage 2 applies a placement-conditioned adaptive Batch Normalization whose momentum depends on the inferred placement confidence, followed by a lightweight CNN for classification. The approach aims to combine a physics-based correction with adaptive normalization that is both placement-aware and safe under transient dynamics. Extensive experiments on a public dataset (Opportunity) and a custom dataset report macro F1 improvements over baselines (static, gravity-only, naive adaptive BN, conditioned BN only) while keeping latency and memory overhead low. Ablations, robustness, hyperparameter sensitivity, and cross-subject results are provided.
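As a reading aid (not taken from the paper), the confidence-dependent adaptive BN update that Stage 2 appears to describe can be sketched as follows. The function name, the linear momentum schedule, and the epsilon value are assumptions; the paper's exact update rule is not reproduced in the review.

```python
import numpy as np

def adaptive_bn_update(x, running_mean, running_var, confidence,
                       base_momentum=0.1):
    """One adaptive-BN step whose momentum scales with placement confidence.

    x: (batch, channels) window of features
    confidence: scalar in [0, 1] from the placement classifier
    """
    # Higher placement confidence -> faster adaptation of the running stats;
    # confidence 0 freezes them entirely.
    momentum = base_momentum * confidence
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    # Normalize with the updated running statistics.
    x_norm = (x - running_mean) / np.sqrt(running_var + 1e-5)
    return x_norm, running_mean, running_var
```

Under this sketch, a low-confidence placement inference simply leaves the running statistics untouched, which is one plausible reading of "momentum depends on the inferred placement confidence".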
Cross‑Modal Consistency: [18]/50
Textual Logical Soundness: [14]/30
Visual Aesthetics & Clarity: [10]/20
Overall Score: [42]/100
Detailed Evaluation (≤500 words):
Visual ground truth (figure‑alone pass)
• Fig. 1(a): Multi‑line loss vs epoch (kernels 1, 3, 5, 7; Train/Val). Y‑axis “Loss”, 8‑item legend. Loss traces sit near 0–1, yet the axis extends to 50.
• Fig. 1(b): F1 vs epoch; peaks ≈0.45–0.50.
• Fig. 2(a): Bar chart, Cross‑Domain F1 (Domain B ≈0.30, C ≈0.25).
• Fig. 2(b): Bar chart, Cross‑Domain test loss (≈1.3 for both).
• Fig. 2(c): Scatter, “Ground Truth vs Predictions” (labels 0–3).
• Fig. 3(a): Multi‑dataset final validation F1 bars (≈0.19–0.29).
• Fig. 3(b): Matching losses (≈1.35).
• Fig. 4: Bar chart, “Diverse Domains: Final F1 Scores” (≈0.25–0.41).
• Fig. 5(a–f): Cross‑dataset train/val F1 vs epoch; peaks ≤0.6; mid‑epoch dips.
1. Cross‑Modal Consistency
• Major 1: Fig. 1b contradicts Sec 5.1 claim of F1>0.8. Evidence: “kernel sizes 5 and 7 … (F1 > 0.8)” (Sec 5.1) and Fig. 1(b) ≤0.5.
• Major 2: Fig. 2a shows F1 ≈0.25–0.30, but caption/text claim ≥0.75. Evidence: “maintains consistent performance (F1 > 0.75)” (Fig. 2 caption) vs Fig. 2(a).
• Major 3: Table 1 reports F1=0.847±0.023, yet Fig. 3a shows ≤0.29 across datasets. Evidence: Table 1 vs Fig. 3(a) bars ≤0.29.
• Minor 1: Fig. 2 caption describes “left/right panels”, but three panels are shown. Evidence: Fig. 2 displays three subplots.
• Minor 2: Fig. 1 titles say “Baseline”, while Sec 5.1 discusses “our approach”. Evidence: Fig. 1 headings vs “our adaptive normalization” (Sec 5.1).
2. Text Logic
• Major 1: The stability‑gate logic contradicts its stated intent: the gate allows updates when the norm exceeds τ, yet it is meant to block adaptation during unstable periods. Evidence: “if the norm is above τ, adaptive updates are allowed” (Sec 4) vs “prevents adaptation during unstable dynamics” (Sec 1).
• Major 2: Orientation normalization conflates BN with rotation; the R matrix rotates only about the z axis (yaw), which cannot achieve full gravity alignment. Evidence: R matrix in Sec 4.1 (2D yaw only).
• Minor 1: Placement classifier size inconsistent. Evidence: “two‑layer…64 hidden units” (Sec 4) vs “64 and 32” (Sec 4.1).
• Minor 2: Opportunity dataset citation/year questionable. Evidence: “Opportunity dataset … 2021” (Sec 5).
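To make Major 2 concrete: a yaw-only rotation leaves the z component of every sample unchanged, so it cannot map an arbitrary measured gravity vector onto the vertical axis. Full alignment requires rotating about the axis perpendicular to both vectors. The sketch below (my illustration, not the paper's R) uses the standard Rodrigues construction:

```python
import numpy as np

def gravity_alignment(g_meas):
    """Rotation matrix mapping the measured gravity direction onto +z.

    Uses Rodrigues' formula with rotation axis g x z; a yaw-only
    (about-z) rotation cannot perform this alignment because it
    never changes a vector's z component.
    """
    g = g_meas / np.linalg.norm(g_meas)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(g, z)                 # rotation axis (norm = sin(theta))
    c = float(np.dot(g, z))            # cos(theta)
    if np.isclose(c, -1.0):            # antiparallel: rotate pi about x
        return np.diag([1.0, -1.0, -1.0])
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    # R = I + K + K^2 / (1 + cos(theta))
    return np.eye(3) + K + K @ K / (1.0 + c)
```

A revision should either present a rotation of this form or justify why yaw-only correction suffices for the deployment scenario considered.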
3. Figure Quality
• Major 1: Scale/legibility issues obscure trends in Fig. 1a (y‑axis 0–50; traces near 0–1). Evidence: Fig. 1(a) axis range.
• Minor 1: Legends/labels small and crowded (e.g., 8‑item legend in Fig. 1). Evidence: Fig. 1 legend at 295–450 px.
• Minor 2: Many figures lack units and dataset identifiers; captions needed to interpret. Evidence: Figs. 3–4 bar charts without units.
Key strengths:
• Clear modular idea (orientation correction + placement‑conditioned adaptive BN + gate).
• Comprehensive experimental plan (ablation, multi‑dataset, efficiency).
Key weaknesses:
• Severe figure–text numerical mismatches undermine claims.
• Ambiguous/contradictory stability‑gate logic.
• Oversimplified rotation model; unclear how BN performs “orientation normalization.”
• Several presentation/legibility issues and minor architectural inconsistencies.
📋 AI Review from SafeReviewer will be automatically processed
This paper introduces a novel hierarchical adaptive normalization method designed to address the challenges of sensor placement and orientation variability in wearable human activity recognition (HAR). The core contribution lies in a two-stage cascade approach. The first stage employs a gravity-based correction to normalize sensor orientation, leveraging the consistent direction of gravity to align sensor data across different orientations. Simultaneously, it infers the sensor placement context by analyzing the variance of the normalized signals, classifying the sensor's location into categories like wrist, waist, or ankle. A stability gate is incorporated to prevent adaptation during periods of unstable dynamics, ensuring that the system does not overfit to transient movements. The second stage refines the normalized features using a placement-conditioned adaptive Batch Normalization (BN) layer, which adjusts its parameters based on the inferred sensor placement. This adaptive BN is designed to be computationally efficient, making it suitable for real-time applications on resource-constrained wearable devices. The authors evaluate their method on both public and custom datasets, demonstrating improvements in macro F1-score compared to static models and unsupervised domain adaptation approaches. The experimental results suggest that the proposed method effectively mitigates the impact of sensor placement and orientation variability on HAR performance. The paper emphasizes the practical applicability of the approach, highlighting its low latency and minimal memory overhead, which are crucial for on-device deployment. The authors also present a comprehensive ablation study to demonstrate the contribution of each component of the proposed method. Overall, the paper presents a promising approach to address a significant challenge in wearable HAR, with a focus on real-world applicability and computational efficiency.
I find several aspects of this paper to be particularly strong. The core idea of combining physics-based correction with data-driven adaptation is a novel and effective way to address sensor variability in HAR. Using gravity as a reference for orientation normalization is intuitive and leverages a consistent physical phenomenon. The hierarchical two-stage cascade is well organized and handles the different aspects of sensor variability systematically, and the stability gate is a valuable safeguard against overfitting to transient movements. The authors also show a strong commitment to practical applicability: the placement-conditioned adaptive Batch Normalization layer is deliberately lightweight, a significant step towards real-time on-device deployment. The experimental results, despite some limitations, show consistent macro F1 improvements over the baselines, and the ablation study, although not fully comprehensive, provides some evidence for the contribution of each component. Finally, the paper is well written and easy to follow: the problem, the proposed solution, and the experimental results are all clearly articulated.
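For reference, the stability gate as summarized above (adaptation blocked during unstable dynamics) admits a very small implementation. The deviation-from-gravity criterion and the threshold value below are my assumptions, not the paper's stated condition, which the review of Sec 4 suggests is ambiguous:

```python
import numpy as np

GRAVITY = 9.81  # m/s^2

def is_stable(window, tau=0.5):
    """Gate: allow adaptive updates only when dynamics look quasi-static.

    window: (n, 3) accelerometer samples. During near-static periods
    the per-sample acceleration norm stays close to g, so we gate on
    the mean absolute deviation of the norm from GRAVITY.
    """
    norms = np.linalg.norm(window, axis=1)
    return float(np.mean(np.abs(norms - GRAVITY))) < tau
```

An analysis of this gate's accuracy in separating stable from unstable windows, as requested below, would amount to evaluating exactly this predicate against labeled segments.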
Despite these strengths, I have identified several weaknesses that need to be addressed. Firstly, the experimental evaluation lacks breadth: beyond the public Opportunity dataset and a custom dataset, the authors omit commonly used HAR benchmarks such as PAMAP2, USC-HAD, and MHealth, which makes it difficult to assess generalizability or to compare against the state of the art. Secondly, the baseline comparisons are not comprehensive. The method is compared against a static baseline, gravity-only normalization, and naive adaptive BN, but not against feature-based methods, other domain adaptation techniques, or stronger deep models such as Inception or LSTM variants, so the claim of superior performance is hard to verify. Thirdly, the stability gate is described but never analyzed: no metrics are reported on its accuracy in distinguishing stable from unstable periods, nor on its impact on the timing and quality of adaptive updates. Fourthly, the custom dataset is insufficiently described. The authors give the number of subjects and the activity types but not the sensor modalities, the number of classes, or a comparison with Opportunity, which limits the ability to evaluate the method on diverse data. Fifthly, the presentation of results in Figure 4 is unclear: the figure lacks a legend explaining the colors, and the connection between the left panel's domain labels (Domain B and C) and the right panel's dataset labels (Opportunity, Custom, Synthetic) is never stated. Finally, the evaluation reports only the macro F1-score, with no per-class analysis, so it is impossible to tell whether the method performs equally well across all activities or struggles on specific ones. These weaknesses, which I verified through direct examination of the paper, significantly limit the paper's strength and generalizability.
To address the identified weaknesses, I recommend several concrete improvements. Firstly, expand the evaluation to additional standard HAR benchmarks such as PAMAP2, USC-HAD, and MHealth, enabling a direct comparison with other state-of-the-art techniques. Secondly, broaden the baselines to include feature-based methods, other domain adaptation techniques, and advanced deep models such as Inception or LSTM. Thirdly, analyze the stability gate quantitatively: report its accuracy in distinguishing stable from unstable periods and its effect on the timing and quality of adaptive updates. Fourthly, describe the custom dataset in detail, including the sensor modalities, the number of classes, and a comparison with the Opportunity dataset. Fifthly, improve the clarity of Figure 4 by adding a legend explaining the colors and explicitly relating the left and right panels. Sixthly, report per-class performance in addition to the macro F1-score. Additionally, the authors should examine the effect of different sensor placements and sensor combinations, as well as robustness to noisy sensor data, and provide a fuller analysis of computational cost, including a comparison with other methods and the accuracy–efficiency trade-off. These improvements would significantly strengthen the paper and enable a more comprehensive and robust evaluation of the proposed method.
Based on my analysis, several questions remain that I believe are important for clarification. Firstly, how does the proposed method perform on other commonly used HAR datasets such as PAMAP2, USC-HAD, and MHealth, and how does it compare to state-of-the-art methods there? Secondly, how does it compare to other HAR models, including feature-based methods, other domain adaptation techniques, and advanced deep models such as Inception or LSTM? Thirdly, how accurately does the stability gate distinguish stable from unstable periods, and how does that accuracy affect overall performance? Fourthly, what are the specific characteristics of the custom dataset, including the sensor modalities and number of classes, and how does it compare to Opportunity in complexity and diversity? Fifthly, how does the method perform on individual activity classes, and are there activities where it does particularly well or poorly? Sixthly, what is the computational cost relative to other methods, and what is the trade-off between accuracy and computational cost? Finally, how does the method handle noisy sensor data, and what are its limitations under significant noise? Answers to these questions are crucial for a complete understanding of the method's strengths, weaknesses, and limitations.