2510.0019 Hierarchical Adaptive Normalization: A Placement-Conditioned Cascade for Robust Wearable Activity Recognition v1

🎯 ICAIS2025 Submission

AI Review from DeepReviewer


📋 Summary

This paper introduces a novel approach to human activity recognition (HAR) with wearable sensors, aiming to improve robustness against variations in sensor placement and orientation. The core contribution is a hierarchical adaptive normalization method that integrates gravity-based orientation correction, placement context inference, a stability gate, and placement-conditioned adaptive batch normalization. The method first estimates the gravity vector to correct for sensor orientation; a placement context classifier then categorizes the sensor's location (e.g., wrist, waist, ankle). A stability gate prevents adaptation during unstable periods, so the model only updates its parameters when the sensor data is deemed reliable. Finally, a placement-conditioned adaptive batch normalization layer refines the feature representations based on the inferred placement.

The authors evaluate the method on both public and custom datasets, demonstrating improved performance over baseline approaches and suggesting that it effectively mitigates the impact of sensor misplacement and orientation changes. Ablation studies show that the combination of all components yields the best results, and additional experiments analyze performance under varying noise levels and sensor sampling rates. The method is designed to be efficient and suitable for on-device deployment, which is crucial for wearable applications.

The paper's significance lies in addressing a critical challenge in wearable sensor-based HAR: sensitivity to sensor placement and orientation, which limits the usability and reliability of HAR systems in real-world settings. Overall, the paper presents a well-motivated and empirically validated approach to a significant problem in the field.

✅ Strengths

I found several aspects of this paper to be particularly strong. First, it tackles an important problem in wearable sensor-based HAR: the sensitivity of models to sensor placement and orientation, which significantly impacts the usability and reliability of HAR systems in real-world settings. The hierarchical adaptive normalization approach, combining gravity-based orientation correction, placement context inference, a stability gate, and placement-conditioned adaptive batch normalization, is a novel and well-motivated contribution. The gravity-based correction is a clever way to address orientation variability, the placement context classifier lets the model adapt to different sensor locations, and the stability gate prevents harmful updates during unstable periods so the model only learns from reliable data.

The authors also deserve credit for a thorough empirical evaluation. They conducted experiments on multiple public and custom datasets, demonstrated consistent improvements over baselines, and ran ablation studies that isolate the contribution of each component, showing that the full combination performs best. The evaluation further covers convergence, computational efficiency, robustness to noise and varying sampling rates, hyperparameter sensitivity, and cross-subject generalization, providing a strong foundation for the paper's claims.

Finally, the paper is well-organized and easy to follow, making it accessible to a wider audience: the proposed method and its components are clearly explained, and the experimental results are presented concisely. Overall, the paper offers a practical and effective solution to a critical challenge in wearable sensor-based human activity recognition.

❌ Weaknesses

Despite these strengths, I have identified several weaknesses that warrant attention. First, the paper suffers from notational inconsistencies. The symbol 's' is used in multiple contexts: it is defined as a binary stability mask in Section 4, but its use in the context of placement-conditioned adaptation creates ambiguity. The paper never explicitly relates the mask 's' to the placement categories (wrist, waist, ankle), making it difficult to understand how the stability gate interacts with the placement-conditioned batch normalization (partially valid concern; confidence: high). Second, the terms "Domain B" and "Domain C" are introduced in Section 5.3 without prior definition. The paper states, "Figure 2 illustrates our cross-domain evaluations. The left part of the figure compares F1-scores between Domain B and Domain C, revealing that Domain B achieves superior performance," but never explains what these domains represent, which makes the cross-domain results impossible to interpret (valid concern; confidence: high). Third, Section 5.8 refers to a missing figure with the placeholder "Figure ??": "Figure ?? shows the performance degradation analysis under increasing noise levels and varying sensor sampling rates." Without this figure, the claims about robustness to noise and sampling rates cannot be verified (valid concern; confidence: high).

Beyond these clarity issues, the experimental evaluation is incomplete. The paper compares only against a static baseline, gravity-only normalization, naive adaptive BN, and conditioned BN only; it lacks direct comparisons with state-of-the-art methods that address similar challenges, such as invariant feature learning, explicit placement recognition strategies, or specific unsupervised domain adaptation techniques (valid concern; confidence: high). Without these comparisons, it is difficult to assess the novelty and effectiveness of the proposed method relative to the broader landscape of existing solutions. Furthermore, while the introduction lists contributions, the paper does not clearly articulate its novelty by contrasting the proposed design with existing approaches: it describes the components but does not explain why this specific combination is new or what problems it solves that other methods cannot (partially valid concern; confidence: medium).

In summary, the notational inconsistencies, undefined terms, missing figure, limited baseline comparisons, and weak articulation of novelty all detract from the clarity, rigor, and impact of the paper.

💡 Suggestions

To address the identified weaknesses, I recommend several concrete improvements. First, clarify the notation, particularly the use of 's': use distinct symbols for the binary stability mask and the sensor placement context, for example 'p' for the placement context and 'm' for the stability mask. This would make the paper more readable and less ambiguous. Second, define the terms "Domain B" and "Domain C" in Section 5.3, explaining what these domains represent and how they relate to the datasets and activities used in the experiments, so the cross-domain evaluation becomes interpretable. Third, include the missing figure in Section 5.8, properly labeled and referenced in the text, so that readers can verify the claims about robustness to noise and varying sampling rates.

Fourth, add a more comprehensive comparison with state-of-the-art methods in the experimental section, including invariant feature learning, explicit placement recognition strategies, and unsupervised domain adaptation techniques, and discuss the advantages and disadvantages of the proposed method relative to these approaches. Fifth, articulate the novelty of the method more clearly in the introduction and conclusion: state the specific technical contributions, explain how the method differs from existing approaches, and highlight the problems it solves that other methods cannot.

Finally, proofread the paper carefully to eliminate missing figures and similar errors. Together, these changes would make the paper more convincing and better demonstrate the value of the proposed method for handling sensor placement and orientation variability in human activity recognition.

❓ Questions

I have several questions arising from my analysis of the paper. First, regarding the stability gate: how is the threshold T determined in practice? The paper mentions that it is optimized on a validation set, but is it a single fixed threshold across all activities and subjects, or is it adapted per activity or subject? How sensitive is the method's performance to this choice?

Second, concerning the placement context classifier: how does it perform under noisy conditions? It outputs a probability distribution over three placement categories, but what happens when it is uncertain about the placement context, and does that uncertainty affect the subsequent adaptive batch normalization?

Third, regarding the gravity-based orientation correction: how does the method behave when the sensor experiences non-gravitational accelerations? Does it assume that the only acceleration present is due to gravity?

Fourth, in the experimental evaluation, what was the rationale for the chosen baselines, and why were other state-of-the-art methods excluded? I am particularly interested in the limitations of the chosen baselines with respect to sensor placement and orientation variability.

Finally, regarding the cross-subject generalization analysis: how large is the performance drop on subjects not included in the training data, and what are the method's limitations in generalizing to new subjects? Addressing these questions would further strengthen the paper's contribution.
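The first question, on how the gate threshold is tuned, can be made concrete with a sketch. The sweep below is a hypothetical reconstruction of "optimized on a validation set", not the paper's procedure; the `eval_f1` callable is an invented stand-in for whatever gated-adaptation pipeline is being tuned.

```python
def tune_gate_threshold(eval_f1, taus):
    """Grid-search the stability-gate threshold (T in the question above).

    eval_f1: hypothetical callable mapping a candidate threshold to the
    validation macro F1 obtained when the gated pipeline uses it.
    Returns the threshold with the highest validation F1.
    """
    best_f1, best_tau = max((eval_f1(t), t) for t in taus)
    return best_tau
```

A single global sweep fixes one threshold for all activities and subjects; the per-activity or per-subject variant the question asks about would simply rerun the sweep per group.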

📊 Scores

Soundness: 2.5
Presentation: 2.0
Contribution: 2.0
Rating: 3.5

AI Review from ZGCA


📋 Summary

The paper proposes Hierarchical Adaptive Normalization (HAN), a two-stage cascade for robust wearable HAR under sensor placement and orientation variability. Stage 1 performs gravity-based orientation normalization and infers coarse placement context via signal variance features; a stability gate then decides whether to allow adaptation based on the input norm. Stage 2 applies a placement-conditioned adaptive Batch Normalization whose momentum depends on the inferred placement confidence, followed by a lightweight CNN for classification. The approach aims to combine a physics-based correction with adaptive normalization that is both placement-aware and safe under transient dynamics. Extensive experiments on a public dataset (Opportunity) and a custom dataset report macro F1 improvements over baselines (static, gravity-only, naive adaptive BN, conditioned BN only) while keeping latency and memory overhead low. Ablations, robustness, hyperparameter sensitivity, and cross-subject results are provided.
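As a reading aid, the cascade described above can be sketched as the following control flow. The component callables (`normalize`, `classify_placement`, `adapt_bn`) are placeholders, not the paper's implementations; only the gate test follows the paper's stated rule s = I(||x_norm||₂ > τ).

```python
import numpy as np

def han_forward(window, normalize, classify_placement, adapt_bn, tau=1.5):
    """Control flow of the two-stage cascade summarized above (a sketch)."""
    x = normalize(window)                          # Stage 1a: orientation normalization
    placement, confidence = classify_placement(x)  # Stage 1b: coarse placement context
    s = float(np.linalg.norm(x)) > tau             # Stage 1c: stability gate on the norm
    return adapt_bn(x, placement, confidence, adapt=s)  # Stage 2: gated adaptive BN
```

Note that, written this way, low-norm (quiet) windows block adaptation and high-norm windows allow it, which is exactly the gate behavior the weaknesses below question.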

✅ Strengths

  • Practical, lightweight integration of orientation normalization, placement context inference, and adaptive BN with a gating mechanism tailored for on-device use (Section 4).
  • Clear articulation of the deployment problem (placement/orientation variability) and motivation for a hierarchical adaptive approach (Sections 1, 3.1).
  • Thorough internal analyses: ablations isolating each component (Section 5.4), hyperparameter sensitivity (Section 5.8), robustness to noise and sampling rates (Section 5.7), cross-subject generalization and adaptation speed (Section 5.9).
  • Empirical gains over simple baselines across placements with modest overhead (Table 1, Table 2; Section 5.6).
  • Design choices are easy to implement and plausibly portable to edge devices (e.g., conditioning BN momentum via placement confidence; Section 4.1).

❌ Weaknesses

  • No direct comparisons to state-of-the-art lightweight UDA/test-time adaptation baselines for time-series/HAR despite claims of outperforming ‘complex UDA approaches’ (Abstract, Section 2). The provided baselines (Static, Gravity-Only, Naive Adaptive BN, Conditioned BN only) are insufficient to substantiate comparative claims (Section 5.2).
  • Orientation ‘gravity-based’ correction seems under-specified and possibly insufficient: the implementation rotates only in the xy-plane (R with θ = arctan2(gy, gx), Section 4.1), which does not generally align a 3D sensor frame to the world frame; tilt (pitch/roll) is not addressed and gyroscope integration is not used.
  • Stability gate logic appears inconsistent with the stated goal: s = I(||x_norm||2 > τ) (Section 4.1) allows adaptation for large norms; yet the paper’s narrative says the gate suppresses adaptation during unstable high-impact events, which typically have large norms (Sections 1, 3.1, 5.3). This requires clarification/justification or correction.
  • Placement classifier details are inconsistent across sections (two-layer with 64 units in Section 4 vs two layers with 64 and 32 units in Section 4.1). The pipeline depends on this classifier; misclassification could mis-set BN momentum; no analysis of the impact of misclassification errors on downstream performance is provided.
  • On-device claims are supported by GPU timing and memory on RTX 3080 (Section 5, Table 2), not embedded hardware. The conclusions about edge suitability would be stronger with measurements on mobile/MCU platforms.
  • Some reported numbers are hard to reconcile (e.g., earlier kernel-size tuning notes mention validation F1 near 0.49, while later F1 exceeds 0.8; Section 5). Reproducibility would benefit from clearer task definitions, train/val/test composition, and exact preprocessing for each dataset/placement.
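To make the second bullet concrete: a full 3D alignment of the estimated gravity direction g/||g|| to the world z-axis, e.g. via Rodrigues' rotation formula, could look like the sketch below. This is not the paper's implementation (which reportedly rotates only in the xy-plane); it illustrates the tilt-aware alternative the review asks about.

```python
import numpy as np

def align_gravity_to_z(window, g_est):
    """Rotate a (T, 3) accelerometer window so g_est maps onto [0, 0, 1].

    g_est: estimated gravity vector, e.g. a 1 s moving average of acceleration.
    Uses Rodrigues' rotation formula for the full 3D (pitch/roll-aware) alignment.
    """
    g = np.asarray(g_est, dtype=float)
    g = g / np.linalg.norm(g)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(g, z)                  # rotation axis, |v| = sin(angle)
    c = float(g @ z)                    # cos(angle)
    s = np.linalg.norm(v)
    if s < 1e-8:                        # g already parallel or anti-parallel to z
        R = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    else:
        K = np.array([[0.0, -v[2], v[1]],
                      [v[2], 0.0, -v[0]],
                      [-v[1], v[0], 0.0]])            # skew-symmetric cross matrix
        R = np.eye(3) + K + (K @ K) * ((1.0 - c) / s**2)  # Rodrigues' formula
    return np.asarray(window, dtype=float) @ R.T
```

This leaves yaw (rotation about gravity) undetermined, which is inherent to accelerometer-only alignment and is where the gyroscope fusion mentioned above would help.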

❓ Questions

  • Stability gate: Please reconcile the apparent mismatch between the gate definition s = I(||x_norm||2 > τ) (Sections 4, 4.1) and the stated intention to prevent adaptation during high-impact/unstable events (Sections 1, 3.1, 5.3). Under what conditions is high norm considered stable? Could you provide a calibration study showing gate activations vs. acceleration magnitude/jerk and their relation to instability?
  • Orientation normalization: Section 4.1 uses R with θ = arctan2(gy, gx), which only rotates in the xy-plane. How is full 3D gravity alignment (pitch/roll) handled? Do you project gravity to align with a canonical axis (e.g., z) via a 3D rotation (e.g., aligning g/||g|| to [0,0,1])? Are gyroscope signals used (e.g., complementary/Kalman filtering) to improve gravity estimation during dynamics?
  • Placement classifier: Section 4 says a two-layer FC with 64 units; Section 4.1 says 64 and 32. Which is correct? How sensitive is performance to placement misclassification? Please include an analysis where you deliberately perturb the placement prediction/confidence and report impact on F1.
  • Comparative baselines: Which lightweight UDA/TTA baselines do you consider ‘complex’? Please add direct comparisons to strong lightweight methods applicable at test-time (e.g., BN-statistics adaptation, TENT, SHOT variants adapted to time series, calibration-free TTA). Include both accuracy and per-window latency/energy/memory on the same hardware.
  • On-device evaluation: Can you provide measurements on a representative embedded platform (e.g., smartphone CPU, Raspberry Pi, or MCU), including real-time throughput at typical window sizes and energy per inference?
  • Gravity estimation under dynamics: You use a 1-second moving average of acceleration (Section 4.1). How robust is this to fast activities? Did you try longer windows or filtering (LPF, complementary) and quantify orientation error vs. performance?
  • BN conditioning details: Is momentum conditioned for all BN layers or only selected layers? Are per-placement running statistics maintained separately or is there a single set updated with variable momentum? Please clarify implementation choices and their ablation.
  • Adaptation speed: Section 5.9 reports an average of 15.3±2.1 samples to adapt. How is ‘adapt’ defined and measured? Is this number consistent across placements and activities?
  • Dataset specifics: For Opportunity, which sensor streams and placements are used, and how are cross-placement splits constructed? For the custom dataset, please detail activities, sensor specs, placements, windowing, and label protocol.
  • Ablation interpretability: Could you provide a component-wise breakdown of computational overhead (latency/memory) added by each module (orientation correction, placement classifier, gating, adaptive BN)?
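One concrete reading of the BN-conditioning question above: maintain separate running statistics per placement and scale the update momentum by the classifier's confidence. The sketch below is an assumption about the design, not the paper's code; the class name, base momentum, and per-placement bookkeeping are all invented for illustration.

```python
import numpy as np

class PlacementConditionedBN:
    """Hypothetical placement-conditioned BN: per-placement running stats,
    updated with a momentum scaled by placement confidence (an assumed design)."""

    def __init__(self, num_features, num_placements=3, base_momentum=0.1, eps=1e-5):
        self.mean = np.zeros((num_placements, num_features))
        self.var = np.ones((num_placements, num_features))
        self.base_momentum = base_momentum
        self.eps = eps

    def __call__(self, x, placement, confidence, adapt=True):
        """x: (N, num_features); placement: int index; confidence in [0, 1]."""
        if adapt:  # the stability gate decided this window is safe to adapt on
            m = self.base_momentum * confidence      # confidence-scaled momentum
            self.mean[placement] = (1 - m) * self.mean[placement] + m * x.mean(0)
            self.var[placement] = (1 - m) * self.var[placement] + m * x.var(0)
        mu, var = self.mean[placement], self.var[placement]
        return (x - mu) / np.sqrt(var + self.eps)
```

The alternative reading (a single set of statistics with variable momentum) would drop the leading placement axis; the ablation requested above would distinguish the two.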

⚠️ Limitations

  • Reliance on a supervised placement classifier trained on limited placements (wrist/waist/ankle) may limit generalization to unseen or nonstandard placements; misclassification may negatively affect BN adaptation.
  • Current orientation normalization appears to rely on simplistic gravity estimation and planar rotation; this may be fragile under rapid dynamics or when gyroscope drift/measurement noise are significant.
  • The stability gate uses a fixed norm threshold tuned on validation data; this may be dataset/activity-specific. A more adaptive or multi-criterion gating (e.g., using jerk/variance/entropy) could generalize better.
  • Claims about on-device suitability are based on desktop GPU measurements; energy/latency on constrained devices may differ substantially.
  • Potential societal impacts are low-risk, but privacy/security considerations around continuous wearable sensing and on-device adaptation are not discussed (e.g., data retention, potential leakage via model updates).

🖼️ Image Evaluation

Cross‑Modal Consistency: 18/50

Textual Logical Soundness: 14/30

Visual Aesthetics & Clarity: 10/20

Overall Score: 42/100

Detailed Evaluation (≤500 words):

Visual ground truth (figure‑alone pass)

• Figure 1/(a): Multi‑line loss vs epoch (kernels 1,3,5,7; Train/Val). Y‑axis “Loss”, legend of 8 items. Loss traces near 0–1 yet axis up to 50.

• Figure 1/(b): F1 vs epoch; peaks ≈0.45–0.50.

• Figure 2/(a): Bar chart Cross‑Domain F1 (Domain B≈0.30, C≈0.25).

• Figure 2/(b): Bar chart Cross‑Domain test loss (~1.3 both).

• Figure 2/(c): Scatter “Ground Truth vs Predictions” (labels 0–3).

• Figure 3/(a): Multi‑dataset final validation F1 bars (≈0.19–0.29).

• Figure 3/(b): Matching losses (~1.35).

• Figure 4: Bar chart “Diverse Domains: Final F1 Scores” (≈0.25–0.41).

• Figure 5/(a–f): Cross‑dataset train/val F1 vs epoch; peaks ≤0.6; mid‑epoch dips.

1. Cross‑Modal Consistency

• Major 1: Fig. 1b contradicts Sec 5.1 claim of F1>0.8. Evidence: “kernel sizes 5 and 7 … (F1 > 0.8)” (Sec 5.1) and Fig. 1(b) ≤0.5.

• Major 2: Fig. 2a shows F1 ≈0.25–0.30, but caption/text claim ≥0.75. Evidence: “maintains consistent performance (F1 > 0.75)” (Fig. 2 caption) vs Fig. 2(a).

• Major 3: Table 1 reports F1=0.847±0.023, yet Fig. 3a shows ≤0.29 across datasets. Evidence: Table 1 vs Fig. 3(a) bars ≤0.29.

• Minor 1: Fig. 2 caption describes “left/right panels” but three panels shown. Evidence: Fig. 2 displays three subplots.

• Minor 2: Fig. 1 titles say “Baseline”, while Sec 5.1 discusses “our approach”. Evidence: Fig. 1 headings vs “our adaptive normalization” (Sec 5.1).

2. Text Logic

• Major 1: Stability‑gate logic contradicts intent: gate allows updates when norm>τ yet aims to block unstable periods. Evidence: “if the norm is above τ, adaptive updates are allowed” (Sec 4) vs “prevents adaptation during unstable dynamics” (Sec 1).

• Major 2: Orientation normalization conflates BN with rotation; R only rotates about z, not full alignment. Evidence: R matrix in Sec 4.1 (2D yaw only).

• Minor 1: Placement classifier size inconsistent. Evidence: “two‑layer…64 hidden units” (Sec 4) vs “64 and 32” (Sec 4.1).

• Minor 2: Opportunity dataset citation/year questionable. Evidence: “Opportunity dataset … 2021” (Sec 5).

3. Figure Quality

• Major 1: Scale/legibility issues obscure trends in Fig. 1a (y‑axis 0–50; traces near 0–1). Evidence: Fig. 1(a) axis range.

• Minor 1: Legends/labels small and crowded (e.g., 8‑item legend in Fig. 1). Evidence: Fig. 1 legend at 295–450 px.

• Minor 2: Many figures lack units and dataset identifiers; captions needed to interpret. Evidence: Figs. 3–4 bar charts without units.

Key strengths:

• Clear modular idea (orientation correction + placement‑conditioned adaptive BN + gate).

• Comprehensive experimental plan (ablation, multi‑dataset, efficiency).

Key weaknesses:

• Severe figure–text numerical mismatches undermine claims.

• Ambiguous/contradictory stability‑gate logic.

• Oversimplified rotation model; unclear how BN performs “orientation normalization.”

• Several presentation/legibility issues and minor architectural inconsistencies.

📊 Scores

Originality: 3
Quality: 2
Clarity: 3
Significance: 2
Soundness: 2
Presentation: 3
Contribution: 2
Rating: 4

AI Review from SafeReviewer


📋 Summary

This paper introduces a novel hierarchical adaptive normalization method designed to address the challenges of sensor placement and orientation variability in wearable human activity recognition (HAR). The core contribution is a two-stage cascade. The first stage employs a gravity-based correction to normalize sensor orientation, leveraging the consistent direction of gravity to align sensor data across different orientations. Simultaneously, it infers the sensor placement context by analyzing the variance of the normalized signals, classifying the sensor's location into categories such as wrist, waist, or ankle. A stability gate prevents adaptation during periods of unstable dynamics, ensuring that the system does not overfit to transient movements. The second stage refines the normalized features with a placement-conditioned adaptive Batch Normalization (BN) layer, which adjusts its parameters based on the inferred placement and is designed to be efficient enough for real-time use on resource-constrained wearable devices.

The authors evaluate the method on both public and custom datasets, reporting improvements in macro F1-score over static models and unsupervised domain adaptation approaches, and they present a comprehensive ablation study covering each component. The paper emphasizes practical applicability, highlighting low latency and minimal memory overhead for on-device deployment. Overall, it presents a promising approach to a significant challenge in wearable HAR, with a focus on real-world applicability and computational efficiency.

✅ Strengths

I find several aspects of this paper to be particularly strong. The core idea of combining physics-based correction with data-driven adaptation is a novel and effective way to address sensor variability in HAR. Using gravity as a reference for orientation normalization is intuitive and leverages a consistent physical phenomenon, and the two-stage hierarchical structure handles the different aspects of sensor variability systematically. The stability gate, which prevents adaptation during unstable periods, is a valuable safeguard against overfitting to transient movements.

The authors also demonstrate a strong commitment to practical applicability. The lightweight placement-conditioned adaptive Batch Normalization layer is a significant step toward real-time on-device deployment. The experimental results, despite some limitations, show consistent improvements in macro F1-score over the baselines, and the ablation study, though not fully comprehensive, provides evidence for the contribution of each component.

Finally, the paper is well-written and easy to follow: the problem, the proposed solution, and the experimental results are all clearly articulated, which matters for accessibility.

❌ Weaknesses

Despite the strengths, I have identified several weaknesses that need to be addressed. First, the experimental evaluation lacks breadth in dataset comparison. The authors use a public dataset (Opportunity) and a custom dataset but omit other commonly used HAR benchmarks such as PAMAP2, USC-HAD, and MHealth, which makes it difficult to assess the generalizability of the method and weakens the claim of robustness.

Second, the baseline comparisons are not comprehensive. The method is compared against a static baseline, gravity-only normalization, and naive adaptive BN, but not against other state-of-the-art HAR approaches such as feature-based methods, other domain adaptation techniques, or advanced deep learning architectures like Inception or LSTM networks, so the claim of superior performance is only weakly supported.

Third, the paper lacks a thorough analysis of the stability gate's performance. It describes the gate and its function but provides no metrics on its accuracy in distinguishing stable from unstable periods, nor any analysis of its impact on the timing and quality of adaptive updates.

Fourth, the description of the custom dataset is insufficient. The authors mention the number of subjects and the activity types but give no details on sensor modalities, the number of classes, or how the dataset compares to Opportunity, which limits the ability to evaluate the method's performance on diverse data.

Fifth, the presentation of results in Figure 4 is unclear: the figure lacks a legend explaining the colors, and the connection between the left panel's domain labels (Domain B and C) and the right panel's dataset labels (Opportunity, Custom, Synthetic) is never stated within the figure.

Finally, the evaluation focuses almost exclusively on macro F1-score and offers no per-class analysis, so it is impossible to tell whether the method performs equally well across all activities or struggles on specific ones. These weaknesses, which I verified through direct examination of the paper, significantly limit the paper's overall strength and generalizability.

💡 Suggestions

To address the identified weaknesses, I recommend the following concrete improvements:

1. Expand the evaluation to additional standard HAR benchmarks (PAMAP2, USC-HAD, MHealth) to support the generalizability claim and enable direct comparison with other state-of-the-art techniques.
2. Broaden the baselines to include feature-based methods, other domain adaptation techniques, and advanced deep learning models such as Inception or LSTM, for a more robust assessment of the method's strengths and weaknesses relative to existing approaches.
3. Analyze the stability gate quantitatively: report its accuracy in distinguishing stable from unstable periods and its impact on the timing and quality of adaptive updates.
4. Describe the custom dataset in detail, including the sensor modalities, the number of classes, and a comparison with the Opportunity dataset.
5. Clarify Figure 4 with a color legend and an explicit statement of how the left panel's domain labels map to the right panel's dataset labels.
6. Report per-class performance in addition to the macro F1-score, to reveal activities where the method struggles.
7. Explore additional sensor placements and sensor combinations, as well as robustness to noisy sensor data.
8. Quantify the computational cost of the proposed method, compare it with other methods, and analyze the accuracy/cost trade-off, which matters for the claimed on-device deployment.

These improvements would significantly strengthen the paper and provide a more comprehensive and robust evaluation of the proposed method.
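The per-class breakdown suggested above takes only a few lines with scikit-learn; the sketch below reports per-class F1 alongside the macro average. The activity names are placeholders, not the paper's label set.

```python
from sklearn.metrics import f1_score

def per_class_report(y_true, y_pred, class_names):
    """Macro F1 plus a per-class breakdown, so activities on which the
    model fails are not hidden behind the average."""
    macro = f1_score(y_true, y_pred, average="macro")
    per_class = f1_score(y_true, y_pred, average=None)  # one F1 per class
    report = {name: round(float(f), 3)
              for name, f in zip(class_names, per_class)}
    return macro, report
```

For example, a classifier that never predicts one activity can still post a macro F1 above 0.5 on three classes; the per-class table makes such failure modes visible immediately.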

❓ Questions

Based on my analysis, I have several questions that are important for further clarification:

1. How does the proposed method perform on other commonly used HAR datasets (PAMAP2, USC-HAD, MHealth), and how does it compare to state-of-the-art methods on those datasets?
2. How does it compare to other state-of-the-art HAR models, including feature-based methods, other domain adaptation techniques, and advanced deep learning models such as Inception or LSTM?
3. How accurately does the stability gate distinguish stable from unstable periods, and how does that accuracy affect overall performance?
4. What are the specific characteristics of the custom dataset (sensor modalities, number of classes), and how does it compare to the Opportunity dataset in complexity and diversity?
5. How does the method perform on individual activity classes, and are there activities where it does particularly well or poorly?
6. How does the computational cost compare to other methods, and what is the trade-off between accuracy and computational cost?
7. How does the method handle noisy sensor data, and what are its limitations under significant noise?

Answers to these questions are crucial for a complete understanding of the method's strengths, weaknesses, and limitations.

📊 Scores

Soundness: 2.75
Presentation: 2.75
Contribution: 2.5
Confidence: 4.0
Rating: 5.25
