2510.0019 Hierarchical Adaptive Normalization: A Placement-Conditioned Cascade for Robust Wearable Activity Recognition v3

🎯 ICAIS2025 Accepted Paper

🎓 Meta Review & Human Decision

Decision:

Accept

Meta Review:

AI Review from DeepReviewer


📋 Summary

This paper introduces a hierarchical adaptive normalization method designed to enhance the robustness of wearable Human Activity Recognition (HAR) systems when faced with variations in sensor placement and orientation. The core contribution lies in a two-stage cascade approach. The first stage combines gravity-based orientation correction with placement context inference. Specifically, it employs a lightweight classifier, trained on labeled data from multiple sensor placements, to infer the sensor's location (wrist, waist, or ankle) based on the variance of normalized signals. This inferred context is then used to condition the parameters of a Batch Normalization (BN) layer, adapting it to the specific sensor placement. The second stage refines the feature representations using a placement-conditioned adaptive Batch Normalization, which updates its running statistics in real-time based on the inferred placement context and a stability gate. This stability gate, based on the L2 norm of the normalized input, prevents harmful updates during unstable periods by suppressing adaptation when the signal norm is below a threshold. The method's efficacy is demonstrated through experiments on both a public dataset (Opportunity) and a custom dataset, showing significant improvements in accuracy and robustness compared to baseline methods and state-of-the-art unsupervised domain adaptation techniques. The paper also includes a computational efficiency analysis, demonstrating the method's suitability for real-time, on-device applications. The authors report a macro F1-score of 0.847, highlighting the method's strong performance. Overall, the paper presents a practical and effective approach to addressing sensor variability in wearable HAR, with a focus on real-world applicability and computational efficiency.
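For concreteness, the stability-gated running-statistics update described in this summary can be sketched as follows. This is a minimal illustration under stated assumptions: the function and variable names are ours, the update order is a guess, and the paper's actual implementation may differ.

```python
import numpy as np

def gated_bn_update(x, running_mean, running_var, momentum, tau):
    """One online update of BN running statistics, gated by the L2 norm
    of the normalized input window (sketch; names are hypothetical).

    x            : (F,) feature vector for the current window
    running_mean : (F,) current running mean
    running_var  : (F,) current running variance
    momentum     : scalar in (0, 1], e.g. scaled by placement confidence
    tau          : stability threshold T from the paper
    """
    x_norm = (x - running_mean) / np.sqrt(running_var + 1e-5)
    if np.linalg.norm(x_norm) <= tau:
        # Stability gate closed: suppress adaptation during
        # low-magnitude (potentially unstable) segments.
        return running_mean, running_var
    # Gate open: exponential-moving-average update of the statistics.
    new_mean = (1 - momentum) * running_mean + momentum * x
    new_var = (1 - momentum) * running_var + momentum * (x - new_mean) ** 2
    return new_mean, new_var
```

With `tau` large the statistics are frozen; with `tau` small every window adapts them, which is exactly the sensitivity the reviews below probe.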

✅ Strengths

I find several aspects of this paper to be particularly strong. The core strength lies in the novel combination of physics-based orientation correction with data-driven adaptive normalization. This hierarchical approach effectively addresses the challenges posed by sensor placement and orientation variability, which are significant hurdles in real-world wearable HAR applications. The use of a lightweight classifier to infer placement context based on signal variance is a clever and efficient approach. While the rationale behind using variance as a reliable indicator could be more thoroughly explained, the method itself is intuitive and computationally inexpensive. The placement-conditioned adaptive Batch Normalization, which adjusts its momentum based on the inferred context, is another notable contribution. The mathematical formulation of this process is clearly presented, and it demonstrates a practical way to adapt normalization parameters to different sensor placements. Furthermore, the inclusion of a stability gate, which prevents harmful updates during unstable periods, is a valuable addition. Although the stability gate uses a simple norm-based thresholding technique, it effectively prevents adaptation during periods of low signal magnitude, which are often indicative of unstable or noisy conditions. The experimental results are also compelling. The method demonstrates significant improvements in accuracy and robustness compared to baseline methods and state-of-the-art unsupervised domain adaptation techniques. The evaluation on both a public dataset and a custom dataset, along with the computational efficiency analysis, provides a comprehensive assessment of the method's performance. The reported macro F1-score of 0.847 is a strong indicator of the method's effectiveness. Finally, the paper is well-written and clearly explains the methodology, making it accessible to a broad audience. 
The authors also acknowledge the limitations of their approach, which is a sign of thorough and responsible research.

❌ Weaknesses

Despite the strengths of this paper, I have identified several weaknesses that warrant careful consideration. First, while the paper describes the placement context inference using a two-layer fully connected network based on signal variance, the rationale behind using variance as a reliable indicator of sensor placement is not sufficiently justified. The paper states that "Placement context is inferred using a lightweight classifier that processes the variance of the normalized signals," and provides an example that "if the variance of the normalized accelerometer data along the x-axis is high, it might indicate a wrist placement where the sensor experiences more dynamic movements." However, this explanation lacks a deeper theoretical justification. The paper does not explain why variance is a reliable indicator of sensor placement, nor does it discuss the potential limitations of this approach. This lack of justification weakens the overall argument for the method's robustness. My confidence in this issue is high, as the paper does not provide a strong theoretical basis for this design choice.

Second, the paper's discussion of the stability gate, while presenting it as a novel contribution, does not fully address its limitations. The paper states that "The stability gate allows adaptation when the signal norm exceeds the threshold T because high-magnitude signals typically indicate stable, informative activity patterns rather than noise or unstable transients." While this explanation is intuitive, the paper does not explore the sensitivity of the method to different threshold values, nor does it discuss how the threshold might need to be adjusted for different activities or sensor placements. The paper does mention that "The stability threshold T is determined through validation set optimization," but it does not provide a clear definition of stability or discuss how the threshold might need to be adjusted beyond the initial optimization. This reliance on a single, optimized threshold value raises concerns about the method's generalizability and robustness in diverse real-world scenarios. My confidence in this issue is medium, as the paper does mention validation set optimization, but lacks a deeper discussion of the threshold's sensitivity.

Third, the paper's computational overhead analysis, while demonstrating the method's efficiency compared to static baselines, lacks a comparison with other state-of-the-art adaptive methods. The paper states that "Table 3 presents the computational overhead comparison between our method and baseline approaches," and Table 3 shows inference time, memory usage, and energy consumption compared to baselines. However, the analysis does not include a comparison with other advanced adaptive normalization techniques. This omission makes it difficult to assess the method's computational efficiency relative to other adaptive approaches. Furthermore, the paper does not provide a detailed breakdown of the computational cost associated with each component of the proposed method, such as the placement context classifier and the adaptive Batch Normalization layers. This lack of detail makes it difficult to fully understand the method's computational overhead. My confidence in this issue is high, as the paper explicitly lacks a comparison with other adaptive methods in the computational overhead analysis.

Fourth, the paper lacks sufficient detail regarding the data collection process for the custom dataset. The paper mentions that "A custom dataset collected from 15 subjects performing diverse activities including static inversions, dynamic rotations, and high-impact events" was used. However, the paper does not specify the number of repetitions for each activity per subject, the duration of each activity segment, or the specific sensor placement protocols. This lack of detail makes it difficult to assess the generalizability of the results and understand the robustness of the method to variations in activity performance. My confidence in this issue is high, as the paper explicitly lacks these details.

Fifth, the paper does not adequately address the method's limitations in scenarios with extreme sensor orientation changes or novel activities not seen during training. The paper acknowledges that "the placement context classifier is trained on a limited set of categories (wrist, waist, ankle), which may not cover all real-world sensor placements," but it does not discuss how the method would handle activities that involve complex, multi-planar movements or unusual gait patterns. Furthermore, the paper does not thoroughly analyze the method's performance during abrupt changes in sensor orientation, such as those caused by sudden movements or impacts. The paper uses a moving average filter for gravity vector estimation, which might struggle with rapid, non-linear orientation shifts. My confidence in this issue is high, as the paper does not explicitly address these limitations.

Finally, the paper does not discuss the method's performance under sensor drift or unexpected environmental conditions. The paper acknowledges the assumption of sufficient sampling rates, but it does not address the potential impact of sensor drift or unexpected environmental conditions on the method's performance. My confidence in this issue is high, as the paper does not explicitly address these limitations.
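To make the first weakness concrete, the variance-based placement inference the paper describes could look roughly like the sketch below. The feature definition follows the quoted description; the classifier weights, shapes, and activation are hypothetical stand-ins, not the authors' trained parameters.

```python
import numpy as np

def placement_features(window):
    """Per-axis variance of a normalized accelerometer window.

    window : (T, 3) array of normalized x/y/z samples.
    The paper feeds variance features of this kind to a
    two-layer fully connected classifier.
    """
    return window.var(axis=0)

def two_layer_classifier(feat, W1, b1, W2, b2):
    """Minimal two-layer FC head over {wrist, waist, ankle}
    (illustrative weights; ReLU hidden layer is an assumption).
    Returns a softmax distribution over the three placements."""
    h = np.maximum(0.0, feat @ W1 + b1)   # hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()
```

The open question raised above is precisely whether a (3,)-dimensional variance feature carries enough information to separate placements reliably across subjects and activities.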

💡 Suggestions

To address the identified weaknesses, I recommend several concrete improvements. First, the authors should provide a more thorough justification for using signal variance as the basis for placement context inference. This could involve relating variance to the physical characteristics of sensor data under different placements, or providing empirical evidence to support this design choice. The authors should also explore alternative features that could improve the robustness and accuracy of the placement context classifier.

Second, the authors should conduct a more detailed analysis of the stability gate's behavior and its impact on the overall performance. This should include exploring the sensitivity of the method to different threshold values and providing a rationale for the chosen threshold. The authors should also investigate whether the stability gate introduces any biases or limitations in specific scenarios, such as during very slow or very fast movements. Furthermore, the authors should consider incorporating a data-driven approach to determine the threshold, potentially using a validation set to optimize its value for different activities or sensor placements.

Third, the authors should expand the computational overhead analysis to include comparisons with other state-of-the-art adaptive methods. This comparison should include a breakdown of the computational cost associated with each component of the proposed method, such as the placement context classifier and the adaptive Batch Normalization layers. This would allow for a more informed assessment of the method's practicality for real-time applications, especially when compared to alternative approaches.

Fourth, the authors should provide more details on the data collection process for the custom dataset. This should include the number of subjects, the range of activities, the sensor placement protocols, and the number of repetitions for each activity per subject. The authors should also specify the duration of each activity segment and the sampling rate of the sensor data. This information is crucial for understanding the robustness of the method to variations in activity performance and for enabling other researchers to reproduce the results.

Fifth, the authors should investigate the method's performance in scenarios with extreme sensor orientation changes and novel activities. This could involve incorporating a more robust mechanism for handling rapid, non-linear orientation shifts, such as Kalman filtering or other state estimation methods. The authors should also explore the use of data augmentation techniques to simulate extreme orientation changes during training. Furthermore, the authors should evaluate the method's performance on a more diverse set of activities, particularly those that exhibit complex, multi-planar movements or unusual gait patterns.

Finally, the authors should address the method's limitations in scenarios with sensor drift or unexpected environmental conditions. This could involve incorporating techniques to mitigate the effects of sensor drift or evaluating the method's performance under different environmental conditions. By addressing these weaknesses, the authors can significantly strengthen the paper and enhance the practical applicability of their proposed method.

❓ Questions

I have several questions that arise from my analysis of this paper. First, how does the method perform in scenarios with significant sensor drift or unexpected environmental conditions? The paper does not explicitly address these factors, and I am curious about the method's robustness under such circumstances.

Second, could the stability gate mechanism be further refined to handle more complex dynamic events, such as sudden falls or high-impact activities? The current implementation relies on a simple norm-based threshold, and I wonder if a more sophisticated approach could improve the method's performance in these scenarios.

Third, how does the method scale with an increasing number of sensor placements or more complex activity patterns? The current implementation is limited to three placement categories, and I am curious about the method's scalability beyond this setting.

Fourth, what is the rationale behind using the L2 norm of the normalized input as the basis for the stability gate? The paper states that high-magnitude signals typically indicate stable activity patterns, but I am curious about the theoretical basis for this assumption and whether other metrics could be more effective.

Fifth, how sensitive is the method to the choice of hyperparameters, such as the learning rate, the batch size, and the architecture of the placement context classifier? The paper does not provide a detailed analysis of the method's sensitivity to these parameters, and I am curious about the potential impact of different hyperparameter settings on the method's performance.

Finally, what are the potential limitations of the placement context classifier when dealing with novel sensor placements not seen during training? The paper acknowledges that the classifier is trained on a limited set of categories, and I am curious about the method's behavior when encountering unseen placements.
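Regarding the fourth question, alternative gate metrics could, for instance, be defined as below. These definitions are illustrative suggestions only; they do not appear in the paper, and the parameterization is ours.

```python
import numpy as np

def instability_surrogates(window, fs):
    """Candidate alternatives to a plain L2-norm stability gate
    (illustrative definitions, not from the paper under review).

    window : (T, 3) accelerometer samples
    fs     : sampling rate in Hz
    """
    norm = np.linalg.norm(window, axis=1)
    # Mean jerk magnitude of the signal norm: large during abrupt transients.
    mean_jerk = (np.abs(np.diff(norm)) * fs).mean()
    # Short-term variance of the signal norm.
    norm_var = norm.var()
    # Normalized spectral entropy of the norm: close to 1 for noise-like segments.
    psd = np.abs(np.fft.rfft(norm - norm.mean())) ** 2
    if psd.sum() > 0:
        p = psd / psd.sum()
    else:
        p = np.ones_like(psd) / len(psd)   # flat fallback for constant input
    spec_entropy = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))
    return mean_jerk, norm_var, spec_entropy
```

Gating on jerk or spectral entropy, rather than magnitude alone, would directly test the assumption that high-norm windows are "stable".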

📊 Scores

Soundness: 2.75
Presentation: 2.75
Contribution: 2.5
Rating: 4.5

AI Review from ZGCA


📋 Summary

The paper proposes a two-stage hierarchical adaptive normalization pipeline for wearable human activity recognition (HAR) under varying sensor placement and orientation. Stage 1 performs gravity-based orientation correction, infers placement context via variance features (wrist/waist/ankle), and applies a norm-based stability gate to control adaptation. Stage 2 uses placement-conditioned adaptive Batch Normalization (momentum scaled by placement confidence) followed by a lightweight CNN classifier. The authors report consistent improvements over static baselines and state-of-the-art unsupervised domain adaptation methods, achieving up to 0.847 ± 0.023 macro F1 with 2.3 ms/window inference time and ~45 MB memory usage. The experimental section includes ablations, per-class analyses, multi-dataset evaluation, cross-subject generalization, and sensitivity studies.
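A minimal sketch of the placement-conditioned adaptive BN step summarized above, with momentum scaled by placement confidence. The per-placement statistics dictionary and the exact scaling rule are our assumptions; the review does not reproduce the authors' formulation.

```python
import numpy as np

def conditioned_bn_step(x, stats, probs, base_momentum=0.1):
    """One placement-conditioned adaptive BN step (sketch).

    x     : (F,) window features
    stats : dict placement -> (mean (F,), var (F,)) running statistics
    probs : dict placement -> softmax confidence from the placement classifier
    """
    placement = max(probs, key=probs.get)      # inferred placement context
    mean, var = stats[placement]
    m = base_momentum * probs[placement]       # confidence-weighted momentum
    new_mean = (1 - m) * mean + m * x
    new_var = (1 - m) * var + m * (x - new_mean) ** 2
    stats[placement] = (new_mean, new_var)
    # Normalize with the placement-specific statistics.
    return (x - new_mean) / np.sqrt(new_var + 1e-5)
```

Under this reading, a low-confidence placement prediction shrinks the momentum toward zero, so ambiguous windows barely move the running statistics; the questions below about misclassification propagation target exactly this coupling.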

✅ Strengths

  • Addresses an important and persistent HAR challenge: robustness to placement and orientation shifts.
  • Method is conceptually simple and computationally light: physics-based correction + adaptive normalization with a gating mechanism.
  • Placement-conditioned adaptive BN with confidence-weighted momentum is a clear mechanism tying context to adaptation.
  • Comprehensive empirical evaluation: ablations (Gravity-only, Naive Adaptive BN, Conditioned BN), per-class analysis, cross-subject, multi-dataset, and sensitivity to thresholds and kernel sizes.
  • Reported gains are substantial across placements, especially in cross-placement scenarios (e.g., wrist and ankle), with statistical significance testing (paired t-tests).
  • Reproducibility details include seeds, splits, multiple repeats, and hardware description.

❌ Weaknesses

  • Conceptual inconsistency in the stability gate: the Introduction claims it prevents harmful updates during abrupt dynamic transients (e.g., falls/high-impact events), yet the implementation allows adaptation when the L2 norm exceeds a threshold. High-impact events generally have large norms; the later justification that high magnitude implies stability contradicts the earlier motivation.
  • Orientation correction appears oversimplified: the rotation matrix is specified as a planar rotation around z using θ = atan2(gy, gx), which does not generally align a 3D accelerometer to gravity. A full 3D alignment typically requires constructing a rotation that maps the gravity vector to the vertical axis (e.g., via a Rodrigues rotation) and handling yaw/roll/pitch interactions.
  • Placement inference is trained on only three discrete placements (wrist/waist/ankle), limiting generalization to arbitrary or unseen positions. There is no OOD or 'unknown placement' handling, and the effect of misclassification on the adaptive BN is not analyzed.
  • Internal inconsistency in reported performance: Section 5 states kernel sizes 5/7 yield final training F1 ≈ 0.43 and validation F1 ≈ 0.49, which is difficult to reconcile with later reported F1 > 0.8 on the same model family without clarifying different datasets, settings, or metrics.
  • Reproducibility gaps: critical training details are missing (batch size, window length/stride, exact modalities and number of channels F, preprocessing). The UDA baselines are not fully specified (which methods/implementations, hyperparameters, and tuning budget), making it hard to assess fairness.
  • Efficiency claims are measured on a desktop GPU (RTX 3080). Assertions of 'on-device' viability would benefit from evaluations on representative edge platforms (mobile CPU, microcontroller) including end-to-end latency, memory, and energy.
  • The rationale for the norm threshold selection and its per-activity/subject behavior is under-explored; the stability gate’s robustness across noise regimes and transients needs deeper quantitative analysis.
  • The gravity estimate via 1 s moving average can be poor during highly dynamic activities, potentially undermining the initial normalization; no quantitative robustness analysis of gravity estimation error is provided.
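The full 3D alignment the second weakness asks for can be sketched with a Rodrigues rotation that maps the estimated gravity vector onto the vertical axis, together with the moving-average gravity estimate the paper reportedly uses. Function names and the window handling are ours, given as a sketch of the suggested fix rather than the paper's method.

```python
import numpy as np

def gravity_alignment_rotation(g):
    """Rotation matrix mapping gravity direction g onto [0, 0, 1]
    via the Rodrigues formula (general 3D alignment, not the planar
    z-rotation criticized above)."""
    g = np.asarray(g, dtype=float)
    g = g / np.linalg.norm(g)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(g, z)                 # rotation axis, |v| = sin(theta)
    s = np.linalg.norm(v)
    c = g @ z                          # cos(theta)
    if s < 1e-8:                       # g already (anti)parallel to z
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + K + K @ K * ((1 - c) / s**2)

def moving_average_gravity(acc, win):
    """Moving-average gravity estimate over (T, 3) accelerometer samples,
    e.g. win = fs for a 1 s window at sampling rate fs."""
    kernel = np.ones(win) / win
    return np.stack([np.convolve(acc[:, i], kernel, mode='same')
                     for i in range(3)], axis=1)
```

Applying `gravity_alignment_rotation(g_t) @ acc_t` per sample aligns the vertical axis regardless of mounting, which a single z-axis rotation by atan2(gy, gx) cannot guarantee.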

❓ Questions

  • Stability gate: Please reconcile the apparent contradiction between the Introduction (suppressing updates during high-impact transients) and the implementation (allowing adaptation when ||x_norm|| > τ). Can you provide quantitative analyses showing how the gate behaves during falls/high-impact events vs. low-activity/noise segments? Could you report F1 with gate inverted (i.e., suppressing adaptation for large norms) and with more nuanced surrogates (e.g., spectral entropy, jerk, short-term variance of norm) to validate the design?
  • Orientation correction: The rotation R is specified as a 2D rotation using θ = atan2(gy, gx). How do you handle general 3D alignment to gravity? Please provide the complete formulation (e.g., constructing a rotation that maps the estimated gravity vector to the vertical axis) and clarify how gyroscope/magnetometer (if available) are used. Can you report the sensitivity of performance to gravity estimation error during dynamic activities?
  • Placement classifier: How is it trained and validated with respect to subject splits? Is it trained jointly with the HAR model or separately? What is its confusion matrix, and how does misclassification propagate to adaptive BN and overall F1? Do you have an 'unknown placement' or OOD mechanism (e.g., confidence thresholding) to avoid harmful conditioning?
  • Baselines: Which UDA methods were implemented, with what codebases and hyperparameter search? Were they allowed online/test-time adaptation comparable to your approach? Please provide full baseline details to judge fairness, including training budgets and tuning protocols.
  • Reproducibility: Please provide batch size, window length and stride, sensor sampling rates, sensor modalities (accelerometer/gyroscope/magnetometer), number of channels F, normalization/preprocessing steps, and exact CNN architecture (layers, feature widths, parameter count).
  • Performance discrepancies: Section 5 mentions final training F1 ≈ 0.43 and validation F1 ≈ 0.49 for kernel sizes 5/7, but later sections report F1 > 0.8. Are these different datasets or evaluation settings? Please clarify and reconcile these numbers.
  • Deployment: Could you report on-device latency, memory, and energy on a representative edge platform (e.g., ARM mobile CPU, smartwatch SoC, or microcontroller)? How frequently do you update BN statistics online, and what safeguards prevent long-term drift (beyond the 30-day claim)?
  • Hyperparameters: How sensitive are results to τ across subjects/activities, and did you consider adaptive τ (per subject/activity) or a learned gate? Could you share per-activity ablations of τ to demonstrate robustness?
  • Data splits and leakage: For the custom dataset, please detail subject demographics, activities, placements, and exact cross-subject split procedure. Are the placement classifier’s training samples fully disjoint from test subjects and sessions?
  • Unknown placements: Have you tested intermediate or off-category placements (e.g., upper arm, thigh, pocket)? How does the method behave when placement is unseen or ambiguous, and can the system fail gracefully (e.g., fallback momentum) in those cases?

⚠️ Limitations

  • Dependence on a supervised placement classifier trained on only three categories limits generalization to arbitrary sensor positions; misclassification can miscalibrate adaptive BN.
  • Stability gate based on the L2 norm may not reliably distinguish informative motion from high-impact transients or noisy segments, and could permit harmful updates during abrupt events.
  • Orientation correction is simplified and may be inaccurate during dynamic activities given the 1 s moving average gravity estimate and a planar rotation, potentially introducing residual orientation artifacts.
  • Efficiency claims are based on desktop GPU metrics; real-world edge deployment may face stricter memory/latency/energy constraints.
  • Potential negative impacts: Misclassification or maladaptation could degrade performance in safety-critical applications (e.g., fall detection). Bias may arise if datasets underrepresent certain demographics/activities/placements. Privacy concerns persist in continuous wearable sensing.
  • Limited coverage of new or complex activities; unknown activities or placements may require explicit handling (e.g., OOD detection) to avoid brittle behavior.

🖼️ Image Evaluation

Cross‑Modal Consistency: 16/50

Textual Logical Soundness: 12/30

Visual Aesthetics & Clarity: 10/20

Overall Score: 38/100

Detailed Evaluation (≤500 words):

Image‑first visual ground truth

  • Figure 1/(a): Line plots titled “Baseline: Training and Validation Loss Curves.” Axes: Epoch vs Loss (~0–45). Multiple kernel‑size legends.
  • Figure 1/(b): “Baseline: Training and Validation F1‑Score Curves.” Epoch vs F1 (0–0.5). Peaks ≈0.45.
  • Figure 2/(a): Bar chart “Cross‑Domain F1 Performance.” Bars ≈0.28 (B) and ≈0.24 (C).
  • Figure 2/(b): Bar chart “Cross‑Domain Test Loss.” Both ≈1.3.
  • Figure 2/(c): Scatter “Domain B: Ground Truth vs Predictions,” labels 0–3, tight alignment.
  • Multi‑dataset panels: bar charts “Multi‑Dataset Final Validation F1 Scores” (≈0.19–0.29) and “Losses” (≈1.35); additional F1‑over‑epoch plots (0.1–0.6).
  • Tables 1–3: HTML tables with method‑wise F1 and efficiency numbers.

1. Cross‑Modal Consistency

• Major 1: Fig. 1(b) shows F1 ≤0.5 but Sec 5.1 claims F1>0.8 for larger kernels. Evidence: “kernel sizes 5 and 7 consistently achieve superior performance (F1 > 0.8)”.

• Major 2: Fig. 2(a) bars ≈0.25–0.3 contradict caption claim of F1>0.75 across domains. Evidence: “maintains consistent performance (F1 > 0.75) across both domains”.

• Major 3: Multi‑dataset bars (≈0.19–0.29) conflict with Sec 5.5 claims (0.823/0.847/0.789). Evidence: “achieves 0.823 ± 0.021 on the Opportunity dataset… 0.847 ± 0.023… 0.789 ± 0.019”.

• Minor 1: Fig. 1 titles say “Baseline,” while Sec 5.1 text attributes curves to the proposed model.

• Minor 2: Figure numbering/captions in text (Figs. 3–5) do not clearly correspond to provided panels; datasets anonymized as “Dataset1–3”.

2. Text Logic

• Major 1: Internal inconsistency on achievable F1. Sec 5 states “kernel sizes 5 and 7 yielded final training F1 ≈0.43 and validation ≈0.49,” yet Sec 5.1 says “peak F1‑scores above 0.8.” Evidence: “final training F1‑scores around 0.43 and validation F1‑scores near 0.49”.

• Minor 1: Stability‑gate rationale equates high‑norm signals with stability; justification is heuristic and may invert typical notions of instability.

• Minor 2: Claims of +36% and +13.7% rely on Table 1, but the figures widely disagree, weakening the argument chain.

3. Figure Quality

• Minor 1: Several panels use very small fonts/legends; difficult at print size to distinguish all kernel lines.

• Minor 2: Generic dataset labels (“Dataset1–3”) and missing units hinder stand‑alone interpretability.

• Minor 3: Key claims (e.g., error bars/CI for F1) absent in plots; bars lack variance depiction.

Key strengths:

  • Clear modular method description (orientation correction, placement inference, stability gate, conditioned BN).
  • Ablation/table structure, reproducibility details, and efficiency reporting are thorough.

Key weaknesses:

  • Severe figure–text numeric mismatches on core results (F1 levels, cross‑domain, multi‑dataset).
  • Internal textual inconsistency about achieved F1.
  • Figures titled “Baseline” where text claims proposed method; ambiguous mapping and low stand‑alone clarity.

📊 Scores

Originality: 3
Quality: 2
Clarity: 2
Significance: 2
Soundness: 2
Presentation: 2
Contribution: 2
Rating: 4

AI Review from SafeReviewer


📋 Summary

This paper introduces a hierarchical adaptive normalization method designed to enhance the robustness of wearable Human Activity Recognition (HAR) systems when sensors are placed in varying locations and orientations. The core idea revolves around a two-stage cascade. The first stage employs a gravity-based orientation correction to align sensor data with the world coordinate system, followed by a placement context inference module that uses signal variance analysis to identify the sensor's location (wrist, waist, or ankle). This stage also incorporates a stability gate, which prevents adaptation during periods of high signal instability, ensuring that the model is not misled by transient noise or abrupt movements. The second stage refines the normalized features using a placement-conditioned adaptive Batch Normalization (BN) layer, which adjusts its parameters based on the inferred sensor placement. This adaptive BN is designed to further reduce the impact of sensor misplacement. The authors evaluate their method on both public and custom datasets, reporting a significant improvement in macro F1-score compared to static baselines and other state-of-the-art unsupervised domain adaptation methods. They also emphasize the method's low computational overhead, making it suitable for real-time, on-device deployment. The paper's main contribution lies in its hierarchical approach, which combines physics-based correction with data-driven adaptation, and the introduction of a stability gate to ensure reliable performance during dynamic activities. The experimental results, while promising, are primarily focused on a specific sensor location (waist, wrist, or ankle) and do not explore the performance of the method when multiple sensors are used simultaneously. The paper also lacks a detailed analysis of the method's performance on different sensor modalities and under varying environmental conditions. 
Despite these limitations, the proposed method represents a significant step towards more robust and reliable wearable HAR systems.

✅ Strengths

I found several aspects of this paper to be particularly compelling. The core strength lies in the novel hierarchical approach to addressing sensor misplacement in wearable HAR. The combination of a physics-based gravity correction with a data-driven adaptive normalization technique is a clever way to tackle the problem. The gravity-based orientation correction, while not entirely new, is effectively integrated into the overall framework. The use of a stability gate to prevent adaptation during unstable periods is another significant contribution. This mechanism, which is based on the norm of the input signal, helps to ensure that the model is not misled by transient noise or abrupt movements, which is a common issue in wearable sensor data. The experimental results, while not without limitations, demonstrate the effectiveness of the proposed method. The reported improvement in macro F1-score compared to static baselines and other state-of-the-art unsupervised domain adaptation methods is substantial. The authors also provide evidence of the method's low computational overhead, which is crucial for real-time, on-device deployment. The ablation studies, while not exhaustive, provide valuable insights into the contribution of each component of the proposed method. The inclusion of a 'Conditioned BN Only' baseline in the ablation study helps to isolate the effect of the placement-conditioned adaptive BN. Furthermore, the paper is generally well-written and easy to follow, which is essential for effective communication of complex technical ideas. The authors clearly articulate their methodology and provide sufficient detail for others to understand and potentially replicate their work. The use of both public and custom datasets also adds to the credibility of the experimental results. The custom dataset, while not fully described, appears to have been carefully designed to evaluate the method's performance under challenging conditions.

❌ Weaknesses

Despite these strengths, I have identified several weaknesses that warrant careful consideration.

First, the evaluation is limited by the choice of baselines. While the authors include a 'Conditioned BN Only' baseline in the ablation study (Table 4), the main results (Table 1) do not include a direct comparison with a standard, non-adaptive BN layer trained with placement labels. This makes it difficult to isolate the specific contribution of the stability gate and the placement-conditioned adaptive BN: without this comparison, it is hard to determine whether the performance gains come from the adaptive mechanism or simply from including placement information as a conditioning variable. This is a critical omission. My confidence in this weakness is high, as it is directly observable from the tables.

Second, the experimental setup is not as clear as it could be. While the authors describe the training procedure, they do not explicitly state whether the baseline models are trained with or without placement labels, which makes it difficult to interpret the results and to understand the true contribution of the proposed method. Furthermore, the paper does not provide a detailed description of the custom dataset, including the types of activities, the number of subjects, and the sensor configurations, which makes it difficult to assess the generalizability of the results. My confidence in this weakness is medium, as it is based on the lack of explicit statements in the paper.

Third, the focus on single-sensor scenarios limits the method's applicability to more complex HAR systems. The paper does not address multi-sensor placement variability, a common scenario in real-world applications, and the authors do not discuss how their method would handle different combinations of sensor placements or scale to a larger number of sensors. This is a significant limitation, as many HAR systems rely on data from multiple sensors. My confidence in this weakness is high, as the single-sensor focus is evident throughout the text.

Fourth, the evaluation is limited to a specific set of sensor locations (wrist, waist, and ankle). The stability gate, which is trained on these locations, may not generalize to other body parts, and the paper provides no evidence to support such generalization. This restricts the applicability of the method to a limited set of sensor placements. My confidence in this weakness is high, as the training data for the stability gate is limited to wrist, waist, and ankle.

Fifth, the description of the stability gate's mechanism is not entirely clear. The paper states that the gate uses the L2 norm of the input signal to determine stability, but it does not explain in detail how this norm is calculated or how it relates to the signal's stability. Nor does it discuss the potential limitations of a norm-based approach, such as sensitivity to outliers or an inability to capture more complex patterns of instability. My confidence in this weakness is medium, as the paper provides a basic description of the stability gate but lacks a detailed explanation.

Sixth, the presentation of the results is not always clear. For example, Figure 1, which shows the training dynamics, is not particularly informative and could be moved to the appendix. The paper also lacks a clear explanation of the domain adaptation setup, including the specific domains used and the evaluation protocol. My confidence in this weakness is medium, as it is based on my subjective assessment of the figure's informativeness and the lack of clarity in the domain adaptation setup.

Finally, the discussion of the proposed method's limitations is not as thorough as it could be. While the authors acknowledge some limitations, they do not fully explore the challenges of real-world deployment, such as sensor drift, battery constraints, or varying environmental conditions. My confidence in this weakness is medium, as the paper does acknowledge some limitations but does not fully explore their implications.
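To ground the fifth point, here is one plausible reading of the gated adaptive BN as described (update running statistics per inferred placement, but only when the mean L2 norm of the normalized input clears a threshold). All names, the threshold `tau`, and the `momentum` value are my assumptions, not the paper's:

```python
import numpy as np

class GatedAdaptiveBN:
    """Per-placement running statistics with a norm-based stability gate.
    Hypothetical reconstruction of the mechanism described in the paper."""

    def __init__(self, num_features, num_placements=3, momentum=0.1, tau=0.5):
        self.mean = np.zeros((num_placements, num_features))
        self.var = np.ones((num_placements, num_features))
        self.momentum = momentum
        self.tau = tau  # gate threshold on the mean L2 norm of x_hat

    def __call__(self, x, placement):
        # x: (batch, num_features); placement: inferred context index
        x_hat = (x - self.mean[placement]) / np.sqrt(self.var[placement] + 1e-5)
        # Stability gate: adapt only when the normalized signal norm is
        # large enough, i.e. the window looks like informative motion.
        if np.linalg.norm(x_hat, axis=1).mean() >= self.tau:
            m = self.momentum
            self.mean[placement] = (1 - m) * self.mean[placement] + m * x.mean(axis=0)
            self.var[placement] = (1 - m) * self.var[placement] + m * x.var(axis=0)
        return x_hat
```

Even this simple reading exposes the questions raised above: a single large-magnitude outlier inflates the mean norm and opens the gate, and nothing in a scalar norm distinguishes vigorous activity from sensor artifacts.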

💡 Suggestions

To address the identified weaknesses, I recommend several concrete improvements.

First, the authors should include a more comprehensive set of baselines in their experimental evaluation. In particular, a standard Batch Normalization (BN) layer trained with placement labels would allow a direct comparison with the proposed method and help isolate the specific contribution of the stability gate and the placement-conditioned adaptive BN.

Second, the authors should provide a more detailed description of the experimental setup, including whether the baseline models are trained with or without placement labels, along with a fuller description of the custom dataset (types of activities, number of subjects, and sensor configurations). This would improve reproducibility and allow a more thorough assessment of the method's generalizability.

Third, the authors should extend their method to handle multi-sensor scenarios, for instance by developing a mechanism for handling different combinations of sensor placements and for scaling to a larger number of sensors. This would significantly increase the method's applicability to real-world HAR systems.

Fourth, the authors should evaluate their method on a wider range of sensor locations. This would help assess the generalizability of the stability gate and provide a more comprehensive evaluation of robustness.

Fifth, the authors should explain the stability gate's mechanism in more detail, including how the L2 norm is calculated and how it relates to signal stability. They should also discuss the limitations of a norm-based approach and explore alternative approaches to stability detection.

Sixth, the authors should improve the presentation of their results, for example by moving Figure 1 to the appendix and explaining the domain adaptation setup in more detail. This would make the paper more concise and easier to follow.

Finally, the authors should discuss the limitations of their method more thoroughly, including the potential impact of sensor drift, battery limitations, and varying environmental conditions, as well as the challenges of real-world deployment, and should suggest directions for future research. These changes would significantly improve the quality and impact of the paper.
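The baseline called for in the first suggestion need not be complicated. A hypothetical sketch of a static, placement-labeled BN (per-placement statistics frozen after training, no online adaptation) that could serve as the missing comparison point; function names and the data layout are illustrative:

```python
import numpy as np

def fit_static_placement_bn(train_x, train_placements, num_placements=3):
    """Suggested baseline: per-placement normalization statistics computed
    once on labeled training data and frozen thereafter."""
    means, vars_ = [], []
    for p in range(num_placements):
        xp = train_x[train_placements == p]
        means.append(xp.mean(axis=0))
        vars_.append(xp.var(axis=0))
    return np.stack(means), np.stack(vars_)

def apply_static_bn(x, placement, means, vars_, eps=1e-5):
    """Normalize test windows using the frozen statistics for the given
    (known or inferred) placement; no running-statistic updates."""
    return (x - means[placement]) / np.sqrt(vars_[placement] + eps)
```

Any gain of the proposed method over this frozen variant could then be attributed to online adaptation and the stability gate, rather than to placement conditioning alone.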

❓ Questions

I have several questions arising from my analysis of this paper.

First, how does the proposed method perform when the sensor is placed on a non-standard body location, such as the chest or the back? The current evaluation is limited to wrist, waist, and ankle placements, and it is unclear how the stability gate and the placement-conditioned adaptive BN would generalize to other locations.

Second, what is the impact of sensor noise on the performance of the proposed method? The paper mentions that the stability gate helps to prevent harmful adaptation during noisy periods, but it does not provide a detailed analysis of robustness to different levels of sensor noise.

Third, how does the proposed method compare to other state-of-the-art methods for handling sensor misplacement, such as domain adaptation techniques? The paper compares against some domain adaptation methods, but it does not provide a detailed analysis of the strengths and weaknesses of each approach.

Fourth, what is the computational overhead of the proposed method compared to a standard BN layer? The paper claims low overhead but does not break down the computational cost of each component.

Fifth, how does the proposed method handle changes in sensor orientation? The paper focuses primarily on sensor placement variability and does not provide a detailed analysis of robustness to orientation changes.

Finally, what is the impact of different sensor modalities on performance? The paper focuses primarily on accelerometer data and does not analyze the method's behavior with other modalities, such as gyroscopes or magnetometers.

These questions are important for understanding the limitations of the proposed method and for guiding future research in this area.

📊 Scores

Soundness: 2.5
Presentation: 2.25
Contribution: 2.25
Rating: 4.25
