2510.0019 Hierarchical Adaptive Normalization: A Placement-Conditioned Cascade for Robust Wearable Activity Recognition v2

🎯 ICAIS2025 Submission

AI Review from DeepReviewer

📋 Summary

This paper introduces a novel hierarchical adaptive normalization method designed to address the challenges of sensor misplacement and misorientation in wearable Human Activity Recognition (HAR) systems. The core contribution lies in a two-stage cascade process that combines physics-based orientation correction with adaptive normalization techniques. The first stage employs gravity-based orientation correction to account for sensor orientation changes, alongside a placement context inference mechanism that classifies the sensor's location (wrist, waist, or ankle). A key innovation is the inclusion of a stability gate, which prevents harmful updates to the model during periods of unstable sensor readings, thereby enhancing the robustness of the adaptation process. The second stage refines feature representations using placement-conditioned adaptive Batch Normalization, which adjusts the normalization parameters based on the inferred sensor placement. The authors evaluate their method on both public and custom datasets, demonstrating significant improvements over static baselines and state-of-the-art unsupervised domain adaptation methods. Specifically, the proposed method achieves a macro F1-score of 0.847, outperforming static baselines by 36% and unsupervised domain adaptation methods by 13.7%. The method also exhibits low computational overhead, with an inference time of only 2.3 ms and memory usage of 45.2 MB, making it suitable for real-time, on-device deployment. The paper includes ablation studies to analyze the contribution of each component, a hyperparameter sensitivity analysis, and cross-subject generalization experiments, providing a comprehensive evaluation of the proposed approach. The authors also demonstrate the method's robustness to noise and its ability to adapt to different sampling rates. 
Overall, the paper presents a well-motivated and effective solution to a critical problem in wearable HAR, with strong empirical results and practical implications for real-world applications.

✅ Strengths

I found several aspects of this paper to be particularly strong. The hierarchical adaptive normalization method is a novel and well-motivated approach to addressing the challenges of sensor misplacement and misorientation in wearable HAR. The combination of physics-based orientation correction with adaptive normalization, conditioned on sensor placement context, is a significant contribution. The inclusion of the stability gate, inspired by robotics, is another notable innovation. This mechanism effectively prevents harmful updates during unstable periods, enhancing the robustness of the adaptation process. The authors have also demonstrated the method's effectiveness through extensive experiments, showing significant performance improvements over static baselines and state-of-the-art unsupervised domain adaptation methods. The reported macro F1-score of 0.847, with a 36% improvement over static baselines and a 13.7% improvement over unsupervised domain adaptation, is a substantial achievement. Furthermore, the method's low computational overhead, with an inference time of only 2.3 ms and memory usage of 45.2 MB, makes it highly suitable for real-time, on-device deployment in dynamic real-world scenarios. The comprehensive evaluation, including ablation studies, hyperparameter sensitivity analysis, and cross-subject generalization experiments, strengthens the validity of the results. The authors also show the method's robustness to noise and its ability to adapt to different sampling rates, further demonstrating its practical applicability. The clear and concise writing style, along with the well-organized structure of the paper, made it easy to follow and understand the proposed method and its contributions. Overall, the paper presents a strong technical contribution with significant practical implications for the field of wearable HAR.

❌ Weaknesses

Despite the strengths of this paper, I have identified several weaknesses that warrant further consideration. Firstly, the stability gate, while a novel component, lacks a thorough analysis of its behavior under various conditions. The paper describes the gate's implementation as a simple norm-based threshold and states that the stability threshold T is determined through validation set optimization: F1-scores are evaluated across candidate values (0.1, 0.2, 0.3, 0.4, 0.5) and the optimal value T=0.3 is selected. However, a coarse grid search over five values on a single validation set is not a principled selection methodology for such a critical parameter, and the paper does not explain how the threshold should adapt to different activity types and sensor placements. The lack of a systematic approach to threshold selection could lead to suboptimal performance in real-world applications where the characteristics of the data vary significantly. The included hyperparameter sensitivity analysis evaluates the impact of different stability threshold values, but it is limited to a single validation set and does not explore the generalizability of the threshold across diverse datasets and activity types. The paper also lacks visualizations of the stability gate's output, showing when the gate is active and inactive for different activities and subjects, which would provide a better understanding of its functionality. This lack of detailed analysis makes it difficult to fully assess the robustness and reliability of the stability gate. My confidence in this weakness is medium, as the paper does provide some analysis, but it is not comprehensive enough.
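
To make the threshold discussion concrete, the norm-based gate and the grid search the paper describes can be reduced to a few lines. This is a minimal illustrative sketch with hypothetical names; `score_fn` stands in for whatever validation F1 computation the authors actually use:

```python
import numpy as np

def stability_gate(x_norm, tau=0.3):
    """Return True (allow adaptation) when the L2 norm of the
    normalized window exceeds the stability threshold tau."""
    return np.linalg.norm(x_norm) > tau

def grid_search_tau(windows, score_fn, taus=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Pick the threshold maximizing a validation score over the
    reported candidate grid; score_fn is a hypothetical callable
    mapping the list of gate decisions to a validation F1-score."""
    best_tau, best_score = None, -np.inf
    for tau in taus:
        gates = [stability_gate(w, tau) for w in windows]
        score = score_fn(gates)
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau
```

Seeing the gate in this form also makes the visualization request above easy to satisfy: plotting `stability_gate(w)` per window over time, per activity and subject, would directly show when adaptation is enabled.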
Secondly, the placement context classifier is trained on a limited set of categories (wrist, waist, ankle), which may not cover all real-world sensor placements. The paper explicitly states that the classifier is trained on labeled data from these three sensor placements. This limitation could affect the method's generalizability to other placements. The classifier's performance on unseen placements is unknown, and the method's reliance on a predefined set of placements could hinder its applicability in diverse real-world scenarios where sensors might be placed in non-standard locations, such as the chest or clavicle. The paper does not explore the method's performance with sensors placed in other locations, which is a significant limitation. My confidence in this weakness is high, as the paper explicitly states the limited training categories. Finally, the method assumes sufficient sampling rates, which may not always hold for resource-constrained devices. While the paper includes a “Sampling Rate Independence” analysis, showing performance across 25Hz, 50Hz, and 100Hz, it does not provide a detailed analysis of the method's performance under different sampling rates, particularly those lower than the rates used in the experiments. The paper does not define what constitutes “sufficient” sampling rates or discuss the limitations at very low sampling rates for resource-constrained devices. The reliance on high sampling rates for accurate feature extraction and normalization might pose a challenge for devices with limited processing capabilities or in scenarios where power consumption is a critical constraint. My confidence in this weakness is medium, as the paper does test different sampling rates, but does not explore the impact of very low rates or the computational implications for resource-constrained devices at those rates.
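
One inexpensive way to probe behavior below the tested 25/50/100 Hz rates is to decimate existing recordings before windowing. A hedged sketch follows; the 100 Hz source rate, the 1.5 Hz test signal, and the boxcar anti-aliasing are illustrative assumptions, not details from the paper:

```python
import numpy as np

def downsample(signal, factor):
    """Crudely downsample a 1-D signal by an integer factor:
    average consecutive groups of `factor` samples (the averaging
    acts as a simple anti-aliasing filter before decimation)."""
    n = (len(signal) // factor) * factor
    return signal[:n].reshape(-1, factor).mean(axis=1)

# Simulate a 100 Hz accelerometer axis and derive 50 Hz / 25 Hz views.
fs = 100
t = np.arange(0, 2.0, 1.0 / fs)
axis = np.sin(2 * np.pi * 1.5 * t)   # 1.5 Hz walking-like component
axis_50 = downsample(axis, 2)
axis_25 = downsample(axis, 4)
```

Running the full pipeline on such derived views at progressively lower rates (e.g., 12.5 Hz, 10 Hz) would quantify where accuracy degrades for resource-constrained devices.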

💡 Suggestions

Based on the identified weaknesses, I recommend several concrete improvements. Firstly, the authors should conduct a more thorough analysis of the stability gate's behavior. This should include a more detailed explanation of how the stability threshold is determined and how it adapts to different activity types and sensor placements. A sensitivity analysis of the stability threshold is needed to understand its impact on the method's performance. This analysis should include a range of threshold values and their effects on both adaptation speed and accuracy. Furthermore, the authors should consider exploring alternative methods for determining stability, such as using a moving average of the signal or incorporating a measure of signal variance. This would make the method more robust to variations in sensor data and activity types. The paper should also include visualizations of the stability gate's behavior under different conditions, such as during stable and unstable periods, to provide a better understanding of its functionality. This would help in assessing the robustness of the stability gate and its ability to prevent harmful adaptations. Secondly, to improve the generalizability of the method, the authors should evaluate its performance with sensors placed in other locations, such as the chest or clavicle. This would provide a more comprehensive understanding of the method's ability to handle different sensor placements. The authors should also consider exploring methods for adapting the placement context classifier to new sensor locations, such as using transfer learning or domain adaptation techniques. This would make the method more flexible and applicable to a wider range of wearable sensor applications. The paper should also include a discussion of the potential challenges of using the method with different sensor placements and how these challenges might be addressed. 
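
The signal-variance alternative suggested above could look like the following minimal sketch; the ratio-based criterion, the `max_ratio` constant, and all names are hypothetical illustrations rather than a proposal from the paper:

```python
import numpy as np

def variance_stability(window, history, max_ratio=2.0):
    """Flag a window as stable when its variance stays within a
    factor of the mean variance of recent windows (a hypothetical
    alternative to the paper's norm threshold). `history` is a list
    of variances of previously accepted windows."""
    if not history:
        return True  # no reference yet: accept the first window
    ref = np.mean(history)
    var = float(np.var(window))
    return var <= max_ratio * ref and var >= ref / max_ratio
```

Unlike a fixed norm threshold, this criterion is relative to recent behavior, so it would not systematically suppress adaptation during low-magnitude sedentary activities.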
Finally, the authors should provide a detailed analysis of the method's performance under different sampling rates, particularly those lower than the rates used in the experiments. This is a critical consideration for real-world deployment on devices with limited processing capabilities. The authors should also consider exploring methods for reducing the computational overhead of the method, such as using model compression techniques or optimizing the implementation for specific hardware platforms. This would make the method more practical for deployment on resource-constrained devices. The paper should also include a discussion of the trade-offs between performance and computational cost and how these trade-offs might be managed in practice. These improvements would address the identified weaknesses and enhance the robustness and applicability of the proposed method.

❓ Questions

I have several questions that arise from my analysis of this paper. Firstly, regarding the stability gate, I am curious about the potential for using a single threshold across all activities or placements, or at least a reduced set of thresholds. The current approach uses a single threshold T, but it is unclear if this threshold is optimal for all activities and placements. I would like to know if the authors have explored this possibility and what the trade-offs might be. Secondly, I am interested in the sensitivity of the method’s performance to variations in the stability threshold T. While the paper includes a hyperparameter sensitivity analysis, I would like to see a more detailed analysis or guidelines for selecting T in different scenarios, especially considering the variability in sensor data and activity patterns. Thirdly, I am curious about how the method performs with sensor placements not included in the training data, such as the chest or clavicle. The current placement context classifier is trained on wrist, waist, and ankle data, and I would like to know how the method would handle sensors placed in other locations. Fourthly, I would like to know if the authors have explored alternative stability metrics for the stability gate, and how they compare to the norm-based approach. The current approach might be too simplistic, and I am curious if more sophisticated metrics could potentially improve robustness. Fifthly, I am interested in how the method performs with lower sampling rates typical of resource-constrained devices. The paper shows results for different sampling rates, but I would like to know the limitations at very low sampling rates and how the method might be adapted for such scenarios. Finally, I am curious about the impact of the stability gate on the adaptation process during high-frequency activities. 
The paper mentions that the stability gate prevents harmful updates during unstable periods, but I would like to know how this affects the model's ability to adapt to changes in the data during high-frequency activities. These questions target key uncertainties and methodological choices, seeking clarification of critical assumptions and limitations.

📊 Scores

Soundness: 2.75
Presentation: 2.75
Contribution: 2.75
Rating: 5.75

AI Review from ZGCA

📋 Summary

The paper proposes Hierarchical Adaptive Normalization for wearable HAR to address performance degradation from sensor placement and orientation variability. The method has two stages: Stage 1 performs gravity-based orientation correction and infers a "placement context" via variance features with a small classifier; a stability gate based on the L2 norm of the normalized input allows or suppresses adaptive updates. Stage 2 applies a placement-conditioned adaptive Batch Normalization whose momentum depends on the inferred placement and confidence; a lightweight CNN produces final predictions. The approach targets real-time deployment and reports macro F1 improvements (0.847 ± 0.023) over static and state-of-the-art UDA baselines, with low latency (2.3 ms) and modest memory (45.2 MB). Extensive experiments include ablations, multi-dataset tests (including Opportunity and a custom dataset), cross-subject analysis, robustness to noise/sampling rates, and sensitivity to hyperparameters (e.g., stability threshold τ and kernel size).
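
For concreteness, the confidence-dependent, placement-conditioned momentum described in this summary can be sketched as follows. The per-placement base momenta and the linear confidence scaling are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

# Hypothetical per-placement base momenta for the BN running statistics.
BASE_MOMENTUM = {"wrist": 0.1, "waist": 0.05, "ankle": 0.05}

def adapt_bn_stats(running_mean, running_var, batch, placement, confidence):
    """Update BN running statistics with a momentum that depends on
    the inferred placement and is scaled by the placement classifier's
    confidence (so an unsure classifier freezes adaptation)."""
    m = BASE_MOMENTUM[placement] * confidence
    mean, var = batch.mean(axis=0), batch.var(axis=0)
    new_mean = (1 - m) * running_mean + m * mean
    new_var = (1 - m) * running_var + m * var
    return new_mean, new_var
```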

✅ Strengths

  • Addresses a practically significant and well-motivated problem in wearable HAR: cross-placement and orientation robustness (Sections 1, 3.1).
  • Hierarchical design combining physics-based orientation correction, placement inference via variance features, and placement-conditioned adaptive BN with a simple stability gate (Sections 4, 4.1).
  • Lightweight implementation and focus on real-time feasibility; reports low latency and moderate memory usage (Sections 5, 5.6).
  • Broad empirical coverage: comparisons to multiple baselines and state-of-the-art methods, ablations isolating each component, robustness to noise and sampling rates, cross-subject generalization, and hyperparameter sensitivity (Sections 5.2–5.9).
  • Clear ablation narrative showing incremental gains from gravity correction, adaptive BN, conditioning, and gating (Section 5.4).

❌ Weaknesses

  • Orientation correction appears technically insufficient: Section 4.1 uses θ = atan2(g_y, g_x) and a 2D rotation about z, which cannot fully align a 3D gravity vector to [0,0,1] (pitch/roll remain uncorrected). This undermines the claimed gravity-based orientation normalization in Stage 1.
  • Reproducibility ambiguity: experiments are said to fix random seed at 42 and also to repeat 5 times with different random initializations (Section 5), making the reported standard deviations difficult to interpret; batch size and windowing parameters are not specified.
  • Inconsistent performance reporting: Section 5 states kernel sizes 5 and 7 yielded final training F1 around 0.43 and validation around 0.49, which conflicts with later reported F1 > 0.8. This raises concerns about the correctness of reported numbers or their definitions.
  • Stability gate design assumption is debatable: adaptation is enabled when ||x_norm|| > τ (Section 4.1), presuming high-magnitude periods are stable/informative. This may suppress adaptation during low-magnitude but stable sedentary activities, and allow adaptation during high-impact transients. More principled stability criteria or empirical evidence are needed.
  • Placement classifier is limited to three categories (wrist, waist, ankle) trained on 1000 samples per class from 10 subjects (Section 4); generalization to other placements and misclassification impact on downstream adaptation are not analyzed.
  • Clarity issues: Stage 1 mentions a 'non-affine Batch Normalization layer applied along the feature axis' for orientation normalization alongside explicit rotation matrices (Section 4), creating confusion about what actually performs orientation correction. Dataset specification is vague (e.g., 'e.g., the Opportunity dataset') and lacks pre-processing/windowing details.
  • On-device claims are not fully substantiated: latency and memory are measured on an RTX 3080 (Section 5), not on embedded hardware; 45 MB may be high for many wearables.
  • Significance claims rely on t-tests with small N (5 runs) and unclear independence; without clear randomization protocols and batch size/windowing details, the strength of statistical support is limited.
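
To illustrate the first weakness above: aligning an arbitrary gravity estimate to [0, 0, 1] requires a full 3-D rotation, not just a yaw about z. A sketch of the complete correction via Rodrigues' rotation formula (the standard construction, not the paper's implementation):

```python
import numpy as np

def gravity_alignment(g):
    """Rotation matrix R such that R @ (g / ||g||) = [0, 0, 1],
    built from the axis-angle between g and the world z-axis
    (Rodrigues' formula). This corrects pitch and roll, which a
    yaw-only rotation theta = atan2(g_y, g_x) cannot do."""
    g = np.asarray(g, dtype=float)
    g = g / np.linalg.norm(g)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(g, z)           # rotation axis (unnormalized)
    c = float(np.dot(g, z))      # cosine of the rotation angle
    if np.allclose(v, 0):        # g parallel or antiparallel to z
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0, -v[2], v[1]],
                   [v[2], 0, -v[0]],
                   [-v[1], v[0], 0]])   # skew-symmetric cross matrix
    return np.eye(3) + vx + vx @ vx * (1.0 / (1.0 + c))
```

Any g with a nonzero x or y component demonstrates the gap: the z-only rotation in Section 4.1 leaves such vectors misaligned, whereas this R maps them exactly onto the world vertical.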

❓ Questions

  • Orientation correction: Please provide the full 3D derivation for aligning the sensor frame to the world frame using the gravity vector (e.g., computing a rotation that maps g to [0,0,1], including pitch and roll). If gyroscope/magnetometer fusion is not used, how do you mitigate dynamic acceleration bias in gravity estimation? Is the current implementation really only a rotation about z as in Section 4.1?
  • Randomization protocol: Clarify how results are both run with 'random seeds fixed at 42' and also 'repeated 5 times with different random initializations.' If seeds differ across runs, list them; if not, explain how independent initializations were achieved. Please also report batch size, window length/stride, segmentation procedure, and all data pre-processing steps.
  • Performance inconsistency: Section 5 states 'final training F1 around 0.43 and validation around 0.49' for kernel sizes 5 and 7, yet later F1 > 0.8. Is this a typo, or are these different metrics/datasets? Please reconcile and provide a consistent table of train/val/test metrics.
  • Stability gate: What fraction of time does the gate enable updates (m=1) per activity (sitting, standing, walking, running, jumping)? How sensitive are results to τ beyond the coarse grid in Section 5.8? Can you provide an ROC-style analysis of gate decisions versus a ground-truth notion of 'stable' periods?
  • Low-magnitude activities: How does the gate avoid suppressing useful adaptation during sedentary but stable activities where norms are low? Did you try the inverse criterion or activity-aware thresholds?
  • Placement classifier: How robust is the system to misclassification of placement? Please report performance conditioned on correct vs. incorrect placement predictions and a stress test with out-of-class placements (e.g., thigh, chest).
  • Ablations vs. prior TTA baselines: How does your method compare to standard test-time adaptation methods such as BN adapt-only, TENT/entropy minimization, and other calibration-free TTA under identical protocols?
  • On-device viability: Can you report latency, memory, and energy on representative embedded platforms (e.g., ARM microcontroller or mobile SoC)? What is the window duration and model footprint that yield 2.3 ms on RTX 3080, and how do these translate to embedded devices?
  • Data details and ethics: For the custom dataset of 15 subjects, please provide sensor modalities, sampling rates, windowing, annotation protocol, subject demographics, and IRB/consent details. How are the 1000 samples per placement for the placement classifier related to this dataset (10 subjects vs. 15 subjects)?
  • Statistical testing: With N=5 runs, how were paired t-tests constructed (pairing definition across methods)? Please provide effect sizes and confidence intervals alongside p-values.
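
On the last point, the pairing and the effect size for N = 5 runs are straightforward to compute. A minimal sketch (the scores below are illustrative, not the paper's numbers; a confidence interval would additionally require the t critical value for 4 degrees of freedom):

```python
import numpy as np

def paired_stats(a, b):
    """Paired t statistic and Cohen's d for two matched sets of
    per-run scores (e.g., per-seed F1 of two methods). With N = 5
    runs the test has only 4 degrees of freedom, so effect sizes
    and intervals matter alongside p-values."""
    d = np.asarray(a, float) - np.asarray(b, float)
    n = len(d)
    sd = d.std(ddof=1)                  # sample std of the differences
    t = d.mean() / (sd / np.sqrt(n))    # paired t statistic
    cohens_d = d.mean() / sd            # standardized effect size
    return t, cohens_d
```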

⚠️ Limitations

  • Orientation correction as currently described likely does not fully realign 3D sensor frames, potentially limiting robustness under complex motion.
  • Placement conditioning depends on a classifier trained on three placements; out-of-distribution placements may degrade performance, and misclassifications may miscalibrate BN momentum.
  • Stability gate’s norm-based criterion may be suboptimal for low-magnitude but stable behaviors, and may permit adaptation during high-impact transients; more principled stability measures could improve reliability.
  • Reproducibility gaps (seed handling, batch size, windowing) and inconsistent reported metrics hinder verification and adoption.
  • On-device deployment claims are not demonstrated on actual embedded hardware; 45 MB memory may be prohibitive for many wearables.
  • Potential societal impacts: Misclassification or unstable adaptation in health/safety monitoring could lead to false alarms or missed events; dataset bias (limited placements/subjects) may reduce fairness across demographics and use-cases; long-term on-device adaptation risks drift if gating fails.

🖼️ Image Evaluation

Cross‑Modal Consistency: 16/50

Textual Logical Soundness: 12/30

Visual Aesthetics & Clarity: 10/20

Overall Score: 38/100

Detailed Evaluation (≤500 words):

Image‑first visual ground truth

  • Figure 1/(a): Line plots titled “Baseline: Training and Validation Loss Curves.” Axes: Epoch vs Loss (~0–45). Multiple kernel‑size legends.
  • Figure 1/(b): “Baseline: Training and Validation F1‑Score Curves.” Epoch vs F1 (0–0.5). Peaks ≈0.45.
  • Figure 2/(a): Bar chart “Cross‑Domain F1 Performance.” Bars ≈0.28 (B) and ≈0.24 (C).
  • Figure 2/(b): Bar chart “Cross‑Domain Test Loss.” Both ≈1.3.
  • Figure 2/(c): Scatter “Domain B: Ground Truth vs Predictions,” labels 0–3, tight alignment.
  • Multi‑dataset panels: bar charts “Multi‑Dataset Final Validation F1 Scores” (≈0.19–0.29) and “Losses” (≈1.35); additional F1‑over‑epoch plots (0.1–0.6).
  • Tables 1–3: HTML tables with method‑wise F1 and efficiency numbers.

1. Cross‑Modal Consistency

• Major 1: Fig. 1(b) shows F1 ≤0.5 but Sec 5.1 claims F1>0.8 for larger kernels. Evidence: “kernel sizes 5 and 7 consistently achieve superior performance (F1 > 0.8)”.

• Major 2: Fig. 2(a) bars ≈0.25–0.3 contradict caption claim of F1>0.75 across domains. Evidence: “maintains consistent performance (F1 > 0.75) across both domains”.

• Major 3: Multi‑dataset bars (≈0.19–0.29) conflict with Sec 5.5 claims (0.823/0.847/0.789). Evidence: “achieves 0.823 ± 0.021 on the Opportunity dataset… 0.847 ± 0.023… 0.789 ± 0.019”.

• Minor 1: Fig. 1 titles say “Baseline,” while Sec 5.1 text attributes curves to the proposed model.

• Minor 2: Figure numbering/captions in text (Figs. 3–5) do not clearly correspond to provided panels; datasets anonymized as “Dataset1–3”.

2. Text Logic

• Major 1: Internal inconsistency on achievable F1. Sec 5 states “kernel sizes 5 and 7 yielded final training F1 ≈0.43 and validation ≈0.49,” yet Sec 5.1 says “peak F1‑scores above 0.8.” Evidence: “final training F1‑scores around 0.43 and validation F1‑scores near 0.49”.

• Minor 1: Stability‑gate rationale equates high‑norm signals with stability; justification is heuristic and may invert typical notions of instability.

• Minor 2: Claims of +36% and +13.7% rely on Table 1, but the figures widely disagree, weakening the argument chain.

3. Figure Quality

• Minor 1: Several panels use very small fonts/legends; difficult at print size to distinguish all kernel lines.

• Minor 2: Generic dataset labels (“Dataset1–3”) and missing units hinder stand‑alone interpretability.

• Minor 3: Key claims (e.g., error bars/CI for F1) absent in plots; bars lack variance depiction.

Key strengths:

  • Clear modular method description (orientation correction, placement inference, stability gate, conditioned BN).
  • Ablation/table structure, reproducibility details, and efficiency reporting are thorough.

Key weaknesses:

  • Severe figure–text numeric mismatches on core results (F1 levels, cross‑domain, multi‑dataset).
  • Internal textual inconsistency about achieved F1.
  • Figures titled “Baseline” where text claims proposed method; ambiguous mapping and low stand‑alone clarity.

📊 Scores

Originality: 3
Quality: 2
Clarity: 2
Significance: 2
Soundness: 2
Presentation: 2
Contribution: 2
Rating: 3

AI Review from SafeReviewer

📋 Summary

This paper introduces a hierarchical adaptive normalization method designed to enhance the robustness of wearable human activity recognition (HAR) systems when sensor placement and orientation vary. The core idea revolves around a two-stage cascade. In the first stage, the method employs gravity-based orientation correction to align sensor data with the gravitational vector, aiming to mitigate the impact of sensor orientation changes. This is followed by a placement context inference mechanism that uses signal variance to categorize the sensor's location (e.g., wrist, waist, ankle). A stability gate, inspired by robotics, is incorporated to prevent harmful updates during unstable periods, such as abrupt movements or high-impact activities. The second stage utilizes a placement-conditioned adaptive Batch Normalization (BN) to refine feature representations in real-time, adapting to the specific sensor placement context. The authors evaluate their method on a custom dataset, demonstrating improved performance compared to baseline approaches and some state-of-the-art methods. The method is presented as computationally efficient, making it suitable for real-time, on-device applications. The paper's main contribution lies in the integration of these components into a unified framework that addresses the challenges of sensor variability in wearable HAR. The experimental results, while showing promise, are primarily based on a custom dataset, and the comparisons with state-of-the-art methods could be more comprehensive. Overall, the paper presents an interesting approach to a significant problem in wearable sensing, but it would benefit from further validation and a more thorough comparison with existing techniques.
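
The variance-based placement inference mentioned in this summary can be pictured with a minimal sketch. The nearest-centroid rule is a stand-in for the paper's unspecified small classifier; all names and centroid values are hypothetical:

```python
import numpy as np

PLACEMENTS = ("wrist", "waist", "ankle")

def variance_features(window):
    """Per-axis variance of a (samples, 3) accelerometer window:
    the kind of low-cost signal-variance feature this summary
    attributes to the placement inference stage."""
    return np.var(window, axis=0)

def infer_placement(window, centroids):
    """Nearest-centroid placement inference; `centroids` maps each
    placement name to a reference feature vector."""
    f = variance_features(window)
    return min(centroids, key=lambda p: np.linalg.norm(f - centroids[p]))
```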

✅ Strengths

I found several aspects of this paper to be commendable. The core strength lies in the paper's attempt to address a very relevant problem in wearable sensor-based human activity recognition: the variability introduced by changes in sensor placement and orientation. The proposed hierarchical adaptive normalization method, with its two-stage cascade, is a well-structured approach to tackle this issue. The use of gravity-based orientation correction is a logical first step in mitigating the effects of sensor orientation changes. The subsequent placement context inference, using signal variance, provides a way to categorize sensor locations, which is crucial for adapting the processing pipeline. The inclusion of a stability gate, inspired by robotics, is a novel attempt to prevent harmful updates during unstable periods, and the use of signal norm as a stability indicator is a practical choice. The adaptive Batch Normalization, conditioned on the inferred placement context, is a reasonable way to refine feature representations. The experimental results, although primarily based on a custom dataset, do show that the proposed method achieves improved performance compared to baseline approaches and some state-of-the-art methods. The authors also emphasize the computational efficiency of their method, which is a critical factor for real-time, on-device applications. The ablation studies, while not exhaustive, do provide some insights into the contribution of each component. The paper is generally well-organized and easy to follow, which facilitates understanding of the proposed method. The authors also provide a clear description of the experimental setup and the evaluation metrics. The use of a custom dataset, while raising some concerns about generalizability, also allows for a controlled evaluation of the method's performance under specific conditions. 
The paper's focus on real-time performance and on-device deployment is also a significant strength, as these are crucial considerations for practical wearable applications.

❌ Weaknesses

Despite the strengths, I have identified several weaknesses that warrant careful consideration. First, the paper's reliance on a custom dataset as the primary evaluation dataset is a significant concern. While the authors do include results on the Opportunity dataset, the core quantitative comparisons in Tables 1 and 2 are based on the custom dataset. This raises questions about the generalizability of the findings. The paper lacks detailed information about the custom dataset's characteristics, such as the number of subjects, the specific activities performed, and the duration of the recordings. This lack of transparency makes it difficult to assess the dataset's representativeness and potential biases. The paper also does not provide a clear justification for the choice of activities included in the custom dataset, nor does it discuss how these activities relate to real-world scenarios. This lack of detail about the custom dataset limits my confidence in the results. Second, the paper's comparison with state-of-the-art methods is not as comprehensive as it could be. While the authors compare their method against some existing techniques, the selection of these methods appears somewhat arbitrary. The paper does not provide a clear rationale for why these specific methods were chosen and why other relevant approaches were excluded. For instance, the paper does not compare against methods that explicitly model sensor orientation using quaternion-based representations or those that employ signal decomposition techniques to isolate gravitational components. The absence of a more thorough comparison with a wider range of state-of-the-art methods makes it difficult to assess the true novelty and effectiveness of the proposed approach. The paper also does not discuss the limitations of the chosen comparison methods and why they are not suitable for the problem at hand, which further weakens the justification for the proposed method. 
Third, the paper's discussion of the stability gate, while interesting, lacks sufficient detail. The paper mentions that the stability gate is inspired by robotics applications, but it does not provide specific citations to relevant robotics literature. The paper also does not fully explain how the stability gate differentiates between actual activity changes and unstable periods. The paper uses a simple norm-based threshold, but it does not discuss the limitations of this approach or how it might be affected by different types of activities or sensor placements. The paper also does not provide a detailed analysis of the stability gate's performance under various challenging conditions, such as abrupt movements or high-impact activities. Fourth, the paper's explanation of the placement-conditioned adaptive Batch Normalization (BN) is somewhat vague. The paper states that the BN parameters are adjusted based on the inferred placement context, but it does not provide a detailed explanation of how this is achieved. The paper does not clarify whether the method uses separate BN layers for each placement or if it adapts a single BN layer based on the context. The paper also does not discuss the specific parameters of the BN that are adapted and how these parameters are updated during training. This lack of clarity makes it difficult to fully understand the inner workings of the proposed method. Fifth, the paper's description of the experimental setup lacks some crucial details. The paper mentions that the CNN is trained on data from a single sensor placement, but it does not specify which placement (wrist, waist, or ankle). The paper also does not provide a clear explanation of how the baseline methods are trained and evaluated. The paper mentions that the baselines are trained on a single placement, but it does not discuss the implications of this choice for cross-placement generalization. 
The paper also never defines the domains used in the cross-domain evaluation, which makes those results hard to interpret.

Finally, the presentation could be improved. Several abbreviations are used without explicit definitions; there is no flowchart or block diagram of the proposed method, which would greatly aid understanding; the evaluation metrics are not clearly explained; and the computational cost of the method, a critical factor for real-time applications, is not analyzed in detail.

💡 Suggestions

Based on the identified weaknesses, I recommend several concrete improvements.

First, evaluate the method on multiple publicly available datasets in addition to the custom dataset, to assess generalizability and address concerns about dataset bias. Report the custom dataset's characteristics in full: the number of subjects, the activities performed, and the recording durations. This transparency would let other researchers place the findings in context.

Second, broaden the comparison with the state of the art to include methods that explicitly model sensor orientation with quaternion-based representations, signal decomposition techniques, and other relevant approaches, together with a clear rationale for the selection and a discussion of each baseline's limitations for this problem.

Third, expand the description of the stability gate with specific citations to the robotics literature that inspired it, explain how the gate separates genuine activity changes from unstable periods, and analyze its behavior under challenging conditions such as abrupt movements or high-impact activities.

Fourth, explain the placement-conditioned adaptive BN precisely: whether separate BN layers are kept per placement or a single layer is adapted, which parameters are adjusted based on the inferred context, and how those parameters are updated during training.

Fifth, document the experimental setup fully, including the sensor placement used to train the CNN and the baselines, and give a clear definition of the domains used in the cross-domain evaluation.

Sixth, improve the presentation: define all abbreviations explicitly, add a flowchart or block diagram of the proposed method, explain the evaluation metrics, and provide a detailed analysis of the computational cost.

Finally, consider alternative approaches to sensor orientation correction, such as quaternion-based representations or signal decomposition, which could provide a more robust correction during dynamic activities, and more advanced placement context inference that combines signal features with sensor location information. These improvements would strengthen the paper and make it more impactful.
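To show what the requested BN explanation should pin down, here is a minimal sketch of one plausible design: separate running statistics per placement, selected by the inferred context. The paper may implement something quite different (e.g. adapting a single shared layer), and every name and constant here is my assumption:

```python
import numpy as np

class PlacementConditionedBN:
    """Batch normalization with one set of running statistics per placement,
    selected by the inferred placement label; affine parameters are shared."""
    PLACEMENTS = ("wrist", "waist", "ankle")

    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        self.eps, self.momentum = eps, momentum
        self.gamma = np.ones(num_features)   # shared scale
        self.beta = np.zeros(num_features)   # shared shift
        # (running_mean, running_var) per placement
        self.running = {p: (np.zeros(num_features), np.ones(num_features))
                        for p in self.PLACEMENTS}

    def __call__(self, x, placement, update=True):
        mean, var = self.running[placement]
        if update:
            m = self.momentum
            mean = (1 - m) * mean + m * x.mean(axis=0)
            var = (1 - m) * var + m * x.var(axis=0)
            self.running[placement] = (mean, var)
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta

# Hypothetical usage: features from a window inferred to be worn on the wrist.
bn = PlacementConditionedBN(num_features=64)
feats = np.random.randn(32, 64) + 3.0
normalized = bn(feats, placement="wrist")
```

Spelling out which of these pieces (statistics, affine parameters, or both) the authors actually condition on placement, and how updates interact with the stability gate, would resolve the ambiguity.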

❓ Questions

I have several questions that arise from my analysis of the paper.

First, regarding the custom dataset: what specific criteria were used to select the subjects, and how was the data collection process standardized to ensure consistency across subjects?

Second, concerning the stability gate: what is the specific threshold value used, and how was it determined? How sensitive is the method to different threshold values?

Third, regarding the placement-conditioned adaptive BN: how are the parameters of the BN layer adjusted based on the inferred placement context? Are separate BN layers used for each placement, or is a single BN layer adapted? If the latter, how is the adaptation achieved?

Fourth, regarding the training process: what is the architecture of the CNN used in the experiments, and what specific training parameters were used?

Fifth, regarding the evaluation: which activities are included in the custom dataset, how do they relate to real-world scenarios, and what is the method's performance on each individual activity?

Sixth, regarding the comparison with state-of-the-art methods: why were these specific methods chosen, why were other relevant methods excluded, and what are the limitations of the chosen baselines?

Seventh, regarding the computational cost: what are the inference time and memory usage of the proposed method on a typical wearable device, and how do they compare to other state-of-the-art methods?

Finally, regarding the gravity-based orientation correction: how does the method handle noisy accelerometer data, and how does it perform during activities that involve rapid changes in orientation?

These questions aim to clarify key methodological choices and assumptions, and they are crucial for a deeper understanding of the paper's contributions and limitations.

📊 Scores

Soundness: 2.0
Presentation: 2.0
Contribution: 2.0
Confidence: 4.0
Rating: 3.5
