
From IMU Streams to Real-Time Decisions: Past-Only Next-Window Badminton Action Prediction

Prometheus Redaktion

Open Access Article

by Qinglin Zhu 1, Jiao Wang 2 and Bin Guo 1,*

1 College of Electrical Engineering, Sichuan University, Chengdu 610065, China
2 College of Mechanical Engineering, Sichuan University, Chengdu 610065, China
* Author to whom correspondence should be addressed.

Sensors 2026, 26(12), 3651; https://doi.org/10.3390/s26123651 (registering DOI)

Submission received: 1 April 2026 / Revised: 18 May 2026 / Accepted: 5 June 2026 / Published: 8 June 2026

Abstract

We study real-time next-window badminton action prediction from wearable IMU streams where the system must predict the action label of the upcoming 100 ms window using past-only (causal) information. To handle severe class imbalance in continuous streams, we employ window-level downsampling of the dominant background class and compress multi-sensor time/frequency features using PCA before temporal modeling. We evaluate the full pipeline under a hop-based streaming protocol and show that our BiLSTM + MHSA model achieves high recognition performance (test accuracy 96.36%, Macro-F1 95.82%) while remaining deployable in real time, reaching 58.20 windows/s end to end (including preprocessing), i.e., 5.82× the real-time requirement (10 windows/s under a 100 ms output interval), on a Windows PC with an NVIDIA RTX 3080 GPU. These results support low-latency applications such as live coaching feedback and tactical analytics.

1. Introduction

Wearable sensing has become a practical foundation for fine-grained sports analytics, enabling an automated understanding of athletes’ actions, tactics, and biomechanics from lightweight on-body sensors.
In badminton, inertial measurement units (IMUs) can capture fast and subtle motion cues of strokes, and recent datasets and studies have demonstrated the feasibility of recognizing rich shot categories from multi-sensor IMU streams. In this paper, we focus on a badminton IMU dataset containing 11 shot types plus an “other” class, recorded at 100 Hz with five IMU placements (lower, upper, left foot, right foot, and racket), yielding 30 raw channels (three-axis accelerometer + three-axis gyroscope per device). Beyond offline recognition, many real-world applications (e.g., real-time coaching, tactical feedback, and downstream control) require predictive inference: the system should anticipate what action is happening in the immediate future rather than only classifying the past. We therefore study next-window action prediction: given a history of past windows, the model predicts the action label of an upcoming window using a strict past-only (causal) formulation. Concretely, with a 100 Hz IMU stream, the system outputs one prediction every 100 ms, enabling continuous online inference without accessing future observations. This hop-based protocol requires a minimum throughput of 10 windows/s for real-time operation; in our end-to-end stream-replay benchmark (feature extraction, PCA, and model inference), the pipeline reaches 58.20 windows/s (5.82× real time) on a Windows PC equipped with an NVIDIA RTX 3080 GPU. This setting naturally aligns with online deployment, where the current/future window is not fully observed when the decision must be made. Next-window prediction for badminton IMU is challenging for several reasons. First, badminton motions are highly dynamic with abrupt state transitions; discriminative patterns may be short-lived while still depending on longer-term context. Second, many strokes exhibit similar short-term signatures (e.g., clear vs. smash), making long-range temporal cues important for disambiguation. 
Third, the data distribution is strongly imbalanced: the background “other” class can dominate the stream, while rare strokes appear sparsely, which can bias training and degrade minority-class performance. Finally, obtaining accurate frame-/window-level labels for such high-frequency sensor data is labor-intensive, motivating learning and labeling strategies that reduce human annotation cost. To address these challenges, we propose an LSTM-based pipeline for IMU next-window prediction. We first transform each IMU window into a compact representation via multi-channel time/frequency-domain features (13 features per channel) and apply standardization followed by PCA for dimensionality reduction, yielding an m-dimensional embedding per window. We then construct a past-only sequence of length H and feed it into a temporal model combining BiLSTM-based sequence encoding with multi-head attention and an MLP classifier head. This design targets both short-term variations and long-term dependencies, matching the nature of badminton motion streams. In addition, to alleviate labeling cost, we adopt a self-supervised labeling approach derived from LIMU-BERT-style IMU representation learning, which can generate reliable labels and significantly reduce manual annotation overhead. The main contribution of this paper is a complete causal prediction pipeline for badminton IMU streams. We formulate the task as strict past-only next-window prediction, where the system outputs one prediction every 100 ms without accessing future observations. To support this setting, we combine multi-channel time/frequency features, PCA-based compression, and a lightweight BiLSTM + MHSA temporal encoder that captures both short-term stroke dynamics and longer-range motion context. 
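To make the past-only formulation concrete, the following minimal sketch builds (history, target) index pairs for next-window prediction. The helper name `past_only_pairs` and the choice to identify windows by their integer position in the stream are illustrative assumptions, not the authors' implementation. The only property taken from the text is that the target window comes strictly after every window in its history of length H; the sketch does not model any sample overlap between adjacent windows.

```python
from typing import List, Tuple


def past_only_pairs(num_windows: int, history_len: int) -> List[Tuple[List[int], int]]:
    """Return (history_indices, target_index) pairs for next-window prediction.

    Hypothetical helper: windows are identified by their position in the stream.
    Target window t is predicted from windows t - history_len .. t - 1 only.
    """
    if history_len < 1:
        raise ValueError("history_len must be at least 1")
    pairs = []
    for target in range(history_len, num_windows):
        history = list(range(target - history_len, target))
        pairs.append((history, target))
    return pairs


pairs = past_only_pairs(num_windows=6, history_len=3)
for history, target in pairs:
    # Causality check: the newest history window is still older than the target.
    assert max(history) < target
    print(history, "->", target)
```

The first `history_len` windows of a stream produce no prediction in this sketch, because a full history is not yet available for them.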
We further evaluate deployability with a full end-to-end streaming benchmark, including feature extraction, standardization, PCA, and model inference, and show that the pipeline reaches 58.20 windows/s, or 5.82× the real-time requirement, on a Windows PC with an NVIDIA RTX 3080 GPU. Finally, because continuous badminton streams are dominated by the other class, we incorporate window-level downsampling and ablation analyses to clarify how imbalance handling, PCA dimensionality, and attention affect prediction robustness, especially at longer horizons. The remainder of the paper is organized to make this pipeline explicit. Section 3 describes the dataset, preprocessing, temporal model, and labeling strategy, while Section 4 evaluates prediction accuracy, real-time feasibility, calibration, and ablation results. This organization connects each methodological component to the deployment requirements of low-latency badminton analytics. Abbreviations For clarity, the main acronyms used in this paper are summarized as follows: IMU, inertial measurement unit; PCA, principal component analysis; LSTM, long short-term memory; BiLSTM, bidirectional LSTM; MHSA, multi-head self-attention; MLP, multilayer perceptron; ECE, expected calibration error; UWB, ultra-wideband; IoU, intersection over union; GT, ground truth; and PR, precision–recall. 2. Related Work 2.1. Wearable Sensing for Badminton Analytics Wearable IMU sensing has been widely adopted for sports motion analysis due to its low cost, portability, and high temporal resolution. For badminton, prior work has demonstrated effective stroke recognition from wearable IMUs (and sometimes additional modalities such as UWB) [ 1, 2, 3, 4, 5, 6]. In parallel, forecasting in badminton has been studied at the match/event level, e.g., movement forecasting [ 7] and rally-wise behavior imitation from offline match trajectories [ 8]. 
More broadly, wearable inertial sensing has also been explored for anticipatory/predictive inference beyond badminton, including motion intention prediction [ 9], forecasting of future gait patterns from IMUs [ 10], and online frameworks that couple action recognition with motion prediction for early risk assessment [ 11]. This paper builds on the publicly released badminton wearable-sensing dataset introduced by Van Herbruggen et al. [ 1]. Their work collected synchronized IMU/UWB recordings from badminton players and demonstrated the feasibility of wearable-based badminton shot recognition under the original dataset protocol. In that protocol, the goal was to recognize badminton actions from the available sensor recordings using the dataset’s original sensing configuration and label definition, providing an important foundation for subsequent badminton IMU studies. Building on this dataset, we reformulate the problem from offline shot recognition to causal next-window prediction. Specifically, our setting differs in sensing configuration, class taxonomy, task definition, and evaluation protocol: we use the five-IMU subset and predict the upcoming IMU window from past windows only under a streaming protocol. Therefore, the recognition accuracy reported by Van Herbruggen et al. [ 1] is treated as related context rather than as a head-to-head baseline; a controlled numerical comparison would require rerunning the prior recognition model under the same sensor subset, 12-class label set, past-only input constraint, next-window target definition, and train/validation/test split. Compared with these lines of work, our focus is a strictly causal sensor-stream setting that directly supports low-latency online feedback. 2.2. 
Real-Time Constraints and Temporal Modeling for Wearable IMU Streams In real-time wearable IMU systems, low-latency inference under strict causal constraints is a primary requirement: models must process streaming signals online and output stable predictions within tight timing budgets. Recent reviews of intelligent wearables summarize multiple real-time deployment cases for motion-intent and biomechanical inference [ 12, 13]. Capturing both short-term discriminative cues and long-range context is crucial for highly dynamic motions such as badminton strokes. The Long-Short Term Memory (LSTM) [ 14, 15] framework was proposed to separate fine-grained variations from longer contextual dependencies in time series under complex interventions, providing a principled approach to modeling dynamics. Real-time wearable pipelines have also demonstrated simultaneous action recognition and whole-body motion/dynamics prediction in online settings [ 16]. Attention-based LSTM sensor-fusion studies further show robust performance under challenging NLOS conditions, reinforcing the practical value of temporal modeling for real-time wearable systems [ 17]. Recent multimodal wearable forecasting research further reports strong continuous multi-step-ahead biomechanical prediction using a transformer-style encoder–decoder architecture (KsFormer) [ 18]. Building on this motivation, we adopt an LSTM-inspired design that leverages BiLSTM encoding and multi-head attention to aggregate historical information, and we tailor it to the strict past-only next-window prediction objective. 2.3. Self-Supervised Learning and Automatic Labeling for IMU Data Self-supervised learning (SSL) has emerged as a transformative paradigm for IMU sensing, enabling robust representation learning from massive unlabeled streams and significantly mitigating the reliance on labor-intensive manual annotation [ 19, 20]. 
The foundational work of LIMU-BERT [ 21] first demonstrated that masked modeling objectives could effectively unlock the latent temporal patterns in IMU sequences. Building upon this, recent universal models such as oneHAR [ 22] have further scaled these representations across diverse sensor modalities and datasets. More recently, bio-inspired SSL frameworks [ 23] have refined this process by incorporating movement-specific inductive biases into the pre-training phase. Inspired by these advancements in automated labeling and feature extraction [ 24], we propose a self-supervised labeling strategy specifically tailored for badminton dynamics. This approach generates high-fidelity, window-level labels that demonstrate strong empirical alignment with ground truth across various subjects, ensuring both data efficiency and labeling precision. 4. Results and Analysis A series of experiments were conducted to verify the proposed method. This section presents the core results obtained in the experiment with a focus on analyzing the performance of real-time prediction models in highly dynamic IMU data. The subsequent analysis provided a solid foundation for verifying the effectiveness and superiority of the proposed method. 4.1. Window-Level Offline Evaluation We evaluate the proposed model on the window-level next-window prediction task (prediction horizon = 1 step). Since the dataset is class-imbalanced (e.g., the other class has substantially larger support), we report Macro-F1 and balanced accuracy in addition to overall accuracy. Table 1 summarizes the overall performance, and Table 2 provides per-class precision/recall/F1, which enables a fine-grained inspection of class-wise strengths and failure modes. 4.2. 
Performance of the Optimized LSTM Model

To validate the effectiveness of the combined hyperparameter optimization strategy, we integrate the optimal temporal configuration ($w = 10$, $\mathrm{hop} = 4$, $H = 49$) and the recommended PCA dimensionality ($d = 52$) to construct the final LSTM model. Figure 3 presents key performance visualizations of this integrated model, including training/validation accuracy, class distribution balance, confusion matrices, and loss dynamics. To avoid repeating the aggregate metrics already reported in Section 4.1 (Window-Level Offline Evaluation), we focus here on optimization dynamics and class-wise error patterns. In Figure 3a, the training/validation curves remain close throughout optimization, indicating stable convergence and no obvious overfitting under the selected configuration. Figure 3b further shows that most confusion is concentrated among semantically similar stroke classes, while the dominant “other” class remains well separated rather than being over-predicted. The main residual weakness is the rare lob_backhand class (very limited support), which is consistent with the long-tail distribution and suggests that future gains will primarily come from targeted data balancing or class-aware augmentation rather than further global hyperparameter tuning.

To further contextualize the IMU-only setting, we discuss its relation to prior wearable-sensing-based badminton recognition work without treating it as a directly comparable benchmark.

4.3. End-to-End Real-Time Inference Benchmark

A system is considered real-time feasible in our setting if throughput is at least 10 windows/s, since the deployment hop is 100 ms. Our end-to-end pipeline achieves 58.20 windows/s, i.e., 5.82× the real-time requirement. To verify this under realistic processing overhead, we run an end-to-end stream-replay benchmark.
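The feasibility criterion above reduces to simple arithmetic. The sketch below uses hypothetical function names to compute the minimum required throughput from the sampling rate and output hop, and the resulting real-time factor for a measured throughput. The input numbers are the ones reported in the text; the mean time per window is a derived quantity that assumes windows are processed one after another.

```python
def min_throughput(fs_hz: float, hop_samples: int) -> float:
    """Windows per second needed to keep up with one output every hop."""
    return fs_hz / hop_samples


def real_time_factor(measured_tp: float, fs_hz: float, hop_samples: int) -> float:
    """Ratio of measured throughput to the minimum required throughput."""
    return measured_tp / min_throughput(fs_hz, hop_samples)


FS_HZ = 100.0        # sampling rate reported in the text
HOP_SAMPLES = 10     # 100 ms deployment hop at 100 Hz
MEASURED_TP = 58.20  # end-to-end throughput reported in the text (windows/s)

tp_min = min_throughput(FS_HZ, HOP_SAMPLES)
rtf = real_time_factor(MEASURED_TP, FS_HZ, HOP_SAMPLES)
# Derived, assuming strictly sequential processing of windows.
mean_ms_per_window = 1000.0 / MEASURED_TP

print(f"required: {tp_min:.1f} windows/s")
print(f"real-time factor: {rtf:.2f}")
print(f"mean time per window: {mean_ms_per_window:.2f} ms of a 100 ms budget")
```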
Raw IMU CSV streams are replayed in chronological order, and the pipeline performs sliding-window inference per hop, including window-level feature extraction, standardization, PCA projection, history buffering, and a single model forward pass. After warming up for $N_w$ windows, we benchmark $N_e$ consecutive windows and report throughput and latency percentiles.

Metrics. Let $f_s$ be the sampling rate (Hz) and let $h_{\mathrm{out}}$ denote the output hop size (samples). The system produces one prediction every $\Delta t = h_{\mathrm{out}} / f_s$ seconds; thus, the minimum required throughput is

$$\mathrm{TP}_{\min} = \frac{1}{\Delta t} = \frac{f_s}{h_{\mathrm{out}}} \quad (\text{windows/s}). \tag{14}$$

Given the measured throughput $\mathrm{TP}$ (windows/s), we define the real-time factor (RTF) with respect to the output hop as

$$\mathrm{RTF}_{\mathrm{hop}} = \frac{\mathrm{TP}}{\mathrm{TP}_{\min}} = \frac{\mathrm{TP} \cdot h_{\mathrm{out}}}{f_s}. \tag{15}$$

A system is considered real-time feasible if $\mathrm{RTF}_{\mathrm{hop}} > 1$.

Results and analysis. With $f_s = 100$ Hz and $h_{\mathrm{out}} = 10$ (i.e., $\Delta t = 0.1$ s, $\mathrm{TP}_{\min} = 10$ windows/s), our end-to-end pipeline achieves $\mathrm{TP} = 58.20$ windows/s, corresponding to $\mathrm{RTF}_{\mathrm{hop}} = 5.82$, which satisfies the real-time requirement with a clear safety margin. In addition, model-only inference (batch size 1) reaches 185.17 windows/s with median latency $p_{50} = 5.08$ ms, indicating that the remaining end-to-end overhead mainly stems from preprocessing (feature computation, normalization, and PCA) rather than the network forward pass. This margin suggests the system can tolerate moderate deployment overhead (I/O, scheduling, logging) while still meeting the hop-based real-time constraint.

4.4. Calibration and Selective Prediction

Beyond accuracy, deployment quality depends on whether confidence scores are well calibrated and useful for selective prediction. We therefore evaluate reliability with the Expected Calibration Error (ECE) and assess confidence ranking quality with precision–recall analysis.
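As a rough illustration of the calibration metric used in this subsection, the sketch below computes ECE in plain Python. The equal-width binning, the half-open bin edges, the function name, and the toy inputs are assumptions made for illustration; the paper specifies the number of bins but not these details.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 15,
) -> float:
    """Weighted mean gap between accuracy and mean confidence over bins.

    Assumes equal-width bins [b/B, (b+1)/B); a confidence of exactly 1.0 is
    placed in the last bin.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    conf_sum = [0.0] * num_bins
    hits = [0] * num_bins
    count = [0] * num_bins
    for c, ok in zip(confidences, correct):
        b = min(int(c * num_bins), num_bins - 1)
        conf_sum[b] += c
        hits[b] += 1 if ok else 0
        count[b] += 1
    ece = 0.0
    for b in range(num_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        acc = hits[b] / count[b]
        conf = conf_sum[b] / count[b]
        ece += (count[b] / n) * abs(acc - conf)
    return ece


# Toy example: four predictions at 0.9 confidence, only one of them correct.
overconfident = expected_calibration_error([0.9] * 4, [True, False, False, False])
# Toy example: 0.5 confidence and 50% accuracy, so the gap is zero.
calibrated = expected_calibration_error([0.5] * 4, [True, True, False, False])
print(round(overconfident, 2), round(calibrated, 2))
```

A lower value means the reported confidence tracks the observed accuracy more closely, which is the property needed before using a confidence threshold for abstention.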
Using $B$ confidence bins, ECE is defined as

$$\mathrm{ECE} = \sum_{b=1}^{B} \frac{|I_b|}{N} \left| \mathrm{acc}(I_b) - \mathrm{conf}(I_b) \right|, \tag{16}$$

where $I_b$ is the sample set in bin $b$, $N$ is the number of test windows, $\mathrm{acc}(I_b)$ is the empirical accuracy, and $\mathrm{conf}(I_b)$ is the mean predicted confidence; we set $B = 15$ in this paper. A lower ECE indicates better alignment between predicted probabilities and true correctness frequencies.

Figure 4a shows that the reliability curve closely follows the diagonal, indicating good calibration. For the reliability diagram (Figure 4a), the x-axis is the binned mean confidence $\bar{c}_b \in [0, 1]$, and the y-axis is the corresponding empirical accuracy $\mathrm{acc}(I_b) \in [0, 1]$ for bin $I_b$. Figure 4b further shows strong precision–recall behavior, suggesting that confidence can effectively separate likely-correct from likely-incorrect predictions. For the PR curve (Figure 4b), the x-axis is recall $R = \frac{TP}{TP + FN}$ and the y-axis is precision $P = \frac{TP}{TP + FP}$, where $TP$, $FP$, and $FN$ here denote true positives, false positives, and false negatives. Moreover, the micro-AP is 0.986 and the macro-AP is 0.988, indicating strong confidence ranking quality. Precision decreases when recall approaches 1 because lowering the decision threshold introduces more low-confidence false positives. Together, these results support confidence-based deployment strategies (e.g., thresholding or abstention) in streaming scenarios.

4.5. Ablation Studies

Downsampling under severe class imbalance. The raw window-level labels are highly imbalanced, with the other class dominating. Without handling this imbalance, the classifier tends to collapse to a trivial majority-class predictor, yielding misleadingly high accuracy while failing on minority actions. Therefore, we apply downsampling to mitigate the dominance of the majority class and stabilize optimization. Figure 5 provides a class-distribution comparison before and after downsampling.

Effect of PCA. We further evaluate whether PCA-based dimensionality reduction benefits horizon prediction.
Table 3 reports the mean accuracy (averaged over all steps within the prediction horizon) under four prediction horizons. PCA consistently improves performance for short-to-medium horizons (10–40), indicating that denoising and compact representations help the encoder learn more robust temporal features. Effect of MHSA. We ablate the multi-head self-attention (MHSA) module by removing it from the encoder while keeping all other settings unchanged. As shown in Table 4, MHSA provides consistent gains across different horizons, especially at longer horizons, which suggests that attention helps highlight informative temporal segments within the historical window for more reliable prediction across horizons. BiLSTM vs. UniLSTM. Finally, we compare a bidirectional LSTM encoder with a unidirectional LSTM encoder under the same training protocol. Note that the bidirectionality is applied only within the observed input window and does not access any future observations from the prediction horizon; hence, it does not introduce information leakage. Figure 6 visualizes the all-correct rate (the percentage of samples for which all steps within the prediction horizon are correctly predicted) over training. BiLSTM achieves a higher all-correct rate throughout training, suggesting that modeling both earlier and later temporal dependencies inside the input window yields a more informative representation for prediction across the horizon. Additional ablation details are provided in Figure 7 for key-parameter sensitivity and in Figure 8 for PCA dimensionality analysis. 5. Conclusions and Future Work 5.1. Conclusions This paper studied real-time badminton next-window action prediction from wearable IMU streams under a strict past-only protocol. 
We proposed a practical pipeline that combines sliding-window feature extraction, standardization and PCA compression, and a lightweight temporal model (BiLSTM with multi-head self-attention) to capture both short- and long-range motion context. Experiments on a multi-player, multi-sensor badminton dataset demonstrate strong recognition performance under severe class imbalance, while the end-to-end stream-replay benchmark confirms real-time feasibility (58.20 windows/s including preprocessing). Overall, the results indicate that compact IMU representations together with causal temporal modeling can enable accurate and deployable online action prediction for badminton analytics. 5.2. Future Work Future extensions will focus on moving from action-level prediction toward richer real-time badminton intelligence. One direction is to incorporate UWB-based positional information so that body-motion cues from IMUs can be combined with on-court trajectories for strategy-level forecasting, including player movement intent, court coverage, and tactical patterns. This extension also requires multimodal fusion architectures that can align asynchronous IMU and UWB streams while remaining robust to realistic noise, occlusion, and sensor dropouts. Beyond sensing and prediction, large language models (LLMs) may serve as higher-level reasoning and generation modules: real-time predicted actions and positions could be translated into coherent play-by-play descriptions, tactical summaries, and short-horizon forecasts, making the system output more interpretable for players, coaches, and spectators. 6. Additional Analyses This section provides additional analyses, including key-parameter sensitivity, PCA dimensionality effects, calibration and selective prediction, downsampled streaming/event-level evaluation, self-supervised labeling assessment, and deployment-oriented confidence analysis. 6.1. 
Impact of Key Parameters on Recognition Performance

To identify the optimal hyperparameter combination for the LSTM model, we conducted a grid search over three key parameters: window size ($w$), model hop size (hop), and history length (hist, denoted $H$ elsewhere in the paper). The grid search results are visualized in Figure 7, which quantifies the model accuracy across different parameter combinations. Here, hop denotes the model-window stride used for feature sequence construction, which is distinct from the 100 ms deployment hop used in the real-time benchmark. As indicated by the peak accuracy in Figure 7, the optimal parameter combination is $w = 10$ (window size), $\mathrm{hop} = 4$ (model hop size), and $\mathrm{hist} = 49$ (history length). A small window size ($w = 10$) captures fine-grained dynamic features of badminton motions (e.g., rapid stroke transitions) without over-smoothing short-term IMU signal fluctuations. A small model hop size ($\mathrm{hop} = 4$) ensures dense sampling of the time series, preserving temporal continuity and reducing information loss between adjacent windows. A large history length ($\mathrm{hist} = 49$) enables the LSTM model to leverage long-term contextual dependencies of motion sequences, which is critical for distinguishing similar badminton shots (e.g., forehand clear vs. smash) that share short-term IMU patterns but differ in long-term motion context.

6.2. Impact of PCA Dimensionality on Recognition Performance

We investigated the impact of PCA dimensionality on recognition performance while keeping the temporal configuration fixed ($w = 10$, $\mathrm{hop} = 4$, $H = 49$) and using the same split protocol. We first conducted a coarse scan over a broad range of PCA components, followed by a fine-grained scan around the promising region. Across all evaluated settings, the best observed test accuracy was 96.13% at $d = 52$ PCA components. To balance accuracy and model efficiency, we also report the smallest dimensionality whose accuracy is within 0.20 percentage points of the best result; this yields a recommended setting of $d = 36$ with 96.03% accuracy. Overall, PCA enables substantial dimensionality reduction with a negligible drop in accuracy, indicating that the handcrafted time/frequency features contain considerable redundancy and can be compactly represented without sacrificing recognition performance. The combined figure (Figure 8) consolidates the coarse and fine accuracy curves, the best-per-dimension accuracy envelope, and the accuracy gap to the best, enabling a compact view of the accuracy–efficiency trade-off and the diminishing returns beyond the recommended dimensionality.

6.3. Calibration and Selective-Prediction Protocol

Calibration analysis evaluates whether the model confidence reflects the empirical probability of correct prediction, which is important when the system is used for real-time coaching or tactical feedback. For each test window, the model produces logits $z \in \mathbb{R}^{C}$ and class probabilities $p = \mathrm{softmax}(z)$. The predicted label is $\hat{y} = \arg\max_{c} p_c$, and the associated confidence is $\max_{c} p_c$. These quantities support Expected Calibration Error (ECE), coverage–risk analysis, and confidence-thresholded selective prediction, where the system can abstain on low-confidence windows rather than force potentially unreliable action decisions. Table 5 reports the calibration result on the test set.

6.4. Streaming/Event-Level Evaluation (Downsampled Setting)

To better reflect event detection performance under controlled class imbalance, we additionally evaluate streaming predictions in a downsampled setting. Following our training-time strategy, we downsample background (other) windows to match the number of non-other windows (1:1) before constructing history sequences and running inference.
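The 1:1 background downsampling described above can be sketched as follows. The function name, the fixed random seed, and the uniform random selection of background windows are assumptions for illustration; the text states only that other windows are reduced to match the number of non-other windows.

```python
import random
from typing import List


def downsample_background(
    labels: List[str], background: str = "other", seed: int = 0
) -> List[int]:
    """Return indices of the windows kept after 1:1 background downsampling.

    All non-background windows are kept. Background windows are sampled
    uniformly at random (an assumption) until both groups have equal size.
    Indices are returned in chronological order.
    """
    bg = [i for i, y in enumerate(labels) if y == background]
    fg = [i for i, y in enumerate(labels) if y != background]
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    keep_bg = rng.sample(bg, min(len(bg), len(fg)))
    return sorted(fg + keep_bg)


# Toy stream: eight background windows and two stroke windows.
toy_labels = ["other"] * 8 + ["smash", "clear"]
kept = downsample_background(toy_labels)
kept_labels = [toy_labels[i] for i in kept]
print(kept, kept_labels.count("other"), len(kept_labels) - kept_labels.count("other"))
```

Returning indices rather than copied windows keeps the chronological order available for the later history-sequence construction.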
We then merge consecutive non-other windows into predicted action segments and match them to ground-truth segments using temporal IoU with threshold $\theta = 0.5$. We report event-level precision/recall/F1 and detection delay measured by time-to-detect (median and 90th percentile), where delays are computed based on $N = 2$ consecutive correct predictions at the native resolution of this event-level protocol. Table 6 summarizes the event-level performance in the downsampled streaming setting. We further report the sensitivity to the temporal IoU threshold by sweeping $\theta \in \{0.5, 0.6, 0.7, 0.8\}$, as summarized in Table 7. This downsampled setting reduces the dominance of the other class and therefore provides an upper-bound estimate of event detection performance under a balanced background.

6.5. Performance Evaluation of Self-Supervised Labeling

In this experiment, the comparison between ground truth (GT) labels and self-supervised predicted (pred) labels for three players is visualized in Figure 9. Each player’s results include a side-by-side comparison of GT labels (left column) and model-generated labels (right column), enabling both a qualitative and quantitative assessment of the self-supervised labeling effectiveness. Quantitative evaluation reveals that the self-supervised model achieves high labeling accuracy across all participants: Player 1 reaches 0.9888, Player 2 0.9753, and Player 3 the highest at 0.9913. Qualitatively, the predicted labels in Figure 9 closely align with the GT labels, accurately capturing the temporal boundaries of badminton actions (e.g., serves, smashes, and drops) without obvious misclassifications. This validates the model’s ability to generate reliable labels without manual annotation, significantly reducing labor costs. Notably, window size configuration is critical for balancing labeling quality and temporal responsiveness.
An excessively small window leads to label “glitches” (spurious action transitions) due to heightened sensitivity to short-term IMU signal noise—this is because small windows fail to average out random fluctuations in sensor data. Conversely, an overly large window introduces substantial labeling delay, as it requires accumulating more temporal data before generating a label, which cannot keep pace with the rapid dynamics of badminton motions (e.g., quick stroke reversals or sudden direction changes). The window size adopted in this paper optimizes this trade-off: as evidenced by the smooth label sequences and high accuracy in Figure 9, it effectively minimizes noise-induced glitches while maintaining sufficient temporal responsiveness to match the fast-changing characteristics of IMU data. 6.6. Deployment-Oriented Additional Analyses To complement the main streaming and calibration results, we provide two additional analyses that are directly relevant to real-time deployment. The first examines confidence-based selective prediction, where the system may abstain from low-confidence windows rather than forcing unreliable decisions. The second evaluates the sensitivity of event-level performance to the temporal IoU threshold used for segment matching. Together, these analyses clarify how the proposed system behaves under practical confidence filtering and event-detection criteria. Figure 10 shows the trade-off between prediction coverage and risk under confidence thresholding. This result supports selective deployment modes in which uncertain predictions can be withheld or flagged for downstream review, thereby improving the reliability of the displayed feedback. Table 7 reports the event-level sensitivity to the temporal IoU threshold. As expected, stricter matching thresholds reduce precision, recall, and F1 because predicted segments must align more tightly with ground-truth action intervals. 
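The event-level matching discussed here can be illustrated with a small sketch. Consecutive non-other windows are merged into segments, and a predicted segment counts as a true positive if its temporal IoU with a not-yet-matched ground-truth segment reaches the threshold. Splitting segments on label changes, requiring labels to agree, the greedy one-to-one matching, and all function names are assumptions; the text does not spell out these details.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (start index, end index exclusive, label)


def merge_segments(labels: List[str], background: str = "other") -> List[Segment]:
    """Merge runs of identical non-background window labels into segments."""
    segments: List[Segment] = []
    start = None
    # A trailing background sentinel closes a segment that runs to the end.
    for i, y in enumerate(labels + [background]):
        if start is not None and y != labels[start]:
            segments.append((start, i, labels[start]))
            start = None
        if start is None and y != background:
            start = i
    return segments


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two index intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def event_prf(pred: List[Segment], gt: List[Segment], theta: float = 0.5):
    """Event-level precision, recall, and F1 with greedy one-to-one matching."""
    matched = set()
    tp = 0
    for p in pred:
        best_j, best_iou = None, 0.0
        for j, g in enumerate(gt):
            if j in matched or g[2] != p[2]:
                continue
            iou = temporal_iou(p, g)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None and best_iou >= theta:
            matched.add(best_j)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


gt_labels = ["other", "smash", "smash", "smash", "smash",
             "other", "other", "clear", "clear", "other"]
pred_labels = ["other", "other", "smash", "smash", "smash",
               "other", "clear", "other", "other", "other"]
gt_segs = merge_segments(gt_labels)
pred_segs = merge_segments(pred_labels)
for theta in (0.5, 0.8):
    print(theta, event_prf(pred_segs, gt_segs, theta))
```

In this toy stream the predicted smash segment overlaps its ground-truth segment with IoU 0.75, so it is matched at a threshold of 0.5 but rejected at 0.8, which is the mechanism behind the drop in scores under stricter thresholds.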
The gradual performance decrease indicates that the predicted event boundaries remain reasonably stable across a range of temporal matching criteria. Author Contributions Conceptualization, Q.Z.; methodology, Q.Z. and B.G.; investigation, Q.Z. and J.W.; data curation, Q.Z. and J.W.; formal analysis, Q.Z.; software, Q.Z.; visualization, Q.Z.; resources, J.W.; writing—original draft preparation, Q.Z.; writing—review and editing, Q.Z. and B.G.; supervision, B.G.; project administration, B.G. All authors have read and agreed to the published version of the manuscript. Funding This research received no external funding. Institutional Review Board Statement Not applicable. Informed Consent Statement Not applicable. Data Availability Statement The data used in this study are available from the corresponding author upon reasonable request. Acknowledgments The authors thank Sichuan University for support. The authors also gratefully acknowledge the dataset contribution from the laboratory team of Van Herbruggen et al. [ 1], whose publicly released badminton IMU/UWB dataset made this study possible. Conflicts of Interest The authors declare no conflicts of interest. References Van Herbruggen, B.; Fontaine, J.; Simoen, J.; De Mey, L.; Peralta, D.; Shahid, A.; De Poorter, E. Strategy analysis of badminton players using deep learning from IMU and UWB wearables. Internet Things 2024, 27, 101260. [ Google Scholar] [ CrossRef] Steels, T.; Van Herbruggen, B.; Fontaine, J.; De Pessemier, T.; Plets, D.; De Poorter, E. Badminton activity recognition using accelerometer data. Sensors 2020, 20, 4685. [ Google Scholar] [ CrossRef] [ PubMed] Kiang, C.T.; Yoong, C.K.; Spowage, A.C. Local sensor system for badminton smash analysis. In IEEE Instrumentation and Measurement Technology Conference (I2MTC); IEEE: Piscataway, NJ, USA, 2009; pp. 883–888. [ Google Scholar] Ghosh, I.; Ramamurthy, S.R.; Chakma, A.; Roy, N. Decoach: Deep learning-based coaching for badminton player assessment. 
Pervasive Mob. Comput. 2022, 83, 101608.
Peralta, D.; Van Herbruggen, B.; Fontaine, J.; Debyser, W.; Wieme, J.; De Poorter, E. Badminton stroke classification based on accelerometer data: From individual to generalized models. In IEEE International Conference on Big Data; IEEE: Piscataway, NJ, USA, 2022; pp. 5542–5548.
Jin, G.; Li, X. Wearable sensing for badminton stroke recognition with one-dimensional convolutional neural network. Sci. Rep. 2025, 15, 25158.
Chang, K.S.; Wang, W.Y.; Peng, W.C. Where will players move next? Dynamic graphs and hierarchical fusion for movement forecasting in badminton. Proc. AAAI Conf. Artif. Intell. 2023, 37, 6998–7005.
Wang, K.D.; Wang, W.Y.; Hsieh, P.C.; Peng, W.C. Offline Imitation of Badminton Player Behavior via Experiential Contexts and Brownian Motion. In Proceedings of the Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track; Springer Nature: Cham, Switzerland, 2024; pp. 348–364.
Tang, C.; Xu, Z.; Occhipinti, E.; Yi, W.; Xu, M.; Kumar, S.; Virk, G.S.; Gao, S.; Occhipinti, L.G. From brain to movement: Wearables-based motion intention prediction across the human nervous system. Nano Energy 2023, 115, 108712.
Zhang, W.; Zhang, H.; Jiang, Z.; Servati, A.; Servati, P. Real-time forecasting of pathological gait via IMU navigation: A few-shot and generative learning framework for wearable devices. Discov. Electron. 2025, 2, 51.
Guo, C.; Rapetti, L.; Darvish, K.; Grieco, R.; Draicchio, F.; Pucci, D. Online Action Recognition for Human Risk Prediction with Anticipated Haptic Alert via Wearables. In 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids); IEEE: Piscataway, NJ, USA, 2023; pp. 1–8.
Xiao, X.; Yin, J.; Xu, J.; Tat, T.; Chen, J. Advances in machine learning for wearable sensors. ACS Nano 2024, 18, 22734–22751.
Chen, S.; Peng, C.; Yang, B.; Lin, J.; Zhou, L.; Jiang, Z.; Liu, Z.; Liu, Y.; Tang, L. Recent advances in intelligent wearable systems: From multiscale biomechanical features towards human motion intent prediction. npj Artif. Intell. 2026, 2, 33.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Cai, R.; Huang, H.; Jiang, Z.; Li, Z.; Zhou, C.; Liu, Y.; Liu, Y.; Hao, Z. Disentangling long-short term state under unknown interventions for online time series forecasting. Proc. AAAI Conf. Artif. Intell. 2025, 39, 15641–15649.
Darvish, K.; Ivaldi, S.; Pucci, D. Simultaneous Action Recognition and Human Whole-Body Motion and Dynamics Prediction from Wearable Sensors. In 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids); IEEE: Piscataway, NJ, USA, 2022; pp. 488–495.
Ren, M.; Wei, J.; Qin, J.; Guo, X.; Wang, H.; Li, S. Attention based LSTM framework for robust UWB and INS integration in NLOS environments. Sci. Rep. 2025, 15, 21637.
Zhou, H.; Peng, Y.; Li, X.; Lyu, X.; Shou, D.; Li, G.; Wang, L. Multimodal wearable sensors-driven KsFormer model for continuous multi-step ahead prediction of lower limb joint moments and ground reaction forces. Biomim. Intell. Robot. 2026, 6, 100287.
Tan, T.; Shull, P.B.; Hicks, J.L.; Uhlrich, S.D.; Chaudhari, A.S. Self-supervised learning improves accuracy and data efficiency for IMU-based ground reaction force estimation. IEEE Trans. Biomed. Eng. 2024, 71, 2095–2104.
Jiang, A.; Ye, J. SelfVis: Self-supervised learning for human activity recognition based on area charts. IEEE Trans. Emerg. Top. Comput. 2024, 13, 196–206.
Xu, H.; Zhou, P.; Tan, R.; Li, M.; Shen, G. LIMU-BERT: Unleashing the potential of unlabeled data for IMU sensing applications. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems (SenSys); Association for Computing Machinery: New York, NY, USA, 2021; pp. 220–233.
Wei, Q.; Huang, J.; Gao, Y.; Dong, W. One Model to Fit Them All: Universal IMU-based Human Activity Recognition with LLM-assisted Cross-dataset Representation. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2025, 9, 139:1–139:22.
Tarale, P.P.; Chu, K.; Varghese, A.; Liu, K.C.; Xu, M.A.; Iyyer, M.; Lee, S.I. Bio-Inspired Self-Supervised Learning for Wrist-worn IMU Signals. arXiv 2026, arXiv:2603.10961.
Wang, J.; Zhao, Z.; Cui, J.; Lu, J.; Wu, B. TrackLet: Data-Driven Inertial Tracking on Your Own IMU Data. IEEE Trans. Mob. Comput. 2025, 24, 8301–8313.

Figure 1. Illustration of the badminton IMU dataset workflow used in the source dataset from Van Herbruggen et al. [1]: (a) IMU placement on the upper body, lower body/waist, left foot, right foot, and racket; (b) data collection during real badminton matches with multi-camera time synchronization and later manual labeling; and (c) data processing from raw IMU signals to synchronized and filtered streams, shot labeling, and strategy annotation. The colored curves in panel (c) schematically represent multi-channel IMU signals.

Figure 2. Overall architecture of the real-time badminton model. The past-only multivariate input sequence is first projected into a latent feature space, which is followed by stacked bidirectional LSTM layers for temporal encoding. Multi-head self-attention is then applied to capture global temporal dependencies, and the refined representation is finally mapped to class logits through a feed-forward classification head. Here, d_model and d_hidden denote feature dimensions, and the last-time-step pooling selects the final encoded state.

Figure 3. Performance of the optimized LSTM-PCA model (w = 10, hop = 4, H = 49, d = 52). (a) Training/validation accuracy and loss curves. (b) Confusion matrix on the test set.

Figure 4. Calibration and confidence-ranking diagnostics on the test set: (a) reliability diagram with the dashed ideal-calibration line; (b) micro precision–recall curve showing strong confidence ranking quality.

Figure 5. Class distribution comparison before and after downsampling.

Figure 6. BiLSTM vs. UniLSTM ablation on horizon prediction (horizon length 20). The metric is the all-correct rate over epochs for train/validation splits.

Figure 7. Grid search results for LSTM model accuracy: optimal parameters are w = 10, hop = 4, hist = 49.

Figure 8. PCA dimensionality analysis (combined): coarse and fine accuracy curves, best-per-dimension accuracy envelope, and accuracy gap to the best.

Figure 9. Self-supervised labeling for Player 3: GT vs. predicted labels with a side zoom of the largest discrepancy segment (top: GT; bottom: Pred). Accuracy: 0.9913.

Figure 10. Coverage–risk curve under confidence thresholding. As the confidence threshold increases, coverage decreases while Macro-F1 improves, indicating that abstention effectively filters low-confidence errors.

Table 1. Window-level test performance (prediction horizon = 1 step).

| Metric | Value |
|---|---|
| Accuracy | 0.9636 |
| Macro-F1 | 0.9582 |
| Weighted-F1 | 0.9636 |
| Balanced Accuracy | 0.9654 |

Table 2. Per-class window-level metrics on the test set (prediction horizon = 1 step).

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| clear_forehand | 0.9314 | 0.9825 | 0.9562 | 456 |
| dab_forehand | 0.9397 | 0.9532 | 0.9464 | 278 |
| drive_forehand | 0.9362 | 0.9462 | 0.9412 | 279 |
| drop_forehand | 0.9585 | 0.9493 | 0.9539 | 414 |
| lob_backhand | 0.9149 | 0.9556 | 0.9348 | 45 |
| lob_forehand | 0.9181 | 0.9602 | 0.9387 | 327 |
| net_drop_backhand | 0.9926 | 0.9675 | 0.9799 | 277 |
| net_drop_forehand | 0.9846 | 0.9974 | 0.9910 | 770 |
| serve_backhand | 0.9464 | 0.9524 | 0.9494 | 315 |
| serve_forehand | 0.9472 | 0.9855 | 0.9660 | 619 |
| smash_forehand | 0.9744 | 0.9800 | 0.9772 | 350 |
| other | 0.9728 | 0.9549 | 0.9638 | 4165 |

Table 3. Ablation study results of PCA (mean accuracy) by prediction horizon.

| Model | Horizon 10 | Horizon 20 | Horizon 40 | Horizon 80 |
|---|---|---|---|---|
| No PCA | 0.899 | 0.799 | 0.657 | 0.519 |
| With PCA | 0.915 | 0.825 | 0.686 | 0.519 |

Table 4. Ablation study results of MHSA (mean accuracy) by prediction horizon.

| Model | Horizon 10 | Horizon 20 | Horizon 40 | Horizon 80 |
|---|---|---|---|---|
| No MHSA | 0.903 | 0.782 | 0.656 | 0.515 |
| With MHSA | 0.915 | 0.825 | 0.686 | 0.519 |

Table 5. Calibration on the test set.

| Metric | Value |
|---|---|
| ECE (bins = B) | 0.0267 |

Table 6. Event-level performance in the downsampled streaming setting (IoU θ = 0.5).

| Setting | Precision | Recall | F1 | Med. (ms) | P90 (ms) |
|---|---|---|---|---|---|
| Downsampled (1:1) | 0.959 | 0.889 | 0.923 | 0 | 40 |

Table 7. Event-level results under different temporal IoU thresholds θ in the downsampled streaming setting.

| IoU θ | Precision | Recall | F1 |
|---|---|---|---|
| 0.5 | 0.959 | 0.889 | 0.923 |
| 0.6 | 0.942 | 0.873 | 0.906 |
| 0.7 | 0.921 | 0.854 | 0.887 |
| 0.8 | 0.865 | 0.802 | 0.832 |
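The aggregate scores in Table 1 can be cross-checked against the per-class rows of Table 2. The following is a minimal sketch, not the authors' evaluation code: it assumes the standard definitions (Macro-F1 as the unweighted mean of per-class F1, Weighted-F1 as the support-weighted mean of per-class F1, balanced accuracy as the unweighted mean of per-class recall, and accuracy as the support-weighted mean of per-class recall). The names `TABLE_2`, `macro_average`, `support_weighted_average`, and `summarize` are hypothetical and introduced only for illustration.

```python
# Per-class rows copied from Table 2: (precision, recall, F1, support).
TABLE_2 = {
    "clear_forehand":    (0.9314, 0.9825, 0.9562, 456),
    "dab_forehand":      (0.9397, 0.9532, 0.9464, 278),
    "drive_forehand":    (0.9362, 0.9462, 0.9412, 279),
    "drop_forehand":     (0.9585, 0.9493, 0.9539, 414),
    "lob_backhand":      (0.9149, 0.9556, 0.9348, 45),
    "lob_forehand":      (0.9181, 0.9602, 0.9387, 327),
    "net_drop_backhand": (0.9926, 0.9675, 0.9799, 277),
    "net_drop_forehand": (0.9846, 0.9974, 0.9910, 770),
    "serve_backhand":    (0.9464, 0.9524, 0.9494, 315),
    "serve_forehand":    (0.9472, 0.9855, 0.9660, 619),
    "smash_forehand":    (0.9744, 0.9800, 0.9772, 350),
    "other":             (0.9728, 0.9549, 0.9638, 4165),
}


def macro_average(values):
    """Unweighted mean: every class counts equally, regardless of size."""
    values = list(values)
    return sum(values) / len(values)


def support_weighted_average(values, supports):
    """Mean weighted by the number of test windows in each class."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total


def summarize(table):
    """Recompute the Table 1 aggregates from per-class rows (assumed definitions)."""
    recalls = [row[1] for row in table.values()]
    f1_scores = [row[2] for row in table.values()]
    supports = [row[3] for row in table.values()]
    return {
        # Accuracy equals support-weighted recall: sum of true positives / all windows.
        "accuracy": support_weighted_average(recalls, supports),
        "macro_f1": macro_average(f1_scores),
        "weighted_f1": support_weighted_average(f1_scores, supports),
        "balanced_accuracy": macro_average(recalls),
    }


summary = summarize(TABLE_2)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed values should agree with Table 1 only up to rounding. The gap between the support-weighted and macro scores reflects the weight of the `other` class, which accounts for roughly half of the test windows.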
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Citation: Zhu, Q.; Wang, J.; Guo, B. From IMU Streams to Real-Time Decisions: Past-Only Next-Window Badminton Action Prediction. Sensors 2026, 26, 3651. https://doi.org/10.3390/s26123651

www.mdpi.com
