
Interpreting Meaningful Change: SWC, Coefficient of Variation, and Confidence Intervals


Prerequisites: This article assumes familiarity with descriptive statistics (mean, standard deviation), the distinction between external and internal training load, and basic monitoring tools such as GPS and force platforms. If any of these topics are new to you, review introductory material on them before continuing.

Learning Objectives

  • Define Typical Error (TE) and Smallest Worthwhile Change (SWC), and explain them as a signal-to-noise relationship.
  • Explain how to calculate the Coefficient of Variation (CV) and its role in evaluating test sensitivity.
  • Understand the frequentist interpretation of Confidence Intervals (CI) and apply it to interpreting individual change.
  • Describe how to integrate SWC, CV, and CI into practical monitoring decision-making.
  • Understand how the balance between Type I (false positive) and Type II (false negative) errors affects threshold selection.

Why You Shouldn’t Trust Every Number: The Nature of Measurement Error

Every data point collected from an athlete is a composite of two things: the true value and the error surrounding it. Measurement error is the difference between what a test reports (the observed score) and what the athlete’s actual capacity is (the true score). This relationship is expressed as:

\text{Observed Score} = \text{True Score} + \text{Measurement Error}

Measurement error has two sources. Instrumentation noise comes from the technology itself: sensor drift, satellite signal quality, software filtering algorithms. Biological noise comes from the athlete: circadian variation, hydration status, sleep quality, motivation, and the residual effects of prior training (Jovanovic et al., 2022). Both sources combine to create the total noise in any measurement.

This distinction matters because it determines what practitioners can and cannot control. A standardised warm-up protocol, consistent time-of-day testing, and familiarisation trials reduce biological noise. Maintaining the same device for the same athlete and avoiding mid-season firmware updates reduce instrumentation noise (Varley et al., 2022).

A real-world example illustrates the consequences of ignoring this. After a GPS software update, one team observed acceleration counts drop from 251 to 177 and deceleration counts from 181 to 151 for the same training session data (Varley et al., 2022). Nothing changed about the athletes or the session. The filter changed, and with it the numbers. Without awareness of measurement error, a practitioner could have mistakenly interpreted this as a meaningful drop in high-intensity output.

The practical implication is clear: before interpreting any change in athlete data, the measurement system itself must be trustworthy. Trusting a system requires quantifying its noise, which leads to the next concept.

Quantifying Noise: Typical Error and Coefficient of Variation

Once it is accepted that every measurement contains error, the next step is to measure how large that error is. Two statistics serve this purpose.

Typical Error (TE) quantifies the magnitude of random error across repeated measurements. It is calculated from two trials on the same athletes under the same conditions:

TE = \frac{SD_{\text{diff}}}{\sqrt{2}}

where SD_diff is the standard deviation of the difference scores between trial one and trial two (Jovanovic et al., 2022). TE is expressed in the same unit as the original measurement (e.g., kilograms, centimetres, metres per second), making it directly interpretable.

The Coefficient of Variation (CV) converts TE into a relative measure, expressed as a percentage of the mean:

CV = \frac{TE}{\text{Mean}} \times 100

CV allows comparison across variables measured in different units. A CV of 3% for countermovement jump (CMJ) height means the test’s noise is 3% of the average value, whereas a CV of 40% for a deceleration metric means the noise is 40% of the average. The higher the CV, the harder it is to detect a genuine change.
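The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a validated reliability script, and the CMJ values below are hypothetical example data:

```python
import statistics

def typical_error(trial1, trial2):
    """Typical Error: SD of the between-trial difference scores divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / (2 ** 0.5)

def coefficient_of_variation(te, trial1, trial2):
    """CV expresses TE as a percentage of the grand mean of both trials."""
    grand_mean = statistics.mean(trial1 + trial2)
    return te / grand_mean * 100

# Hypothetical CMJ heights (cm) for 10 athletes across two testing days
day1 = [38.2, 41.5, 35.9, 44.1, 39.7, 36.8, 42.3, 40.0, 37.5, 43.2]
day2 = [39.0, 40.8, 36.5, 44.9, 39.1, 37.6, 41.7, 40.6, 38.3, 42.5]

te = typical_error(day1, day2)
cv = coefficient_of_variation(te, day1, day2)
print(f"TE = {te:.2f} cm, CV = {cv:.1f}%")
```

Because TE carries the measurement's own unit, the same two trial lists yield both an absolute (cm) and a relative (%) noise estimate.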

This is not a theoretical concern. When three 10 Hz GPS manufacturers were compared on the same athletes performing the same drills, deceleration metrics showed inter-unit CV ranging from 2.5% to 72.8%, while total distance and speed metrics ranged from just 0.2% to 5.5% (Murray & Clubb, 2022). The variables that practitioners often value most — high-speed running distance, accelerations, and decelerations — tend to have the poorest reliability (Buchheit & Simpson, 2017).

To establish CV in a specific environment, a standardised in-house reliability protocol is recommended: test at least 10 athletes across 3–4 trials, spaced 48–72 hours apart under identical conditions, and calculate ICC, TE, and CV from the resulting data (McGuigan, 2022). Published reliability values from other facilities are useful as a reference, but the equipment, population, and testing conditions in one environment may differ enough to change the numbers.

| Variable | Typical CV Range | Implication |
| --- | --- | --- |
| Total distance (GPS) | 0.2–5.5% | High sensitivity to change. |
| CMJ height | 2–5% | Generally reliable for monitoring. |
| Decelerations (GPS) | 2.5–72.8% | Highly variable; interpret with caution. |
| Sprint distance (GPS) | 5–15% | Moderate; requires sufficient data. |

Understanding CV is the foundation for the next question: how large does a change need to be before it means something?

Defining the Signal: SWC and SESOI

If CV quantifies the noise, the Smallest Worthwhile Change (SWC) defines the minimum signal that should influence a practitioner’s judgment. SWC is not merely a statistical threshold. It is the boundary between “this change could matter” and “this change is likely noise.”

A closely related concept is the Smallest Effect Size of Interest (SESOI), defined as the smallest effect that carries practical or clinical significance (Jovanovic et al., 2022). SESOI can be anchored to measurement error (the smallest change a test can reliably detect) or to a practical benchmark (the smallest change that would alter a training decision). In applied sport science, SWC and SESOI are often used interchangeably, though SESOI is the broader statistical term.

The relationship between TE and SESOI determines whether a test is fit for purpose. Consider a bench press 1RM assessment where TE is 2.5 kg and SESOI is set at ±5 kg. The ratio of SESOI to TE indicates the test’s practical sensitivity: a larger ratio means the test can comfortably distinguish signal from noise (Jovanovic et al., 2022).

This leads to a simple decision framework for evaluating any monitoring tool (Varley et al., 2022):

| Condition | Interpretation | Action |
| --- | --- | --- |
| Noise < SWC | Change can be detected reliably. | Trust the result. |
| Noise ≈ SWC | Detection is uncertain. | Repeat the assessment or reduce error sources. |
| Noise > SWC | The test cannot detect meaningful change. | Re-evaluate the test's value. |

When a GPS-derived deceleration metric has a CV of 40% and the expected SWC is 5%, the noise dwarfs the signal. No amount of statistical sophistication can recover a meaningful conclusion from that variable. Conversely, when CMJ height has a CV of 3% and the SWC is 5%, the test is sensitive enough to inform decisions.
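The decision framework above translates directly into code. A sketch in Python follows; note that the 20% tolerance band used to define "noise ≈ SWC" is an illustrative assumption of this example, not a published cut-off:

```python
def evaluate_test_sensitivity(cv_percent, swc_percent):
    """Classify a monitoring tool by comparing its noise (CV) with the signal (SWC).

    The 0.8-1.2 tolerance band for 'comparable' is an assumption made for
    illustration; in practice, set it to suit your own decision context.
    """
    if cv_percent < swc_percent * 0.8:
        return "Noise < SWC: trust the result."
    elif cv_percent <= swc_percent * 1.2:
        return "Noise ≈ SWC: repeat the assessment or reduce error sources."
    else:
        return "Noise > SWC: re-evaluate the test's value."

print(evaluate_test_sensitivity(3, 5))    # CMJ height example from the text
print(evaluate_test_sensitivity(40, 5))   # GPS deceleration example from the text
```

Running the two examples from the text reproduces the verdicts given above: CMJ height (CV 3%, SWC 5%) is trustworthy, while the deceleration metric (CV 40%) is not.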

SWC is therefore both a statistical concept and a practical decision-making tool. It forces a critical question before any data interpretation begins: is this measurement system capable of detecting the changes I care about?

Visualizing Uncertainty: Confidence Intervals

Even when a test is sensitive enough and a change exceeds the SWC, uncertainty remains. A single observed change is a point estimate, and the true value could be higher or lower. Confidence Intervals (CI) quantify this uncertainty.

A 95% CI means that if the same sampling procedure were repeated an infinite number of times, 95% of the resulting intervals would contain the true population parameter (Jovanovic et al., 2022). This is the frequentist interpretation. It does not mean there is a 95% probability that the true value lies within any single observed interval.

The practical power of CI emerges when combined with SESOI. By overlaying the CI onto the SESOI range, practitioners can estimate the probability that an observed change falls into one of three categories: lower (harmful or negative), trivial (no meaningful effect), or higher (beneficial or positive). This approach underpins Minimum Effect Tests (METs), which can produce six distinct conclusions: lower, not higher, trivial, not lower, higher, and equivocal (Jovanovic et al., 2022).

Consider an athlete whose CMJ height increases by 2.5 cm after a training block. If the 95% CI for this change spans from −0.5 cm to +5.5 cm and the SESOI is ±2.0 cm, most of the CI falls above the positive SESOI threshold. The conclusion leans toward “higher,” but a portion of the CI overlaps with the trivial zone. The change is likely meaningful but not certain.
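The overlay of CI on SESOI can be made concrete by recovering a normal sampling distribution from the CI and integrating it over the three zones. This is one common way to sketch the idea, not the only formulation of Minimum Effect Tests; it assumes the sampling distribution is approximately normal:

```python
from statistics import NormalDist

def magnitude_probabilities(change, ci_lower, ci_upper, sesoi, conf=0.95):
    """Estimate P(lower), P(trivial), P(higher) by overlaying a normal
    sampling distribution (recovered from the CI width) on the ±SESOI band."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # ~1.96 for a 95% CI
    se = (ci_upper - ci_lower) / (2 * z)           # back out the standard error
    dist = NormalDist(mu=change, sigma=se)
    p_lower = dist.cdf(-sesoi)                     # mass below -SESOI
    p_higher = 1 - dist.cdf(sesoi)                 # mass above +SESOI
    p_trivial = 1 - p_lower - p_higher             # mass inside the trivial zone
    return p_lower, p_trivial, p_higher

# CMJ example from the text: +2.5 cm change, 95% CI [-0.5, 5.5], SESOI ±2.0 cm
p_low, p_triv, p_high = magnitude_probabilities(2.5, -0.5, 5.5, 2.0)
print(f"lower {p_low:.2f}, trivial {p_triv:.2f}, higher {p_high:.2f}")
```

For the worked example, most of the probability mass lies above the +2.0 cm threshold, with a non-trivial remainder in the trivial zone, matching the "likely meaningful but not certain" reading given above.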

When CI is wide — due to small sample sizes, high within-athlete variability, or fewer test repetitions — the range of plausible values expands, and confidence in any single conclusion weakens. In these situations, practitioners should avoid binary decisions and instead acknowledge uncertainty explicitly. Collecting additional data points, increasing test repetitions, or averaging across multiple sessions can narrow the CI.

CI does not tell practitioners whether an effect exists or not. It reveals the range of plausible effect sizes and their associated uncertainty. This reframes the question from “did the athlete improve?” to “how much did the athlete likely improve, and how confident can we be?”

Practical Application: Individualized Change Detection and Decision-Making

Group-level reliability statistics (TE, CV) provide a starting point, but individual athletes differ in their day-to-day variability. A more precise approach sets personal baselines from each athlete’s own test-retest standard deviation (Rebelo et al., 2026).

From this individual SD, thresholds for meaningful change can be established at different confidence levels:

| Threshold | Confidence Level | Sensitivity |
| --- | --- | --- |
| ±1.0 SD | ~68% | Sensitive — flags small changes. |
| ±1.64 SD | ~90% | Moderate — reduces false positives. |
| ±2.0 SD | ~95% | Conservative — high certainty required. |

The choice of threshold depends on the cost of errors. Type I error (false positive) occurs when a practitioner reacts to a change that is actually noise. Type II error (false negative) occurs when a real change is missed. In high-performance environments where missing early signs of maladaptation carries significant cost — increased injury risk, accumulated fatigue, or performance decline — a more sensitive threshold (±1.0 SD) may be preferred. The cost of investigating a false alarm is low compared to the cost of ignoring a genuine warning sign (Rebelo et al., 2026).
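A minimal sketch of this individualized flagging logic, with the z-multiplier exposed so the practitioner chooses where to sit on the Type I/Type II trade-off (the SD and change values are hypothetical):

```python
def flag_change(change, individual_sd, z=1.0):
    """Flag an observed change against an athlete's own test-retest SD.

    z=1.0 (~68%) is sensitive, z=1.64 (~90%) moderate, z=2.0 (~95%) conservative.
    """
    threshold = z * individual_sd
    if change > threshold:
        return "positive"
    elif change < -threshold:
        return "negative"
    return "trivial"

# Hypothetical athlete: test-retest SD of 1.2 cm, CMJ height drops 2.0 cm
print(flag_change(-2.0, 1.2, z=1.0))   # sensitive threshold flags the decline
print(flag_change(-2.0, 1.2, z=2.0))   # conservative threshold does not
```

The same 2.0 cm decline is flagged at ±1.0 SD but classified as trivial at ±2.0 SD, which is exactly the cost-of-errors trade-off described above.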

In practice, this plays out through neuromuscular monitoring. During the season, practitioners compare each athlete’s CMJ data against their personal baseline, using individual CV and SWC to classify each session’s result as a positive effect, trivial change, or negative effect (Riboli et al., 2023). A persistent decline in CMJ height or RSImod beyond the individual’s SWC, combined with elevated training load, signals potential neuromuscular fatigue that warrants follow-up.

The quadrant model offers one way to integrate these signals. By plotting training load against a response variable (e.g., neuromuscular performance or subjective wellbeing), each athlete’s data falls into one of four quadrants: high load with maintained performance (adapting well), high load with declining performance (potential overreach), low load with maintained performance (recovery), or low load with declining performance (possible non-training stressor). This model does not provide automatic answers, but it structures the conversation between practitioners and coaches (Rebelo et al., 2026).
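The quadrant logic is simple enough to encode directly. The sketch below assumes the practitioner has already dichotomised load and response against the athlete's baseline and SWC:

```python
def quadrant(load_high, performance_maintained):
    """Place an athlete in the load-response quadrant model.

    Inputs are booleans: is training load high relative to baseline, and is
    the response variable maintained relative to the individual SWC?
    """
    if load_high and performance_maintained:
        return "adapting well"
    if load_high and not performance_maintained:
        return "potential overreach"
    if not load_high and performance_maintained:
        return "recovery"
    return "possible non-training stressor"

print(quadrant(True, False))   # high load, declining performance
```

As the text notes, the returned label is a prompt for a conversation, not a prescription.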

Daily variation in peak power output can be addressed with absolute thresholds (e.g., a 5–10% decline) or relative thresholds (≥1 SD from the individual baseline). Importantly, not every decline demands a load reduction. A drop may instead signal an opportunity for preparatory intervention — activation work, soft-tissue treatment, or modified warm-up — rather than removal from training (Brewer, 2022).

Monitoring is a decision-support tool that complements professional judgment. It does not replace the practitioner’s knowledge of the athlete’s training history, psychological state, or tactical context.

What High CV Means: Match-to-Match Variability and Contextual Interpretation

The concepts of CV and SWC apply not only to laboratory or field-based testing but also to match performance data. Match-to-match variability refers to the fluctuation in a player’s performance metrics across games, quantified by CV.

Analysis of a full La Liga season (380 matches) revealed that offensive technical variables — shots, shots on target, and assists — exhibited the highest CV across all contextual conditions. These variables were highly unstable regardless of team quality, opponent strength, venue, or match result (Liu et al., 2015). Defensive variables such as interceptions and clearances were comparatively more stable.

The factors that drove the greatest differences in variability were team quality and opponent strength. Their influence on CV was substantially larger than the effects of match venue (home vs. away) or match outcome (win vs. draw/loss) (Liu et al., 2015). This finding carries a direct implication: using high-CV variables as key performance indicators (KPIs) without contextual adjustment risks misattributing normal variability to player performance changes.

Similar patterns appear in physical output data. At the 2022 FIFA World Cup, match-to-match variability in most demanding passage (MDP) metrics differed by position and by context. Top-ranked teams showed particularly high variability in sprint and high-speed running during peak intensity passages, and match status (winning vs. not winning) influenced variability more than tournament stage or match result (Cortez et al., 2026).

These findings reinforce a broader point. Physical fitness and fatigue are not the only determinants of match running output. Tactical role, opponent quality, scoreline, and game state all contribute to the variability observed in match data (Murray & Clubb, 2022). A central midfielder’s total distance may drop 8% between two matches not because of fatigue, but because the team adopted a low-block defensive strategy against a stronger opponent.

When interpreting match data, practitioners should:

  • Avoid using single-match values as definitive indicators of fitness or fatigue.
  • Contextualise physical output by match status, opponent ranking, and tactical setup.
  • Recognise that high-CV variables (shots, assists, sprint counts) require larger samples to establish stable individual profiles.
  • Apply the same signal-to-noise logic used in testing: if the match-to-match CV for a variable exceeds the expected SWC, individual match values carry limited diagnostic value.
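The last point in the list above is the same CV-versus-SWC comparison applied to match data. A brief sketch, using hypothetical per-match values for one player (the 5% "stable" cut-off is an illustrative assumption):

```python
import statistics

def match_cv(values):
    """Match-to-match CV (%): between-match SD as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical per-match totals for one midfielder across six matches
total_distance = [10450, 10120, 10880, 9950, 10600, 10310]   # metres
sprint_count = [14, 9, 21, 11, 18, 8]

for name, values in [("total distance", total_distance), ("sprints", sprint_count)]:
    cv = match_cv(values)
    verdict = "stable profile" if cv < 5 else "high CV: larger sample needed"
    print(f"{name}: CV = {cv:.1f}% ({verdict})")
```

Even with invented numbers, the pattern mirrors the literature cited above: distance-type variables vary little between matches, while sprint-type counts vary so much that single-match values carry limited diagnostic value.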

Key Takeaways

  • Every measurement contains instrumentation and biological noise. Typical Error (TE) quantifies this noise, while SWC defines the minimum signal size that warrants a decision.
  • CV (= TE / mean × 100) represents relative test reliability. Higher CV makes meaningful change harder to detect. Calculate CV in-house with at least 10 athletes and 3–4 trials.
  • A 95% CI means the interval would contain the true parameter 95% of the time under repeated sampling. Combining CI with SESOI allows estimating the probability that observed change is lower, trivial, or higher.
  • For individual monitoring, set personal baselines from test-retest SD and choose thresholds from ±1 SD (sensitive) to ±2 SD (conservative) based on context. In settings where missing early maladaptation signals is costly, prefer more sensitive thresholds.
  • When using high-CV variables (shots, assists, sprint metrics) as KPIs, recognise that contextual factors — team quality, opponent strength, match status — contribute to variability and must inform interpretation.

References

  1. Brewer, C. (2022). Performance interventions and operationalizing data. In D. N. French & L. Torres Ronda (Eds.), NSCA’s Essentials of Sport Science. Human Kinetics.
  2. Buchheit, M., & Simpson, B. M. (2017). Player-tracking technology: Half-full or half-empty glass? International Journal of Sports Physiology and Performance, 12(s2), S2-35–S2-41. https://doi.org/10.1123/ijspp.2016-0499
  3. Cortez, A., Yousefian, F., Folgado, H., Brito, J., Abade, E., Travassos, B., & Gonçalves, B. (2026). Performance profiles and match-to-match variability of the most demanding passages during the FIFA World Cup Qatar 2022 the effect of playing positions and match contextual factors. BMC Sports Science, Medicine and Rehabilitation. https://doi.org/10.1186/s13102-026-01578-z
  4. Jovanović, M., Torres Ronda, L., & French, D. N. (2022). Statistical modeling. In D. N. French & L. Torres Ronda (Eds.), NSCA’s Essentials of Sport Science. Human Kinetics.
  5. Liu, H., Gómez, M., Gonçalves, B., & Sampaio, J. (2015). Technical performance and match-to-match variation in elite football teams. Journal of Sports Sciences, 34(6), 509-518. https://doi.org/10.1080/02640414.2015.1117121
  6. McGuigan, M. (2022). Profiling and Benchmarking. In D. N. French & L. Torres Ronda (Eds.), NSCA’s Essentials of Sport Science. Human Kinetics.
  7. Murray, A. M., & Clubb, J. (2022). Analysis of tracking systems and load monitoring. In D. N. French & L. Torres Ronda (Eds.), NSCA’s Essentials of Sport Science. Human Kinetics.
  8. Rebelo, A., Bishop, C., Thorpe, R. T., Turner, A. N., & Gabbett, T. J. (2026). Monitoring training effects in athletes: A multidimensional framework for decision-making. Sports Medicine. Advance online publication. https://doi.org/10.1007/s40279-026-02417-4
  9. Riboli, A., MacMillan, L., Calder, A., & Mason, L. (2023). Player monitoring and practical application. In A. Calder & A. Centofanti (Eds.), Peak performance for soccer: The elite coaching and training manual. Routledge.
  10. Varley, M. C., Lovell, R., & Carey, D. (2022). Data hygiene. In D. N. French & L. Torres Ronda (Eds.), NSCA’s Essentials of Sport Science. Human Kinetics.