What is an important amount of change?
Defining the magnitude of change that corresponds to “important” change is necessary for applications such as comparative effectiveness research. There are many terms for these levels of change (e.g., clinically important change, minimally important difference, minimally perceptible change) and many methods for estimating them.
TIP: There is no such thing as the “true” difference. The magnitude of an important score difference is an estimate, and that estimate reflects the values, concerns, and context of the estimator. As Madeleine King puts it:
“Specific estimates of minimally important differences (MIDs) should therefore not be over interpreted. For a given health-related quality-of-life scale, all available MID estimates (and their confidence intervals) should be considered, amalgamated into general guidelines and applied judiciously to any particular clinical or research context.”
Estimates of Meaningful Change Thresholds for HealthMeasures
Evidence continues to accumulate regarding reasonable estimates of meaningful change score thresholds. Several studies of PROMIS® measures have calculated estimates of minimally important differences (MIDs); these are summarized below. Keep the following points in mind when using these estimates.
- Different calculation methods (see below for more information) yield different estimates of a meaningful difference. Analyses in different samples and contexts also lead to different estimates.
- Picking an MID requires judgment on your part. Consider, for example, how the MID will be used: estimates at the lower end of the range may be appropriate for group comparisons, whereas an estimate at the higher end may be better for categorizing change in individuals.
- The estimates below are only those published through 2017. Check PubMed to see if additional MID studies have been published.
- If there is no empirical literature on which to base an MID estimate, you may want to use half a standard deviation (5 points on the T-score metric, which has a standard deviation of 10). This choice is not without controversy, however.
- Finally, you should keep in mind that most MID estimates are an average across the range of scores. It may be that people require more or less change to consider it meaningful depending on where they started.
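As a rough illustration of the points above, the sketch below computes the half-standard-deviation fallback and picks one end of a published MID range depending on whether the comparison is between groups or within individuals. It is a minimal sketch, not an established tool: `half_sd_mid` and `choose_mid` are hypothetical helper names, and the only number it relies on is the T-score standard deviation of 10.

```python
# PROMIS T-scores are centered on a reference population mean of 50 with a
# standard deviation (SD) of 10, so half an SD on that metric is 5 points.
T_SCORE_SD = 10.0


def half_sd_mid(sd: float = T_SCORE_SD) -> float:
    """Distribution-based fallback: half of a standard deviation."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 0.5 * sd


def choose_mid(low: float, high: float, use: str) -> float:
    """Hypothetical helper: pick one end of a published MID range.

    'group'      -> lower end (comparing group means)
    'individual' -> upper end (categorizing change in individual patients)
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if use == "group":
        return low
    if use == "individual":
        return high
    raise ValueError("use must be 'group' or 'individual'")


print(half_sd_mid())                       # 5.0
print(choose_mid(3.0, 4.5, "group"))       # 3.0
print(choose_mid(3.0, 4.5, "individual"))  # 4.5
```

Choosing the lower or upper end is a simplification; in practice you might pick an intermediate value, or weigh several published ranges together.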
Below we report which administrations of a measure (e.g., 7-item form, CAT) were used to derive the MID estimates. This does not imply that there are different estimates for different forms.
PROMIS ADULT MID ESTIMATES
- Change of 2.5-4.5 points (used 17-item short form) Yost
- Change of 3.0-5.0 points (used 7-item short form) Yost
- Based on the Neuro-QoL short form, which shares a metric with PROMIS (Kozlowski):
  - Lower quartile: change of 4.7-12.2 points
  - Middle half: change of 4.7-5.1 points
  - Upper quartile: change of 5.0-11.3 points
- Change of 3.5-5.5 points (used CAT with back pain samples) Amtmann
- Change of 2-3 points (used short forms with chronic pain samples) Chen
- Change of 3.5-4.5 points (used short forms with stroke sample) Chen
- Change of 2.35-2.4 points (used short form with knee OA sample) Lee
- Change of 4.0-6.0 points (used 10-item short form with cancer sample) Yost
- Change of 2 points (used 20-item short form with RA sample) Hays
- Change of 1.9-2.2 points (used short form with knee OA sample) Lee
- Change of 4-6 points (used 10-item short form with cancer sample) Yost
- Anxiety: Change of 2.3-3.4 points (used short form with knee OA sample) Lee
- Anxiety: Change of 3.0-4.5 points (used 9-item short form with cancer sample) Yost
- Depression: Change of 3.0-3.1 points (used short form with knee OA sample) Lee
- Depression: Change of 3.0-4.5 points (used 10-item short form with cancer sample) Yost
PROMIS PEDIATRIC MID ESTIMATES
- Depression: Change of 2-3 points Thissen
- Pain Interference: Change of 2-3 points Thissen
- Fatigue: Change of 2-3 points Thissen
- Mobility: Change of 2-3 points Thissen
- Morgan and colleagues found that MID estimates varied by domain, by severity of symptoms or dysfunction, and by who was making the judgment (pediatric patient, parent, or clinician).
Neuro-QoL Adult Measures
Conditional minimal detectable change (cMDC) values were estimated for 14 Neuro-QoL measures. Estimates vary based upon the patient’s symptom/dysfunction severity. Index tables and an interactive Excel workbook that calculates cMDCs are available. (Kozlowski)
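The Kozlowski workbook is not reproduced here. As a hedged illustration of why a detectable change can depend on severity, the sketch below uses one common formulation of a conditional MDC: a z value times the standard error of the difference between two scores, where each score carries its own IRT standard error. The function names and standard errors are hypothetical, and this is not necessarily the exact procedure Kozlowski used.

```python
import math


def conditional_mdc(se_time1: float, se_time2: float, z: float = 1.96) -> float:
    """Minimal detectable change built from score-specific standard errors.

    sqrt(se1**2 + se2**2) is the standard error of the difference between two
    independent measurements; z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(se_time1 ** 2 + se_time2 ** 2)


def exceeds_cmdc(score1: float, score2: float, se1: float, se2: float,
                 z: float = 1.96) -> bool:
    """True if the observed change is larger than the conditional MDC."""
    return abs(score2 - score1) > conditional_mdc(se1, se2, z)


# Hypothetical standard errors on the T-score metric. With IRT scoring, the SE
# depends on the score level, which is what makes the MDC "conditional".
print(round(conditional_mdc(2.5, 2.5), 2))   # 6.93
print(exceeds_cmdc(50.0, 58.0, 2.5, 2.5))    # True
print(exceeds_cmdc(50.0, 55.0, 2.5, 2.5))    # False
```

When both standard errors are equal, the formula reduces to the familiar z × √2 × SEM; larger standard errors at the extremes of the score range produce larger detectable changes there.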
Methods for Estimating Meaningful Change
Excellent reviews of methods for estimating meaningful change have been published. Such methods are typically divided into distribution-based and anchor-based approaches; Crosby and colleagues describe the differences between these strategies. A particularly helpful set of recommendations has been published by Revicki and colleagues. Other useful resources have been written by Streiner and de Vet. New methods for estimating important change are emerging (see Cook and Thissen).
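To make the distinction between the two families concrete, here is a minimal sketch, assuming made-up data. The distribution-based side computes half a standard deviation and one standard error of measurement (SEM = SD × √(1 − reliability), from classical test theory); the anchor-based side averages change among respondents who rated themselves as minimally improved on an external global rating. The anchor wording "a little better" and all helper names are assumptions for illustration.

```python
import math
from statistics import mean, stdev


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def distribution_based(baseline_scores, reliability):
    """Two common distribution-based benchmarks: 0.5 SD and 1 SEM."""
    sd = stdev(baseline_scores)
    return {"half_sd": 0.5 * sd, "one_sem": sem(sd, reliability)}


def anchor_based(changes, anchor_ratings, minimal_category="a little better"):
    """Mean score change among respondents in the minimal-improvement anchor category."""
    selected = [c for c, a in zip(changes, anchor_ratings) if a == minimal_category]
    if not selected:
        raise ValueError("no respondents in the minimal-change anchor category")
    return mean(selected)


# Made-up data on the T-score metric.
baseline = [40.0, 45.0, 50.0, 55.0, 60.0]
changes = [1.0, 4.5, 5.5, 0.0, 9.0, 3.5]
anchors = ["about the same", "a little better", "a little better",
           "about the same", "much better", "a little better"]

print({k: round(v, 2) for k, v in distribution_based(baseline, reliability=0.90).items()})
# {'half_sd': 3.95, 'one_sem': 2.5}
print(anchor_based(changes, anchors))  # 4.5
```

Distribution-based values describe measurement spread and precision but say nothing about whether a change matters to patients; the anchor-based value ties change to a patient judgment but depends on the anchor chosen. This is one reason the estimates are usually triangulated rather than used alone.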
No single method for defining meaningful change is adequate. Evaluations of what constitutes meaningful change should be based on multiple sources of evidence and, to the extent possible, grounded in the context of use. When setting thresholds, it is prudent to consider the consequences of those choices (e.g., the cost-benefit ratio of a lower versus a higher threshold in a given context).
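One way to see the consequences of a lower versus a higher threshold is to compare candidate thresholds against an anchor, as in a receiver operating characteristic (ROC) analysis. The sketch below is illustrative only: the data are made up, the helper names are hypothetical, and Youden's J is just one of several criteria for picking a cut point.

```python
def sens_spec(changes, improved, threshold):
    """Sensitivity and specificity of 'change >= threshold' against an anchor.

    `improved` holds the anchor judgment (True = improved) for each person.
    """
    tp = sum(1 for c, i in zip(changes, improved) if i and c >= threshold)
    fn = sum(1 for c, i in zip(changes, improved) if i and c < threshold)
    tn = sum(1 for c, i in zip(changes, improved) if not i and c < threshold)
    fp = sum(1 for c, i in zip(changes, improved) if not i and c >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both improved and not-improved respondents")
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(changes, improved, candidates):
    """Candidate with the largest Youden's J (sensitivity + specificity - 1).

    Ties go to the earliest candidate in the list.
    """
    def youden(t):
        sens, spec = sens_spec(changes, improved, t)
        return sens + spec - 1
    return max(candidates, key=youden)


# Made-up change scores and anchor judgments.
changes = [0.5, 1.5, 2.0, 3.0, 4.0, 5.0, 6.5, 8.0]
improved = [False, False, False, True, False, True, True, True]

for t in [2, 3, 4, 5, 6]:
    sens, spec = sens_spec(changes, improved, t)
    print(f"threshold {t}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print(best_threshold(changes, improved, [2, 3, 4, 5, 6]))  # 3
```

With these made-up data, thresholds of 3 and 5 points tie on Youden's J: 3 points flags every anchor-improved person but also one person who did not improve, whereas 5 points misses one improved person but flags no one who did not improve. The tie-break in `best_threshold` is arbitrary, which is exactly where context-specific judgment about costs and benefits has to come in.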
Streiner, D. L., Norman, G. R., & Cairney, J. (2015). Health measurement scales: A practical guide to their development and use. Oxford University Press.
Using PRO Scores to Define Responders
It has long been recognized that statistically significant differences are not equivalent to clinically meaningful differences. Further, there are different standards for what constitutes meaningful individual- and group-level differences. “Responder definitions” estimate the threshold of score change that can be judged, defensibly, as a meaningful change. A number of empirical methods can be used to establish responder thresholds (e.g., Coon 2017, Cappelleri 2014). The FDA recommends identifying an a priori responder threshold and reporting the number of responders in different clinical arms (see the FDA Guidance for Industry). This information reveals the impact of interventions at the individual level, not just the group level.
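A minimal sketch of the responder count described above, assuming each patient has a baseline and follow-up T-score and that the threshold was fixed before the analysis. The arm labels, scores, and 5-point threshold are hypothetical, and counting a change equal to the threshold as a response (`>=`) is an assumption that should match whatever the pre-specified definition says.

```python
from collections import Counter


def responder_counts(records, threshold, higher_is_better=True):
    """Count responders per arm against a threshold fixed before analysis.

    records: iterable of (arm, baseline_score, follow_up_score).
    higher_is_better: True for measures where higher scores are better (e.g.,
    PROMIS Physical Function); False where higher scores mean worse symptoms
    (e.g., PROMIS Fatigue), so improvement is a decrease.
    Returns {arm: (n_responders, n_total)}.
    """
    responders, totals = Counter(), Counter()
    for arm, baseline, follow_up in records:
        improvement = follow_up - baseline if higher_is_better else baseline - follow_up
        totals[arm] += 1
        if improvement >= threshold:
            responders[arm] += 1
    return {arm: (responders[arm], totals[arm]) for arm in totals}


# Hypothetical trial data: (arm, baseline T-score, follow-up T-score).
records = [
    ("treatment", 40.0, 47.5), ("treatment", 42.0, 44.0), ("treatment", 38.5, 45.0),
    ("control", 41.0, 43.0), ("control", 39.0, 45.5), ("control", 44.0, 44.5),
]
print(responder_counts(records, threshold=5.0))
# {'treatment': (2, 3), 'control': (1, 3)}
```

Reporting the count and denominator per arm, rather than only a difference in mean change, shows how many individual patients crossed the pre-specified threshold in each arm.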