The HealthMeasures team in collaboration with the PROMIS Health Organization drafted a document that summarizes our position about modifying HealthMeasures items. It identifies some modifications that are welcome as well as the approach needed to evaluate modifications.
A 1997 publication by Keller et al provides an excellent example of comparing an original and modified measure. In this case, the time frame was altered from four weeks to one week.
Keller SD, Bayliss MS, Ware JE Jr, Hsu MA, Damiano AM, Goss TF. Comparison of responses to SF-36 Health Survey questions with one-week and four-week recall periods. Health Serv Res. 1997;32(3):367–384.
Our philosophy is that measures can be improved, and we welcome others to make improvements. We believe that identifying measurement problems and demonstrating improvements should rest on quantitative and/or qualitative data, not expert opinion alone. If a user feels that s/he can improve a HealthMeasure, s/he is welcome to test this modification. Ideally, the problem is identified with data (e.g., respondent misinterpretation of an item, differential item functioning [DIF]). If that user generates data demonstrating that the modification does in fact improve the measure, s/he can submit this modification to HealthMeasures for adoption. In contrast, modifications based upon opinion alone (e.g., “I think it would be easier to understand PROMIS scores if higher T-scores always meant bad.”; “I think calibrations should be country-specific.”) will not be adopted by HealthMeasures.
That said, the HealthMeasures scientific teams have invested significant resources in measure development and have made decisions on some aspects of measurement that are not likely to be revised. Specifically:
• T-score Metric: PROMIS and Neuro-QoL utilize a T-score metric.
• Direction of Scores: PROMIS and Neuro-QoL are scored so that high = more of the concept being measured. Sometimes this is a good thing (e.g., physical function) and sometimes it is a bad thing (e.g., fatigue). This decision was made through a deliberate consensus process. Requests to alter the direction of scoring are likely to be declined. However, users are welcome (and sometimes encouraged) to alter the display of scores to facilitate interpretability. For example, utilize two y-axes so that “up” can mean “good” for all measures. However measures are displayed, a viewer should be able to identify the T-scores.
• Measure Names: The measure name should not be altered in publications. However, it is acceptable to remove the measure name on a respondent-facing interface or alter a name seen by respondents that may be stigmatizing (e.g., changing “Depression” to “Mood”).
• Item-Level Calibrations: We support the calculation of population- or country-specific norms over recalibrating measures for a specific population or language. Having a common metric across samples and languages is one of the benefits of using HealthMeasures instruments. This is possible through maintaining a single set of calibrations, i.e., discrimination and location parameters. Given evidence of differential item functioning (DIF) by language, however, we support the adoption of non-English language-based calibrations for the particular items showing non-trivial DIF (provided these local item calibrations are then rescaled to the U.S. mean and SD). Calculation and reporting of population- or country-specific norms or percentiles is valuable and supported (e.g., the mean in the general population of a non-U.S. country may not be T=50). We further support research into measurement problems (e.g., analyzing for DIF) and research on modifications (e.g., removing an item) to improve a measure’s performance.
• Creating Disease or Condition-specific Measures: PROMIS, Neuro-QoL, and NIH Toolbox developed measures to be appropriate across health conditions. This was due in part to the need to improve measurement in individuals with multiple chronic conditions. Sometimes a user requests to modify the respondent’s instructions to answer thinking only of one condition (e.g., “In answering the items below, think about the impact of your ARTHRITIC KNEE on your physical function.”). This modification changes the nature of the question. It would be reasonable to consider whether the original scoring can be used with such a modification or whether raw scores or recalibration would be more appropriate. Research across several studies and multiple disease areas has cast doubt on whether respondents can make these differential attributions. If this modification is made, we recommend either administering the item in its original and modified versions, separated by other questions, or conducting a randomized trial of one version versus the other. Then analyze the results to see if there is a difference.
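To make the T-score metric and the norms bullet above concrete, here is a minimal Python sketch. It assumes the usual T-score convention (mean 50, SD 10 in the reference population) and that a score is already available on the standardized reference metric. The function names and the sample values are hypothetical and are not part of any HealthMeasures software.

```python
from statistics import mean, stdev


def theta_to_t(theta: float) -> float:
    """Convert a standardized score (reference mean 0, SD 1) to a T-score.

    Assumes the conventional T-score metric: mean 50, SD 10.
    """
    return 50.0 + 10.0 * theta


def local_norms(t_scores: list[float]) -> dict[str, float]:
    """Describe a local (e.g., country-specific) sample on the common T-score metric.

    The calibrations are untouched; only the sample's mean and SD are reported.
    """
    return {"mean": mean(t_scores), "sd": stdev(t_scores)}


def percentile_rank(t: float, local_sample: list[float]) -> float:
    """Mid-rank percentile of a T-score within a local norm sample."""
    below = sum(1 for s in local_sample if s < t)
    equal = sum(1 for s in local_sample if s == t)
    return 100.0 * (below + 0.5 * equal) / len(local_sample)


# Hypothetical local sample whose mean need not equal T=50.
sample = [42.0, 47.0, 52.0, 57.0, 62.0]
norms = local_norms(sample)
rank = percentile_rank(theta_to_t(0.2), sample)
```

The point of the sketch is that local norms and percentiles are computed on top of the shared metric, so scores remain comparable across samples and languages.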
HealthMeasures supports alternative approaches that retain item syntax and current calibrations. For example, researchers have had patients and clinicians review items from a given item bank to select items most relevant for a condition or to identify items that may not be appropriate for use in a given condition (e.g., Cook KF et al 2011 Quality of Life Research). After such a review, others have also constructed new items to reflect domain content not included in the original measure (ideally co-calibrated with the existing measure, e.g., Schifferdecker KE et al 2018 Quality of Life Research).
When translating measures developed in English into other languages, it is sometimes found that a modification to the English item would facilitate translations (e.g., inclusion of a metric equivalent). All modifications related to translations are to be directed to the Director of Translations, Helene Correia. The Director of Translations will decide on modifications in all cases.
HealthMeasures users often ask if they can make modifications to measures in English. Sometimes this is fine. Other times we think it should be studied, and sometimes we discourage the modification.
These changes to the respondent interface are acceptable:
• Removing the item IDs, response scores, and/or measure name
• Removing the domain header from a PROMIS Profile
• Altering the order of items in a fixed length short form or profile
• Adding additional items to an assessment that includes a HealthMeasure (e.g., adding an experimental item at the end of a Neuro-QoL short form) that does not contribute to the Neuro-QoL score
• Presenting self-report items in a grid-like format with a scroll bar on a web-based interface (versus presenting items one at a time on a screen as done in Assessment Center and the NIH Toolbox and PROMIS iPad apps)
• Altering the location of the context, for example by including the context in the instruction (e.g., “Please respond to the questions below thinking about how you have been feeling in the past 7 days.”). This requires that the instruction always be visible when responding to the associated items, which most likely means a grid-like format with multiple items on a screen.
• Underlining, italicizing, or bolding text (e.g., “In the past 7 days”, “How would you rate your pain on average?”) or removing underlining, italics, or bolded text. We don’t encourage this, but if there is a reasonable argument (e.g., the data collection system does not support underlining in its user interface), we can accept it.
In general, PROMIS items should not be modified unless there is data that supports that modification. This data would include input from patients showing problems with the original item and a reduction in that error with the modified item. For example, researchers are sometimes concerned that an item won’t function well as-is with a specific population. One strategy would be to evaluate the original item and a proposed modified item using cognitive interviews with the target patient population. The aim is to demonstrate that patients misinterpret or are unable to answer the original item, and that the proposed modified item reduces or eliminates that error.
If the researcher would like this modification to be adopted into the PROMIS measure, it would then be necessary to test the modified, experimental PROMIS item and the original item in the same assessment with a larger sample. Data analyses should include evaluating the experimental item’s performance (e.g., testing for differential item functioning, responsiveness, correlations with other variables). As noted by Dr. Hays, calibrating the new item on the PROMIS metric is ideal. The PROMIS Health Organization is very interested in supporting improvements to PROMIS measures. If you feel a modification improves the interpretability or performance of an item, please contact HealthMeasures so that this improvement can be integrated into the measure.
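As a first descriptive look at data where the same respondents answered both the original and the modified item, something like the following Python sketch could be used. It is only an illustration under stated assumptions: the function name and the response values are made up, and a paired comparison of this kind is a simple starting point, not a substitute for the DIF, responsiveness, or calibration analyses mentioned above.

```python
from math import sqrt
from statistics import mean, stdev


def compare_paired_items(original: list[int], modified: list[int]) -> dict[str, float]:
    """Summarize paired responses to an original and a modified item.

    Returns the mean difference (original minus modified), the paired
    t statistic, and the proportion of respondents giving identical answers.
    """
    if len(original) != len(modified) or len(original) < 2:
        raise ValueError("need two equally long lists with at least 2 respondents")
    diffs = [o - m for o, m in zip(original, modified)]
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    if se == 0:
        raise ValueError("all differences are identical; t statistic is undefined")
    return {
        "mean_diff": mean(diffs),
        "t_stat": mean(diffs) / se,
        "agreement": sum(1 for d in diffs if d == 0) / len(diffs),
    }


# Hypothetical responses on a 1-5 scale from six respondents.
original_item = [3, 4, 2, 5, 3, 4]
modified_item = [3, 3, 2, 4, 3, 3]
summary = compare_paired_items(original_item, modified_item)
```

A large mean difference or low agreement would suggest that the modification changes how respondents answer, which is the kind of evidence worth following up with item-level analyses.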
Whether an item was modified for use in your own study or you want the modification to be adopted by PROMIS, describe the modification in any presentations or publications. Identify it as a modified item.