# PROMIS® Reference Populations

## PROMIS measures use scores that have meaning. A PROMIS score of 50 is the average (or mean) score for a specific, relevant group of people (e.g., the U.S. general population, kids with a painful condition). That group is the reference population.

PROMIS measures use scores that are more than just numbers: PROMIS scores have meaning. For example, a score of 50 on a PROMIS measure is the mean score for that measure. But the mean for whom? This is where “score meaning” comes in. Each PROMIS measure has its own reference population, that is, its own specific group of people (e.g., the U.S. general population, kids with a painful condition) who were sampled and then thoroughly assessed with that measure. The score of 50 on that measure is their average score. Thus, when someone is newly assessed with a PROMIS measure and her score is 50, we have more than a number: we can say that her score equals the average score of the measure’s reference population, the group we refer to for score meaning.

#### Reference Populations

A T-score is a standardized score, like z-scores and IQ scores. All standardized scores have a “middle” score; it is zero for z-scores, 100 for IQ scores, and 50 for T-scores. This middle score is the mean of a large sample that is representative of a relevant population—a reference population. The large sample used to represent the reference population is called the Centering Sample.

For some PROMIS measures, the reference population (and therefore the centering sample) was a clinical population. This is the case for the PROMIS Smoking measures: the reference population was daily smokers, and the centering sample was a sample of daily smokers. For many other PROMIS measures, the reference population was the U.S. general population as described by the 2000 U.S. Census, and the centering sample was a large sample of individuals who matched the 2000 Census on important demographics.

#### Centering Sample and Calibration Sample

It is helpful to remember that the middle score of a standard score range has to be defined. For measures that use a T-score metric, 50 is the mean and 10 is the standard deviation, but scores do not start out that way. Scores are first estimated on the metric of an item response theory (IRT) model, and these IRT scores are then converted to the T-score metric with a linear transformation. Before that transformation can be applied, someone has to decide which score on the IRT metric will become the middle score of 50. This is done by collecting scores from a large sample that represents the reference population and calculating that sample’s mean. That sample is the centering sample, and its mean becomes the middle score (50 for T-scores). The linear transformation then places every other score along the continuum at the correct distance from that middle score, so that one standard deviation in the centering sample corresponds to 10 T-score points.
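
As a rough illustration of the centering step just described, the sketch below converts IRT score estimates (theta) to T-scores using the mean and standard deviation of a centering sample. The function name `to_t_scores` and all theta values are hypothetical. In practice, PROMIS scores are produced with published item parameters and scoring tools; this sketch shows only the centering arithmetic.

```python
from statistics import mean, stdev


def to_t_scores(thetas, centering_thetas):
    """Map IRT theta estimates onto a T-score metric (mean 50, SD 10).

    The centering sample's mean becomes T = 50, and one centering-sample
    standard deviation becomes 10 T-score points.
    """
    center = mean(centering_thetas)
    spread = stdev(centering_thetas)
    return [50 + 10 * (theta - center) / spread for theta in thetas]


# Hypothetical theta estimates for a centering sample (illustration only)
centering_sample = [-1.2, -0.4, 0.0, 0.3, 0.8, 1.5]

# Transforming the centering sample itself yields mean 50 and SD 10
centered = to_t_scores(centering_sample, centering_sample)
print(round(mean(centered), 6), round(stdev(centered), 6))

# A newly assessed person is scored against the same centering constants
print(to_t_scores([0.5], centering_sample))
```

Because the transformation is linear, it changes only the units of the scale, not the ordering of people or the relative distances between their scores.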

IMPORTANT: The Centering Sample and the Calibration Sample are not necessarily the same sample.

The purpose of a calibration sample is to estimate item parameters (item characteristics such as difficulty and discrimination) using an item response theory model. Here’s where it can get confusing. Sometimes a single sample was used as both the calibration sample AND the centering sample. Other times one sample was used as the calibration sample and another was used as the centering sample.  Sometimes, a subset of the calibration sample served as the centering sample.

The Reference Population tables show the calibration and the centering samples for PROMIS. Most users will be particularly interested in the last column (Centering Sample). If you want to know what a score in the middle is (e.g., 50 for those scored on a T-score metric), go to the Centering Sample column. For example, if you go to the row for PROMIS-Cancer-Anxiety you will see that the item parameters were estimated (calibrated) using a hybrid of individuals with cancer and individuals from the general population. BUT, the centering sample was the general population. A score of 50 on this measure is comparable to the general population average level of anxiety.

#### What does the middle score mean?

When developing a measure with standard scores, an important consideration is what the middle score means. The scores of such a measure are purposefully “centered” at the mean of a specific sample or subsample. PROMIS uses T-scores, so the middle score is always 50. Centering scores in this way allows quick interpretation of where an individual stands on a symptom or outcome compared with others in the reference population. A score of 50 on PROMIS Fatigue, for example, is comparable to the U.S. “average.” T-scores have a standard deviation of 10, so a score of 60 indicates fatigue one standard deviation higher than the U.S. average.

TIP: Failure to be specific about the reference population invites confusion.

This can get confusing because sometimes the calibration sample (the sample used to estimate item response theory parameters) and the centering sample (the sample used to define the middle of the score range) were the same, and sometimes they were different. For example, a measure may be calibrated in a clinical sample but then centered in the general population. The mean T-score of 50 for that measure reflects the average in the general population, not in the clinical sample.
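
To make the distinction concrete, the sketch below scores one hypothetical IRT estimate against two different centering samples. The centering means and standard deviations are invented for illustration and do not come from any PROMIS measure; the point is only that the same underlying level maps to different T-scores depending on which sample defines 50.

```python
def t_score(theta: float, center_mean: float, center_sd: float) -> float:
    """Linear T-score transformation relative to a given centering sample."""
    return 50 + 10 * (theta - center_mean) / center_sd


# Hypothetical centering constants on the IRT metric (not real PROMIS values)
GENERAL_POPULATION = (0.0, 1.0)
CLINICAL_SAMPLE = (0.8, 1.1)

theta = 0.8  # one person's hypothetical IRT score estimate

t_general = t_score(theta, *GENERAL_POPULATION)
t_clinical = t_score(theta, *CLINICAL_SAMPLE)

# Same person, same symptom level, two different T-scores
print(f"Centered on general population: T = {t_general:.1f}")
print(f"Centered on clinical sample:    T = {t_clinical:.1f}")
```

With these made-up numbers, the person sits exactly at the clinical-sample average (T = 50) but 0.8 standard deviations above the general-population average (T = 58), which is why a reported T-score is only interpretable once its centering sample is known.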

* Hybrid: items that showed no differential item functioning (DIF) between the general population and cancer patients used the PROMIS parameters; items with DIF used cancer-based parameters. None of the items in the Fatigue item bank showed DIF, so all of them used PROMIS parameters.

| PROMIS Pediatric Bank/Scale | Calibration Sample | Centering Sample |
|---|---|---|
| Global Health | General population | General population |
| Emotional Distress - Anger (v3.0) | General population | General population |
| Emotional Distress - Anxiety (v3.0) | General population | General population |
| Emotional Distress - Depression (v3.0) | General population | General population |
| Cognitive Function | General population | General population |
| Life Satisfaction | General population | General population |
| Meaning and Purpose | General population | General population |
| Psychological Stress Experiences | General population | General population |
| Positive Affect | General population | General population |
| Stigma | Clinical sample (Children with chronic conditions) | Clinical sample (Children with chronic conditions) |
| Stigma - Skin | Clinical sample (Children with chronic conditions) | Clinical sample (Children with chronic conditions) |
| Fatigue (v3.0) | General population | General population |
| Itch (PIQ-C) | Clinical sample (Children with skin conditions) | General population |
| Pain - Behavior (v3.0) | General population | General population |
| Pain - Interference (v3.0) | General population | General population |
| Mobility (v3.0) | General population | General population |
| Upper Extremity (v3.0) | General population | General population |
| Sleep Disturbance | General population | General population |
| Sleep-Related Impairment | General population | General population |
| Physical Activity | General population | General population |
| Physical Stress Experience | General population | General population |
| Strength Impact | General population | General population |
| Asthma Impact | Clinical sample | Clinical sample |
| Peer Relationships (v3.0) | General population | General population |
| Pain Quality (v3.0) | General population | General population |
| Pain Quality - Affective (v3.0) | General population | General population |
| Pain Quality - Sensory (v3.0) | General population | General population |
| Family Relationships | General population | General population |

| PROMIS Early Childhood Parent-Report Bank/Scale | Calibration Sample | Centering Sample |
|---|---|---|
| Global Health | General population | General population |
| Anger/Irritability | General population | General population |
| Anxiety | General population | General population |
| Engagement – Curiosity | General population | General population |
| Engagement – Persistence | General population | General population |
| Depressive Symptoms | General population | General population |
| Physical Activity | General population | General population |
| Positive Affect | General population | General population |
| Self-Regulation – Flexibility | General population | General population |
| Self-Regulation – Frustration Tolerance | General population | General population |
| Sleep Health | General population | General population |
| Social Relationships (Child-Caregiver, Family, Peer) | General population | General population |

| PROMIS Parent Proxy Bank/Scale | Calibration Sample | Centering Sample |
|---|---|---|
| Global Health | General population | General population |
| Emotional Distress - Anger (v3.0) | General population | General population |
| Emotional Distress - Anxiety (v3.0) | General population | General population |
| Emotional Distress - Depression (v3.0) | General population | General population |
| Cognitive Function | General population | General population |
| Life Satisfaction | General population | General population |
| Meaning and Purpose | General population | General population |
| Psychological Stress Experiences | General population | General population |
| Positive Affect | General population | General population |
| Stigma | Clinical sample (Parents of children with chronic conditions) | Clinical sample (Parents of children with chronic conditions) |
| Stigma - Skin | Clinical sample (Parents of children with chronic conditions) | Clinical sample (Parents of children with chronic conditions) |
| Fatigue (v3.0) | General population | General population |
| Itch | Clinical sample (Parents of children with skin conditions) | General population |
| Pain - Behavior (v3.0) | General population | General population |
| Pain - Interference (v3.0) | General population | General population |
| Mobility (v3.0) | General population | General population |
| Upper Extremity (v3.0) | General population | General population |
| Sleep Disturbance | General population | General population |
| Sleep-Related Impairment | General population | General population |
| Physical Activity | General population | General population |
| Physical Stress Experience | General population | General population |
| Strength Impact | General population | General population |
| Asthma Impact | Clinical sample | Clinical sample |
| Peer Relationships (v3.0) | General population | General population |
| Family Relationships | General population | General population |

#### Norms

A unique aspect of PROMIS measures is their use of standardized scores that are centered on a relevant reference population. Such scores are called “normative” because their value represents how close to or far from the mean of a normative population they are. The word “norm” has different meanings in different contexts. Here, we are not talking about social “norms,” the behaviors we expect from others and ourselves in society. Nor are we talking about “normal” per se, even though the term originated from its reference to a standard, bell-shaped distribution curve that labels everything in the vast middle as normal. We use the word norm without judging the “normality” of any given score relative to the distribution of scores seen on the same measure in a large group of people. Sometimes we refer to this group as a “reference group,” and similarly to “norms” as reference values, because they are points of reference from which to understand a single score.

For example, Jensen et al. published reference values for eight PROMIS domains for individuals with cancer. The mean Pain Interference score for people with cancer was 52.

For some PROMIS measures there are subpopulation norms.

#### Subpopulation Norms

Norms are based on the distribution of scores on a measure for a well-characterized and relevant population. For example, I am 5’6” tall. I might be interested in comparing my height to other people in the world, or maybe just in the United States, since that’s where I live. On the other hand, I am a woman, so average height without considering gender doesn’t really matter to me. I would be interested in knowing my height relative to other women in the United States. According to Wikipedia, the average height for women in the United States is 5’4”, so I’m feeling pretty tall right now. I would not feel quite so tall if I were comparing myself to men in the United States (5’9”), and I would only be eye-to-eye with the average woman in the Netherlands (5’6”). That brings us, finally, to normative score comparisons for health outcome measures.

Let’s start with a distinction between norms that are used for comparative purposes and norms that are used to anchor a scale. Many PROMIS measures (see elsewhere on this site) are centered on the mean score of a sample of individuals who, collectively, matched the 2000 US General Census with respect to important demographics (e.g., gender, age, race/ethnicity, education). The beauty of folding the norms into the metric is that it is easy to interpret a score relative to the population whose norm was used. For PROMIS scores centered on the US Census population, the mean is 50 and the standard deviation is 10. So, if my fatigue score is 60, I know that I’m not just feeling tired; my fatigue is one standard deviation above the general population mean (or at least the mean of a sample that matched it). But, as of this writing, I’m 62 years old, and I’d like to know how my fatigue compares with that of people my age. That’s where sub-norms can be helpful.

Sub-norms divide a relevant population into subgroups to aid interpretation of scores. Above, I compared my height to that of women in the US, not to all people in the world or even to all people in the US. I found it more relevant to compare my height to a sub-norm value—the mean height of women in the US.

It is theoretically possible to develop sub-norms for a measure based on any relevant population, although such data collection can be expensive. As HealthMeasures continues to be used and more data accumulate, however, it may become practical to develop many sub-norms for comparing and interpreting scores.

Fortunately, the initial norming sample for PROMIS was quite large, so it was feasible to disaggregate it by gender and age range to estimate subgroup norms. This was done in 2011 to compare fatigue and pain, by age range, between the general population and samples of individuals with disabilities. Gender and age-range norms were calculated for all PROMIS measures developed in the first phase of PROMIS testing. These were never published but are provided here for interested users. Means, standard deviations, and sample sizes (N) by domain are reported below.

#### WARNING

The original PROMIS norming sample was not powered to develop subgroup norms. Users should pay particular attention to the size and characteristics of the sample used to develop each sub-norm. For example, much larger samples were used to calculate sub-norms for males and females than for the age ranges. More confidence is warranted for sub-norms estimated with larger samples. Nevertheless, these sub-norms can be useful both for comparing samples and for interpreting scores. For examples of how they have been used, see the papers by Cook et al. and Molton et al.
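
One way to see why larger subgroups deserve more confidence is to compare the standard error of each subgroup mean, SD / √N. The sketch below applies that textbook formula to a few Fatigue columns of the sub-norm table that follows. It treats each subgroup as a simple random sample, an assumption that may not hold for the PROMIS norming sample, so read the results as a rough guide rather than formal confidence intervals. The helper name `standard_error` is hypothetical.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a subgroup mean, assuming simple random sampling."""
    return sd / math.sqrt(n)


# Fatigue sub-norm statistics (SD, N) taken from the sub-norm table
fatigue = {
    "Female": (10.1, 1884),
    "Male": (9.6, 1183),
    "Age 18-34": (9.7, 706),
    "Age 75+": (8.3, 385),
}

for group, (sd, n) in fatigue.items():
    se = standard_error(sd, n)
    # Approximate 95% margin of error for the subgroup mean
    print(f"{group:10s} N={n:5d}  SE={se:.2f}  ±{1.96 * se:.2f} T-score points")
```

With these inputs, the standard error for the 75+ group (about 0.42) is nearly twice that for the female group (about 0.23), even though the 75+ group has a smaller SD.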

#### Gender and Age Range Sub-norms for Adult PROMIS Measures Centered on the US General Census 2000

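| Domain | Statistic | Female | Male | Age 18-34 | Age 35-44 | Age 45-54 | Age 55-64 | Age 65-74 | Age 75+ |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Anger | N | 1865 | 1204 | 730 | 565 | 499 | 495 | 401 | 379 |
| | Mean | 50.6 | 49.1 | 53.0 | 51.5 | 50.4 | 48.8 | 47.5 | 45.7 |
| | SD | 10.2 | 9.6 | 10.7 | 10.3 | 9.5 | 9.7 | 8.7 | 7.9 |
| Anxiety | N | 1654 | 1069 | 659 | 496 | 417 | 442 | 365 | 345 |
| | Mean | 50.9 | 48.6 | 52.4 | 50.9 | 50.1 | 49.3 | 48.1 | 46.9 |
| | SD | 10.2 | 9.5 | 10.7 | 11.1 | 9.5 | 9.5 | 8.8 | 7.9 |
| Depression | N | 1269 | 890 | 496 | 366 | 359 | 373 | 290 | 276 |
| | Mean | 50.9 | 48.7 | 52.3 | 50.6 | 50.8 | 49.5 | 48.4 | 46.5 |
| | SD | 10.1 | 9.7 | 10.9 | 10.9 | 10.0 | 9.7 | 8.8 | 7.2 |
| Fatigue | N | 1884 | 1183 | 706 | 551 | 513 | 516 | 396 | 385 |
| | Mean | 51.1 | 48.2 | 50.5 | 51.0 | 51.6 | 49.7 | 48.1 | 48.0 |
| | SD | 10.1 | 9.6 | 9.7 | 10.7 | 10.1 | 10.8 | 9.3 | 8.3 |
| Pain Behavior | N | 1851 | 1199 | 699 | 561 | 507 | 507 | 402 | 374 |
| | Mean | 50.7 | 49.0 | 47.6 | 50.0 | 52.2 | 51.3 | 50.1 | 49.7 |
| | SD | 10.1 | 9.7 | 10.2 | 10.6 | 10.1 | 9.7 | 9.3 | 8.7 |
| Pain Interference | N | 1856 | 1180 | 712 | 548 | 499 | 488 | 406 | 383 |
| | Mean | 51.1 | 48.3 | 47.8 | 50.1 | 51.9 | 51.6 | 49.9 | 49.7 |
| | SD | 10.3 | 9.3 | 9.0 | 10.2 | 11.1 | 10.9 | 9.3 | 8.7 |
| Physical Function | N | 2044 | 1363 | 782 | 605 | 567 | 565 | 457 | 431 |
| | Mean | 48.9 | 51.7 | 55.1 | 52.0 | 49.0 | 47.5 | 47.2 | 45.6 |
| | SD | 10.0 | 9.7 | 8.4 | 9.8 | 10.4 | 10.4 | 9.0 | 8.5 |
| Global Mental Health | N | 3008 | 2206 | 1183 | 863 | 902 | 873 | 715 | 679 |
| | Mean | 49.4 | 50.8 | 48.5 | 48.4 | 48.2 | 50.3 | 53.1 | 53.4 |
| | SD | 10.0 | 10.0 | 9.7 | 10.4 | 10.3 | 10.5 | 8.8 | 8.4 |
| Global Physical Health | N | 3015 | 2212 | 1182 | 865 | 910 | 875 | 713 | 683 |
| | Mean | 49.1 | 51.2 | 51.6 | 50.1 | 48.2 | 48.8 | 51.0 | 49.9 |
| | SD | 10.1 | 9.8 | 8.4 | 9.8 | 10.9 | 11.3 | 9.9 | 9.2 |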
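
The fatigue example from the Subpopulation Norms discussion can be worked through with the Fatigue row of this table. The sketch below expresses a T-score of 60 in standard-deviation units, first against the overall reference (mean 50, SD 10) and then against the 55-64 age sub-norm (mean 49.7, SD 10.8). The helper names `age_norm` and `sd_units` are hypothetical. The sketch only rescales the score; it does not address the sample-size caveats in the warning above.

```python
# Fatigue sub-norms by age band: (mean, SD), copied from the sub-norm table
FATIGUE_AGE_NORMS = {
    (18, 34): (50.5, 9.7),
    (35, 44): (51.0, 10.7),
    (45, 54): (51.6, 10.1),
    (55, 64): (49.7, 10.8),
    (65, 74): (48.1, 9.3),
    (75, 200): (48.0, 8.3),  # "75+" band; 200 is an arbitrary upper bound
}


def age_norm(age: int) -> tuple[float, float]:
    """Return the (mean, SD) Fatigue sub-norm for an adult age in years."""
    for (low, high), stats in FATIGUE_AGE_NORMS.items():
        if low <= age <= high:
            return stats
    raise ValueError(f"No sub-norm for age {age} (table covers 18 and older)")


def sd_units(score: float, mean: float, sd: float) -> float:
    """How many standard deviations a score lies above (+) or below (-) a mean."""
    return (score - mean) / sd


score = 60.0
vs_population = sd_units(score, 50.0, 10.0)
vs_age_group = sd_units(score, *age_norm(62))

print(f"Relative to general population: {vs_population:+.2f} SD")
print(f"Relative to ages 55-64:          {vs_age_group:+.2f} SD")
```

Against the general population, the score is exactly one standard deviation above the mean; against the 55-64 age band, it sits slightly less than one standard deviation above (about 0.95 SD).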

#### Percentiles

A percentile can be used to reflect how an individual’s score compares to a reference population. Carle and colleagues (2021) estimated percentiles for many PROMIS Pediatric and Parent Proxy v2.0 (Anger, Anxiety, Depressive Symptoms, Fatigue, Mobility, Pain Behavior, Pain Interference, Peer Relationships, Upper Extremity Function) and v1.0 (Family Relationships, Global Health, Life Satisfaction, Meaning and Purpose, Physical Activity, Physical Stress Experiences, Positive Affect, Psychological Stress Experiences, Sleep Disturbance, Sleep Impairment) measures.
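
For intuition only, a T-score can be turned into an approximate percentile if one assumes that scores in the reference population follow a normal distribution with mean 50 and SD 10. That assumption is ours, not the source’s: real PROMIS score distributions can be skewed or show floor and ceiling effects, which is why empirically estimated percentiles such as those from Carle and colleagues should be preferred when available. The helper name `approx_percentile` is hypothetical.

```python
from statistics import NormalDist

# Assumed reference distribution for a T-score metric (normality is an assumption)
REFERENCE = NormalDist(mu=50, sigma=10)


def approx_percentile(t_score: float) -> float:
    """Percent of the assumed reference population scoring at or below t_score."""
    return 100 * REFERENCE.cdf(t_score)


for t in (40, 50, 60, 70):
    print(f"T = {t}: about the {approx_percentile(t):.1f}th percentile")
```

Under this normal approximation, a T-score of 60 falls near the 84th percentile; an empirical percentile table may place the same score somewhat differently.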

Last updated on 4/29/2024