Neuro-QoL measures are scored on the T-score metric. High scores mean more of the concept being measured.

PROMIS®, Neuro-QoL, ASCQ-Me℠, and many of the NIH Toolbox® measures use a T-score metric in which 50 is the mean of a relevant reference population and 10 is the standard deviation (SD) of that population.

On the T-score metric:

  • A score of 40 is one SD lower than the mean of the reference population.
  • A score of 60 is one SD higher than the mean of the reference population.
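
As a minimal sketch of the arithmetic above, the distance of a T-score from the reference mean, in SD units, is (T - 50) / 10. The function name `t_score_to_sd_units` is hypothetical and is not part of any Neuro-QoL scoring software.

```python
REFERENCE_MEAN = 50.0  # T-score mean of the reference population
REFERENCE_SD = 10.0    # T-score standard deviation of the reference population


def t_score_to_sd_units(t_score: float) -> float:
    """Return how many SDs a T-score lies above (+) or below (-) the reference mean."""
    return (t_score - REFERENCE_MEAN) / REFERENCE_SD


print(t_score_to_sd_units(40))  # -1.0, one SD below the mean
print(t_score_to_sd_units(60))  # 1.0, one SD above the mean
```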

For Neuro-QoL measures, a higher score equals more of the concept being measured (e.g., more Fatigue, more Lower Extremity Function - Mobility). Thus a score of 60 is one standard deviation above the mean of the reference population. This could be a desirable or undesirable outcome, depending upon the concept being measured.

Neuro-QoL scores have a mean of 50 and standard deviation (SD) of 10 in a referent population.

  • Scores 0.5 – 1.0 SD worse than the mean = mild symptoms/impairment
  • Scores 1.0 – 2.0 SD worse than the mean = moderate symptoms/impairment
  • Scores 2.0 SD or more worse than the mean = severe symptoms/impairment
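
The cut points above can be sketched as a small helper. This is an illustration under stated assumptions, not an official scoring rule: the function name, the `higher_is_worse` flag, and the label for scores below the mild threshold are all hypothetical, and a score exactly 1.0 SD worse than the mean is assumed to count as moderate because the listed ranges share that boundary. The flag is needed because "worse" means a higher score on symptom measures such as Fatigue and a lower score on function measures such as Lower Extremity Function - Mobility.

```python
def classify_severity(t_score: float, higher_is_worse: bool) -> str:
    """Label a T-score using the SD cut points listed above.

    higher_is_worse: True for symptom measures such as Fatigue,
    False for function measures such as Lower Extremity Function - Mobility.
    """
    sd_from_mean = (t_score - 50.0) / 10.0
    # Express the distance so that a positive value always means "worse".
    sd_worse = sd_from_mean if higher_is_worse else -sd_from_mean

    if sd_worse >= 2.0:
        return "severe"
    if sd_worse >= 1.0:  # assumption: exactly 1.0 SD counts as moderate
        return "moderate"
    if sd_worse >= 0.5:
        return "mild"
    return "below mild threshold"  # hypothetical label, not from the source


print(classify_severity(62, higher_is_worse=True))   # moderate
print(classify_severity(43, higher_is_worse=False))  # mild
```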


How do Neuro-QoL scores compare to a relevant reference population?

A unique aspect of Neuro-QoL measures is their use of standardized scores that are centered on a relevant reference population. Such scores are called “normative” because their value represents how close or far away they are from a normative population. The word “norm” has different meanings for different contexts. Here, we are not talking about social “norms,” the behaviors we expect from others and ourselves in society. We are also not talking about “normal” per se, even though the term originated from its reference to a standard, bell-shaped distribution curve that labels everything in the vast middle as normal. We use the word norm without applying judgment as to the “normality” of any given score relative to the distribution of scores seen on the same measure in a large group of people. Sometimes we refer to the group as a “reference group” and similarly to “norms” as reference values, because they are points of reference from which to understand a given single score.

The meaning of the score is defined by how it compares to the scores of others in a referent population.

Reference Populations

A T-score is a standardized score, like z-scores and IQ scores. All standardized scores have a “middle” score; it is zero for z-scores, 100 for IQ scores, and 50 for T-scores. This middle score is the mean of a large sample that is representative of a relevant population—a reference population. The large sample used to represent the reference population is called the Centering Sample.

For some Neuro-QoL measures, the reference population (and the centering sample) was a clinical population. This is the case for Neuro-QoL Fatigue measures, for example.

What does the middle score mean?

When developing a measure with standard scores, an important consideration is what the middle score means. The scores of such a measure are purposefully “centered” at the mean of a specific sample or subsample. Neuro-QoL uses T-scores, so the middle score is always 50. Centering scores in this way allows quick interpretation of where an individual is on a symptom or outcome compared to others in the reference population. A score of 50 on Neuro-QoL Anxiety, for example, is comparable to the U.S. “average”. T-scores have a standard deviation of 10, so a score of 60 would indicate anxiety that is one standard deviation higher than the U.S. average.

TIP: Failure to be specific about the reference population invites confusion.

This can all get very confusing because sometimes the calibration sample (the sample used to estimate item response theory parameters) and centering sample (the sample used to define the middle of the score range) were the same. But sometimes they were different. For example, a measure may be calibrated in a clinical sample but then centered in the general population. The mean of T=50 for that measure reflects the average in the general population, not the clinical sample.

Centering Sample and Calibration Sample

It is helpful to remember that the middle score of a standard score range has to be defined. For measures that use a T-score metric, 50 is the mean and 10 is the standard deviation, but they do not start out that way. The scores are first estimated using an item response theory (IRT) model, and the IRT-calibrated scores are then transformed to the T-score metric using a linear transformation. But first you have to decide which score on the IRT metric is going to be the middle score, that is, a score of 50. This is done by collecting scores from a large sample that represents the reference population and then calculating the mean for that sample. That mean becomes the middle score (e.g., 50 for T-scores). The linear transformation then spaces all other scores along the continuum so they have the correct values relative to the middle score (the mean of the centering sample).
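
The centering step described above can be sketched as follows. This is an illustration only, not the published Neuro-QoL scoring procedure: the function names and the sample values are made up, and the sketch assumes that the SD of the same centering sample is what gets rescaled to 10 T-score points.

```python
from statistics import mean, stdev


def make_t_score_transform(centering_thetas):
    """Build a linear IRT-score -> T-score function from a centering sample.

    centering_thetas: IRT-metric scores from the sample that represents
    the reference population (illustrative values only in this sketch).
    """
    center = mean(centering_thetas)   # this IRT score becomes T = 50
    spread = stdev(centering_thetas)  # assumption: one sample SD becomes 10 T-score points

    def to_t_score(theta):
        return 50.0 + 10.0 * (theta - center) / spread

    return to_t_score


# Made-up IRT scores standing in for a centering sample.
sample = [-1.0, -0.5, 0.0, 0.5, 1.0]
to_t = make_t_score_transform(sample)
print(to_t(mean(sample)))  # 50.0
```

A person whose IRT score sits one centering-sample SD above the sample mean would receive a T-score of 60 under this sketch. If a different sample were used for centering, the same IRT score would map to a different T-score, which is why the centering sample has to be stated.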

IMPORTANT: The Centering Sample and the Calibration Sample may not be the same sample.

The purpose of a calibration sample is to estimate item parameters (item characteristics such as difficulty and discrimination) using an item response theory model. Here’s where it can get confusing. Sometimes a single sample was used as both the calibration sample AND the centering sample. Other times one sample was used as the calibration sample and another was used as the centering sample.

The Reference Population tables show the calibration and the centering samples for Neuro-QoL. Most users will be particularly interested in the last column (Centering Sample). If you want to know what a score in the middle is (e.g., 50 for those scored on a T-score metric), go to the Centering Sample column. For example, if you go to the row for Upper Extremity Function – Fine Motor, Activities of Daily Living, you will see that the item parameters were estimated (calibrated) using a hybrid of individuals getting care in neurology clinics and individuals from the general population. BUT, the centering sample was the general population. A score of 50 on this measure is comparable to the general population average level of upper extremity function.

Reference Populations for Adult Measures

  Neuro-QoL Adult
Bank/scale | Calibration Sample | Centering Sample
Anxiety | General population | General population
Depression | General population | General population
Fatigue | Clinical sample | Clinical sample
Upper Extremity Function - Fine Motor, ADL | General population + Clinical sample | General population
Lower Extremity Function - Mobility | General population + Clinical sample | General population
Cognitive Function | Clinical sample | General population*
Emotional and Behavioral Dyscontrol | Clinical sample | Clinical sample
Positive Affect and Well-Being | General population | General population
Sleep Disturbance | Clinical sample | Clinical sample
Ability to Participate in Social Roles and Activities | General population | General population
Satisfaction with Social Roles and Activities | General population | General population

*The Cognitive Function item bank was calibrated using a clinical sample. It was then linked to the PROMIS Cognitive Function item bank. The published parameters are aligned to the general-population-based PROMIS Cognitive Function parameters.


Reference Populations for Pediatric Measures

  Neuro-QoL Pediatrics
Bank/scale | Calibration Sample | Centering Sample
Anxiety | General population | General population
Depression | General population | General population
Anger | General population | General population
Fatigue | General population | General population
Cognitive Function | General population | General population
Social Relations - Interaction with Peers | General population | General population
Social Relations - Interaction with Adults | General population | General population
Stigma | Clinical sample | Clinical sample
Pain | Clinical sample | Clinical sample