In this installment and the next, we tackle one of the knottier problems the laboratory faces: how to provide clinicians with a reference range (RR, also called a reference interval or normal range) suitable for the population they serve. A benchmark against which to compare a test result is imperative; without one, how would the clinician know whether a patient's serum value was low, high, or normal? How different is it, if at all, to have a reference interval that is off by, say, 10% for your population? Is that not similar to a test result being wrong by 10%? Are both not misleading to the clinician?
When you set out to verify an RR (to make sure or demonstrate that it is true, accurate, or justified, as opposed to establishing one, that is, creating or bringing it forth), you need to know several things about the RR or expected values for your method. Here is a wish list of what we would like to see in the RR section of a package insert.
How were they selected?
Were they "apparently healthy?" (What criteria were used to determine "apparently"?)
Were the samples from blood donors? From employees? From some other source?
Where were they from? Is your local environment a variable, as might be the case with vitamin D? Is ethnicity a variable, as is the case with CPK?
How many were in the sample?
Of what ages?
During what time of day (if that is a variable, as with cortisol)?
During what phase of the menstrual cycle (for female sex hormones)?
How were the data presented?
Mean ± 2SD? This requires a symmetrical distribution. [As many as 70% of analytes measured by a laboratory are not symmetrical.]
Central 95% range from a symmetrical distribution?
Central 95% range from a visually skewed distribution?
Was the histogram shown?
By sex? By age?
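To see how the first two presentations in this list can disagree, here is a minimal Python sketch that computes both from the same data. It assumes the central 95% is taken the simple way, by sorting the values and discarding the lowest and highest 2.5% (1 value per 40 at each end); formal guidelines use more refined percentile estimators, so treat this as illustrative only. The function names and the 40 sample values are hypothetical.

```python
import statistics


def mean_2sd_interval(values):
    """Parametric interval: mean +/- 2 sample standard deviations.

    Only meaningful when the data are roughly symmetrical (Gaussian).
    """
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return m - 2 * sd, m + 2 * sd


def central_95_interval(values):
    """Nonparametric interval: drop the lowest and highest 2.5% of values.

    2.5% is 1/40, so integer division gives the number dropped at each end
    (1 for N = 40, 3 for N = 125). No symmetry is assumed.
    """
    ordered = sorted(values)
    k = len(ordered) // 40
    kept = ordered[k:len(ordered) - k]
    return kept[0], kept[-1]


# Hypothetical right-skewed data set of 40 results.
results = [4, 5, 5, 6, 6, 6, 7, 7, 7, 7,
           8, 8, 8, 8, 9, 9, 9, 10, 10, 10,
           11, 11, 12, 12, 13, 13, 14, 15, 16, 17,
           18, 19, 20, 22, 24, 26, 29, 32, 36, 41]

print(mean_2sd_interval(results))
print(central_95_interval(results))
```

With this skewed sample, the mean - 2SD limit comes out below zero, an impossible concentration, while the central 95% range (5 to 36) stays within the observed data.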
Here is an example, adapted from an existing package insert, to illustrate how to scrutinize a package insert's RR.
"Data were obtained from serum samples of apparently healthy individuals. Based on a central 95% interval, the following reference ranges were established:
Sample Category           N      Cortisol Range (µg/dL)
A.M. serum (7-9 A.M.)     125    4.30-22.40
P.M. serum (3-5 P.M.)     124    3.09-16.66"
Let's elaborate on the two key phrases in the above example, "apparently healthy" and "central 95%." First, apparently healthy: to whom did they appear healthy? What criteria were used to label them "apparently healthy"? Was it a verbal questionnaire, a written questionnaire, other laboratory tests, or something else? We mention this to make you aware of how variable "apparently healthy" can be; we will have more to say about this in part two of this topic in two weeks. Second, central 95%: this is a useful tool because it does not depend on whether the data are skewed high or low. To elaborate, consider the histogram below, which represents an RR validation using only 40 samples. Because the data fall within ±2SD and appear symmetrical, we will presume they are normally distributed.
On the other hand, the histogram below on the left appears skewed to the high side. In addition, the -2SD value falls below the lowest data point. Both of these factors suggest that the data are skewed to the high side, and therefore x̄ ± 2SD is unsatisfactory for validating the RR. However, taking the central 95% (by discarding the lowest and highest 2.5% of the values) yields the histogram below on the right, which is more symmetrical, or more "normal." With N = 40, you will discard only 1 value at the low end and 1 at the high end (2.5% of 40 = 1). We feel that for verifying the RR, it is acceptable to omit 2 at the low end and 2 at the high end.
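Judging skew from a histogram is done by eye; a numeric cross-check can help. The Python sketch below is one such check, not a method from the package insert: it computes the Fisher-Pearson skewness coefficient and compares both tails of the data with the mean ± 2SD limits. In a small sample the -2SD limit can fall below the minimum even when the data are symmetrical, so the sketch reports both tails together rather than a single pass/fail flag. The function names and data sets are hypothetical.

```python
import statistics


def skewness(values):
    """Fisher-Pearson skewness coefficient (g1).

    About 0 for a symmetrical distribution; positive when the long tail
    is on the high side, negative when it is on the low side.
    """
    n = len(values)
    m = statistics.fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def tail_report(values):
    """Compare both tails of the data with the mean +/- 2SD limits."""
    m = statistics.fmean(values)
    sd = statistics.stdev(values)
    low, high = m - 2 * sd, m + 2 * sd
    return {
        "skewness": skewness(values),
        "minus_2sd_below_min": low < min(values),
        "n_above_plus_2sd": sum(1 for x in values if x > high),
    }


# Hypothetical data sets: one symmetrical, one skewed to the high side.
symmetrical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed_high = [2, 3, 3, 4, 4, 4, 5, 5, 6, 8, 11, 15]

print(tail_report(symmetrical))
print(tail_report(skewed_high))
```

For the skewed set, skewness is clearly positive and the -2SD limit sits below every value while one value exceeds +2SD; the symmetrical set has zero skewness and no values beyond either limit.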
The next time you are faced with verifying a reference range, we hope these comments on the histograms will help you feel confident in your conclusions.
Let's imagine that you have data from 40 samples that reflect your desired population. The first thing to do with your 40 values is to identify the minimum and maximum. If both are within the package insert range, consider the package insert range acceptable, that is, verified. Otherwise, discard the 2 lowest and 2 highest values and identify the minimum and maximum again. If, after discarding these 4 values (10%), the remaining range is within the package insert range, consider the package insert range verified. If the range of the remaining 90% is still outside the package insert range (that is, the minimum, the maximum, or both fall outside the package insert limits), you may need to establish your own RR. That is the subject of the next installment.
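The decision procedure in the paragraph above can be written as a short function. This is a minimal Python sketch assuming the column's rule of discarding 2 values at each end; the function name, its return strings, and the 40 local results are hypothetical, and the limits used in the example are the A.M. cortisol range from the insert example.

```python
def verify_reference_range(values, insert_low, insert_high, trim=2):
    """Hypothetical helper for the min/max verification check.

    Returns "verified" if every value lies within the package insert range,
    "verified after trimming" if the range is met once the `trim` lowest and
    `trim` highest values are discarded, and "not verified" otherwise.
    """
    ordered = sorted(values)
    if len(ordered) <= 2 * trim:
        raise ValueError("need more values than the number being trimmed")

    def within(data):
        # data is sorted, so its first and last items are the min and max
        return insert_low <= data[0] and data[-1] <= insert_high

    if within(ordered):
        return "verified"
    if within(ordered[trim:len(ordered) - trim]):
        return "verified after trimming"
    return "not verified"  # consider establishing your own RR


# Hypothetical local A.M. cortisol results (µg/dL), N = 40, checked against
# the example insert's A.M. range of 4.30-22.40.
local = [round(5 + 0.4 * i, 1) for i in range(36)] + [3.9, 4.1, 22.9, 23.5]
print(verify_reference_range(local, 4.30, 22.40))  # -> verified after trimming
```

Here the two lowest values (3.9 and 4.1) and the two highest (22.9 and 23.5) fall outside the insert range, but the remaining 36 values (5.0 to 19.0) fall inside it, so the range is verified after trimming.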
Returning to the package insert, the fact that it gives both A.M. and P.M. ranges is helpful, because cortisol levels vary during the day.
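Because the insert gives separate ranges by collection window, a result has to be compared with the range that matches its draw time. The Python sketch below shows one way to do that lookup using the two ranges from the example. The names are hypothetical, and treating the windows as inclusive of their endpoints (so a 9:00 draw counts as A.M.) is an assumption, since the insert does not say.

```python
from datetime import time

# A.M. and P.M. cortisol ranges (µg/dL) from the package insert example.
# Assumption: each window includes both of its endpoints.
CORTISOL_RANGES = [
    (time(7, 0), time(9, 0), (4.30, 22.40)),    # A.M. serum, 7-9 A.M.
    (time(15, 0), time(17, 0), (3.09, 16.66)),  # P.M. serum, 3-5 P.M.
]


def cortisol_range_for(drawn_at):
    """Return the (low, high) range matching the collection time, or None
    when the sample was drawn outside both windows the insert covers."""
    for start, end, limits in CORTISOL_RANGES:
        if start <= drawn_at <= end:
            return limits
    return None


print(cortisol_range_for(time(8, 15)))   # inside the A.M. window
print(cortisol_range_for(time(12, 30)))  # outside both windows
```

Returning None for a midday draw, rather than falling back to one of the two ranges, makes it explicit that the insert provides no range for that collection time.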
We feel that the N's of 124 and 125 are adequate, but it would be helpful to know how many male and female samples were included and what their age ranges were. For your verification study, 40 samples would be good; more is better, especially if males and females differ, and more still if age groups differ.
As always, feel free to send comments, suggestions, and questions to firstname.lastname@example.org