Estimation of Measurement Uncertainty in Qualitative Testing
Last updated 2024-09-08
Measurement uncertainty (MU) is crucial in ensuring the reliability of test results, particularly in medical laboratories. While the conventional framework, as laid out in JCGM-GUM-3, is well-suited for quantitative testing, applying it to qualitative tests (e.g., positive/negative outcomes) presents certain challenges. For this reason, JCGM-GUM-7 offers a more appropriate approach, modeling MU with probability distributions and Bayesian Probability.
JCGM-GUM-7 Framework and Bayesian Approach
In JCGM-GUM-7, MU is modeled using probability distributions, which can be used to quantify the MU in both qualitative and quantitative results. A significant aspect of this approach is the use of Bayesian Probability to integrate prior knowledge with observed data.
Unlike frequentist methods, which rely solely on the observed data, Bayesian methods combine prior information with data to update our beliefs about a hypothesis.
Binary Outcomes in Qualitative Testing
Most qualitative tests produce binary outcomes, such as positive vs negative or detected vs not detected. These outcomes can be easily modeled using the Bernoulli distribution. A useful way to analyze such results is through a 2x2 contingency table, which compares the test results with the true target condition (TC). Conventionally, the columns represent the true TC, while the rows represent the test results.
Here is a typical contingency table:
| | Target Condition: Positive | Target Condition: Negative |
|---|---|---|
| Test Result: Detected | tp | fp |
| Test Result: Not Detected | fn | tn |
True Positive Rate (TPR) and False Negative Rate (FNR)
According to Bayesian Probability, binary outcomes should be analyzed in the context of the true target condition. When the condition is positive, we can calculate:
The probability of a positive test given the condition is positive:
\[P(\text{Test+}|\text{TC+}) = \text{TPR} = \frac{\text{tp}}{\text{tp} + \text{fn}}\]
The probability of a negative test given the condition is positive:
\[P(\text{Test-}|\text{TC+}) = \text{FNR} = \frac{\text{fn}}{\text{tp} + \text{fn}}\]
Similarly, when the condition is negative, we can calculate:
The probability of a negative test given the condition is negative:
\[P(\text{Test-}|\text{TC-}) = \text{TNR} = \frac{\text{tn}}{\text{fp} + \text{tn}}\]
The probability of a positive test given the condition is negative:
\[P(\text{Test+}|\text{TC-}) = \text{FPR} = \frac{\text{fp}}{\text{fp} + \text{tn}}\]
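The four conditional rates above can be computed directly from the counts of a contingency table. The following is a minimal sketch, not a reference implementation; the `Table2x2` class, the `conditional_rates` function, and the example counts are hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Counts of a 2x2 contingency table (columns: true TC, rows: test result)."""
    tp: int  # detected, TC positive
    fp: int  # detected, TC negative
    fn: int  # not detected, TC positive
    tn: int  # not detected, TC negative


def conditional_rates(t: Table2x2) -> dict[str, float]:
    """Return TPR, FNR, TNR and FPR, each conditioned on the true TC."""
    pos = t.tp + t.fn  # all TC-positive samples
    neg = t.fp + t.tn  # all TC-negative samples
    if pos == 0 or neg == 0:
        raise ValueError("each target condition needs at least one sample")
    return {
        "TPR": t.tp / pos,
        "FNR": t.fn / pos,
        "TNR": t.tn / neg,
        "FPR": t.fp / neg,
    }


# Hypothetical counts chosen only to exercise the formulas.
rates = conditional_rates(Table2x2(tp=90, fp=5, fn=10, tn=95))
print(rates)
```

Because each rate is conditioned on one column of the table, TPR + FNR and TNR + FPR each sum to 1.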
Predictive Values vs. Likelihood Ratios
Although Positive Predictive Value (PPV) and Negative Predictive Value (NPV) are commonly used in diagnostic tests, these metrics are significantly influenced by the prevalence of the condition in the population. On the other hand, Positive Likelihood Ratios (LR+) and Negative Likelihood Ratios (LR-) are much more robust indicators, as they are not directly affected by prevalence.
\[\text{LR+} = \frac{P(\text{Test+}|\text{TC+})}{P(\text{Test+}|\text{TC-})} = \frac{\text{TPR}}{\text{FPR}}\]
\[\text{LR-} = \frac{P(\text{Test-}|\text{TC-})}{P(\text{Test-}|\text{TC+})} = \frac{\text{TNR}}{\text{FNR}}\]
\[\text{PPV} = P(\text{TC+}|\text{Test+}) = \frac{\text{tp}}{\text{tp}+\text{fp}}\]
\[\text{NPV} = P(\text{TC-}|\text{Test-}) = \frac{\text{tn}}{\text{tn}+\text{fn}}\]
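As a rough illustration, the four formulas above can be evaluated from the same counts. This sketch follows the definitions given in this article, where LR- is TNR/FNR; many references define the negative likelihood ratio as the reciprocal, FNR/TNR, so results should be compared with care. The function names are hypothetical, and the example counts are those implied by the fractions 31/32, 2/32, 30/32 and 1/32 used later in Case 1.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Divide; return inf for x/0 with x > 0 and nan for 0/0."""
    if denominator == 0:
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Likelihood ratios and predictive values from 2x2 counts."""
    tpr = tp / (tp + fn)
    fnr = fn / (tp + fn)
    tnr = tn / (fp + tn)
    fpr = fp / (fp + tn)
    return {
        "LR+": _ratio(tpr, fpr),
        "LR-": _ratio(tnr, fnr),  # this article's definition (TNR / FNR)
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
    }


metrics = diagnostic_metrics(tp=31, fp=2, fn=1, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that PPV and NPV computed this way reflect the prevalence of the sample itself (here close to 50%), which is why they do not transfer to populations with a different prevalence.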
Example of Prevalence Impact
Let’s compare PPV, NPV, and likelihood ratios at different prevalence levels. With sensitivity, specificity, and likelihood ratios held constant, the PPV falls as prevalence falls, while the NPV and accuracy rise.
| | Equal Prevalence | Unbalanced Prevalence | Low Prevalence | Very Low Prevalence |
|---|---|---|---|---|
| Prevalence % | 50.00 | 33.33 | 9.09 | 0.99 |
| Contingency Table | | | | |
| Sensitivity % | 91.03 | 91.03 | 91.03 | 91.03 |
| Specificity % | 94.14 | 94.14 | 94.14 | 94.14 |
| FNR % | 8.97 | 8.97 | 8.97 | 8.97 |
| FPR % | 5.86 | 5.86 | 5.86 | 5.86 |
| LR+ | 15.53 | 15.53 | 15.53 | 15.53 |
| LR- | 10.49 | 10.49 | 10.49 | 10.49 |
| PPV % | 93.95 | 88.59 | 60.83 | 13.44 |
| NPV % | 91.30 | 95.45 | 99.06 | 99.90 |
| Accuracy % | 92.59 | 93.10 | 93.86 | 94.11 |
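The dependence of PPV, NPV, and accuracy on prevalence can be sketched with Bayes' Theorem, using the rounded sensitivity (91.03%) and specificity (94.14%) from the table. This is an illustrative sketch: the function name is hypothetical, and the prevalences are assumed to be the fractions 1/2, 1/3, 1/11 and 1/101, which match the listed percentages. Because the inputs are rounded, the output may differ from the table in the last digit.

```python
def predictive_summary(sensitivity: float, specificity: float,
                       prevalence: float) -> dict[str, float]:
    """PPV, NPV and accuracy for a given prevalence, via Bayes' Theorem."""
    # Joint probabilities of each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return {
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "Accuracy": tp + tn,
    }


# Assumed prevalences matching 50.00 %, 33.33 %, 9.09 % and 0.99 %.
scenarios = {
    "Equal": 1 / 2,
    "Unbalanced": 1 / 3,
    "Low": 1 / 11,
    "Very low": 1 / 101,
}

results = {
    label: predictive_summary(0.9103, 0.9414, prev)
    for label, prev in scenarios.items()
}

for label, r in results.items():
    print(f"{label:<10} PPV={r['PPV']:.2%}  NPV={r['NPV']:.2%}  "
          f"Accuracy={r['Accuracy']:.2%}")
```

The likelihood ratios do not appear in the loop because they depend only on sensitivity and specificity, not on prevalence.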
Sources of Data for MU Analysis
Data for constructing the contingency table in the context of measurement uncertainty analysis can come from several sources:
- Validation (MV) studies conducted by the manufacturer and the user.
- Published literature on the performance of similar tests.
- Participation in external quality assurance (EQA) programs.
- Routine internal quality control (IQC) checks.
- Prior estimation of MU.
These data sources can be combined using Bayes’ Theorem, which is particularly useful in pooling information from different sources. When starting with minimal prior information, we often use a uniform non-informative prior such as Beta(1,1) for each target condition.
Example of Combining Data Using Bayes’ Theorem
| Prior Values | Target Condition: Positive | Target Condition: Negative |
|---|---|---|
| Test Result: Detected | 1 | 1 |
| Test Result: Not Detected | 1 | 1 |

| Likelihood Values | Target Condition: Positive | Target Condition: Negative |
|---|---|---|
| Test Result: Detected | a | b |
| Test Result: Not Detected | c | d |
Given the above observation, the posterior probability according to Bayes’ Theorem is:
\[P(\text{posterior}) \propto P(\text{prior}) \times P(\text{likelihood})\]
where the probability of observing x successes in n trials under Binomial(n, p) is:
\[P(x) = \binom{n}{x} \centerdot p^x \centerdot (1-p)^{n-x}\]
\[P(x) \propto p^x \centerdot (1-p)^{n-x}\]
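A Beta(α, β) density satisfies:
\[P(p) \propto p^{\alpha-1} \centerdot (1-p)^{\beta-1}\]
hence, for TC+ with the Beta(1,1) prior, where p is the TPR:
\[P(\text{posterior}) \propto (p^{1-1} \centerdot (1-p)^{1-1}) \centerdot (p^a \centerdot (1-p)^c)\]
\[P(\text{posterior}) \propto p^{(1+a)-1} \centerdot (1-p)^{(1+c)-1}\]
which is the kernel of Beta(1 + a, 1 + c). Similarly, for TC-, where p is the TNR:
\[P(\text{posterior}) \propto p^{(1+d)-1} \centerdot (1-p)^{(1+b)-1}\]
which is the kernel of Beta(1 + d, 1 + b). The posterior parameters can be shown in this table: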
| Posterior Values | Target Condition: Positive | Target Condition: Negative |
|---|---|---|
| Test Result: Detected | 1 + a | 1 + b |
| Test Result: Not Detected | 1 + c | 1 + d |
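In practice, the update from the prior table to the posterior table amounts to adding the observed counts to the prior counts cell by cell. The sketch below illustrates this under the assumption that each column is treated as an independent Beta distribution; the function names and the observed counts are hypothetical.

```python
KEYS = ("tp", "fp", "fn", "tn")


def bayes_update(prior: dict[str, int], data: dict[str, int]) -> dict[str, int]:
    """Add observed counts to prior counts, cell by cell."""
    return {key: prior[key] + data[key] for key in KEYS}


def posterior_mean_rates(counts: dict[str, int]) -> dict[str, float]:
    """Posterior means of TPR and TNR from the Beta parameters in each column."""
    tpr = counts["tp"] / (counts["tp"] + counts["fn"])  # mean of Beta(tp, fn)
    tnr = counts["tn"] / (counts["tn"] + counts["fp"])  # mean of Beta(tn, fp)
    return {"TPR": tpr, "TNR": tnr}


# Uniform prior: one pseudo-count in every cell, i.e. Beta(1,1) per column.
prior = dict.fromkeys(KEYS, 1)

# Hypothetical observations (a=18, b=1, c=2, d=19 in the notation above).
data = {"tp": 18, "fp": 1, "fn": 2, "tn": 19}

posterior = bayes_update(prior, data)
print(posterior)
print(posterior_mean_rates(posterior))
```

The posterior of one round can be passed back in as the prior of the next, which is how data from several sources or years can be pooled.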
Examples of MU Estimation
Case 1
In this example, the laboratory uses data from the MV study and IQC to estimate MU.
| Year | Prior | Data | Posterior | LR+ | LR- |
|---|---|---|---|---|---|
| - | Uniform non-informative prior | MV | | $$\frac{31/32}{2/32} = 15.50$$ | $$\frac{30/32}{1/32} = 30.00$$ |
| 2023 | Previous MU | IQC (2022) | | $$\frac{81/82}{2/82} = 40.50$$ | $$\frac{80/82}{1/82} = 80.00$$ |
| 2024 | MU (2023) | IQC (2023) | | $$\frac{134/135}{2/135} = 67.00$$ | $$\frac{133/135}{1/135} = 133.00$$ |
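The year-by-year chaining in Case 1 can be sketched as a loop in which each posterior becomes the next prior. The per-round counts below are not given in the text; they are back-calculated from the LR fractions in the table (for example, 31/32 and 2/32 imply posterior counts tp=31, fn=1, fp=2, tn=30) and should be read as an assumption. The function names are hypothetical.

```python
KEYS = ("tp", "fp", "fn", "tn")


def add_counts(prior: dict[str, int], data: dict[str, int]) -> dict[str, int]:
    """Posterior counts = prior counts + observed counts."""
    return {key: prior[key] + data[key] for key in KEYS}


def likelihood_ratios(c: dict[str, int]) -> tuple[float, float]:
    """LR+ = TPR/FPR and LR- = TNR/FNR, following this article's definitions."""
    tpr = c["tp"] / (c["tp"] + c["fn"])
    fnr = c["fn"] / (c["tp"] + c["fn"])
    tnr = c["tn"] / (c["fp"] + c["tn"])
    fpr = c["fp"] / (c["fp"] + c["tn"])
    return tpr / fpr, tnr / fnr


# Assumed data, back-calculated from the fractions in the Case 1 table.
rounds = [
    ("MV", {"tp": 30, "fp": 1, "fn": 0, "tn": 29}),
    ("IQC (2022)", {"tp": 50, "fp": 0, "fn": 0, "tn": 50}),
    ("IQC (2023)", {"tp": 53, "fp": 0, "fn": 0, "tn": 53}),
]

counts = dict.fromkeys(KEYS, 1)  # uniform non-informative prior
history = []
for label, data in rounds:
    counts = add_counts(counts, data)  # this posterior is the next prior
    lr_pos, lr_neg = likelihood_ratios(counts)
    history.append((label, lr_pos, lr_neg))
    print(f"{label:<11} LR+={lr_pos:.2f}  LR-={lr_neg:.2f}")
```

Starting from one pseudo-count per cell keeps fp and fn above zero, so both ratios stay finite even when a round contains no false results.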
Case 2
In this example, the laboratory uses data from the EQA program and IQC to estimate MU. The laboratory started the test in 2022 and joined the EQA program in 2023.
| Year | Prior | Data: EQA | Data: IQC | Posterior | LR+ | LR- |
|---|---|---|---|---|---|---|
| 2023 | | - | | | $$\frac{51/52}{1/52} = 51.00$$ | $$\frac{51/52}{1/52} = 51.00$$ |
| 2024 | | | | | $$\frac{117/118}{2/118} = 58.50$$ | $$\frac{116/118}{1/118} = 116.00$$ |
Case 3
In this example, the test used in the laboratory produces three possible outcomes: positive, intermediate, and negative. However, the technical manager has decided to group the positive and intermediate results together under a single category called non-negative, because this test is used for screening purposes, where false negatives are considered to have more serious consequences than false positives.
This reclassification simplifies the analysis by transforming the test outcomes into binary results, which is more compatible with our methodology. The technical manager or laboratory director would need to provide a clear justification for this reclassification, explaining why it is appropriate to combine the positive and intermediate results in the context of the test’s clinical use and objectives.
After the reclassification, measurement uncertainty (MU) estimation can proceed using the same approach as outlined in Case 1 and Case 2. With the test outcomes now categorized into non-negative (positive + intermediate) and negative results, we can apply the same Bayesian analysis framework.
Before reclassification:
| Before reclassification | Target Condition: Positive | Target Condition: Negative |
|---|---|---|
| Test Result: Detected | a | b |
| Test Result: Intermediate | c | d |
| Test Result: Not Detected | e | f |
After reclassification, note that the number of false positives has increased from b to b + d, while the number of false negatives (e) remains unchanged:
| After reclassification | Target Condition: Positive | Target Condition: Negative |
|---|---|---|
| Test Result: Non-(Not Detected) | a + c | b + d |
| Test Result: Not Detected | e | f |
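The reclassification above can be sketched as a small helper that merges the Detected and Intermediate rows into a single non-negative row. The function name and the example counts are hypothetical and serve only to illustrate the bookkeeping.

```python
def collapse_to_binary(detected: tuple[int, int],
                       intermediate: tuple[int, int],
                       not_detected: tuple[int, int]) -> dict[str, int]:
    """Merge a 3x2 table into a 2x2 table.

    Each argument is a (TC positive, TC negative) pair of counts.
    """
    a, b = detected
    c, d = intermediate
    e, f = not_detected
    return {
        "tp": a + c,  # non-negative result, TC positive
        "fp": b + d,  # non-negative result, TC negative
        "fn": e,      # not detected, TC positive (unchanged)
        "tn": f,      # not detected, TC negative (unchanged)
    }


# Hypothetical counts for illustration only.
binary = collapse_to_binary(detected=(40, 2), intermediate=(5, 3),
                            not_detected=(1, 45))
print(binary)
```

The resulting counts have the same shape as any other 2x2 table, so the Bayesian update used in Case 1 and Case 2 applies without modification.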