
50 Senz of Sith

Estimation of Measurement Uncertainty in Qualitative Testing

Last updated 2024-09-08

Measurement uncertainty (MU) is crucial to ensuring the reliability of test results, particularly in medical laboratories. The conventional framework laid out in JCGM-GUM-3 is well suited to quantitative testing. Applying it to qualitative tests (e.g., positive/negative outcomes) presents certain challenges. JCGM-GUM-7 offers a more suitable approach for these tests, modeling MU with probability distributions and Bayesian probability.

JCGM-GUM-7 Framework and Bayesian Approach

In JCGM-GUM-7, MU is represented by probability distributions, which can quantify the uncertainty of both qualitative and quantitative results. A key feature of this approach is the use of Bayesian probability to combine prior knowledge with observed data.

Unlike frequentist methods, which rely solely on the observed data, Bayesian methods combine prior information with data to update our beliefs about a hypothesis.

Binary Outcomes in Qualitative Testing

Most qualitative tests produce binary outcomes, such as positive vs. negative or detected vs. not detected. Each outcome can be modeled with a Bernoulli distribution. A useful way to analyze such results is a 2x2 contingency table, which compares the test results with the true target condition (TC). Conventionally, the columns represent the true TC and the rows represent the test results.

Here is a typical contingency table:

		Target Condition
		Positive	Negative
Test Result	Detected	tp	fp
	Not Detected	fn	tn

True and False Positive/Negative Rates (TPR, FNR, TNR, FPR)

In Bayesian terms, binary outcomes are analyzed conditional on the true target condition. When the condition is positive, we can calculate:

The probability of a positive test given the condition is positive (sensitivity):

\[P(\text{Test+}|\text{TC+}) = \text{TPR} = \frac{\text{tp}}{\text{tp} + \text{fn}}\]

The probability of a negative test given the condition is positive:

\[P(\text{Test-}|\text{TC+}) = \text{FNR} = \frac{\text{fn}}{\text{tp} + \text{fn}}\]

Similarly, when the condition is negative, we can calculate:

The probability of a negative test given the condition is negative (specificity):

\[P(\text{Test-}|\text{TC-}) = \text{TNR} = \frac{\text{tn}}{\text{fp} + \text{tn}}\]

The probability of a positive test given the condition is negative:

\[P(\text{Test+}|\text{TC-}) = \text{FPR} = \frac{\text{fp}}{\text{fp} + \text{tn}}\]
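The four conditional rates can be computed directly from the cell counts. The following is a minimal Python sketch. It assumes the cell names tp, fp, fn and tn from the table above. The function name `conditional_rates` is hypothetical. The example counts are those of the equal-prevalence scenario used later in this article.

```python
def conditional_rates(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Rates conditional on the true target condition (columns of the 2x2 table)."""
    tc_pos = tp + fn  # column total: all TC+ samples
    tc_neg = fp + tn  # column total: all TC- samples
    return {
        "TPR": tp / tc_pos,  # P(Test+ | TC+), sensitivity
        "FNR": fn / tc_pos,  # P(Test- | TC+)
        "TNR": tn / tc_neg,  # P(Test- | TC-), specificity
        "FPR": fp / tc_neg,  # P(Test+ | TC-)
    }


rates = conditional_rates(tp=264, fp=17, fn=26, tn=273)
for name, value in rates.items():
    print(f"{name}: {100 * value:.2f} %")
```

TPR + FNR and TNR + FPR each sum to 1, because each pair partitions one column of the table.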

Predictive Values vs. Likelihood Ratios

Positive Predictive Value (PPV) and Negative Predictive Value (NPV) are commonly used to describe diagnostic tests, but both depend strongly on the prevalence of the condition in the tested population. Positive and Negative Likelihood Ratios (LR+ and LR-) are more robust indicators. They are built only from TPR, FPR, TNR and FNR, which are conditional on the true target condition, so they are not directly affected by prevalence.

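\[\text{LR+} = \frac{P(\text{Test+}|\text{TC+})}{P(\text{Test+}|\text{TC-})} = \frac{\text{TPR}}{\text{FPR}}\]

\[\text{LR-} = \frac{P(\text{Test-}|\text{TC-})}{P(\text{Test-}|\text{TC+})} = \frac{\text{TNR}}{\text{FNR}}\]

\[\text{PPV} = P(\text{TC+}|\text{Test+}) = \frac{\text{tp}}{\text{tp}+\text{fp}}\]

\[\text{NPV} = P(\text{TC-}|\text{Test-}) = \frac{\text{tn}}{\text{tn}+\text{fn}}\]

LR- is defined here as TNR/FNR. This is the reciprocal of the more common definition FNR/TNR, chosen so that a larger value indicates a better test for both ratios. The PPV and NPV formulas read directly from the table are valid only when the proportion of TC+ samples in the table matches the prevalence in the tested population.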
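The following is a minimal Python sketch of the four formulas above, taking raw cell counts as input. The helper name `ratios_and_predictive_values` is hypothetical. It keeps this article's TNR/FNR convention for LR-; inverting that value gives the conventional FNR/TNR.

```python
def ratios_and_predictive_values(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    tpr = tp / (tp + fn)  # sensitivity
    fnr = fn / (tp + fn)
    tnr = tn / (fp + tn)  # specificity
    fpr = fp / (fp + tn)
    # Raises ZeroDivisionError when fp == 0 or fn == 0; adding prior
    # pseudo-counts (as with the Beta(1,1) prior) avoids empty cells.
    return {
        "LR+": tpr / fpr,
        "LR-": tnr / fnr,  # this article's convention: TNR / FNR
        "PPV": tp / (tp + fp),  # valid only if table prevalence = population prevalence
        "NPV": tn / (tn + fn),
    }


summary = ratios_and_predictive_values(tp=264, fp=17, fn=26, tn=273)
print({name: round(value, 4) for name, value in summary.items()})
```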

Example of Prevalence Impact

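Let’s compare PPV, NPV and likelihood ratios at different prevalence levels. In the four scenarios below, the TC+ column is fixed (264 detected, 26 not detected). The TC- column (17 detected, 273 not detected) is multiplied by 1, 2, 10 and 100. Sensitivity, specificity and both likelihood ratios are therefore identical across scenarios. As prevalence falls, PPV drops sharply, from 93.95% to 13.44%, while NPV and accuracy rise.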

	Equal Prevalence	Unbalanced Prevalence	Low Prevalence	Very Low Prevalence
Prevalence (%)	50.00	33.33	9.09	0.99
Contingency table (tp, fp / fn, tn)	264, 17 / 26, 273	264, 34 / 26, 546	264, 170 / 26, 2730	264, 1700 / 26, 27300
Sensitivity (%)	91.03	91.03	91.03	91.03
Specificity (%)	94.14	94.14	94.14	94.14
FNR (%)	8.97	8.97	8.97	8.97
FPR (%)	5.86	5.86	5.86	5.86
LR+	15.53	15.53	15.53	15.53
LR-	10.50	10.50	10.50	10.50
PPV (%)	93.95	88.59	60.83	13.44
NPV (%)	91.30	95.45	99.06	99.90
Accuracy (%)	92.59	93.10	93.86	94.11
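The prevalence effect can be reproduced by holding the TC+ column fixed and scaling the TC- column. The sketch below recomputes each column of the table; the helper name `column_metrics` is hypothetical.

It also evaluates PPV from Bayes’ theorem, PPV = sens·prev / (sens·prev + (1 − spec)·(1 − prev)). That formula uses only sensitivity, specificity and prevalence, and it agrees with tp/(tp + fp) computed from the counts.

```python
def column_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (fp + tn)
    prev = (tp + fn) / n  # proportion of TC+ samples in the table
    return {
        "prevalence": prev,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "accuracy": (tp + tn) / n,
        # PPV via Bayes' theorem from sensitivity, specificity and prevalence only
        "PPV_bayes": sens * prev / (sens * prev + (1 - spec) * (1 - prev)),
    }


scenarios = {"Equal": 1, "Unbalanced": 2, "Low": 10, "Very low": 100}
table = {}
for label, k in scenarios.items():
    # TC+ column fixed (tp=264, fn=26); TC- column (fp=17, tn=273) scaled by k
    table[label] = column_metrics(tp=264, fp=17 * k, fn=26, tn=273 * k)
    m = table[label]
    print(f"{label:<10} prev={100 * m['prevalence']:6.2f}%  PPV={100 * m['PPV']:6.2f}%  "
          f"NPV={100 * m['NPV']:6.2f}%  acc={100 * m['accuracy']:6.2f}%")
```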

Sources of Data for MU Analysis

Data for constructing the contingency table in the context of measurement uncertainty analysis can come from several sources:

Validation (MV) studies conducted by the manufacturer and the user.
Published literature on the performance of similar tests.
Participation in external quality assurance (EQA) programs.
Routine internal quality control (IQC) checks.
Prior estimation of MU.

These data sources can be combined using Bayes’ Theorem, with the posterior from one analysis serving as the prior for the next. When little prior information is available, a uniform non-informative Beta(1,1) prior is commonly used for each target condition (TC+ and TC-). This amounts to adding one pseudo-count to each cell of the contingency table.

Example of Combining Data Using Bayes’ Theorem

Prior Values		Target Condition
		Positive	Negative
Test Result	Detected	1	1
	Not Detected	1	1

Likelihood Values		Target Condition
		Positive	Negative
Test Result	Detected	a	b
	Not Detected	c	d

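Given the above observations, the posterior probability according to Bayes’ Theorem is:

\[P(\text{posterior}) \propto P(\text{prior}) \times P(\text{likelihood})\]

The probability mass function of Binomial(n, p) is:

\[P(x) = \binom{n}{x} \cdot p^x \cdot (1-p)^{n-x}\] \[P(x) \propto p^x \cdot (1-p)^{n-x}\]

The density of a Beta(α, β) prior is proportional to:

\[p^{\alpha-1} \cdot (1-p)^{\beta-1}\]

so the uniform Beta(1,1) prior is a constant.

Hence, for TC+, where p = TPR and the data contain a detected and c not detected results:

\[P(\text{posterior}) \propto (p^{1-1} \cdot (1-p)^{1-1}) \cdot (p^a \cdot (1-p)^c)\] \[P(\text{posterior}) \propto p^{(1+a)-1} \cdot (1-p)^{(1+c)-1}\]

which is a Beta(1 + a, 1 + c) distribution.

Similarly, for TC-, where p = TNR and the data contain d not detected and b detected results:

\[P(\text{posterior}) \propto p^{(1+d)-1} \cdot (1-p)^{(1+b)-1}\]

which is a Beta(1 + d, 1 + b) distribution. The posterior parameters are therefore the prior pseudo-counts plus the observed counts, cell by cell, as shown in this table: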

Posterior Values		Target Condition
		Positive	Negative
Test Result	Detected	1 + a	1 + b
	Not Detected	1 + c	1 + d
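The conjugate update can be checked numerically with the minimal sketch below; the function names `update` and `posterior_mean_grid` are hypothetical.

The sketch first adds the prior and data tables cell by cell. It then integrates the product of the Beta(1,1) prior kernel and the binomial likelihood kernel on a midpoint grid, to confirm that the TC+ posterior mean equals (1 + a)/(2 + a + c), the mean of Beta(1 + a, 1 + c).

Tables are written as nested lists [[Detected TC+, Detected TC-], [Not Detected TC+, Not Detected TC-]]. The example counts a = 30, b = 1, c = 0, d = 29 are the MV data from Case 1 below.

```python
def update(prior: list[list[int]], data: list[list[int]]) -> list[list[int]]:
    """Posterior Beta parameters: prior pseudo-counts plus observed counts, cell by cell."""
    return [[p + x for p, x in zip(prior_row, data_row)]
            for prior_row, data_row in zip(prior, data)]


def posterior_mean_grid(alpha0: float, beta0: float, successes: int, failures: int,
                        steps: int = 20_000) -> float:
    """E[p] under prior x likelihood, normalised numerically with the midpoint rule."""
    weighted = total = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps
        prior = p ** (alpha0 - 1) * (1 - p) ** (beta0 - 1)  # Beta(alpha0, beta0) kernel
        likelihood = p ** successes * (1 - p) ** failures   # binomial kernel
        weighted += p * prior * likelihood
        total += prior * likelihood
    return weighted / total


a, b, c, d = 30, 1, 0, 29
posterior = update([[1, 1], [1, 1]], [[a, b], [c, d]])
print(posterior)  # Beta parameters per cell
print(posterior_mean_grid(1, 1, a, c), (1 + a) / (2 + a + c))
```

Because the Beta prior is conjugate to the binomial likelihood, no numerical integration is needed in practice. The grid only confirms the closed-form result.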

Examples of MU Estimation

Case 1

In this example, the laboratory uses data from the MV study and IQC to estimate MU.

Each 2x2 table below is written as tp, fp / fn, tn (detected TC+, detected TC- / not detected TC+, not detected TC-).

Year	Prior	Data	Posterior	LR+	LR-
-	Uniform non-informative prior: 1, 1 / 1, 1	MV: 30, 1 / 0, 29	31, 2 / 1, 30	$$\frac{31/32}{2/32} = 15.50$$	$$\frac{30/32}{1/32} = 30.00$$
2023	Previous MU: 31, 2 / 1, 30	IQC (2022): 50, 0 / 0, 50	81, 2 / 1, 80	$$\frac{81/82}{2/82} = 40.50$$	$$\frac{80/82}{1/82} = 80.00$$
2024	MU (2023): 81, 2 / 1, 80	IQC (2023): 53, 0 / 0, 53	134, 2 / 1, 133	$$\frac{134/135}{2/135} = 67.00$$	$$\frac{133/135}{1/135} = 133.00$$
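The year-by-year updates in Case 1 amount to repeatedly adding new counts to the previous posterior. The following is a minimal sketch that reproduces the posterior tables and the LR+ and LR- values above; the helper names `add` and `likelihood_ratios` are hypothetical. As in the table, each rate is a posterior mean computed from the posterior pseudo-counts, e.g. TPR = tp/(tp + fn).

```python
Table = list[list[int]]  # [[Detected TC+, Detected TC-], [Not Detected TC+, Not Detected TC-]]


def add(prior: Table, data: Table) -> Table:
    """Conjugate Beta update: add observed counts to the prior pseudo-counts."""
    return [[p + x for p, x in zip(pr, dr)] for pr, dr in zip(prior, data)]


def likelihood_ratios(t: Table) -> tuple[float, float]:
    """LR+ = TPR/FPR and LR- = TNR/FNR, using the posterior mean of each Beta."""
    (tp, fp), (fn, tn) = t
    tpr, fnr = tp / (tp + fn), fn / (tp + fn)
    tnr, fpr = tn / (fp + tn), fp / (fp + tn)
    return tpr / fpr, tnr / fnr


posterior: Table = [[1, 1], [1, 1]]  # uniform Beta(1,1) prior for TC+ and TC-
history = []
for source, data in [("MV", [[30, 1], [0, 29]]),
                     ("IQC (2022)", [[50, 0], [0, 50]]),
                     ("IQC (2023)", [[53, 0], [0, 53]])]:
    posterior = add(posterior, data)  # this update's posterior is the next prior
    lr_pos, lr_neg = likelihood_ratios(posterior)
    history.append((source, posterior, lr_pos, lr_neg))
    print(f"{source:<11} posterior={posterior}  LR+={lr_pos:.2f}  LR-={lr_neg:.2f}")
```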

Case 2

In this example, the laboratory uses data from the EQA program and IQC to estimate MU. The laboratory started the test in 2022 and joined the EQA program in 2023.

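Each 2x2 table below is written as tp, fp / fn, tn.

Year	Prior	Data: EQA	Data: IQC	Posterior	LR+	LR-
2023	1, 1 / 1, 1	-	50, 0 / 0, 50	51, 1 / 1, 51	$$\frac{51/52}{1/52} = 51.00$$	$$\frac{51/52}{1/52} = 51.00$$
2024	51, 1 / 1, 51	16, 1 / 0, 15	50, 0 / 0, 50	117, 2 / 1, 116	$$\frac{117/118}{2/118} = 58.50$$	$$\frac{116/118}{1/118} = 116.00$$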
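When several sources arrive in the same year, as EQA and IQC do in 2024, they can be added to the prior in a single update. Because the update is plain cell-wise addition, the order of the sources does not matter. The following is a minimal sketch; the helper names `pool` and `likelihood_ratios` are hypothetical.

```python
Table = list[list[int]]  # [[Detected TC+, Detected TC-], [Not Detected TC+, Not Detected TC-]]


def pool(*tables: Table) -> Table:
    """Sum any number of 2x2 tables cell by cell (prior plus one or more data sources)."""
    return [[sum(t[r][c] for t in tables) for c in range(2)] for r in range(2)]


def likelihood_ratios(t: Table) -> tuple[float, float]:
    """(LR+, LR-) = (TPR/FPR, TNR/FNR) from posterior-mean rates."""
    (tp, fp), (fn, tn) = t
    return (tp / (tp + fn)) / (fp / (fp + tn)), (tn / (fp + tn)) / (fn / (tp + fn))


uniform: Table = [[1, 1], [1, 1]]
eqa: Table = [[16, 1], [0, 15]]  # EQA data in the 2024 row
iqc: Table = [[50, 0], [0, 50]]  # IQC data (same counts in both rows of the table)

mu_2023 = pool(uniform, iqc)       # no EQA data yet
mu_2024 = pool(mu_2023, eqa, iqc)  # both sources pooled in one update

for year, posterior in [("2023", mu_2023), ("2024", mu_2024)]:
    lr_pos, lr_neg = likelihood_ratios(posterior)
    print(year, posterior, f"LR+={lr_pos:.2f}", f"LR-={lr_neg:.2f}")
```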

Case 3

In this example, the laboratory’s test produces three possible outcomes: positive, intermediate and negative. The test is used for screening, where false negatives have more serious consequences than false positives. For this reason, the technical manager has decided to group the positive and intermediate results into a single category called non-negative.

This reclassification turns the test outcomes into binary results, which fits the methodology described above. The technical manager or laboratory director should provide a clear justification for it. That justification should explain why combining positive and intermediate results is appropriate for the test’s clinical use and objectives.

After reclassification, MU estimation proceeds exactly as in Case 1 and Case 2. The same Bayesian framework is applied to the non-negative (positive + intermediate) and negative categories.

Before reclassification:

Before reclassification		Target Condition
		Positive	Negative
Test Result	Detected	a	b
	Intermediate	c	d
	Not Detected	e	f

After reclassification, the number of false positives increases to b + d, while the number of false negatives (e) is unchanged from before the reclassification:

After reclassification		Target Condition
		Positive	Negative
Test Result	Non-negative (Detected + Intermediate)	a + c	b + d
	Not Detected	e	f
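The reclassification is a simple merge of two rows, after which the usual update applies. The following is a minimal sketch; the function name `reclassify` is hypothetical. The counts are hypothetical and purely illustrative, not taken from any study.

```python
Table = list[list[int]]


def reclassify(table_3x2: Table) -> Table:
    """Merge the Detected and Intermediate rows into one Non-negative row.

    Input rows: Detected (a, b), Intermediate (c, d), Not Detected (e, f);
    columns: TC+, TC-.
    """
    (a, b), (c, d), (e, f) = table_3x2
    return [[a + c, b + d],  # Non-negative: tp = a + c, fp = b + d
            [e, f]]          # Not Detected: fn = e, tn = f


# Hypothetical counts, for illustration only (not from any study)
before = [[40, 2],
          [5, 3],
          [1, 45]]
after = reclassify(before)

# Same conjugate update as in Cases 1 and 2: uniform Beta(1,1) prior + counts
posterior = [[p + x for p, x in zip(pr, ar)] for pr, ar in zip([[1, 1], [1, 1]], after)]
(tp, fp), (fn, tn) = posterior
lr_pos = (tp / (tp + fn)) / (fp / (fp + tn))
lr_neg = (tn / (fp + tn)) / (fn / (tp + fn))
print(after, posterior, f"LR+={lr_pos:.2f}", f"LR-={lr_neg:.2f}")
```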

References
