On the relative performance of diagnostic tests


A fictional scenario
We continue with the clinical research scenario discussed in the previous two parts of this article series (Papageorgiou, 2020, 2021). Briefly, an orthodontist is trying to retrospectively assess the prevalence of gingival recession among patients within a large university clinic using two different diagnostic tests (Papageorgiou, 2020). These two tests have different diagnostic performances, which are summarised in Table 1. However, when applying a diagnostic test to a specific population, it is important to bear in mind that even if the test's diagnostic performance remains the same, the prevalence of the disease within the selected sample might vary. Take, as an example, the development of gingival recession on any tooth of the dentition, measured at 15, 18 and 21 years of age. We also want to assess the contribution of orthodontic treatment to the development of gingival recession, so we have two patient groups: (1) one treated orthodontically between 12 and 15 years of age and checked again at 18 and 21 years of age for any recession; and (2) an untreated control group also checked for recession at 15, 18 and 21 years of age. Based on some existing data on this scenario (Renkema et al., 2013), we will assume that the true prevalence of gingival recession for the two patient groups at the corresponding timepoints varies as shown in Table 2.
We will also assume the orthodontist is testing 200 treated patients and 200 untreated controls (equally divided between the two tests; each patient tested only once). In this scenario, which of the following statements is correct, if any?
(a) The observed risk ratio for the development of gingival recession due to orthodontic treatment (as compared with no treatment) will be the same for both diagnostic tests and will be 15%/5% = 3.0 at 18 years of age and 35%/16.7% = 2.1 at 21 years of age.
(b) The observed risk ratio for the development of gingival recession due to orthodontic treatment (as compared with no treatment) will be different for the two diagnostic tests.
(c) The percentage of true gingival recession that will be missed by the diagnostic test will be different between the two diagnostic tests; this will be independent of the follow-up examination.
(d) The percentage of true gingival recession that will be missed by the diagnostic test will be different between the two diagnostic tests and will vary according to the follow-up examination.
(e) If we now run significantly more tests (500 tests per group/timepoint instead of 100; total of 1000 instead of 200), the percentage of true gingival recession that will be missed by the diagnostic test will be improved (reduced).

Answers
To calculate the risk of recession for each of the investigated groups and timepoints, we first need to calculate how many recessions will be picked up by the test in each instance. Starting from the nominal assumed prevalence, we can multiply it by the number of patients per test to reach the true number of recessions present in the study sample ('True recessions' column in Table 3). As can be expected, these numbers are the same for the two diagnostic tests, since reality is conventionally not affected by which test we might be using. Calculating the recessions found by the orthodontist is, however, another story, since the test's diagnostic performance comes into play. Here we multiply the sensitivity of each test by the number of true recessions that exist in a specific patient group at each timepoint and arrive at the number of recessions the clinician will be able to identify ('Recessions found' column in Table 3). As can be seen, the two diagnostic tests find different numbers of recessions in the same group and at the same timepoint (for example, 29/100 recessions at treated T21 with test 1 and 33/100 recessions at treated T21 with test 2). Therefore, the observed risk ratio for the development of gingival recession due to orthodontic treatment (as compared with no treatment) will be different for the two diagnostic tests. At 18 years of age it will be (13/100)/(4/100) = 3.25 for diagnostic test 1 and (14/100)/(5/100) = 2.80 for diagnostic test 2. Accordingly, at 21 years of age it will be (29/100)/(14/100) = 2.07 for diagnostic test 1 and (33/100)/(16/100) = 2.06 for diagnostic test 2. So statement (a) is wrong and statement (b) is correct.
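The arithmetic above can be reproduced with a minimal Python sketch. It uses only the 'Recessions found' counts quoted in the text and the prevalence-based ratios from statement (a); the variable and function names (`found`, `risk_ratio`, `nominal`) are illustrative assumptions, not part of the original article, and the sketch does not attempt to recompute Table 3 from the sensitivities in Table 1.

```python
from fractions import Fraction

# Patients per group and per test: 200 per group, split equally over two tests.
N_PER_CELL = 100

# 'Recessions found' counts as quoted in the text (taken from Table 3).
found = {
    ("test 1", "T18"): {"treated": 13, "untreated": 4},
    ("test 2", "T18"): {"treated": 14, "untreated": 5},
    ("test 1", "T21"): {"treated": 29, "untreated": 14},
    ("test 2", "T21"): {"treated": 33, "untreated": 16},
}

# Prevalence-based ratios as written in statement (a), for comparison.
nominal = {"T18": 15 / 5, "T21": 35 / 16.7}


def risk_ratio(treated_found, untreated_found, n=N_PER_CELL):
    """Observed risk in the treated group divided by that in the controls."""
    return Fraction(treated_found, n) / Fraction(untreated_found, n)


observed = {
    key: float(risk_ratio(counts["treated"], counts["untreated"]))
    for key, counts in found.items()
}

for (test, timepoint), rr in observed.items():
    print(
        f"{test}, {timepoint}: observed risk ratio = {rr:.2f} "
        f"(prevalence-based ratio = {nominal[timepoint]:.1f})"
    )
```

Running the sketch prints one line per test and timepoint, which makes it easy to see that the observed risk ratio depends on the test used, most visibly at T18, where the counts in the untreated group are small.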
As far as the number of true gingival recessions missed by our diagnostic test is concerned, Table 3 shows that this differs substantially between the two diagnostic tests. Test 1 has considerably lower sensitivity and therefore misses more recessions than test 2. At the same time, the rate of missed true recessions fluctuates strikingly, especially at T15 and T18, where the prevalence of recession is very low. This is one of the pitfalls of taking a diagnostic test that has been developed and validated for use on high-risk symptomatic patients and using it indiscriminately to screen a low-risk asymptomatic general population. So, statement (c) is wrong and statement (d) is correct. In order to address statement (e), we can look solely at T21 to make things easier (

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.