Early stopping in clinical PET studies: How to reduce expense and exposure

Clinical positron emission tomography (PET) research is costly and entails exposing participants to radioactivity. Researchers should therefore aim to include just the number of subjects needed to fulfill the purpose of the study. In this tutorial we show how to apply sequential Bayes Factor testing in order to stop the recruitment of subjects in a clinical PET study as soon as enough data have been collected to draw a conclusion. Using simulations, we demonstrate that it is possible to stop a study early while keeping the number of erroneous conclusions low. We then apply sequential Bayes Factor testing to a real PET data set and show that it is possible to obtain support in favor of an effect while simultaneously reducing the sample size by 30%. Using this procedure allows researchers to reduce expense and radioactivity exposure for a range of effect sizes relevant for PET research.

This corresponds to a posterior probability of 75% for H1 and 25% for H0. Although posterior odds are the natural extension of the BF, researchers tend to only report the BF as a stand-alone metric for assessing different hypotheses, due to the inherent subjective nature of specifying prior odds.
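The step from odds to probabilities can be illustrated with a minimal sketch. The function name and the example values (a BF of 3 combined with prior odds of 1) are assumptions chosen only because they reproduce the 75% / 25% split mentioned above; any combination giving posterior odds of 3:1 would do.

```python
def posterior_prob_h1(bf10: float, prior_odds: float = 1.0) -> float:
    """Posterior probability of H1, given BF10 and prior odds P(H1)/P(H0)."""
    posterior_odds = bf10 * prior_odds          # posterior odds = BF x prior odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical values: BF10 = 3 and equal prior odds give posterior odds of 3:1.
p_h1 = posterior_prob_h1(bf10=3.0, prior_odds=1.0)
print(f"P(H1 | data) = {p_h1:.2f}, P(H0 | data) = {1 - p_h1:.2f}")
```

Changing `prior_odds` shows directly how the same BF leads to different posterior probabilities, which is why the BF is usually reported on its own.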

Functional form of the Cauchy distribution
The functional form of the Cauchy density function can be expressed as:

$$f(x; x_0, r) = \frac{1}{\pi r \left[1 + \left(\frac{x - x_0}{r}\right)^2\right]}$$

where x0 is the location parameter, specifying the location of the peak of the distribution, and r is the scale parameter, which specifies the half-width at half-maximum.
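As a small illustration of the roles of the two parameters, the sketch below evaluates the standard Cauchy density and checks the half-width at half-maximum property. The function name and the default r = 0.707 are chosen for illustration only.

```python
import math


def cauchy_pdf(x: float, x0: float = 0.0, r: float = 0.707) -> float:
    """Cauchy density with location x0 (peak position) and scale r."""
    return 1.0 / (math.pi * r * (1.0 + ((x - x0) / r) ** 2))


peak = cauchy_pdf(0.0)        # density at the location parameter
at_scale = cauchy_pdf(0.707)  # density one scale unit away from the peak
print(f"peak = {peak:.4f}, ratio at x0 + r = {at_scale / peak:.2f}")
```

One scale unit away from x0 the density has dropped to half of its peak value, which is what "half-width at half-maximum" refers to.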

Jeffreys's Bayesian t-test
The "default" BF t-test (using a zero-centered Cauchy to describe the prior over the parameter of interest under the alternative hypothesis) is also known as Jeffreys's Bayesian t-test and can be calculated using the conventional t-statistic (t):

$$BF_{10} = \frac{\int_0^\infty (1 + n g r^2)^{-1/2} \left(1 + \frac{t^2}{v (1 + n g r^2)}\right)^{-(v+1)/2} (2\pi)^{-1/2} g^{-3/2} e^{-1/(2g)} \, dg}{\left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}}$$

where r is the width parameter of the Cauchy, n is the number of observations, v is n-1 (i.e., the degrees of freedom), and g is the variable of integration.

Nmax (20 subjects/group). BF sequential testing is the blue line with shaded area denoting ± 1 SD.

Figure S9. Settings for the simulation: H1 is described by one-sided Cauchy(0,0.707), (Nstart = 12, Nmax = 50, BF threshold = 4). A and B) The black curve shows the proportion of studies that showed support for H1 (BF>4) during data collection, at a range of population effects (starting at no effect, D = 0). The red curve is the proportion of studies showing support for H0 (BF<¼). The blue curve is the sum of the red and black curves. C and D) show the average number of subjects needed to reach a stopping decision at different population effects. The flat black line represents Nmax (50 subjects/group). BF sequential testing is the blue line with shaded area denoting ± 1 SD.
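The stopping rule summarized by the settings Nstart, Nmax and BF threshold can be sketched as follows. This is a simplified illustration, not the code behind the simulations: it assumes a two-sided Cauchy prior and a one-sample (paired) design, whereas the figures use a one-sided Cauchy and report subjects per group. The function names, the midpoint integration scheme and the example data are all assumptions.

```python
import math
import statistics


def jzs_bf10(t: float, n: int, r: float = 0.707, steps: int = 4000) -> float:
    """Two-sided one-sample BF10 from a t-statistic, via midpoint integration."""
    v = n - 1  # degrees of freedom

    def integrand(g: float) -> float:
        a = 1.0 + n * g * r * r
        return (a ** -0.5
                * (1.0 + t * t / (a * v)) ** (-(v + 1) / 2)
                * (2.0 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1.0 / (2.0 * g)))

    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps                    # midpoint in (0, 1)
        g = u / (1.0 - u)                        # maps (0, 1) onto (0, inf)
        total += integrand(g) / (1.0 - u) ** 2   # Jacobian dg/du
    numerator = total / steps
    denominator = (1.0 + t * t / v) ** (-(v + 1) / 2)
    return numerator / denominator


def sequential_bf(data, n_start=12, n_max=50, threshold=4.0, r=0.707):
    """Test after each added observation; return (decision, n_used, bf)."""
    last_n = min(n_max, len(data))
    bf = float("nan")
    for n in range(n_start, last_n + 1):
        sample = data[:n]
        t = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
        bf = jzs_bf10(t, n, r)
        if bf > threshold:
            return "H1", n, bf         # enough support for an effect
        if bf < 1.0 / threshold:
            return "H0", n, bf         # enough support for no effect
    return "inconclusive", last_n, bf  # Nmax reached without a decision


# Hypothetical paired differences with a clear positive shift.
example = [1.0 + 0.1 * ((i % 5) - 2) for i in range(50)]
print(sequential_bf(example))
```

With real data, `data` would hold the observations in the order in which subjects were examined, so that each pass through the loop corresponds to one intermittent test.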
Figure S10. Settings for the simulation: H1 is described by one-sided Cauchy(0,0.707), (Nstart = 12, Nmax = 100, BF threshold = 4). A and B) The black curve shows the proportion of studies that showed support for H1 (BF>4) during data collection, at a range of population effects (starting at no effect, D = 0). The red curve is the proportion of studies showing support for H0 (BF<¼). The blue curve is the sum of the red and black curves. C and D) show the average number of subjects needed to reach a stopping decision at different population effects. The flat black line represents Nmax (100 subjects/group). BF sequential testing is the blue line with shaded area denoting ± 1 SD.

Figure S11. Comparison between sequential BF testing and two common NHST alpha spending approaches, using the real clinical data set from Objective 3. The O'Brien-Fleming NHST alpha spending approach starts out with a conservative stopping boundary and gradually becomes more liberal for each intermittent test. The Pocock boundary starts out with a more liberal boundary, increases slightly and then gradually decreases towards the end of the sequential testing. Assuming there is a large population effect, the Pocock and BF approaches will, on average, be able to stop at an earlier phase of the study. However, if the population effect is smaller, then the O'Brien-Fleming approach might be more suitable, since it has a greater average chance of finding a true positive towards the end of the study. If PET researchers still wish to use sequential BF testing but are interested in obtaining stopping characteristics similar to those shown by the O'Brien-Fleming approach, they can plan a study that starts out with a high BF threshold and then gradually decreases it for each intermittent test. Such an approach is, however, outside the scope of this tutorial.

Figure S12. Settings for the simulation: H1 is a one-sided Cauchy(0,1), (Nstart = 8, Nmax = 30, BF threshold = 4). A) The black curve shows the proportion of studies that showed support for H1 (BF>4) during data collection, at a range of population effects (starting at no effect, D = 0) [...] Figure 5 from the main text, which has the same settings except for: 1) H1: Cauchy(0, 0.707) rather than H1: Cauchy(0, 1), and 2) testing starts at 12 subjects.
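To make the contrast between the two NHST approaches in Figure S11 concrete, the sketch below computes how much of the total alpha has been spent at different stages of a study. It uses the Lan-DeMets spending functions of O'Brien-Fleming and Pocock type, which is an assumption: the source does not state which spending functions or software produced the figure. The function names, the two-sided alpha of 0.05 and the four equally spaced looks are likewise chosen for illustration.

```python
import math
from statistics import NormalDist


def obf_spent(frac: float, alpha: float = 0.05) -> float:
    """O'Brien-Fleming-type cumulative alpha spent at information fraction frac."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(z / math.sqrt(frac)))


def pocock_spent(frac: float, alpha: float = 0.05) -> float:
    """Pocock-type cumulative alpha spent at information fraction frac."""
    return alpha * math.log(1.0 + (math.e - 1.0) * frac)


# Fraction of the planned sample collected at each hypothetical look.
for frac in (0.25, 0.5, 0.75, 1.0):
    print(f"{frac:.2f}  OBF = {obf_spent(frac):.5f}  Pocock = {pocock_spent(frac):.5f}")
```

Both functions spend the full alpha by the end of the study, but the O'Brien-Fleming type spends almost nothing at the first looks, which is why it rarely stops early and retains more power for the final test.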
Increasing the width of the Cauchy can be desirable in, e.g., a paired study design, where the variance can be expected to be low, but the difference in raw scores is assumed to be similar to that from a cross-sectional design. The reason for this is that the same change in raw score will correspond to a much larger standardized effect size (see Table 2 in main text), and it is therefore sensible to specify an H1 which predicts a more extreme difference. The lower rate of false positives that results from using a higher r (e.g., 1 instead of 0.707) can be utilized by starting the sequential BF testing earlier than at 12 subjects/group. As a result, the average sample size needed to reach a decision can be further decreased.

Figure S13. Recommended steps to follow in order to perform a clinical PET study using sequential BF testing, for a paired or cross-sectional design.
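The argument for a wider Cauchy in paired designs, namely that the same raw change maps to a larger standardized effect, can be illustrated with a short sketch. It assumes equal variances at both measurements, so that the SD of the within-subject differences is SD × sqrt(2(1 − ρ)), with ρ the correlation between the two measurements. The function names and all numbers are hypothetical and are not taken from Table 2 of the main text.

```python
import math


def d_cross_sectional(raw_diff: float, sd: float) -> float:
    """Standardized effect for two independent groups with common SD."""
    return raw_diff / sd


def d_paired(raw_diff: float, sd: float, rho: float) -> float:
    """Standardized effect of within-subject differences (equal variances assumed)."""
    sd_diff = sd * math.sqrt(2.0 * (1.0 - rho))
    return raw_diff / sd_diff


# Hypothetical values: raw difference 0.1, between-subject SD 0.2, correlation 0.9.
print(f"cross-sectional: {d_cross_sectional(0.1, 0.2):.2f}")
print(f"paired:          {d_paired(0.1, 0.2, 0.9):.2f}")
```

The higher the correlation between repeated measurements, the larger the standardized effect for a fixed raw difference, which is the situation in which a prior with a larger r is argued to be sensible.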