Multi-marker risk-based screening for prostate cancer

Objective To determine prostate cancer screening performance using prostate specific antigen (PSA) along with other markers, expressing markers in age-specific multiples of the median (MoM), and age. Methods A prospective nested case-control study used stored serum from 571 men who died of, or with history of, prostate cancer (cases), and 2169 matched controls. Total, free and intact PSA, human kallikrein-related peptidase 2 (hK2), and microseminoprotein were measured and converted into MoM values. Screening marker distribution parameters were estimated in cases and controls. Monte Carlo simulation used these in a risk-based algorithm to estimate screening performance (detection rates [DRs] and false-positive rates [FPRs]). Results Almost all (99%) cases occurred aged ≥55. Marker values were similar in cases who did and did not die of prostate cancer. Combining age, total PSA and hK2 MoM values (other markers added little or no discrimination) yielded a 1.2% FPR (95% CI 0.2–4.8%) for a 90% DR (59–98%) in men who died of or with a prostate cancer diagnosis within 5 years of blood collection (risk cut-off 1 in 20), about three-quarters less than the 4.5% FPR using total PSA alone measured in ng/ml for the same 90% DR (cut-off 3.1 ng/ml). Screening performance over 10 years yielded a 33% (22–46%) FPR for a 90% DR. Conclusion Screening performed up to every 5 years from age 55 using the multi-marker risk-based screening algorithm for future prostate cancer achieves a high DR and a much lower FPR than using PSA alone, resulting in reductions in overdiagnosis and overtreatment.


Introduction
In 2017 there were 10,755 deaths from prostate cancer in England and Wales, 1 and 30,468 deaths in the USA, 2 representing the second commonest cause of death from cancer in men after lung cancer. A man's age has a strong influence on the risk of prostate cancer. Prostate specific antigen (PSA) has been found to be a discriminatory screening test for prostate cancer; the pooled result from four cohort studies totalling 49,261 healthy men showed that men aged 60-74 who had a PSA level ≥12 times the normal median level had about a 50% chance of developing clinical prostate cancer in the next three years. 3 Parkes et al. [1995] converted PSA concentrations into age-specific multiples of the median (MoM) in men who did not present or die of prostate cancer. 3 Whilst gestational age and centre-specific MoM values are widely used in prenatal screening, where their advantage over mass units is well recognised, PSA MoM values have not been adopted in place of mass units in prostate cancer screening.
A drawback of using PSA to screen for prostate cancer is that it can be raised in the absence of prostate cancer in men who have other conditions or in men who have prostate cancer that is not aggressive. Raised PSA in such men essentially leads to false-positive results. Several task forces have recommended against using PSA as a screening test for prostate cancer. 4 To help overcome the problem of false-positives, other markers have been proposed, such as free PSA and human kallikrein-related peptidase 2 (hK2), a molecule similar to PSA 5 but different in its enzymatic activity 6 and beta-microseminoprotein (MSP). 7 Age and the mass concentrations of total PSA, free PSA, intact PSA and hK2 (sometimes collectively referred to as the kallikrein or 4K markers) have been incorporated into an algorithm, which has been proposed to predict significant (Gleason score ≥7) prostate cancer. 8 The same panel has been proposed as a way of limiting the need for a biopsy in men with raised PSA levels. 9 In one study, the ratio of free to total PSA had little or no effect on improving screening performance over PSA alone, suggesting that combining all the markers may not be necessary. 10 We here examine the value of age plus total PSA, intact PSA, free PSA, hK2 and MSP MoM values in different combinations to determine the screening performance of combining the markers that are found to be informative in a risk-based screening algorithm, and do so according to screening interval. We quantify the extent to which the best combination of markers together with age improves screening performance compared to using PSA alone.

Methods
We used data from the British United Provident Association (BUPA) cohort study to (i) determine the association between PSA and future prostate cancer over different periods of follow-up (interval between blood sampling and prostate cancer registration or prostate cancer deaths recorded on the death certificate); (ii) determine the screening performance in terms of the detection rate (the proportion of men who will be affected during a specified time interval with a positive result, also known as sensitivity) and the false positive rate (proportion of men who will be unaffected during the same time interval with a positive result); (iii) determine whether the screening performance differs between men who died of prostate cancer and those who died of other causes but with a history of prostate cancer; (iv) examine several markers in addition to total PSA, namely free PSA, intact PSA, hK2 and MSP; (v) determine whether expressing PSA, hK2 and MSP in MoM values in men of the same age tested in the same laboratory is more discriminatory with respect to future prostate cancer than PSA expressed in mass units (ng/ml); and (vi) incorporate age and any of the other markers shown to be independently predictive of prostate cancer in a multivariate screening algorithm model using a person's risk of prostate cancer over a specified period of time as the screening variable. The BUPA cohort study was approved by the BUPA Ethics Committee.
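The two performance measures defined in (ii) can be illustrated with a minimal sketch. The function name and the marker values below are invented for illustration and are not study data; a screen-positive result is taken to be a marker value at or above the cut-off.

```python
def detection_and_false_positive_rate(affected, unaffected, cutoff):
    """Return (DR, FPR) for the rule 'screen positive if value >= cutoff'.

    DR  = proportion of affected men with a positive result.
    FPR = proportion of unaffected men with a positive result.
    """
    dr = sum(v >= cutoff for v in affected) / len(affected)
    fpr = sum(v >= cutoff for v in unaffected) / len(unaffected)
    return dr, fpr


# Made-up marker values (ng/ml), for illustration only.
example_affected = [2.0, 5.0, 8.0, 12.0]
example_unaffected = [0.5, 0.8, 1.0, 1.5, 2.5, 3.5, 0.9, 1.1]

dr, fpr = detection_and_false_positive_rate(
    example_affected, example_unaffected, cutoff=3.0
)
print(f"DR = {dr:.1%}, FPR = {fpr:.1%}")
```

Raising the cut-off lowers both rates together, which is why performance is reported as a detection rate for a specified false-positive rate, or vice versa.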
Cases of prostate cancer were ascertained from national death records (to obtain deaths from prostate cancer) and from the national cancer registry (to obtain notifications of prostate cancer). A total of 571 affected men (men with prostate cancer [ICD 8 and ICD 9 codes 185, ICD 10 code C61] on death certificates or cancer registrations) were identified from the BUPA cohort study. Of these, 324 died of prostate cancer (ICD 8 and ICD 9 codes 185, ICD 10 code C61 on death certificates) and 247 died with prostate cancer (above codes not on death certificates). Gleason scores were not available. Each was matched (for age at collection of serum sample to within same 5-year group, duration of storage of the serum sample and hence length of follow-up [calendar year] and number of freeze-thaw cycles) to 4 unaffected men (men who did not die of, or with a history of, prostate cancer). Serum samples for all were retrieved from storage, shipped on dry ice, thawed and 0.3 ml pipetted into sample tubes. All laboratory analyses were conducted blind to prostate cancer status. Details of the biochemical analyses are given in the Supplementary Appendix. For 107 unaffected men there was insufficient sample to perform the assays, leading to a total of 2169 unaffected men. In some of the statistical analyses the number of cases and controls is slightly lower than the total due to missing data.
Screening performance was estimated using the multivariate Gaussian distribution method employed in prenatal screening for Down's syndrome. 11 A screen-positive result was defined as the risk of developing prostate cancer within a specified time interval greater than or equal to a specified risk cut-off. Estimates of the detection rate and false-positive rate at or greater than specified risk cut-offs for specified durations of follow-up were computed. Details of the statistical methods are given in the Supplementary Appendix. Confidence intervals for the estimates of screening performance were derived by bootstrapping, with 500 dataset replications. Table 1 shows relevant characteristics of the men in the dataset, namely the age distribution of the men, the length of follow-up between blood sampling and prostate cancer death or prostate cancer registration, and median marker levels in mass units among affected and unaffected men. Figure S1 shows the increase in median marker levels in ng/ml with increasing age in unaffected men, together with the regression lines. Total, free and intact PSA concentrations followed log-quadratic increases with age (p = 0.001, p = 0.003 and p = 0.010 respectively); log-quadratic fits were statistically significantly better than log-linear fits. hK2 and MSP log concentrations increased linearly (both p < 0.001); log-quadratic fits were not statistically significantly better. The regression equations, which were used to calculate MoM values, are shown in the footnote to Figure S1. Table 2 shows the median MoM values in men who died of prostate cancer and those who died of other causes but with a history of prostate cancer after 0-5 years of follow-up, 0-10 years of follow-up and at any time. The MoM values decreased as the interval between sample collection and prostate cancer registration or death increased.
There were no statistically significant differences in marker values between men who died of prostate cancer and those who died having had a registration of prostate cancer. Figure 1 shows the median marker levels in MoM values according to length of follow-up in affected and unaffected men. Total PSA shows the greatest discrimination; the median MoM in affected men was 14.4 in those who died of, or with, prostate cancer within five years compared to 1.0 MoM in unaffected men. With successive 5-year intervals of follow-up, the median MoM values in men who died of, or had a registration of, prostate cancer declined to 3.7, 2.4, 1.7 and, for 20+ years, 1.4. The next most discriminatory marker was intact PSA followed by free PSA and then hK2. There was no statistically significant discrimination in MSP MoM values for any length of follow-up and therefore this marker is not considered further. Allowing for length of follow-up, there were no statistically significant changes in MoM values for free PSA, total PSA, intact PSA or hK2 in affected men according to age, as well as, by definition, in unaffected men.
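The conversion of a concentration into an age-specific MoM can be sketched as follows, assuming a log-quadratic regression of the median on age of the kind described for total PSA. The coefficients are hypothetical placeholders chosen only to give a plausible rise with age; the fitted equations are those in the footnote to Figure S1 and are not reproduced here.

```python
# Hypothetical coefficients (a, b, c) for
#   log10(median total PSA in ng/ml) = a + b*age + c*age**2
# in unaffected men. Placeholders for illustration only.
HYPOTHETICAL_COEFFS = (-0.50, 0.005, 0.0001)


def expected_median(age, coeffs=HYPOTHETICAL_COEFFS):
    """Regressed median concentration in unaffected men of a given age."""
    a, b, c = coeffs
    return 10 ** (a + b * age + c * age ** 2)


def to_mom(concentration, age, coeffs=HYPOTHETICAL_COEFFS):
    """Express a concentration as a multiple of the age-specific median."""
    return concentration / expected_median(age, coeffs)


# The same 3.0 ng/ml result corresponds to a higher MoM in a younger man,
# because the expected median is lower at younger ages.
for age in (55, 70):
    print(age, round(to_mom(3.0, age), 2))
```

By construction an unaffected man whose concentration equals the regressed median for his age has a MoM of 1.0, whatever his age.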

Results
The PSA markers and hK2 MoM values showed approximately log Gaussian distributions both in affected and unaffected men with follow-up of up to 5 years and 10 years (see Figures S2 and S3). Tables S1 and S2 show the distribution parameters of these markers in affected and unaffected men (means, standard deviations and truncation limits in Table S1, correlation coefficients in Table S2). Total, free and intact PSA (log) MoM values were highly correlated in affected and unaffected men. There was less correlation between hK2 and total, free or intact PSA (log) MoM values. Table 3 shows detection rates for specified false-positive rates of marker values expressed in mass units (ng/ml) and in MoM values. The table shows higher detection rates for specified false-positive rates when the marker values are expressed in MoM values compared with mass units (ng/ml). For example, with a 3% false-positive rate the detection rates were 90% v 88% with a follow-up of ≤5 years or 49% v 40% with a follow-up of ≤10 years. Intact PSA and hK2 show the biggest improvement in screening performance with the use of MoM values, for example at a 3% false-positive rate the use of MoM values increases the detection rate of hK2 from 35% to 45%. Table S3 shows detection rates for a greater range of specified false-positive rates. Table S4 shows, in a similar way to Table 3, false-positive rates for specified detection rates. At a 90% detection rate the false-positive rate using total PSA in MoMs was 2.8% compared to 4.5% in mass units (cut-off 3.1 ng/ml). Figure 2A shows false-positive rates for a 90% detection rate for single markers and for combinations of markers with age according to length of follow-up of ≤5 years and <10 years. Total PSA and hK2 and age in combination yielded a 1.2% false-positive rate for a 90% detection rate compared with a false-positive rate of 2.2% and 23%, respectively, for total PSA and hK2 separately with a follow-up ≤5 years.
Table S5 shows the false-positive rates for a range of detection rates (not just 90%). Figure 2B shows, in a similar way to Figure 2A, detection rates for a 3% false-positive rate. The most discriminatory combination of markers was total PSA and hK2, with a 93% detection rate with a follow-up ≤5 years and a 54% detection rate with a follow-up <10 years. Table S5 shows detection rates for a range of false-positive rates. Table 4 shows the detection and false-positive rates according to mass unit cut-offs (from 1 to 7 ng/ml) for total PSA and, for given detection rates, the false-positive rates based on the risk-based screening algorithm that combines total PSA and hK2 with age. Using the algorithm the false-positive rates were between about 40% and 70% lower; for example using a cut-off of 4 ng/ml the detection rate for total PSA alone was 86% and the false-positive rate was 2%, but for the same 86% detection rate the false-positive rate using the screening algorithm was 0.5%. Tables 3 and 4, and Figure 2 taken together show that screening performance is increased by (i) expressing marker values in age-specific MoM values instead of ng/ml and (ii) including age in the screening algorithm. For example at a 3% false-positive rate the detection rate for total PSA MoM with age is 93% (Figure 2) but without age is 90% (Table 3), and without expressing total PSA in MoM values is 88% (Table 3). Table 5 shows detection rates, false-positive rates and odds of becoming affected given a positive test according to specified risk cut-offs for total PSA with age and total PSA and hK2 expressed in MoMs and combined with age; with a follow-up of ≤5 years. At a 1 in 20 risk cut-off, using total PSA and hK2 together with age yields a 90% detection rate for a 1.2% false-positive rate and an odds of becoming affected over 5 years of 1:0.9. 
Figure 3 shows the distributions of risk estimates based on total PSA and hK2 combined with age with a follow-up of ≤5 years in simulated populations of men who did and did not die of or with future prostate cancer. There is wide separation between the two distributions illustrating the high screening performance of the screening algorithm.
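A minimal sketch of how risk distributions of this kind can be simulated is given below. It assumes two markers whose log10 MoM values follow bivariate Gaussian distributions in affected and unaffected men. Every parameter, including the prior risk, is a hypothetical placeholder rather than a value from Tables S1 and S2, and the truncation limits used in the study are omitted for brevity. Each simulated man's risk is his prior odds multiplied by the likelihood ratio of his marker values.

```python
import math
import random

# Hypothetical log10 MoM parameters for two markers:
# (mean_1, mean_2, sd_1, sd_2, correlation). Not the study's values.
AFFECTED = (1.0, 0.3, 0.5, 0.4, 0.4)
UNAFFECTED = (0.0, 0.0, 0.3, 0.3, 0.3)
PRIOR_RISK = 0.015     # hypothetical age-specific 5-year risk
RISK_CUTOFF = 1 / 20   # screen positive if risk >= 1 in 20


def draw(params, rng):
    """Draw one correlated pair of log10 MoM values."""
    m1, m2, s1, s2, r = params
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return m1 + s1 * z1, m2 + s2 * (r * z1 + math.sqrt(1 - r * r) * z2)


def log_density(x, y, params):
    """Log of the bivariate Gaussian density at (x, y)."""
    m1, m2, s1, s2, r = params
    u, v = (x - m1) / s1, (y - m2) / s2
    q = (u * u - 2 * r * u * v + v * v) / (1 - r * r)
    return -q / 2 - math.log(2 * math.pi * s1 * s2 * math.sqrt(1 - r * r))


def risk(x, y, prior=PRIOR_RISK):
    """Posterior risk = prior odds x likelihood ratio, as a probability."""
    log_lr = log_density(x, y, AFFECTED) - log_density(x, y, UNAFFECTED)
    odds = prior / (1 - prior) * math.exp(min(log_lr, 700.0))  # cap avoids overflow
    return odds / (1 + odds)


def simulate(n=20000, seed=1):
    """Estimate (DR, FPR) at RISK_CUTOFF from n simulated men per group."""
    rng = random.Random(seed)
    dr = sum(risk(*draw(AFFECTED, rng)) >= RISK_CUTOFF for _ in range(n)) / n
    fpr = sum(risk(*draw(UNAFFECTED, rng)) >= RISK_CUTOFF for _ in range(n)) / n
    return dr, fpr


dr, fpr = simulate()
print(f"DR = {dr:.1%}, FPR = {fpr:.1%}")
```

Because the placeholder parameters are invented, the printed rates do not reproduce the study's estimates; the sketch only shows the mechanics of turning marker distributions and a prior risk into a detection rate and false-positive rate at a risk cut-off.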

Screening performance
Our results show that a multi-marker risk-based screening algorithm incorporating a man's age, total PSA value, hK2 value, with the markers expressed in MoMs yields an improved screening performance compared to using total PSA alone. The predictive effect of PSA is similar for men whose cause of death is prostate cancer and those who have non-fatal prostate cancer. Our results show that the proposed screening algorithm used to identify men with a 5-year risk of 1 in 20 (5%) achieves a 90% detection rate for a 1.2% false-positive rate, which yields an odds of becoming affected over 5 years given a positive result of about 1:1 (50%).
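The relationship between detection rate, false-positive rate and the odds of being affected given a positive result can be sketched as below. The 5-year prevalence is an assumed value chosen for illustration, not a figure reported in this study; with it, the reported 90% detection rate and 1.2% false-positive rate give odds close to 1:0.9.

```python
def odds_affected_given_positive(detection_rate, false_positive_rate, prevalence):
    """Return x in the odds '1:x' of being affected given a positive result.

    True positives per man screened  = prevalence * DR
    False positives per man screened = (1 - prevalence) * FPR
    """
    true_positives = prevalence * detection_rate
    false_positives = (1 - prevalence) * false_positive_rate
    return false_positives / true_positives


# ASSUMPTION: a 5-year prevalence of 1.45% among men screened (illustrative).
x = odds_affected_given_positive(0.90, 0.012, prevalence=0.0145)
probability = 1 / (1 + x)
print(f"odds 1:{x:.1f}, probability {probability:.0%}")
```

The odds depend on prevalence as well as on the test, so the same detection and false-positive rates applied to a lower-risk population would give less favourable odds.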
In this study, men who developed prostate cancer were matched with men who did not die with, or of, prostate cancer according to age, duration of storage of serum samples and number of freeze-thaw cycles to ensure comparability. The screening algorithm incorporates the discrimination of age, with the age-specific rates taken from Cancer Research UK cancer statistics based on national data; age data from our study could not be used because cases and controls were age-matched. Although hK2 alone is a relatively weak screening marker for prostate cancer it is relatively independent of PSA so the two together yield a better test than PSA alone. Our results show that MSP however is not a useful screening marker for prostate cancer, and given the performance of total PSA, adding free PSA or intact PSA, which are highly correlated with total PSA, has a negligible effect on screening performance.
It is, perhaps, surprising that in spite of a large literature on prostate cancer screening using PSA and its sub units there is a notable absence of studies that present results that properly evaluate the performance of the screening tests, namely in terms of detection rates for given false-positive rates or vice versa. Often odds ratios are reported for either a one unit or one standard deviation increase in the screening marker, or by comparing groups such as the upper versus lower quintile of marker values. These fail to provide any direct measure of screening performance. While there is a numerical equivalence between an odds ratio and the detection rate for a specified false-positive rate, or false-positive rate for a specified detection rate, odds ratios need to be very large for them to translate to a useful screening test. 12,13 The area under the receiver operating characteristic (ROC) curve (AUC) is often reported, but values are difficult to interpret, can be misleading and do not provide what is needed in practice. 14 For example, the AUC covers the whole range of values from 0 to 100% detection when only a small portion of the ROC curve is relevant. A higher AUC does not necessarily mean a better screening test at the point at which the false-positive rate would be acceptable; one screening test can have a higher AUC than another test, but a lower detection rate for a given false-positive rate. In this study we provide direct estimates of screening performance using biochemical markers considered in past work, individually and in combination, together with a man's age, all of which are needed to guide screening policy.
Table 3. Prostate cancer (died of, or with a history of) detection rates (DRs) for specified false-positive rates (FPRs) of individual markers expressed in mass units (ng/ml) and as age-specific multiples of the unaffected median (MoM) for the same age according to marker and length of follow-up.
The study shows the limitation of using a fixed PSA cut-off level (e.g. 4 ng/ml) that does not take account of the steep rise of PSA with age, something that can be readily solved by expressing PSA values in MoMs. Using a fixed ng/ml cut-off of 4 ng/ml, the detection and false-positive rates were 86% and 2% respectively for a follow-up of ≤5 years, a false-positive rate about 4 times higher than when using MoMs and age and including hK2 in the screening algorithm. The improvement in screening performance is worthwhile. The approach we have used in our multi-marker screening method, while innovative in cancer screening, is widely used in prenatal screening for Down syndrome, which relies on several markers and uses MoM values to take account of gestational age 11,15 instead of a man's age as we describe here in prostate cancer screening. This screening method has the advantage that should new markers be identified in the future, they can easily be incorporated into the screening algorithm, as can existing marker parameters be updated in the light of new information.
It has been shown that PSA levels are lower in men who have a higher body mass index (BMI). 16,17 Conceptually one could allow for this decrease in a screening algorithm in the same way that we used MoM values to allow for the increase in marker levels with increasing age. In our data, all screening markers statistically significantly decreased with increasing BMI (p < 0.001), apart from total PSA which was of borderline statistical significance (p = 0.052; see Figure S4). There is negligible difference in screening performance whether or not MoM values are adjusted for BMI; for example at a 2% false-positive rate, the detection rates for the combination of total PSA and hK2 (with age) are 91.8% and 91.9% without and with BMI adjustment respectively; at an 80% detection rate, respective false-positive rates are 2.5% and 2.4%; so there is little advantage in allowing for BMI in the screening algorithm. The relevant screening marker parameters for BMI-adjusted MoM values are given in Tables S6 and S7, and Tables S8 and S9 compare screening performance with and without adjusting for BMI.
The samples used in this study had been stored for up to about 30 years. While this may have caused degradation of the markers measured, cases and controls were matched for duration of sample storage so any relative differences in marker values between affected and unaffected men are expected to be maintained. The use of MoM values allows for a person's age but also for analytical differences between assays or laboratories, and the effects of sample handling and duration of storage. For example, if the marker levels were, on average, half of what they would have been had the assays been performed immediately after blood collection, the absolute difference in average levels between affected and unaffected men would be half of what it should be but the relative difference would be the same; by using MoMs the relative differences are used in the analysis, not the absolute differences.
A limitation of this study is the small number of cases of prostate cancer with 5 years or less of follow-up, which leads to wide confidence intervals around estimates of screening performance. However, the median PSA MoM values are consistent with the pooled results from an earlier publication based on the results from four cohort studies (the BUPA study with data used in this study, the US CLUE study, the Finnish North Karelia study and the Finnish Social Insurance Institution study). 3 The PSA MoM values in cases with <3 and 3-5 years of follow-up were 23 and 4 respectively compared with 14.4 in this study, indicating that our results are reasonably robust. Also, beyond 5 years of follow-up there is still discrimination in PSA between men who died of or with a history of prostate cancer and men who did not, with an exponential decrease in MoM values, adding weight to the high PSA MoM values observed up to 5 years of follow-up. Another limitation is having only baseline measurements of the biochemical markers. Serial measurements would have allowed the assessment of the rate of change of markers in detecting future prostate cancer, which may improve screening performance.
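The halving example can be checked with a short sketch. The concentrations are made up for illustration, and for simplicity the median of a single group of unaffected men stands in for the age-specific regressed median.

```python
from statistics import median


def to_mom(values, unaffected_values):
    """Divide each value by the median in unaffected men."""
    reference = median(unaffected_values)
    return [v / reference for v in values]


# Made-up concentrations (ng/ml), for illustration only.
unaffected = [0.8, 1.0, 1.2, 1.5, 0.9]
affected = [4.0, 10.0, 15.0]

# MoM values as if assayed immediately after collection.
fresh_mom = to_mom(affected, unaffected)

# Suppose storage halved every concentration, in cases and controls alike.
degraded_mom = to_mom(
    [0.5 * v for v in affected], [0.5 * v for v in unaffected]
)

print(fresh_mom, degraded_mom)
```

The absolute gap between the groups is halved by the degradation, but the MoM values are unchanged because the reference median is halved too.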

Impact on prostate cancer mortality
Although we have shown that the screening performance of PSA was improved by using a multi-marker risk-based algorithm, the results of this study do not by themselves establish that such screening is worthwhile. It is also necessary to show that medical intervention following a positive screening test reduces mortality from prostate cancer. Four randomised controlled trials have been carried out with mixed results. The earliest trial, the Quebec trial, published in 2004, reported as the main result a 62% decrease in prostate cancer mortality after 8 years of follow-up (95% CI 25% to 81% decrease). 18 This result was, however, based on an "on-treatment" analysis while the intention-to-treat analysis showed no statistically significant effect of PSA screening, a result that cannot be regarded as negative because there was poor adherence to the randomised allocation. The remaining three trials had longer periods of follow-up; the European ERSPC trial (16 years follow-up), 19 the American PLCO trial (15 years) 20 and the UK CAP trial (10 years). 21 Neither the CAP nor PLCO trial showed that PSA screening reduced prostate cancer mortality. This is, however, not surprising because in the CAP trial 60% of men randomised to screening did not receive it, and 10-15% of men randomised to not receive screening did receive it. In the PLCO trial there was even less adherence to the randomised allocation, in which about 50% of those randomised to not receive screening had either already had it before being randomised or did so during follow-up. The ERSPC trial did not experience such poor adherence to the randomised allocation. It provides the only reliable evidence of the effect of PSA screening on prostate cancer mortality. After 16 years of follow-up, based on an intention-to-treat analysis, the reduction in prostate cancer mortality was 20% (95% CI 11% to 28%). 22 Using an on-treatment analysis the reduction in prostate cancer mortality was 25% for men who were screened once, and 48% for men who were screened more than once (screening interval 2-4 years).

Overdiagnosis
Overdiagnosis is a recognised problem in screening for cancers and this is particularly the case in screening for prostate cancer. Overall in our study, 43% (247/569) of men died with a history of prostate cancer, without prostate cancer being the cause of death. Among men with <10 years of follow-up the proportion was 35% (16/46) and among men with ≤5 years of follow-up it was 33% (4/12). Not all such cases will represent overdiagnosis; some will have had cancer clinically identified and successfully treated with the individuals concerned dying of causes other than prostate cancer. Among men who died with prostate cancer it is not possible to distinguish overdiagnosis cases from cases that benefitted from the early diagnosis; however, cases were likely to have been detected clinically rather than through PSA screening because systematic PSA screening has not been instituted in the UK. Even though the men may not have died of prostate cancer they would have had sufficient symptoms to seek medical attention. Our data set an upper bound of about 35% for the proportion of screen-positive overdiagnoses. The true proportion could be much lower. While screening will reduce prostate cancer mortality there is a need to consider the adverse effects of treatment in both true and false positives. 23,24

Screening practice
The ERSPC used a single PSA cut-off (3.0 ng/ml) to define men as being screen positive. With screening based on the multi-marker risk-based algorithm described in this paper, efficacy would be improved and hence the expected reduction in prostate cancer mortality would be greater than that achieved in the randomised trials, with many fewer false-positives and therefore fewer adverse effects from treatment. Our data indicate that a reasonable age to start screening would be 55, given that 99% (565/569) of affected men died of or with a history of prostate cancer aged 55 or over, and our data suggest a screening interval of up to 5 years, i.e. a 3-year interval would be reasonable. Figure 4 shows two examples of screening reports, one screen positive and one screen negative, which could be used in practice. Screening using the algorithm described in this paper would, using a risk cut-off of 1 in 20 (Table 5), identify, with a 1.2% false-positive rate, about 90% of the deaths from, or with a history of, prostate cancer that would otherwise occur within a 5-year timeframe. The effect on prostate cancer mortality would be expected to be greater than the 20% observed in the ERSPC trial because the multi-marker risk-based screening algorithm is likely to detect more cases than PSA alone.
Our data permitted an analysis of the effect of a "once-only" PSA test in which the risk may be so low that further periodic testing was unnecessary. We examined this strategy by identifying men with risks of 1 in 5000 or less, observing the number of missed cases at the time and the number of periodic screening examinations avoided in unaffected men. Over the intervals 0-5, 6-9, 11-14, 15-19 and 20+ years, the proportion of men in whom prostate cancer would be missed was 0%, 3%, 13%, 18% and 36% respectively, and the corresponding proportion of screening examinations avoided in unaffected men was respectively 30%, 27%, 30%, 34% and 47%. This analysis shows that even a very low risk of prostate cancer at an initial screening examination does not exclude the risk of having prostate cancer later. Screening performance could be improved by taking account of a previous result when interpreting a subsequent result, as has been adopted in antenatal screening for Down's syndrome. 25

Long-term biological implication
Of biological interest is the observation that PSA levels in affected men were statistically significantly elevated prior to their diagnosis of or death from prostate cancer, by as much as 30 years when the mean PSA was about 40% higher, indicating that the inception of prostate cancer arises many years before clinical presentation. However, beyond five years the magnitude of the predictive effect is too small to be of value in screening, and certainly too small after 10 years.

Conclusion
Our results show that if screening is to be carried out it should be done using a multi-marker risk-based screening algorithm instead of a fixed PSA cut-off level. At minimal extra cost this would achieve a high detection rate and, importantly, compared with conventional PSA-only based screening would reduce the false-positive rate by about three-quarters. This, in turn, would reduce the number of unnecessary prostate biopsies and operations, and reduce the overdiagnosis and overtreatment associated with prostate cancer screening.