Estimation of overdiagnosis using short-term trends and lead time estimates uncontaminated by overdiagnosed cases: Results from the Norwegian Breast Screening Programme

Background Estimating overdiagnosis in cancer screening is complicated. Using observational data, estimation of the expected incidence in the screening period and taking account of lead time are two major problems. Methods Using data from the Cancer Registry of Norway and the Norwegian Breast Cancer Screening Programme, we estimated incidence trends, using age-specific trends by year in the pre-screening period (1985–95). We also estimated sojourn time and sensitivity using interval cancers only. Thus, lead time estimates were uncontaminated by overdiagnosed cases. Finally, we derived estimates of overdiagnosis separately for all cancers, and for invasive cancers only, correcting for lead time, using two different methods. Results Our results indicate that overdiagnosis of all cancers, invasive and in situ, constituted 15–17% of all screen-detected cancers in 1996–2009. For invasive cancers only, the corresponding figures were -2 to 7% in the same period, suggesting that a substantial proportion of the overdiagnosis in the Norwegian Programme was due to ductal carcinoma in situ. Conclusion Using short-term trends, instead of long, prior to screening was more effective in predicting incidence in the screening epoch. In addition, sojourn time estimation using symptomatic cancers only avoids over-correction for lead time and consequently underestimation of overdiagnosis. Longer follow-up will provide more precise estimates of overdiagnosis.


Introduction
Overdiagnosis in the context of cancer screening is the diagnosis, as a result of screening, of cancer which would not have been diagnosed in the lifetime of the host if screening had not taken place. 1 An ideal estimate of overdiagnosis could be derived from a randomized trial of screening in which the control group was never screened and both groups were followed up to 100% expiry. In the absence of trial data, a way to estimate overdiagnosis is from trends in observational data on national or regional incidence of breast cancer, in conjunction with the time of introduction of screening. [2][3][4][5] Researchers often estimate trends in incidence prior to screening and project these to predict incidence during the screening period. An excess between the observed and the predicted incidence may be partly attributable to overdiagnosis. However, some of the excess will also be due to lead time, the diagnosis as a result of screening of cancers which would otherwise have been diagnosed symptomatically some years later.
There are two major problems to be overcome in estimation of overdiagnosis from observational data: estimation of the incidence to be expected in the absence of screening and taking account of lead time. 6 To be effective, screening has to detect substantial numbers of cancers a number of years earlier than they would have been diagnosed due to symptoms, so there is inevitably an observed excess incidence in a screened population. To separate the excess due to earlier diagnosis from that due to overdiagnosis requires either long follow-up or estimation of the likely lead time of the screen-detected tumours. It is desirable that the lead time estimates should not be from screen-detected cancers, as these will include overdiagnosed tumours. 7 The lead time is a function of the mean sojourn time, the duration of the preclinical screen-detectable period.
In this paper, we used data from the Cancer Registry of Norway and the Norwegian Breast Cancer Screening Programme (NBCSP) to estimate overdiagnosis. We compared observed and expected cancers in the screening programme adjusted for trends in incidence and lead time. The estimates of lead time were derived entirely from interval cancers, and therefore do not include any overdiagnosed cases. Thus, there was no over-correction due to overdiagnosed cancers being used in the lead time estimates.

Data and methods
The NBCSP started in November 1995, offering biennial two-view mammography to women aged 50-69, a population varying around 500,000 women. The programme began in four of the 19 counties in Norway and achieved nationwide coverage of invitation in 2005. Women receive a personal invitation by post every two years, regardless of their cancer history. 8 Mammography is carried out in specialist breast centres, and mammograms are double read. In 1995, only 956 screens took place and there were only three screen-detected cancers. We therefore included 1995 in our nominal pre-screening period. By the end of 2000, 39% of the eligible population had been screened at least once. By the end of 2005, the figure was 88%. Attendance at screening varies around 76%. 9 Data were supplied by the Cancer Registry of Norway under strict confidentiality and non-disclosure conditions. We obtained data on breast cancers, invasive and in situ from the Cancer Registry of Norway, including age at and date of diagnosis, from 1953 to 2009 (ductal carcinoma in situ (DCIS) was only registered from 1993 onwards). The NBCSP provided data on detection mode (outside of the screening cohort, screen detected, interval cancer, nonattender, not invited due to upper age limit and not invited as opted out). From the NBCSP, we had data on all screening invitations and attendances from November 1995 to December 2009. We also had tabular data on the resident female population in Norway by age and calendar year, as estimated in January every year. Age was calculated by subtracting the date of birth from the relevant calendar time.
We first estimated log-linear trends in incidence rates, per individual calendar year within each five-year age group from 50-54 to 80-84, using data from years 1985-95, by fitting a Poisson regression model in each age group of the form where c is the number of cases in a given year, x is the year and P is the person-years at risk within that year. Thus, b is the trend in increasing log incidence with time.
Duffy et al. noted that long-term pre-screening trends did not give good prediction of the incidence in the screening period. 10 We therefore followed the approach of Moller et al. 11 and used only the 11 years pre-screening, 1985-95. We fitted Poisson regression models to these within each age group as noted above. The numbers of cases and person-years by year and age group used to estimate the trends are shown in Table 1. We projected the trends b in the model above to give predicted incidence rates by age group in the periods 1996-2000, 2001-5 and 2006-9.
Day 12 derives the expected incidence of symptomatic cancer in the year following a screen as where I is the expected incidence in the absence of screening, S is the screening sensitivity and F is the distribution function of the sojourn time. The first component is the incidence of cancers which have entered the preclinical screen detectable period after the screen and progressed to symptomatic disease by one year from the screen. The second component is the incidence of cancers which were already in the preclinical screen-detectable phase at the time of the screen but which were missed by the screen (hence the 1-S in the equation), then subsequently progressed to symptomatic disease within one year of the screen. The above simplifies to If the sojourn time is distributed as exponential with mean 1/, which fits breast cancer data reasonably well, 13 this becomes

& '
This differs slightly from the more complex formulae in Duffy et al. 14 as the latter apply to a general time t, not necessarily one year, and assume that tumours can remain in the preclinical detectable phase for several rounds of screening and be missed at each successive round. If we have c symptomatic cancers occurring in the year after a screen of N subjects, the log-likelihood, assuming a Poisson distribution, is We maximized this log-likelihood with respect to S and with three realizations of c and N -the numbers of interval cancers within a year of screening and numbers of women screened in each of the three periods 1996-2000, 2001-5 and 2006-9. We estimated I as the expected incidence projecting the pre-screening trend in incidence from the 11 years prior to screening as noted above. It was not possible to derive closed-form maximum likelihood estimates, so we derived them by calculating all possible values of the log-likelihood over a grid of values of and S. 13 From the log-likelihood, we derived profile likelihood confidence intervals on S and . 15 The estimation of mean sojourn time and therefore lead time was entirely from data on symptomatic cancers, and therefore did not include overdiagnosed tumours. The estimation was carried out separately for the five-year age groups 50-54, 55-59, 60-64 and 65-69.
We estimated overdiagnosis by two methods.

Method 1
First, we calculated the excess numbers of cancers diagnosed in ages 50-69 in 1996-2009, compared with the expected numbers from the trends in the pre-screening periods, minus any deficit in ages 70-84 compared with expected numbers from the pre-screening trends. We then used the sojourn time estimates to further subtract from the excess any screen-detected cancers expected to be symptomatically diagnosed after the period of observation

Method 2
The second method of estimation used the fact that the expected number of screen-detected cancers at a prevalent screen is N p IS and the expected number at an incident screen is where N p is the number of prevalent screens, N i the number of incident screens and t the interscreening interval, in this case two years. The last formula simplifies to The formula above differs from the round-specific formulae in Duffy et al. 14 for two reasons. First, because the sensitivity and sojourn time estimates are explicitly estimated from non-overdiagnosed cancers, the formula does not include a term for overdiagnosed cancers. Second, we made the simplifying assumption that a common incidence screen detection rate would apply, based on the steady-state estimate of the programme sensitivity, that is the proportion of incident cancers expected to be screen detected. 16 The mathematical details are given in the Appendix, available online. If we then subtract the expected numbers at prevalent and incident screens from those observed, the remainder is an estimate of the overdiagnosed cases.
These methods are best seen by illustration, as in the results below. We present, in order

Results
All cancers Method 1. We first estimate overdiagnosis from all cancers, invasive and DCIS. Table 2 shows the observed numbers of breast cancers by age and period from 1996 to 2009, and expected numbers calculated by extrapolation of the annual age-specific log-linear trends in 1985-95, for ages 50-84. There were substantial excesses of cancers in the age groups 50-69 and smaller deficits at ages 70-84. The excesses at ages 50-54, 55-59, 60-64 and 65-69 were, respectively, 1108, Although not all of the women in these cohorts will have actually been exposed to screening, it is worth noting that this constitutes 93% (383/410) of the deficit above the age range for screening, suggesting that this deficit is indeed chiefly due to cancers detected earlier by screening. Table 3 shows the interval cancers arising within one year of a screen, number of screens prior to the interval cancer incidence and expected incidence from the extrapolated 1985-95 trends, by age group and period, with the maximum likelihood estimates of and S derived from these values. Table 4 shows the numbers of screen-detected cancers by age group and period, and the proportions and numbers of these expected not to arise symptomatically until after the end of 2009. For example, of the 480 screen-detected cancers diagnosed at ages 50-54 in 1996-2000, the expected percentage to arise symptomatically after the end of 2009 is 100 Â e À(11À5Â0.33) ¼ 2.25%. The expected number which would not have been diagnosed until after the period of observation is therefore 480 Â 0.0225 ¼ 11 cancers.
The total excess cancers diagnosed at ages 50-69 over that expected from pre-screening trends was 5313 (1108 þ 1280 þ 1197 þ 1728). Subtracting the 410 deficit observed at ages 70-84 and the 3371 screen-detected cancers expected to arise symptomatically after 2009 gives a lead time adjusted excess of 1532 cancers. This may be regarded as an estimate of the number of overdiagnosed cancers, but there are uncertainties and qualifications to this (see 'Discussion' section). This represents 5% of the 28,945 cancers diagnosed in women aged 50-84 between 1996 and 2009; 8% of the 20,347 cancers diagnosed in women in the screening age range, 50-69, in the same period; and 15% of the 10,014 screen-detected cancers. A woman attending all 10 screens from age 50 to 69 would have roughly a 5.4% chance of a screen-detected cancer, given the average detection rate of 5.4 per thousand. Thus, she would have a risk of an overdiagnosed tumour of 8 per thousand (0.15 Â .054).

Method 2.
To estimate overdiagnosis by our second method, we need the numbers of prevalent and incident screens by age group and period, in our screening period 1996-2009. Table 5 shows the numbers of prevalent and incident screens, and the expected yields of cancers from these, by age and period. The expected numbers of cancers are calculated as described above. For example, for age group 50-54 with estimated as 0.33, S as 0.88 and The total number of screen-detected cancers expected was 3872 þ 4448 ¼ 8320. Subtracting this from the 10,014 observed screen-detected cancers gives 1694 cancers estimated to be overdiagnosed, although again there are uncertainties and qualifications to this (see 'Discussion' section). This would represent 6% of cancers diagnosed at ages 50-84, 8% of cancers diagnosed at ages 50-69 and 17% of screen-detected cancers. This would translate to an absolute risk of nine per thousand in a woman attending all scheduled screens in the programme.

Invasive cancers only
Method 1. We then estimate overdiagnosis from invasive cancers only. Table 6 shows the observed numbers of invasive breast cancers by five-year age and period groups from 1996 to 2009; expected numbers calculated by projecting the annual age-specific log-linear trends in 1985-95 and person-years for ages 50-84. As with the total cancers, invasive and DCIS, significant excess numbers of invasive cancers were observed in the screening age groups 50-69, and smaller deficits above the screening age groups 70-84. The excesses at ages 50-54, 55-59, 60-64 and 65-69 were, respectively, 935, 1020, 885 and 1360. The deficits at ages 70-74, 75-79 and 80-84 were 288, 124 and 23. Table 7 shows the invasive interval cancers diagnosed within one year of a screen, numbers of screens and expected annual incidence from pre-screening trends, by age and period, with the maximum likelihood estimates of and S derived from the interval cancer data.
The numbers of invasive screen-detected cancers by five-year age and period groups and the percentages and numbers of invasive screen-detected cancers not to have been diagnosed symptomatically until after 2009 are shown in Table 8.
The total excess of invasive cancers diagnosed at ages 50-69 over that expected from pre-screening trends was 4200 (935 þ 1020 þ 885 þ 1360). Subtracting the deficit of 435 cancers observed at ages 70-84 and the 3190 screendetected cancers expected to be diagnosed symptomatically after 2009 gives a lead time adjusted excess of 575 cancers. This represents 2% of the 26,159 invasive cancers diagnosed in women aged 50-84 between 1996 and 2009; 3% of the 17,933 invasive cancers diagnosed in women in the screening age range, 50-69, in the same period; and 7% of the 8269 invasive screen-detected cancers. This would mean an absolute risk of three per thousand of an overdiagnosed invasive tumour in a woman attending all scheduled programme screens.

Method 2.
To estimate overdiagnosis by our second method, we again use the numbers of prevalent and incident screens by age group and period, in our screening period 1996-2009. Table 9 shows the numbers of prevalent and incident screens, and the expected invasive cancers diagnosed from these, by age and period.
The total number of invasive screen-detected cancers expected was 4295 þ 4153 ¼ 8448. Subtracting this from the 8269 observed invasive screen-detected cancers gives a deficit of 179 cancers. This suggests that there is no overdiagnosis of invasive cancers only. Because the first method gave an estimate of 575 cancers overdiagnosed (7% of screen detected), the true value is likely to lie between the two.

Discussion
Overdiagnosis in cancer screening is notoriously difficult to estimate. As is common practice in the physical sciences, when a quantity is difficult to measure, we measure it more than once and by different methods. Both methods took account of lead time effects and (relatively) shortterm pre-screening incidence trends. Our first method calculated the total observed cancers in the screening period and age range, and subtracted from these the total expected from pre-screening trends, the deficit observed above the screening age range and the number of screendetected cancers which would have been expected to arise symptomatically only after the period of observation, but the diagnosis of which was brought forward to our period of observation by lead time. The result of this subtraction   Table 7. Numbers of invasive interval cancers within one year of screening, numbers of screens and expected annual incidence from prescreening trends, by age and period, with the maximum likelihood estimates of and S from the interval cancer data.

Age Period
One-year interval cancers was our first estimate of overdiagnosis. The second method took advantage of the fact that only screendetected cancers can be overdiagnosed. It calculated expected numbers of screen-detected cancers at prevalent and incident screens, based on underlying incidence projected from pre-screening trends, estimated screening sensitivity and sojourn time. The excess of total observed screen-detected cancers over total expected gave a second estimate of overdiagnosis. The estimates from Methods 1 and 2 are not independent, being based on the same estimates of sensitivity and mean sojourn time. Thus, it might be expected that they would be of similar magnitude. One might argue that Method 2 is to be preferred as being the more direct. However, when a quantity can never be measured perfectly, it is desirable to measure it more than once, using different methods. The use of projected incidence rates from the prescreening period to estimate the underlying incidence has a crucial rationale in two areas. First, it means that there is an estimate of excess incidence compared with an independent estimate of the expected incidence in the absence of screening. Second, it affords estimation of sojourn time using cancers which were not screen detected (and therefore by definition not overdiagnosed), in that in addition to using only interval cancers from the screening period, the underlying incidence estimate was derived from prescreening data. Thus, we avoided over-correction for lead time (and consequent underestimation of overdiagnosis) arising from use of lead time estimates which include overdiagnosed cancers. 7 These results suggest that overdiagnosis of all cancers, invasive and in situ, constituted 5-6% of cancers diagnosed in women aged 50-84 in 1996-2009 and 15-17% of screen-detected cancers in the same period. For invasive cancers alone, the corresponding figures were 0-2% of invasive cancers diagnosed at age 50-84 and 0-7% (indeed one estimate was À2%) of screen-detected cancers. This suggests that most of the overdiagnosis in the Norwegian programme was due to DCIS. There are a number of qualifications to these estimates. First, while the estimates of tend generally to be smaller (implying longer lead times) for older subjects, they do not fall monotonically with age. Similarly, we did not observe a clear trend of increasing sensitivity with age. The restriction to interval cancer rates as the data resource for estimation probably adds an element of uncertainty. Second, we had to make the assumption that sojourn time in interval cancers is the same as sojourn time in a general unscreened population. Due to the converse of length bias, interval cancers may have a shorter sojourn time than the general tumour population. If this is the case, however, our estimates will be conservative, so they will not lead to underestimation of overdiagnosis. Third, it would be useful to have a longer period of screening exposure to study, which would give better estimates of the deficit, if any, after screening stops.

Number of screens
Some unusual observations arise in the data. First, the number of cancers at ages 55-59, especially but not exclusively in 2001-2005, is particularly high (Tables 2, 4, 6 and 8). This is largely due to the considerable amount of screening activity, especially incident screening, in this age group (Table 5). Second, under our second method, there was a higher expected number of screen-detected invasive cancers than the expected total. This was due to a particularly high estimated number of prevalent screen cancers at ages 60-64, which was in turn due to the very low estimate of for this group (Tables 7 and 9). The upper confidence interval on for this group would reduce the expected screen-detected invasive cancers to well below the expected number for total cancers invasive and in situ. This suggests that uncertainty in estimation  of , and sensitivity of expected numbers to the estimate, is a limitation of this study. However, it should be noted that whatever method is used to estimate overdiagnosis, the longer the period of observation, the better. 17 Given that the definition of overdiagnosis pertains to the lifetime of the patient, long follow-up is clearly desirable. In our data, we have relatively little person-time in women exposed to screening but who are now above the screening age range. A further five years of observation would yield considerable data on women in whom screening has stopped, an invaluable data source for estimation of overdiagnosis. A target for the future is to investigate whether the post-screening deficit occurs earlier for the four counties which started screening earlier than in the rest of Norway. 8 Excess incidence tended to be highest in the oldest screening age group, 65-69. This is consistent with overdiagnosis being greater at older ages, due to shorter future life expectancy and longer lead times. Interestingly, there was not a strong difference in proportions of in situ tumours by age (Tables 2 and 6). In 1996-2000, 12% of tumours in the 50-54 group were in situ compared with 10% at ages 65-69. The corresponding estimates for 2001-2005 were 13% versus 11%, and for 2006-2009 15% versus 13%. This suggests that there is a greater proportional contribution of invasive cancers to the higher overdiagnosis rates at older ages. We also used data on the use of hormone replacement therapy to predict the incidence rates in the screening period, but we obtained similar results. Our analysis was restricted to women of screening age (50-69). Whilst we acknowledge that some screen-detected cancers could occur outside of this age range, in our dataset, only 152 (1.5%) of the screendetected cancers were diagnosed below age 50, and only 174 (1.7%) at ages 70 or more, thus with so few screendetected cancers occurring outside of 50-69, we believe that only including women of screening age 50-69 is most appropriate. Our second method of estimation indicated that overdiagnosis was mainly a phenomenon of incident rather than prevalent screens, which is unusual. 1,2 There were in total 546,213 prevalent screens, resulting in 2860 (5.2 per thousand) invasive cancers and 625 (1.1 per thousand) in situ, a total of 3485 cancers. There were 1,310,218 incident screens, with 5409 (4.1 per thousand) invasive cases detected and 1120 (0.9 per thousand) in situ, a total of 6529. The total expected prevalent cases was 3485, exceeding the observed numbers, whereas the total expected incident cases was 4448, suggesting a considerable excess of observed incident cancers. This may indicate a low sensitivity at the start of the programme, improving with time, as has been observed elsewhere. 18 Also, the absolute number of DCIS cases diagnosed at incident screens was approximately double the number diagnosed at prevalent screening, and the percentage of screen-detected cancers which were DCIS was the same in incident and prevalent screens (details available from the authors). This contrasts with other programmes in which the proportion of DCIS is lower at incident screens. 19 Our estimates of overdiagnosis are rather higher than those estimated by Njor et al. in the Danish breast screening programme. 20 Also, inclusion of DCIS considerably increased our estimates, but did not significantly change estimates in the Danish programme. The first difference may be due to the longer follow-up since the start of screening in the Danish estimates. We suspect that with longer follow-up of the Norwegian programme, there will be greater opportunity to observe post-screening deficits, and more modest estimates of overdiagnosis will emerge. In considering the greater influence of DCIS in the Norwegian programme, it is worth noting that in the screening period in Norway, 9.6% of cancers were DCIS, whereas in Denmark the figures were 5.4% in Copenhagen and 5.8% in Funen. 20 There may have been more aggressive workup of calcifications leading to greater diagnosis of DCIS in the Norwegian programme.
Overall our results indicated 1532-1692 cancers, invasive and in situ, overdiagnosed. This amounts to 15-17% of screen-detected cancers, and with the 1,856,431 screening episodes, one overdiagnosed cancer per 1100-1200 screening episodes, or one overdiagnosed cancer per 111-112 women attending all 10 scheduled screens between ages 50 and 69. The corresponding figures for invasive cancers only were -2 to 7% of screen-detected cancers, that is estimates ranging from no overdiagnosis at all to 575 overdiagnosed cancers, one per 3200 screening episodes. These figures require confirmation with longer follow-up in the screening period.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the Research Council of Norway (project number 189520/V50).