Faecal immunochemical testing in bowel cancer screening: Estimating outcomes for different diagnostic policies

Objectives: The National Health Service Bowel Cancer Screening Programme (NHS BCSP) in England has replaced guaiac faecal occult blood testing with faecal immunochemical testing (FIT). There is interest in fully exploiting FIT measures to improve bowel cancer (CRC) screening strategies. In this paper, we estimate the relationship of the quantitative haemoglobin concentration provided by FIT in faecal samples with underlying pathology. From this we estimate thresholds required for given levels of sensitivity to CRC and high-risk adenomas (HRA).
Methods: Data were collected from a pilot study of FIT in England in 2014, in which 27,238 participants completed a FIT. Those with a faecal haemoglobin concentration (f-Hb) of at least 20 µg/g were referred for further investigation, usually colonoscopy. Truncated regression models were used to explore the relationship between bowel pathology and FIT results. Regression results were applied to estimate sensitivity to different abnormalities for a number of thresholds.
Results: Participants with CRC and HRA had significantly higher f-Hb, and this remained unchanged after adjusting for age and sex. While a threshold of 20 µg/g was estimated to capture 82.2% of CRC and 64.0% of HRA, this would refer 7.8% of participants for colonoscopy. The current programme threshold used in England of 120 µg/g was estimated to identify 47.8% of CRC and 25.0% of HRA.
Conclusions: Under the current diagnostic policy of dichotomising FIT results, a very low threshold would be required to achieve high sensitivity to CRC and HRA, which would place further strain on colonoscopy resources. The NHS BCSP in England might benefit from a diagnostic policy that makes greater use of the quantitative nature of FIT.


Introduction
Bowel cancer (colorectal cancer, CRC) is the second most common cause of cancer death in the UK, accounting for 10% of all cancer deaths in 2017. 1 Between 2015 and 2017, there were around 16,300 CRC deaths in the UK every year, equivalent to 45 deaths every day. 1 In order to reduce mortality and incidence of CRC, the National Health Service Bowel Cancer Screening Programme (NHS BCSP) in England offers tests for the presence of occult blood in faeces free of charge every two years for men and women aged 60-74 years (inclusive).
Up to 7 June 2019, the guaiac faecal occult blood test (gFOBT) was the method used and this gives a binary result (positive or negative) for each sample. 2,3 It requires two faecal samples from each of three separate bowel motions to attain satisfactory sensitivity. In contrast, the faecal immunochemical test (FIT) gives a quantitative result in the form of micrograms of haemoglobin per gram of faeces (µg/g), and requires only a single sample. In 2014 the NHS BCSP in England performed a pilot study to examine the acceptability and diagnostic performance of FIT in two of the five regional hubs managing the established screening programme in England.
The main analysis of the FIT pilot study by Moss et al. 3 assessed the effect of varying faecal haemoglobin concentration (f-Hb) threshold on detection rate of CRC and advanced adenomas (high-risk and intermediate-risk adenomas combined, as defined by Moss et al: see the Pathology section for definitions), and on the colonoscopy rate. This information was very useful for informing the national programme, but because those with f-Hb less than 20 µg/g did not receive further investigation, it did not estimate the numbers of abnormalities missed for a given threshold. Also, it did not consider the relationship in the direction of causality: it is the abnormalities that cause bleeding, and therefore the FIT result. This paper aims to complement the previous results by:
1. Exploring the relationship between FIT results and bowel pathology using truncated regression, in both a univariate and multiple regression model, with demographic factors including age, sex and area-based socioeconomic status;
2. Using these results to estimate proportions of bowel abnormalities the screening programme would fail to diagnose at different FIT thresholds (false negative rates); and
3. Generating hypotheses for fuller exploitation of quantitative FIT measures.

Bowel cancer screening programme in England
The NHS BCSP in England started to adopt FIT in June 2019. 2 Currently, the policy is to have a single threshold indicating further diagnostic workup for those at or above the threshold, or return to routine screening for those below. The diagnostic performance of FIT for detecting CRC and adenomas depends on the threshold defined for positivity. The programme uses a threshold of 120 µg/g for referral for further investigation (Figure 1).

Study population
The 2014 FIT pilot study drew samples from the routine screening population invited by two of the five English BCSP Hubs (the Midlands and North West Hub and the Southern Hub). The study protocol pseudo-randomly assigned every 28th consecutive invitee to receive a FIT instead of a gFOBT kit. 3 Those who were offered FIT will be referred to as invitees below, and those who gave valid FIT results will be referred to as participants. There were 40,928 invitees aged 59-75 (inclusive) years old, and 27,238 participants (14,404 women and 12,834 men). Only those 2133 participants with a positive f-Hb, defined as at least 20 µg/g, were invited for further investigation (usually colonoscopy). At the end of the pilot study, 1825 participants had a definitive pathology outcome. These are referred to as complete cases below. The dataset used in this paper was extracted from BCSS (Bowel Cancer Screening System) with reference ODR_1819_103. It has been substantially updated and cleaned since the previous publication, in particular including FIT results which became available after the previous publication was written, and so will not have exactly the same numbers as previously reported. [3][4][5] Compared to the previous paper, we therefore report on two fewer invitees (

Pathology
After a colonoscopy, pathologists examined removed tissues (if any) and classified them according to the recommendations from the British Society of Gastroenterology (BSG). 6 In this paper, we re-categorised pathology outcomes as follows:
1. Low-risk adenomas (LRA), if removed tissues contained 1-2 adenomas, all of them small (less than 1 cm in diameter);
2. Intermediate-risk adenomas (IRA), if removed tissues contained 3-4 small adenomas or 1-2 adenomas of which at least one had a diameter greater than or equal to 1 cm;
3. High-risk adenomas (HRA), if removed tissues contained at least five small adenomas or three or more adenomas at least one of which was greater than or equal to 1 cm in diameter;
4. Cancer (CRC), if removed tissues had characteristics of malignancy;
5. Other abnormality, to include all other unclassified abnormalities; and
6. No abnormality, if removed tissues contained no abnormalities.

Statistical analysis
In order to estimate the potential impact of deprivation on FIT results, we used the Index of Multiple Deprivation (IMD), which is the official and the most widely used measure of deprivation in England. 7 FIT results less than 4 µg/g were recoded to 1 µg/g (rather than zero due to the natural logarithm transformation later) and are referred to as undetectable f-Hb. 8 The limit of detection is 4 µg/g; that is, the FIT analysers (OC-Sensor DIANA, Eiken Chemical Co., Ltd, Japan) cannot distinguish samples with f-Hb lower than this from samples with no f-Hb.
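The recoding step described above can be sketched in a few lines. This is only an illustration in Python (the study itself used Stata); the function name and the example readings are hypothetical, while the 4 µg/g limit of detection and the recoded value of 1 µg/g come from the text.

```python
import math

LIMIT_OF_DETECTION = 4.0  # µg/g, limit of detection stated in the text
UNDETECTABLE_CODE = 1.0   # µg/g, value assigned to undetectable readings in the text


def log_f_hb(f_hb):
    """Natural log of f-Hb after recoding readings below the limit of detection."""
    recoded = UNDETECTABLE_CODE if f_hb < LIMIT_OF_DETECTION else f_hb
    return math.log(recoded)


# Illustrative readings only, not study data.
readings = [0.0, 3.9, 4.0, 20.0, 120.0]
logged = [log_f_hb(x) for x in readings]
```

Recoding to 1 µg/g rather than 0 means every undetectable reading maps to log(1) = 0, so the logarithm is always defined.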
We carried out univariable and multivariable regression analyses with the quantitative FIT measure (f-Hb) as the dependent and pathology outcomes as independent variables. We compared proportions of participants with undetectable f-Hb in their samples among demographic and screening episode characteristics using logistic regression.
We used truncated regression since there were no pathology data on participants with f-Hb less than 20 µg/g. 9 As a result of the skewed distribution of f-Hb, we used the natural logarithm of f-Hb as the dependent variable. Regression coefficients were then transformed back to the original scale to provide the ratio of geometric mean f-Hb for each pathology, relative to no abnormality pathology. Likelihood ratio tests were used to select the best model, with Wald tests helping to identify significant categories (p < 0.001) within a variable. Given previous observed sex- and age-specific differences, [3][4][5] the multivariable model included these variables.
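To make the idea of truncation concrete, the following is a minimal sketch of the log-likelihood for a constant-only normal model of log(f-Hb) that is left-truncated at log(20). It is not the authors' Stata code: the function name, the simulated data and the simulation parameters are all assumptions chosen for illustration. Each observation's density is divided by the probability of exceeding the cut-off, which is what distinguishes the truncated model from an ordinary fit to the observed values.

```python
import math
import random
from statistics import NormalDist, mean, stdev

LOG_CUTOFF = math.log(20.0)  # pathology is observed only when f-Hb >= 20 µg/g


def truncated_loglik(log_values, mu, sigma, log_cutoff=LOG_CUTOFF):
    """Log-likelihood of a normal model for log(f-Hb), left-truncated at log_cutoff."""
    dist = NormalDist(mu, sigma)
    # Probability that a draw from Normal(mu, sigma) lies above the cut-off.
    log_tail = math.log(1.0 - dist.cdf(log_cutoff))
    return sum(math.log(dist.pdf(y)) - log_tail for y in log_values)


# Synthetic illustration (NOT study data): simulate log(f-Hb) and discard
# everything below the cut-off, mimicking the missing pathology data.
rng = random.Random(0)
TRUE_MU, TRUE_SIGMA = 2.0, 1.5  # hypothetical parameters
full = [rng.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(20000)]
observed = [y for y in full if y >= LOG_CUTOFF]

# Naive summary of the observed values, ignoring truncation.
naive_mu, naive_sigma = mean(observed), stdev(observed)

ll_true = truncated_loglik(observed, TRUE_MU, TRUE_SIGMA)
ll_naive = truncated_loglik(observed, naive_mu, naive_sigma)
```

In this simulation the naive mean of the observed values sits well above the generating mean, and the truncated likelihood favours the generating parameters over the naive summary. A full analysis would maximise this likelihood numerically and add covariates, as the truncated regression routine in a statistics package does.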
We also fitted truncated regression models with only constant terms and no predictor variables, restricting analysis to each pathology category separately. These are referred to as univariable regression models below. From these results, we estimated the distributions of f-Hb for different pathology outcomes. Using these, we then estimated the proportions of abnormalities captured by different f-Hb thresholds, and by implication, the proportions missed for those thresholds. We also considered the problem from the opposite angle, calculating the thresholds required for given levels of sensitivity to CRC and HRA.
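The "opposite angle" calculation can be illustrated as a quantile lookup. The sketch below assumes that log(f-Hb) within a pathology group is normally distributed, and uses the rounded CRC mean and SD on the log scale (4.74 and 1.92) that are quoted later in the Results; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def threshold_for_sensitivity(mu_log, sd_log, sensitivity):
    """f-Hb threshold (µg/g) at or above which a fraction `sensitivity` of cases fall,
    assuming log(f-Hb) ~ Normal(mu_log, sd_log) within the pathology group."""
    quantile = NormalDist(mu_log, sd_log).inv_cdf(1.0 - sensitivity)
    return math.exp(quantile)


# Rounded CRC parameters on the log scale, as quoted in the Results.
CRC_MU_LOG, CRC_SD_LOG = 4.74, 1.92
t80 = threshold_for_sensitivity(CRC_MU_LOG, CRC_SD_LOG, 0.80)
```

With these rounded inputs, `t80` comes out at roughly 23 µg/g, close to the threshold of 22 µg/g reported in the Discussion for capturing 80% of CRC; a small difference is expected because the inputs are rounded.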
All analyses were carried out in StataMP version 15.1 on a Windows 8 platform.

Results
Table 1 shows the demographics and screening episode of the 40,928 invitees, and of the subpopulation of 27,238 participants who provided a valid FIT sample; the latter provides the dataset for analysis of the association between pathology and f-Hb. This subpopulation was characterised by relatively high proportions of non-deprived invitees (more than 50% were in the two least deprived IMD quintiles) and previous participants in CRC screening (75.1%). Table 2 summarises categories of observed f-Hb by demographic factors and screening episodes. The table gives the numbers and proportions with undetectable f-Hb, detectable f-Hb and positive f-Hb (f-Hb ≥ 20 µg/g), for the latter two subpopulations also giving the geometric mean and empirical 80% ranges of f-Hb. Large proportions of participants had undetectable f-Hb in all subgroups. There were significantly lower proportions with undetectable f-Hb (i.e. higher proportions with some evidence of bleeding) in the Midlands and North West Hub participants, in males, in older participants, in more deprived populations, and in previous non-responders (P < 0.001 in all cases).
Among positive cases, the overall geometric mean f-Hb was 78 µg/g, compared with 18 µg/g for participants with a detectable f-Hb. In the whole study population, FIT results varied by screening hubs, sex, age groups and deprivation index. Average f-Hb was similar between participants in the Midlands and North West Hub and those in the Southern Hub, although the latter hub had a higher proportion with undetectable f-Hb. Male participants, older participants and more deprived participants all had a higher geometric mean f-Hb. The age effect was largely due to lower numbers with undetectable f-Hb in the oldest age group. Although previous non-responders had a higher geometric mean f-Hb than either first-time invitees or previous responders, this was not statistically significant (although the previous non-responders had a significantly lower proportion of undetectable f-Hb).
For those with f-Hb of at least 20 µg/g, geometric means were similar in all strata, with the exception of sex: males had a much higher geometric mean f-Hb in both the whole population and among the positive cases only. This suggests that most of the other demographic differences are predominantly driven by the proportions of undetectable f-Hb. However, although in this group as a whole there was no clear trend in f-Hb with age, there was a greater tendency for older subjects with no CRC or adenoma to have f-Hb of 20 µg/g or more: the proportions were 2.8%, 3.1% and 3.3% for age groups 59-64, 65-69 and 70-75 years, respectively (p = 0.045).

Understanding the relationship between FIT-detected f-Hb and pathology
As noted in the Methods section, the truncated regression was carried out on the 1825 complete cases. The final multiple regression model (supplied in Table S1), adjusted for age and sex, suggests that participants with CRC and HRA had significantly higher f-Hb (p < 0.001). After controlling for age and sex, participants who had CRC and HRA, respectively, had log(f-Hb) approximately 3.08 higher and 1.53 higher than those with no abnormality. Backtransforming to the original scale, a participant with CRC was estimated to have f-Hb 22 times that of a participant of the same age and sex but with no abnormal pathology (on average). After adjusting for age and sex, the f-Hb of participants who had LRA or other abnormality was not statistically significantly different from that of participants who had no abnormality (Wald tests, p = 0.855 and p = 0.791). Table 3 gives the empirical geometric mean, and 10th to 90th percentile ranges (referred to below as 80% ranges) of f-Hb for each pathology and the corresponding estimated 80% ranges of the distribution of f-Hb, calculated from the univariable truncated regression models. We calculated 80% ranges rather than 95% ranges (used in laboratory quality control), as the latter were so wide as to be uninformative as to the concentrations characterising the central bulk of the population. If the model is a good fit, we expect 80% of cases to have observed f-Hb within the 80% range estimated from the distribution. We note that whilst the average f-Hb (geometric mean) is strongly associated with pathology, the 80% ranges are very wide, particularly for CRC and HRA. To illustrate the overlap, Figure 2 shows the estimated distributions of f-Hb by pathology on the same graph. There is a reasonable separation between CRC and no abnormality and other abnormality, and a rather poorer separation of adenoma pathology from no abnormality and other abnormality.
Figure 3 shows the estimated distribution of f-Hb within each pathology with histograms corresponding to observed f-Hb that are equal to or above 20 µg/g. Average f-Hb is very low for most categories, with a high degree of variation. For example, CRC having a mean and SD of 4.74 and 1.92, respectively, on the logarithmic scale corresponds to an 80% range in the linear scale of around 10-1339 µg/g.
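The backtransformation mentioned above is simply exponentiation of the coefficients on the natural-log scale. A small sketch, using the two rounded coefficients quoted in the text (the variable names are illustrative):

```python
import math

# Adjusted differences in log(f-Hb) relative to no abnormality, as quoted in the text.
coef_log_scale = {"CRC": 3.08, "HRA": 1.53}

# Exponentiating a difference in logs gives a ratio of geometric means.
ratios = {pathology: math.exp(beta) for pathology, beta in coef_log_scale.items()}
```

Here `ratios["CRC"]` rounds to 22, matching the ratio stated in the text, and `ratios["HRA"]` is about 4.6.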
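The CRC example can be reproduced approximately from the two quoted parameters. The sketch below assumes log(f-Hb) among CRC cases is normal with mean 4.74 and SD 1.92 (rounded values from the text); the function names are hypothetical. It computes the central 80% range on the linear scale and the share of CRC cases at or above a given threshold.

```python
import math
from statistics import NormalDist

# Rounded CRC parameters on the natural-log scale, as quoted in the text.
CRC_MU_LOG, CRC_SD_LOG = 4.74, 1.92
crc = NormalDist(CRC_MU_LOG, CRC_SD_LOG)


def central_range(dist, coverage=0.80):
    """Central `coverage` range of f-Hb (µg/g) implied by a normal model for log(f-Hb)."""
    tail = (1.0 - coverage) / 2.0
    return math.exp(dist.inv_cdf(tail)), math.exp(dist.inv_cdf(1.0 - tail))


def sensitivity(dist, threshold):
    """Proportion of cases with f-Hb at or above `threshold` (µg/g) under the model."""
    return 1.0 - dist.cdf(math.log(threshold))


low, high = central_range(crc)
sens = {t: sensitivity(crc, t) for t in (20, 40, 120)}
```

With these rounded inputs the range is roughly 10 to 1340 µg/g, and the modelled sensitivities at 20, 40 and 120 µg/g are about 82%, 71% and 49%, in line with the figures reported from Table 5. Exact agreement is not expected because the published estimates use unrounded parameters.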

Using regression results to estimate sensitivity of the programme to different bowel pathologies
Using the univariable truncated regression results, Table 4 shows the estimated sensitivity of FIT to CRC and HRA, and the percentage of participants who would be recalled for a number of thresholds. For example, a low threshold such as 20 µg/g results in high sensitivity for detection of CRC and moderate sensitivity for HRA, but would require almost 7.8% of participants to undergo colonoscopy. Note that specificity cannot be calculated since negative FIT did not result in further investigation. Table 5 shows estimated prevalence in different intervals, with sensitivities estimated from the model. Consequently, these differ from those presented in Table 4, which were based on observed frequencies. However, the differences are very small, indicating that the model fits rather well to the CRC and HRA data (a more detailed comparison between estimated and observed frequencies is presented in Table S2). The table also gives the sensitivity of the lower point of each interval as a threshold for further investigation. That is, a threshold of 20 µg/g would confer 82.2% sensitivity to CRC and 64.0% sensitivity to HRA. The table indicates that the current threshold of 120 µg/g has a poor sensitivity for both CRC and HRA, only correctly identifying 48.9% and 25.6%, respectively (Table 5). Further, a very low FIT threshold (40 µg/g) is required in order to detect 71.1% of CRC and 48.2% of HRA (Table 5). The table also shows, for example, that for participants with f-Hb of 80-119 µg/g, 35.4 per 1000 (just under 4%) have CRC (more estimations on prevalence and sensitivity for all abnormalities by f-Hb are given in Tables S3-S5).

Table 2. Frequencies (proportions) of participants with undetectable f-Hb, and frequencies (proportions), geometric means and 80% empirical ranges for participants with detectable and positive f-Hb (µg/g), stratified by demographic characteristics and screening episodes. Column groups: undetectable f-Hb (<4 µg/g); detectable f-Hb (≥4 µg/g); positive f-Hb (≥20 µg/g).

Discussion
We analysed data from 27,238 FIT participants, and carried out truncated regression on the 1825 participants (complete cases) who underwent colonoscopy as a result of a FIT result (f-Hb) of 20 µg/g and above. We estimated the influence of demographic factors and colonoscopy findings on f-Hb, and calculated the expected results of different f-Hb thresholds in terms of both detected and missed CRC and adenomas.
The higher proportion with detectable levels in older participants is consistent with the results of Clark et al. 10 In our data, the mean concentration among participants with positive results was relatively stable over age. This suggests that older participants have a greater tendency to bleeding and we might speculate that they do so regardless of presence or not of significant bowel abnormalities. Clark et al. suggest that this tendency may be a marker of systemic inflammation. 10 Others have found that older participants have more false positive FIT results at the low threshold of 17 µg/g. 11 In our data, with a threshold of 20 µg/g, there is also some evidence of this, with proportions of complete cases having no CRC or adenoma but with f-Hb of 20 µg/g or more being 2.8%, 3.1% and 3.3% for age groups 59-64, 65-69 and 70-75 years respectively (p = 0.045).
In contrast, amongst those with positive FIT results, males had a much higher geometric mean than females, and a much higher upper limit of the 80% range. Higher concentrations in males were also noted by Clark et al. 9 and by Ribbing et al. 12 Thus, the difference between males and females is driven in large measure by higher concentrations among those with positive f-Hb.
Since definitive pathology was only available for positive participants (f-Hb ≥ 20 µg/g), we used truncated regression methods to estimate the influence of bowel pathology on f-Hb and the sensitivity of different thresholds to CRC and HRA.
Using data from 1825 complete cases, we found that participants with CRC and HRA have considerably increased f-Hb, but that the variation among patients is very large. This has been observed by others. 13 Despite distinguishing upper bounds in f-Hb amongst pathology of different risks, the large overlap at intermediate and low f-Hb imposes challenges under current dichotomised screening policies, in which participants in England with f-Hb at or above a single threshold (120 µg/g) are referred for colonoscopy and participants below that concentration receive their next screen two years later. Our regression results indicate that the f-Hb threshold of 120 µg/g used in the NHS BCSP in England is likely to miss just over half of the CRC (51.1%) present at the time of sampling (Table 5). Ribbing et al. found considerably lower sensitivity at various thresholds for CRC and advanced adenoma (not the same as HRA; please refer to the paper for a precise definition) combined. 12 In subjects with f-Hb of 80-119 µg/g, which would not trigger further investigation in the current programme in England, just under 4% had CRC (Table 5). This is higher than the 3% risk threshold for a two-week wait referral for suspected cancer in symptomatic subjects. To capture 80% of CRC and around 60% of HRA, a threshold of 22 µg/g is indicated by our results (Tables 4, 5 and S5). This is consistent with Whyte et al., who concluded that, in the absence of colonoscopy capacity issues, the most cost-effective FIT strategy would be a threshold of 20 µg/g. 14 However, this would imply referring 7.5% of participants for colonoscopy (Table S5). Even prior to the COVID-19 crisis, colonoscopy capacity in England could not cope with this, and the capacity is likely to be even lower for the foreseeable future.
Therefore, to maintain acceptable sensitivity to CRC and HRA, one might consider using the quantitative f-Hb more fully, 15 with different actions for different f-Hb categories, for example:
• Undetectable f-Hb: delay next screen to three years; 16
• Very low f-Hb: next screen in two years; 17
• Low f-Hb: repeat screening test in three months to assess persistence of bleeding; 18
• Medium f-Hb: flexible sigmoidoscopy to examine the lower (distal) part of the colon, and remove any abnormalities found, followed by a further FIT to ascertain whether the cause of the bleeding has been removed; 19
• High f-Hb: colonoscopy.
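The example categories above could be expressed as a simple lookup from f-Hb to action. The sketch below is purely illustrative: the function name and the cut-points are hypothetical placeholders, since the text does not specify where the category boundaries would lie. Only the 4 µg/g limit of detection and the 120 µg/g colonoscopy threshold appear in the text; the two intermediate values are invented for the example and carry no clinical meaning.

```python
from bisect import bisect_right

ACTIONS = [
    "undetectable: delay next screen to three years",
    "very low: next screen in two years",
    "low: repeat screening test in three months",
    "medium: flexible sigmoidoscopy, then a further FIT",
    "high: colonoscopy",
]


def triage(f_hb, cut_points):
    """Map an f-Hb value (µg/g) to one of five actions.

    cut_points: four ascending f-Hb values separating the five categories;
    a value at or above a cut-point falls into the higher category.
    """
    if len(cut_points) != 4 or list(cut_points) != sorted(cut_points):
        raise ValueError("cut_points must be four ascending values")
    return ACTIONS[bisect_right(cut_points, f_hb)]


# HYPOTHETICAL cut-points for illustration only: 4 (limit of detection) and
# 120 (current referral threshold) are from the text; 10 and 20 are placeholders.
EXAMPLE_CUTS = (4.0, 10.0, 20.0, 120.0)
example = triage(35.0, EXAMPLE_CUTS)
```

Passing the cut-points as an argument keeps the boundaries explicit, so that different candidate boundaries can be compared without changing the function.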
Note that we are not explicitly recommending exactly this strategy or these actions. This is simply an example of the approach one might take. More data is needed to ascertain the safety and effectiveness of such an approach, and to specify thresholds for different actions. Others have proposed varying strategies of f-Hb threshold and interscreening interval. For example, Haug et al. suggested a low threshold at first screen and a long interval to second. 20 However, Digby et al. estimated that this would lead to non-negligible numbers of cases missed, and suggested as an alternative an interval determined on the basis of concentration at the first screen. 14 Further research using this and other datasets will indicate the likely thresholds to define the above categories.
When truncated regression removes a majority of the data, estimates are less reliable. 9,21 Thus, those with a 'no abnormality' or 'other abnormality' pathology would be overwhelmingly below the threshold of 20 µg/g, and therefore estimates for these would be less reliable. This is a limitation of the present study, and renders estimation of false positive results (no abnormality pathology with f-Hb above the threshold) uncertain. Estimation of false positive and false negative rates for thresholds below 20 µg/g remains a target for the future.
It should also be noted that the absolute f-Hb results reported here pertain specifically to the OC-Sensor DIANA analyser. While the observations of associations of demographic variables and pathology with concentrations are likely to be generalisable, exact numbers will not be.
Another limitation is that models fitted did not control for factors such as villous status and location of adenomas, which are known to influence f-Hb. 13,22 These factors are also associated with risk of future CRC. Taking account of these is another target for the future.

Conclusion
This analysis shows that the current threshold of 120 µg/g in the English NHS BCSP may only correctly identify half of CRC and a quarter of all HRA in the population. In order to achieve better detection rates of bowel abnormalities, while minimising the burden on endoscopy resources, the NHS BCSP might make use of the ability of FIT to provide quantitative results to develop a multi-threshold management strategy, thereby optimising clinical resources and patient outcomes.

Authors' contributions
SJL carried out the data analysis and modelling, and drafted the paper; LDS and PS contributed specific expertise in biostatistics and edited drafts of the paper; SCB contributed specific expertise in bowel cancer screening and edited drafts of the paper; OB carried out data cleaning and edited drafts of the paper; CM contributed informatics expertise and edited drafts of the paper; and SWD was responsible for study concept and edited drafts of the paper.

Data availability
Requests for data should be sent to the Office for Data Release. The authors do not have the authority to share the data.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: SJL is supported by the National

Research ethics and patient consent
As the FIT pilot was an evaluation of the new service, it was not subject to ethics committee review. All subjects were sent a pre-invitation letter explaining the FIT test and describing the evaluation project. Return of a complete FIT kit was considered to imply consent to participate.

Supplemental material
Supplemental material for this article is available online.