Dynamic methods for ongoing assessment of site-level risk in risk-based monitoring of clinical trials: A scoping review

Background/Aims: It is increasingly recognised that reliance on frequent site visits for monitoring clinical trials is inefficient. Regulators and trialists have recently encouraged more risk-based monitoring. Risk assessment should take place before a trial begins to define the overarching monitoring strategy. It can also be done on an ongoing basis, to target sites for monitoring activity. Various methods have been proposed for such prioritisation, often using terms like ‘central statistical monitoring’, ‘triggered monitoring’ or, as in the International Conference on Harmonization Good Clinical Practice guidance, ‘targeted on-site monitoring’. We conducted a scoping review to identify such methods, to establish if any were supported by adequate evidence to allow wider implementation, and to guide future developments in this field of research.

Methods: We used seven publication databases, two sets of methodological conference abstracts and an Internet search engine to identify methods for using centrally held trial data to assess site conduct during a trial. We included only reports in English, and excluded reports published before 1996 or not directly relevant to our research question. We used reference and citation searches to find additional relevant reports. We extracted data using a predefined template. We contacted authors to request additional information about included reports.

Results: We included 30 reports in our final dataset, of which 21 were peer-reviewed publications. In all, 20 reports described central statistical monitoring methods (of which 7 focussed on detection of fraud or misconduct) and 9 described triggered monitoring methods; 21 reports included some assessment of their methods’ effectiveness, typically exploring the methods’ characteristics using real trial data without known integrity issues. Of the 21 with some effectiveness assessment, most contained limited information about whether or not concerns identified through central monitoring constituted meaningful problems. Several reports demonstrated good classification ability based on more than one classification statistic, but never without caveats of unclear reporting or other classification statistics being low or unavailable. Some reports commented on cost savings from reduced on-site monitoring, but none gave detailed costings for the development and maintenance of central monitoring methods themselves.

Conclusion: Our review identified various proposed methods, some of which could be combined within the same trial. The apparent emphasis on fraud detection may not be proportionate in all trial settings. Despite some promising evidence and some self-justifying benefits for data cleaning activity, many proposed methods have limitations that may currently prevent their routine use for targeting trial monitoring activity. The implementation costs, or uncertainty about these, may also be a barrier. We make recommendations for how the evidence-base supporting these methods could be improved.


Field / Format / Comments

1 Report characteristics: to gather basic information about the reports and the trials in which the described methods had been used.

Intervention risk category of the trials in which the monitoring method(s) had been used, according to the Organisation for Economic Co-operation and Development*; categories not mutually exclusive.

* https://www.oecd.org/sti/scitech/oecd-recommendationgovernance-of-clinical-trials.pdf

2 Detail of reports' focus and scope, and any assessment of methods' effectiveness: to gather information on the type of method described, on whether it was descriptive only or also included some assessment of how well the method works and, if some assessment was done, what form the assessment took. The category questions describing the type of assessment were not mutually exclusive.

Field: Focus of work
Format: Category:
- Central statistical monitoring, with focus on fraud or misconduct
- Central statistical monitoring, general
- Triggered monitoring
- Other flagging/targeting method
- Other
Comments: "Central statistical monitoring": methods involving statistical testing to identify outlying or unusual clinical trial centres. Reports about fraud or data fabrication were differentiated because they were assumed to use different methods and different thresholds for defining 'problem centres' compared to methods looking for any type of problem.
"Triggered monitoring": use of threshold-based rules to identify problem centres (e.g. those with data return <80% or an unusually high number of serious adverse events submitted might be flagged).

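The "triggered monitoring" definition above can be illustrated with a minimal sketch. It is not taken from any of the included reports: the `SiteMetrics` fields, the `triggered_sites` function, and the serious adverse event threshold of 0.5 events per participant are hypothetical choices made for illustration. Only the 80% data return threshold echoes the example in the template.

```python
from dataclasses import dataclass


@dataclass
class SiteMetrics:
    """Centrally held summary data for one site (hypothetical fields)."""
    site_id: str
    forms_expected: int
    forms_returned: int
    participants: int
    serious_adverse_events: int


def triggered_sites(sites, min_return_rate=0.80, max_sae_rate=0.50):
    """Return {site_id: [reasons]} for sites that fire at least one trigger.

    Both thresholds are illustrative assumptions; a real trial would
    pre-specify its own triggers and limits.
    """
    flagged = {}
    for s in sites:
        reasons = []
        # Trigger 1: proportion of expected forms returned is below the limit.
        if s.forms_expected > 0 and s.forms_returned / s.forms_expected < min_return_rate:
            reasons.append("low data return")
        # Trigger 2: serious adverse events per participant exceed the limit.
        if s.participants > 0 and s.serious_adverse_events / s.participants > max_sae_rate:
            reasons.append("high SAE rate")
        if reasons:
            flagged[s.site_id] = reasons
    return flagged


# Made-up example data for three sites.
sites = [
    SiteMetrics("A", forms_expected=100, forms_returned=95, participants=20, serious_adverse_events=2),
    SiteMetrics("B", forms_expected=100, forms_returned=70, participants=20, serious_adverse_events=3),
    SiteMetrics("C", forms_expected=50, forms_returned=48, participants=10, serious_adverse_events=8),
]
flags = triggered_sites(sites)
print(flags)  # {'B': ['low data return'], 'C': ['high SAE rate']}
```

Recording the reason for each flag, rather than a bare yes/no, keeps the output usable for deciding what a targeted visit should look at.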
Field: If other, explain
Format: Free text

Field: Scope of work
Format: Category:
- Theory only
- Association between central monitoring finding and site feature
- Description/development of method
- Some assessment of effectiveness
Comments: "Theory only": papers discussing potential risk-based monitoring methods without any concrete evidence generation.
"Association between central monitoring finding and site feature": as a hypothetical example, papers linking high or low recruitment with the number of protocol violations, without then developing any monitoring method based on this.
"Theory only" and "Association…" papers were ultimately excluded from final results.

Field: If some assessment of effectiveness, case studies presented?
Format: Category: Yes / No
Comments: Case studies defined as selected instances illustrating (usually narratively) how a method works.

Field: If some assessment of effectiveness, method explored on real data with no known fraud or other serious problems?
Format: Category: Yes / No
Comments: Method tried out on real trial data without any known problems, i.e. no 'true positive' problem centres to find. Any 'positive' centres flagged through the central monitoring method might be assumed to be false positives without further investigation.

Field: If some assessment of effectiveness, method used to find simulated fabrication/fraud?
Format: Category: Yes / No
Comments: Real datasets modified to simulate fabricated data, then attempts made to identify the fabricated data using the monitoring method.

Field: If some assessment of effectiveness, method used to find known problems in real data?
Format: Category: Yes / No
Comments: Datasets obtained from trials with known instances of fraud, data fabrication or other issues; method used to identify the problem centres (whether or not done by individuals blind to which the problem centres were).

Field: If some assessment of effectiveness, method implemented in a trial and results of targeted on-site monitoring reported?
Format: Category: Yes / No
Comments: Results of on-site monitoring reported, i.e. number of (serious) findings from visits.

Field: If some assessment of effectiveness, method implemented in a trial and effects on trial reported, in terms of cost, data quality or something else?
Format: Category: Yes / No
Comments: Effects of monitoring method on the trial, usually suggesting that risk-based monitoring methods reduce costs, improve aspects of trial quality, or both.

Field: If some assessment of effectiveness, prospectively designed, controlled study to look at predictive ability of targeted on-site monitoring methods?
Format: Category: Yes / No
Comments: Use of method in a prospectively designed experiment aiming to assess how well it correctly identifies problem sites and excludes non-problem sites.

3 Quality assessment: these fields were developed following review of the QUADAS-2 tool for quality assessment of diagnostic accuracy studies* because we suggest that such studies share similar potential sources of bias with the sort of study we were looking for. As it was not within the scope of our project to validate these questions as a quality assessment tool in this setting, we have not ultimately reported this information. However, it has informed our interpretation of the limitations in the existing evidence base.

Format: Free text
Comments: Example of potential problem: simulated data used, but no attempt to make it reflect a possible real-life situation (e.g. extreme outliers added to data, when deliberate fabrication might involve addition of 'normal'-looking data).
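To make the point about realistic simulation concrete, the following is a minimal sketch of implanting 'normal'-looking fabricated data in one site and screening for it. It is an illustrative assumption, not a method from any included report: the function names, the blood-pressure-like values, the standard deviations and the 0.5 ratio threshold are all hypothetical. The fabricated site has a plausible mean but too little spread, so a check for extreme outliers would miss it, whereas a check for unexpected lack of variability can pick it up.

```python
import random
import statistics


def simulate_trial(n_sites=10, n_per_site=40, fabricated_site=3, seed=0):
    """Simulate one continuous measurement per participant at each site.

    Genuine sites use SD 15; the fabricated site uses SD 3 around the same
    mean, mimicking invented values that look 'normal' but vary too little.
    All numbers are illustrative assumptions.
    """
    rng = random.Random(seed)
    data = {}
    for site in range(n_sites):
        sd = 3.0 if site == fabricated_site else 15.0
        data[site] = [rng.gauss(130.0, sd) for _ in range(n_per_site)]
    return data


def low_variability_sites(data, ratio_threshold=0.5):
    """Flag sites whose SD is below a fraction of the median site SD."""
    sds = {site: statistics.stdev(values) for site, values in data.items()}
    typical_sd = statistics.median(sds.values())
    return [site for site, sd in sds.items() if sd / typical_sd < ratio_threshold]


data = simulate_trial()
flagged = low_variability_sites(data)
print(flagged)
```

In an evaluation of the kind the template asks about, the analyst running `low_variability_sites` would ideally not know which site, or how many sites, had been altered.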

Field: If simulated data used, were outcome assessors blinded to simulation methods and details of any sites with implanted fabrication?
Format: Free text
Comments: Example of potential problem: if outcome assessors (those using the proposed central monitoring method to identify problem centres) knew the simulation method, or knew the number of centres they were looking for, this might make it easier for them to guess which the problem centres were.

Field: If tested method using dataset with known fraud or other issues, is the choice of 'reference test' (usually source data verification) well-described and well-justified?
Format: Free text
Comments: Example of potential problem: real fraud, data fabrication or other issues might be expected to have been found through on-site monitoring activities. A potential problem might be that this is not clearly described in a paper, so it is not possible to confirm how we know the 'true' status of each centre.

Field: If tested method using dataset with known fraud or other issues, are the results of the method being evaluated being assessed without knowledge of 'reference test' results? E.g. are statisticians trying to identify problem sites blinded to which were actually problematic?
Format: Free text
Comments: Example of potential problem: if outcome assessors using the proposed central monitoring method to identify problem centres are not blinded to which are the problem centres, it is harder to say that the method alone has identified the problem centres.

Comments: As defined in previously-published work*, 'unsupervised' analyses involve looking through all trial data for unusual patterns; in 'supervised' analyses, by contrast, analysts build in pre-specified limits to what is included (e.g. limits on how much data is included in the analysis, or pre-specified risk thresholds regardless of sample size).
We did not ultimately report this because a) it was not always straightforward to say whether a method was supervised or unsupervised, especially given slightly different definitions in the literature, and b) although we considered the distinction to be of some interest, we agreed it was just a way of characterising the methods we identified, rather than a key finding in our work.

* Oba K. Statistical challenges for central monitoring in clinical trials: a review. Int J Clin Oncol 2016;21:28-37.

Field: What evaluation?
Format: Free text
Comments: This was gathered to inform our understanding of the level and nature of any evaluation of methods' effectiveness. Not ultimately reported.

Field: What claims made about effectiveness?
Format: Free text
Comments: We used this to collect quotes from each report about the effectiveness of the proposed methods. This informed our understanding of the scope of each work, but we have not ultimately reported these data.

Field: Summarise predictive value info in paper?
Format: Free text
Comments: This was gathered to inform our understanding of how much information was in each report about the ability of the methods to correctly predict the 'true' status of each site (i.e. the classification ability, as per the fields that follow). This was for discussion purposes only and has not ultimately been reported.

Field: Category for classification info
Format: Category:
- No evaluation
- No information on true status
- Partial
- Explored through simulation
- Case studies presented only
- Detailed information
Comments: A category field to describe how much information was in each report about the methods' classification ability.
'True status' means whether or not each clinical trial site is confirmed to be a 'problem site' (however this is defined in each case), on the basis that central monitoring methods to flag possible problem sites are analogous to diagnostic tests.
'Partial' means information is only available on some sites (i.e. on their test results, their true status, or the total number of sites, or all of these).
'Explored through simulation' means information on statistics such as sensitivity and specificity is available for a range of simulated scenarios (though with limited or no information from real-life settings).
'Case studies presented only' means only a few, selected examples of methods' capabilities are presented.
'Detailed information' means information available to give a full (or at least detailed) picture of methods' sensitivity, specificity, and positive and negative predictive values, from specifically tested situations (as opposed to extensive simulation).

Field: Best classification results, if possible
Format: Free text
Comments: We gathered and have reported information from each paper on the best (i.e. most successful) classification results in each report. In some cases this is reported directly from the paper; in others we calculated it from information available in the paper. This is reported, with details of any calculations, in Table 4 of our report.

Field: What classification terms mentioned?
Format: Free text
Comments: We gathered information on use of terminology (e.g. presence or absence of 'sensitivity', 'specificity' etc) to inform our understanding of each report. We have not ultimately reported this.
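The classification statistics named in the fields above can be sketched as follows, treating a central monitoring flag as the 'test' and the confirmed 'true status' of each site as the reference. The function name and the counts are invented for illustration and do not come from any included report.

```python
def classification_stats(tp, fp, fn, tn):
    """Compute four classification statistics from a 2x2 table of sites.

    tp: flagged sites confirmed as problem sites
    fp: flagged sites with no confirmed problem
    fn: unflagged sites that were problem sites
    tn: unflagged sites with no confirmed problem
    A statistic is None when its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # problem sites that were flagged
        "specificity": ratio(tn, tn + fp),  # non-problem sites left unflagged
        "ppv": ratio(tp, tp + fp),          # flagged sites that were problems
        "npv": ratio(tn, tn + fn),          # unflagged sites that were fine
    }


# Hypothetical trial with 50 sites, 5 of which are true problem sites.
stats = classification_stats(tp=4, fp=6, fn=1, tn=39)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, sensitivity is 0.800 and specificity about 0.867, yet the positive predictive value is only 0.400 because problem sites are rare. This shows how a method can look good on some classification statistics while others are low.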

Field: Any information provided on cost/resource implications?
Format: Free text
Comments: We gathered information on cost or resource implications from each paper. This could either be cost of developing the methods or any related computer systems, or cost implications of adopting risk-based monitoring methods, or anything else. This is briefly reported in our manuscript.

Field: Any comparison made between the centralised method and on-site monitoring, in any outcome?
Format: Free text
Comments: We were interested to see if any authors had directly compared targeted and untargeted monitoring methods in terms of a monitoring-based outcome, such as ability to detect serious findings, or the time between protocol violation and its detection. This did not ultimately yield useful information, so we have not reported it.

Field: Does it meet any of the aims of Centralised Monitoring as defined in ICH GCP?
a) identify missing data, inconsistent data, data outliers, unexpected lack of variability and protocol deviations
b) examine data trends such as the range, consistency, and variability of data within and across sites
c) evaluate for systematic or significant errors in data collection and reporting at a site or across sites; or potential data manipulation or data integrity problems
d) analyse site characteristics and performance metrics
e) select sites and/or processes for targeted on-site monitoring
Format: Category for each aim: Yes / No
Comments: We decided not to report the data from these fields because the ICH GCP aims are complex and not mutually exclusive; this made it challenging to reach agreement on which applied to each report.

Field: Any restrictions placed on how/when method could be used?
Format: Free text
Comments: Restrictions or limitations stated by the authors of each paper, for example if method can only be used for continuous or binary data. Not ultimately reported.

Field: Other comments
Format: Free text
Comments: Any general comments. These informed interpretation of our results.