A review of available software for adaptive clinical trial design

Background/aims: The increasing cost of the drug development process has seen interest in the use of adaptive trial designs grow substantially. Accordingly, much research has been conducted to identify barriers to increasing the use of adaptive designs in practice. Several articles have argued that the availability of user-friendly software will be an important step in making adaptive designs easier to implement. Therefore, we present a review of the current state of software availability for adaptive trial design. Methods: We review articles from 31 journals published in 2013–2017 that relate to methodology for adaptive trials to assess how often code and software for implementing novel adaptive designs is made available at the time of publication. We contrast our findings against these journals’ policies on code distribution. We also search popular code repositories, such as Comprehensive R Archive Network and GitHub, to identify further existing user-contributed software for adaptive designs. From this, we are able to direct interested parties toward solutions for their problem of interest. Results: Only 30% of included articles made their code available in some form. In many instances, articles published in journals that had mandatory requirements on code provision still did not make code available. There are several areas in which available software is currently limited or saturated. In particular, many packages are available to address group sequential design, but comparatively little code is present in the public domain to determine biomarker-guided adaptive designs. Conclusions: There is much room for improvement in the provision of software alongside adaptive design publications. In addition, while progress has been made, well-established software for various types of trial adaptation remains sparsely available.


Introduction
Classically, clinical trials have used fixed-sample designs. In this approach, a trial is designed, carried out according to that design, and the acquired data are analyzed on trial conclusion. In recent years, however, stagnation in the number of products submitted for regulatory approval (Hay et al, 2014), along with the escalating costs of drug development (DiMasi et al, 2016), has led the clinical trials community to seek new ways of improving the efficiency of clinical research. One suggestion that has received much attention is that a break from fixed-sample designs is required in order to make use of the pertinent information that is acquired over the course of a trial's progress. That is, to more regularly employ adaptive designs (ADs), which permit data-dependent modifications to be made to a trial's conduct through a series of prospectively planned interim analyses of the accumulating data. Indeed, both the US Food and Drug Administration and the European Medicines Agency have recognized that ADs could be key to the future of drug development (European Medicines Agency, 2007; US Food and Drug Administration, 2018).
With this increased interest, there has been an expansion in the publication of statistical methodology that facilitates the AD of clinical trials. Methods for carrying out numerous types of adaptation are now available, e.g., to drop particular treatment arms from further recruitment, to refine a trial's sample size, or to restrict future allocation to particular patient sub-groups. For an overview, several monographs have been published (Chow and Chang, 2011; Wassmer and Brannath, 2016; Yin, 2013). Furthermore, guidance is now available on when and why ADs may be useful, as well as on how to run such studies (Korn and Freidlin, 2017; Pallmann et al, 2018; Thorlund et al, 2018). Recommendations on how to report adaptively designed clinical trials are also under development (Dimairo et al, 2018).
However, the actual number of trials that have used ADs remains small. A recent review by Hatfield et al (2016) of phase II, phase II/III, and phase III trials registered on ClinicalTrials.gov between 29 February 2000 and 1 June 2014, along with trials from the National Institute for Health Research register, identified only 143 AD clinical trials. Similarly, Bothwell et al (2018) reviewed articles from several databases published prior to September 2014, and found 142 AD phase II, phase II/III, or phase III trials. Accordingly, much research has also been conducted in order to identify and describe the potential barriers to the increased use of ADs (Chow and Corey, 2011; Coffey et al, 2012; Dimairo et al, 2015a; Dimairo et al, 2015b; Jaki, 2013; Kairalla et al, 2012; Meurer et al, 2016; Morgan et al, 2014; Quinlan et al, 2010; Love et al, 2017). Numerous barriers have since been identified, such as a lack of available expertise in AD, the length of time required for trial design when using an AD, a fear that AD would introduce operational biases, and inadequate funding structures. Here, our focus is on an additional barrier, which has been noted by several of these reviews: a lack of easily accessible, well-documented, user-friendly software for AD (Chow and Corey, 2011; Coffey et al, 2012; Dimairo et al, 2015a; Dimairo et al, 2015b; Jaki, 2013; Kairalla et al, 2012; Quinlan et al, 2010).
The provision of software for ADs is particularly important because, relative to fixed-sample designs, which often require only simple calculations, the complexity of ADs typically makes computational investigation of such methods a necessity. With the proliferation of software, it has been argued, project teams around the globe will be empowered to compare and contrast different designs in order to make informed decisions about the most appropriate design for their trial, and ultimately the frequency of appropriate AD use will increase. There have consequently been recommendations that, wherever possible, software for novel AD methodology should be made available alongside statistical publications (Dimairo et al, 2015b).
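To make concrete the kind of "simple calculation" that a fixed-sample design often requires, the following minimal sketch computes a per-arm sample size for a two-arm comparison of normal means via the standard normal-approximation formula. This is our own illustration, not code from any reviewed package; the effect size, standard deviation, and error rates are arbitrary example values.

```python
from statistics import NormalDist
from math import ceil

def fixed_sample_size(delta, sigma, alpha=0.05, power=0.9):
    """Per-arm sample size for a two-arm comparison of normal means,
    using n = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    n = 2 * (sigma / delta) ** 2 * (z(1 - alpha / 2) + z(power)) ** 2
    return ceil(n)

# Example: detect a difference of 0.5 SDs with 90% power at two-sided alpha = 0.05.
n_per_arm = fixed_sample_size(delta=0.5, sigma=1.0)
print(n_per_arm)  # 85
```

An AD, by contrast, must additionally account for the joint distribution of test statistics across interim analyses, which is why simulation or specialized numerical routines are usually needed.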
Several reviews of available software for ADs have therefore been presented. Zhu et al (2011) provided an overview of software for group sequential design, whilst Timofeyev (2014), Wassmer and Brannath (2016), and Wassmer and Vandemeulebroecke (2006) all provided more general overviews of software for ADs. However, each of these concentrated on describing what software is available, focusing on established packages from a high-level perspective, and giving particular attention to stand-alone proprietary solutions such as East and AddPlan.
Here, our focus is directed toward two different aims. The first is to investigate the provision of user-contributed code and software for designing, conducting, and analyzing trials using ADs in scientific publications. We review articles from a variety of journals that publish AD methodology, assessing how often code/software is provided alongside publications, and how these results compare with the current policies of these journals. The second is to assess which AD features are supported by available user-contributed programs for use in R, SAS, Stata, and other programming languages that are popular in the trials community. Since the abundance of user-written code makes it challenging to keep track of available solutions to particular design and analysis problems, we review several databases, including CRAN, SSC, and GitHub, in order to identify which design features and trial phases have been addressed heavily, and which may require further computing resources to be provided.
We proceed by describing the methods behind our literature review, before detailing our findings on the provision of code alongside AD methodology publications. We then detail identified available solutions in R, SAS, and Stata by type of adaptation, before discussing the current state of software for the AD of clinical trials.

Review protocol
Here, we summarise the most important points behind our literature and repository review. Further details are given in the Supplementary Material.

Review aims
• To determine the frequency with which requisite computer code is made available alongside publications relating to the AD of clinical trials, and further classify this availability according to the archiving method and code completeness.
• To determine the most popular programming languages used within the AD community.
• To determine the degree to which authors who state computer code is "available upon request" are able to respond with said code following an e-mail request.
• To identify and describe user-written code relating to the AD of clinical trials, with a focus on R, SAS, and Stata.

Identification of relevant journal publications
PubMed Central search. PubMed Central was searched on 5 July 2018 by MJG, in order to identify potential publications for inclusion in our review. Articles were required to have been published in one of 31 journals, a bespoke selection of those we believed to be most likely to publish articles relating to AD methodology (see Supplementary Table 1). Publications from each journal were identified by searching the [Abstract], [Body - Key Terms], and [Title] fields for 53 chosen AD-related terms, which are listed in the Supplementary Material. The search was limited to articles published between 1 January 2013 and 31 December 2017. Supplementary Table 1 provides the number of records identified for each of the considered journals; in total, 4123 articles were identified for review.
Publication inclusion criteria. We desired to include publications related to the design and analysis of AD clinical trials. Thus, using the US Food and Drug Administration definition of an AD (US FDA, 2018), our inclusion criteria were: 1. A publication that proposes or examines design or analysis methodology for a clinical trial that "allows for prospectively planned modifications to one or more aspects of the design based on accumulating data from subjects in the trial" (US FDA, 2018); 2. A complete peer-reviewed publication (i.e., we excluded conference abstracts); 3. Set within the context of clinical trials (i.e., we excluded methodology that could be used for the AD of a clinical trial if the primary motivation was not clinical trial research); 4. Performs computational work of any kind relating to ADs (i.e., even to confirm theoretical results, produce simple graphs, etc.).
Note that we excluded conference abstracts as we believed it would be unlikely that they would explicitly note whether/where code is available. Similarly, in fields other than clinical trials there may be different expectations on the availability of code. We thus excluded such publications to reduce the bias in our findings, given our primary interest was AD methodology for clinical trials. No restrictions were made on the level of code required for inclusion, since we felt drawing such conclusions would be subjective. Finally, note that by criterion 1, we exclude publications that simply present the results of a clinical trial that utilized an AD.
Selection of studies for inclusion in the review and data extraction. Two hundred records were randomly selected to pilot the selection process and data extraction upon. Specifically, MJG and GMW independently considered the 200 records for inclusion and, for each of those marked for inclusion, extracted the following data: • Software availability: Each article was allocated to one of the categories given in Supplementary Table 2, according to the provision of the code required for the presented results.
• Software languages used: C++, R, SAS, Stata, Unclear, etc. Following this pilot, areas of disagreement were discussed in order to enhance the reliability of the selection process and data extraction on the remaining 3923 records, which were allocated evenly and at random to MJG and GMW. In extreme cases where a reviewer was unable to come to a conclusion on inclusion/data extraction, a decision was made following discussion with the other reviewer.
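The even, random split of the remaining records between the two reviewers can be sketched as follows. This is an illustrative reconstruction under our own assumptions, not the authors' actual allocation code; the record identifiers and seed are hypothetical.

```python
import random

def allocate_records(record_ids, reviewers=("MJG", "GMW"), seed=1):
    """Randomly split records as evenly as possible between two reviewers."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = record_ids[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2 + len(shuffled) % 2  # first reviewer takes the extra record
    return {reviewers[0]: sorted(shuffled[:half]),
            reviewers[1]: sorted(shuffled[half:])}

# 3923 remaining records, labeled 1..3923 for illustration.
allocation = allocate_records(list(range(1, 3924)))
print(len(allocation["MJG"]), len(allocation["GMW"]))  # 1962 1961
```

Sorting each reviewer's list is a convenience for working through records in order; the split itself remains random.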
Note that in each case of exclusion, a reason for exclusion amongst the following options was recorded: • Non-adaptive design methodology; • No code required; • Not within the context of clinical trials; • Not complete publication.

Identification of relevant database-archived computer code
Software-specific database searches. In order to identify further software for the AD of clinical trials that is available for R, SAS, and Stata, MJG performed the following additional software-specific database searches on 10 July 2018. For each, there was no simple means of extracting results data into a manageable offline form. Therefore, a less formal approach to record identification had to be taken, as outlined below.
Firstly, Rseek was used to identify packages currently available on the Comprehensive R Archive Network (CRAN; the principal location for the storage of R packages) that are pertinent to ADs. Specifically, each of the 53 terms used in the article search of PubMed Central (the "search terms") was entered into the engine at https://rseek.org/. Next, the articles from the R-project tab were screened, with any that appeared to be of potential relevance to ADs noted in a .csv file. Similarly, to identify code available for Stata that is relevant to ADs, the Statistical Software Components (SSC) archive was used (which hosts the largest collection of user-contributed Stata programs). The search terms were entered into the search bar at https://ideas.repec.org/. Any potentially germane results were added to the aforementioned .csv file. Moreover, the search terms were also entered into the search engine at https://www.stata-journal.com/, in order to identify relevant publications in the Stata Journal, the premier journal for the publication of Stata code articles (note that we did not search for Stata Journal articles via PubMed Central, as not all such articles are indexed there). To find user-contributed code for AD in SAS, the abstracts of the proceedings of the SAS Global Forums from 2007 to 2018 were searched using the search terms given earlier (e.g., for 2016 the terms were utilized via Ctrl+F searches at support.sas.com/resources/papers/proceedings16/). Finally, the procedure was repeated on GitHub, using the search bar at https://github.com/, with all seemingly relevant results again stored in a .csv file. For this search, no restrictions on the programming language utilized were made.
Note that for each of these databases, no limits on the publication date were employed, as our goal was to identify as much relevant software as possible. The numbers of records identified as being of potential relevance are given in Supplementary Table 3.
Identification of relevant records. Each of the records from the searches described above was screened in order to identify those related to ADs. Our criterion for listing a record as relevant was criterion 1 of our publication inclusion criteria. The functionalities of relevant records were also noted via a checklist, using one or more of the following keywords: Adaptive randomization; Alpha spending; Bayesian methods; Biomarker-based methods; Dose-modification/escalation; Drop the loser; Group sequential; Multi-stage; Phase I; Phase I/II; Phase II; Phase II/III; Phase III; Pick the winner; Sample-size adjustment; Stopping rules; Two stage. To pilot the screening, 31 records (~10% of the 307 records initially identified) were chosen at random and reviewed by MJG and GMW. As above, this allowed for discussions on differences of opinion, in order to improve the standardization of the classification for the remainder of the records. For efficiency purposes, MJG then screened each of the remaining records from GitHub.
GMW screened those from each of the other databases.
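The keyword-based matching underlying this kind of record screening can be sketched as a simple case-insensitive term search. This is our own illustration, not the screening tool actually used: the record titles are hypothetical examples, and the term list is a small subset standing in for the full 53 search terms.

```python
# A small illustrative subset of AD-related search terms (the review used 53).
SEARCH_TERMS = ["adaptive design", "group sequential", "sample size re-estimation",
                "drop the loser", "interim analysis"]

def screen_record(title):
    """Return the search terms matched by a record title (case-insensitive)."""
    title = title.lower()
    return [term for term in SEARCH_TERMS if term in title]

# Hypothetical repository titles, as might be returned by a database search.
records = [
    "gsDesign: Group Sequential Design",
    "ggplot2: Create Elegant Data Visualisations",
    "rpact: Confirmatory Adaptive Design and Analysis",
]
hits = {r: screen_record(r) for r in records if screen_record(r)}
print(hits)
```

In practice a matched record would still be screened by hand, as above; automated term matching only narrows the candidate list.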
Finally, note that for all records that were marked as relevant to ADs, the author's additional repositories were screened (e.g., via their homepage on GitHub) in order to identify any further code relating to ADs. From this, three previously unidentified records were included.

Code provision and journal policies
We compared each journal's policy on code provision against the observed rates of code availability/provision in our review. Figure 2 shows the distribution of code provision across journals according to their code provision policy (Compulsory, Strongly Encouraged, Encouraged, Possible, Not Mentioned). The data show that journals with compulsory policies on code provision have not been enforcing them.
There is a possibility that articles published at the start of our review period (i.e., in 2013) may not have been subject to the same code provision policy that is in place now. However, violations of the compulsory policy type are consistent across the review period. For example, Statistics in Medicine (ISSN 0277-6715; Wiley Online Library) states that "The journal also requires authors to supply any supporting computer code or simulations that allow readers to institute any new methodology proposed in the published article."; this is an example of a compulsory policy. In our review, 66 articles published in Statistics in Medicine were considered eligible. Table 1 shows the distribution of these articles by year of publication and code provision category. Over 5 years, 51 articles (77%) were published with no code provided, and the numbers did not noticeably decrease over time, as would be expected following the introduction of a compulsory code provision policy. On the other hand, Biometrical Journal (ISSN 1521-4036; Wiley Online Library) has a "Strongly Encouraged" level policy, stating "The journal strongly supports Reproducible Research. Authors are therefore vigorously encouraged to submit computer code and data sets used to illustrate new methods and to reproduce the results of the paper." All 7 eligible articles published in Biometrical Journal during our review period either provided full code to reproduce the results or gave all functions required to replicate the paper's results.

Software used
A variety of different statistical programs were used in the eligible articles, including open-source libraries, licensed programs, and commercial software. Overall, 129 articles (52%) stated what software was used in their computations; 60 of these articles (47%) did not make their code available.
Of the 129 articles, 107 used R (R Core Team, 2017); 91 such articles used R only, and the other 16 used R in combination with another program (e.g., MCMC sampling software such as JAGS, OpenBUGS, or WinBUGS) or provided code/software in other computing languages as well as R.
Table 2 shows the usage of different software and their provision categories in journals.

Repository review
We performed additional searches of major software libraries to identify and classify available computer software related to ADs. Our searches found 310 software repositories, of which 123 were considered eligible. Of these records, 64 (52%) were found on CRAN; 45 of these 64 CRAN packages had duplicate repositories on GitHub. Forty (33%) additional repositories were found on GitHub (i.e., repositories not located on any other platform), 8 (7%) on SSC, 6 (5%) from the SAS Global Forum, and 5 (4%) from the Stata Journal. Of the 40 GitHub repositories, 35 (88%) featured code for R; the remaining 5 featured code for Julia (2/40), JavaScript (1/40), Python (1/40), and SAS (1/40). This means that, of the 123 eligible repositories, 99 (80%) provided R packages, or code for use in R that is yet to be published as a package on CRAN.
Table 3 shows the primary applications of AD software, split by software language and intended trial phase. The majority of available packages cover phase II and phase III trials and are for group sequential methods. The packages/programs tended to cover multiple purposes: 64 programs belonged to one of the design categories listed in Table 3, 51 belonged to two categories, 7 belonged to three categories, and 1 covered four categories. Supplementary Table 4 shows the distribution of software and trial phase catered for by the different subcategories of group sequential methods. When breaking down the "Group sequential" designs category into its constituent parts, we see that many packages are available for dealing with both two-stage and multi-stage designs. As in previous tables, R is generally the favoured software for writing such programs. We also extracted, where possible, the date when each package was last updated or released. For 11 (9%) entries, only the year of the last known update was available. Figure 3 shows the distribution of the year of latest update by repository. Most packages are hosted on CRAN and GitHub, repositories that users can easily update and submit packages to, and all CRAN packages have been released or updated within the last 4 years. There are few programs hosted on the SAS Global Forum, on SSC, and via the Stata Journal, and most of these have not been updated in the last 4 years. We cannot tell whether the lack of updates is because a package is in perfect working order with all required functionality, or whether a lack of interest from users means there is no need for the maintainer to update it.
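To illustrate the kind of computation that such group sequential packages perform, the following minimal Monte Carlo sketch (our own, not taken from any reviewed package) checks that a two-stage Pocock design with the well-known constant boundary c ≈ 2.178 controls the overall two-sided type I error at roughly 5%.

```python
import math
import random

def two_stage_pocock_type1(c=2.178, n_sims=200_000, seed=42):
    """Estimate the overall type I error of a two-stage Pocock design.

    Under H0, the stage-1 statistic is Z1 ~ N(0,1) and, with equal stage
    sizes, the cumulative stage-2 statistic is Z2 = (Z1 + W)/sqrt(2) for an
    independent increment W ~ N(0,1). The trial rejects if |Z1| > c or
    |Z2| > c (the same critical value at both analyses, as Pocock proposed).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        z1 = rng.gauss(0, 1)
        z2 = (z1 + rng.gauss(0, 1)) / math.sqrt(2)
        if abs(z1) > c or abs(z2) > c:
            rejections += 1
    return rejections / n_sims

print(two_stage_pocock_type1())  # close to 0.05
```

Production packages replace this simulation with numerical integration over the joint normal distribution of the stage-wise statistics, which is both faster and exact; the simulation form is shown here only because it makes the underlying sequential rejection rule explicit.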

Discussion
By scanning 31 journals and five years' worth of publications, we provide reliable estimates of the prevalence of software provision alongside AD methodology publications. The reliability of our findings is also aided by the joint review of 10% of records, with discussion of findings to ensure consistency. Ultimately, we found that 71% of included articles did not provide any code or software. Most of the journals in which these articles were published have code provision policies that either require or strongly encourage the provision of code.
The low rate of software provision is a disappointing finding. Providing code alongside methodological research allows readers to reproduce novel ADs and tailor them to their own project needs. Some research funders expect funding recipients to make data and original software used for analyses fully available at the time of publication. For example, the Wellcome Trust states that researchers should make sure such outputs (i) are discoverable, (ii) use recognized community repositories, and (iii) use persistent identifiers (e.g., DOIs) where possible (see https://wellcome.ac.uk/funding/guidance/policy-data-software-materials-management-and-sharing). We recommend that this guidance is followed for all AD-related publications whenever feasible.
More positively, we identified that there has been a marked increase in the number of software repositories relating to ADs over the last five years (Figure 3). A further interesting result is that the majority of AD-related programs are written for R. Therefore, whilst the provision of code and software with new publications may help increase the use of ADs, it would also be prudent for trial statisticians to be familiar with how to use R. Furthermore, by demonstrating which trial adaptations are covered by existing software, we have made it possible for researchers to be better informed as to where new and improved code is required. In particular, many programs are available for group sequential design; future research in this area likely does not require the provision of brand-new code, as several open-source packages are likely already available for the required AD. In contrast, only limited software is available to support sample size re-estimation or biomarker-based adaptation.
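As an example of the kind of under-supported functionality just mentioned, a simple unblinded, variance-based sample size re-estimation step can be sketched as follows. This is a minimal illustration under our own assumptions (normal outcomes, two arms, a fixed target difference), not a reconstruction of any reviewed package; the interim data and target difference are hypothetical.

```python
from math import ceil
from statistics import NormalDist, variance

def reestimate_sample_size(interim_data, delta, alpha=0.05, power=0.9):
    """Re-estimate the per-arm sample size for a two-arm normal-means trial,
    replacing the planning-stage variance with the interim sample variance."""
    z = NormalDist().inv_cdf
    s2 = variance(interim_data)  # interim estimate of the outcome variance
    n = 2 * s2 / delta ** 2 * (z(1 - alpha / 2) + z(power)) ** 2
    return ceil(n)

# Hypothetical interim outcomes suggesting more variability than planned for:
interim = [0.1, 1.4, -0.8, 2.2, 0.6, -1.3, 1.9, 0.4, -0.5, 1.1]
print(reestimate_sample_size(interim, delta=0.5))
```

Blinded re-estimation procedures, which avoid unmasking treatment assignments at the interim analysis, follow the same pattern but estimate the variance from the pooled data without arm labels.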
A limitation of our review is that some papers may not release code at the time of publication because the authors intend to release it as part of a larger package, or because of potential confidentiality issues. However, no papers mentioned that this was the case, and we would encourage authors to state why code is not available to accompany their research.
In summary, to overcome the barriers to implementing ADs in clinical trials, we encourage researchers to make their code available alongside their published research as supplementary material, or by storing it in stable repositories such as GitHub and CRAN. Several articles stated code was available at a given URL, but half of these URLs did not work. Similarly, about a third of the articles that stated code would be available upon request were unable to provide code within a month of our sending a written request. Accordingly, making code available in either of these manners should not be viewed as a reliable long-term method of user access.
a Includes custom R functions, use of existing R packages, and also R Shiny applications.

Figure 3. Number of identified repositories by location and year.

Table 1. Code provision for articles published in Statistics in Medicine, split by year of first publication.

Table 3. Main functions of software repositories, split by software and trial phase. Each package may belong to multiple categories and cover multiple trial phases.
a "Group sequential methods" covers the following subcategories: group sequential; two-stage; multi-stage; stopping rules; drop the loser; pick the winner; alpha spending.

Figure 1. Number of articles by journal and whether code is provided or not.

Figure 2. Number of articles by code provision, journal, and journal's code provision policy.