Current recommendations/practices for anonymising data from clinical trials in order to make it available for sharing: A scoping review

Background/Aims There are increasing pressures for anonymised datasets from clinical trials to be shared across the scientific community, and differing recommendations exist on how to perform anonymisation prior to sharing. We aimed to systematically identify, describe and synthesise existing recommendations for anonymising clinical trial datasets to prepare for data sharing.

Methods We systematically searched MEDLINE®, EMBASE and Web of Science from inception to 8 February 2021. We also searched other resources to ensure the comprehensiveness of our search. Any publication reporting recommendations on anonymisation to enable data sharing from clinical trials was included. Two reviewers independently screened titles, abstracts and full text for eligibility. One reviewer extracted data from included papers using thematic synthesis, which then was sense-checked by a second reviewer. Results were summarised by narrative analysis.

Results Fifty-nine articles (from 43 studies) were eligible for inclusion. Three distinct themes are emerging: anonymisation, de-identification and pseudonymisation. The most commonly used anonymisation techniques are: removal of direct patient identifiers; and careful evaluation and modification of indirect identifiers to minimise the risk of identification. Anonymised datasets joined with controlled access was the preferred method for data sharing.

Conclusions There is no single standardised set of recommendations on how to anonymise clinical trial datasets for sharing. However, this systematic review shows a developing consensus on techniques used to achieve anonymisation. Researchers in clinical trials still consider that anonymisation techniques by themselves are insufficient to protect patient privacy, and they need to be paired with controlled access.


There are increasing pressures for anonymised datasets from clinical trials to be shared across the scientific community. There are various sets of recommendations on how to perform anonymisation prior to sharing clinical trial data. We aim to systematically identify, describe and synthesise these recommendations. We will systematically search literature databases and websites of key organisations in the field. Any publication reporting recommendations on anonymisation to enable data sharing in clinical trials will be included.

Two reviewers will independently screen titles, abstracts and full text for eligibility. One reviewer will extract data from included papers which will then be sense checked by a second reviewer. Results will be summarised by narrative review. This scoping review will provide information about existing recommendations for anonymising clinical trial datasets in order to make them available for sharing and it will inform (

Clinical trial datasets contain personal health information on the trial participants. It is imperative that data sharing does not disclose personal data to anyone who falls outside the original group to whom the trial participants consented to disclose their data. Anonymising the trial dataset fulfils this requirement. However, the anonymisation process removes information from the data, and if it is not done properly, the original trial analyses cannot be reproduced, which in turn will limit the data's usability for further research [5]. Anonymisation is complex, and there are many possible ways of performing it.
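As a minimal sketch of how anonymisation removes information, the Python example below drops direct identifiers and coarsens two indirect identifiers in a single participant record. The field names, the 10-year age bands and the reduction of a date to its year are assumptions made purely for illustration; they are not recommendations taken from the reviewed literature.

```python
from datetime import date

# Hypothetical direct identifiers; a real trial would define its own list.
DIRECT_IDENTIFIERS = {"name", "patient_id", "address"}


def age_band(age, width=10):
    """Generalise an exact age into a band such as '40-49'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymise_record(record):
    """Return a copy with direct identifiers removed and indirect ones coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        # Exact age is replaced by a band, so age-adjusted analyses lose precision.
        out["age_band"] = age_band(out.pop("age"))
    if "randomisation_date" in out:
        # Only the year is kept, so time-to-event analyses may no longer be exact.
        out["randomisation_year"] = out.pop("randomisation_date").year
    return out


# Entirely fictitious record, used only to exercise the sketch.
record = {
    "name": "Jane Doe",
    "patient_id": "P-0001",
    "address": "1 Example Street",
    "age": 47,
    "randomisation_date": date(2019, 1, 7),
    "arm": "control",
    "outcome": 1,
}
anonymised = anonymise_record(record)
```

The coarsened record keeps the trial arm and outcome but no longer carries the exact age or randomisation date, which is the kind of trade-off between privacy and reproducibility described above.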

The drive to share data more widely has generated various sets of recommendations to enable sharing [4, 6-9]. Embedded within these, there is a variety of recommendations on how to anonymise a dataset.

Why it is important to do this review

To our knowledge, there are no reviews of the methods and/or recommendations for the process of generating anonymised clinical trial datasets¹. To understand and bring together

¹ A quick search was executed on 7 January 2019 on Google Scholar with "literature" "review" "anonymization" "methods" "clinical trials" and also "literature" "review" "anonymisation" "methods" "clinical trials"; the first 100 results were screened for each search and no relevant results were found.

Appendix 1 - Protocol

The search strategy will use the following key concept areas, adopting subject headings and keywords as relevant for each database:

(Clinical) and (trial* or randomi* or research* or control*) and (principle* or guid* or recomm*) and (shar* or reus* or re-us* or access* or open) and (de-identi* or deidenti* or anonym* or privacy or confidential*)
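The structure of this search string (terms joined by "or" within each concept area, concept areas joined by "and") can be sketched in Python as follows. The helper names `build_query` and `matches` are hypothetical, and the local matcher only approximates truncation with shell-style wildcards; each bibliographic database has its own query syntax and matching rules.

```python
import re
from fnmatch import fnmatchcase

# The five concept areas from the search strategy above.
CONCEPTS = [
    ["Clinical"],
    ["trial*", "randomi*", "research*", "control*"],
    ["principle*", "guid*", "recomm*"],
    ["shar*", "reus*", "re-us*", "access*", "open"],
    ["de-identi*", "deidenti*", "anonym*", "privacy", "confidential*"],
]


def build_query(concepts):
    """Join terms with 'or' inside a concept and concepts with 'and'."""
    return " and ".join("(" + " or ".join(terms) + ")" for terms in concepts)


def matches(text, concepts):
    """Rough local check: every concept must match at least one word of the text."""
    words = re.findall(r"[a-z-]+", text.lower())
    return all(
        any(fnmatchcase(word, term.lower()) for word in words for term in terms)
        for terms in concepts
    )


query = build_query(CONCEPTS)
hit = matches("Recommendations for sharing anonymised clinical trial data", CONCEPTS)
miss = matches("Clinical trial of a new drug", CONCEPTS)
```

Under these assumptions, the first title satisfies all five concept areas while the second fails because it contains no recommendation, sharing or anonymisation term.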

To further supplement our search yield, we will use backward and forward citation searching on the retrieved documents in order to find additional sources. Also, experts and authors known to have published relevant work will be contacted to identify further literature.

Data extraction will be undertaken by one reviewer (AR) who will manually extract relevant data from each included publication onto the data extraction form, which will be sense checked independently by a second reviewer (CT). Any discrepancies will be discussed between the reviewers and if agreement cannot be reached then it will be resolved by a third reviewer