dsSurvival: Privacy preserving survival models for federated individual patient meta-analysis in DataSHIELD.
BMC Research Notes 2022; 15: 197.
Banerjee S, Sofack GN, Papakonstantinou T, Avraam D, Burton P, Zoller D, Bishop TRP
DOI : 10.1186/s13104-022-06085-1
PubMed ID : 35659747
URL : https://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-022-06085-1
Abstract
Achieving sufficient statistical power in a survival analysis usually requires large amounts of data from different sites. The sensitivity of individual-level data, together with ethical and practical considerations regarding data sharing across institutions, can make this added power difficult to achieve. Meta-analysis techniques that combine analysis results from each site are one solution, but an analytic workflow in which each study undertakes its own local analysis hinders exploration. We therefore implemented a federated meta-analysis approach for survival models in DataSHIELD, in which only anonymous aggregated data are shared across institutions while exploratory, interactive modelling remains possible. The aim is to provide a framework for performing meta-analysis of Cox regression models across institutions without manual analysis steps for the data providers.
We introduce a package (dsSurvival) that allows privacy-preserving meta-analysis of survival models, including the calculation of hazard ratios. The tool is useful in biomedical research where survival models need to be built but privacy concerns restrict data sharing.
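To make the pooling step concrete, the following minimal sketch shows one common way per-site Cox model results could be combined: fixed-effect inverse-variance weighting of log hazard ratios. It is an illustration under stated assumptions, not the dsSurvival API (dsSurvival is an R package for DataSHIELD). The function names and the site estimates are hypothetical, and the abstract does not say which pooling method the package itself uses.

```python
import math


def pool_log_hazard_ratios(estimates):
    """Fixed-effect inverse-variance pooling of per-site log hazard ratios.

    `estimates` is a list of (log_hr, standard_error) tuples, one per site.
    Only these aggregate summaries are needed; no individual-level data.
    """
    # Each site is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_log_hr = sum(
        w * log_hr for w, (log_hr, _) in zip(weights, estimates)
    ) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_log_hr, pooled_se


def hazard_ratio_with_ci(log_hr, se, z=1.96):
    """Convert a log hazard ratio to a hazard ratio with an approximate 95% CI."""
    return (
        math.exp(log_hr),
        math.exp(log_hr - z * se),
        math.exp(log_hr + z * se),
    )


# Hypothetical per-site summaries (log hazard ratio, standard error),
# invented purely for illustration.
site_estimates = [(0.30, 0.10), (0.50, 0.20), (0.40, 0.10)]

pooled_log_hr, pooled_se = pool_log_hazard_ratios(site_estimates)
hr, ci_low, ci_high = hazard_ratio_with_ci(pooled_log_hr, pooled_se)
print(f"Pooled HR = {hr:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Because each site contributes only a coefficient and its standard error, the pooled hazard ratio can be computed centrally without access to the underlying patient records.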
Lay Summary
Better and more relevant scientific results can be obtained when analyses are performed on larger, more diverse datasets. This is because it is possible to detect smaller effects and check whether results still apply in different settings. Bringing together existing datasets for analysis is not always possible, because the data can be sensitive or permission has not been given to share them. DataSHIELD (Data Aggregation Through Anonymous Summary-statistics from Harmonised Individual-levEL Databases) is a software system that allows users to analyse data at source by giving them the ability to run controlled commands on the data. The results returned by the commands are designed not to reveal any of the underlying data, only summary information that is still useful for analyses.
Survival analyses allow understanding of the time taken for individuals to undergo an event such as death or disease. The effects of different behaviours (e.g. smoking) and characteristics (e.g. body weight) on survival times can be studied, so that we can learn how we might improve survival times.
Our work provides new functionality for DataSHIELD that allows survival analyses to be run on multiple, distributed datasets, while aiming to prevent access to the underlying data. This will enable new analyses to be run across datasets that were not previously possible.