dsSurvival 2.0: privacy enhancing survival curves for survival models in the federated DataSHIELD analysis system.
BMC Research Notes 2023; 16: 98.
DOI : 10.1186/s13104-023-06372-5
PubMed ID : 37280717
PMCID : PMC10243006
URL : https://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-023-06372-5
Abstract
Survival models are used extensively in biomedical sciences, where they allow the investigation of the effect of exposures on health outcomes. It is desirable to use diverse data sets in survival analyses, because this offers increased statistical power and generalisability of results. However, there are often challenges with bringing data together in one location or following an analysis plan and sharing results. DataSHIELD is an analysis platform that helps users to overcome these ethical, governance and process difficulties. It allows users to analyse data remotely, using functions that are built to restrict access to the detailed data items (federated analysis). Previous work has provided survival modelling functionality in DataSHIELD (the dsSurvival package), but functions are still needed that offer privacy-enhancing survival curves while retaining useful information.
We introduce an enhanced version of the dsSurvival package which offers privacy-enhancing survival curves for DataSHIELD. We evaluated different methods for how effectively they enhance privacy while maintaining utility. Using real survival data, we demonstrate how our selected method can enhance privacy in different scenarios. Details of how DataSHIELD can be used to generate survival curves are given in the associated tutorial.
Lay Summary
Scientists often use survival models to study how different factors affect our health. To get more accurate and reliable results, it's important to analyse diverse sets of data. However, it can be difficult to bring all the data together in one place and share the results in a secure and ethical manner.
That's where DataSHIELD comes in. It's a platform that helps researchers overcome the challenges of analysing data from multiple sources. With DataSHIELD, researchers can analyse the data remotely without accessing the specific details of each individual's information. This is called federated analysis and it protects people's privacy.
In this study, we improved an existing package called dsSurvival, which allows researchers to perform survival analysis using DataSHIELD. Our enhanced version now includes privacy-enhancing survival curves. Survival curves are used to understand and visualise how long individuals or groups of people survive (or stay healthy) over the course of a study. They help us see the chances of survival or of staying free from a specific health problem as time goes on. We evaluated different methods to ensure privacy while still maintaining the usefulness of the data. To demonstrate the effectiveness of our chosen method, we used real survival data in various scenarios.
To guide researchers in how to generate survival curves using dsSurvival, we wrote a tutorial that explains the process in detail.