From Centralized to Federated: The Journey of Data in Healthcare
STADIUS-24-171
Abstract:
Real-World Data (RWD) in healthcare holds tremendous potential, but its effective use is hampered by significant challenges. Data are fragmented across healthcare systems, and strict privacy and legal regulations limit how Findable, Accessible, Interoperable, and Reusable (FAIR) RWD can be made. These challenges become more pronounced when developing analytical models, where the need for \emph{data centralization} presents a significant obstacle: conventional data analysis methods typically depend on a single, unified dataset. For conditions like Multiple Sclerosis (MS), the low prevalence of the disease is compounded by the dispersion of RWD across numerous repositories, and variability in data formats, quality standards, and regulatory guidelines further complicates data aggregation. Consequently, these obstacles not only intensify the fragmentation of RWD but also hinder the generation of the robust, large-scale evidence critical for advancing healthcare outcomes. Recognizing these gaps and needs in current practice, this thesis advocates a shift towards federated data analysis as a core strategy. This approach enables the analysis of distributed datasets without requiring centralization, thus preserving data privacy and integrity. Although federated analysis shows great potential for overcoming many of the challenges associated with RWD, its adoption is still limited, primarily because of the complexity of its practical implementation, amplified by the multidisciplinary nature of the field and the heterogeneous characteristics of RWD.
In this thesis, we outline the evolution of data management pipelines from a centralized to a fully federated framework and introduce three foundational pillars that combine technical solutions with clinical perspectives, aiming to advance the use of sophisticated yet inclusive, privacy-aware yet pragmatic technologies in healthcare. The first pillar is a comprehensive, research-agnostic hybrid data management pipeline, designed to support the integration of diverse data sources and formats and thereby facilitate a more inclusive and practical approach to data analysis. This pipeline was implemented in the Global Data Sharing Initiative for COVID-19 and MS, leading to the collection of the largest cohort of people with MS and COVID-19 and showcasing the potential of collaborative, evidence-based healthcare advancements. The second pillar is the ``Federated Learning For Everyone'' framework, which empowers diverse stakeholders to leverage RWD more effectively through an adaptable and inclusive federated data analysis ecosystem. The framework also introduces the novel concept of the `degree of federation', which allows flexible adjustment between data centralization and decentralization to meet specific healthcare needs. Finally, the thesis explores the pioneering application of federated data analysis in MS research. Using routine clinical data from one of the largest available cohorts of people with MS, it evaluates the effectiveness of federated analysis in predicting disability progression. The evaluation assesses various federated configurations and optimizes models to demonstrate that federated analysis is a robust alternative to conventional centralized approaches.
Additionally, the thesis proposes novel federated modeling techniques to enhance federation performance, further highlighting the potential of federated analysis in complex healthcare research settings. Overall, this thesis underscores the necessity and benefits of transitioning towards inclusive federated data management in healthcare by addressing critical gaps and leveraging pragmatic, privacy-aware technologies. This approach paves the way for broader adoption of federated analysis and fosters impactful innovation in healthcare research and practice.