Model / Framework

A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks

Authors:

Qoua L. Her,

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, US

Jessica M. Malenfant,

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, US

Sarah Malek,

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, US

Yury Vilk,

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, US

Jessica Young,

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, US

Lingling Li,

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, US

Jeffery Brown,

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, US

Sengwee Toh

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, US

Abstract

Introduction: Patient privacy and data security concerns often limit the feasibility of pooling patient-level data from multiple sources for analysis. Distributed data networks (DDNs) that employ privacy-protecting analytical methods, such as distributed regression analysis (DRA), can mitigate these concerns. However, DRA is not routinely implemented in large DDNs.

Objective: We describe the design and implementation of a process framework and query workflow that allow automatable DRA in real-world DDNs that use PopMedNet, an open-source distributed networking software platform.

Methods: We surveyed and catalogued existing hardware and software configurations at all data partners in the Sentinel System, a PopMedNet-driven DDN. Key guiding principles for the design included minimal disruptions to the current PopMedNet query workflow and minimal modifications to data partners’ hardware configurations and software requirements.

Results: We developed and implemented a three-step process framework and PopMedNet query workflow that enable automatable DRA: 1) assembling a de-identified patient-level dataset at each data partner, 2) distributing a DRA package to data partners for local iterative analysis, and 3) iteratively transferring intermediate files between data partners and the analysis center. The DRA query workflow is agnostic to statistical software, accommodates different regression models, and allows different levels of user-specified automation.
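To make the iterative exchange in step 3 concrete, the following is a minimal sketch of one common form of DRA: a logistic regression fitted by Newton-Raphson, in which each data partner returns only summary statistics (a gradient and an information matrix) and the analysis center combines them to update the coefficients. The choice of a logistic model with a single covariate, the function names (`local_summaries`, `center_update`, `fit_distributed`), and the in-memory exchange are assumptions made for illustration. They are not taken from the Sentinel or PopMedNet implementation, which transfers intermediate files and accommodates other regression models.

```python
import math

def local_summaries(rows, beta):
    """Run at a data partner. rows is a list of (x, y) pairs with y in {0, 1}.
    Returns the local score vector and information matrix for the current
    coefficients; no patient-level rows leave the site."""
    b0, b1 = beta
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # predicted probability
        r = y - p                                    # residual
        w = p * (1.0 - p)                            # Newton-Raphson weight
        g[0] += r
        g[1] += r * x
        h[0][0] += w
        h[0][1] += w * x
        h[1][0] += w * x
        h[1][1] += w * x * x
    return g, h

def center_update(beta, summaries):
    """Run at the analysis center. Sums the site-level summaries and takes
    one Newton-Raphson step (2x2 system solved in closed form)."""
    g = [sum(s[0][i] for s in summaries) for i in range(2)]
    h = [[sum(s[1][i][j] for s in summaries) for j in range(2)]
         for i in range(2)]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    step0 = (h[1][1] * g[0] - h[0][1] * g[1]) / det
    step1 = (h[0][0] * g[1] - h[1][0] * g[0]) / det
    return (beta[0] + step0, beta[1] + step1)

def fit_distributed(sites, max_iter=25, tol=1e-10):
    """Alternate between local computation and central update until the
    coefficients stop changing (or max_iter is reached)."""
    beta = (0.0, 0.0)
    for _ in range(max_iter):
        summaries = [local_summaries(rows, beta) for rows in sites]
        new_beta = center_update(beta, summaries)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

# Hypothetical toy data for two data partners (not from the paper).
site_a = [(0, 0), (1, 0), (2, 1), (3, 0)]
site_b = [(1, 1), (2, 0), (3, 1), (4, 1)]
beta_distributed = fit_distributed([site_a, site_b])
beta_pooled = fit_distributed([site_a + site_b])  # same fit on pooled rows
```

Because the score vector and information matrix are sums over patients, adding the site-level summaries gives the same quantities as computing them on the pooled data, so in this sketch the distributed and pooled fits agree up to floating-point rounding.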

Discussion: The process framework can be generalized to, and the query workflow adopted by, other PopMedNet-based DDNs.

Conclusion: DRA has great potential to change the paradigm of data analysis in DDNs. Successful implementation of DRA in Sentinel will facilitate adoption of the analytic approach in other DDNs.

 

How to Cite: Her QL, Malenfant JM, Malek S, Vilk Y, Young J, Li L, et al. A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks. eGEMs (Generating Evidence & Methods to improve patient outcomes). 2018;6(1):11. DOI: http://doi.org/10.5334/egems.209
  Published on 25 May 2018
