Model / Framework
Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records
Authors:
Gregory E. Simon, MD, MPH,
Kaiser Permanente Washington Health Research Institute, Seattle, WA, US
Susan M. Shortreed,
Kaiser Permanente Washington Health Research Institute, Seattle, WA, US
R. Yates Coley,
Kaiser Permanente Washington Health Research Institute, US
Robert B. Penfold,
Kaiser Permanente Washington Health Research Institute, Seattle, WA, US
Rebecca C. Rossom,
HealthPartners Institute, Minneapolis, MN, US
Beth E. Waitzfelder,
Kaiser Permanente Hawaii Center for Health Research, Honolulu, HI, US
Katherine Sanchez,
Baylor Scott and White Research Institute, Dallas, TX, US
Frances L. Lynch
Kaiser Permanente Northwest Center for Health Research, Portland, OR, US
Abstract
Background: Sharing of research data derived from health system records supports the rigor and reproducibility of primary research and can accelerate research progress through secondary use. However, public sharing of such data can create a risk of re-identifying individuals and exposing their sensitive health information.
Method: We describe a framework for assessing re-identification risk that includes: identifying data elements in a research dataset that overlap with external data sources, identifying small classes of records defined by unique combinations of those data elements, and considering the pattern of population overlap between the research dataset and an external source. We also describe alternative strategies for mitigating risk when the external data source can or cannot be directly examined.
Results: We illustrate this framework using the example of a large database used to develop and validate models predicting suicidal behavior after an outpatient visit. We identify elements in the research dataset that might create risk and propose a specific risk mitigation strategy: deleting indicators for health system (a proxy for state of residence) and visit year.
Discussion: Researchers holding health system data must balance the public health value of data sharing against the duty to protect the privacy of health system members. Specific steps can provide a useful estimate of re-identification risk and point to effective risk mitigation strategies.
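As an illustration only, the "small classes" step of the framework and the effect of the proposed mitigation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the field names (`health_system`, `visit_year`, `age_group`, `sex`), the toy records, and the class-size threshold of 2 are all hypothetical. It also does not address the third element of the framework, the pattern of population overlap with an external source.

```python
from collections import Counter

def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def records_in_small_classes(records, quasi_identifiers, threshold):
    """Return the number of records that fall in classes smaller than `threshold`."""
    sizes = class_sizes(records, quasi_identifiers)
    return sum(n for n in sizes.values() if n < threshold)

# Toy data, entirely made up for illustration (hypothetical field names and values).
records = [
    {"health_system": "A", "visit_year": 2012, "age_group": "30-44", "sex": "F"},
    {"health_system": "A", "visit_year": 2013, "age_group": "30-44", "sex": "F"},
    {"health_system": "B", "visit_year": 2012, "age_group": "30-44", "sex": "F"},
    {"health_system": "B", "visit_year": 2013, "age_group": "30-44", "sex": "F"},
    {"health_system": "A", "visit_year": 2012, "age_group": "65+", "sex": "M"},
    {"health_system": "B", "visit_year": 2012, "age_group": "65+", "sex": "M"},
]

# Elements assumed to overlap with an external data source.
full = ["health_system", "visit_year", "age_group", "sex"]
# The same elements after deleting the health system and visit year indicators.
reduced = ["age_group", "sex"]

# Threshold of 2 is an arbitrary choice for this toy example.
at_risk_full = records_in_small_classes(records, full, threshold=2)
at_risk_reduced = records_in_small_classes(records, reduced, threshold=2)

print("Records in small classes, all elements:", at_risk_full)
print("Records in small classes, after deletion:", at_risk_reduced)
```

In this toy example every record is unique when all four elements are retained, and none is unique once the two indicators are deleted. Real datasets would need a threshold chosen to match the data holder's own risk tolerance.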
How to Cite:
Simon GE, Shortreed SM, Coley RY, Penfold RB, Rossom RC, Waitzfelder BE, et al. Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records. eGEMs (Generating Evidence & Methods to improve patient outcomes). 2019;7(1):6. DOI: http://doi.org/10.5334/egems.270