Empirical research

A Data Element-Function Conceptual Model for Data Quality Checks

Authors:

James R. Rogers,

Department of Biomedical Informatics, Columbia University, US

Tiffany J. Callahan,

University of Colorado Denver Anschutz Medical Campus, US

Tian Kang,

Department of Biomedical Informatics, Columbia University, US

Alan Bauck,

Kaiser Permanente Northwest Center for Health Research, US

Ritu Khare,

Children’s Hospital of Philadelphia, US

Jeffrey S. Brown,

Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, US

Michael G. Kahn,

University of Colorado Denver Anschutz Medical Campus, US

Chunhua Weng, PhD

Department of Biomedical Informatics, Columbia University, US

Abstract

Introduction: In aggregate, existing data quality (DQ) checks are represented in heterogeneous formats, making it difficult to compare, categorize, and index them. This study contributes a data element-function conceptual model to facilitate the categorization and indexing of DQ checks and explores the feasibility of leveraging natural language processing (NLP) for scalable acquisition of knowledge of common data elements and functions from DQ check narratives.

Methods: The model defines a “data element”, the primary focus of the check, and a “function”, the qualitative or quantitative measure over a data element. We applied NLP techniques to extract both from 172 checks for Observational Health Data Sciences and Informatics (OHDSI) and 3,434 checks for Kaiser Permanente’s Center for Effectiveness and Safety Research (CESR).

Results: The model was able to classify all checks. A total of 751 unique data elements and 24 unique functions were extracted. The five most frequent data element-function pairings for OHDSI were Person-Count (55 checks), Insurance-Distribution (17), Medication-Count (16), Condition-Count (14), and Observations-Count (13); for CESR, they were Medication-Variable Type (175), Medication-Missing (172), Medication-Existence (152), Medication-Count (127), and Socioeconomic Factors-Variable Type (114).

Conclusions: This study shows the efficacy of the data element-function conceptual model for classifying DQ checks, demonstrates early promise of NLP-assisted knowledge acquisition, and reveals great heterogeneity in the focus of DQ checks, confirming variation in intrinsic checks and use-case-specific “fitness-for-use” checks.
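
As an illustration of the model described in the Methods, the sketch below represents each check as a data element paired with a function and tallies the most frequent pairings, as the Results do. It is a minimal sketch rather than the authors' implementation: the class name `DQCheck`, the helper `top_pairings`, and the three toy checks are hypothetical, and the labels are assigned by hand here instead of being extracted with NLP.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DQCheck:
    """One data quality check, indexed by the model's two components."""
    description: str   # free-text narrative of the check
    data_element: str  # primary focus of the check
    function: str      # qualitative or quantitative measure over the element


def top_pairings(checks, n=5):
    """Return the n most frequent (data element, function) pairings."""
    pairs = Counter((c.data_element, c.function) for c in checks)
    return pairs.most_common(n)


# Hypothetical toy checks, invented for illustration only (not study data).
checks = [
    DQCheck("Number of persons", "Person", "Count"),
    DQCheck("Number of persons by gender", "Person", "Count"),
    DQCheck("Percent of medication records missing a start date",
            "Medication", "Missing"),
]

print(top_pairings(checks, n=2))
```

Indexing checks by the pair, rather than by their free-text narrative, is what makes checks written in heterogeneous formats directly comparable.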

How to Cite: Rogers JR, Callahan TJ, Kang T, Bauck A, Khare R, Brown JS, et al. A Data Element-Function Conceptual Model for Data Quality Checks. eGEMs (Generating Evidence & Methods to improve patient outcomes). 2019;7(1):17. DOI: http://doi.org/10.5334/egems.289
  Published on 23 Apr 2019
