Privacy-preserving record linkage across disparate institutions and datasets to enable a learning health system: The national COVID cohort collaborative (N3C) experience.

clinical research; data privacy; learning health systems; record linkage; translational research

Journal

Learning health systems
ISSN: 2379-6146
Abbreviated title: Learn Health Syst
Country: United States
NLM ID: 101708071

Publication information

Publication date:
Jan 2024
History:
received: 22 07 2023
revised: 06 12 2023
accepted: 06 12 2023
medline: 22 1 2024
pubmed: 22 1 2024
entrez: 22 1 2024
Status: epublish

Abstract

Research driven by real-world clinical data is increasingly vital to enabling learning health systems, but integrating such data from across disparate health systems is challenging. As part of the NCATS National COVID Cohort Collaborative (N3C), the N3C Data Enclave was established as a centralized repository of deidentified and harmonized COVID-19 patient data from institutions across the US. However, making this data most useful for research requires linking it with information such as mortality data, images, and viral variants. The objective of this project was to establish privacy-preserving record linkage (PPRL) methods to ensure that patient-level EHR data remains secure and private when governance-approved linkages with other datasets occur. Separate agreements and approval processes govern N3C data contribution and data access. The Linkage Honest Broker (LHB), an independent neutral party (the Regenstrief Institute), ensures data linkages are robust and secure by adding an extra layer of separation between protected health information and clinical data. The LHB's PPRL methods (including algorithms, processes, and governance) match patient records using "deidentified tokens," which are hashed combinations of identifier fields that define a match across data repositories without using patients' clear-text identifiers. These methods enable three linkage functions: Deduplication, Linking Multiple Datasets, and Cohort Discovery. To date, two external repositories have been cross-linked. As of March 1, 2023, 43 sites have signed the LHB Agreement; 35 sites have sent tokens generated for 9 528 998 patients. In this initial cohort, the LHB identified 135 037 matches and 68 596 duplicates. This large-scale linkage study using deidentified datasets of varying characteristics established secure methods for protecting the privacy of N3C patient data when linked for research purposes. 
This technology has potential for use with registries for other diseases and conditions.
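
The token-based matching described above can be illustrated with a minimal sketch. This is not the LHB's or any vendor's actual algorithm: the choice of identifier fields, the normalization rule, the use of HMAC-SHA256, and the names `SITE_KEY`, `normalize`, `make_token`, `find_duplicates`, and `link` are all assumptions made for illustration. It shows only the general idea that records can be deduplicated and linked by comparing keyed hashes instead of clear-text identifiers.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical shared secret (assumption); a real deployment would manage keys securely.
SITE_KEY = b"example-secret-key"


def normalize(value):
    """Lowercase and keep only letters and digits so formatting differences do not change the token."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def make_token(first, last, dob, key=SITE_KEY):
    """Return a keyed SHA-256 hash of the combined, normalized identifier fields."""
    message = "|".join(normalize(v) for v in (first, last, dob))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def find_duplicates(tokens_by_record):
    """Deduplication: group record IDs within one dataset that share a token."""
    groups = defaultdict(list)
    for record_id, token in tokens_by_record.items():
        groups[token].append(record_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def link(tokens_a, tokens_b):
    """Linking: pair record IDs across two datasets whose tokens are equal."""
    index = defaultdict(list)
    for record_id, token in tokens_b.items():
        index[token].append(record_id)
    return sorted(
        (a_id, b_id)
        for a_id, token in tokens_a.items()
        for b_id in index.get(token, [])
    )


# Invented example records; only the tokens would leave each site.
site_a = {
    "a1": make_token("Ana", "Lopez", "1980-02-03"),
    "a2": make_token("ANA", "Lopez ", "1980-02-03"),  # same person, different formatting
    "a3": make_token("Bo", "Chen", "1975-11-30"),
}
registry = {"r9": make_token("Bo", "Chen", "1975/11/30")}

print(find_duplicates(site_a))
print(link(site_a, registry))
```

In this sketch the party doing the matching sees only hexadecimal tokens and record IDs. Exact-match hashing of this kind misses records with typos or changed names, which is one reason production systems typically generate several tokens per patient from different field combinations.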

Identifiers

pubmed: 38249841
doi: 10.1002/lrh2.10404
pii: LRH210404
pmc: PMC10797567

Publication types

Journal Article

Languages

eng

Pagination

e10404

Copyright information

© 2024 The Authors. Learning Health Systems published by Wiley Periodicals LLC on behalf of University of Michigan.

Conflict of interest statement

Jasmin Phua and Sara Rogovin are employed by Datavant, Inc., and Dr. Abel Kho has a financial interest in Datavant, Inc., a for‐profit company that was contracted to provide software, expertise, and services for the creation and operation of the PPRL approach described in this manuscript. Benjamin Amor, Maya Choudhury, Philip Sparks, Amin Mannaa, and Saad Ljazouli are employed by the for‐profit Palantir Technologies, which was contracted to provide software, expertise, and services for the creation and operation of the PPRL approach described in this manuscript. At the time that this work was performed, Drs. Peter Embi, Umberto Tachinardi, and Shaun Grannis all worked for and/or served as officers of Regenstrief Institute, Inc., a non‐profit research institute contracted by the NIH to manage and oversee the PPRL and LHB work described in this manuscript.

Authors

Umberto Tachinardi (U)

Department of Biomedical Informatics University of Cincinnati College of Medicine Cincinnati Ohio USA.

Shaun J Grannis (SJ)

Center for Biomedical Informatics, Regenstrief Institute Department of Family Medicine, IU School of Medicine Regenstrief Institute, Inc. and Indiana University School of Medicine Indianapolis Indiana USA.

Sam G Michael (SG)

National Center for Advancing Translational Science NIH Bethesda Maryland USA.

Leonie Misquitta (L)

National Center for Advancing Translational Science NIH Bethesda Maryland USA.

Jayme Dahlin (J)

National Center for Advancing Translational Science NIH Bethesda Maryland USA.

Usman Sheikh (U)

National Center for Advancing Translational Science NIH Bethesda Maryland USA.

Abel Kho (A)

Department of Medicine Northwestern University, Feinberg School of Medicine Chicago Illinois USA.
Public Sector Datavant, Inc San Francisco California USA.

Jasmin Phua (J)

Public Sector Datavant, Inc San Francisco California USA.

Sara S Rogovin (SS)

Public Sector Datavant, Inc San Francisco California USA.

Benjamin Amor (B)

Federal Health Palantir Technologies Denver Colorado USA.

Maya Choudhury (M)

Federal Health Palantir Technologies Denver Colorado USA.

Philip Sparks (P)

Federal Health Palantir Technologies Denver Colorado USA.

Amin Mannaa (A)

Federal Health Palantir Technologies Denver Colorado USA.

Saad Ljazouli (S)

Federal Health Palantir Technologies Denver Colorado USA.

Joel Saltz (J)

School of Medicine Stony Brook University Stony Brook New York USA.

Fred Prior (F)

COM Biomedical Informatics University of Arkansas for Medical Sciences Little Rock Arkansas USA.

Ahmen Baghal (A)

COM Biomedical Informatics University of Arkansas for Medical Sciences Little Rock Arkansas USA.

Kenneth Gersing (K)

National Center for Advancing Translational Science NIH Bethesda Maryland USA.

Peter J Embi (PJ)

Department of Biomedical Informatics Vanderbilt University Medical Center Nashville Tennessee USA.

MeSH classifications