Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: Results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C).
Journal
medRxiv : the preprint server for health sciences
Abbreviated title: medRxiv
Country: United States
NLM ID: 101767986
Publication information
Publication date:
08 Jul 2021
History:
entrez: 16 Jul 2021
pubmed: 17 Jul 2021
medline: 17 Jul 2021
Status:
epublish
Abstract
This study evaluated whether synthetic data derived from a national COVID-19 data set could be used for geospatial and temporal epidemic analyses.

Using an original data set (n=1,854,968 SARS-CoV-2 tests) and its synthetic derivative, we compared key indicators of COVID-19 community spread through analysis of aggregate and zip-code-level epidemic curves, patient characteristics and outcomes, the distribution of tests by zip code, and indicator counts stratified by month and zip code. Similarity between the two data sets was evaluated statistically and qualitatively.

In general, synthetic data closely matched original data for epidemic curves, patient characteristics, and outcomes. Synthetic data suppressed labels of zip codes with few total tests (mean=2.9±2.4; max=16 tests; 66% reduction of unique zip codes). Epidemic curves and monthly indicator counts were similar between synthetic and original data in a random sample of the most tested zip codes (top 1%; n=171) and for all unsuppressed zip codes (n=5,819), respectively. With small sample sizes, synthetic data utility was notably decreased.

Analyses at the population level and of densely tested zip codes (which contained most of the data) were similar between the original and synthetically derived data sets. Analyses of sparsely tested populations were less similar and had more data suppression.

In general, synthetic data were successfully used to analyze geospatial and temporal trends. Analyses using small sample sizes or populations were limited, in part due to purposeful data label suppression, an attribute disclosure countermeasure. Users should consider data fitness for use in these cases.
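The kind of comparison the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (zip code, month, positive flag), the use of `None` as a suppressed zip-code label, the function names, and the toy data are all assumptions made for illustration. It builds an aggregate monthly epidemic curve for each data set, measures how far the two curves differ, and computes the share of unique zip codes lost to label suppression.

```python
from collections import Counter

# Assumption: a suppressed zip-code label is represented as None.
SUPPRESSED = None


def monthly_curve(tests):
    """Count positive tests per month.

    `tests` is an iterable of (zip_code, month, is_positive) tuples,
    where month is a "YYYY-MM" string (hypothetical layout).
    """
    curve = Counter()
    for _zip_code, month, is_positive in tests:
        if is_positive:
            curve[month] += 1
    return dict(sorted(curve.items()))


def mean_abs_difference(curve_a, curve_b):
    """Mean absolute difference in monthly counts between two curves."""
    months = sorted(set(curve_a) | set(curve_b))
    if not months:
        return 0.0
    total = sum(abs(curve_a.get(m, 0) - curve_b.get(m, 0)) for m in months)
    return total / len(months)


def unique_zip_reduction(original, synthetic):
    """Fraction of the original's unique zip codes missing from the synthetic data."""
    original_zips = {z for z, _, _ in original}
    synthetic_zips = {z for z, _, _ in synthetic if z is not SUPPRESSED}
    return 1 - len(original_zips & synthetic_zips) / len(original_zips)


# Toy records, invented for illustration only.
original = [
    ("10001", "2020-03", True),
    ("10001", "2020-03", False),
    ("10001", "2020-04", True),
    ("10001", "2020-04", True),
    ("59001", "2020-04", True),
]
synthetic = [
    ("10001", "2020-03", True),
    ("10001", "2020-03", False),
    ("10001", "2020-04", True),
    ("10001", "2020-04", False),
    (SUPPRESSED, "2020-04", True),
]

curve_gap = mean_abs_difference(monthly_curve(original), monthly_curve(synthetic))
zip_loss = unique_zip_reduction(original, synthetic)
```

In this toy example the sparsely tested zip code loses its label in the synthetic data, so it drops out of any zip-code-level analysis while still contributing to the aggregate curve. A real evaluation would add the statistical tests the abstract mentions, which this sketch does not attempt to reproduce.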
Identifiers
pubmed: 34268525
doi: 10.1101/2021.07.06.21259051
pmc: PMC8282114
Publication types
Preprint
Languages
eng
Grants
Agency: NIGMS NIH HHS; ID: U54 GM104938; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002649; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003167; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001433; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001422; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001860; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM104942; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001420; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001439; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002243; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001445; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003096; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002537; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001412; Country: United States
Agency: NLM NIH HHS; ID: T15 LM007442; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001872; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001878; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002529; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001863; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002494; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002736; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM115516; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002369; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002541; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002001; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002538; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM115458; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001442; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002535; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001866; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001449; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001453; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002489; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM104940; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003107; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003015; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002733; Country: United States
Agency: NCATS NIH HHS; ID: U24 TR002306; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002003; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001876; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001436; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002378; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002384; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002553; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002389; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001414; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM104941; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002014; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002550; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002319; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001855; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001425; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002373; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002240; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002556; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003017; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001998; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001873; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001881; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002645; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001450; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002366; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM115428; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002345; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002377; Country: United States
Agency: NIGMS NIH HHS; ID: U54 GM115677; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR002544; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003098; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR001430; Country: United States
Agency: NCATS NIH HHS; ID: UL1 TR003142; Country: United States
Comments and corrections
Type: UpdateIn