Federated causal inference based on real-world observational data sources: application to a SARS-CoV-2 vaccine effectiveness assessment.
Keywords
COVID-19
Causal inference
Comparative effectiveness
Federated analysis
Pandemic preparedness
Real-world data
Vaccines
Journal
BMC medical research methodology
ISSN: 1471-2288
Abbreviated title: BMC Med Res Methodol
Country: England
NLM ID: 100968545
Publication information
Publication date: 23 October 2023
History:
received: 26 July 2023
accepted: 11 October 2023
medline: 27 October 2023
pubmed: 24 October 2023
entrez: 23 October 2023
Status: epublish
Abstract
Causal inference helps researchers and policy-makers evaluate public health interventions. When interventions or public health programs are compared using sensitive, individual-level observational data from populations that cross jurisdictional borders, a federated approach (as opposed to pooling the data) can be used. Approaching causal inference in a federated manner, by re-using routinely collected observational data across different regions, is challenging, and guidance is currently lacking. To fill this gap and allow a rapid response in the event of a future pandemic, a methodological framework for developing studies that attempt causal inference using federated, cross-national, sensitive observational data is described and showcased within the European BeYond-COVID (BY-COVID) project.
A framework for approaching federated causal inference by re-using routinely collected observational data across different regions, based on the principles of legal, organizational, semantic and technical interoperability, is proposed. The framework provides step-by-step guidance, from defining a research question through establishing a causal model, identifying and specifying data requirements in a common data model and generating synthetic data, to developing an interoperable and reproducible analytical pipeline for distributed deployment.
The conceptual and instrumental phase of the framework was demonstrated, and an analytical pipeline implementing federated causal inference was prototyped using open-source software in preparation for assessing the real-world effectiveness of SARS-CoV-2 primary vaccination in preventing infection in populations spanning different countries. The pipeline integrates a data quality assessment, imputation of missing values, matching of exposed to unexposed individuals based on confounders identified in the causal model, and a survival analysis within the matched population.
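The last two pipeline steps (matching on confounders, then survival analysis in the matched population) can be illustrated with a minimal, hypothetical sketch in plain Python. It is not the BY-COVID pipeline: the `Record` layout, its field names, the confounder values and the toy data are all assumptions, the data quality and imputation steps are omitted, and 1:1 exact matching without replacement stands in for whatever matching strategy the actual pipeline uses. In a federated deployment, code of this kind would run at each data site, so individual-level records need not leave it.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical individual-level record (not the BY-COVID common data model)."""
    person_id: int
    exposed: bool        # e.g. completed primary vaccination
    confounders: tuple   # e.g. (age_group, sex); assumed already imputed
    time: float          # follow-up time until infection or censoring
    event: bool          # True if infection observed, False if censored


def exact_match(records, seed=0):
    """1:1 exact matching of exposed to unexposed individuals with identical
    confounder values, drawing controls without replacement.
    Exposed individuals without an available control are dropped."""
    rng = random.Random(seed)
    controls = defaultdict(list)
    for r in records:
        if not r.exposed:
            controls[r.confounders].append(r)
    for pool in controls.values():
        rng.shuffle(pool)  # random choice of control within each stratum
    pairs = []
    for r in records:
        if r.exposed and controls[r.confounders]:
            pairs.append((r, controls[r.confounders].pop()))
    return pairs


def kaplan_meier(records):
    """Product-limit estimate of infection-free survival S(t),
    returned as [(t, S(t))] at each distinct event time."""
    event_times = sorted({r.time for r in records if r.event})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for r in records if r.time >= t)
        events = sum(1 for r in records if r.time == t and r.event)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    data = [
        Record(1, True, ("60+", "F"), 30.0, False),
        Record(2, True, ("60+", "M"), 12.0, True),
        Record(3, True, ("18-59", "F"), 30.0, False),  # no control in stratum: unmatched
        Record(4, False, ("60+", "F"), 10.0, True),
        Record(5, False, ("60+", "M"), 20.0, True),
        Record(6, False, ("18-59", "M"), 5.0, True),   # no exposed in stratum: unused
    ]
    pairs = exact_match(data)
    print(len(pairs))                                # 2
    print(kaplan_meier([e for e, _ in pairs]))       # [(12.0, 0.5)]
    print(kaplan_meier([c for _, c in pairs]))       # [(10.0, 0.5), (20.0, 0.0)]
```

Exact matching keeps the sketch short and makes the exchangeability assumption explicit (exposed and unexposed are compared only within identical confounder strata); a real analysis would also need confidence intervals that account for the matched design.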
The conceptual and instrumental phase of the proposed methodological framework was successfully demonstrated within the BY-COVID project. Different Findable, Accessible, Interoperable and Reusable (FAIR) research objects were produced, such as a study protocol, a data management plan, a common data model, a synthetic dataset and an interoperable analytical pipeline. The framework provides a systematic approach to address federated cross-national policy-relevant causal research questions based on sensitive population, health and care data in a privacy-preserving and interoperable way. The methodology and derived research objects can be re-used and contribute to pandemic preparedness.
Identifiers
pubmed: 37872541
doi: 10.1186/s12874-023-02068-3
pii: 10.1186/s12874-023-02068-3
pmc: PMC10594731
Chemical substances
COVID-19 Vaccines (registry number: 0)
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
248
Copyright information
© 2023. BioMed Central Ltd., part of Springer Nature.