Towards self-describing and FAIR bulk formats for biomedical data.
Journal
PLoS computational biology
ISSN: 1553-7358
Abbreviated title: PLoS Comput Biol
Country: United States
NLM ID: 101238922
Publication information
Publication date:
March 2023
History:
received: 24 July 2022
accepted: 13 February 2023
revised: 23 March 2023
pubmed: 14 March 2023
medline: 28 March 2023
entrez: 13 March 2023
Status:
epublish
Abstract
We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. The Portable Format for Biomedical data is based upon Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third party controlled vocabularies. In general, each data element in the data dictionary is associated with a third party controlled vocabulary to make it easier for applications to harmonize two or more PFB files. We also introduce an open source software development kit (SDK) called PyPFB for creating, exploring and modifying PFB files. We describe experimental studies showing the performance improvements when importing and exporting bulk biomedical data in the PFB format versus using JSON and SQL formats.
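The following sketch illustrates, in plain Python, the idea stated in the abstract: a bulk file that carries its own data dictionary, where each data element points to a controlled-vocabulary term so that two files can be harmonized by matching shared terms. It is not PFB, Avro, or the PyPFB SDK. The class `BulkFile`, the functions `harmonize` and `merge`, and term IDs such as `TERM:0001` are hypothetical placeholders, and JSON stands in for Avro serialization only to keep the example self-contained.

```python
import json
from dataclasses import dataclass, field


@dataclass
class BulkFile:
    """Hypothetical self-describing container: dictionary and records travel together."""
    # Data element name -> controlled-vocabulary term ID (placeholder IDs, not real ontology terms).
    dictionary: dict
    records: list = field(default_factory=list)

    def to_json(self) -> str:
        # Metadata and data are serialized into one document, so a reader needs nothing external.
        return json.dumps({"dictionary": self.dictionary, "records": self.records})

    @classmethod
    def from_json(cls, text: str) -> "BulkFile":
        obj = json.loads(text)
        return cls(obj["dictionary"], obj["records"])


def harmonize(a: BulkFile, b: BulkFile) -> dict:
    """Map element names in `b` to names in `a` that point to the same vocabulary term."""
    term_to_a = {term: name for name, term in a.dictionary.items()}
    return {
        name_b: term_to_a[term]
        for name_b, term in b.dictionary.items()
        if term in term_to_a
    }


def merge(a: BulkFile, b: BulkFile) -> list:
    """Rename b's fields to a's names where a shared term exists, then concatenate records."""
    mapping = harmonize(a, b)
    renamed = [{mapping.get(k, k): v for k, v in rec.items()} for rec in b.records]
    return a.records + renamed


# Usage with two hypothetical files that name the same concepts differently.
file_a = BulkFile({"age_at_index": "TERM:0001", "sex": "TERM:0002"},
                  [{"age_at_index": 54, "sex": "female"}])
file_b = BulkFile({"age": "TERM:0001", "gender": "TERM:0002", "site": "TERM:0099"},
                  [{"age": 61, "gender": "male", "site": "lung"}])

restored = BulkFile.from_json(file_b.to_json())  # round-trip: the dictionary survives serialization
print(harmonize(file_a, restored))  # {'age': 'age_at_index', 'gender': 'sex'}
print(merge(file_a, restored))
# [{'age_at_index': 54, 'sex': 'female'}, {'age_at_index': 61, 'sex': 'male', 'site': 'lung'}]
```

Because the vocabulary pointers are stored in the file itself, the mapping between `age` and `age_at_index` is derived without any out-of-band agreement between the two producers; elements with no shared term (here `site`) are carried through unchanged.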
Identifiers
pubmed: 36913405
doi: 10.1371/journal.pcbi.1010944
pii: PCOMPBIOL-D-22-01126
pmc: PMC10035862
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
e1010944
Grants
Agency: NHLBI NIH HHS
ID: U2C HL138346
Country: United States
Copyright information
Copyright: © 2023 Lukowski et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
The authors have declared that no competing interests exist.