Schema Matching and Data Integration with Consistent Naming on Protein Crystallization Screens.
Journal
IEEE/ACM transactions on computational biology and bioinformatics
ISSN: 1557-9964
Abbreviated title: IEEE/ACM Trans Comput Biol Bioinform
Country: United States
NLM ID: 101196755
Publication information
Publication date:
History:
pubmed: 30 April 2019
medline: 15 December 2021
entrez: 30 April 2019
Status:
ppublish
Abstract
The data representations and naming conventions that different companies use in their commercial screen files make automated analysis of crystallization experiments difficult and time-consuming. To reduce the human effort this requires, we present an approach that computationally matches elements of two schemas using linguistic schema matching methods and then transforms the input screen format into another format that uses naming defined by the user. The approach was tested on a number of commercial screens from different companies, and the experiments showed an overall schema-matching accuracy of 97 percent, significantly better than that of the two other matchers we tested. Our tool maps a screen file in one format to another format preferred by the expert, using the expert's preferred chemical names.
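The abstract does not spell out how the linguistic matcher works. As a rough illustration of what name-based (linguistic) schema matching between two screen formats can look like, the sketch below normalizes column names, expands abbreviations from a small hypothetical table, scores each source/target pair by averaging token-set Jaccard overlap with character-level similarity, greedily keeps one-to-one matches above an assumed threshold of 0.6, and then renames the fields of one screen record. This is not the authors' method: the column names, the `ABBREVIATIONS` table, the scoring formula, and the threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table (an assumption, not from the paper); a real
# matcher would draw on a curated vocabulary of crystallization-screen terms.
ABBREVIATIONS = {
    "conc": "concentration",
    "ppt": "precipitant",
    "no": "number",
}


def normalize(name: str) -> list[str]:
    """Lower-case a column name, split it into word tokens, expand abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return [ABBREVIATIONS.get(t, t) for t in tokens]


def similarity(a: str, b: str) -> float:
    """Average of token-set Jaccard overlap and character-level similarity."""
    ta, tb = normalize(a), normalize(b)
    sa, sb = set(ta), set(tb)
    union = sa | sb
    jaccard = len(sa & sb) / len(union) if union else 0.0
    chars = SequenceMatcher(None, " ".join(ta), " ".join(tb)).ratio()
    return (jaccard + chars) / 2


def match_schemas(source: list[str], target: list[str],
                  threshold: float = 0.6) -> dict[str, str]:
    """Greedy one-to-one matching: accept the highest-scoring pairs first."""
    pairs = sorted(
        ((similarity(s, t), s, t) for s in source for t in target),
        reverse=True,
    )
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for score, s, t in pairs:
        if score < threshold:
            break  # remaining pairs score even lower
        if s not in mapping and t not in used:
            mapping[s] = t
            used.add(t)
    return mapping


def transform(row: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Rename the matched fields of one screen record; unmatched fields are dropped."""
    return {mapping[k]: v for k, v in row.items() if k in mapping}


if __name__ == "__main__":
    # Hypothetical column names for two vendors' screen files.
    source = ["Well No.", "Ppt Conc.", "Buffer pH", "Notes"]
    target = ["well_number", "precipitant_concentration", "buffer_ph", "temperature"]
    mapping = match_schemas(source, target)
    print(mapping)
    print(transform({"Well No.": "A1", "Ppt Conc.": "25% w/v",
                     "Buffer pH": "7.5", "Notes": "n/a"}, mapping))
```

With these inputs, "Well No.", "Ppt Conc." and "Buffer pH" are matched to their normalized counterparts, while "Notes" shares no token with any target name and stays unmatched. Because a pair with no shared token can score at most 0.5 under this formula, the 0.6 threshold effectively requires at least one common word, which is a deliberate (and assumed) design choice to avoid spurious character-only matches.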
Identifiers
pubmed: 31034419
doi: 10.1109/TCBB.2019.2913368
pmc: PMC7874513
mid: NIHMS1653276
Chemical substances
Proteins (registry number: 0)
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
2074-2085
Grants
Agency: NIGMS NIH HHS
ID: R41 GM116283
Country: United States
Agency: NIGMS NIH HHS
ID: R42 GM116283
Country: United States
References
Drug Discov Today. 2016 May;21(5):819-25. PMID: 27032894
IEEE Trans Nanobioscience. 2016 Mar;15(2):101-12. PMID: 26955046
IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):2074-2085. PMID: 31034419
Cryst Growth Des. 2013 Jul 3;13(7):2728-2736. PMID: 24532991
J Cheminform. 2012 Dec 13;4(1):35. PMID: 23237381