Schema Matching and Data Integration with Consistent Naming on Protein Crystallization Screens.


IEEE/ACM transactions on computational biology and bioinformatics
ISSN: 1557-9964
Abbreviated title: IEEE/ACM Trans Comput Biol Bioinform
Country: United States
NLM ID: 101196755

Publication information

Publication date:
pubmed: 2019-04-30
medline: 2021-12-15
entrez: 2019-04-30
Status: ppublish


Commercial crystallization screen files differ between companies in both data representation and naming conventions, which makes automated analysis of crystallization experiments difficult and time-consuming. To reduce the human effort this requires, we present an approach that computationally matches the elements of two schemas using linguistic schema matching methods and then transforms the input screen format into another format that uses naming defined by the user. We tested the approach on a number of commercial screens from different companies; it achieved an overall schema matching accuracy of 97 percent, significantly better than the other two matchers we tested. Our tool maps a screen file from one format to another format preferred by the expert, using the expert's preferred chemical names.
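
To make the two steps in the abstract concrete (match the elements of two schemas by name, then rewrite a screen into the user's format and naming), here is a minimal sketch. It is not the authors' matcher: the similarity measure (Python's difflib.SequenceMatcher), the greedy pairing, the 0.6 threshold, and all column names and synonyms are assumptions chosen for illustration only.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse punctuation so 'Well #' and 'well' compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized element names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_schemas(source, target, threshold=0.6):
    """Greedy one-to-one matching: accept the best-scoring pairs first.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    scored = sorted(
        ((similarity(s, t), s, t) for s in source for t in target),
        reverse=True,
    )
    mapping, used_targets = {}, set()
    for score, s, t in scored:
        if score < threshold:
            break
        if s not in mapping and t not in used_targets:
            mapping[s] = t
            used_targets.add(t)
    return mapping


def transform_row(row, mapping, preferred_names):
    """Rename matched columns and replace chemical names with the user's
    preferred spelling; unmatched columns are dropped."""
    out = {}
    for col, value in row.items():
        if col in mapping:
            key = normalize(str(value))
            out[mapping[col]] = preferred_names.get(key, value)
    return out


# Hypothetical vendor schema, user schema, and synonym table.
vendor_columns = ["Well #", "Chemical Name", "Concentration (M)"]
user_columns = ["well", "chemical_name", "concentration"]
preferred = {"ammonium sulphate": "ammonium sulfate"}

mapping = match_schemas(vendor_columns, user_columns)
row = {"Well #": "A1", "Chemical Name": "Ammonium Sulphate", "Concentration (M)": 0.2}
converted = transform_row(row, mapping, preferred)
```

A purely string-based score like this one cannot match abbreviations to their expansions or different names for the same chemical; the synonym table supplied by the user is what stands in for that domain knowledge in this sketch.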


pubmed: 31034419
doi: 10.1109/TCBB.2019.2913368
pmc: PMC7874513
mid: NIHMS1653276

Chemical substances

Proteins 0

Publication types

Journal Article; Research Support, N.I.H., Extramural




Agency: NIGMS NIH HHS
ID: R41 GM116283
Country: United States
Agency: NIGMS NIH HHS
ID: R42 GM116283
Country: United States


IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):2074-2085
pubmed: 31034419

