Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date: 2021-11-09
History:
received: 2021-03-23
accepted: 2021-10-11
entrez: 2021-11-10
pubmed: 2021-11-11
medline: 2021-12-15
Status: epublish
Abstract
With the rapid advancement of sequencing technologies, next-generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. Therefore, there is an urgent need to develop best practices for cancer mutation detection using NGS, together with standard reference datasets for systematically measuring accuracy and reproducibility across platforms and methods. Within the context of the SEQC2 consortium, we established paired tumor-normal reference samples and generated whole-genome sequencing (WGS) and whole-exome sequencing (WES) data using sixteen library protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform/cross-site WGS and WES datasets, generated from well-characterized reference samples, will represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines and for cancer genomics studies.
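The abstract describes using reference samples to measure the accuracy and reproducibility of somatic mutation calls across platforms and sites. As an illustration only, the following Python sketch shows two common ways such comparisons can be scored: precision, recall and F1 of a call set against a reference (truth) call set, and Jaccard overlap between the call sets of two sites. It is not the SEQC2 analysis pipeline; the function names (accuracy, jaccard), the (chromosome, position, ref, alt) variant key and the toy call sets are hypothetical assumptions.

```python
from itertools import combinations

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def accuracy(calls: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Score a call set against a reference (truth) call set."""
    tp = len(calls & truth)  # calls confirmed by the truth set
    fp = len(calls - truth)  # calls absent from the truth set
    fn = len(truth - calls)  # truth variants that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def jaccard(a: set[Variant], b: set[Variant]) -> float:
    """Overlap between two call sets, used here as a simple reproducibility score."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy, made-up call sets (hypothetical); not data from the study.
truth = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "A")}
sites = {
    "site_a": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr5", 50, "T", "G")},
    "site_b": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "A")},
}

# Accuracy of each site against the truth set.
for name, calls in sites.items():
    print(name, accuracy(calls, truth))

# Pairwise reproducibility between sites.
for (name_a, calls_a), (name_b, calls_b) in combinations(sites.items(), 2):
    print(name_a, name_b, "jaccard =", jaccard(calls_a, calls_b))
```

In this toy example site_b has no false positives but misses one truth variant, so its precision is 1.0 while its recall is 2/3. Jaccard overlap is only one possible reproducibility measure; a real benchmark would also need to decide how to handle regions outside the high-confidence truth set.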
Identifiers
pubmed: 34753956
doi: 10.1038/s41597-021-01077-5
pii: 10.1038/s41597-021-01077-5
pmc: PMC8578599
Publication types
Dataset
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
296
Grants
Agency: EC | European Regional Development Fund (Europski Fond za Regionalni Razvoj)
ID: Project No. 2014-2020.4.01.15-0012
Agency: NCI NIH HHS
ID: HHSN261201500003C
Country: United States
Agency: Vetenskapsrådet (Swedish Research Council)
ID: 2017-00630, 2019-01976
Agency: NIH HHS
ID: S10 OD019960
Country: United States
Agency: NCI NIH HHS
ID: HHSN261201500003I
Country: United States
Agency: NCI NIH HHS
ID: HHSN261201800001C
Country: United States
Copyright information
© 2021. The Author(s).