Whole exome sequencing and machine learning germline analysis of individuals presenting with extreme phenotypes of high and low risk of developing tobacco-associated lung adenocarcinoma.
Cancer risk
Extreme phenotypes
Lung adenocarcinoma
Tobacco
Whole exome sequencing
Journal
EBioMedicine
ISSN: 2352-3964
Abbreviated title: EBioMedicine
Country: Netherlands
NLM ID: 101647039
Publication information
Publication date:
13 Mar 2024
History:
received: 29 Sep 2023
revised: 15 Feb 2024
accepted: 22 Feb 2024
medline: 15 Mar 2024
pubmed: 15 Mar 2024
entrez: 14 Mar 2024
Status:
aheadofprint
Abstract
BACKGROUND
Tobacco is the main risk factor for developing lung cancer. Yet, while some heavy smokers develop lung cancer at a young age, other heavy smokers never develop it, even at an advanced age, suggesting a remarkable variability in the individual susceptibility to the carcinogenic effects of tobacco. We characterized the germline profile of subjects presenting these extreme phenotypes with Whole Exome Sequencing (WES) and Machine Learning (ML).
METHODS
We sequenced germline DNA from heavy smokers who either developed lung adenocarcinoma at an early age (extreme cases) or who did not develop lung cancer at an advanced age (extreme controls), selected from databases including over 6600 subjects. We selected individual coding genetic variants and variant-rich genes showing a significantly different distribution between extreme cases and controls. We validated the results from our discovery cohort in a validation cohort, in which we analysed by WES extreme cases and controls presenting similar phenotypes. We developed ML models using both cohorts.
FINDINGS
Mean age for extreme cases and controls was 50.7 and 79.1 years respectively, and mean tobacco consumption was 34.6 and 62.3 pack-years. We validated 16 individual variants and 33 variant-rich genes. The gene harbouring the most validated variants was HLA-A in extreme controls (4 variants in the discovery cohort, p = 3.46E-07; and 4 in the validation cohort, p = 1.67E-06). We trained ML models using as input the 16 individual variants in the discovery cohort and tested them on the validation cohort, obtaining an accuracy of 76.5% and an AUC-ROC of 83.6%. Functions of validated genes included candidate oncogenes, tumour-suppressors, DNA repair, HLA-mediated antigen presentation and regulation of proliferation, apoptosis, inflammation and immune response.
INTERPRETATION
Individuals presenting extreme phenotypes of high and low risk of developing tobacco-associated lung adenocarcinoma show different germline profiles. Our strategy may allow the identification of high-risk subjects and the development of new therapeutic approaches.
FUNDING
See a detailed list of funding bodies in the Acknowledgements section at the end of the manuscript.
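The abstract reports accuracy and AUC-ROC for models trained on the discovery cohort and tested on the validation cohort. As a reading aid, the sketch below shows how these two metrics are conventionally computed on held-out predictions. It is not the authors' pipeline: the labels, scores, 0.5 decision threshold and the names `accuracy`, `auc_roc`, `y_val` and `scores_val` are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of subjects whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels (1 = extreme case, 0 = extreme control) and
# hypothetical model scores. These are NOT data from the study.
y_val = [1, 1, 1, 0, 0, 0]
scores_val = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

# Assumed decision threshold of 0.5 to turn scores into class labels.
preds_val = [1 if s >= 0.5 else 0 for s in scores_val]

val_accuracy = accuracy(y_val, preds_val)
val_auc = auc_roc(y_val, scores_val)
print(f"accuracy={val_accuracy:.3f}, AUC-ROC={val_auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC-ROC summarises ranking quality across all thresholds, which is one reason the two figures in an abstract like this one can differ.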
Identifiers
pubmed: 38484556
pii: S2352-3964(24)00083-5
doi: 10.1016/j.ebiom.2024.105048
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
105048
Copyright information
Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of interests: MFS reports grants from Bristol Myers Squibb (BMS) and Roche; honoraria for lectures and advisory roles from MSD, BMS and Numab; and travel support from Roche, AstraZeneca and BMS. IM reports receiving commercial research grants from AstraZeneca, BMS, Highlight Therapeutics, Alligator, Pfizer, Genmab and Roche; has received speakers bureau honoraria from MSD; and is a consultant or advisory board member for BMS, Roche, AstraZeneca, Genmab, Pharmamar, F-Star, Bioncotech, Bayer, Numab, Pieris, Gossamer, Alligator and Merck Serono. JJZ declares consultancy and advisory board roles for American Heart Technologies and Median Technologies. GB declares honoraria for lectures and advisory roles from General Electric and Siemens Healthineers; educational activities for General Electric, Siemens Healthineers and Bayer; and institutional research grants from Siemens Healthineers and Guerbet. LMS reports a role as scientific advisor for Sabartech, Serum, AstraZeneca, Roche, MSD and Median Technologies, and has received honoraria as a speaker from AstraZeneca, GSK, Roche, Menarini and Chiesi. LMM declares research grants from AstraZeneca, BMS, Serum Detect Inc. and Pharmamar; a speaker fee from AstraZeneca; participation in advisory boards for Serum Detect Inc.; and is co-holder of a licensed patent from AMADIX. JLPG declares research grants and support from Astellas, Amgen, BMS, MSD, Novartis, Roche and Seattle Genetics; participation in advisory boards for Astellas, BMS, Ipsen, MSD, Roche and Seattle Genetics; and travel support from BMS, MSD and Roche.