Marginal variable screening for survival endpoints.


Journal

Biometrical journal. Biometrische Zeitschrift
ISSN: 1521-4036
Abbreviated title: Biom J
Country: Germany
NLM ID: 7708048

Publication information

Publication date:
05 2020
History:
received: 31 08 2018
revised: 23 05 2019
accepted: 04 06 2019
pubmed: 27 8 2019
medline: 2 6 2021
entrez: 27 8 2019
Status: ppublish

Abstract

When performing survival analysis in very high dimensions, it is often necessary to reduce the number of covariates using preliminary screening. In recent years, a large number of variable screening methods for the survival context have been developed. However, guidance is missing for choosing an appropriate method in practice. The aim of this work is to provide an overview of marginal variable screening methods for survival endpoints and to develop recommendations for their use. For this purpose, a literature review is given, offering a comprehensive and structured introduction to the topic. In addition, a novel screening procedure based on distance correlation and martingale residuals is proposed, which is particularly useful in detecting nonmonotone associations. To evaluate the performance of the discussed approaches, a simulation study is conducted, comparing the true positive rates of competing variable screening methods in different settings. A real data example on mantle cell lymphoma is provided.
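The abstract names the ingredients of the proposed procedure (distance correlation and martingale residuals) but not its details, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the martingale residuals come from a covariate-free model (event indicator minus the Nelson-Aalen cumulative hazard), that each covariate is scored by its sample distance correlation with those residuals, and that the highest-scoring covariates are kept. The function names `null_martingale_residuals`, `distance_correlation`, and `screen` are hypothetical.

```python
import numpy as np


def null_martingale_residuals(time, event):
    """Residuals event_i - H(t_i), with H the Nelson-Aalen estimate.

    Assumption: residuals of a model without covariates are used.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # Number of subjects still at risk at each subject's observed time.
    at_risk = (time[None, :] >= time[:, None]).sum(axis=1)
    # Each observed event contributes 1 / (number at risk) to the hazard.
    increments = event / at_risk
    # Cumulative hazard at t_i: sum of increments with time_j <= t_i.
    cum_hazard = (increments[None, :] * (time[None, :] <= time[:, None])).sum(axis=1)
    return event - cum_hazard


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version) of two 1-D arrays."""
    a = _centered_distances(np.asarray(x, dtype=float))
    b = _centered_distances(np.asarray(y, dtype=float))
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen(X, time, event, keep):
    """Rank the columns of X by distance correlation with the residuals.

    Returns the indices of the `keep` highest-scoring columns and all scores.
    """
    X = np.asarray(X, dtype=float)
    resid = null_martingale_residuals(time, event)
    scores = np.array([distance_correlation(X[:, j], resid) for j in range(X.shape[1])])
    order = np.argsort(-scores)
    return order[:keep], scores
```

Because distance correlation responds to any form of dependence, a covariate whose effect on the hazard is U-shaped can still score highly here, whereas a screening statistic built on a linear effect may miss it. Both helper functions build n-by-n matrices, so this sketch is only practical for moderate sample sizes.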

Identifiers

pubmed: 31448463
doi: 10.1002/bimj.201800269

Publication types

Journal Article; Research Support, Non-U.S. Gov't; Review

Languages

eng

Citation subsets

IM

Pagination

610-626

Grants

Agency: Federal Ministry of Education and Research
ID: 01ER1505a
Country: International
Agency: Federal Ministry of Education and Research
ID: 01ER1505b
Country: International

Copyright information

© 2019 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Authors

Dominic Edelmann (D)

Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany.

Manuela Hummel (M)

Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany.

Thomas Hielscher (T)

Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany.

Maral Saadati (M)

Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany.

Axel Benner (A)

Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany.
