Decorrelated forward regression for high-dimensional data analysis

Published: 2024-10-18 10:11
Time:
Venue:
Speaker:

Forward regression (FR) is a key methodology for automatically identifying important predictors from a large pool of candidate covariates. Forward selection achieves screening consistency when predictors are moderately correlated, but this property breaks down when predictors are strongly correlated, as is common in high-dimensional datasets. The difficulty is not unique to forward selection; other model selection approaches encounter it as well. To address it, we introduce a novel decorrelated forward (DF) selection framework for generalized mean regression models, which include prevalent models such as linear, logistic, Poisson, and quasi-likelihood models. The DF selection framework converts generalized mean regression models into linear ones, which gives the forward selection process a clear interpretation. It also offers a closed-form expression for each forward iteration, which improves practical applicability and efficiency. Theoretically, we establish the screening consistency of DF selection and derive an upper bound on the size of the selected submodel. To reduce the computational burden, we develop a thresholding DF algorithm that provides a stopping rule for the forward search. Simulations and real-data applications show that our method performs very well compared with several existing model selection methods.
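
For readers unfamiliar with the baseline, classical forward regression for a linear model can be sketched as follows. This is only a minimal illustration of the standard greedy procedure, not the DF method presented in the talk: the function name `forward_regression`, the fixed step budget `max_steps`, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def forward_regression(X, y, max_steps):
    """Classical greedy forward selection for a linear model (illustrative only).

    At each step, add the predictor whose inclusion gives the smallest
    residual sum of squares (RSS) when refitting by least squares.
    """
    n, p = X.shape
    selected = []
    for _ in range(min(max_steps, p)):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            # Least-squares fit on the current candidate submodel.
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ beta
            rss = float(resid @ resid)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

# Synthetic example (assumed data): only predictors 2 and 7 carry signal.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.standard_normal(n)

path = forward_regression(X, y, max_steps=5)
print(path)  # order in which predictors entered the model
```

The predictors here are independent, which is the favorable setting for the classical procedure. The abstract's concern is the case where the columns of `X` are strongly correlated, which this sketch does nothing to address.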