Wu Y, et al. Front Oncol 2020.
Background: Lymph node metastasis (LNM) is difficult to predict precisely before surgery in patients with early-T-stage non-small cell lung cancer (NSCLC). This study aimed to develop machine learning (ML)-based predictive models for LNM. Methods: Clinical characteristics and imaging features were retrospectively collected from 1,102 patients with NSCLC ≤ 2 cm. A total of 23 variables were included to develop predictive models for LNM using multiple ML algorithms. The models were evaluated by receiver operating characteristic (ROC) curve analysis for predictive performance and by decision curve analysis (DCA) for clinical utility. A feature selection approach was used to identify the optimal predictive factors. Results: The areas under the ROC curve (AUCs) of the 8 models ranged from 0.784 to 0.899. Some ML-based models performed better than models using conventional statistical methods in both ROC curves and decision curves. The random forest classifier (RFC) model incorporating 9 variables was identified as the best predictive model. Feature selection indicated that the top five predictors were tumor size, imaging density, carcinoembryonic antigen (CEA), maximal standardized uptake value (SUVmax), and age. Conclusions: By incorporating clinical characteristics and radiographic features, it is feasible to develop ML-based models for the preoperative prediction of LNM in early-T-stage NSCLC, and the RFC model performed best.
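
The two evaluation measures named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names `roc_auc` and `net_benefit` are hypothetical, the labels and scores are made-up toy values rather than study data, and the net-benefit expression is the standard decision-curve formula, TP/n - FP/n × pt/(1 - pt), which the abstract itself does not state.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit at threshold probability pt: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    # A case is called positive when its predicted probability reaches the threshold.
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = LNM present, 0 = absent; scores are predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

auc = roc_auc(labels, scores)
nb_model = net_benefit(labels, scores, 0.5)
# "Treat all" reference strategy: every patient is called positive.
nb_treat_all = net_benefit(labels, [1.0] * len(labels), 0.5)
print(auc, nb_model, nb_treat_all)
```

A decision curve plots net benefit over a range of thresholds for each model alongside the "treat all" and "treat none" (net benefit 0) reference strategies; a model is considered clinically useful at thresholds where its curve lies above both.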