Comparative analysis of three data mining techniques in diagnosis of lung cancer

Lung Cancer

Eur J Cancer Prev. 2020 Aug 27. doi: 10.1097/CEJ.0000000000000598. Online ahead of print.


The development of lung cancer generates a large amount of abnormal information, and extracting useful knowledge from this mass of information is an urgent problem. Data mining technology has become a popular tool for medical classification and prediction. However, each technique has its advantages and disadvantages, so several data mining methods were applied step by step and the prediction results of the different models were compared. A total of 180 patients with lung cancer and 243 individuals with benign lung disease were collected from the First Affiliated Hospital of Zhengzhou University from October 2014 to March 2016, and prediction models based on epidemiological data, clinical features and tumor markers were developed using an artificial neural network (ANN), decision tree C5.0 and a support vector machine (SVM). The results showed significant differences between the lung cancer group and the benign lung disease group in seven tumor markers and 10 epidemiological and clinical indicators. The accuracy rates of ANN, C5.0 and SVM were 76.47, 89.92 and 85.71%, respectively. Receiver operating characteristic (ROC) curve analysis showed that the area under the ROC curve (AUC) was 0.811 (0.770-0.847) for ANN, 0.897 (0.864-0.924) for C5.0 and 0.878 (0.843-0.908) for SVM. The decision tree C5.0 model had the lowest error rate and the highest accuracy of the three, and it could be used to diagnose lung cancer.
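The two metrics used to compare the models, accuracy and AUC, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `accuracy` and `auc`, the 0.5 decision threshold and the toy labels and scores are all assumptions made for illustration. Only the three accuracy percentages are taken from the abstract. AUC is computed here as the Mann-Whitney statistic, which is equivalent to the area under the empirical ROC curve.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative
    case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from the study: 1 = lung cancer, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.5, 0.2, 0.1]
# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("toy accuracy:", accuracy(y_true, y_pred))
print("toy AUC:", auc(y_true, scores))

# Accuracies (%) as reported in the abstract; error rate = 100 - accuracy.
reported_accuracy = {"ANN": 76.47, "C5.0": 89.92, "SVM": 85.71}
error_rate = {m: round(100 - a, 2) for m, a in reported_accuracy.items()}
best_model = min(error_rate, key=error_rate.get)
print("error rates (%):", error_rate)
print("lowest error rate:", best_model)
```

Applied to the reported accuracies, the last step gives error rates of 23.53, 10.08 and 14.29% for ANN, C5.0 and SVM, which is the sense in which C5.0 has the lowest error rate.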