Features processing for random forest optimization in lung nodule localization

Bibliographic details
Published in:	Expert systems with applications 2022-05, Vol.193, p.116489, Article 116489
Main authors:	El-Askary, Nada S.; Salem, Mohammed A.-M.; Roushdy, Mohamed I.
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Automatic detection; Computed Tomography; Deep learning; Feature extraction; Feature processing; Image classification; Localization; Lung features; Lung nodule localization; Machine learning; Medical imaging; Model accuracy; Nodules; Optimization; Preprocessing; Random forest

Description
Summary:
• Random Forest is trained with features extracted from pixels of lung CT images.
• Segmentation is used as an auxiliary step to improve the results of the region-property features.
• The false positive rate is reduced to achieve lung nodule localization with the optimized forest.
• Accuracy improves compared with our previous model and with other research.
• 214 cases from a standard dataset are used, with a total of 2124 lung CT slices.

Lung nodules can lead to lung cancer, so researchers aim to detect them at an early stage. Machine learning algorithms can detect lung nodules quickly and with high accuracy. Random Forest (RF) is an ensemble machine learning algorithm that can be used to classify medical images, recognize different pathologies, and detect deficiencies based on selected input features. The paper proposes a model for early detection and localization of lung nodules in CT images, proposes an RF optimization, and analyzes the effect of the feature groups on classification accuracy. Processing was applied to the features extracted from CT images to optimize the RF output. In previous work, local features such as Haar features gave better results than region-based features. In the proposed model, after a novel ANDing technique is applied in the preprocessing step, the region-based features give better results and model accuracy improves. Combining global and local features greatly improves the model's classification results and accuracy. Experiments used 214 cases with a total of 2124 CT slices downloaded from the publicly available LIDC database. After preprocessing with the novel technique, 119 features are extracted for each pixel in the CT image. Post-processing is applied to the extracted features to refine the learner's input data. Feature dimensionality reduction was applied by dividing the features into 5 feature sets and selecting the best-scoring results.
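
The abstract names two processing steps without giving their details: an ANDing step in preprocessing and a choice among 5 feature sets by score. The sketch below is one possible reading, not the authors' implementation. It assumes that "ANDing" means keeping only the pixels that fall inside a binary segmentation mask, and that each feature set is a list of column indices scored by some validation function. The names `and_mask`, `select_best_feature_set`, `background`, and `score_fn` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]


def and_mask(ct_slice: Image, mask: Image, background: int = 0) -> Image:
    """Keep pixels where the binary mask is nonzero; set the rest to background.

    Assumption: this is only one plausible interpretation of the "ANDing"
    preprocessing step mentioned in the abstract.
    """
    if len(ct_slice) != len(mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(ct_slice, mask)
    ):
        raise ValueError("slice and mask must have the same shape")
    return [
        [pixel if keep else background for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(ct_slice, mask)
    ]


def select_best_feature_set(
    feature_sets: Dict[str, Sequence[int]],
    score_fn: Callable[[Sequence[int]], float],
) -> Tuple[str, float]:
    """Score every candidate feature set and return the best (name, score).

    Assumption: score_fn is a hypothetical callback that would train and
    validate a classifier on the given feature columns and return a score.
    """
    if not feature_sets:
        raise ValueError("at least one feature set is required")
    scores = {name: score_fn(columns) for name, columns in feature_sets.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In a real pipeline, `score_fn` would typically wrap a cross-validated forest trained on the selected columns; how the 119 features are grouped into the 5 sets is not stated in the abstract and is left to the caller here.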
Finally, compared with previous work, the optimized RF increases the true positive rate by 8.66% and decreases the false positive rate by 4.4%, which leads to better localization and a 5.47% increase in accuracy. The best results were 96.41% sensitivity, 95.98% specificity, and 96.20% accuracy, obtained when tuning RF with 80 trees and an in-bag fraction of 0.04. Results from RF were compared with other methods such as KNN, SVM, CNN, and deep learning, and RF gave the best accuracy, as reported in the paper's discussion section.
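
The reported figures (sensitivity, specificity, accuracy, and false positive rate) all derive from the four confusion-matrix counts of a binary classifier. A minimal sketch of those standard definitions follows, assuming per-pixel labels encoded as 1 (nodule) and 0 (background); the function names are illustrative and do not come from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where nonzero means nodule."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1  # nodule correctly detected
        elif pred:
            fp += 1  # background flagged as nodule
        elif truth:
            fn += 1  # nodule missed
        else:
            tn += 1  # background correctly rejected
    return tp, fp, tn, fn


def rates(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the standard rates from confusion-matrix counts."""
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "sensitivity": tp / positives,  # true positive rate
        "specificity": tn / negatives,  # true negative rate
        "false_positive_rate": fp / negatives,  # equals 1 - specificity
        "accuracy": (tp + tn) / (positives + negatives),
    }
```

The abstract does not say which toolkit was used to train the forest. As an assumption, a comparable configuration in scikit-learn could be written as `RandomForestClassifier(n_estimators=80, max_samples=0.04)`, where `max_samples` sets the fraction of training samples drawn for each tree.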
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2021.116489