TY - GEN
T1 - Diagnosis of Pneumoconiosis with Machine Learning
AU - Hanampa, Viviana
AU - Astete, Jonh
AU - Castaneda, Benjamin
AU - Romero, Stefano
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Pneumoconiosis encompasses a group of lung diseases caused by inhaling dust particles. Frequently recognized as an occupational disease, it primarily affects workers in the mining industry. This paper details the use of machine learning algorithms to automate the diagnostic process in two distinct stages: Stage 1 involves lung segmentation, and Stage 2 focuses on classifying X-rays to determine the presence or absence of pneumoconiosis. In Stage 1, a U-Net network is employed for semantic segmentation, achieving an accuracy of 94% on test data and an average accuracy of 98.35% on validation data. Stage 2 introduces a comparative system that complies with the ILO's standard practical guidelines for diagnosis. This stage evaluates four machine learning techniques: Support Vector Machine (SVM), Random Forest, Naive Bayes, and XGBoost. Our findings indicate that dividing the lung into six segments yields the most balanced metrics (including accuracy, precision, F1 score, and recall) across these models. Notably, the XGBoost model outperforms the others in this configuration, achieving a precision of 98%, an accuracy of 90%, and an F1 score of 84%.
AB - Pneumoconiosis encompasses a group of lung diseases caused by inhaling dust particles. Frequently recognized as an occupational disease, it primarily affects workers in the mining industry. This paper details the use of machine learning algorithms to automate the diagnostic process in two distinct stages: Stage 1 involves lung segmentation, and Stage 2 focuses on classifying X-rays to determine the presence or absence of pneumoconiosis. In Stage 1, a U-Net network is employed for semantic segmentation, achieving an accuracy of 94% on test data and an average accuracy of 98.35% on validation data. Stage 2 introduces a comparative system that complies with the ILO's standard practical guidelines for diagnosis. This stage evaluates four machine learning techniques: Support Vector Machine (SVM), Random Forest, Naive Bayes, and XGBoost. Our findings indicate that dividing the lung into six segments yields the most balanced metrics (including accuracy, precision, F1 score, and recall) across these models. Notably, the XGBoost model outperforms the others in this configuration, achieving a precision of 98%, an accuracy of 90%, and an F1 score of 84%.
KW - diagnosis
KW - log-normal label distribution learning
KW - machine learning
KW - pneumoconiosis
UR - http://www.scopus.com/inward/record.url?scp=85214974983&partnerID=8YFLogxK
U2 - 10.1109/EMBC53108.2024.10782772
DO - 10.1109/EMBC53108.2024.10782772
M3 - Conference contribution
AN - SCOPUS:85214974983
T3 - Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
BT - 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2024
Y2 - 15 July 2024 through 19 July 2024
ER -