Study of data mining techniques to classify the life expectancy of patients with chronic hepatitis

Muhammad Sam'an(1*)


(1) Universitas Muhammadiyah Semarang
(*) Corresponding Author

Abstract


This study examines a hepatitis patient dataset using eleven machine learning (ML) models: LR, SVM, KNN, DT, RF, XGBoost, LightGBM, GBDT, CatBoost, AdaBoost, and Stacking. The dataset is subjected to several analyses, including correlation analysis, age distribution exploration, class imbalance resolution, and feature importance evaluation using eight methods: Chi-square, DT, RF, XGBoost, LightGBM, GBDT, CatBoost, and AdaBoost. The results indicate that applying the SMOTE method together with feature importance analysis improves the performance of the ML models. Among the eleven models, LR achieved the highest accuracy, reaching 93.75% before SMOTE was applied and 100% afterwards. The SMOTE method also addressed the class imbalance in the dataset, as evidenced by the improved accuracy of the RF model after SMOTE was applied. Overall, the study demonstrates that SMOTE and feature importance analysis, particularly with the Chi-square method, play a crucial role in improving ML model performance: SMOTE addresses class imbalance, while feature importance analysis helps select relevant features. Combining both approaches yields higher accuracy in classifying samples from the minority class.
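As a rough illustration of the oversampling idea named in the abstract, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, in the spirit of SMOTE (Chawla et al., 2002). It is a simplified variant that picks base samples at random and is not the study's implementation. The function name `smote`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data, made up for illustration (not the hepatitis dataset).
majority = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [1.3, 1.1], [0.8, 0.9], [1.2, 1.0]]
minority = [[3.0, 3.1], [3.2, 2.9], [2.8, 3.0]]

# Create enough synthetic points to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "6 6"
```

Because every synthetic point lies on a segment between two existing minority samples, the minority class grows without exact duplicates. In practice the resampling is typically applied to the training split only, so that the evaluation data stay untouched.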

Keywords


Hepatitis; SMOTE; Feature importance; Machine learning




DOI: https://doi.org/10.26714/jichi.v6i2.17519


____________________________________________________________________________
Journal of Intelligent Computing and Health Informatics (JICHI)
ISSN 2715-6923 (print) | 2721-9186 (online)
Organized by
Department of Informatics
Faculty of Engineering
Universitas Muhammadiyah Semarang

W : https://jurnal.unimus.ac.id/index.php/ICHI

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.