Ensemble Models for Diabetes Risk Prediction with Cost-Efficiency Considerations

Authors

  • Rofiatul Qosimah, Universitas Negeri Surabaya
  • Riska Dhenabayu, Universitas Negeri Surabaya, https://orcid.org/0000-0002-6530-2099
  • Achmad Kautsar, Universitas Negeri Surabaya
  • Anita Safitri, Universitas Negeri Surabaya

DOI:

https://doi.org/10.33367/ijhass.v6i4.8505

Keywords:

Cost-Sensitive Learning, Probability Calibration, Diabetes Prediction, Random Forest, XGBoost

Abstract

This study addresses the growing global burden of diabetes by evaluating whether ensemble-based machine learning models can support reliable and cost-efficient early risk prediction. Moving beyond accuracy-centered evaluation, the study integrates cost-sensitive threshold optimization and probability calibration to enhance clinical relevance. Random Forest and XGBoost are evaluated using two datasets with contrasting population characteristics. Model performance is examined in terms of discriminative ability, calibration quality, and total misclassification cost. The results indicate that while XGBoost remains competitive on small-scale datasets, Random Forest provides more stable calibration and more consistent cost efficiency. These findings suggest that cost-sensitive and calibrated ensemble approaches have the potential to support more rational and economically efficient diabetes screening policies.
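As a rough illustration of what cost-sensitive threshold optimization and a calibration check involve, the following is a minimal sketch and not the authors' implementation. The function names, the 10:1 false-negative to false-positive cost ratio, the 0.01 threshold grid, and the toy labels and probabilities are all assumptions made for this example; the study's actual cost matrix and data are not given on this page.

```python
from typing import List, Sequence


def total_cost(y_true: Sequence[int], y_prob: Sequence[float], threshold: float,
               cost_fn: float = 10.0, cost_fp: float = 1.0) -> float:
    """Sum misclassification costs when predicting positive at prob >= threshold.

    cost_fn and cost_fp are hypothetical unit costs (assumed, not from the study).
    """
    cost = 0.0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if y == 1 and not predicted_positive:
            cost += cost_fn  # missed case (false negative)
        elif y == 0 and predicted_positive:
            cost += cost_fp  # unnecessary follow-up (false positive)
    return cost


def best_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                   cost_fn: float = 10.0, cost_fp: float = 1.0) -> float:
    """Grid-search thresholds 0.01..0.99 and return the lowest one with minimal cost."""
    candidates: List[float] = [i / 100 for i in range(1, 100)]
    return min(candidates,
               key=lambda t: total_cost(y_true, y_prob, t, cost_fn, cost_fp))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (assumed): true labels and a model's predicted risk probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80]

threshold = best_threshold(y_true, y_prob)
cost_at_best = total_cost(y_true, y_prob, threshold)
cost_at_default = total_cost(y_true, y_prob, 0.5)
calibration_error = brier_score(y_true, y_prob)
```

In this toy case the default 0.5 cutoff misses one positive, so a lower threshold gives a smaller total cost when missed cases are weighted more heavily. In practice the threshold would be chosen on validation data, and the probabilities would typically come from a calibrated model.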

 

Published

2025-12-19

How to Cite

Qosimah, R., Dhenabayu, R., Kautsar, A., & Safitri, A. (2025). Model Ensemble untuk Prediksi Risiko Diabetes dengan Pertimbangan Efisiensi Biaya. Indonesian Journal of Humanities and Social Sciences, 6(4), 797-810. https://doi.org/10.33367/ijhass.v6i4.8505