Predicting Depression Levels Among Final-Year Students Using CatBoost Algorithm

Authors

  • Shintya Preity Aleng, Sam Ratulangi University
  • Bernad Jumadi Dehotman Sitompul, Sam Ratulangi University
  • Oktavian Abraham Lantang, Sam Ratulangi University

DOI:

https://doi.org/10.35793/jtek.v14i2.62464

Keywords:

CatBoost, Depression, Final-Year Students, Machine Learning, Prediction, Thesis

Abstract

Depression among final-year students who are completing a thesis is a serious problem that often goes unnoticed, even though it affects both mental health and academic productivity. Academic pressure, deadlines, workload, and a lack of social support all contribute to the risk. This study builds a model that predicts the depression level of final-year students across the five PHQ-9 categories (Not Depressed, Mild, Moderate, Severe, and Very Severe) using the CatBoost algorithm. Data were collected through an online questionnaire from 450 respondents. The nine PHQ-9 items provide the depression-level label, while additional questions on psychological factors (academic burden, deadlines, sleep quality, and social support) and demographic data (age, gender, study program, and semester) serve as predictive features. After screening, 440 valid responses remained. Preprocessing consisted of handling invalid values, detecting outliers, renaming columns, and splitting the data into stratified training and testing sets (70:30). In the evaluation, CatBoost outperformed Random Forest and Support Vector Machine (SVM), reaching an accuracy of 88%, a macro F1-score of 89%, a weighted F1-score of 88%, a macro AUC of 97.87%, and a weighted AUC of 97.44%. These results indicate that the model has strong potential as an accurate and efficient early-detection tool for depression among final-year students.
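
The labeling step described in the abstract, turning nine PHQ-9 item scores into one of five depression levels, can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state the score thresholds, so the sketch assumes the standard PHQ-9 severity bands (0-4, 5-9, 10-14, 15-19, 20-27) and pairs them with the label names used in the abstract. The names `PHQ9_BANDS` and `phq9_label` are hypothetical.

```python
from typing import Sequence

# ASSUMPTION: standard PHQ-9 severity bands, paired with the five label
# names from the abstract. The abstract does not give the cutoffs itself.
# Each entry is (highest total score in the band, label).
PHQ9_BANDS = [
    (4, "Not Depressed"),
    (9, "Mild"),
    (14, "Moderate"),
    (19, "Severe"),
    (27, "Very Severe"),
]


def phq9_label(item_scores: Sequence[int]) -> str:
    """Map nine PHQ-9 item scores (each 0 to 3) to a depression level."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly nine item scores")
    if any(not isinstance(s, int) or s < 0 or s > 3 for s in item_scores):
        raise ValueError("each PHQ-9 item must be an integer from 0 to 3")

    total = sum(item_scores)  # total ranges from 0 to 27
    for upper_bound, label in PHQ9_BANDS:
        if total <= upper_bound:
            return label
    raise AssertionError("unreachable: total cannot exceed 27")
```

Under these assumed bands, a respondent who answers 1 on every item has a total of 9 and would be labeled "Mild". The resulting label would then be the target that a classifier such as CatBoost learns to predict from the psychological and demographic features.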

Published

2025-12-01