Analysis of Email Spam Classification Using the Extreme Gradient Boosting (XGBoost) Method
Keywords:
Email Spam Classification, Machine Learning, Extreme Gradient Boosting, Stratified K-Fold Cross Validation, Confusion Matrix
Abstract
The growing use of email has led to a surge in harmful spam, including fraud, phishing, and unauthorized advertising. Addressing this requires a detection system that can accurately classify emails as spam or ham. This study proposes the Extreme Gradient Boosting (XGBoost) method for email spam classification. Evaluation was carried out using Stratified K-Fold Cross Validation and a confusion matrix. The proposed model achieved an accuracy of 95.3%, a precision of 95.1%, a recall of 95.6%, and an F1-score of 95.2%. The confusion matrix shows that the model correctly classified 326 spam emails (True Positive) and 323 non-spam emails (True Negative), while the error counts were relatively small: 17 non-spam emails were misclassified as spam (False Positive) and 15 spam emails were misclassified as non-spam (False Negative). These results reflect a good balance between the model's ability to recognize spam and its ability to avoid misclassifying non-spam emails. Overall, the evaluation metrics show that Extreme Gradient Boosting is an effective approach to email spam classification.
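The relationship between the four confusion matrix counts and the reported metrics can be checked with a short calculation. The following is a minimal sketch, assuming spam is the positive class and using the standard binary definitions of accuracy, precision, recall, and F1; the function name `confusion_metrics` is hypothetical and not taken from the article. The abstract does not state how the reported figures were aggregated (for example, averaged across folds), so values recomputed from the pooled counts may differ from the reported ones by about 0.1 percentage point.

```python
# Counts reported in the abstract's confusion matrix.
# Assumption: spam is treated as the positive class.
tp, tn, fp, fn = 326, 323, 17, 15


def confusion_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F1 for a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total          # share of all emails classified correctly
    precision = tp / (tp + fp)            # share of predicted spam that is really spam
    recall = tp / (tp + fn)               # share of real spam that was caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = confusion_metrics(tp, tn, fp, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these counts the sketch gives an accuracy of 95.3% and a recall of 95.6%, matching the abstract, and a precision of 95.0% and an F1-score of 95.3%, each within 0.1 percentage point of the reported values.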
License
Copyright (c) 2025 Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer

This article is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.