HYBRID APPROACH WITH EVOLUTIONARY HYBRID SAMPLING FOR THE CLASS IMBALANCE AND OVERLAPPING PROBLEMS
Abstract
Class imbalance is a problem that needs serious attention in classification. It is unavoidable because instances tend to be unevenly distributed, so that one class has far more instances than the others. This affects classification accuracy, because a class with more instances is classified more accurately than a class with fewer instances. Class imbalance is handled with data-level, algorithm-level, and hybrid approaches. A hybrid approach, which combines the data-level and algorithm-level approaches, tends to give better results in handling class imbalance. Within a hybrid approach, over-sampling can produce overlapping and meaningless samples, while under-sampling discards important information from the majority samples. Evolutionary Hybrid Sampling is used to analyze the data distribution of the original data and to identify the overlapping region between the majority and minority classes, combining over-sampling of the minority samples with under-sampling of the majority samples. Applying the Hybrid Approach with Evolutionary Hybrid Sampling is expected to give better results than the Hybrid Approach with SMOTE as the sampling method. The results show that the Hybrid Approach with Evolutionary Hybrid Sampling gives better results on Augmented R-Value, Precision, and Recall.
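The effect described in the abstract can be illustrated with a minimal sketch. It assumes a toy two-class dataset (95 majority and 5 minority instances) and uses plain random over-sampling and under-sampling as a stand-in; it is not Evolutionary Hybrid Sampling or SMOTE, and the names `precision_recall` and `hybrid_resample` are hypothetical helpers introduced only for this illustration.

```python
import random
from collections import Counter


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, computed from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def hybrid_resample(samples, labels, target, seed=0):
    """Bring every class to `target` instances.

    Assumption: plain random resampling, used only to show the idea of
    combining over-sampling and under-sampling. It does not analyze the
    overlapping region the way Evolutionary Hybrid Sampling does.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) < target:
            # Over-sampling: keep all originals and add random duplicates.
            chosen = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        else:
            # Under-sampling: keep a random subset without replacement.
            chosen = rng.sample(xs, target)
        out_x.extend(chosen)
        out_y.extend([y] * len(chosen))
    return out_x, out_y


# Toy data: instances 0..94 are majority, 95..99 are minority.
samples = list(range(100))
labels = ["maj"] * 95 + ["min"] * 5

# A classifier that always predicts the majority class looks accurate
# overall but never finds a single minority instance.
always_majority = ["maj"] * 100
accuracy = sum(t == p for t, p in zip(labels, always_majority)) / len(labels)
print(accuracy)                                           # 0.95
print(precision_recall(labels, always_majority, "min"))   # (0.0, 0.0)

# After hybrid resampling both classes have the same size.
new_x, new_y = hybrid_resample(samples, labels, target=20)
print(Counter(new_y))
```

The sketch shows why overall accuracy alone is misleading on imbalanced data and why Precision and Recall are reported per class. Random duplication and random removal are exactly the steps that can create meaningless samples or discard informative majority instances, which is the weakness the sampling method in the abstract is meant to address.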
DOI: http://dx.doi.org/10.22303/csrid.13.3a.2021.465-474
This work is licensed under a Creative Commons Attribution 4.0 International License.
CSRID Journal Editor's Office:
Universitas Potensi Utama. Jl. K.L. Yos Sudarso Km 6,5 No.3-A Telp. (061) 6640525 Ext. 214 Tanjung Mulia Medan 20241