HYBRID APPROACH WITH EVOLUTIONARY HYBRID SAMPLING FOR THE CLASS IMBALANCE AND OVERLAPPING PROBLEM

Teddy Surya Gunawan, Hartono Hartono, Sofyan Rahmad, Nurafni Damanik

Abstract


Class imbalance is a problem that requires serious attention in classification. It is unavoidable because instances tend to be unevenly distributed, so that one class contains far more instances than the others. This can affect classification accuracy, because the class with more instances tends to be classified more accurately than the class with fewer instances. Class imbalance is handled with data-level, algorithm-level, and hybrid approaches. Hybrid approaches, which combine the data-level and algorithm-level approaches, tend to give better results in handling class imbalance. Within a hybrid approach, over-sampling can cause overlapping and meaningless samples, whereas under-sampling causes the loss of important information from the majority samples. Evolutionary Hybrid Sampling is used to analyze the distribution of the original data and to identify the overlapping region between the majority and minority classes, combining over-sampling of the minority samples with under-sampling of the majority samples. Applying the hybrid approach with Evolutionary Hybrid Sampling is expected to give better results than the hybrid approach with SMOTE as the sampling method. The results show that the hybrid approach with Evolutionary Hybrid Sampling gives better results on Augmented R-Value, Precision, and Recall.
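The combination of over-sampling the minority class and under-sampling the majority class can be illustrated with a minimal sketch. It is not the Evolutionary Hybrid Sampling method itself: the evolutionary search and the overlap analysis are replaced here by plain random sampling, and the function names, the toy data, and the target_per_class parameter are assumptions made only for illustration. The sketch also shows how Precision and Recall are computed for the minority class.

```python
import random
from collections import Counter

def hybrid_resample(X, y, minority_label, target_per_class, seed=0):
    """Hypothetical helper: random hybrid sampling for a binary problem.

    Assumes the minority class has at most target_per_class instances.
    This is plain random sampling, NOT an evolutionary search.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    # Under-sampling: keep a random subset of the majority class.
    # The discarded instances are where information can be lost.
    kept_majority = rng.sample(majority, min(target_per_class, len(majority)))
    # Over-sampling: duplicate random minority instances (with replacement).
    extra = max(0, target_per_class - len(minority))
    added_minority = [rng.choice(minority) for _ in range(extra)]
    chosen = kept_majority + minority + added_minority
    return [X[i] for i in chosen], [y[i] for i in chosen]

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy data (assumed): 90 majority instances (label 0), 10 minority (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = hybrid_resample(X, y, minority_label=1, target_per_class=50)
counts = Counter(y_bal)  # class sizes after resampling
```

Random duplication and random removal are exactly the steps that can produce meaningless samples and lose majority information, which is the weakness that a guided sampling strategy is meant to reduce.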


Keywords


Class Imbalance; Hybrid Approach; Over-Sampling; Under-Sampling; Evolutionary-Hybrid Sampling






DOI: http://dx.doi.org/10.22303/csrid.13.3a.2021.465-474

This work is licensed under a Creative Commons Attribution 4.0 International License

CSRID Journal Editor's Office:

Universitas Potensi Utama. Jl. K.L. Yos Sudarso Km 6,5 No.3-A Telp. (061) 6640525 Ext. 214 Tanjung Mulia Medan 20241