Reduction in High-Dimensional Data by using HDRF with Random Forest Classifier

Authors

  • Ahmed Najat Ahmed, Department of Computer Engineering, College of Engineering and Computer Science, Lebanese French University, Erbil, Kurdistan Region, Iraq.

DOI:

https://doi.org/10.25212/lfu.qzj.6.4.30

Keywords:

Random forest, ensemble pruning, kappa measure, hybrid dimensionality reduction forest

Abstract

In machine learning, classifying high-dimensional data is one of the significant challenges. Traditional approaches apply dimensionality reduction to enhance the diversity of the classifiers in an ensemble, but the conventional method has some restrictions: information is lost during dimensionality reduction, which lowers accuracy, and the selection of samples is vulnerable to redundant features and noise. The proposed method, a Hybrid Dimensionality Reduction Forest (HDRF) combined with a Random Forest (RF) ensemble classifier and the kappa measure, is used to overcome these restrictions. First, the kappa measure is used for pruning, and the trees with the highest kappa values are retained in the forest. A tree-based selection method is then used to partition the features, and Principal Component Analysis (PCA) is applied for feature extraction, noise reduction, and dimensionality reduction. The proposed method removes weak classifiers, eliminates redundancy, and compresses the unselected features together with the fundamental features into a new representation. In an evaluation on 25 high-dimensional datasets, the proposed method outperforms Random Forest ensemble classifier methods, giving better results on 21 of the 25 datasets.
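The abstract describes the HDRF pipeline only in prose, so a minimal sketch of that flow follows, assembled from standard scikit-learn components. It assumes Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e) with p_o the observed agreement and p_e the chance agreement, as the kappa measure; the pruning rule (keep the trees with the highest kappa), the median importance threshold for the feature partition, the number of PCA components, and the synthetic dataset are all illustrative assumptions rather than the authors' settings.

# A minimal, illustrative sketch of an HDRF-style pipeline; not the authors'
# implementation. Pruning rule, importance threshold, and PCA size are assumed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

# Synthetic high-dimensional data stands in for the 25 benchmark datasets.
X, y = make_classification(n_samples=600, n_features=200, n_informative=20,
                           n_redundant=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: grow a forest, then prune it with the kappa measure, retaining
# only the trees whose predictions agree most strongly with the labels.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
kappas = [cohen_kappa_score(y_tr, t.predict(X_tr).astype(int))
          for t in forest.estimators_]
kept = [forest.estimators_[i] for i in np.argsort(kappas)[-30:]]  # top 30 trees

# Majority vote of the pruned sub-forest (weak trees removed).
votes = np.stack([t.predict(X_te).astype(int) for t in kept])
pruned_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("pruned forest accuracy:", accuracy_score(y_te, pruned_pred))

# Step 2: tree-based feature partitioning - split the features into a
# "selected" (high-importance) and an "unselected" (low-importance) group.
selected = forest.feature_importances_ >= np.median(forest.feature_importances_)

# Step 3: PCA compresses the unselected features instead of discarding them,
# folding their residual information into a few components that are then
# concatenated with the selected features.
pca = PCA(n_components=10, random_state=0)
Z_tr = np.hstack([X_tr[:, selected], pca.fit_transform(X_tr[:, ~selected])])
Z_te = np.hstack([X_te[:, selected], pca.transform(X_te[:, ~selected])])

# Step 4: train the final Random Forest ensemble on the reduced representation.
rf_reduced = RandomForestClassifier(n_estimators=100, random_state=0).fit(Z_tr, y_tr)
print("reduced-representation accuracy:",
      accuracy_score(y_te, rf_reduced.predict(Z_te)))

The concatenation in step 3 mirrors the abstract's point that the unselected features are compressed into a new representation rather than discarded, so their residual information still reaches the final forest.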




Published

2021-12-30

How to Cite

Ahmed Najat Ahmed. (2021). Reduction in High-Dimensional Data by using HDRF with Random Forest Classifier. QALAAI ZANIST JOURNAL, 6(4), 876–889. https://doi.org/10.25212/lfu.qzj.6.4.30
