Twitter Sentiment Analysis for Kurdish Language
DOI:
https://doi.org/10.25212/lfu.qzj.8.4.42
Keywords:
Kurdish text, Machine learning, Sentiment analysis, Stemming.
Abstract
Sentiment analysis of text data has received significant attention in Natural Language Processing. However, most of the focus has been on English, which has kept many other languages, especially Kurdish Sorani, from benefiting from state-of-the-art techniques suited to them. This paper attempts to bridge the gap between English and Kurdish in sentiment analysis of social media text. For this purpose, a new Kurdish sentiment analysis dataset was first curated and annotated. We then tried different machine learning algorithms, including classical algorithms such as Random Forest, KNN, SVM, Naive Bayes, and Decision Trees, and compared the results to Deep Learning techniques, namely ANN, LSTM, and CNN. In our experiments, Naive Bayes achieved the best results, with an accuracy of 78%.
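The abstract does not describe the features, preprocessing, or implementation used. As an illustration only, the following is a minimal sketch of a multinomial Naive Bayes sentiment classifier, assuming tweets are already tokenized and using Laplace (add-one) smoothing. The function names and the toy English tokens and labels are hypothetical placeholders, not the paper's dataset or code.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and tokens per class (assumed bag-of-words)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> token -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the given label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy data standing in for annotated, tokenized tweets.
train_docs = [["good", "happy"], ["great", "good"], ["bad", "sad"], ["awful", "bad"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(train_docs, train_labels)
```

Calling predict(model, tokens) on a new token list returns the more probable label. In practice, accuracy would be measured on a held-out test split rather than on the training examples.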
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Qalaai Zanist Journal allows authors to retain the copyright in their articles. Articles are made available under a Creative Commons license so that others may freely access, copy, and use the research, provided the author is correctly attributed.
Creative Commons is a licensing scheme that allows authors to license their work so that others may re-use it without having to contact them for permission.