Twitter Sentiment Analysis for Kurdish Language


  • Didam Mahmud Department of Information Technology, College of Commerce, Sulaimani University, Sulaymaniyah, Iraq
  • Bawar Abid Abdalla Department of Software Engineering, Faculty of Engineering, Koya University, Koya, Iraq
  • Azhi Faraj Department of Information Technology, College of Commerce, Sulaimani University, Sulaymaniyah, Iraq



Kurdish text, Machine learning, Sentiment analysis, Stemming.


Sentiment analysis of text data has received significant attention in Natural Language Processing. However, most of this work has focused on English, leaving many other languages, Kurdish Sorani in particular, without access to state-of-the-art techniques suited to them. This paper attempts to bridge the gap between English and Kurdish in sentiment analysis of social media text. To this end, we first curated and annotated a new Kurdish sentiment analysis dataset. We then evaluated classical machine learning algorithms, namely Random Forest, KNN, SVM, Naïve Bayes and Decision Trees, and compared their results with the deep learning techniques ANN, LSTM and CNN. In our experiments, Naïve Bayes performed best, achieving an accuracy of 78%.
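As an illustration of the kind of classifier that performed best, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing, written in plain Python. It is not the authors' implementation: the abstract does not state which Naïve Bayes variant, features or preprocessing were used, so whitespace tokenization, the smoothing constant `alpha`, the function names `train_naive_bayes` and `predict`, and the tiny English placeholder sentences (standing in for annotated Kurdish tweets) are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized text.

    alpha is the Laplace smoothing constant (an assumed default).
    """
    class_doc_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = text.lower().split()
        token_counts[label].update(tokens)
        vocabulary.update(tokens)

    total_docs = len(documents)
    # Log prior of each class: share of training documents with that label.
    log_priors = {
        label: math.log(count / total_docs)
        for label, count in class_doc_counts.items()
    }
    # Smoothed log likelihood of each vocabulary token given each class.
    log_likelihoods = {}
    for label in class_doc_counts:
        denominator = sum(token_counts[label].values()) + alpha * len(vocabulary)
        log_likelihoods[label] = {
            token: math.log((token_counts[label][token] + alpha) / denominator)
            for token in vocabulary
        }
    return {
        "log_priors": log_priors,
        "log_likelihoods": log_likelihoods,
        "vocabulary": vocabulary,
    }


def predict(model, text):
    """Return the class with the highest posterior log score for text."""
    # Tokens never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocabulary"]]
    scores = {}
    for label, log_prior in model["log_priors"].items():
        likelihoods = model["log_likelihoods"][label]
        scores[label] = log_prior + sum(likelihoods[t] for t in tokens)
    return max(scores, key=scores.get)


# Hypothetical placeholder data, not taken from the paper's dataset.
train_texts = [
    "great happy day",
    "love this great news",
    "sad terrible day",
    "hate this terrible news",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(train_texts, train_labels)
print(predict(model, "great news"))
print(predict(model, "terrible day"))
```

In practice the same structure would be applied to tokenized and stemmed Kurdish tweets, and accuracy would be measured on a held-out portion of the annotated dataset.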








How to Cite

Didam Mahmud, Bawar Abid Abdalla, & Azhi Faraj. (2023). Twitter Sentiment Analysis for Kurdish Language. QALAAI ZANIST JOURNAL, 8(4), 1132–1144.


