Correlation Evaluation Scale Through Text Mining Algorithms and Implementation on the Kurdish Language
A Review
DOI: https://doi.org/10.25212/lfu.qzj.8.2.53
Keywords: Text Mining, Classification, Document clustering, Association rule mining, Keyword extraction.
Abstract
In recent years, the large number of articles available on the web has made text extraction an active area of research. Text mining is a technique for extracting useful information or knowledge from text documents, which are typically in unstructured form. Several studies have applied different text mining techniques to unstructured data sets. This study provides an overview of the methods and algorithms related to text mining, along with studies on mining Kurdish web documents. In addition, a collection of research problems and research methodologies will help scholars plan their future research.
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Qalaai Zanist Journal allows the author to retain the copyright in their articles. Articles are instead made available under a Creative Commons license to allow others to freely access, copy and use research provided the author is correctly attributed.
Creative Commons is a licensing scheme that allows authors to license their work so that others may re-use it without having to contact them for permission.