Text Automation Classification Using the Lexical Feature Representation Method
DOI:
https://doi.org/10.61841/03346049
Keywords:
text classification, feature representation, bag of concept, n-gram
Abstract
Automated text classification plays an important role in organizing text documents and determining their characteristics. Identifying the characteristics or information hidden in a large dataset is necessary because unstructured documents carry many different meanings and purposes. A dedicated method is therefore needed to extract the important information contained in a text document. The feature representation methods used in this research are N-grams and bag of concepts, an extension of the bag-of-words concept that reduces the computational cost of building feature representations. The purpose of this research is to design a feature that automatically categorizes uncategorized questions contained in a text document by applying lexical feature representations such as bag of concepts, bag of words, and N-grams. In experiments on WEKA, data models were built from combinations of the unigram, bigram, trigram, and per-category keyword features and evaluated with 10-fold cross-validation. The combination of bigram, trigram, and keyword features gave the highest percentage of correctly classified instances, 96.5%, with the J48 tree classifier.
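The lexical features named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (the experiments were run in WEKA); the whitespace tokenizer, the function names `ngrams` and `lexical_features`, the `kw=` prefix, and the sample question and keyword list are all assumptions made for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the word n-grams in tokens as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lexical_features(text, keywords, orders=(2, 3)):
    """Count n-grams of the given orders plus occurrences of category keywords."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    features = Counter()
    for n in orders:
        features.update(ngrams(tokens, n))
    # Keyword features: one count per occurrence of a category keyword
    for tok in tokens:
        if tok in keywords:
            features["kw=" + tok] += 1
    return features

# Hypothetical question and keyword list, for illustration only
question = "how do I reset my account password"
feats = lexical_features(question, keywords={"password", "account"})
```

Each question becomes a sparse count vector over bigrams, trigrams, and keywords, which is the kind of input a decision-tree classifier such as J48 could then be trained on.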