Text Automation Classification Using the Lexical Feature Representation Method

Authors

  • Murnawan, Information System Department, Widyatama University, Bandung, Indonesia
  • R.A.E. Virgana, Information System Department, Widyatama University, Bandung, Indonesia
  • Sri Lestari Kadiyono, Information System Department, Widyatama University, Bandung, Indonesia

DOI:

https://doi.org/10.61841/03346049

Keywords:

text classification, feature representation, bag of concept, n-gram

Abstract

Automating text classification plays an important role in organizing text documents and determining their characteristics. Identifying characteristics or information hidden in a large dataset is necessary because unstructured documents carry many different meanings and purposes. A dedicated method is therefore needed to extract the important information contained in a text document. The feature representation method used in this research is N-grams, together with bag of concepts, an extension of the bag-of-words concept that reduces the computational cost of forming feature representations. The purpose of this research is to design a feature that automatically categorizes uncategorized questions contained in a text document by applying lexical feature representation concepts such as bag of concepts, bag of words and N-grams. In experiments on WEKA, data models were built from combinations of the lexical features unigram, bigram, trigram and per-category keywords, and evaluated using 10-fold cross-validation. The combination of bigram, trigram and keyword features gave the highest percentage of correctly classified instances, 96.5%, with the J48 tree classifier.
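As an illustration of the lexical features named in the abstract, the following is a minimal sketch of bigram, trigram and keyword counting for a single question, plus a simple 10-fold index split. It is not the authors' implementation: the experiments were run in WEKA with the J48 classifier, which this sketch does not reproduce. The function names, the whitespace tokenizer, the feature-name prefixes, the example question and the keyword set are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_features(text, keywords, orders=(2, 3)):
    """Count n-gram and keyword features for one question.

    `keywords` is a hypothetical set of category keywords, and the
    lowercase whitespace tokenizer is a simplifying assumption.
    """
    tokens = text.lower().split()
    features = Counter()
    for n in orders:
        for gram in ngrams(tokens, n):
            features[f"{n}gram:" + " ".join(gram)] += 1
    for token in tokens:
        if token in keywords:
            features["kw:" + token] += 1
    return features


def k_fold_indices(n_items, k=10):
    """Split range(n_items) into k (train, test) index pairs, unshuffled."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


# Hypothetical question and keyword set, for illustration only.
example = lexical_features("how do i reset my password", {"password", "reset"})
```

Passing `orders=(1, 2, 3)` would add unigram features, which corresponds to another of the feature combinations compared in the abstract. The resulting counts could then be exported to whatever format the chosen classifier expects.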

Published

29.02.2020

How to Cite

Murnawan, Virgana, R. A. E., & Lestari Kadiyono, S. (2020). Text Automation Classification Using the Lexical Feature Representation Method. International Journal of Psychosocial Rehabilitation, 24(1), 2497-2506. https://doi.org/10.61841/03346049