Ensemble Feature Selection to Improve the Classifier Performance in Sentimental Analysis

M. Gunasekar, S. Naveen Kumar, K. Sakthi Gnanesh, T.A. Salman Syed Mukthar, K. Shribalaji


Pre-trained word embeddings are used in several downstream applications and for constructing representations of sentences, paragraphs and documents. One area for improvement is reducing the dimensionality of word embeddings. Smaller embeddings are more useful on memory-constrained devices, which benefits several real-world applications. In this paper, we therefore focus on classifying textual information consisting of online comments from Wikipedia talk page edits, using unsupervised learning approaches to improve the performance of sentimental analysis. To this end, we first analyse the dataset using pre-trained GloVe word embeddings, which assign a vector representation to each comment. We then reduce the dimensions of the comments using a dimensionality reduction approach and use t-SNE, an iterative algorithm, to visualise the high-dimensional data. Finally, a bidirectional LSTM model is built using Keras to classify the sentences into the appropriate types of toxicity. To the best of our knowledge, this work is the first to study negative online behaviours such as the various types of toxic comments.
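
As a rough illustration of the first step, the sketch below shows how a comment might be turned into a fixed-size matrix of word vectors before it is passed to a sequence model such as a bidirectional LSTM. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy vectors, the names TOY_EMBEDDINGS, MAX_LEN, tokenize and embed_comment, and the whitespace tokenizer are all hypothetical, and real GloVe vectors would be loaded from a pre-trained file rather than written by hand.

```python
from typing import Dict, List

# Hypothetical 4-dimensional toy vectors standing in for real GloVe embeddings
# (assumption: real pre-trained vectors would be loaded from a file instead).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "you": [0.1, 0.3, -0.2, 0.5],
    "are": [0.0, 0.2, 0.1, -0.1],
    "great": [0.7, -0.4, 0.3, 0.2],
    "awful": [-0.6, 0.5, -0.3, 0.1],
}
EMBEDDING_DIM = 4
MAX_LEN = 5  # assumed fixed comment length, in tokens


def tokenize(comment: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?;:\"'()") for word in comment.lower().split())
    return [t for t in tokens if t]


def embed_comment(comment: str) -> List[List[float]]:
    """Map a comment to a MAX_LEN x EMBEDDING_DIM matrix.

    Unknown words and padding positions become all-zero vectors;
    comments longer than MAX_LEN tokens are truncated.
    """
    zero = [0.0] * EMBEDDING_DIM
    tokens = tokenize(comment)[:MAX_LEN]
    rows = [list(TOY_EMBEDDINGS.get(tok, zero)) for tok in tokens]
    rows += [list(zero) for _ in range(MAX_LEN - len(rows))]
    return rows


matrix = embed_comment("You are awful!")
print(len(matrix), len(matrix[0]))  # prints "5 4"
```

Every comment ends up with the same shape regardless of its length, which is what a batch-oriented sequence classifier typically expects. The dimensionality reduction and t-SNE steps would then operate on the word vectors themselves.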


Sentimental Analysis, Word Embedding Dimensionality Reduction, Visualization, Toxicity

Paper Details
Issue 3