ASSOCIATION OF IDENTICAL PAIRS USING NATURAL LANGUAGE PROCESSING

Saravanan Alagarsamy, Kartheeban Kamatchi, Mehta Maharshi, Nilesh Nirav, Moksh Kaushal

Abstract:

Question duplication is a serious problem for question-and-answer forums such as Quora, Stack Overflow, and Reddit. When the same question is posted repeatedly, its answers become split across the different versions. This results in poor searchability, answer fatigue, fragmentation of information, and a lack of responses to askers. Duplicate questions can be identified using Machine Learning and Natural Language Processing. A dataset of more than 400,000 question pairs provided by Quora is preprocessed through tokenization, lemmatization, and removal of stop words. The preprocessed dataset is then used for feature extraction. An Artificial Neural Network is designed, and the extracted features are fed into the model. This neural network achieves an accuracy of 86.09%. In short, this study predicts the semantic similarity between question pairs by extracting highly dominant features, and thereby determines the probability that a question is a duplicate.
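
As a rough illustration of the preprocessing and feature-extraction steps named in the abstract, the following Python sketch tokenizes two questions, drops stop words, and computes one simple overlap feature. It is only a sketch under stated assumptions: the stop-word list, the function names `preprocess` and `jaccard_overlap`, and the choice of Jaccard overlap as a feature are all hypothetical, since the abstract does not specify which features or tools were used. Lemmatization is omitted because it would need an external library.

```python
import re

# Tiny illustrative stop-word list (an assumption; the paper does not give its own).
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "does",
              "i", "to", "of", "in", "what", "how", "can"}


def preprocess(question):
    """Lowercase, tokenize on alphanumeric runs, and remove stop words.

    Lemmatization is deliberately left out of this sketch.
    """
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def jaccard_overlap(q1, q2):
    """Share of distinct content words common to both questions (0.0 to 1.0)."""
    a, b = preprocess(q1), preprocess(q2)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


score = jaccard_overlap("How do I learn Python?",
                        "What is the best way to learn Python?")
print(score)
```

A feature of this kind could be one input among many to a classifier such as the neural network the abstract describes; on its own it captures only word overlap, not meaning.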

Keywords:

Natural Language Processing, Vector Space Modeling, Artificial Intelligence.

Paper Details
Month: 4
Year: 2020
Volume: 24
Issue: 6
Pages: 7320-7327