AUTHOR IDENTIFICATION OF HINDI STORIES
DOI: https://doi.org/10.61841/jjng7371
Keywords: Author Identification, Feature Selection, Hindi Stories, J48 Decision Tree, Machine Learning, Stylometry, Weka
Abstract
Authorship attribution, also called authorship identification, estimates the probability that a work was produced by a given author by examining other works by that author. The process is used for characterizing an author's work, detecting plagiarism, cybercrime analysis, and similar tasks. In this paper, we apply it to a corpus of Hindi stories, 70 from each of three authors. We extract lexical and structural features from these works, such as word count, average sentence length, word and character frequencies, and function words. From these features we build a dataset and feed it to the J48 decision tree algorithm to determine which features best support authorship attribution. We then apply the selected features to several algorithms, including SMO, Bayes Net, Naïve Bayes, and J48, and choose the one with the best accuracy for classifying authors.
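The kind of lexical and structural feature extraction described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation (the keywords indicate the study used Weka): the function name `extract_features`, the function-word list, and the choice of sentence delimiters are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical function-word list; the paper does not publish the one it used.
FUNCTION_WORDS = ["और", "का", "की", "के", "है", "में", "से", "को"]


def extract_features(text):
    """Return a dict of simple stylometric features for one story."""
    # Assumption: sentences end with a danda, a question mark, or an exclamation mark.
    sentences = [s for s in re.split(r"[।?!]", text) if s.strip()]
    # Replace common punctuation with spaces, then tokenize on whitespace.
    words = re.sub(r"[।?!,;:\"'()\-]", " ", text).split()
    n_words = len(words)
    counts = Counter(words)

    features = {
        "word_count": n_words,
        "sentence_count": len(sentences),
        # Average sentence length measured in words.
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # len() counts Unicode code points, so Devanagari vowel signs
        # are counted as separate characters here.
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = counts[fw] / n_words if n_words else 0.0
    return features


sample = "राम घर में है। सीता और राम बाज़ार से आए।"
features = extract_features(sample)
print(features["word_count"], features["sentence_count"])
```

Each story would yield one such feature dictionary; labelling the rows with the author's name gives a table that can be exported to a classifier toolkit for the feature selection and algorithm comparison the abstract describes.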
License

This work is licensed under a Creative Commons Attribution 4.0 International License.