Rule-Based POS Tagger for Sanskrit
Sharada Adinarayanan, J. Naren, P. Sriranjanie and Dr. G. Vithya
POS tagging is the process of assigning each word in a sentence a suitable tag from a given tag set. This paper takes a rule-based approach to NLP for tagging the parts of speech of Sanskrit words, with morphological analysis as the foundation of the tagging process. The twelfth chapter of the Bhagavad Gita is used as input. Annotated corpora will be developed and used to retrieve the grammatical category of each word in the input text. Sanskrit has a highly systematic, layered grammar, formulated by Panini (c. 4th century BCE), so a rule-based approach is better suited to the tagging task than the stochastic or probabilistic approaches used by existing systems. The project therefore aims to improve accuracy through efficient lookup strategies, searching and sorting techniques, and rule formation that exploits the richness of Sanskrit grammar, so that the grammatical category of each word can be narrowed down quickly. The major challenge is tokenizing words that are joined together. Because Sanskrit has many inflected noun and verb forms, identifying the correct grammatical category requires taking contextual meaning and semantics into account. Semantic analysis, derivative analysis and Sandhi analysis are also performed.
Keywords: Annotated Corpora, Tokenization, Morphological Analysis
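The abstract combines lookup in an annotated corpus with rule formation. The following is a minimal Python sketch of that combination, not the authors' implementation. It assumes the input has already been split into individual words (i.e., joined words and Sandhi are already resolved), uses a tiny hypothetical lexicon in IAST transliteration, and applies a few illustrative suffix rules. The names LEXICON, SUFFIX_RULES, tag_word and tag_sentence, as well as the tag labels, are all hypothetical. The lexicon is kept sorted so that lookup can use binary search (Python's bisect module). Words missing from the lexicon fall back to suffix rules, which are tried longest suffix first.

```python
import bisect
import unicodedata


def nfc(s):
    # Normalize to NFC so that precomposed IAST letters (ā, ḥ, ...) compare consistently.
    return unicodedata.normalize("NFC", s)


# Hypothetical annotated lexicon (word -> tag), standing in for the annotated corpus.
# It is sorted so that lookup can use binary search.
LEXICON = sorted(
    (nfc(w), t)
    for w, t in [
        ("arjuna", "NOUN"),
        ("ca", "IND"),     # indeclinable
        ("evam", "IND"),
        ("tvām", "PRON"),
        ("uvāca", "VERB"),
        ("ye", "PRON"),
    ]
)
WORDS = [w for w, _ in LEXICON]

# Hypothetical, deliberately incomplete suffix rules, tried longest suffix first.
SUFFIX_RULES = sorted(
    [
        (nfc("tvā"), "IND"),   # absolutive, e.g. gatvā
        (nfc("ati"), "VERB"),  # present 3rd sg. parasmaipada, e.g. bhavati
        (nfc("āḥ"), "NOUN"),   # nominative plural of a-stems, e.g. bhaktāḥ
        (nfc("aḥ"), "NOUN"),   # nominative singular of a-stems, e.g. rāmaḥ
    ],
    key=lambda rule: len(rule[0]),
    reverse=True,
)


def lookup(word):
    # Binary search in the sorted lexicon. Returns the tag, or None if the word is absent.
    i = bisect.bisect_left(WORDS, word)
    if i < len(WORDS) and WORDS[i] == word:
        return LEXICON[i][1]
    return None


def tag_word(word):
    word = nfc(word)
    tag = lookup(word)
    if tag is not None:
        return tag
    # Fall back to the first (longest) matching suffix rule.
    for suffix, rule_tag in SUFFIX_RULES:
        if word.endswith(suffix):
            return rule_tag
    return "UNK"


def tag_sentence(tokens):
    # Tokens are assumed to be already separated (Sandhi resolved).
    return [(t, tag_word(t)) for t in tokens]


print(tag_sentence(["arjuna", "uvāca", "bhaktāḥ", "bhavati"]))
```

Real Sanskrit endings are ambiguous (for example, "-ati" also ends some nominal forms). A lexicon hit is therefore trusted before any suffix rule, and a full system would need the contextual and Sandhi analysis the abstract describes.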