Usage of HIVE Tool in Hadoop ECO System with Loading Data and User Defined Functions

Dr. K. Uma Pavan Kumar, Dr. Lakshma Reddy Bhavanam

Abstract:

Hadoop is generally used to store bulk data in the Hadoop Distributed File System (HDFS) and to process that data with MapReduce. Beyond this, the ecosystem offers further functionality, such as query-based logic for importing data from both a local path and an HDFS path. This article presents the use of Hive for loading bulk data and for some simple analytics. It additionally covers the creation of Hive user-defined functions (UDFs) and running them with Eclipse. The work explains the parameters involved in data loading and in working with UDFs, with the aim of simplifying the MapReduce (MR) process through Hive commands.

MapReduce requires complex coding skills, and the problem is that only the HDFS path is known to MR; it offers no way of working with the local file system. The basic advantage of Hive is that it works with files on a local path as well as files on an HDFS path. In terms of processing, Hive likewise simplifies coding and the use of functions through simple commands.

The first case study in this article deals with page view data and parameters such as system_IP, View_time, user_id and page_url. The second case study is the loading of bulk data in less time.

The outcomes of the work are the loading of data from both a local path and an HDFS path, the loading of bulk data within seconds together with a record of the time taken, and the creation of a UDF and the running of tasks in Hive. The article also discusses research issues and possible extensions of the work.

Keywords:

Hive, Import, UDF, Map Reduce, Data Loading.

Paper Details
Month: 2
Year: 2020
Volume: 24
Issue: 4
Pages: 1058-1062