IMPROVEMENT OF CLASSIFIER ACCURACY ON CLINICAL DATA SETS BY OPTIMAL SELECTION OF IMPUTATION METHODS

C. Usha Nandhini, Dr. P. R. Tamilselvi

Abstract:

Missing value imputation is one of the most demanding data pre-processing tasks in data mining. Clinical datasets are usually incomplete, and simply removing the incomplete cases from the original datasets can create more problems than it solves. A suitable missing value imputation method helps to produce good-quality datasets for better analysis of clinical trials. In this paper we explore the use of machine learning techniques as missing value imputation methods for incomplete cardiovascular data. Mean imputation, group mean imputation, kNN imputation and multi-linear regression imputation are used to fill in the missing values, and the imputed datasets are then subjected to classification and prediction using the C5.0 and Random Forest classifiers. The experiments show that final classifier performance improves when multi-linear regression imputation is used to predict missing attribute values for Random Forest, and that in most cases the machine learning techniques perform better than the standard mean imputation technique.
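The four imputation strategies named above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no procedural details, so the sketch assumes that a single attribute contains missing values (marked None), that all other attributes are complete, that the "group" in group mean imputation is the class label, and that kNN uses squared Euclidean distance with a hypothetical k = 2. All function names and the toy data are illustrative only, and the downstream C5.0 and Random Forest classification step is omitted.

```python
from statistics import mean

def _fill(rows, col, predict):
    """Copy rows, replacing each missing rows[i][col] with predict(row, i)."""
    out = []
    for i, r in enumerate(rows):
        r = list(r)
        if r[col] is None:
            r[col] = predict(r, i)
        out.append(r)
    return out

def mean_impute(rows, col):
    """Fill with the mean of all observed values in the column."""
    m = mean(r[col] for r in rows if r[col] is not None)
    return _fill(rows, col, lambda r, i: m)

def group_mean_impute(rows, col, labels):
    """Fill with the mean of observed values sharing the row's class label
    (assumption: 'group' means class label)."""
    groups = {}
    for r, y in zip(rows, labels):
        if r[col] is not None:
            groups.setdefault(y, []).append(r[col])
    means = {y: mean(v) for y, v in groups.items()}
    return _fill(rows, col, lambda r, i: means[labels[i]])

def knn_impute(rows, col, k=2):
    """Fill with the mean of the k nearest complete rows, where distance is
    measured on the other (assumed complete) attributes."""
    observed = [r for r in rows if r[col] is not None]
    def predict(r, i):
        def dist(o):
            return sum((a - b) ** 2
                       for j, (a, b) in enumerate(zip(r, o)) if j != col)
        return mean(o[col] for o in sorted(observed, key=dist)[:k])
    return _fill(rows, col, predict)

def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(n):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def regression_impute(rows, col):
    """Fit col ~ intercept + other attributes by least squares on the
    complete rows, then predict the missing entries."""
    def features(r):
        return [1.0] + [v for j, v in enumerate(r) if j != col]
    X = [features(r) for r in rows if r[col] is not None]
    y = [r[col] for r in rows if r[col] is not None]
    p = len(X[0])
    XtX = [[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)]
    Xty = [sum(x[a] * t for x, t in zip(X, y)) for a in range(p)]
    beta = _solve(XtX, Xty)
    return _fill(rows, col,
                 lambda r, i: sum(w * v for w, v in zip(beta, features(r))))

# Toy data (hypothetical): two complete attributes, a third with one gap.
rows = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [1.0, 1.0, 6.0],
        [2.0, 1.0, 8.0], [2.0, 2.0, None]]
labels = [0, 0, 1, 1, 1]
imputed = {
    "mean": mean_impute(rows, 2),
    "group_mean": group_mean_impute(rows, 2, labels),
    "knn": knn_impute(rows, 2, k=2),
    "regression": regression_impute(rows, 2),
}
```

Each entry of imputed is a complete copy of the toy dataset that could be passed to a classifier, so that the same classifier is trained once per imputation method and the resulting performance measures are compared.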

Keywords:

Missing value imputation - Cardiovascular data - Mean imputation - Group mean imputation - kNN imputation - Multi-Linear Regression Imputation - C5.0 - Random Forest - Performance measures.

Paper Details
Month: 5
Year: 2020
Volume: 24
Issue: 5
Pages: 8442-8453