IMPROVEMENT OF CLASSIFIER ACCURACY ON CLINICAL DATA SETS BY OPTIMAL SELECTION OF IMPUTATION METHODS

C.UshaNandhini, Dr. P.R.Tamilselvi

Abstract:

Missing value imputation is one of the major tasks of data pre-processing when performing data mining. Most clinical datasets are incomplete, and simply removing the incomplete cases from the original datasets can bring more problems than solutions. A suitable method for missing value imputation can help to produce good quality datasets for better analysis of clinical trials. In this paper we explore the use of machine learning techniques as missing value imputation methods for incomplete cardiovascular data. Mean imputation, group mean imputation, kNN imputation and multi-linear regression imputation are used to fill in the missing values, and the imputed datasets are then subjected to classification and prediction using the C5.0 and Random Forest classifiers. The experiment shows that final classifier performance is improved when multi-linear regression imputation is used to predict missing attribute values for Random Forest, and that in most cases the machine learning techniques perform better than the standard mean imputation technique.
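To make the two simplest methods named in the abstract concrete, the following is a minimal sketch of mean imputation and group mean imputation in plain Python. It is not the authors' implementation: the function names, the toy cholesterol readings and the class labels are all hypothetical, and "group" is assumed here to mean the class label of each record. The kNN and multi-linear regression imputers are not shown.

```python
from statistics import mean


def mean_impute(values):
    """Replace None entries with the mean of all observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def group_mean_impute(values, groups):
    """Replace None entries with the mean of observed entries in the same group.

    Falls back to the overall mean when a group has no observed entries.
    """
    observed_by_group = {}
    for v, g in zip(values, groups):
        if v is not None:
            observed_by_group.setdefault(g, []).append(v)
    overall = mean([v for v in values if v is not None])
    fills = {g: mean(vs) for g, vs in observed_by_group.items()}
    return [
        fills.get(g, overall) if v is None else v
        for v, g in zip(values, groups)
    ]


# Hypothetical toy data (assumption, not from the paper):
# one numeric attribute with two missing readings, plus a class label.
chol = [200, None, 240, 180, None, 220]
label = ["sick", "sick", "sick", "healthy", "healthy", "healthy"]

print(mean_impute(chol))               # both gaps get the overall mean
print(group_mean_impute(chol, label))  # each gap gets its own group's mean
```

In this toy case the overall mean pulls both missing readings to the same value, while the group mean keeps the two classes apart, which is the kind of difference that can affect a downstream classifier.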

Keywords:

Missing value imputation, cardiovascular data, mean imputation, group mean imputation, kNN imputation, multi-linear regression imputation, C5.0, Random Forest, performance measures.

Paper Details

DOI: 10.37200/V24I5/33870

Month: 5

Year: 2020

Volume: 24

Issue: 5

Pages: 8442-8453
