IMPROVEMENT OF CLASSIFIER ACCURACY ON CLINICAL DATA SETS BY OPTIMAL SELECTION OF IMPUTATION METHODS

Authors

  • C. Usha Nandhini, Research Scholar, Periyar University; Assistant Professor in Computer Applications, Vellalar College for Women (Autonomous), Erode - 638012, Tamil Nadu, India
  • Dr. P. R. Tamilselvi, Research Supervisor; Assistant Professor in Computer Science, Govt. Arts and Science College, Komarapalayam, Namakkal District, Tamil Nadu, India

DOI:

https://doi.org/10.61841/v5pq4643

Keywords:

Missing value imputation, cardiovascular data, mean imputation, group mean imputation, kNN imputation, multi-linear regression imputation, C5.0, Random Forest, performance measures.

Abstract

Missing value imputation is one of the major tasks of data pre-processing in data mining. Most clinical datasets are incomplete. Simply removing the incomplete cases from the original datasets can bring more problems than solutions. A suitable method for missing value imputation can help to produce good-quality datasets for better analysis of clinical trials. In this paper we explore the use of machine learning techniques as missing value imputation methods for incomplete cardiovascular data. Mean imputation, group mean imputation, kNN imputation and multi-linear regression imputation are used to impute the missing values, and the imputed datasets are then subjected to classification and prediction using the C5.0 and Random Forest classifiers. The experiment shows that final classifier performance improves when multi-linear regression imputation is used to predict missing attribute values for Random Forest and that, in most cases, the machine learning techniques perform better than the standard mean imputation technique.
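To make the imputation methods named in the abstract concrete, the following is a minimal sketch of three of them (mean, group mean and kNN imputation) in plain Python. It is not the authors' implementation: the toy records, the column names (`age`, `sex`, `chol`) and the function names are all hypothetical, chosen only for illustration, and multi-linear regression imputation is left out to keep the example short.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 40, "sex": "F", "chol": 200},
    {"age": 42, "sex": "F", "chol": 220},
    {"age": 60, "sex": "M", "chol": 260},
    {"age": 62, "sex": "M", "chol": 280},
    {"age": 41, "sex": "F", "chol": None},
]

def mean_impute(rows, col):
    """Fill missing values in `col` with the mean of all observed values."""
    fill = mean(r[col] for r in rows if r[col] is not None)
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]

def group_mean_impute(rows, col, group_col):
    """Fill missing values in `col` with the mean observed within the same group."""
    by_group = {}
    for r in rows:
        if r[col] is not None:
            by_group.setdefault(r[group_col], []).append(r[col])
    # Fall back to the overall mean if a group has no observed values.
    overall = mean(v for vals in by_group.values() for v in vals)
    out = []
    for r in rows:
        if r[col] is None:
            vals = by_group.get(r[group_col])
            r = {**r, col: mean(vals) if vals else overall}
        out.append(r)
    return out

def knn_impute(rows, col, feature_cols, k=2):
    """Fill missing values in `col` with the mean of `col` over the k nearest
    complete rows, using squared Euclidean distance on `feature_cols`."""
    complete = [r for r in rows if r[col] is not None]
    out = []
    for r in rows:
        if r[col] is None:
            nearest = sorted(
                complete,
                key=lambda c: sum((c[f] - r[f]) ** 2 for f in feature_cols),
            )[:k]
            r = {**r, col: mean(n[col] for n in nearest)}
        out.append(r)
    return out

print(mean_impute(rows, "chol")[-1]["chol"])
print(group_mean_impute(rows, "chol", "sex")[-1]["chol"])
print(knn_impute(rows, "chol", ["age"], k=2)[-1]["chol"])
```

On this toy data the overall mean pulls the missing value toward the whole sample, while the group mean and kNN variants use only similar records. That difference illustrates why the choice of imputation method can affect a downstream classifier.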

Published

31.07.2020

How to Cite

Nandhini, C., & Tamilselvi, P. (2020). IMPROVEMENT OF CLASSIFIER ACCURACY ON CLINICAL DATA SETS BY OPTIMAL SELECTION OF IMPUTATION METHODS. International Journal of Psychosocial Rehabilitation, 24(5), 8442-8453. https://doi.org/10.61841/v5pq4643