A Novel SVM-KNN Classifier for Cervical Cancer Diagnosis using Feature Reduction and Imbalanced Learning Techniques

Authors

  • Lavanya K., Maylavaram, Krishna District, Andhra Pradesh, India

DOI:

https://doi.org/10.61841/ajrt8k70

Keywords:

Classification, Cervical Cancer, Feature Selection, Regularization Method.

Abstract

Cervical cancer is a malignant tumor of the cervix, and its complications are a major cause of cancer death in most countries. Several risk factors are associated with its development. A number of methods have been applied to predict this cancer, including Decision Tree (DT), K-nearest neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and Principal Component Analysis (PCA). However, most medical data suffer from class imbalance. This paper proposes an ensemble classifier that combines SVM and KNN with an oversampling technique, the Synthetic Minority Oversampling Technique (SMOTE), for cervical cancer diagnosis. The work also applies a set of feature reduction techniques to reduce computation and improve model accuracy. The cervical cancer data contain four target variables (Hinselmann, Schiller, Cytology, and Biopsy) associated with 32 risk factors. The study evaluates performance using Accuracy, Sensitivity, Specificity, Positive Prediction Accuracy (PPA), and Negative Prediction Accuracy (NPA). The results show that the proposed ensemble classifier is more efficient for cervical cancer analysis than standard methods.
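The five benchmarks named in the abstract can all be derived from a binary confusion matrix. The sketch below is illustrative only and is not taken from the paper: it assumes that PPA and NPA correspond to the usual positive and negative predictive values, and the function name `binary_metrics`, the class `BinaryMetrics`, and the toy labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float   # true positive rate
    specificity: float   # true negative rate
    ppa: float           # assumed: positive predictive value, TP / (TP + FP)
    npa: float           # assumed: negative predictive value, TN / (TN + FN)


def _ratio(num: int, den: int) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def binary_metrics(y_true, y_pred, positive=1) -> BinaryMetrics:
    """Compute the five benchmarks from paired true and predicted labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppa=_ratio(tp, tp + fp),
        npa=_ratio(tn, tn + fn),
    )


# Toy imbalanced example (made-up labels): 2 positives among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

On these toy labels the accuracy is 0.8 while the sensitivity is only 0.5, which illustrates why accuracy alone can be misleading on imbalanced medical data and why the study reports the other four measures alongside it.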

 

 

 

Published

30.06.2020

How to Cite

Lavanya, K. (2020). A Novel SVM-KNN Classifier for Cervical Cancer Diagnosis using Feature Reduction and Imbalanced Learning Techniques. International Journal of Psychosocial Rehabilitation, 24(6), 5158-5168. https://doi.org/10.61841/ajrt8k70