A Novel SVM-KNN Classifier for Cervical Cancer Diagnosis using Feature Reduction and Imbalanced Learning Techniques
DOI:
https://doi.org/10.61841/ajrt8k70
Keywords:
Classification, Cervical Cancer, Feature Selection, Regularization Method.
Abstract
Cervical cancer is a gynecological tumor and a major cause of cancer-related death in most countries. Several risk factors are associated with its development. A number of methods have been used to predict this cancer, including Decision Tree (DT), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and Principal Component Analysis (PCA). However, most medical data suffer from class imbalance. This paper proposes an ensemble classifier that combines SVM and KNN with an oversampling technique, the Synthetic Minority Oversampling Technique (SMOTE), for cervical cancer diagnosis. The work also applies a set of feature reduction techniques to reduce computation and improve model accuracy. The cervical cancer data contain four target variables (Hinselmann, Schiller, Cytology, and Biopsy) associated with 32 risk factors. Performance is assessed with several benchmarks: Accuracy, Sensitivity, Specificity, Positive Prediction Accuracy (PPA), and Negative Prediction Accuracy (NPA). The results show that the proposed ensemble classifier is efficient for cervical cancer analysis compared with standard methods.
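As an illustration of two ideas named in the abstract, the sketch below shows SMOTE-style oversampling (new minority samples interpolated between a minority sample and one of its nearest minority neighbours) and the five reported benchmarks computed from a binary confusion matrix. It is not the authors' implementation. The function names, the toy data, the neighbour count `k`, and the reading of PPA and NPA as positive and negative predictive value are all assumptions made for this example.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def ratio(numerator, denominator):
    """Safe division; returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(y_true, y_pred):
    """Compute the benchmarks from binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "ppa": ratio(tp, tp + fp),           # assumed: positive predictive value
        "npa": ratio(tn, tn + fn),           # assumed: negative predictive value
    }


# Hypothetical toy data, not taken from the cervical cancer dataset.
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.5]]
synthetic = smote_oversample(minority, n_new=4)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)
print(synthetic)
print(metrics)
```

In practice, oversampling would be applied to the training split only, so that synthetic samples do not leak into the evaluation data.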
License

This work is licensed under a Creative Commons Attribution 4.0 International License.