Overhead Cross Section Sampling Machine Learning based Cervical Cancer Risk Factors Prediction

A. Peter Soosai Anandaraj, M. Shyamala Devi, J. Amutharaj, M. Dineshkumar


Certain types of human papillomavirus (HPV) can cause changes in a woman's cervix that may lead to cervical cancer over time, while other types cause genital or skin warts. Cervical cancer is a leading cause of morbidity and mortality among women in low- and middle-income countries. Predicting cervical cancer remains an open challenge because several risk factors affect the cervix. With this in mind, the cervical cancer risk factor data set from the Kaggle repository is used to predict cervical cancer risk classes. First, the data set is preprocessed by handling incomplete data and applying pattern calibration. Second, exploratory data analysis is carried out, and the distribution of the target feature, cervical cancer risk, is visualised. Third, several classifiers are fitted to the unprocessed data set, and their performance is measured before and after feature scaling. Fourth, oversampling methods are applied to the preprocessed data set. Fifth, the data sets produced by the different oversampling methods are fitted with all the classifiers, and performance is again compared before and after feature scaling. Sixth, performance is analysed using precision, recall, F-score, accuracy, and running time. The code is written in Python and run in the Spyder environment launched from Anaconda Navigator. The experimental results show that the random forest classifier maintains 96% accuracy on the unprocessed data set both before and after feature scaling, and that the same classifier maintains 98% accuracy with every oversampling technique.
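The oversampling and evaluation steps described above can be illustrated with a minimal Python sketch. The abstract does not name the specific oversampling methods, so plain random oversampling is assumed here as one simple representative. The function names `random_oversample` and `binary_metrics`, the toy rows, and the toy predictions are all hypothetical and are not taken from the authors' code or data.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F-score, and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}


# Made-up, imbalanced toy data: five low-risk rows (0), one high-risk row (1).
rows = [[18, 0], [25, 1], [31, 0], [40, 2], [22, 0], [52, 3]]
labels = [0, 0, 0, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)

# Made-up predictions, used only to exercise the metric calculations.
metrics = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(class_counts, metrics)
```

In a full experiment the balanced rows would be passed to each classifier twice, once as they are and once after feature scaling, and the metrics and running time would be recorded for each run.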
