Classification Breast Cancer Revisited with Machine Learning

Hanna Arini Parhusip
Bambang Susanto
Lilik Linawati
Suryasatriya Trihandaru
Yohanes Sardjono
Adella Septiana Mugirahayu


The article presents a study of several machine learning algorithms applied to breast cancer data with 33 features from 569 samples. The purpose of this research is to identify the best algorithm for classifying breast cancer. The features may have different scales, with ranges that differ widely from one feature to another, so the data are transformed before they are classified. The classification methods used are logistic regression, k-nearest neighbor, naive Bayes classifier, support vector machine (SVM), decision tree and random forest. Both the original data and the transformed data are classified with a test-set size of 0.3 (30% of the samples held out for testing). With this split, SVM and naive Bayes show no improvement in accuracy on the transformed data, and random forest gives the best accuracy of all the methods. The test-set size is then reduced to 0.25, which improves the accuracy of all algorithms on the transformed data. Random forest still gives the best accuracy.
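
The pipeline summarized above (split the data with a given test-set fraction, transform the features, classify, then compare accuracy) can be illustrated with a small sketch. This is not the authors' code. It assumes z-score standardization as the transformation (the abstract does not say which transformation was used), uses k-nearest neighbor as the example classifier, and runs on hypothetical toy numbers rather than the breast cancer data. Only the Python standard library is used, and the function names are illustrative. The scaler is fitted on the training rows only, so no statistics from the test rows leak into training.

```python
import math
import random
from collections import Counter


def train_test_split(X, y, test_size, seed=0):
    """Shuffle the rows and hold out a fraction `test_size` (e.g. 0.3 or 0.25) for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(test_size * len(X))  # round the test count up
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def standardize(train, test):
    """Z-score every feature using the mean and standard deviation of the TRAIN rows only."""
    n = len(train[0])
    means = [sum(r[j] for r in train) / len(train) for j in range(n)]
    # `or 1.0` avoids dividing by zero when a feature is constant in the training rows.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in train) / len(train)) or 1.0
            for j in range(n)]

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(n)] for r in rows]

    return scale(train), scale(test)


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    # Ties are broken in favour of the label that appears first in distance order.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(X_train, y_train, X_test, y_test, k=3):
    """Fraction of test rows whose k-NN prediction matches the true label."""
    hits = sum(knn_predict(X_train, y_train, x, k) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)


# Hypothetical toy data: feature 0 (range 0..1) decides the class,
# feature 1 (range 0..1000) is irrelevant but dominates raw distances.
X_train, y_train = [[0.0, 100.0], [1.0, 900.0]], [0, 1]
X_test, y_test = [[0.9, 200.0]], [1]
print(accuracy(X_train, y_train, X_test, y_test, k=1))  # raw features -> 0.0
S_train, S_test = standardize(X_train, X_test)
print(accuracy(S_train, y_train, S_test, y_test, k=1))  # standardized -> 1.0
```

With raw features, the irrelevant large-range feature decides which neighbour is nearest; after standardization both features contribute on comparable scales. Distance-based methods such as k-NN are sensitive to this, whereas tree-based methods such as decision trees and random forests are largely unaffected by rescaling a single feature.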

How to Cite
H. A. Parhusip, B. Susanto, L. Linawati, S. Trihandaru, Y. Sardjono, and A. S. Mugirahayu, “Classification Breast Cancer Revisited with Machine Learning,” Int. J. Data Sci., vol. 1, no. 1, pp. 42–50, May 2020.
