DOI: https://doi.org/10.18517/ijods.5.1.33-49.2024

Improving Acute Leukemia Classification through Recursive Feature Elimination and Multilayer Perceptron Analysis of Gene Expression Data

Temitope Elizabeth Ogunbiyi (1), Michael Abejide Adegoke (2), Adebisi Esther Oluwatosin (3), Bamidele Aremo (4), Olufemi Adekunle (5), Emmanuel Ayodele Ayoariyo (6), Austin Udemba (7)
(1) Department of Computer Science and Information Technology, Bells University of Technology, Ota, Nigeria
(2) Department of Computer Science and Information Technology, Bells University of Technology, Ota, Nigeria
(3) Department of Computer Science and Information Technology, Bells University of Technology, Ota, Nigeria
(4) Department of Computer Science Education, Federal College of Education (Technical), Akoka, Lagos, Nigeria
(5) Department of Computer Science and Information Technology, Bells University of Technology, Ota, Nigeria
(6) Department of Computer Science and Information Technology, Bells University of Technology, Ota, Nigeria
(7) Department of Computer Science and Information Technology, Bells University of Technology, Ota, Nigeria

Abstract

This study presents an approach to improving the classification of acute leukemia subtypes from gene expression data. It combines Recursive Feature Elimination (RFE) for feature selection with a Multilayer Perceptron (MLP) for predictive modeling, with the aim of identifying the genes that best distinguish Acute Lymphoblastic Leukemia (ALL) from Acute Myeloid Leukemia (AML). RFE iteratively ranks the gene attributes and retains the most discriminative ones, and the MLP is then trained on the retained attributes. The combined approach achieves precision, accuracy, F1-score, and recall of approximately 99% for leukemia subtype classification. The study also identifies the genes that contribute most to the model's predictive power, which points to potential biomarkers for leukemia diagnosis. These results underscore the value of RFE and MLP for analyzing gene expression data and their potential impact on clinical decision-making in oncology.
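The RFE-then-MLP workflow described in the abstract can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the RFE base estimator, the number of retained genes, or the MLP architecture, so the logistic-regression ranker, the 10 retained features, the single 16-unit hidden layer, and the synthetic 72 x 500 data matrix are all placeholders chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix (assumption, not the study's data):
# rows are samples, columns are genes, labels 0/1 play the role of ALL/AML.
X, y = make_classification(
    n_samples=72, n_features=500, n_informative=10, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# RFE needs a ranker that exposes coef_ or feature_importances_; logistic
# regression is an assumed choice. step=0.1 drops 10% of the original
# features per iteration until 10 remain.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=10, step=0.1)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)),
])

# Fitting the whole pipeline on the training split keeps feature selection
# from seeing the held-out samples.
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}

# Column indices of the features RFE kept (the candidate "biomarker" genes).
selected_mask = pipeline.named_steps["rfe"].support_
selected_genes = [i for i, keep in enumerate(selected_mask) if keep]

print(metrics)
print(selected_genes)
```

Placing the selector inside the pipeline matters when there are far more genes than samples: selecting features on the full dataset before splitting would leak test information and inflate the reported scores.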

Article Details

How to Cite
[1]
T. Ogunbiyi, “Improving Acute Leukemia Classification through Recursive Feature Elimination and Multilayer Perceptron Analysis of Gene Expression Data”, Int. J. Data. Science., vol. 5, no. 1, pp. 33-49, Jun. 2024.
