DOI: https://doi.org/10.18517/ijods.1.2.114-119.2020

Classification of Biomedical Literature in Hypertension and Diabetes

Nur Aniq Syafiq Rodzuan (1) , Shahreen Kasim (2) , Mohanavali Sithambranathan (3) , Muhammad Zaki Hassan (4)
(1) Faculty of Computer Science and Technology, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Johor, Malaysia.
(2) Faculty of Computer Science and Technology, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Johor, Malaysia.
(3) Faculty of Computer Science and Technology, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Johor, Malaysia.
(4) Faculty of Computer Science and Technology, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Johor, Malaysia.

Abstract

Textual information is expressed in words and characters, which makes it easy for humans to understand. Text mining was introduced as a technology for extracting this kind of information: it is the process of extracting non-trivial patterns or knowledge from text documents or textual databases. This paper performs and compares keyword extraction using statistical and linguistic extraction tools on 120 text documents related to hypertension and diabetes. For the comparison, RStudio, a statistical tool, and TerMine, a linguistic tool, are used to extract keywords from the biomedical literature. A Naïve Bayes classifier is then used to evaluate and compare the performance of the statistical and linguistic approaches. The experimental results report this comparison and the differences between the two tools in keyword extraction.
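As a rough illustration of the statistical side of this pipeline, the sketch below ranks terms by raw frequency and then trains a multinomial Naïve Bayes classifier with add-one smoothing on the selected keywords. It is a minimal sketch and not the authors' RStudio workflow: the four toy documents, the function names (`top_keywords`, `train_nb`, `predict`) and the choice of five keywords are all assumptions made for illustration, and the linguistic approach used by TerMine is not shown.

```python
import math
import re
from collections import Counter

# Toy documents invented for illustration only; they are NOT the paper's corpus.
TRAIN = [
    ("high blood pressure raises stroke risk", "hypertension"),
    ("blood pressure medication lowers systolic pressure", "hypertension"),
    ("insulin controls blood glucose levels", "diabetes"),
    ("high glucose and insulin resistance", "diabetes"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(docs, k):
    """Frequency-based keyword extraction: return the k most frequent terms."""
    counts = Counter(tok for text, _ in docs for tok in tokenize(text))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def train_nb(docs, vocab):
    """Train multinomial Naive Bayes restricted to the extracted keywords."""
    vocab_set = set(vocab)
    class_docs = Counter(label for _, label in docs)
    term_counts = {label: Counter() for label in class_docs}
    for text, label in docs:
        term_counts[label].update(t for t in tokenize(text) if t in vocab_set)
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(term_counts[label].values())
        log_prior = math.log(n_docs / len(docs))
        # Add-one (Laplace) smoothing avoids zero probabilities for unseen terms.
        log_like = {
            t: math.log((term_counts[label][t] + 1) / (total + len(vocab_set)))
            for t in vocab_set
        }
        model[label] = (log_prior, log_like)
    return model


def predict(model, text):
    """Return the label with the highest posterior log-score for the text."""
    tokens = tokenize(text)
    scores = {}
    for label, (log_prior, log_like) in model.items():
        # Tokens outside the keyword vocabulary are ignored.
        scores[label] = log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(scores, key=scores.get)


keywords = top_keywords(TRAIN, 5)
model = train_nb(TRAIN, keywords)
print(keywords)
print(predict(model, "patient with elevated blood pressure"))
```

Swapping `top_keywords` for a different extractor while keeping the classifier fixed is one way to compare extraction approaches by their downstream classification performance, which is the kind of comparison the abstract describes.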

Article Details

How to Cite
[1] N. A. S. Rodzuan, S. Kasim, M. Sithambranathan, and M. Z. Hassan, “Classification of Biomedical Literature in Hypertension and Diabetes”, Int. J. Data. Science., vol. 1, no. 2, pp. 114-119, Aug. 2020.
