Classification of Biomedical Literature in Hypertension and Diabetes


Nur Aniq Syafiq Rodzuan
Shahreen Kasim
Mohanavali Sithambranathan
Muhammad Zaki Hassan


Textual information is presented in words and characters, which makes it easy for humans to understand. Text mining was introduced to extract knowledge from this kind of information: it is the process of extracting non-trivial patterns or knowledge from text documents or textual databases. The purpose of this paper is to perform and compare keyword extraction with statistical and linguistic extraction tools on 120 text documents related to hypertension and diabetes. To draw this comparison, RStudio, a statistical-based tool, and TerMine, a linguistic-based tool, are used to extract the specified keywords from the biomedical literature. A Naïve Bayes classifier is then used to evaluate and compare the performance of the statistical and linguistic approaches implemented by these tools. The experimental results present this comparison and show how the two tools differ in keyword extraction.
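The abstract does not say which statistical weighting was applied in RStudio. As a minimal sketch of what statistical keyword extraction can look like, the Python below ranks each document's terms by TF-IDF (term frequency multiplied by the logarithm of inverse document frequency). TF-IDF itself, the regular-expression tokenizer, the stopword list, the toy sentences and the function names `tokenize` and `top_keywords` are all our assumptions, not the authors' R pipeline.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric word tokens (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(docs, k=3, stopwords=frozenset()):
    """Return the k highest TF-IDF terms for each document in `docs`."""
    tokenized = [[t for t in tokenize(d) if t not in stopwords] for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        if not toks:
            results.append([])
            continue
        tf = Counter(toks)
        scores = {
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results


if __name__ == "__main__":
    docs = [
        "Diabetes patients need insulin; insulin lowers blood glucose.",
        "Hypertension is high blood pressure; pressure above target harms patients.",
        "Diabetes and hypertension often occur together in patients.",
    ]
    stop = frozenset({"and", "in", "is"})
    print(top_keywords(docs, k=1, stopwords=stop))
```

On these three toy sentences the demo prints `[['insulin'], ['pressure'], ['occur']]`. Terms repeated within one document but rare across the collection rise to the top, while a term such as "patients", which occurs in every document, scores zero.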
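TerMine takes a linguistic route instead. Our understanding is that it first extracts multi-word candidate terms with part-of-speech filters and then ranks them with the C-value measure. The sketch below covers only that ranking step, under this assumption. The candidate phrases and their frequencies are made-up toy values, and the function name `c_value` is hypothetical. Take a candidate a of |a| words with frequency f(a). If a is not nested inside a longer candidate, it scores log2(|a|) * f(a). Otherwise it scores log2(|a|) * (f(a) - (1 / P(T_a)) * sum of f(b) over b in T_a). Here T_a is the set of longer candidates that contain a, and P(T_a) is the number of such candidates.

```python
import math


def c_value(candidates):
    """Score multi-word term candidates with the C-value measure.

    candidates: dict mapping a term (tuple of words) to its corpus frequency.
    Candidate extraction (POS tagging plus a noun-phrase filter) is assumed
    to have been done already; this function only ranks the candidates.
    """
    def contains(longer, shorter):
        # True if `shorter` appears as a contiguous word sequence in `longer`.
        n = len(shorter)
        return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

    scores = {}
    for term, freq in candidates.items():
        # T_a: the longer candidates in which this term is nested.
        nesting = [other for other in candidates
                   if len(other) > len(term) and contains(other, term)]
        if nesting:
            # Discount by the average frequency of the nesting candidates.
            mean_nested = sum(candidates[b] for b in nesting) / len(nesting)
            scores[term] = math.log2(len(term)) * (freq - mean_nested)
        else:
            scores[term] = math.log2(len(term)) * freq
    return scores


if __name__ == "__main__":
    candidates = {
        ("blood", "pressure"): 10,
        ("high", "blood", "pressure"): 6,
        ("systolic", "blood", "pressure"): 2,
        ("type", "2", "diabetes"): 5,
    }
    ranked = sorted(c_value(candidates).items(), key=lambda kv: -kv[1])
    for term, score in ranked:
        print(f"{' '.join(term):25s} {score:.2f}")
```

With these toy counts, "blood pressure" scores log2(2) * (10 - (6 + 2) / 2) = 6.0. Its frequency is discounted by the average frequency of the two longer terms that contain it. "High blood pressure" is not nested, so it keeps its full weight, log2(3) * 6 ≈ 9.51.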
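The classification step can be illustrated with a hedged sketch as well. The abstract names a Naïve Bayes classifier but not its variant or implementation. The Python below assumes a multinomial model over bag-of-words tokens with add-one (Laplace) smoothing. The class name `SimpleNaiveBayes`, the toy training documents and their labels are all hypothetical. In the study's setting, the tokens would presumably be the keywords extracted by one of the two tools, so classification accuracy serves as a common yardstick for the statistical and linguistic keyword sets.

```python
import math
from collections import Counter, defaultdict


class SimpleNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one class label per document.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_posterior(self, tokens, label):
        # Unnormalised log P(label | tokens) = log P(label) + sum of log P(token | label).
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for token in tokens:
            if token in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, tokens):
        # Choose the class with the highest unnormalised log posterior.
        return max(self.class_counts, key=lambda c: self.log_posterior(tokens, c))


if __name__ == "__main__":
    train_docs = [
        ["insulin", "glucose", "diabetes"],
        ["glucose", "insulin", "resistance"],
        ["blood", "pressure", "hypertension"],
        ["pressure", "antihypertensive", "hypertension"],
    ]
    train_labels = ["diabetes", "diabetes", "hypertension", "hypertension"]
    model = SimpleNaiveBayes().fit(train_docs, train_labels)
    print(model.predict(["glucose", "insulin"]))
    print(model.predict(["blood", "pressure"]))
```

The demo prints `diabetes` and then `hypertension`. Working in log space avoids numerical underflow when documents contain many tokens. The add-one smoothing prevents a token seen in only one class from driving the other class's probability to zero.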


How to Cite
N. A. S. Rodzuan, S. Kasim, M. Sithambranathan, and M. Z. Hassan, “Classification of Biomedical Literature in Hypertension and Diabetes”, Int. J. Data Science, vol. 1, no. 2, pp. 114-119, Aug. 2020.
