Textual information conveys meaning clearly because it is presented in words and characters, which humans understand easily. Text mining was introduced as a new technology to extract this kind of information: it is the process of extracting non-trivial patterns or knowledge from text documents or textual databases. The purpose of this paper is to perform and compare keyword extraction with statistical and linguistic extraction tools on 120 text documents related to hypertension and diabetes. For this comparison, RStudio, a statistical tool, and TerMine, a linguistic tool, were used to extract the specified keywords from the biomedical literature. A Naïve Bayes classifier was then used to evaluate and compare the performance of the statistical and linguistic approaches as implemented in these tools. The experimental results report this comparison and the differences between the two tools in extracting keywords.
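The abstract does not say which statistic was computed in RStudio or how the extracted keywords were passed to the classifier, so the following is only a minimal Python sketch under stated assumptions. It assumes the statistical keywords are the most frequent non-stop-word terms in the corpus, and that the Naïve Bayes classifier is a multinomial model that only counts those keywords. The stop-word list, the toy sentences, and the names `tokenize`, `top_keywords`, `KeywordNaiveBayes`, and `accuracy` are hypothetical illustrations, not the authors' data or code.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real pipelines use a fuller one.
STOPWORDS = {"a", "and", "for", "in", "is", "of", "the", "to", "with"}


def tokenize(text):
    """Lower-case the text, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(docs, k):
    """Assumed statistical extraction: the k most frequent terms in the corpus.

    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(t for doc in docs for t in tokenize(doc))
    return sorted(counts, key=lambda t: (-counts[t], t))[:k]


class KeywordNaiveBayes:
    """Multinomial Naive Bayes restricted to a fixed keyword vocabulary."""

    def fit(self, docs, labels, keywords):
        self.vocab = set(keywords)
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Count only keyword occurrences inside this class's documents.
            counts = Counter(t for d in class_docs for t in tokenize(d) if t in self.vocab)
            total = sum(counts.values())
            # Laplace (add-one) smoothing so an unseen keyword never zeroes a class.
            self.log_lik[c] = {
                t: math.log((counts[t] + 1) / (total + len(self.vocab)))
                for t in self.vocab
            }
        return self

    def predict(self, doc):
        """Return the class with the highest log prior plus keyword log-likelihoods."""
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        return max(
            self.classes,
            key=lambda c: self.log_prior[c] + sum(self.log_lik[c][t] for t in tokens),
        )


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(docs)


# Toy, made-up sentences standing in for the biomedical documents.
train_docs = [
    "Hypertension is high blood pressure; untreated hypertension raises stroke risk.",
    "Hypertension treatment lowers blood pressure and hypertension complications.",
    "Diabetes disturbs insulin and glucose; diabetes patients monitor glucose.",
    "Insulin therapy helps diabetes patients keep glucose stable.",
]
train_labels = ["hypertension", "hypertension", "diabetes", "diabetes"]
test_docs = [
    "Patients need insulin to control glucose.",
    "Blood pressure medication for hypertension.",
]
test_labels = ["diabetes", "hypertension"]

keywords = top_keywords(train_docs, k=7)
model = KeywordNaiveBayes().fit(train_docs, train_labels, keywords)
print(keywords)
print(accuracy(model, test_docs, test_labels))
```

Because `fit` takes the keyword list as an argument, the same classifier and `accuracy` function can score any keyword list on the same documents, which is the kind of like-for-like comparison the study describes. A term list from a linguistic extractor could in principle be passed instead, although multi-word terms would first need an n-gram tokenizer, which this sketch omits.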
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.