Protein Structure Prediction Using Robust Principal Component Analysis and Support Vector Machine

Nur Aini Zakaria
Zuraini Ali Shah
Shahreen Kasim


Bioinformatics exists to deepen our understanding of biological processes. Protein structure prediction is one of the major challenges in structural bioinformatics. With prior knowledge of the structure, the quality of secondary structure prediction, tertiary structure prediction, and prediction of protein function from the amino acid sequence increases significantly. Recently, the gap between the number of proteins with known sequences and the number with known structures has increased dramatically. It is therefore essential to understand protein structure in order to close this gap and make further functional analysis easier. This research applies the robust principal component analysis (RPCA) algorithm to extract the essential features from the original high-dimensional input vectors. The extracted features are then classified using a support vector machine (SVM) with a radial basis function (RBF) kernel. The proposed method achieves an accuracy of 84.41% on the training dataset and 89.09% on the testing dataset. The result is then compared with the same method using PCA for feature extraction. The prediction is assessed by analyzing the accuracy and the number of principal components selected. The results show that the combination of RPCA and SVM produces a high-quality classification of protein structure.
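
The PCA-based comparison pipeline described above (feature extraction followed by an RBF-kernel SVM) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the protein dataset, and uses ordinary PCA because scikit-learn does not ship an RPCA estimator. The 95% explained-variance threshold and the SVM hyperparameters are likewise assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for high-dimensional protein feature vectors
# (assumption: the paper's actual dataset is not reproduced here).
X, y = make_classification(
    n_samples=400, n_features=60, n_informative=10,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize the features, keep enough principal components to explain
# 95% of the variance (assumed threshold), then classify with an
# RBF-kernel SVM (C and gamma are assumed defaults, not tuned values).
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
model.fit(X_train, y_train)

# Report the two quantities the assessment relies on:
# number of principal components kept and classification accuracy.
n_components = model.named_steps["pca"].n_components_
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"components kept: {n_components}")
print(f"train accuracy: {train_acc:.4f}, test accuracy: {test_acc:.4f}")
```

To approximate the proposed method, the `PCA` step would be swapped for a robust PCA transformer. Which RPCA variant and which implementation to use is left open here, since the abstract does not specify it.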

How to Cite
N. A. Zakaria, Z. Ali Shah, and S. Kasim, “Protein Structure Prediction Using Robust Principal Component Analysis and Support Vector Machine”, Int. J. Data. Science., vol. 1, no. 1, pp. 14-17, May 2020.

