Computer Science and Computer Engineering Undergraduate Honors Theses

A Support Vector Machine Base Model for Predicting Heparin-Binding Proteins Using Biological Metrics and XB Patterns as Features

Joseph W. Sirrianni, University of Arkansas, Fayetteville

Date of Graduation

5-2016

Document Type

Thesis

Degree Name

Bachelor of Science

Degree Level

Undergraduate

Department

Computer Science and Computer Engineering

Advisor/Mentor

Li, Wing

Committee Member/Reader

Beavers, Merwin

Committee Member/Second Reader

Patitz, Matthew

Abstract

Heparin is a highly sulphated, negatively charged polysaccharide belonging to the glycosaminoglycan (GAG) family. It is widely used in medical treatments as an injectable anticoagulant. Although many heparin-binding proteins have been identified through experimental studies, many proteins still need to be classified as heparin-binding or not. Many studies have aimed to predict heparin-binding patterns, or motifs, in the primary structure of proteins; XBBXBX and XBBBXXBX are two well-known examples. Despite intensive study, no good model has yet emerged that reasonably predicts whether proteins in the protein database are heparin-binding. The main objective of this study is to predict heparin-binding proteins from their amino acid sequence information. A supervised learning algorithm based on the support vector machine (SVM) is applied to two data sets of 70 proteins each, one of proteins known to be heparin-binding and one of proteins known to be non-heparin-binding. By adjusting the parameters of the support vector machines, several models are produced. These models are then used to classify proteins that were not used in training. The testing set contains 137 proteins, of which 104 are known to be heparin-binding and the remaining 33 are known to be non-heparin-binding. For the testing set, the models achieve ~75% accuracy in predicting heparin-binding proteins. For the complete data set, the model achieves ~87% accuracy. The current models use different combinations of XB patterns and biological metrics as features in a higher-dimensional vector space.
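As a rough illustration of how XB patterns could be turned into numeric features for a classifier, the sketch below counts occurrences of the two motifs named in the abstract. It is not the thesis's implementation. It assumes that B stands for a basic residue (taken here to be K, R, or H) and that X stands for any other residue; the function names and the example sequence are hypothetical.

```python
import re

# Assumption: residues treated as basic (B); every other residue is X.
BASIC = set("KRH")

# The two motifs named in the abstract.
MOTIFS = ("XBBXBX", "XBBBXXBX")


def to_xb(sequence):
    """Map an amino acid sequence to a string over the alphabet {X, B}."""
    return "".join("B" if residue in BASIC else "X" for residue in sequence.upper())


def motif_counts(sequence, motifs=MOTIFS):
    """Count occurrences of each motif, including overlapping ones."""
    xb = to_xb(sequence)
    # A zero-width lookahead lets overlapping matches be counted.
    return [len(re.findall("(?=" + motif + ")", xb)) for motif in motifs]


# Hypothetical sequence, for illustration only.
features = motif_counts("AKRGKAARRRGGKA")
print(features)  # [1, 1]
```

A vector of counts like this could be concatenated with other per-protein measurements before being passed to an SVM; which features and parameters the thesis actually uses is not stated in the abstract.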

Citation

Sirrianni, J. W. (2016). A Support Vector Machine Base Model for Predicting Heparin-Binding Proteins Using Biological Metrics and XB Patterns as Features. Computer Science and Computer Engineering Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/csceuht/39

Included in

Biochemistry Commons, Numerical Analysis and Scientific Computing Commons
