An immune system is a system of biological structures and processes within an organism that protects against disease. When a host is infected by pathogens, the immune system mounts protective responses that are induced by parts of proteins known as epitopes. However, despite recent technical advances, experimental determination of epitope binding remains time-consuming and labor-intensive; thus, computational approaches that extract biological features from sequences have become highly important for understanding the relationships between hosts and pathogens. In this project, we use linear B-cell epitope identification as an example and propose a prediction method for immunological protein analysis that incorporates various physicochemical properties for feature representation, correspondence analysis for feature reduction, and support vector machines for classification. The development of such bioinformatics approaches can help discover new vaccines and therapies for human immunodeficiency virus (HIV), malaria, tuberculosis, and influenza. In addition, the proposed biological features can provide valuable insights into the nature of cancer, allergy, and autoimmune diseases. Moreover, because our approach is general, we will extend the proposed method to related research problems, including epitope prediction, interaction site identification, and immunological protein analysis.

First, we will develop predictors for linear epitopes, including B-cell epitopes and cytotoxic T-lymphocyte epitopes. We will begin by identifying the biological, physical, and chemical factors that are key players in determining immunogenicity. Then, feature extraction, feature representation, and feature reduction will be applied to various physicochemical properties. Finally, a hybrid computational prediction system with optimized performance will be proposed, and biological insights relevant to vaccine design will be drawn from the experimental results.

Second, we will develop methods to further analyze protein interaction sites based on the biological features derived from epitope prediction. We will first apply our experience from RNA-binding site identification to predict interacting residues in proteins. Afterwards, since post-translational modification (PTM) sites have been shown to be relevant to epitope binding, we will incorporate more biological features and extend binding site analysis to predict PTM sites, especially those for phosphorylation, glycosylation, and methylation.

Third, immunological bioinformatics has become an important research field for the analysis of human disease and translational medicine. We will extend our methods to conformational epitope discovery, major histocompatibility complex (MHC)/human leukocyte antigen (HLA) binding peptide identification, and immunological protein classification, such as the prediction of bacterial toxins, serum proteins, and virus proteins. In the coming three years, we will endeavor to develop useful bioinformatics tools and propose interpretable biological features that can be used collectively to assist biologists in inferring and annotating protein immunological functions.
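
As a rough illustration of the feature-representation step described in the abstract, the sketch below turns a peptide into numeric features using a single physicochemical property. It makes several assumptions that are not stated in the project description: the Kyte-Doolittle hydropathy scale is used as one example property, the window length of 7 is arbitrary, and the function names `window_features` and `peptide_vector` are hypothetical. The feature reduction (correspondence analysis) and classification (support vector machine) stages are not shown.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
# Assumption: chosen here only as an example physicochemical property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence, window=7):
    """Return the mean hydropathy of every sliding window over the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    # A KeyError here signals a non-standard residue code.
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def peptide_vector(peptide):
    """Summarize a peptide as a fixed-length vector: [mean, min, max] hydropathy."""
    values = [KYTE_DOOLITTLE[aa] for aa in peptide.upper()]
    if not values:
        raise ValueError("peptide must not be empty")
    return [sum(values) / len(values), min(values), max(values)]
```

In a full pipeline, vectors like these would be computed for many properties and concatenated, then passed to the reduction and classification stages; a fixed-length summary is used here because classifiers such as support vector machines expect inputs of equal dimension.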
Effective start/end date: 8/1/13 → 10/31/14
- B-cell epitope prediction
- position-specific scoring matrix
- support vector machines
- feature reduction
- correspondence analysis