Epitope-based vaccine design has emerged as a promising approach to treating many diseases, including allergies and cancers. However, effective activation of anti-inflammatory responses depends greatly on the successful identification of T-cell epitopes. When allergens enter the body, the allergenic proteins are recognized and processed by antigen-presenting cells (APCs), and the resulting peptides bind to major histocompatibility complex (MHC) class II molecules. The peptide-MHC complexes are then presented to T-cell receptors, and the activated T cells in turn activate B cells. Despite extensive studies in allergen prediction and epitope identification, current approaches still suffer from low positive predictive values (i.e., many false positives) and a lack of interpretable biological features. Thus, the development of sequence-based allergen prediction has become highly important for facilitating in silico epitope-based vaccine design. In this project, we propose a systematic approach to predicting allergenic proteins based on machine learning algorithms. This study could help in the discovery of new prophylactic and therapeutic vaccines for dengue fever, influenza, and human immunodeficiency virus (HIV). Moreover, we analyze immunological features that can provide valuable insights into immunotherapies for cancer, allergy, and autoimmune diseases in translational bioinformatics. The first-year results of this two-year project, on HIV type 1 (HIV-1) protease cleavage site prediction, were described in the midterm report. For the final report, we developed refined methods to improve allergenic protein prediction, especially for proteins with low sequence identity to known allergens. First, we collect allergenic protein data from the literature and databases, and construct an updated allergen benchmark data set.
Then, encoding schemes are used to represent amino acid compositions, dipeptide compositions, and pseudo amino acid compositions to capture physicochemical properties. Next, the encoded features are used as input to machine learning algorithms, in which decision trees, logistic regression, and artificial neural networks are combined to improve predictive performance. Finally, the interpretable biological features proposed in our method are validated by immunologists with reference to vaccine design. In this project, we endeavor to develop improved immunoinformatics tools and to propose interpretable biological features that can be used collectively to assist immunologists in epitope-based vaccine design for translational bioinformatics.
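As an illustration of the encoding step described above, the following is a minimal sketch of how amino acid composition and dipeptide composition might be computed from a protein sequence. It is not the project's actual implementation: the function names (`amino_acid_composition`, `dipeptide_composition`, `encode`) and the choice to skip non-standard residues are assumptions made for this example. Pseudo amino acid composition is omitted because it requires a table of physicochemical property values that the abstract does not specify.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """Relative frequency of each standard residue (20 features).

    Assumption: non-standard symbols (e.g. 'X') are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Relative frequency of each adjacent residue pair (400 features).

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def encode(sequence):
    """Concatenate both encodings into one 420-dimensional feature vector."""
    return amino_acid_composition(sequence) + dipeptide_composition(sequence)
```

Calling `encode` on each sequence in a benchmark set would yield fixed-length vectors of the kind that classifiers such as decision trees, logistic regression, or neural networks accept as input.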
Effective start/end date: 8/1/16 → 10/31/17
- allergen prediction
- cross-reactivity analysis
- machine learning algorithms
- in silico vaccine design