Human Pol II promoter prediction by using nucleotide property composition features

Wen Lin Huang, Chun Wei Tung, Shinn Ying Ho

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

The RNA polymerase II (Pol II) promoter is a key region that regulates differential transcription of protein-coding genes. Identifying Pol II promoters is one of the most challenging problems in genome annotation. Although many promoter prediction methods and tools have been developed, they have not yet extracted informative features from large-scale DNA sequences to improve predictive accuracy. This work proposes ProPolyII, a prediction method that mines informative nucleotide property composition (NPC) features to design a support vector machine-based classifier. An existing data set, HumP (1872 human promoters and 1870 non-promoters), is used to evaluate ProPolyII for promoter prediction. ProPolyII yields 70 informative NPC features with training and test accuracies of 99.1% and 95.1%, respectively. The 70 NPC features consist of 46 4-mer motifs, 3 nucleotide properties and 21 global descriptors. These accuracies are better than those of Prom-Machine (94.9% and 91.1%) and M1 (97.4% and 93.6%), which use the top 128 4-mer motifs and 36 global descriptors, respectively. The high predictive performance indicates that ProPolyII can be beneficial for identifying promoters compared with other methods.
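The abstract does not spell out how the 4-mer motif features are computed. As a rough illustration only, the sketch below computes the relative frequency of all 256 possible 4-mers in a sequence, which is one common way to build composition features. The function name `kmer_composition`, the overlapping-window counting, the normalization, and the toy sequence are all assumptions made for this example. They are not taken from ProPolyII, and the nucleotide properties and global descriptors are not modeled here.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq: str, k: int = 4) -> dict[str, float]:
    """Return the relative frequency of every k-mer over the alphabet ACGT.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    seq = seq.upper()
    # Count overlapping windows, skipping any that contain ambiguous bases such as N.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    # Emit all 4**k k-mers in a fixed order so every sequence maps to the same feature layout.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Toy sequence for demonstration; it is not drawn from the HumP data set.
features = kmer_composition("TATAAAAGGCGCGCC")
```

A feature selection step, such as the one that reduces the candidate set to 46 informative 4-mers in the abstract, would then operate on vectors like `features`; that step is outside the scope of this sketch.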

Original language: English
Title of host publication: ISB 2010 Proceedings - International Symposium on Biocomputing
DOI: https://doi.org/10.1145/1722024.1722050
Publication status: Published - May 3 2010
Externally published: Yes
Event: International Symposium on Biocomputing, ISB 2010 - Calicut, Kerala, India
Duration: Feb 15 2010 to Feb 17 2010

Publication series

Name: ISB 2010 Proceedings - International Symposium on Biocomputing

Conference

Conference: International Symposium on Biocomputing, ISB 2010
Country: India
City: Calicut, Kerala
Period: 2/15/10 to 2/17/10

Fingerprint

Nucleotides
RNA Polymerase II
RNA
Chemical analysis
Nucleotide Motifs
Genes
DNA sequences
Transcription
Genome
Support vector machines
Classifiers
Proteins

Keywords

  • Global descriptors
  • Nucleotide property
  • Promoter
  • Support vector machine

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Computational Theory and Mathematics
  • Software
  • Pharmaceutical Science

Cite this

Huang, W. L., Tung, C. W., & Ho, S. Y. (2010). Human Pol II promoter prediction by using nucleotide property composition features. In ISB 2010 Proceedings - International Symposium on Biocomputing [1722050] (ISB 2010 Proceedings - International Symposium on Biocomputing). https://doi.org/10.1145/1722024.1722050

@inproceedings{27b8fb52ddf044fc960d6c6fb366952f,
title = "Human Pol II promoter prediction by using nucleotide property composition features",
keywords = "Global descriptors, Nucleotide property, Promoter, Support vector machine",
author = "Huang, {Wen Lin} and Tung, {Chun Wei} and Ho, {Shinn Ying}",
year = "2010",
month = "5",
day = "3",
doi = "10.1145/1722024.1722050",
language = "English",
isbn = "9781605587226",
series = "ISB 2010 Proceedings - International Symposium on Biocomputing",
booktitle = "ISB 2010 Proceedings - International Symposium on Biocomputing",

}

TY - GEN

T1 - Human Pol II promoter prediction by using nucleotide property composition features

AU - Huang, Wen Lin

AU - Tung, Chun Wei

AU - Ho, Shinn Ying

PY - 2010/5/3

Y1 - 2010/5/3

KW - Global descriptors

KW - Nucleotide property

KW - Promoter

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=77951548363&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951548363&partnerID=8YFLogxK

U2 - 10.1145/1722024.1722050

DO - 10.1145/1722024.1722050

M3 - Conference contribution

SN - 9781605587226

T3 - ISB 2010 Proceedings - International Symposium on Biocomputing

BT - ISB 2010 Proceedings - International Symposium on Biocomputing

ER -