GPMiner

An integrated system for mining combinatorial cis-regulatory elements in mammalian gene group

Tzong Yi Lee, Wen Chi Chang, Justin Bo Kai Hsu, Tzu Hao Chang, Dray Ming Shien

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

26 Citations (Scopus)

Abstract

Background: Sequence features in promoter regions are involved in regulating gene transcription initiation. Although numerous computational methods have been developed for predicting transcriptional start sites (TSSs) or transcription factor (TF) binding sites (TFBSs), they lack annotations for, or do not consider, some important regulatory features such as CpG islands, tandem repeats, the TATA box, CCAAT box, GC box, over-represented oligonucleotides, DNA stability, and GC content. Additionally, the combinatorial interaction of TFs regulates groups of genes that share the same expression pattern. To investigate gene transcriptional regulation, an integrated system is needed that annotates regulatory features in a promoter sequence and detects co-regulation by TFs in a group of genes.
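
Two of the features listed above, GC content and the CpG observed/expected ratio that underlies most CpG island definitions, are simple to compute from a raw sequence. The following is a minimal sketch, not GPMiner's implementation: the abstract does not say which CpG island definition or window size the system applies, and the function names `gc_content` and `cpg_obs_exp` are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Returns 0.0 when the sequence contains no C or no G, because the
    expected CpG count is then zero and the ratio is undefined.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    # Illustrative sequence only, not taken from the paper.
    promoter = "GCGCGCATCGCGTATACGCG"
    print(f"GC content: {gc_content(promoter):.2f}")
    print(f"CpG obs/exp: {cpg_obs_exp(promoter):.2f}")
```

In practice these values are usually computed over a sliding window along the promoter, and a region is flagged as a CpG island when both exceed chosen thresholds; the thresholds used by GPMiner are not given in this abstract.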

Results: This work identifies TSSs and regulatory features in a promoter sequence, and recognizes co-occurrence of cis-regulatory elements in co-expressed genes using a novel system. Three well-known TSS prediction tools are incorporated with orthologous conserved features, such as CpG islands, nucleotide composition, over-represented hexamer nucleotides, and DNA stability, to construct the novel Gene Promoter Miner (GPMiner) using a support vector machine (SVM). According to five-fold cross-validation results, the predictive sensitivity and specificity are both roughly 80%. The proposed system allows users to input a group of gene names/symbols, enabling the co-occurrence of TFBSs to be determined. Additionally, an input sequence can be analyzed for homogeneity of experimental mammalian promoter sequences, and conserved regulatory features between homologous promoters can be observed through cross-species analysis. After promoter regions are identified, regulatory features are visualized graphically to facilitate gene promoter observations.
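
The idea of determining TFBS co-occurrence across a gene group can be illustrated with a plain pair count. This is a sketch under stated assumptions: the abstract does not describe how GPMiner scores or tests co-occurrence, so the code below only counts how many genes contain predicted sites for both TFs of each pair. The function name `tf_pair_cooccurrence` and the toy gene-to-TF mapping are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tf_pair_cooccurrence(gene_to_tfs: dict[str, set[str]]) -> Counter:
    """Count, for each unordered TF pair, the genes whose promoter has sites for both."""
    counts: Counter = Counter()
    for tfs in gene_to_tfs.values():
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(tfs), 2):
            counts[pair] += 1
    return counts


# Hypothetical input: TFs with predicted binding sites in each gene's promoter.
group = {
    "geneA": {"SP1", "NFKB1", "TBP"},
    "geneB": {"SP1", "NFKB1"},
    "geneC": {"SP1", "TBP"},
}

pairs = tf_pair_cooccurrence(group)

if __name__ == "__main__":
    for (tf1, tf2), n in pairs.most_common():
        print(f"{tf1} + {tf2}: {n} of {len(group)} genes")
```

A raw count like this favors TFs whose sites are common everywhere, so a real analysis would compare each pair's count against a background expectation before calling it a combinatorial element.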

Conclusions: GPMiner, which has a user-friendly input/output interface, offers numerous benefits for analyzing human and mouse promoters. The proposed system is freely available at http://GPMiner.mbc.nctu.edu.tw/.

Original language: English
Title of host publication: Series on Advances in Bioinformatics and Computational Biology
Publisher: Imperial College Press
Volume: 13
DOIs: https://doi.org/10.1186/1471-2164-13-S1-S3
Publication status: Published - 2012
Externally published: Yes
Event: 10th Asia Pacific Bioinformatics Conference, APBC 2012 - Melbourne, Australia
Duration: Jan 17 2012 to Jan 19 2012

Other

Other: 10th Asia Pacific Bioinformatics Conference, APBC 2012
Country: Australia
City: Melbourne
Period: 1/17/12 to 1/19/12

Fingerprint

Miners
Genes
CpG Islands
Binding sites
Nucleotides
Genetic Promoter Regions
DNA
Binding Sites
TATA Box
Tandem Repeat Sequences
Transcription factors
Oligonucleotides
Transcription
Conserved Sequence
Computational methods
Base Composition
Gene expression
Support vector machines
Names
Transcription Factors

ASJC Scopus subject areas

  • Bioengineering
  • Information Systems
  • Biotechnology
  • Genetics

Cite this

Lee, T. Y., Chang, W. C., Hsu, J. B. K., Chang, T. H., & Shien, D. M. (2012). GPMiner: An integrated system for mining combinatorial cis-regulatory elements in mammalian gene group. In Series on Advances in Bioinformatics and Computational Biology (Vol. 13). [S3] Imperial College Press. https://doi.org/10.1186/1471-2164-13-S1-S3

@inproceedings{479784972385451faa5c882cacd60a2e,
title = "GPMiner: An integrated system for mining combinatorial cis-regulatory elements in mammalian gene group",
author = "Lee, {Tzong Yi} and Chang, {Wen Chi} and Hsu, {Justin Bo Kai} and Chang, {Tzu Hao} and Shien, {Dray Ming}",
year = "2012",
doi = "10.1186/1471-2164-13-S1-S3",
language = "English",
volume = "13",
booktitle = "Series on Advances in Bioinformatics and Computational Biology",
publisher = "Imperial College Press",

}

TY - GEN

T1 - GPMiner

T2 - An integrated system for mining combinatorial cis-regulatory elements in mammalian gene group

AU - Lee, Tzong Yi

AU - Chang, Wen Chi

AU - Hsu, Justin Bo Kai

AU - Chang, Tzu Hao

AU - Shien, Dray Ming

PY - 2012

Y1 - 2012

UR - http://www.scopus.com/inward/record.url?scp=84862268542&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862268542&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-13-S1-S3

DO - 10.1186/1471-2164-13-S1-S3

M3 - Conference contribution

VL - 13

BT - Series on Advances in Bioinformatics and Computational Biology

PB - Imperial College Press

ER -