Rule-based knowledge acquisition method for promoter prediction in human and Drosophila species

Wen Lin Huang, Chun Wei Tung, Chyn Liaw, Hui Ling Huang, Shinn Ying Ho

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

Rapid and reliable identification of promoter regions is increasingly important as the number of genomes to be sequenced grows rapidly. Various methods have been developed, but few investigate the effectiveness of sequence-based features in promoter prediction. This study proposes a knowledge acquisition method, PromHD, based on if-then rules for promoter prediction in human and Drosophila species. PromHD uses a feature-mining algorithm and a reference feature set of 167 DNA sequence descriptors (DNASDs): three descriptors of physicochemical properties (absorption maxima, molecular weight, and molar absorption coefficient), 128 top-ranked descriptors of 4-mer motifs, and 36 global sequence descriptors. PromHD identifies two feature subsets of 99 and 74 DNASDs, which yield test accuracies of 96.4% and 97.5% in human and Drosophila species, respectively. From the 99- and 74-dimensional feature vectors, PromHD generates if-then rules for promoter prediction using a decision tree. The top-ranked informative rules with high certainty grades reveal that a global sequence descriptor (the length of nucleotide A at the first position of the sequence) is effective in distinguishing promoters from non-promoters in human, whereas two physicochemical properties (absorption maxima and molecular weight) are effective in Drosophila.
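As a rough illustration of the sequence-based descriptors the abstract mentions, the following Python sketch computes relative frequencies for all 256 possible 4-mer motifs and the length of the run of nucleotide A at the start of a sequence. The function names, the normalization, and the handling of ambiguous bases are assumptions made for this example; the abstract does not specify how PromHD defines or ranks its descriptors.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=4):
    """Relative frequency of every k-mer over ACGT in seq (hypothetical helper)."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Assumption: windows containing ambiguous bases (e.g. N) are skipped.
    valid = [w for w in windows if set(w) <= set(ALPHABET)]
    counts = Counter(valid)
    total = len(valid)
    freqs = {}
    for letters in product(ALPHABET, repeat=k):
        motif = "".join(letters)
        freqs[motif] = counts[motif] / total if total else 0.0
    return freqs


def leading_run_length(seq, base="A"):
    """Length of the run of `base` starting at position 1.

    Assumption: this is one plausible reading of "the length of
    nucleotide A at the first position of the sequence".
    """
    n = 0
    for ch in seq.upper():
        if ch != base:
            break
        n += 1
    return n


freqs = kmer_frequencies("AAACGTACGTTT")
run_a = leading_run_length("AAACGTACGTTT")
```

A feature-selection step, such as the one that keeps 128 top-ranked 4-mer descriptors, would operate on vectors like `freqs`; that step is not sketched here because the abstract does not describe it.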

Original language: English
Article number: 327306
Journal: The Scientific World Journal
Volume: 2014
DOI: 10.1155/2014/327306
Publication status: Published - Mar 6 2014
Externally published: Yes

Fingerprint

Knowledge acquisition
Drosophila
DNA sequences
physicochemical property
prediction
Molecular Weight
DNA
Decision Trees
absorption coefficient
Genetic Promoter Regions
Nucleotides
genome
Genes
method
Physicochemical Absorption

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Environmental Science (all)

Cite this

Rule-based knowledge acquisition method for promoter prediction in human and Drosophila species. / Huang, Wen Lin; Tung, Chun Wei; Liaw, Chyn; Huang, Hui Ling; Ho, Shinn Ying.

In: The Scientific World Journal, Vol. 2014, 327306, 06.03.2014.

@article{f1619a6c181740a39db4f8819f28a915,
title = "Rule-based knowledge acquisition method for promoter prediction in human and {Drosophila} species",
abstract = "The rapid and reliable identification of promoter regions is important when the number of genomes to be sequenced is increasing very speedily. Various methods have been developed but few methods investigate the effectiveness of sequence-based features in promoter prediction. This study proposes a knowledge acquisition method (named PromHD) based on if-then rules for promoter prediction in human and Drosophila species. PromHD utilizes an effective feature-mining algorithm and a reference feature set of 167 DNA sequence descriptors (DNASDs), comprising three descriptors of physicochemical properties (absorption maxima, molecular weight, and molar absorption coefficient), 128 top-ranked descriptors of 4-mer motifs, and 36 global sequence descriptors. PromHD identifies two feature subsets with 99 and 74 DNASDs and yields test accuracies of 96.4{\%} and 97.5{\%} in human and Drosophila species, respectively. Based on the 99- and 74-dimensional feature vectors, PromHD generates several if-then rules by using the decision tree mechanism for promoter prediction. The top-ranked informative rules with high certainty grades reveal that the global sequence descriptor, the length of nucleotide A at the first position of the sequence, and two physicochemical properties, absorption maxima and molecular weight, are effective in distinguishing promoters from non-promoters in human and Drosophila species, respectively.",
author = "Huang, {Wen Lin} and Tung, {Chun Wei} and Liaw, Chyn and Huang, {Hui Ling} and Ho, {Shinn Ying}",
year = "2014",
month = "3",
day = "6",
doi = "10.1155/2014/327306",
language = "English",
volume = "2014",
journal = "The Scientific World Journal",
issn = "2356-6140",
publisher = "Hindawi Publishing Corporation",
}

TY - JOUR
T1 - Rule-based knowledge acquisition method for promoter prediction in human and Drosophila species
AU - Huang, Wen Lin
AU - Tung, Chun Wei
AU - Liaw, Chyn
AU - Huang, Hui Ling
AU - Ho, Shinn Ying
PY - 2014/3/6
Y1 - 2014/3/6

AB - The rapid and reliable identification of promoter regions is important when the number of genomes to be sequenced is increasing very speedily. Various methods have been developed but few methods investigate the effectiveness of sequence-based features in promoter prediction. This study proposes a knowledge acquisition method (named PromHD) based on if-then rules for promoter prediction in human and Drosophila species. PromHD utilizes an effective feature-mining algorithm and a reference feature set of 167 DNA sequence descriptors (DNASDs), comprising three descriptors of physicochemical properties (absorption maxima, molecular weight, and molar absorption coefficient), 128 top-ranked descriptors of 4-mer motifs, and 36 global sequence descriptors. PromHD identifies two feature subsets with 99 and 74 DNASDs and yields test accuracies of 96.4% and 97.5% in human and Drosophila species, respectively. Based on the 99- and 74-dimensional feature vectors, PromHD generates several if-then rules by using the decision tree mechanism for promoter prediction. The top-ranked informative rules with high certainty grades reveal that the global sequence descriptor, the length of nucleotide A at the first position of the sequence, and two physicochemical properties, absorption maxima and molecular weight, are effective in distinguishing promoters from non-promoters in human and Drosophila species, respectively.

UR - http://www.scopus.com/inward/record.url?scp=84896893097&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84896893097&partnerID=8YFLogxK
U2 - 10.1155/2014/327306
DO - 10.1155/2014/327306
M3 - Article
VL - 2014
JO - The Scientific World Journal
JF - The Scientific World Journal
SN - 2356-6140
M1 - 327306
ER -