A model-based circular binary segmentation algorithm for the analysis of array CGH data

Fang Han Hsu, Hung I H Chen, Mong Hsun Tsai, Liang Chuan Lai, Chi Cheng Huang, Shih Hsin Tu, Eric Y. Chuang, Yidong Chen

Research output: Contribution to journal (Article)

7 Citations (Scopus)

Abstract

Background: Circular Binary Segmentation (CBS) is a permutation-based algorithm for analyzing array Comparative Genomic Hybridization (aCGH) data. CBS segments data accurately by detecting change-points with a maximal-t test, but evaluating the significance of those change-points by permutation is computationally expensive. A later implementation (hybrid CBS) combined a hybrid method with early stopping rules to improve speed. However, a time analysis revealed that hybrid CBS still spent a major portion of its computation time on permutation. In addition, the hybrid method approximates an upper or lower bound on the significance, not the significance of the change-points itself.

Results: We developed a novel model-based algorithm, extreme-value based CBS (eCBS), which limits permutations and provides robust results without loss of accuracy. Thousands of aCGH data sets under the null hypothesis were simulated in advance under a variety of non-normal assumptions, and the corresponding maximal-t distribution was modeled by the Generalized Extreme Value (GEV) distribution. The modeling results, which associate characteristics of aCGH data with the GEV parameters, constitute lookup tables (the eXtreme model). Using the eXtreme model, the significance of change-points can be evaluated in constant time through a table lookup.

Conclusions: A novel algorithm, eCBS, was developed in this study. The current implementation of eCBS consistently outperforms hybrid CBS by 4× to 20× in computation time without loss of accuracy. Source code, supplementary materials, supplementary figures, and supplementary tables can be found at http://ntumaps.cgm.ntu.edu.tw/eCBSsupplementary.
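The table-lookup idea in the abstract can be illustrated with a minimal sketch. It assumes the lookup table maps one data characteristic (here, the number of probes) to GEV location, scale, and shape parameters, and that the p-value of an observed maximal-t statistic is the upper tail of the fitted GEV distribution. The table contents, the choice of key, and the names `GEV_TABLE`, `gev_cdf`, and `lookup_p_value` are hypothetical and are not taken from the eCBS source code.

```python
import math

# Hypothetical lookup table: number of probes -> (mu, sigma, xi).
# These values are invented for illustration; they are NOT from the paper.
GEV_TABLE = {
    100: (2.6, 0.40, -0.05),
    500: (3.0, 0.38, -0.04),
    1000: (3.2, 0.36, -0.03),
}


def _exp_neg(value_fn):
    """Return exp(-value_fn()), treating an overflowing exponent as +inf."""
    try:
        return math.exp(-value_fn())
    except OverflowError:
        return 0.0


def gev_cdf(x, mu, sigma, xi):
    """Cumulative distribution function of the GEV distribution."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit (shape parameter effectively zero).
        return _exp_neg(lambda: math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return _exp_neg(lambda: t ** (-1.0 / xi))


def lookup_p_value(max_t, n_probes, table=GEV_TABLE):
    """Approximate p-value of a maximal-t statistic from the nearest table entry."""
    key = min(table, key=lambda k: abs(k - n_probes))
    mu, sigma, xi = table[key]
    return 1.0 - gev_cdf(max_t, mu, sigma, xi)


if __name__ == "__main__":
    # A larger maximal-t statistic should give a smaller p-value.
    for stat in (3.0, 4.0, 5.0):
        print(stat, lookup_p_value(stat, n_probes=480))
```

The cost of each evaluation does not depend on the data size beyond the key lookup, which is the sense in which the abstract describes significance evaluation as constant time. A real table would presumably be indexed by more than one characteristic and might interpolate between entries; nearest-key selection is used here only to keep the sketch short.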

Original language: English
Article number: 394
Journal: BMC Research Notes
Volume: 4
DOI: 10.1186/1756-0500-4-394
Publication status: Published - 2011
Externally published: Yes

Fingerprint

Comparative Genomic Hybridization
Table lookup

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

A model-based circular binary segmentation algorithm for the analysis of array CGH data. / Hsu, Fang Han; Chen, Hung I H; Tsai, Mong Hsun; Lai, Liang Chuan; Huang, Chi Cheng; Tu, Shih Hsin; Chuang, Eric Y.; Chen, Yidong.

In: BMC Research Notes, Vol. 4, 394, 2011.

Hsu, Fang Han ; Chen, Hung I H ; Tsai, Mong Hsun ; Lai, Liang Chuan ; Huang, Chi Cheng ; Tu, Shih Hsin ; Chuang, Eric Y. ; Chen, Yidong. / A model-based circular binary segmentation algorithm for the analysis of array CGH data. In: BMC Research Notes. 2011 ; Vol. 4.
@article{f1861963b69c480ba5dcb7d83ffa1206,
title = "A model-based circular binary segmentation algorithm for the analysis of array CGH data",
abstract = "Background: Circular Binary Segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test; but extensive computational burden is involved for evaluating the significance of change-points using permutations. A recent implementation utilizing a hybrid method and early stopping rules (hybrid CBS) to improve the performance in speed was subsequently proposed. However, a time analysis revealed that a major portion of computation time of the hybrid CBS was still spent on permutation. In addition, what the hybrid method provides is an approximation of the significance upper bound or lower bound, not an approximation of the significance of change-points itself. Results: We developed a novel model-based algorithm, extreme-value based CBS (eCBS), which limits permutations and provides robust results without loss of accuracy. Thousands of aCGH data under null hypothesis were simulated in advance based on a variety of non-normal assumptions, and the corresponding maximal-t distribution was modeled by the Generalized Extreme Value (GEV) distribution. The modeling results, which associate characteristics of aCGH data to the GEV parameters, constitute lookup tables (eXtreme model). Using the eXtreme model, the significance of change-points could be evaluated in a constant time complexity through a table lookup process. Conclusions: A novel algorithm, eCBS, was developed in this study. The current implementation of eCBS consistently outperforms the hybrid CBS 4× to 20× in computation time without loss of accuracy. Source codes, supplementary materials, supplementary figures, and supplementary tables can be found at http://ntumaps.cgm.ntu.edu.tw/eCBSsupplementary.",
author = "Hsu, {Fang Han} and Chen, {Hung I H} and Tsai, {Mong Hsun} and Lai, {Liang Chuan} and Huang, {Chi Cheng} and Tu, {Shih Hsin} and Chuang, {Eric Y.} and Chen, Yidong",
year = "2011",
doi = "10.1186/1756-0500-4-394",
language = "English",
volume = "4",
journal = "BMC Research Notes",
issn = "1756-0500",
publisher = "BioMed Central",

}

TY - JOUR

T1 - A model-based circular binary segmentation algorithm for the analysis of array CGH data

AU - Hsu, Fang Han

AU - Chen, Hung I H

AU - Tsai, Mong Hsun

AU - Lai, Liang Chuan

AU - Huang, Chi Cheng

AU - Tu, Shih Hsin

AU - Chuang, Eric Y.

AU - Chen, Yidong

PY - 2011

Y1 - 2011

AB - Background: Circular Binary Segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test; but extensive computational burden is involved for evaluating the significance of change-points using permutations. A recent implementation utilizing a hybrid method and early stopping rules (hybrid CBS) to improve the performance in speed was subsequently proposed. However, a time analysis revealed that a major portion of computation time of the hybrid CBS was still spent on permutation. In addition, what the hybrid method provides is an approximation of the significance upper bound or lower bound, not an approximation of the significance of change-points itself. Results: We developed a novel model-based algorithm, extreme-value based CBS (eCBS), which limits permutations and provides robust results without loss of accuracy. Thousands of aCGH data under null hypothesis were simulated in advance based on a variety of non-normal assumptions, and the corresponding maximal-t distribution was modeled by the Generalized Extreme Value (GEV) distribution. The modeling results, which associate characteristics of aCGH data to the GEV parameters, constitute lookup tables (eXtreme model). Using the eXtreme model, the significance of change-points could be evaluated in a constant time complexity through a table lookup process. Conclusions: A novel algorithm, eCBS, was developed in this study. The current implementation of eCBS consistently outperforms the hybrid CBS 4× to 20× in computation time without loss of accuracy. Source codes, supplementary materials, supplementary figures, and supplementary tables can be found at http://ntumaps.cgm.ntu.edu.tw/eCBSsupplementary.

UR - http://www.scopus.com/inward/record.url?scp=80053616481&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053616481&partnerID=8YFLogxK

U2 - 10.1186/1756-0500-4-394

DO - 10.1186/1756-0500-4-394

M3 - Article

C2 - 21985277

AN - SCOPUS:80053616481

VL - 4

JO - BMC Research Notes

JF - BMC Research Notes

SN - 1756-0500

M1 - 394

ER -