MaxBin: An automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm

Yu Wei Wu, Yung Hsu Tang, Susannah G. Tringe, Blake A. Simmons, Steven W. Singer

Research output: Contribution to journal, article

173 Citations (Scopus)

Abstract

Background: Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions.

Results: We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possessed a much smaller genome (5 MB versus 13 to 14 MB) but has a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that it performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity.

Conclusions: The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments.
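Original language: English
Article number: 26
Journal: Microbiome
Volume: 2
Issue number: 1
DOI: 10.1186/2049-2618-2-26
Publication status: Published - Mar 4 2014
Externally published: Yes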

Fingerprint

Metagenome
Metagenomics
Genome
Myxococcales
Microbial Consortia
Microbial Genome
Population
Software
Automation
Microbiota
Biotechnology
Ecology
Cellulose
Biomass
Ecosystem
Medicine
Datasets

ASJC Scopus subject areas

  • Microbiology
  • Microbiology (medical)

Cite this

MaxBin: An automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. / Wu, Yu Wei; Tang, Yung Hsu; Tringe, Susannah G.; Simmons, Blake A.; Singer, Steven W.

In: Microbiome, Vol. 2, No. 1, 26, 04.03.2014.

Wu, Yu Wei ; Tang, Yung Hsu ; Tringe, Susannah G. ; Simmons, Blake A. ; Singer, Steven W. / MaxBin: An automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. In: Microbiome. 2014 ; Vol. 2, No. 1.
@article{0f9d5d61392c45f6bf8084bbb9a00190,
title = "MaxBin: An automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm",
abstract = "Background: Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions. Results: We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possessed a much smaller genome (5 MB versus 13 to 14 MB) but has a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that it performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity. Conclusions: The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. 
The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments.",
keywords = "Binning, Expectation-maximization algorithm, Metagenomics",
author = "Wu, {Yu Wei} and Tang, {Yung Hsu} and Tringe, {Susannah G.} and Simmons, {Blake A.} and Singer, {Steven W.}",
year = "2014",
month = "3",
day = "4",
doi = "10.1186/2049-2618-2-26",
language = "English",
volume = "2",
journal = "Microbiome",
issn = "2049-2618",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - MaxBin

T2 - An automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm

AU - Wu, Yu Wei

AU - Tang, Yung Hsu

AU - Tringe, Susannah G.

AU - Simmons, Blake A.

AU - Singer, Steven W.

PY - 2014/3/4

Y1 - 2014/3/4

AB - Background: Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions. Results: We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possessed a much smaller genome (5 MB versus 13 to 14 MB) but has a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that it performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity. Conclusions: The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. 
The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments.

KW - Binning

KW - Expectation-maximization algorithm

KW - Metagenomics

UR - http://www.scopus.com/inward/record.url?scp=84925636192&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84925636192&partnerID=8YFLogxK

U2 - 10.1186/2049-2618-2-26

DO - 10.1186/2049-2618-2-26

M3 - Article

AN - SCOPUS:84925636192

VL - 2

JO - Microbiome

JF - Microbiome

SN - 2049-2618

IS - 1

M1 - 26

ER -