SQUAT: A Sequencing Quality Assessment Tool for data quality assessments of genome assemblies

Li An Yang, Yu Jung Chang, Shu Hwa Chen, Chung Yen Lin, Jan Ming Ho

Research output: Contribution to journal > Article

1 Citation (Scopus)

Abstract

Background: With the rapid increase in genome sequencing projects for non-model organisms, numerous genome assemblies are in progress or available only as drafts, rather than as satisfactory, usable genomes. Data quality assessment of genome assemblies is gaining importance, not only for those who perform the assembly or re-assembly, but also for those who use assemblies as maps in downstream analyses. Recent studies on quality control and quality assessment of genome assemblies have focused either on quality control of reads before assembly or on evaluation of assemblies with respect to their contiguity and correctness. However, correctness assessment depends on a reference and is not applicable to de novo assembly projects. Hence, methods that provide both pre-assembly and post-assembly quality assessment reports, for examining the quality and correctness of de novo assemblies and of the input reads, are worth studying.

Results: We present SQUAT, an efficient tool for both pre-assembly and post-assembly quality assessment of de novo genome assemblies. The pre-assembly module of SQUAT computes quality statistics of reads and visualizes the distribution of high- and poor-quality reads in a portable HTML report. The post-assembly module of SQUAT provides read mapping analytics in HTML format. Reads are categorized into several groups: uniquely mapped, multiply mapped, and unmapped. Uniquely mapped reads are further categorized as perfectly matched, with substitutions, containing clips, and others. Poorly mapped (PM) reads are defined across several of these groups to prevent the underestimation of unmapped reads; a high PM% is a sign of a poor assembly that requires the researchers' attention for further examination or improvement before the assembly is used. Finally, we evaluate SQUAT with six datasets, including the genome assemblies for eel, worm, mushroom, and three bacteria. The results show that SQUAT reports provide useful, detailed information for assessing the quality of assemblies and reads.

Availability: The SQUAT software, with links to both its docker image and the on-line manual, is freely available at https://github.com/luke831215/SQUAT.
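The read categorization described in the Results can be illustrated with a minimal Python sketch. It is not SQUAT's implementation: the `Alignment` record, the `classify` and `pm_percent` helpers, the use of MAPQ 0 as a proxy for multiply mapped reads, and the 0.3 threshold for calling a read poorly mapped are all assumptions made for illustration. Only the SAM conventions it relies on (FLAG bit 0x4 for unmapped reads, CIGAR operators, the NM edit-distance tag) are standard.

```python
import re
from collections import Counter
from dataclasses import dataclass

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Alignment:
    """Hypothetical minimal view of one SAM record."""
    flag: int   # SAM FLAG field
    mapq: int   # SAM MAPQ field
    cigar: str  # SAM CIGAR string, "*" if unavailable
    nm: int     # edit distance to the reference (SAM NM tag)


def cigar_lengths(cigar):
    """Sum the lengths of each CIGAR operator, e.g. '40S60M' -> S=40, M=60."""
    totals = Counter()
    for length, op in CIGAR_RE.findall(cigar):
        totals[op] += int(length)
    return totals


def classify(aln, poor_ratio=0.3):
    """Return (category, is_poorly_mapped) for one read.

    poor_ratio is an illustrative threshold, not a value taken from SQUAT.
    """
    if aln.flag & 0x4:                 # SAM flag bit 0x4: read is unmapped
        return "unmapped", True
    if aln.mapq == 0:                  # assumption: MAPQ 0 marks multi-mapping
        return "multiply_mapped", False
    ops = cigar_lengths(aln.cigar)
    clipped = ops["S"] + ops["H"]
    indels = ops["I"] + ops["D"]
    # Operators that consume the read, plus hard clips, give the read length.
    read_len = ops["M"] + ops["="] + ops["X"] + ops["I"] + clipped
    if read_len == 0:                  # no usable CIGAR
        return "other", False
    if clipped:
        return "clipped", clipped / read_len > poor_ratio
    if indels:
        return "other", aln.nm / read_len > poor_ratio
    if aln.nm == 0:
        return "perfect", False
    return "substitutions", aln.nm / read_len > poor_ratio


def pm_percent(alignments, poor_ratio=0.3):
    """Percentage of reads flagged as poorly mapped (PM%)."""
    alignments = list(alignments)
    if not alignments:
        return 0.0
    poor = sum(classify(a, poor_ratio)[1] for a in alignments)
    return 100.0 * poor / len(alignments)


if __name__ == "__main__":
    reads = [
        Alignment(0, 60, "100M", 0),     # perfect match
        Alignment(0, 60, "100M", 2),     # two substitutions
        Alignment(0, 60, "40S60M", 0),   # heavily soft-clipped
        Alignment(0, 0, "100M", 0),      # multiply mapped
        Alignment(4, 0, "*", 0),         # unmapped
    ]
    print(Counter(classify(r)[0] for r in reads))
    print(f"PM% = {pm_percent(reads):.1f}")
```

In this toy input, the heavily clipped read and the unmapped read are the two flagged as poorly mapped, so two of five reads contribute to PM%. Counting clipped or mismatch-rich reads alongside unmapped ones is what keeps a poor assembly from looking healthy when most reads technically map.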

Original language: English
Article number: 238
Journal: BMC Genomics
Volume: 19
DOIs: 10.1186/s12864-019-5445-3
Publication status: Published - Apr 18 2019
Externally published: Yes

Fingerprint

Genome
Quality Control
Eels
Agaricales
Data Accuracy
Surgical Instruments
Software
Research Personnel
Bacteria

Keywords

  • Data quality assessment
  • Genome assembly
  • Genome sequencing
  • Non-model organisms

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

SQUAT: A Sequencing Quality Assessment Tool for data quality assessments of genome assemblies. / Yang, Li An; Chang, Yu Jung; Chen, Shu Hwa; Lin, Chung Yen; Ho, Jan Ming.

In: BMC Genomics, Vol. 19, 238, 18.04.2019.

@article{809c1afed00b41a1a6ac66d111a9bbf3,
title = "SQUAT: A Sequencing Quality Assessment Tool for data quality assessments of genome assemblies",
abstract = "Background: With the rapid increase in genome sequencing projects for non-model organisms, numerous genome assemblies are currently in progress or available as drafts, but not made available as satisfactory, usable genomes. Data quality assessment of genome assemblies is gaining importance not only for people who perform the assembly/re-assembly processes, but also for those who attempt to use assemblies as maps in downstream analyses. Recent studies of the quality control, quality evaluation/ assessment of genome assemblies have focused on either quality control of reads before assemblies or evaluation of the assemblies with respect to their contiguity and correctness. However, correctness assessment depends on a reference and is not applicable for de novo assembly projects. Hence, development of methods providing both post-assembly and pre-assembly quality assessment reports for examining the quality/correctness of de novo assemblies and the input reads is worth studying. Results: We present SQUAT, an efficient tool for both pre-assembly and post-assembly quality assessment of de novo genome assemblies. The pre-assembly module of SQUAT computes quality statistics of reads and presents the analysis in a well-designed interface to visualize the distribution of high- and poor-quality reads in a portable HTML report. The post-assembly module of SQUAT provides read mapping analytics in an HTML format. We categorized reads into several groups including uniquely mapped reads, multiply mapped, unmapped reads; for uniquely mapped reads, we further categorized them into perfectly matched, with substitutions, containing clips, and the others. We carefully defined the poorly mapped (PM) reads into several groups to prevent the underestimation of unmapped reads; indeed, a high PM{\%} would be a sign of a poor assembly that requires researchers' attention for further examination or improvements before using the assembly. 
Finally, we evaluate SQUAT with six datasets, including the genome assemblies for eel, worm, mushroom, and three bacteria. The results show that SQUAT reports provide useful information with details for assessing the quality of assemblies and reads. Availability: The SQUAT software with links to both its docker image and the on-line manual is freely available at https://github.com/luke831215/SQUAT.",
keywords = "Data quality assessment, Genome assembly, Genome sequencing, Non-model organisms",
author = "Yang, {Li An} and Chang, {Yu Jung} and Chen, {Shu Hwa} and Lin, {Chung Yen} and Ho, {Jan Ming}",
year = "2019",
month = "4",
day = "18",
doi = "10.1186/s12864-019-5445-3",
language = "English",
volume = "19",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central Ltd.",

}

TY - JOUR

T1 - SQUAT

T2 - A Sequencing Quality Assessment Tool for data quality assessments of genome assemblies

AU - Yang, Li An

AU - Chang, Yu Jung

AU - Chen, Shu Hwa

AU - Lin, Chung Yen

AU - Ho, Jan Ming

PY - 2019/4/18

Y1 - 2019/4/18

AB - Background: With the rapid increase in genome sequencing projects for non-model organisms, numerous genome assemblies are currently in progress or available as drafts, but not made available as satisfactory, usable genomes. Data quality assessment of genome assemblies is gaining importance not only for people who perform the assembly/re-assembly processes, but also for those who attempt to use assemblies as maps in downstream analyses. Recent studies of the quality control, quality evaluation/ assessment of genome assemblies have focused on either quality control of reads before assemblies or evaluation of the assemblies with respect to their contiguity and correctness. However, correctness assessment depends on a reference and is not applicable for de novo assembly projects. Hence, development of methods providing both post-assembly and pre-assembly quality assessment reports for examining the quality/correctness of de novo assemblies and the input reads is worth studying. Results: We present SQUAT, an efficient tool for both pre-assembly and post-assembly quality assessment of de novo genome assemblies. The pre-assembly module of SQUAT computes quality statistics of reads and presents the analysis in a well-designed interface to visualize the distribution of high- and poor-quality reads in a portable HTML report. The post-assembly module of SQUAT provides read mapping analytics in an HTML format. We categorized reads into several groups including uniquely mapped reads, multiply mapped, unmapped reads; for uniquely mapped reads, we further categorized them into perfectly matched, with substitutions, containing clips, and the others. We carefully defined the poorly mapped (PM) reads into several groups to prevent the underestimation of unmapped reads; indeed, a high PM% would be a sign of a poor assembly that requires researchers' attention for further examination or improvements before using the assembly. 
Finally, we evaluate SQUAT with six datasets, including the genome assemblies for eel, worm, mushroom, and three bacteria. The results show that SQUAT reports provide useful information with details for assessing the quality of assemblies and reads. Availability: The SQUAT software with links to both its docker image and the on-line manual is freely available at https://github.com/luke831215/SQUAT.

KW - Data quality assessment

KW - Genome assembly

KW - Genome sequencing

KW - Non-model organisms

UR - http://www.scopus.com/inward/record.url?scp=85064510806&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064510806&partnerID=8YFLogxK

U2 - 10.1186/s12864-019-5445-3

DO - 10.1186/s12864-019-5445-3

M3 - Article

C2 - 30999844

AN - SCOPUS:85064510806

VL - 19

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

M1 - 238

ER -