An improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula

Chun Huo Chiu, Yi Ting Wang, Bruno A. Walther, Anne Chao

Research output: Contribution to journal › Article

80 Citations (Scopus)

Abstract

It is difficult to accurately estimate species richness if there are many almost undetectable species in a hyper-diverse community. Practically, an accurate lower bound for species richness is preferable to an inaccurate point estimator. The traditional nonparametric lower bound developed by Chao (1984, Scandinavian Journal of Statistics 11, 265-270) for individual-based abundance data uses only the information on the rarest species (the numbers of singletons and doubletons) to estimate the number of undetected species in samples. Applying a modified Good-Turing frequency formula, we derive an approximate formula for the first-order bias of this traditional lower bound. The approximate bias is estimated by using additional information (namely, the numbers of tripletons and quadrupletons). This approximate bias can be corrected, and an improved lower bound is thus obtained. The proposed lower bound is nonparametric in the sense that it is universally valid for any species abundance distribution. A similar type of improved lower bound can be derived for incidence data. We test our proposed lower bounds on simulated data sets generated from various species abundance models. Simulation results show that the proposed lower bounds always reduce bias over the traditional lower bounds and improve accuracy (as measured by mean squared error) when the heterogeneity of species abundances is relatively high. We also apply the proposed new lower bounds to real data for illustration and for comparisons with previously developed estimators.
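
As a rough illustration of the estimators the abstract describes, the Python sketch below computes the traditional Chao (1984) lower bound from singleton and doubleton counts and then adds a correction built from tripleton and quadrupleton counts. The abstract gives no formulas, so the expressions here are assumptions: they follow the large-sample forms commonly quoted for Chao1 and iChao1, omit any finite-sample factors in the sample size n, and should be checked against the paper before use. The function names and the zero-count fallbacks (using f1(f1-1)/2 when there are no doubletons, and treating f4 = 0 as 1) are likewise illustrative choices.

```python
from collections import Counter


def frequency_counts(abundances):
    """Return a Counter mapping k to f_k, the number of species seen exactly k times."""
    return Counter(a for a in abundances if a > 0)


def chao1(abundances):
    """Traditional lower bound using singletons (f1) and doubletons (f2).

    Assumed large-sample form: S_obs + f1^2 / (2 * f2).
    """
    f = frequency_counts(abundances)
    s_obs = sum(f.values())  # number of observed species
    f1, f2 = f[1], f[2]
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Assumed fallback when no doubletons are present.
    return s_obs + f1 * (f1 - 1) / 2


def improved_chao1(abundances):
    """Chao1 plus a bias correction from tripletons (f3) and quadrupletons (f4).

    Assumed large-sample form:
    Chao1 + f3 / (4 * f4) * max(f1 - f2 * f3 / (2 * f4), 0).
    """
    f = frequency_counts(abundances)
    f1, f2, f3 = f[1], f[2], f[3]
    f4 = max(f[4], 1)  # assumption: treat f4 = 0 as 1 to avoid division by zero
    correction = f3 / (4 * f4) * max(f1 - f2 * f3 / (2 * f4), 0)
    return chao1(abundances) + correction


if __name__ == "__main__":
    sample = [1, 1, 1, 1, 2, 2, 3, 3, 4, 5, 10]  # hypothetical abundance counts
    print(chao1(sample), improved_chao1(sample))
```

For the hypothetical abundance vector above (11 observed species, with f1 = 4, f2 = 2, f3 = 2, f4 = 1), this sketch gives 15.0 for the traditional bound and 16.0 for the improved one. Because the correction term is never negative, the improved value cannot fall below the traditional one.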

Original language: English
Pages (from-to): 671-682
Number of pages: 12
Journal: Biometrics
Volume: 70
Issue number: 3
DOI: 10.1111/biom.12200
Publication status: Published - Sep 1 2014

Keywords

  • Abundance data
  • Good-Turing frequency formula
  • Incidence data
  • Species richness

ASJC Scopus subject areas

  • Applied Mathematics
  • Statistics and Probability
  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Medicine(all)

Cite this

An improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula. / Chiu, Chun Huo; Wang, Yi Ting; Walther, Bruno A.; Chao, Anne.

In: Biometrics, Vol. 70, No. 3, 01.09.2014, p. 671-682.

@article{34d268f80a1d461084517cd6b7776792,
title = "An improved nonparametric lower bound of species richness via a modified {Good-Turing} frequency formula",
abstract = "It is difficult to accurately estimate species richness if there are many almost undetectable species in a hyper-diverse community. Practically, an accurate lower bound for species richness is preferable to an inaccurate point estimator. The traditional nonparametric lower bound developed by Chao (1984, Scandinavian Journal of Statistics 11, 265-270) for individual-based abundance data uses only the information on the rarest species (the numbers of singletons and doubletons) to estimate the number of undetected species in samples. Applying a modified Good-Turing frequency formula, we derive an approximate formula for the first-order bias of this traditional lower bound. The approximate bias is estimated by using additional information (namely, the numbers of tripletons and quadrupletons). This approximate bias can be corrected, and an improved lower bound is thus obtained. The proposed lower bound is nonparametric in the sense that it is universally valid for any species abundance distribution. A similar type of improved lower bound can be derived for incidence data. We test our proposed lower bounds on simulated data sets generated from various species abundance models. Simulation results show that the proposed lower bounds always reduce bias over the traditional lower bounds and improve accuracy (as measured by mean squared error) when the heterogeneity of species abundances is relatively high. We also apply the proposed new lower bounds to real data for illustration and for comparisons with previously developed estimators.",
keywords = "Abundance data, Good-Turing frequency formula, Incidence data, Species richness",
author = "Chiu, {Chun Huo} and Wang, {Yi Ting} and Walther, {Bruno A.} and Chao, Anne",
year = "2014",
month = "9",
day = "1",
doi = "10.1111/biom.12200",
language = "English",
volume = "70",
pages = "671--682",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "3",
}

TY - JOUR

T1 - An improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula

AU - Chiu, Chun Huo

AU - Wang, Yi Ting

AU - Walther, Bruno A.

AU - Chao, Anne

PY - 2014/9/1

Y1 - 2014/9/1

AB - It is difficult to accurately estimate species richness if there are many almost undetectable species in a hyper-diverse community. Practically, an accurate lower bound for species richness is preferable to an inaccurate point estimator. The traditional nonparametric lower bound developed by Chao (1984, Scandinavian Journal of Statistics 11, 265-270) for individual-based abundance data uses only the information on the rarest species (the numbers of singletons and doubletons) to estimate the number of undetected species in samples. Applying a modified Good-Turing frequency formula, we derive an approximate formula for the first-order bias of this traditional lower bound. The approximate bias is estimated by using additional information (namely, the numbers of tripletons and quadrupletons). This approximate bias can be corrected, and an improved lower bound is thus obtained. The proposed lower bound is nonparametric in the sense that it is universally valid for any species abundance distribution. A similar type of improved lower bound can be derived for incidence data. We test our proposed lower bounds on simulated data sets generated from various species abundance models. Simulation results show that the proposed lower bounds always reduce bias over the traditional lower bounds and improve accuracy (as measured by mean squared error) when the heterogeneity of species abundances is relatively high. We also apply the proposed new lower bounds to real data for illustration and for comparisons with previously developed estimators.

KW - Abundance data

KW - Good-Turing frequency formula

KW - Incidence data

KW - Species richness

UR - http://www.scopus.com/inward/record.url?scp=84927693389&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84927693389&partnerID=8YFLogxK

U2 - 10.1111/biom.12200

DO - 10.1111/biom.12200

M3 - Article

C2 - 24945937

AN - SCOPUS:84927693389

VL - 70

SP - 671

EP - 682

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 3

ER -