Improved speech modeling and recognition using multi-dimensional articulatory states as primitive speech units

L. Deng, J. Wu, H. Sameti

Research output: Contribution to journal / Conference article

3 Citations (Scopus)

Abstract

In this paper we provide a formal description of a speech recognizer designed on the basis of elaborate articulatory timing that is asynchronous across the multiple articulatory-feature dimensions. Three recently improved critical components of the recognizer are described in detail. Evaluation results, obtained from a standard TIMIT phonetic recognition task confined within the N-best rescoring scenario, are reported on comparative performances between the new feature-based recognizer and a recognizer using the conventional context-dependent triphone units. The results demonstrate an overall superior quality of the rescored N-best list from the feature-based recognizer over that from the triphone-based recognizer. Greater performance improvements are observed as the top number of candidate sentences increases.
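The N-best rescoring scenario mentioned in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the names `Hypothesis`, `rescore_nbest`, and `toy_scorer`, the phone sequences, and every score are hypothetical, and a simple lookup table stands in for the feature-based recognizer's score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    phones: List[str]         # candidate phone sequence
    first_pass_score: float   # log-score from the baseline recognizer (made up)


def rescore_nbest(
    nbest: List[Hypothesis],
    scorer: Callable[[Hypothesis], float],
    top_n: int,
) -> List[Hypothesis]:
    """Re-rank the top_n first-pass candidates by a second scorer (higher is better)."""
    candidates = nbest[:top_n]  # the list is assumed sorted by first-pass score
    return sorted(candidates, key=scorer, reverse=True)


# Hypothetical N-best list, ordered by the first-pass score.
nbest = [
    Hypothesis(["s", "ih", "t"], -10.0),
    Hypothesis(["s", "eh", "t"], -11.0),
    Hypothesis(["z", "ih", "t"], -12.5),
]

# Hypothetical second-pass scores; a real system would compute these
# from the acoustics with the rescoring model.
new_scores = {"s ih t": -9.0, "s eh t": -7.5, "z ih t": -6.0}


def toy_scorer(hyp: Hypothesis) -> float:
    return new_scores[" ".join(hyp.phones)]


best_top2 = rescore_nbest(nbest, toy_scorer, top_n=2)[0]
best_top3 = rescore_nbest(nbest, toy_scorer, top_n=3)[0]
print(best_top2.phones, best_top3.phones)
```

In this toy data the rescored winner changes when the list grows from two to three candidates, which is the only point of the example: a rescoring model can only promote hypotheses that the first pass kept in the list.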

Original language: English
Pages (from-to): 385-388
Number of pages: 4
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 1
Publication status: Published - Jan 1 1995
Externally published: Yes
Event: Proceedings of the 1995 20th International Conference on Acoustics, Speech, and Signal Processing. Part 1 (of 5) - Detroit, MI, USA
Duration: May 9 1995 to May 12 1995

Fingerprint

Speech analysis

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

@article{5073ceb93a7b4c3999faebac282c09c9,
title = "Improved speech modeling and recognition using multi-dimensional articulatory states as primitive speech units",
abstract = "In this paper we provide a formal description of a speech recognizer designed on the basis of elaborate articulatory timing that is asynchronous across the multiple articulatory-feature dimensions. Three recently improved critical components of the recognizer are described in detail. Evaluation results, obtained from a standard TIMIT phonetic recognition task confined within the N-best rescoring scenario, are reported on comparative performances between the new feature-based recognizer and a recognizer using the conventional context-dependent triphone units. The results demonstrate an overall superior quality of the rescored N-best list from the feature-based recognizer over that from the triphone-based recognizer. Greater performance improvements are observed as the top number of candidate sentences increases.",
author = "L. Deng and J. Wu and H. Sameti",
year = "1995",
month = "1",
day = "1",
language = "English",
volume = "1",
pages = "385--388",
journal = "Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing",
issn = "0736-7791",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR

T1 - Improved speech modeling and recognition using multi-dimensional articulatory states as primitive speech units

AU - Deng, L.

AU - Wu, J.

AU - Sameti, H.

PY - 1995/1/1

Y1 - 1995/1/1

AB - In this paper we provide a formal description of a speech recognizer designed on the basis of elaborate articulatory timing that is asynchronous across the multiple articulatory-feature dimensions. Three recently improved critical components of the recognizer are described in detail. Evaluation results, obtained from a standard TIMIT phonetic recognition task confined within the N-best rescoring scenario, are reported on comparative performances between the new feature-based recognizer and a recognizer using the conventional context-dependent triphone units. The results demonstrate an overall superior quality of the rescored N-best list from the feature-based recognizer over that from the triphone-based recognizer. Greater performance improvements are observed as the top number of candidate sentences increases.

UR - http://www.scopus.com/inward/record.url?scp=0028996936&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028996936&partnerID=8YFLogxK

M3 - Conference article

VL - 1

SP - 385

EP - 388

JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing

SN - 0736-7791

ER -