時域上基頻軌跡演算法的改良與探討

Translated title of the contribution: On the Modified Algorithm of Pitch Contour Detection in Time Domain

吳俊甫, 張國清, 李宗寶, 江翠蓮, 黃湘玲

Research output: Contribution to journal › Article

Abstract

Chinese is a tonal language, and its tones can be distinguished by their pitch contours. When a pitch contour is extracted, the estimated pitch frequently lands at half or double the true fundamental frequency, which makes the contour discontinuous and causes errors in tone recognition. Building on the traditional methods (Auto-Correlation Function (ACF), Average Magnitude Difference Function (AMDF) and Correlation Function (CF)), this study proposes a modified time-domain pitch contour algorithm to address these deficiencies. The method applies de-emphasis and first-order difference filters to make the speech waveform more periodic and reduce the impact of noise, uses the frequency relationship among the half, fundamental and double frequencies to select the most likely fundamental frequency for each frame, and then applies clustering and linear regression to correct and smooth the frames. The test corpus, recorded in the authors' laboratory, consists of 1331 Chinese words covering tones 1 to 4; tone 5 (the neutral tone) is excluded. In the experiments, the modified method reaches a highest recognition rate of 95.54%, about 20% higher than the traditional methods. It also exceeds the highest recognition rate of 95.03% obtained with Unbroken Pitch Determination Using Dynamic Programming (UPDUDP), although the difference between the two rates is small. For half-frequency and double-frequency errors, the modified method has an error rate of 0.19%, about 3% lower than the 3.04% of UPDUDP. For tone 1 in particular, the modified method has an error rate of only 0.1%, whereas the double-frequency error rate of UPDUDP is 8.28%. The modified method therefore effectively improves pitch contour detection for Chinese tone recognition.
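The two ideas in the abstract, ACF-based pitch estimation and correction of half-frequency and double-frequency errors, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `acf_pitch` and `fix_octave_errors`, the search range of 60 to 500 Hz, and the use of a median reference followed by a single straight-line fit are all assumptions made for illustration, and the clustering and filtering steps of the paper are not reproduced.

```python
import numpy as np

def acf_pitch(frame, sr, fmin=60.0, fmax=500.0):
    """Estimate the pitch (Hz) of one voiced frame from its autocorrelation peak.

    fmin/fmax are assumed search bounds, not values taken from the paper.
    """
    x = np.asarray(frame, dtype=float)
    x = x - np.mean(x)
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = max(1, int(sr / fmax))
    lag_max = min(int(sr / fmin), len(x) - 1)
    # The lag of the strongest peak in the search range is taken as the period.
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sr / lag

def fix_octave_errors(contour):
    """Replace each frame by whichever of f/2, f, 2f lies closest to a reference.

    Illustrative stand-in for the paper's correction step: the reference is
    first the median of the contour, then a least-squares line fitted to the
    corrected contour. Assumes every frame is voiced (pitch > 0).
    """
    f = np.asarray(contour, dtype=float)
    t = np.arange(len(f))
    candidates = np.stack([f / 2.0, f, f * 2.0])  # shape (3, n_frames)
    ref = np.full(len(f), np.median(f))
    for _ in range(2):
        # Compare on a log scale so halving and doubling are symmetric.
        idx = np.argmin(np.abs(np.log2(candidates / ref)), axis=0)
        fixed = candidates[idx, t]
        slope, intercept = np.polyfit(t, fixed, 1)
        # Guard against a non-positive reference before taking logarithms.
        ref = np.maximum(slope * t + intercept, 1e-6)
    return fixed, ref

# Usage: a rising contour with one halved and one doubled frame.
true_contour = 180.0 + 4.0 * np.arange(20)
observed = true_contour.copy()
observed[5] /= 2.0
observed[12] *= 2.0
corrected, trend = fix_octave_errors(observed)
print(np.round(corrected, 1))
```

A single straight line is only a reasonable reference for short, monotonic contours; contours that dip and rise within one syllable would need a more flexible reference than this sketch uses.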
Original language: Traditional Chinese
Pages (from-to): 129-165
Number of pages: 37
Journal: Journal of Data Analysis
Volume: 7
Issue number: 6
DOIs: 10.6338/JDA.201212_7(6).0007
Publication status: Published - 2012
Externally published: Yes

Fingerprint

Dynamic programming
Autocorrelation
Linear regression

Keywords

  • Tone language
  • Pitch contour
  • Frame
  • ACF
  • AMDF
  • CF
  • UPDUDP

Cite this

時域上基頻軌跡演算法的改良與探討. / 吳俊甫; 張國清; 李宗寶; 江翠蓮; 黃湘玲.

In: Journal of Data Analysis, Vol. 7, No. 6, 2012, p. 129-165.

@article{9a0f0ad37e124ce28bcf74f808379279,
title = "時域上基頻軌跡演算法的改良與探討",
abstract = "中文是一種聲調語言,不同聲調之間的差異,可以由基頻軌跡來決定。在擷取基頻軌跡的過程中,常常會擷取到半頻或倍頻的情況,造成基頻軌跡不連續,以致在聲調辨識上的錯誤。本論文將以傳統方法(Auto-Correlation Function (ACF)、Average Magnitude Difference Function (AMDF)與Correlation Function (CF))作為基礎,提出一種時域上基頻軌跡的演算法來改良中文之聲調辨識。其方法主要是利用解強調與一階差分兩種濾波器,使語音訊號之波形能夠更具有週期性,來降低語音受到雜訊的影響,並且利用半頻、基頻與倍頻之間的頻率特性,來擷取出音框最有可能的基頻值,再利用分群的方式以及線性迴歸的方法,對音框做修正與平滑的動作。最後以本實驗室所錄製語料做測試,語料內容為中文聲調一到四聲,不考慮輕聲,共1331個中文單字。由實驗結果發現,本論文之改良方法在中文之聲調辨識上,最高可達95.54{\%}的辨識率,比傳統方法之辨識率提高約兩成,也比UPDUDP方法之最高辨識率95.03{\%}還高,雖然兩者辨識率差異不大,但是在半頻與倍頻的錯誤率,UPDUDP為3.04{\%}比本論文之改良方法(錯誤率為0.19{\%})高出約3{\%}的錯誤率,尤其是中文聲調為一聲時,UPDUDP之倍頻錯誤率高達8.28{\%},而本論文則為0.1{\%},因此本論文之改良方法能夠有效的改善中文聲調之辨識。",
keywords = "聲調語言, 基頻軌跡, 音框, Tone language, Pitch contour, Frame, ACF, AMDF, CF, UPDUDP",
author = "吳俊甫 and 張國清 and 李宗寶 and 江翠蓮 and 黃湘玲",
year = "2012",
doi = "10.6338/JDA.201212_7(6).0007",
language = "Traditional Chinese",
volume = "7",
pages = "129--165",
journal = "Journal of Data Analysis",
issn = "1819-2343",
publisher = "中華資料採礦協會",
number = "6",

}

TY - JOUR

T1 - 時域上基頻軌跡演算法的改良與探討

AU - 吳俊甫

AU - 張國清

AU - 李宗寶

AU - 江翠蓮

AU - 黃湘玲

PY - 2012

Y1 - 2012

N2 - 中文是一種聲調語言,不同聲調之間的差異,可以由基頻軌跡來決定。在擷取基頻軌跡的過程中,常常會擷取到半頻或倍頻的情況,造成基頻軌跡不連續,以致在聲調辨識上的錯誤。本論文將以傳統方法(Auto-Correlation Function (ACF)、Average Magnitude Difference Function (AMDF)與Correlation Function (CF))作為基礎,提出一種時域上基頻軌跡的演算法來改良中文之聲調辨識。其方法主要是利用解強調與一階差分兩種濾波器,使語音訊號之波形能夠更具有週期性,來降低語音受到雜訊的影響,並且利用半頻、基頻與倍頻之間的頻率特性,來擷取出音框最有可能的基頻值,再利用分群的方式以及線性迴歸的方法,對音框做修正與平滑的動作。最後以本實驗室所錄製語料做測試,語料內容為中文聲調一到四聲,不考慮輕聲,共1331個中文單字。由實驗結果發現,本論文之改良方法在中文之聲調辨識上,最高可達95.54%的辨識率,比傳統方法之辨識率提高約兩成,也比UPDUDP方法之最高辨識率95.03%還高,雖然兩者辨識率差異不大,但是在半頻與倍頻的錯誤率,UPDUDP為3.04%比本論文之改良方法(錯誤率為0.19%)高出約3%的錯誤率,尤其是中文聲調為一聲時,UPDUDP之倍頻錯誤率高達8.28%,而本論文則為0.1%,因此本論文之改良方法能夠有效的改善中文聲調之辨識。

KW - 聲調語言

KW - 基頻軌跡

KW - 音框

KW - Tone language

KW - Pitch contour

KW - Frame

KW - ACF

KW - AMDF

KW - CF

KW - UPDUDP

U2 - 10.6338/JDA.201212_7(6).0007

DO - 10.6338/JDA.201212_7(6).0007

M3 - Article

VL - 7

SP - 129

EP - 165

JO - Journal of Data Analysis

JF - Journal of Data Analysis

SN - 1819-2343

IS - 6

ER -