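Abstract: By introducing and analyzing the principles behind two key parts of the feature extraction module, endpoint detection and the linear prediction cepstrum (LPCC), this paper describes a concrete implementation of isolated-word speech recognition with LPCC-based feature extraction, and builds a software model of the system described. Based on this analysis, a specific scheme for improving the recognition rate is given. Finally, the methods and conclusions are verified with Matlab. The results show that the method improves the recognition rate over the traditional approach, runs fast, is practical, and ports well to hardware. Future implementation and improvement directions for several key steps are also discussed.
Keywords: speech recognition; feature extraction; LPCC; Matlab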
CLC number: TN912.3-34    Document code: A
Article ID: 1004-373X(2010)16-0109-04
Realization and Improvement of Isolated Word Phonetic Recognition
LIU Li-yuan, YAN Jia-ming
(School of Electronic Information, Northwestern Polytechnical University, Xi’an 710129, China)
Abstract: A method for implementing isolated-word speech recognition with feature extraction based on the linear prediction cepstrum (LPCC) is described, following an analysis of the principles behind two important parts of the feature extraction module: endpoint detection and LPCC. A software model of the system described by the method is built. A specific scheme for improving the recognition rate is derived from the analysis. The method and conclusions are verified with Matlab. The verification shows that the method improves the recognition rate over the traditional method and offers fast recognition, good practicability, and hardware portability. Directions for future implementation and improvement of some key steps are discussed.
Keywords: speech recognition; feature extraction; LPCC; Matlab
Received: 2010-03-30
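Speech recognition is the technology by which a machine, through a process of recognition and understanding, converts a speech signal into the corresponding text or command. Feature extraction is a critical stage of this process: whether the feature parameters are chosen appropriately, and whether they are screened properly during extraction, directly affects recognition accuracy. This stage consists of two main parts: endpoint detection and feature parameter extraction.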
1 Endpoint Detection
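In a speech recognition system, the input signal is a mixture of speech, silence, and background noise. Extracting the speech from this mixture by accurately determining its start and end points is called endpoint detection. Endpoint detection serves several purposes [1]: it decides whether each frame of the signal is speech or background noise; it reduces the amount of data the recognizer has to process; and it supports the many algorithms for speech recognition in noise that need an estimate of the noise spectrum.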
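Among current methods, dual-threshold detection is the most practical and widely used. Building on short-time energy detection, the dual-threshold endpoint detection method adds the short-time average zero-crossing rate and uses the two together as features for detection. ……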
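The combination of short-time energy and zero-crossing rate can be illustrated with a minimal sketch. It is written in Python rather than Matlab and is not the authors' implementation. The function names, the threshold parameters (`e_high`, `e_low`, `z_thresh`), and the simplification of treating the whole recording as a single isolated word are assumptions made for illustration.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample list into frames; an incomplete tail frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_count(frame):
    """Number of sign changes between adjacent samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def detect_endpoints(energies, zcrs, e_high, e_low, z_thresh):
    """Return (start_frame, end_frame) of one utterance, or None.

    Assumed three-step scheme (thresholds are hypothetical parameters):
      1. frames with energy above e_high are taken as certain speech;
      2. the segment is widened while neighbours stay above e_low;
      3. it is widened again while neighbours have a zero-crossing
         count above z_thresh, to catch weak unvoiced sounds.
    """
    certain = [i for i, e in enumerate(energies) if e > e_high]
    if not certain:
        return None
    start, end = certain[0], certain[-1]

    # Step 2: extend with the lower energy threshold.
    while start > 0 and energies[start - 1] > e_low:
        start -= 1
    while end < len(energies) - 1 and energies[end + 1] > e_low:
        end += 1

    # Step 3: extend with the zero-crossing threshold.
    while start > 0 and zcrs[start - 1] > z_thresh:
        start -= 1
    while end < len(zcrs) - 1 and zcrs[end + 1] > z_thresh:
        end += 1
    return start, end
```

To use the sketch, frame the signal with `frame_signal`, compute `short_time_energy` and `zero_crossing_count` for each frame, and pass the two lists to `detect_endpoints`. In practice the thresholds would have to be tuned, for example from the noise level in the leading frames of the recording.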