Tang Meili, Hu Qiong, Ma Tinghuai
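Abstract: Speech recognition, an indispensable part of artificial intelligence research, has gradually permeated people's daily lives. To address the problem that traditional speech recognition methods cannot adequately recognize complex, variable, speaker-independent speech, this paper proposes building a speech recognition model with a recurrent neural network (RNN), which captures strong correlations along a time series. Considering the rich time-frequency information carried by speech signals, the feature extraction stage is improved: the wavelet transform (WT), which offers good time-frequency resolution, replaces the fast Fourier transform (FFT) in producing the model's input. The backpropagation through time (BPTT) algorithm is then used for feature learning and training. In the experiments, the effect of wavelet-based feature extraction on recognition performance is analyzed first; the recognition rate is then compared with those of a traditional HMM model and a BP neural network, verifying that the RNN can improve the accuracy and stability of speech recognition.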
Keywords: speech recognition; recurrent neural network; backpropagation algorithm; feature extraction; wavelet transform; HMM model; BP neural network
CLC number: TN912-34; TP391.1    Document code: A    Article ID: 1004-373X(2019)14-0152-05
Research on speech recognition based on recurrent neural network
TANG Meili, HU Qiong, MA Tinghuai
(Nanjing University of Information Science & Technology, Nanjing 210044, China)
Abstract: Speech recognition, an indispensable part of artificial intelligence research, has gradually penetrated into people's daily lives. To address the problem that traditional speech recognition methods cannot properly recognize complex, variable, speaker-independent speech, this paper proposes a speech recognition model based on a recurrent neural network (RNN), which captures strong correlations in time series. In consideration of the abundant time-frequency information of speech signals, the feature extraction process is improved: the wavelet transform (WT), which has better time-frequency resolution, replaces the fast Fourier transform (FFT) in producing the input of the model. The backpropagation through time (BPTT) algorithm is then adopted for feature learning and training. In the experiments, the influence of wavelet-based feature extraction on recognition performance is analyzed by comparison, and the recognition rate of the proposed model is compared with those of the traditional HMM model and the BP neural network. The results show that the RNN improves the accuracy and stability of speech recognition to a certain extent.
Keywords: speech recognition; recurrent neural network; back propagation algorithm; feature extraction; wavelet transform; HMM model; BP network
0 Introduction
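With the rapid development of artificial intelligence, speech recognition has become a favored hub for human-computer interaction and has already seen initial application in mobile phones, in-vehicle systems, search engines, robots, e-commerce, and other fields. This growth in applications has driven continual renewal and refinement of the underlying research. Traditional template-matching and statistical-learning methods for speech recognition have matured and have even reached a bottleneck [1], whereas speech recognition with artificial neural networks is on the rise because of its strong results. One advantage of using artificial neural networks to learn and process speech is that their working principle imitates the activity of neurons in the human brain: nodes are connected into a network structure, and an adaptive algorithm then completes the recognition process. In addition, neural networks can map the nonlinear relationships among complex speech signals and have a strong capacity for learning speech sequences [2-3]. Speech signals have two important characteristics: they unfold as a time series, and they carry rich time-frequency information. Traditional acoustic models analyze the internal states of each phone but ignore the influence that phones have on one another. Commonly used artificial neural networks emphasize the relationships between phones, but their internal states are not fully connected and are instead linked only layer to layer. Given these shortcomings, this paper adopts a recurrent neural network, which can compensate for them, to study speech recognition.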
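To make the contrast between layer-to-layer and recurrent connections concrete, the following is a minimal sketch of a simple (Elman-style) recurrent layer in plain Python. It is not the model used in this paper: the function name `rnn_forward`, the weight values, the layer sizes, and the toy two-frame sequences are all assumptions chosen for illustration. The sketch shows only that the hidden state at the last frame depends on earlier frames when recurrent weights are present, and depends on the last frame alone when they are set to zero.

```python
import math


def rnn_forward(frames, w_xh, w_hh, b_h):
    """Run a simple recurrent layer over a sequence of feature frames.

    frames: list of feature vectors, one per time step
    w_xh:   hidden_size x input_size input-to-hidden weights
    w_hh:   hidden_size x hidden_size hidden-to-hidden (recurrent) weights
    b_h:    bias vector of length hidden_size
    Returns the list of hidden states, one per time step.
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size  # initial hidden state
    states = []
    for x in frames:
        h_new = []
        for i in range(hidden_size):
            total = b_h[i]
            # contribution of the current frame
            total += sum(w_xh[i][j] * x[j] for j in range(len(x)))
            # contribution of the previous hidden state (the recurrence)
            total += sum(w_hh[i][k] * h[k] for k in range(hidden_size))
            h_new.append(math.tanh(total))
        h = h_new
        states.append(h)
    return states


# Illustrative (hypothetical) weights: 2 input features, 2 hidden units.
w_xh = [[0.5, -0.3], [0.2, 0.4]]
w_hh = [[0.1, 0.6], [-0.4, 0.3]]
w_zero = [[0.0, 0.0], [0.0, 0.0]]  # no recurrent connections
b_h = [0.0, 0.0]

# Two toy sequences that share the same last frame but differ in the first.
seq_a = [[1.0, 0.0], [0.5, 0.5]]
seq_b = [[0.0, 1.0], [0.5, 0.5]]

states_a = rnn_forward(seq_a, w_xh, w_hh, b_h)
states_b = rnn_forward(seq_b, w_xh, w_hh, b_h)
final_a, final_b = states_a[-1], states_b[-1]

# With recurrence, the earlier frame changes the final hidden state.
context_matters = any(abs(p - q) > 1e-9 for p, q in zip(final_a, final_b))

# Without recurrence, only the last frame matters, so the finals coincide.
flat_a = rnn_forward(seq_a, w_xh, w_zero, b_h)[-1]
flat_b = rnn_forward(seq_b, w_xh, w_zero, b_h)[-1]
no_recurrence_same = all(abs(p - q) < 1e-12 for p, q in zip(flat_a, flat_b))

print(context_matters, no_recurrence_same)
```

In this toy setting the hidden-to-hidden matrix `w_hh` is what carries context from one frame to the next, which is the property the paragraph above attributes to recurrent networks.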