[Paper] Recognition Performance Improvement of Unsupervised Limabeam Algorithm using Post Filtering Technique

Nguyen, Dinh Cuong; Choi, Suk-Nam; Chung, Hyun-Yeol

doi:10.14372/iemek.2013.8.4.185

대한임베디드공학회논문지 = IEMEK Journal of Embedded Systems and Applications, v.8 no.4, 2013, pp.185-194

Nguyen, Dinh Cuong (Department of Information and Communication Engineering, Yeungnam University); Choi, Suk-Nam (Department of Information and Communication Engineering, Yeungnam University); Chung, Hyun-Yeol (Department of the Information and Communication, Yeungnam University)

Abstract

In distant-talking environments, speech recognition performance degrades significantly due to noise and reverberation. Recent work by Michael L. Seltzer shows that, in microphone array speech recognition, the word error rate can be significantly reduced by adapting the beamformer weights to generate a sequence of features that maximizes the likelihood of the correct hypothesis. One way to implement this approach, called the Likelihood Maximizing Beamforming (Limabeam) algorithm, is Unsupervised Limabeam (USL), which can improve recognition performance in any environment. Our investigation of USL showed that, because the optimization depends strongly on the transcription output of the first recognition step, the output becomes unstable, which may lower performance. To improve the recognition performance of USL, post-filter techniques can be employed to obtain a more accurate transcription from the first step. In this work, as a post-filtering technique for the first recognition step of USL, we propose adding a Wiener filter combined with Feature Weighted Mahalanobis Distance to improve recognition performance. We also suggest an alternative way to implement the Limabeam algorithm for a Hidden Markov Network (HM-Net) speech recognizer for efficient implementation. Speech recognition experiments performed in a real distant-talking environment confirm the efficacy of the Limabeam algorithm in the HM-Net speech recognition system and the improved performance of the proposed method.

AI summary of the main text (sentences are identified automatically and some may be unsuitable)

Proposed method

An HM-Net speech recognition system was used for all experiments in this paper. The 1004-state HM-Net system (8 Gaussians per state) was trained on the Trade database, a speaker-independent database consisting of 8892 utterances spoken by 90 speakers. The system was trained using 39-dimensional feature vectors consisting of 13 MFCC parameters along with their delta and delta-delta parameters. A 25-ms window length and a 10-ms frame shift were used.
Limabeam was originally investigated with Sphinx3, an HMM-based large-vocabulary speech recognition system [7], and an English database. In this paper, we investigate speech recognition performance with a microphone array and present a way to implement the Limabeam algorithm with a Hidden Markov Network (HM-Net) speech recognition system and a Korean database.
We propose an alternative way to implement the Limabeam algorithm in a Hidden Markov Network speech recognizer for efficient implementation, and we propose adding a post-filter technique with Feature Weighted Mahalanobis Distance to the Limabeam algorithm in order to improve recognition performance. Our prior investigation of the unsupervised Limabeam showed that, because the optimization depended strongly on the transcription output of the first recognition step, the output became unstable, which lowered performance.
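
The front-end configuration described above (13 static MFCCs extended with delta and delta-delta parameters, 25-ms window, 10-ms shift) can be illustrated with a minimal sketch. The 16 kHz sampling rate, the regression width of 2 frames, and the helper names `num_frames`, `deltas` and `stack_39` are assumptions made for illustration; the source does not state them, and the static MFCCs are taken as given rather than computed here.

```python
import numpy as np

SAMPLE_RATE = 16000        # assumption: the sampling rate is not stated in the source
WIN_MS, SHIFT_MS = 25, 10  # window length and frame shift from the text


def num_frames(num_samples, sample_rate=SAMPLE_RATE):
    """Number of full analysis frames that fit in num_samples samples."""
    win = sample_rate * WIN_MS // 1000
    shift = sample_rate * SHIFT_MS // 1000
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // shift


def deltas(feats, width=2):
    """Regression-based time derivative of a (frames, dims) feature matrix.

    width=2 is an assumed regression window; edges are padded by repetition.
    """
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n: width + n + n_frames]    # c[t + n]
        behind = padded[width - n: width - n + n_frames]   # c[t - n]
        out += n * (ahead - behind)
    return out / denom


def stack_39(static13):
    """Append delta and delta-delta parameters to 13 static coefficients."""
    d = deltas(static13)
    dd = deltas(d)
    return np.hstack([static13, d, dd])
```

Under these assumptions, one second of audio yields 98 full frames, and `stack_39` maps a (98, 13) matrix of static coefficients to a (98, 39) feature matrix.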

Data

To investigate speech recognition performance with a microphone array, we employed two microphone array databases recorded at Yeungnam University. For the first database, YUM4-6, we played back the Trade6 database (596 utterances spoken by 6 speakers) through a Harman/Kardon loudspeaker and recorded it with a linear B&K microphone array of 4 elements spaced 20 cm apart.

Theory/Model

In the experiment, the channels were aligned based on time delays estimated by the GCC-PHAT method. The aligned channels were then averaged to generate the delay-and-sum beamforming output signal.
Delay-and-Sum (D&S) is the simplest and most popular method in microphone array processing. One channel is chosen as a reference, and the Time Difference Of Arrival (TDOA) of each remaining channel is estimated using the Generalized Cross-Correlation (GCC) with Phase Transform (PHAT) [12] or any other Time Delay Estimation (TDE) technique. Next, the time-aligned speech signals are summed.
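
The delay-and-sum procedure described here (pick a reference channel, estimate each channel's delay with GCC-PHAT, align, then average) can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the authors' implementation: delays are restricted to whole samples, alignment uses a circular shift so edge wrap-around is ignored, and the function names and the `max_lag` parameter are hypothetical.

```python
import numpy as np


def gcc_phat_delay(sig, ref, max_lag):
    """Estimate the integer delay (in samples) of sig relative to ref."""
    n = len(sig) + len(ref)                  # zero-pad to avoid circular aliasing
    sig_spec = np.fft.rfft(sig, n=n)
    ref_spec = np.fft.rfft(ref, n=n)
    cross = sig_spec * np.conj(ref_spec)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index max_lag corresponds to zero lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


def delay_and_sum(channels, ref_index=0, max_lag=64):
    """Align every channel to the reference channel and average them."""
    channels = np.asarray(channels, dtype=float)
    ref = channels[ref_index]
    aligned = []
    for ch in channels:
        d = gcc_phat_delay(ch, ref, max_lag)
        # Advance the channel by its estimated delay (circular shift:
        # samples wrapping around the edges are ignored in this sketch).
        aligned.append(np.roll(ch, -d))
    return np.mean(aligned, axis=0)
```

Averaging rather than plain summation matches the description of the experiment, where the aligned channels were averaged; the two differ only by a constant scale factor.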

Performance/Effects

In the second experiment, we evaluated the proposed Unsupervised Limabeam algorithm that combines FWMD with a Wiener filter (USL-FWMD-WF). The results were compared with those of the Unsupervised Limabeam algorithm with FWMD (USL-FWMD). The experimental results are shown in Figure 9. The correct recognition rate of USL-FWMD-WF was approximately 1.4% higher on average than that of USL-FWMD and 5.8% higher than that of USL, which demonstrates the effectiveness of post-filtering. We attribute this to the Wiener filter applied to the output signal, which suppresses the incoherent signal components (noise and reverberant speech) and passes the highly coherent speech signals. Experimental results also showed that USL-FWMD-WF gave approximately 5.8% higher recognition performance than the D&S algorithm in the case of 25-tap filtering.
So, the fluctuations in output signal that are caused by [...] the higher recognition performance.
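
The effect attributed to the Wiener filter, suppressing components that are incoherent across channels and passing coherent speech, can be illustrated with one common multi-channel formulation, the Zelinski post-filter. The source only says "Wiener filter", so the choice of this formulation is an assumption, as are the frame size, hop size and function names below. The sketch estimates a per-frequency gain as the ratio of the averaged real cross-spectra between channel pairs to the averaged auto-spectra of the time-aligned channels.

```python
import numpy as np


def stft(x, frame=256, hop=128):
    """Short-time spectra of x, shape (frames, bins), using a Hann window."""
    win = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    return np.array([np.fft.rfft(win * x[s:s + frame]) for s in starts])


def zelinski_gain(aligned_channels, frame=256, hop=128):
    """Per-frequency post-filter gain in [0, 1] from time-aligned channels."""
    specs = np.array([stft(np.asarray(ch, dtype=float), frame, hop)
                      for ch in aligned_channels])     # (channels, frames, bins)
    m = specs.shape[0]
    # Auto-spectral density, averaged over channels and frames.
    auto = np.mean(np.abs(specs) ** 2, axis=(0, 1))
    # Real part of the cross-spectral density, averaged over all channel pairs.
    cross = np.zeros(specs.shape[2])
    for i in range(m - 1):
        for j in range(i + 1, m):
            cross += np.mean(np.real(specs[i] * np.conj(specs[j])), axis=0)
    cross *= 2.0 / (m * (m - 1))
    return np.clip(cross / (auto + 1e-12), 0.0, 1.0)
```

When every channel carries the same signal, the gain is close to 1 at all frequencies; when the channels carry independent noise, the cross-spectra average toward zero and the gain becomes small. The gain would then be applied to the spectrum of the beamformer output.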

References (19)

M.F. Font, "Multi-microphone signal processing for automatic speech recognition in meeting rooms," Master's thesis, Berkeley, California, 2005.
P.J. Moreno, "Speech recognition in noisy environments," Doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA, 1996.
F.H. Liu, "Environmental adaptation for robust speech recognition," Doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA, 1994.
A. Acero, "Acoustical and environmental robustness in automatic speech recognition," Doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA, 1990.
L.J. Griffiths, C.W. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Transactions on Antennas and Propagation, Vol. AP-30, No. 1, pp.27-34, 1982.
M. Seltzer, "Microphone Array Processing for Robust Speech Recognition," Doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA, 2003.
P. Placeway, S. Chen, M. Eskenazi, U. Jain, V. Parikh, B. Raj, M. Ravishankar, R. Rosenfeld, K. Seymore, M. Siegler, R. Stern, E. Thayer, "The 1996 hub-4 sphinx-3 system," Proceedings on the DARPA Speech Recognition Workshop, Vol. 1, pp.243-252, 1997.
S.J. Oh, C.J. Hwang, H.Y. Jung, H.Y. Chung, "A study on statistical language models for large vocabulary continuous speech recognition system," Proceedings on ICSP, Vol. 1, pp.113-119, 1999.
W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, "Numerical Recipes in C: The Art of Scientific Computing," New York: Cambridge University Press, 1998.
L. Rabiner, B.H. Juang, "Fundamentals of Speech Recognition," New Jersey: Prentice Hall, 1993.
A.J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, Vol. 13, No. 2, pp.260-269, 1967.
P. Moreno, B. Raj, R.M. Stern, "A Unified Approach for Robust Speech Recognition," Proceedings of Eurospeech, Vol. 1, pp.481-485, 1995.
R. Zelinski, "A microphone array with adaptive post-filtering for noise reduction in reverberant room," Proceedings on International Conference of Acoustics, Speech, and Signal Processing, Vol. 5, pp.2578-2581, 1988.
I.A. McCowan, "Robust speech recognition using microphone arrays," Doctoral dissertation, Queensland University of Technology, Australia, 2001.
C.H. Knapp, G.C. Carter, "The generalized correlation method for estimation of time delay," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 24, No. 4, pp.320-327, 1976.
D.C. Nguyen, H.Y. Chung, "Performance Improvement of Microphone Array Speech Recognition using Feature Weighted Mahalanobis Distance," The Journal of the Acoustical Society of Korea, Vol. 29, No. 1E, pp.45-53, 2010.
N.D. Cuong, S. Guanghu, J.H. Youl, C.H. Yeol, "Performance improvement of speech recognition system using microphone array," Proceedings on IEEE International Conference of Research, Innovation and Vision for the Future, pp.91-95, 2008.
L. Brayda, C. Wellekens, M. Omologo, "N-Best Parallel Maximum Likelihood Beamformers for Robust Speech Recognition," Proceedings of European Signal Processing Conference, 2006.
K. Bahram, B. Hamidreza, R. Farbod, "Improvement in speech recognition using phone-based filter and sum parameter optimization," IEICE Electronics Express, Vol. 6, No. 8, pp.437-442, 2009.
