System and method of pattern recognition in very high-dimensional space
IPC classification (IPC, 7th edition): G10L-015/06; G10L-015/00; G10L-015/04
Country / Type: United States (US) Patent, Granted
Application number: US-0617834 (2006-12-29)
Registration number: US-7369993 (2008-05-06)
Inventor / Address: Atal, Bishnu Saroop
Applicant / Address: AT&T Corp.
Citation information: cited by 1 patent; cites 22 patents
Abstract
A system and method of recognizing speech comprises an audio-receiving element and a computer server, which together perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular-value decomposition to conform the data generally into a hypersphere. The phonemes received from the audio-receiving element are likewise converted into n-dimensional space and transformed using singular-value decomposition to conform the data into the hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance, from the center of the hypersphere to a point associated with the transformed received phoneme, with a second distance, from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
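The abstract's core operation — stacking the expanded phoneme vectors into a matrix and applying a singular-value decomposition so that the resulting orthogonal vectors lie on a unit hypersphere — can be sketched with NumPy. All dimensions and data below are illustrative placeholders, not values from the patent:

```python
import numpy as np

# Hypothetical setup: m expanded phoneme vectors in n-dimensional space,
# stored as the columns x_k of the matrix X.
rng = np.random.default_rng(0)
m, n = 10, 50
X = rng.normal(size=(n, m))

# Singular-value decomposition X = U @ diag(s) @ Vt, matching the
# patent's notation [x1 ... xm] = [u1 ... um] Lambda V^t, where
# Lambda is diagonal and V is unitary.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The columns u_k of U are orthonormal, so each transformed phoneme
# vector has unit length, i.e. it lies on the unit hypersphere.
print(np.allclose(np.linalg.norm(U, axis=0), 1.0))  # → True

# Sanity check: the decomposition reconstructs the original vectors.
print(np.allclose(U @ np.diag(s) @ Vt, X))  # → True
```

Note that conforming the data to a hypersphere falls out of the decomposition itself: the left singular vectors are orthonormal by construction, so no separate normalization step is needed.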
Representative claims
The invention claimed is:

1. A method of training phonemes for use in recognizing a received phoneme having an associated received-signal vector using a stored plurality of phoneme classes, each of the plurality of phoneme classes comprising class phonemes, the method comprising, for each class phoneme: generating an expanded stored-phoneme vector from the class phoneme; and transforming the expanded stored-phoneme vector into an orthogonal form associated with a hypersphere having a center and a radius, wherein a received phoneme may be recognized by generating an expanded received-signal vector into an orthogonal form for analysis in the hypersphere.

2. The method of claim 1, wherein generating the expanded stored-phoneme vector from the class phoneme further comprises: determining a phoneme vector as a time-frequency representation of the class phoneme; dividing the phoneme vector into phoneme segments; assigning each phoneme segment into a plurality of phoneme parameters; and expanding each phoneme segment and plurality of phoneme parameters into an expanded stored-phoneme vector with expanded vector parameters.

3. The method of claim 1, wherein transforming the expanded stored-phoneme vector into an orthogonal form further comprises: setting [x1 x2 . . . xm] = [u1 u2 . . . um] ΛV^t, where xk is a kth acoustic vector for a corresponding stored phoneme, uk is the corresponding orthogonal vector, and Λ and V are diagonal and unitary matrices, respectively.

4. The method of claim 1, further comprising transforming the expanded received-signal vector, which is associated with the received phoneme, into an orthogonal form using a singular-value decomposition to conform the expanded received-signal vector into the hypersphere.

5. A method of recognizing a received phoneme having an associated received-signal vector using a plurality of phoneme classes, each of the plurality of phoneme classes comprising class phonemes, the method comprising: generating an expanded received-signal vector from a received analog acoustic signal; transforming the expanded received-signal vector into an orthogonal form associated with a hypersphere having a center and a radius; determining a first distance associated with the orthogonal form of the expanded received-signal vector and a second distance associated respectively with each orthogonal form of expanded stored-phoneme vectors; and recognizing the received phoneme according to a comparison of the first distance with the second distance.

6. The method of claim 5, wherein generating the expanded received-signal vector further comprises: receiving the analog acoustic signal; converting the analog acoustic signal into a digital signal; determining the received-signal vector as a time-frequency representation of the received digital signal; dividing the received-signal vector into received-signal segments; assigning each received-signal segment into a plurality of received-signal parameters; and expanding each received-signal segment and plurality of received-signal parameters into an expanded received-signal vector.

7. The method of claim 5, wherein transforming the expanded received-signal vector into an orthogonal form further comprises: setting [yk] = [zk] ΛV^t, where yk is a kth acoustic vector for a corresponding received phoneme, zk is the corresponding orthogonal vector, and Λ and V are diagonal and unitary matrices, respectively.

8. The method of claim 5, wherein transforming the expanded stored-phoneme vector into an orthogonal form uses singular-value decomposition and wherein transforming the expanded received-signal vector into an orthogonal form using singular-value decomposition further conforms the stored-phoneme vector into the hypersphere.

9. The method of claim 8, wherein determining a distance associated with the orthogonal form of the expanded received-signal vector and each orthogonal form of the expanded stored-phoneme vectors further comprises: comparing a distance from the center of the hypersphere of the orthogonal form of the expanded received-signal vector with a distance from the center of the hypersphere for each orthogonal form of the expanded stored-phoneme vector.

10. The method of claim 9, wherein determining a distance associated with the orthogonal form of the expanded received-signal vector and each orthogonal form of the expanded stored-phoneme vectors further comprises: determining a difference between the distance from the center of the hypersphere of the orthogonal form of the expanded received-signal vector and the distance from the center of the hypersphere for each orthogonal form of the expanded stored-phoneme vectors, wherein the expanded stored-phoneme vectors associated with the m shortest such differences are recognized as most likely to be associated with the received phoneme.

11. A computing device for recognizing a received phoneme having an associated received-signal vector using a stored plurality of phoneme classes, each of the plurality of phoneme classes comprising class phonemes, the computing device comprising: a module configured to generate an expanded stored-phoneme vector from each respective class phoneme; and a module configured to transform the expanded stored-phoneme vector into an orthogonal form associated with a hypersphere having a center and a radius, wherein a received phoneme may be recognized by generating an expanded received-signal vector into an orthogonal form for analysis in the hypersphere.

12. The computing device of claim 11, wherein the module configured to generate the expanded stored-phoneme vector from the class phoneme further: determines the phoneme vector as a time-frequency representation of the class phoneme; divides the phoneme vector into phoneme segments; assigns each phoneme segment into a plurality of phoneme parameters; and expands each phoneme segment and plurality of phoneme parameters into an expanded stored-phoneme vector with expanded vector parameters.

13. The computing device of claim 11, wherein the module configured to transform the expanded stored-phoneme vector into an orthogonal form further: sets [x1 x2 . . . xm] = [u1 u2 . . . um] ΛV^t, where xk is a kth acoustic vector for a corresponding stored phoneme, uk is the corresponding orthogonal vector, and Λ and V are diagonal and unitary matrices, respectively.

14. The computing device of claim 11, further comprising a module configured to transform the expanded received-signal vector, which is associated with the received phoneme, into an orthogonal form using a singular-value decomposition to conform the expanded received-signal vector into the hypersphere.

15. A computing device for recognizing a received phoneme having an associated received-signal vector using a plurality of phoneme classes, each of the plurality of phoneme classes comprising class phonemes, the computing device comprising: a module configured to generate an expanded received-signal vector from a received analog acoustic signal; a module configured to transform the expanded received-signal vector into an orthogonal form associated with a hypersphere having a center and a radius; a module configured to determine a first distance associated with the orthogonal form of the expanded received-signal vector and a second distance associated with each orthogonal form of expanded stored-phoneme vectors; and a module configured to recognize the received phoneme according to a comparison of the first distance with the second distance.

16. The computing device of claim 15, wherein the module configured to generate the expanded received-signal vector further: receives the analog acoustic signal; converts the analog acoustic signal into a digital signal; determines the received-signal vector as a time-frequency representation of the received digital signal; divides the received-signal vector into received-signal segments; assigns each received-signal segment into a plurality of received-signal parameters; and expands each received-signal segment and plurality of received-signal parameters into an expanded received-signal vector.

17. The computing device of claim 15, wherein the module configured to transform the expanded received-signal vector into an orthogonal form further: sets [yk] = [zk] ΛV^t, where yk is a kth acoustic vector for a corresponding received phoneme, zk is the corresponding orthogonal vector, and Λ and V are diagonal and unitary matrices, respectively.

18. The computing device of claim 15, wherein the module configured to transform the expanded stored-phoneme vector into an orthogonal form uses singular-value decomposition and wherein transforming the expanded received-signal vector into an orthogonal form using singular-value decomposition further conforms the stored-phoneme vector into the hypersphere.

19. The computing device of claim 18, wherein determining a distance associated with the orthogonal form of the expanded received-signal vector and each orthogonal form of the expanded stored-phoneme vectors further comprises: comparing a distance from the center of the hypersphere of the orthogonal form of the expanded received-signal vector with a distance from the center of the hypersphere for each orthogonal form of the expanded stored-phoneme vector.

20. The computing device of claim 19, wherein determining a distance associated with the orthogonal form of the expanded received-signal vector and each orthogonal form of the expanded stored-phoneme vectors further comprises: determining a difference between the distance from the center of the hypersphere of the orthogonal form of the expanded received-signal vector and the distance from the center of the hypersphere for each orthogonal form of the expanded stored-phoneme vectors, wherein the expanded stored-phoneme vectors associated with the m shortest such differences are recognized as most likely to be associated with the received phoneme.
Patents cited by this patent (22)
Baji Toru (Burlingame CA) Noguchi Kouki (Kokubunji CA JPX) Nakagawa Tetsuya (Millbrae CA) Tonomura Motonobu (Kodaira JPX) Akimoto Hajime (Mobara JPX) Masuhara Toshiaki (Tokyo JPX), Apparatus including a pair of neural networks having disparate functions cooperating to perform instruction recognition.
Prasad K. Venkatesh (Cupertino CA) Stork David G. (Stanford CA), Facial feature extraction method and apparatus for a neural network acoustic and visual speech recognition system.
Stork David G. (Stanford CA) Wolff Gregory J. (Mountain View CA), Neural network acoustic and visual speech recognition system training method and apparatus.
Inazumi Mitsuhiro (Suwa JPX), Neural network speech recognition apparatus recognizing the frequency of successively input identical speech data sequen.
Campbell William Michael ; Kleider John Eric ; Broun Charles Conway ; Gifford Carl Steven ; Assaleh Khaled, Speaker independent speech recognition system and method.
Tian,Jilei; Nurminen,Jani K.; Popa,Victor, Method, apparatus, mobile terminal and computer program product for providing efficient evaluation of feature transformation.