Methods for training a speech recognition system
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G10L-015/00
G10L-015/06
G10L-015/07
G10L-015/02
G10L-015/065
Application number
US-0619093
(2015-02-11)
Registration number
US-10121466
(2018-11-06)
Inventor / Address
Pecorari, John
Applicant / Address
Hand Held Products, Inc.
Agent / Address
Additon, Higgins & Pendleton, P.A.
Citation information
Times cited: 0
Cited patents: 216
Abstract
Speech recognition systems that use voice templates may create (or update) voice templates for a particular user by training (or re-training). If training results in a vocabulary with similar voice templates, then the speech recognition system's performance may suffer. The present invention embraces methods for training a speech recognition system to prevent voice template similarity. In these methods, a trained word's voice template may be evaluated for similarity to other vocabulary templates prior to enrolling the voice template into the vocabulary. If template similarity is found, then a user may be prompted to retrain the system using an alternate word. Alternatively, the user may be prompted to retrain the system with the word spoken more clearly. This dynamic enrollment training analysis ensures that all templates in the vocabulary are distinct.
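The enrollment check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes that a voice template is a fixed-length feature vector, that a template is built by averaging the samples, and that similarity is measured with cosine similarity against a threshold of 0.9. None of these choices, nor the function names, come from the patent text.

```python
import math

# Assumed value; the patent text does not give a threshold.
SIMILARITY_THRESHOLD = 0.9


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def make_template(samples):
    """Average several same-length sample vectors into one template (assumption)."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def find_similar_words(word, template, vocabulary, threshold=SIMILARITY_THRESHOLD):
    """Return the other vocabulary words whose templates are too close to `template`."""
    return [
        other
        for other, other_template in vocabulary.items()
        if other != word and cosine_similarity(template, other_template) >= threshold
    ]


def try_enroll(word, samples, vocabulary, threshold=SIMILARITY_THRESHOLD):
    """Enroll `word` only if its new template is distinct from all other templates.

    Returns (enrolled, conflicts). When conflicts is non-empty, the caller would
    prompt the user to retrain with an alternate word or a clearer enunciation.
    """
    template = make_template(samples)
    conflicts = find_similar_words(word, template, vocabulary, threshold)
    if conflicts:
        return False, conflicts
    vocabulary[word] = template
    return True, []
```

In this sketch a rejected enrollment leaves the vocabulary unchanged, so the caller can repeat the prompt and training cycle until `try_enroll` succeeds.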
Representative claims
1. A method for re-training a speech recognition system, the method comprising:
acquiring, using the speech recognition system, multiple samples of a spoken word from a user, said spoken word representing a vocabulary word from an application vocabulary stored in a memory;
creating, via at least one processor, a voice template for said spoken word from the multiple samples of said spoken word;
comparing, via the at least one processor, the voice template for said spoken word to other voice templates for other words from the application vocabulary;
if the voice template for said spoken word is similar to at least one of the other voice templates for the other words, then providing, via the at least one processor, information to the user, wherein the information comprises: (i) a prompt to create a new voice template for said spoken word, and (ii) instructions for adjusting said spoken word to make said new voice template for said spoken word less similar to the other voice templates for the other words, wherein the instructions for adjusting said spoken word comprise a prompt to help the user to enunciate said spoken word differently;
acquiring, using the speech recognition system, multiple samples of an adjusted spoken word from the user;
creating, via at least one processor, said new voice template for said adjusted spoken word from the multiple samples of said adjusted spoken word;
comparing, via the at least one processor, said new voice template for said adjusted spoken word to other voice templates for other words from the application vocabulary; and
if said new voice template for said adjusted spoken word is dissimilar to the other voice templates for the other words, then assigning said voice template for said adjusted spoken word to said spoken word in the application vocabulary stored in the memory;
wherein, during the re-training, the comparison of the voice template for said spoken word to other voice templates for other words is performed until a unique voice template which is different from the other voice templates and having no template similarity with the other voice templates is created for the said spoken word;
wherein the re-training is initiated after an initial enrollment training performed for the speech recognition system before use based on an outcome of a performance evaluation performed periodically by the speech recognition system; and
wherein the performance evaluation is associated with recognition performance for the spoken word.
2. The method according to claim 1, wherein the instructions for adjusting said spoken word comprise prompts to help the user to enunciate said spoken word differently.
3. The method according to claim 1, wherein the instructions for adjusting said spoken word comprise prompting the user to utter an alternate word to represent said spoken word, wherein the alternate word is a variant of the word.
4. The method according to claim 3, wherein prompting the user to utter an alternate word comprises presenting the user with a set of possible alternate words.
5. The method according to claim 1, wherein the information provided to the user is displayed on a screen.
6. The method according to claim 1, wherein comparing the voice template for said spoken word to the other voice templates for other words from the application vocabulary comprises comparing the voice template for said spoken word to a subset of other words from the application vocabulary and wherein the subset of words corresponds to words from the application vocabulary which are at least of same type and of same class and wherein the comparing further comprises computing a similarity score and comparing the similarity score to a threshold.
7. The method according to claim 1, wherein the other voice templates for the other words comprise custom voice templates created for a specific user.
8. The method according to claim 1, wherein the other voice templates for the other words comprise generic voice templates created for any user.
9. A method for re-training a speaker-independent speech recognition system with respect to a word of an application vocabulary, wherein a generic voice template is assigned to said word in the application vocabulary, the method comprising:
acquiring from a user a speech sample of said word using the speaker-independent speech recognition system;
comparing, via at least one processor, the speech sample to generic voice templates in the application vocabulary; and
if the speech sample matches more than one of the generic voice templates in the application vocabulary, then:
prompting, via the at least one processor, the user to create a custom voice template for a substitute word,
training, via the at least one processor, the speaker-independent speech recognition system on the substitute word to create the custom voice template for the substitute word, and
replacing, via the at least one processor, in the application vocabulary the generic voice template for said word with the custom voice template for the substitute word; and
otherwise, if the speech sample matches the generic voice template for said word, using, via the at least one processor, the generic voice template for the word;
wherein, during the re-training, the comparison of the speech sample of said word to generic voice templates in the application vocabulary is performed until the custom voice template for the substitute word which is different from the generic voice templates and having no template similarity with the generic voice templates is created;
wherein the re-training is initiated after an initial enrollment training performed for the speech recognition system before use based on an outcome of a performance evaluation performed periodically by the speech recognition system; and
wherein the performance evaluation is associated with recognition performance for the word.
10. The method according to claim 9, wherein prompting the user to create a custom voice template for a substitute word comprises a list of possible substitute words.
11. The method according to claim 9, wherein the generic voice templates comprise voice templates for other words that sound similar to the word.
12. The method according to claim 9, wherein the generic voice templates comprise voice templates for a subset of other words of the application library which are at least one of the same type of words and the same class of words.
13. The method according to claim 9, wherein the substitute word comprises a different enunciation of the word.
14. The method according to claim 9, wherein the substitute word comprises a new word chosen by a user that is different from the word.
15. A method for re-training a speech recognition system with respect to a word of an application vocabulary, wherein a voice template is assigned to said word in the application vocabulary, the method comprising:
acquiring from a user a speech sample of said word using the speech recognition system;
comparing, via at least one processor, the speech sample to voice templates in the application vocabulary; and
if the speech sample matches more than one of the voice templates in the application vocabulary, then:
prompting, via the at least one processor, the user to re-train the speech recognition system using an alternate word in place of said word, wherein the alternate word is a variant of said word;
training, via the at least one processor, the speech recognition system on the alternate word to create a voice template for the alternate word; and
replacing, via the at least one processor, in the application vocabulary the voice template for said word with the voice template for the alternate word;
wherein, during the re-training, the comparison of the speech sample of said word to the voice templates in the application vocabulary is performed until a voice template corresponding to the alternate word which is different from the voice templates of words in the application vocabulary and having no template similarity with the voice templates of words in the application library is created;
wherein the re-training is initiated after an initial enrollment training performed for the speech recognition system before use based on an outcome of a performance evaluation performed periodically by the speech recognition system; and
wherein the performance evaluation is associated with recognition performance for the spoken word.
16. The method according to claim 15, comprising, before acquiring the speech sample of said word, determining that the speech recognition system has poor performance.
17. The method according to claim 15, wherein the voice templates comprise voice templates for words that sound similar to the word.
18. The method according to claim 15, wherein the speech sample comprises utterances of phrases that use the word.
19. The method according to claim 15, wherein the alternate word comprises a word chosen from a list of suggested words.
20. The method according to claim 19, wherein the alternate word comprises a set of words.
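The re-training flow of claims 9 and 15, together with the class-restricted comparison of claims 6 and 12, might be sketched as below. Everything concrete here is an assumption made for illustration: templates are treated as fixed-length feature vectors, a "match" means a Euclidean distance of at most 0.5, poor performance means per-word accuracy below 0.8, and the `alternates` dictionary stands in for the step of prompting the user and training on each suggested alternate word. The names `Entry`, `needs_retraining`, `matching_words`, and `retrain_with_alternates` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    template: list   # assumed fixed-length feature vector
    word_class: str  # hypothetical class label, e.g. "digit" or "command"


def needs_retraining(correct, attempts, min_accuracy=0.8):
    """Periodic per-word performance check; the 0.8 cutoff is an assumption."""
    return attempts > 0 and correct / attempts < min_accuracy


def matching_words(sample, vocabulary, word_class, max_distance=0.5):
    """Words of the given class whose templates lie within max_distance of `sample`."""
    return [
        word
        for word, entry in vocabulary.items()
        if entry.word_class == word_class
        and math.dist(sample, entry.template) <= max_distance
    ]


def retrain_with_alternates(word, alternates, vocabulary, max_distance=0.5):
    """Replace `word` with the first alternate whose template matches no other entry.

    `alternates` maps each suggested alternate word to the template that training
    on it produced. Returns the chosen alternate, or None if every one conflicts.
    """
    word_class = vocabulary[word].word_class
    others = {w: e for w, e in vocabulary.items() if w != word}
    for alternate, template in alternates.items():
        if not matching_words(template, others, word_class, max_distance):
            del vocabulary[word]
            vocabulary[alternate] = Entry(template, word_class)
            return alternate
    return None
```

A caller would first use `needs_retraining` on the recognition statistics for a word, then check whether a fresh speech sample yields more than one entry from `matching_words`, and only then call `retrain_with_alternates`. Restricting the comparison to one word class keeps the check cheap and avoids rejecting words that are never active in the same recognition context.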