Interactive robot, speech recognition method and computer program product
IPC Classification
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G10L-015/00
G10L-021/00
G05B-019/19
B25J-005/00
B25J-009/18
Application number
UP-0311429
(2005-12-20)
Registration number
US-7680667
(2010-04-21)
Priority information
JP-2004-374946(2004-12-24)
Inventors / Address
Sonoura, Takafumi
Suzuki, Kaoru
Applicant / Address
Kabushiki Kaisha Toshiba
Agent / Address
Nixon & Vanderhye, PC
Citation information
Cited by: 27
Patents cited: 4
Abstract
An interactive robot capable of speech recognition includes a sound-source-direction estimating unit that estimates a direction of a sound source for target voices which are required to undergo speech recognition; a moving unit that moves the interactive robot in the sound-source direction; a target-voice acquiring unit that acquires the target voices at a position after moving; and a speech recognizing unit that performs speech recognition of the target voices.
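The four units named in the abstract form a simple pipeline: estimate the source direction, move, acquire the voice at the new position, recognize it. A minimal Python sketch of that data flow follows; every function name here is a hypothetical stand-in for a unit, not an API defined by the patent:

```python
def interaction_step(estimate_direction, move, acquire, recognize):
    """One pass of the pipeline described in the abstract: estimate the
    sound-source direction, move toward it, acquire the target voice at
    the new position, then run speech recognition on it."""
    direction = estimate_direction()   # sound-source-direction estimating unit
    position = move(direction)         # moving unit
    voice = acquire(position)          # target-voice acquiring unit
    return recognize(voice)            # speech recognizing unit

# Toy stand-ins to show the data flow end to end:
result = interaction_step(
    estimate_direction=lambda: 30.0,                 # bearing in degrees
    move=lambda bearing: ("new position", bearing),
    acquire=lambda pos: "hello robot",
    recognize=lambda voice: voice.upper(),
)
print(result)  # HELLO ROBOT
```

The point of moving before acquiring is that recognition runs on the voice captured at the closer position, not on the original distant capture.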
Representative Claims
What is claimed is:

1. An interactive robot capable of speech recognition, comprising: a sound-source-direction estimating unit that estimates a direction of a sound source for target voices which are required to undergo speech recognition; a moving unit that moves the interactive robot in the sound-source direction; a target-voice acquiring unit that acquires the target voices at a position after moving; a target-voice holding unit that holds voice patterns of the target voices, the target voices including misrecognition-notification voices signifying that speech recognition by the speech recognizing unit is erroneous; a speech recognizing unit that performs speech recognition of the target voices by pattern matching of the voice patterns of the target voices, which are held in the target-voice holding unit, with the target voices acquired by the target-voice acquiring unit; a recognition-accuracy evaluating unit that calculates, as an accuracy of recognition results, an agreement accuracy between the acquired target voices and the voice patterns of the target voices held in the target-voice holding unit; wherein the moving unit moves the interactive robot itself in the direction of the sound source when the recognition accuracy for results of speech recognition of the target voices is smaller than a predetermined recognition-accuracy threshold and when the misrecognition-notification voices held in the target-voice holding unit are recognized.

2. The interactive robot according to claim 1, further comprising a voice-producing directing unit by which the sound source of the target voices is directed to produce voices after the robot is moved in the direction of the sound source, wherein the speech recognizing unit performs speech recognition of the target voices produced according to the voice-producing direction.

3.
The interactive robot according to claim 1, further comprising: a signal-to-noise ratio calculating unit that calculates a signal-to-noise ratio of the target voices; and a signal-to-noise-ratio evaluating unit that compares the calculated signal-to-noise ratio and a predetermined threshold for the signal-to-noise ratio, wherein the moving unit moves the interactive robot itself in the direction of the sound source when the signal-to-noise ratio is smaller than the threshold for the signal-to-noise ratio.

4. The interactive robot according to claim 3, wherein the target voices are voices produced by an interlocutor communicating with the interactive robot, and the interactive robot further comprises an image acquiring unit that acquires images including the interlocutor as the sound source of the target voices; and a mouth-movement detecting unit that detects, from the images, mouth movement caused by voices produced by the interlocutor, wherein the moving unit moves the interactive robot itself in the direction of the sound source when the signal-to-noise ratio is smaller than the threshold for the signal-to-noise ratio, and the mouth movement of the interlocutor is detected.

5. The interactive robot according to claim 3, wherein the target voices are voices produced by an interlocutor communicating with the interactive robot, and the interactive robot further comprises an image acquiring unit that acquires images including the interlocutor as the sound source of the target voices; and a mouth-movement detecting unit that detects, from the images, mouth movement caused by voices produced by the interlocutor, wherein the moving unit moves the interactive robot itself in the direction of the sound source when the signal-to-noise ratio is equal to or larger than the threshold for the signal-to-noise ratio, and the mouth movement of the interlocutor is not detected.

6.
The interactive robot according to claim 1, wherein the target voices are voices produced by an interlocutor communicating with the interactive robot, and the interactive robot further comprises an image acquiring unit that acquires images including the interlocutor as the sound source of the target voices; and a mouth-movement detecting unit that detects, from the images acquired in the image acquiring unit, mouth movement caused by voices produced by the interlocutor, wherein the moving unit moves the interactive robot itself in the direction of the sound source when the recognition-accuracy is smaller than the threshold for the recognition accuracy, and the mouth movement of the interlocutor is detected.

7. The interactive robot according to claim 1, wherein the target voices are voices produced by an interlocutor communicating with the interactive robot, and the interactive robot further comprises an image acquiring unit that acquires images including the interlocutor as the sound source of the target voices; and a mouth-movement detecting unit that detects, from the images, mouth movement caused by voices produced by the interlocutor, wherein the moving unit moves the interactive robot itself in the direction of the sound source when the recognition-accuracy is equal to or larger than the threshold for the recognition accuracy, and the mouth movement of the interlocutor is not detected.

8.
The interactive robot according to claim 1, wherein the target voices are voices produced by an interlocutor communicating with the interactive robot, the interactive robot further comprises an image acquiring unit that acquires images including the interlocutor as the sound source of the target voices; and a mouth-movement detecting unit that detects, from the images acquired in the image acquiring unit, mouth movement caused by voices produced by the interlocutor, wherein the moving unit moves the interactive robot in the direction of the sound source when the mouth movement is detected and the target voices are not acquired.

9. The interactive robot according to claim 1, wherein the target voices are voices produced by an interlocutor communicating with the interactive robot, and the interactive robot further comprises an image acquiring unit that acquires images including the interlocutor as the sound source of the target voices; and a mouth-movement detecting unit that detects, from the images, mouth movement of the interlocutor, wherein the moving unit moves the interactive robot in the direction of the sound source when the mouth movement is not detected and the target voices are not acquired.

10. The interactive robot according to claim 1, further comprising a microphone array that has a plurality of microphones which pick up the target voices, wherein the direction of the sound source is estimated, based on differential arrival time between plane waves of the target voices picked up with corresponding voice microphones.

11. The interactive robot according to claim 1, further comprising a distance measuring sensor that measures a distance between the target voices and the interactive robot, wherein the sound-source-direction estimating unit estimates the direction of the sound source, based on measured results.

12.
The interactive robot according to claim 1, further comprising an image forming unit that forms an image of the sound source of the target voices, wherein the sound-source-direction estimating unit estimates the direction of the sound source, assuming that an image-forming direction is the direction of the sound source.

13. The interactive robot according to claim 1, further comprising: a signal-strength measurement unit that measures signal strength of the target voices at a position after the interactive robot is moved by the moving unit; and an amplification-gain-adjustment unit that, based on the value of the signal strength, adjusts a gain of amplification by which voice signal of the target voices is amplified, wherein the speech recognizing unit performs speech recognition of the target voices acquired after the gain of amplification is adjusted.

14. A computer-implemented method for an interactive robot capable of speech recognition, the method comprising: estimating a direction of the sound source of target voices which are required to undergo speech recognition; moving the interactive robot in the direction of the sound source; acquiring the target voices when the interactive robot is located at a position after moving; performing speech recognition of the target voices by pattern matching of voice patterns of the target voices, which are held in a target-voice holding unit, with the acquired target voices, where the target voices held in the target-voice holding unit include misrecognition-notification voices signifying that speech recognition is erroneous; calculating, as an accuracy of recognition results, an agreement accuracy between the acquired target voices and the voice patterns of the target voices held in the target-voice holding unit; and moving the interactive robot itself in the direction of the sound source when the recognition accuracy for results of speech recognition of the target voices is smaller than a predetermined
recognition-accuracy threshold and when the misrecognition-notification voices held in the target-voice holding unit are recognized.

15. A computer program product having a computer readable medium including programmed instructions for performing speech recognition processing on an interactive robot capable of speech recognition, wherein the instructions, when executed by a computer, cause the computer to perform: estimating a direction of the sound source of target voices which are required to undergo speech recognition; moving the interactive robot in the direction of the sound source; acquiring the target voices when the interactive robot is located at a position after moving; performing speech recognition of the target voices by pattern matching of voice patterns of the target voices, which are held in a target-voice holding unit, with the acquired target voices, where the target voices held in the target-voice holding unit include misrecognition-notification voices signifying that speech recognition is erroneous; calculating, as an accuracy of recognition result, an agreement accuracy between the acquired target voices and the voice patterns of the target voices held in the target-voice holding unit; and moving the interactive robot itself in the direction of the sound source when the recognition accuracy for results of speech recognition of the target voices is smaller than a predetermined recognition-accuracy threshold and when the misrecognition-notification voices held in the target-voice holding unit are recognized.
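Claim 1's movement trigger combines two conditions: the pattern-matching agreement accuracy falls below a threshold, and a dedicated misrecognition-notification phrase (the speaker objecting to a wrong result) is itself recognized. A minimal sketch of that decision, with illustrative threshold values not taken from the patent:

```python
def should_reapproach(agreement_accuracy, accuracy_threshold, misrecognition_notified):
    """Claim 1: the robot re-approaches the sound source only when the
    agreement accuracy between the acquired voice and the stored voice
    patterns is below the threshold AND a misrecognition-notification
    voice held in the target-voice holding unit was recognized."""
    return agreement_accuracy < accuracy_threshold and misrecognition_notified

print(should_reapproach(0.42, 0.60, True))   # True: poor match and the user objected
print(should_reapproach(0.80, 0.60, True))   # False: the match was good enough
print(should_reapproach(0.42, 0.60, False))  # False: no objection was heard
```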
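Claims 3 through 5 gate movement on the signal-to-noise ratio, optionally cross-checked against detected mouth movement. The sketch below uses one common dB formulation of SNR (the claims do not fix a particular definition) and combines the two mouth-movement variants:

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels (an assumed formulation)."""
    return 10.0 * math.log10(signal_power / noise_power)

def should_move_snr(snr, snr_threshold, mouth_moving):
    """Claims 4 and 5 combined: approach when the voice is too weak but
    the interlocutor is visibly speaking (claim 4), or when the signal is
    strong yet no mouth movement is seen, suggesting the sound is not the
    interlocutor's voice (claim 5)."""
    if snr < snr_threshold and mouth_moving:
        return True   # claim 4: weak voice, speaker confirmed by vision
    if snr >= snr_threshold and not mouth_moving:
        return True   # claim 5: strong sound, but not from the interlocutor
    return False

print(snr_db(100.0, 1.0))                  # 20.0 dB
print(should_move_snr(5.0, 10.0, True))    # True  (claim 4)
print(should_move_snr(15.0, 10.0, False))  # True  (claim 5)
print(should_move_snr(15.0, 10.0, True))   # False: strong voice from the speaker
```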
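Claim 10 estimates the source direction from the differential arrival time of plane waves at a microphone array. For a single two-microphone pair under a far-field plane-wave assumption, the bearing follows from the arcsine of the path-length difference over the spacing; the spacing value below is illustrative:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def bearing_from_tdoa(tdoa_s, mic_spacing_m):
    """Far-field bearing (degrees from the array's broadside direction)
    recovered from the time difference of arrival of a plane wave at a
    two-microphone pair, in the spirit of claim 10."""
    path_difference = SPEED_OF_SOUND * tdoa_s
    # Clamp for numerical safety before the arcsine.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing_m))
    return math.degrees(math.asin(ratio))

print(bearing_from_tdoa(0.0, 0.15))  # 0.0: simultaneous arrival, source straight ahead
print(round(bearing_from_tdoa(0.15 / SPEED_OF_SOUND, 0.15), 1))  # 90.0: source on the array axis
```

With more than two microphones, pairwise estimates of this kind can be combined to disambiguate and refine the direction.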
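Claim 13 measures the signal strength of the target voice after the robot has moved and adjusts the amplification gain before recognition. A sketch of one plausible control law; the RMS-based scaling and the gain cap are assumptions, not the patent's method:

```python
def adjust_gain(measured_rms, target_rms, current_gain, max_gain=8.0):
    """Claim 13 sketched: scale the amplifier gain so the voice signal
    reaches the recognizer near a target level, capped at an assumed
    maximum gain to avoid amplifying noise without bound."""
    if measured_rms <= 0.0:
        return current_gain  # nothing measured; keep the old gain
    return min(max_gain, current_gain * target_rms / measured_rms)

print(adjust_gain(0.05, 0.20, 1.0))  # 4.0: quiet voice, gain raised
print(adjust_gain(0.40, 0.20, 1.0))  # 0.5: loud voice, gain lowered
```

Recognition then runs on the voice acquired after this adjustment, which is why the measurement is taken at the post-movement position.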
Patents cited by this patent (4)
Chigier, Benjamin (Brookline, MA), Automatic speech recognition.
Petroni, Marco (CA); Peters, Steven Douglas (CA), Method and apparatus for providing an improved feature set in speech recognition by performing noise cancellation and background masking.
Patents citing this patent
Osterhout, Ralph F.; Haddick, John D.; Lohse, Robert Michael; Cella, Charles; Nortrup, Robert J.; Nortrup, Edward H., AR glasses with event and sensor triggered AR eyepiece interface to external devices.
Osterhout, Ralph F.; Haddick, John D.; Lohse, Robert Michael; Cella, Charles; Nortrup, Robert J.; Nortrup, Edward H., AR glasses with event and sensor triggered control of AR eyepiece applications.
Osterhout, Ralph F.; Haddick, John D.; Lohse, Robert Michael; Cella, Charles; Nortrup, Robert J.; Nortrup, Edward H., AR glasses with event and user action control of external applications.
Osterhout, Ralph F.; Haddick, John D.; Lohse, Robert Michael; Border, John N.; Miller, Gregory D.; Stovall, Ross W., Eyepiece with uniformly illuminated reflective display.
Miller, Gregory D.; Border, John N.; Osterhout, Ralph F., Grating in a light transmissive illumination system for see-through near-eye display glasses.
Miller, Gregory D.; Border, John N.; Osterhout, Ralph F., Optical imperfections in a light transmissive illumination system for see-through near-eye display glasses.
Border, John N.; Bietry, Joseph; Osterhout, Ralph F., See-through near-eye display glasses including a curved polarizing film in the image source, a partially reflective, partially transmitting optical element and an optically flat film.
Border, John N.; Haddick, John D.; Osterhout, Ralph F., See-through near-eye display glasses including a partially reflective, partially transmitting optical element.
Border, John N.; Osterhout, Ralph F., See-through near-eye display glasses including an auto-brightness control for the display brightness based on the brightness in the environment.
Border, John N.; Bietry, Joseph; Osterhout, Ralph F., See-through near-eye display glasses wherein image light is transmitted to and reflected from an optically flat film.
Border, John N.; Osterhout, Ralph F., See-through near-eye display glasses with a fast response photochromic film system for quick transition from dark to clear.
Border, John N.; Haddick, John D.; Osterhout, Ralph F., See-through near-eye display glasses with a light transmissive wedge shaped illumination system.
Border, John N.; Haddick, John D.; Lohse, Robert Michael; Osterhout, Ralph F., See-through near-eye display glasses with the optical assembly including absorptive polarizers or anti-reflective coatings to reduce stray light.