IPC Classification Information
Country / Type |
United States(US) Patent
Granted
|
International Patent Classification (IPC, 7th edition) |
|
Application No. |
US-0349872
(2003-01-23)
|
Registration No. |
US-7274800
(2007-09-25)
|
Inventor / Address |
- Nefian,Ara Victor
- Grzesczuk,Radek
- Eruhimov,Victor
|
Applicant / Address |
|
Agent / Address |
Blakely, Sokoloff, Taylor & Zafman LLP
|
Citation Information |
Times cited: 116
Patents cited: 32 |
Abstract
According to an embodiment, an apparatus and method are disclosed for dynamic gesture recognition from stereo sequences. In an embodiment, a stereo sequence of images of a subject is obtained and a depth disparity map is generated from the stereo sequence. The system is initiated automatically based upon a statistical model of the upper body of the subject. The upper body of the subject is modeled as three planes, representing the torso and arms of the subject, and three Gaussian components, representing the head and hands of the subject. The system tracks the upper body of the subject using the statistical upper body model and extracts three-dimensional features of the gestures performed. The system recognizes the gestures using recognition units, which, under a particular embodiment, utilize hidden Markov models for the three-dimensional gestures.
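The recognition step named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes one hidden Markov model per gesture, scores an observation sequence against each model with the forward algorithm, and picks the best-scoring gesture. For brevity the three-dimensional features are assumed to be quantized into three hypothetical symbols (0 = hand low, 1 = hand mid, 2 = hand high). The gesture names, model parameters, and function names are all invented for illustration.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | model) for a discrete-output HMM."""
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Return the name of the gesture model with the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical two-state, left-to-right models: (start, transition, emission).
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "raise_hand": ([1.0, 0.0], LEFT_TO_RIGHT,
                   [[0.80, 0.15, 0.05], [0.05, 0.15, 0.80]]),
    "lower_hand": ([1.0, 0.0], LEFT_TO_RIGHT,
                   [[0.05, 0.15, 0.80], [0.80, 0.15, 0.05]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 2, 2], MODELS))
    print(classify([2, 2, 1, 0, 0], MODELS))
```

Working in the log domain avoids underflow on long sequences. A system operating on continuous three-dimensional features would more plausibly use Gaussian emission densities than the discrete table assumed here.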
Representative Claims
What is claimed is:
1. A method comprising: capturing a sequence of stereo images, the stereo images including at least a portion of a subject performing a dynamic gesture; obtaining depth disparities relating to the stereo images; automatically initializing parameters of a statistical model of the subject based upon matching an image of the subject to the statistical model; tracking the subject using the statistical model of the subject; extracting three-dimensional features from the stereo images; and interpreting the dynamic gesture performed by the subject.
2. The method of claim 1, further comprising segmenting an image of the subject into subparts.
3. The method of claim 2, wherein the subparts represent at least the torso, head, arms, and hands of the subject.
4. The method of claim 1, wherein the statistical model of the subject models the arms and torso of the subject as planes.
5. The method of claim 1, wherein the statistical model of the subject models the head and hands of the subject as Gaussian components.
6. The method of claim 1, further comprising removing the background from the stereo images.
7. The method of claim 6, wherein removing the background from the stereo images comprises eliminating any portion of the stereo images that is more than a given distance away from a location.
8. The method of claim 1, wherein the stereo images are captured using a stereo camera.
9. The method of claim 1, wherein obtaining depth disparities comprises generating a depth disparity map.
10. The method of claim 1, wherein interpreting the dynamic gesture comprises comparing the dynamic gesture to a three-dimensional model of a gesture.
11. The method of claim 10, wherein comparing the dynamic gesture to a three-dimensional model of a gesture includes the use of hidden Markov models of three-dimensional gestures.
12. A gesture recognition system comprising: an imaging device to capture a sequence of three-dimensional images of at least a portion of a subject and a background, the subject performing a dynamic gesture; a processor to perform operations comprising: processing a set of depth disparities relating to the stereo images; automatically initializing parameters of a statistical model of the subject based upon matching an image of the subject to the statistical model; tracking the subject using the statistical model of the subject; extracting three-dimensional features from the subject; and interpreting the dynamic gesture performed by the subject.
13. The gesture recognition system of claim 12, wherein the imaging device is a stereo video camera.
14. The gesture recognition system of claim 12, wherein the processor further performs operations comprising removing the background from the sequence of stereo images.
15. The gesture recognition system of claim 14, wherein removing the background from the sequence of stereo images comprises eliminating any portion of the images that is farther away from the imaging device than a given distance.
16. The gesture recognition system of claim 12, wherein the processor further performs operations comprising segmenting an image of the subject into subparts.
17. The gesture recognition system of claim 16, wherein the subparts represent at least the torso, head, arms, and hands of the subject.
18. The gesture recognition system of claim 12, wherein the statistical model of the subject models the arms and torso of the subject as planes.
19. The gesture recognition system of claim 12, wherein the statistical model of the subject models the head and hands of the subject as Gaussian components.
20. The gesture recognition system of claim 12, wherein interpreting the dynamic gesture performed by the subject comprises comparing the dynamic gesture to a three-dimensional model of a gesture.
21. The gesture recognition system of claim 20, wherein comparing the dynamic gesture to a three-dimensional model of a gesture includes the use of hidden Markov models of three-dimensional gestures.
22. A machine-readable medium having stored thereon data representing sequences of instruction that, when executed by a machine, cause the machine to perform operations comprising: capturing a sequence of stereo images, the stereo images including at least a portion of a subject performing a dynamic gesture; obtaining depth disparities relating to the stereo images; automatically initializing parameters of a statistical model of the subject based upon matching an image of the subject to the statistical model; tracking the subject using the statistical model of the subject; extracting three-dimensional features from the stereo images; and interpreting the dynamic gesture performed by the subject.
23. The medium of claim 22, further comprising sequences of instruction that, when executed by a machine, cause the machine to perform operations comprising segmenting an image of the subject into subparts.
24. The medium of claim 23, wherein the subparts represent at least the torso, head, arms, and hands of the subject.
25. The medium of claim 22, wherein the statistical model of the subject models the arms and torso of the subject as planes.
26. The medium of claim 22, wherein the statistical model of the subject models the head and hands of the subject as Gaussian components.
27. The medium of claim 22, further comprising sequences of instruction that, when executed by a machine, cause the machine to perform operations comprising removing the background from the stereo images.
28. The medium of claim 27, wherein removing the background from the stereo images comprises eliminating any portion of the stereo images that is more than a given distance away from a location.
29. The medium of claim 22, wherein the stereo images are captured using a stereo camera.
30. The medium of claim 22, wherein obtaining depth disparities comprises generating a depth disparity map.
31. The medium of claim 22, wherein interpreting the dynamic gesture comprises comparing the dynamic gesture to a three-dimensional model of a gesture.
32. The medium of claim 31, wherein comparing the dynamic gesture to a three-dimensional model of a gesture includes the use of hidden Markov models of three-dimensional gestures.
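The background-removal claims (eliminating image portions farther than a given distance) can be sketched as a simple depth threshold. This is an illustrative reading, not the patent's implementation. It assumes a rectified stereo pair, for which depth follows the standard relation Z = f * B / d (focal length f in pixels, baseline B in meters, disparity d in pixels). The function names, camera parameters, and threshold below are hypothetical.

```python
def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to depth (meters) via Z = f * B / d.

    Pixels with non-positive disparity are treated as unmatched and become None.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity
    ]


def remove_background(depth, max_distance_m):
    """Keep only pixels no farther from the camera than max_distance_m.

    Pixels beyond the threshold, or with unknown depth, are set to None.
    """
    return [
        [z if z is not None and z <= max_distance_m else None for z in row]
        for row in depth
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 disparity map and camera parameters.
    disparity = [[50.0, 10.0], [0.0, 25.0]]
    depth = disparity_to_depth(disparity, focal_px=400.0, baseline_m=0.125)
    foreground = remove_background(depth, max_distance_m=2.5)
    print(foreground)
```

Thresholding on depth rather than on color lets the subject be isolated regardless of background appearance, provided the subject stands closer to the camera than anything else in view.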