[US Patent]
Action recognition and detection on videos
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC 7th edition)
G06K-009/00
G06K-009/62
G10L-015/14
Application number
US-0100595
(2013-12-09)
Registration number
US-9230159
(2016-01-05)
Inventor / Address
Vijayanarasimhan, Sudheendra
Varadarajan, Balakrishnan
Sukthankar, Rahul
Applicant / Address
Google Inc.
Agent / Address
Bryne Poh LLP
Citation information
Times cited: 3
Cited patents: 6
Abstract
This disclosure generally relates to systems and methods that facilitate employing exemplar Histogram of Oriented Gradients Linear Discriminant Analysis (HOG-LDA) models along with Localizer Hidden Markov Models (HMM) to train a classification model to classify actions in videos by learning poses and transitions between the poses associated with the actions in a view of a continuous state represented by bounding boxes corresponding to where the action is located in frames of the video.
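The idea of tracking an action through a video as a sequence of states, each tied to a bounding box, can be illustrated with a small dynamic-programming sketch. It is not the patented implementation: it assumes each candidate box in a frame stands in for a (hidden state, bounding box) pair, that the path score is the sum of template matching scores minus a squared-distance temporal penalty, and that boxes are `(x, y, width, height)` tuples. The names `viterbi_boxes`, `box_distance` and `temporal_weight` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed layout


def box_distance(a: Box, b: Box) -> float:
    """Squared difference between two boxes, used as a temporal penalty."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def viterbi_boxes(
    candidates: Sequence[Sequence[Box]],
    scores: Sequence[Sequence[float]],
    temporal_weight: float = 1.0,
) -> List[int]:
    """Pick one candidate per frame, maximizing the summed matching scores
    minus temporal_weight times the box change between consecutive frames.

    candidates[t][i] is the i-th candidate box in frame t and scores[t][i]
    is its template matching score. Returns the chosen index per frame.
    """
    if not candidates:
        return []
    best = list(scores[0])          # best path score ending at each candidate
    back: List[List[int]] = []      # back-pointers for frames 1..T-1
    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for j, box in enumerate(candidates[t]):
            options = [
                best[i] - temporal_weight * box_distance(prev, box)
                for i, prev in enumerate(candidates[t - 1])
            ]
            i_star = max(range(len(options)), key=options.__getitem__)
            new_best.append(options[i_star] + scores[t][j])
            pointers.append(i_star)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final candidate.
    j = max(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path
```

With a positive `temporal_weight` the selected boxes tend to stay close to each other from frame to frame; with a weight of zero the sketch reduces to picking the highest-scoring candidate in every frame independently.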
Representative claims
1. A method, comprising: accessing, by a system including a processor, a set of training videos respectively classified for an action of a plurality of actions; learning, by the system, a plurality of exemplar Histogram of Oriented Gradients Linear Discriminant Analysis (HOG-LDA) templates using a HOG-LDA model on the set of training videos; estimating, by the system, respective sets of candidate bounding boxes for each frame of the set of training videos using the learned exemplar HOG-LDA templates, wherein each candidate bounding box has an associated template matching score; inferring, by the system, respective discrete hidden states and a respective bounding box for each discrete hidden state for a plurality of localizer Hidden Markov Models (HMM) using the estimated sets of candidate bounding boxes, where each localizer HMM is associated with an action of the plurality of actions; and determining, by the system, a respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs using an objective function.
2. The method of claim 1, wherein respective videos of the set of training videos comprise a plurality of frames respectively comprising a bounding box around an area of the frame corresponding to the action to which the video is classified.
3. The method of claim 2, wherein the objective function considers an overlap function comparing inferred bounding boxes for each frame of the set of training videos with corresponding bounding boxes included with each frame of the set of training videos.
4. The method of claim 1, wherein the set of parameters comprises: a width penalty weight for an inferred bounding box for the discrete hidden state to deviate from an estimated mean width of bounding boxes associated with the discrete hidden state; and a height penalty weight for the inferred bounding box for the discrete hidden state to deviate from an estimated mean height of bounding boxes associated with the discrete hidden state; a temporal consistency cost for the inferred bounding box for the discrete hidden state to deviate from a previous inferred bounding box for a previous discrete hidden state immediately preceding the discrete hidden state in a sequence of discrete hidden states associated with a localizer HMM; a template consistency cost for the inferred bounding box for the discrete hidden state to deviate from a candidate bounding box for the discrete hidden state; a scale for calibrating template matching scores associated with the discrete hidden state across the set of actions; and an offset for calibrating template matching scores associated with the discrete hidden state across the set of actions.
5. The method of claim 1, wherein the inferring the respective discrete hidden states and a respective bounding box for each discrete hidden state for the plurality of localizer Hidden Markov Models (HMM) comprises employing a Viterbi inference algorithm on the estimated sets of candidate bounding boxes.
6. The method of claim 1, wherein the determining the respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs comprises employing a gradient based approach on the objective function.
7. The method of claim 6, wherein the employing a gradient based approach on the objective function comprises generating a random direction.
8. The method of claim 1, wherein the objective function considers a soft-max of the cost associated with a localizer HMM associated with an action against costs associated with other localizer HMMs associated with other actions.
9. A non-transitory computer-readable medium having instructions stored thereon that, in response to execution, cause a system including a processor to perform operations comprising: accessing a set of training videos respectively classified for an action of a plurality of actions; learning a plurality of exemplar Histogram of Oriented Gradients Linear Discriminant Analysis (HOG-LDA) templates using a HOG-LDA model on the set of training videos; estimating respective sets of candidate bounding boxes for each frame of the set of training videos using the learned exemplar HOG-LDA templates, wherein each candidate bounding box has an associated template matching score; inferring respective discrete hidden states and a respective bounding box for each discrete hidden state for a plurality of localizer Hidden Markov Models (HMM) using the estimated sets of candidate bounding boxes, where each localizer HMM is associated with an action of the plurality of actions; and determining a respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs using an objective function.
10. The non-transitory computer-readable medium of claim 9, wherein respective videos of the set of training videos comprise a plurality of frames respectively comprising a bounding box around an area of the frame corresponding to the action to which the video is classified.
11. The non-transitory computer-readable medium of claim 10, wherein the objective function considers an overlap function comparing inferred bounding boxes for each frame of the set of training videos with corresponding bounding boxes included with each frame of the set of training videos.
12. The non-transitory computer-readable medium of claim 9, wherein the set of parameters comprises: a width penalty weight for an inferred bounding box for the discrete hidden state to deviate from an estimated mean width of bounding boxes associated with the discrete hidden state; and a height penalty weight for the inferred bounding box for the discrete hidden state to deviate from an estimated mean height of bounding boxes associated with the discrete hidden state; a temporal consistency cost for the inferred bounding box for the discrete hidden state to deviate from a previous inferred bounding box for a previous discrete hidden state immediately preceding the discrete hidden state in a sequence of discrete hidden states associated with a localizer HMM; a template consistency cost for the inferred bounding box for the discrete hidden state to deviate from a candidate bounding box for the discrete hidden state; a scale for calibrating template matching scores associated with the discrete hidden state across the set of actions; and an offset for calibrating template matching scores associated with the discrete hidden state across the set of actions.
13. The non-transitory computer-readable medium of claim 9, wherein the inferring the respective discrete hidden states and a respective bounding box for each discrete hidden state for the plurality of localizer Hidden Markov Models (HMM) comprises employing a Viterbi inference algorithm on the estimated sets of candidate bounding boxes.
14. The non-transitory computer-readable medium of claim 9, wherein the determining the respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs comprises employing a gradient based approach on the objective function.
15. The non-transitory computer-readable medium of claim 14, wherein the employing a gradient based approach on the objective function comprises generating a random direction.
16. The non-transitory computer-readable medium of claim 9, wherein the objective function considers a soft-max of the cost associated with a localizer HMM associated with an action against costs associated with other localizer HMMs associated with other actions.
17. A system comprising: a processor; and a memory communicatively coupled to the processor, the memory having stored therein computer-executable instructions, comprising: a HOG-LDA training component configured to: access a set of training videos respectively classified for an action of a plurality of actions; and learn a plurality of exemplar Histogram of Oriented Gradients Linear Discriminant Analysis (HOG-LDA) templates using a HOG-LDA model on the set of training videos; a HOG-LDA scoring component configured to estimate respective sets of candidate bounding boxes for each frame of the set of training videos using the learned exemplar HOG-LDA templates, wherein each candidate bounding box has an associated template matching score; a Viterbi inference component configured to infer respective discrete hidden states and a respective bounding box for each discrete hidden state for a plurality of localizer Hidden Markov Models (HMM) using the estimated sets of candidate bounding boxes, where each localizer HMM is associated with an action of the plurality of actions; and an objective maximization component configured to determine a respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs using an objective function.
18. The system of claim 17, wherein respective videos of the set of training videos comprise a plurality of frames respectively comprising a bounding box around an area of the frame corresponding to the action to which the video is classified.
19. The system of claim 18, wherein the objective function considers an overlap function configured to compare inferred bounding boxes for each frame of the set of training videos with corresponding bounding boxes included with each frame of the set of training videos.
20. The system of claim 17, wherein the set of parameters comprises: a width penalty weight for an inferred bounding box for the discrete hidden state to deviate from an estimated mean width of bounding boxes associated with the discrete hidden state; and a height penalty weight for the inferred bounding box for the discrete hidden state to deviate from an estimated mean height of bounding boxes associated with the discrete hidden state; a temporal consistency cost for the inferred bounding box for the discrete hidden state to deviate from a previous inferred bounding box for a previous discrete hidden state immediately preceding the discrete hidden state in a sequence of discrete hidden states associated with a localizer HMM; a template consistency cost for the inferred bounding box for the discrete hidden state to deviate from a candidate bounding box for the discrete hidden state; a scale for calibrating template matching scores associated with the discrete hidden state across the set of actions; and an offset for calibrating template matching scores associated with the discrete hidden state across the set of actions.
21. The system of claim 17, wherein the Viterbi inference component is further configured to employ a Viterbi inference algorithm on the estimated sets of candidate bounding boxes to infer the respective discrete hidden states and the respective bounding box for each discrete hidden state for the plurality of localizer Hidden Markov Models (HMM).
22. The system of claim 17, wherein the objective maximization component is further configured to employ a gradient based approach on the objective function to determine the respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs.
23. The system of claim 22, wherein the gradient based approach uses a random direction.
24. The system of claim 17, wherein the objective function considers a soft-max of the cost associated with a localizer HMM associated with an action against costs associated with other localizer HMMs associated with other actions.
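The claims name three ingredients of the training objective without giving formulas: an overlap function between inferred and annotated bounding boxes, a soft-max of one action's HMM cost against the costs of the other actions, and a gradient based approach that generates a random direction. The sketch below shows one plausible reading of each and is not taken from the patent. It assumes intersection-over-union as the overlap measure, a log soft-max over negated costs, and a finite-difference step along a random unit direction. All function names and the `(x, y, width, height)` box layout are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed layout


def overlap(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (assumed overlap measure)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def softmax_log_prob(costs: Sequence[float], true_action: int) -> float:
    """Log of exp(-cost[true]) / sum_a exp(-cost[a]).

    A lower cost for the labelled action relative to the other actions
    gives a value closer to zero (higher is better).
    """
    neg = [-c for c in costs]
    m = max(neg)  # subtract the maximum for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in neg))
    return neg[true_action] - log_z


def random_direction_step(
    objective: Callable[[List[float]], float],
    params: List[float],
    step: float = 0.1,
    eps: float = 1e-3,
    rng: random.Random = random.Random(0),
) -> List[float]:
    """One ascent step along a random unit direction.

    The slope along the direction is estimated with a central finite
    difference, then the parameters move along it in proportion to the slope.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in params]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    direction = [d / norm for d in direction]
    plus = objective([p + eps * d for p, d in zip(params, direction)])
    minus = objective([p - eps * d for p, d in zip(params, direction)])
    slope = (plus - minus) / (2.0 * eps)
    return [p + step * slope * d for p, d in zip(params, direction)]
```

In a training loop under these assumptions, the objective passed to `random_direction_step` would combine `softmax_log_prob` over the per-action HMM costs with the mean `overlap` between inferred and annotated boxes, and the parameter vector would hold the per-state weights, scale and offset listed in claims 4, 12 and 20.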
Patents cited by this patent (6)
Liu, Zicheng; Wang, Jiang, Action recognition based on depth maps.
Goronzy-Thomae, Silke; Kemp, Thomas; Kompe, Ralf; Lam, Yin Hay; Marasek, Krzysztof; Tato, Raquel, Apparatus and method for automatic extraction of important events in audio signals.
Liang, Yiqing; Crnic, Linda; Kobla, Vikrant; Wolf, Wayne, System and method for object identification and behavior characterization using video analysis.
Bernal, Edgar A.; Li, Qun; Zhang, Yun; Kumar, Jayant; Bala, Raja, System and method for relevance estimation in summarization of videos of multi-step activities.
De Souza, César Roberto; Gaidon, Adrien; Vig, Eleonora; Lopez, Antonio M., System and method for video classification using a hybrid unsupervised and supervised multi-layer architecture.