[US Patent] Action recognition and detection on videos

IPC Classification Information
Country/Type	United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.)	G06K-009/00 G06K-009/62 G10L-015/14
Application No.	US-0100595 (2013-12-09)
Patent No.	US-9230159 (2016-01-05)
Inventors / Address	Vijayanarasimhan, Sudheendra; Varadarajan, Balakrishnan; Sukthankar, Rahul
Applicant / Address	Google Inc.
Agent / Address	Bryne Poh LLP
Citations	Cited by: 3; Patents cited: 6

Abstract

This disclosure generally relates to systems and methods that facilitate employing exemplar Histogram of Oriented Gradients Linear Discriminant Analysis (HOG-LDA) models along with Localizer Hidden Markov Models (HMM) to train a classification model to classify actions in videos by learning poses and transitions between the poses associated with the actions in a view of a continuous state represented by bounding boxes corresponding to where the action is located in frames of the video.
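The abstract names exemplar HOG-LDA templates as the first stage. The record does not give the formulas, so the following is a minimal sketch assuming the common exemplar-LDA formulation: each exemplar's template is w = Σ⁻¹(x_E − μ₀), where μ₀ and Σ are the mean and covariance of background HOG features, and a window's template matching score is the dot product w·x. Feature vectors are assumed to be precomputed, flattened HOG descriptors (HOG extraction itself is not shown), and all function names and the `ridge` regularizer are hypothetical. Pure Python is used so the linear algebra stays explicit.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_and_covariance(samples: Sequence[Vector], ridge: float = 1e-3) -> Tuple[Vector, Matrix]:
    """Background mean mu_0 and covariance Sigma; ridge keeps Sigma invertible."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[i] for s in samples) / n for i in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
            + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    return mu, cov


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):  # back substitution
        x[r] = (m[r][d] - sum(m[r][c] * x[c] for c in range(r + 1, d))) / m[r][r]
    return x


def exemplar_lda_template(exemplar: Vector, mu: Vector, cov: Matrix) -> Vector:
    """w = Sigma^{-1} (x_exemplar - mu_0): one template per exemplar frame."""
    return solve(cov, [x - m for x, m in zip(exemplar, mu)])


def template_score(w: Vector, window: Vector) -> float:
    """Template matching score of one window's HOG feature vector."""
    return sum(wi * xi for wi, xi in zip(w, window))


def top_candidates(w: Vector, windows: Sequence[Tuple[tuple, Vector]], k: int):
    """windows: (box, feature) pairs; returns the k best (score, box) pairs."""
    scored = [(template_score(w, f), box) for box, f in windows]
    scored.sort(key=lambda sb: sb[0], reverse=True)
    return scored[:k]
```

Because μ₀ and Σ are shared by every exemplar, each additional template costs one linear solve rather than training a separate classifier. The top-scoring windows per frame would then serve as the candidate bounding boxes, each carrying its template matching score, that the claims pass to the localizer HMMs.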

Claims

1. A method, comprising: accessing, by a system including a processor, a set of training videos respectively classified for an action of a plurality of actions; learning, by the system, a plurality of exemplar Histogram of Oriented Gradients Linear Discriminant Analysis (HOG-LDA) templates using a HOG-LDA model on the set of training videos; estimating, by the system, respective sets of candidate bounding boxes for each frame of the set of training videos using the learned exemplar HOG-LDA templates, wherein each candidate bounding box has an associated template matching score; inferring, by the system, respective discrete hidden states and a respective bounding box for each discrete hidden state for a plurality of localizer Hidden Markov Models (HMM) using the estimated sets of candidate bounding boxes, where each localizer HMM is associated with an action of the plurality of actions; and determining, by the system, a respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs using an objective function.

2. The method of claim 1, wherein respective videos of the set of training videos comprise a plurality of frames respectively comprising a bounding box around an area of the frame corresponding to the action to which the video is classified.

3. The method of claim 2, wherein the objective function considers an overlap function comparing inferred bounding boxes for each frame of the set of training videos with corresponding bounding boxes included with each frame of the set of training videos.

4. The method of claim 1, wherein the set of parameters comprises: a width penalty weight for an inferred bounding box for the discrete hidden state to deviate from an estimated mean width of bounding boxes associated with the discrete hidden state; a height penalty weight for the inferred bounding box for the discrete hidden state to deviate from an estimated mean height of bounding boxes associated with the discrete hidden state; a temporal consistency cost for the inferred bounding box for the discrete hidden state to deviate from a previous inferred bounding box for a previous discrete hidden state immediately preceding the discrete hidden state in a sequence of discrete hidden states associated with a localizer HMM; a template consistency cost for the inferred bounding box for the discrete hidden state to deviate from a candidate bounding box for the discrete hidden state; a scale for calibrating template matching scores associated with the discrete hidden state across the set of actions; and an offset for calibrating template matching scores associated with the discrete hidden state across the set of actions.

5. The method of claim 1, wherein the inferring the respective discrete hidden states and a respective bounding box for each discrete hidden state for the plurality of localizer Hidden Markov Models (HMM) comprises employing a Viterbi inference algorithm on the estimated sets of candidate bounding boxes.

6. The method of claim 1, wherein the determining the respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs comprises employing a gradient-based approach on the objective function.

7. The method of claim 6, wherein the employing a gradient-based approach on the objective function comprises generating a random direction.

8. The method of claim 1, wherein the objective function considers a soft-max of the cost associated with a localizer HMM associated with an action against costs associated with other localizer HMMs associated with other actions.

9. A non-transitory computer-readable medium having instructions stored thereon that, in response to execution, cause a system including a processor to perform operations comprising: accessing a set of training videos respectively classified for an action of a plurality of actions; learning a plurality of exemplar Histogram of Oriented Gradients Linear Discriminant Analysis (HOG-LDA) templates using a HOG-LDA model on the set of training videos; estimating respective sets of candidate bounding boxes for each frame of the set of training videos using the learned exemplar HOG-LDA templates, wherein each candidate bounding box has an associated template matching score; inferring respective discrete hidden states and a respective bounding box for each discrete hidden state for a plurality of localizer Hidden Markov Models (HMM) using the estimated sets of candidate bounding boxes, where each localizer HMM is associated with an action of the plurality of actions; and determining a respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs using an objective function.

10. The non-transitory computer-readable medium of claim 9, wherein respective videos of the set of training videos comprise a plurality of frames respectively comprising a bounding box around an area of the frame corresponding to the action to which the video is classified.

11. The non-transitory computer-readable medium of claim 10, wherein the objective function considers an overlap function comparing inferred bounding boxes for each frame of the set of training videos with corresponding bounding boxes included with each frame of the set of training videos.

12. The non-transitory computer-readable medium of claim 9, wherein the set of parameters comprises: a width penalty weight for an inferred bounding box for the discrete hidden state to deviate from an estimated mean width of bounding boxes associated with the discrete hidden state; a height penalty weight for the inferred bounding box for the discrete hidden state to deviate from an estimated mean height of bounding boxes associated with the discrete hidden state; a temporal consistency cost for the inferred bounding box for the discrete hidden state to deviate from a previous inferred bounding box for a previous discrete hidden state immediately preceding the discrete hidden state in a sequence of discrete hidden states associated with a localizer HMM; a template consistency cost for the inferred bounding box for the discrete hidden state to deviate from a candidate bounding box for the discrete hidden state; a scale for calibrating template matching scores associated with the discrete hidden state across the set of actions; and an offset for calibrating template matching scores associated with the discrete hidden state across the set of actions.

13. The non-transitory computer-readable medium of claim 9, wherein the inferring the respective discrete hidden states and a respective bounding box for each discrete hidden state for the plurality of localizer Hidden Markov Models (HMM) comprises employing a Viterbi inference algorithm on the estimated sets of candidate bounding boxes.

14. The non-transitory computer-readable medium of claim 9, wherein the determining the respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs comprises employing a gradient-based approach on the objective function.

15. The non-transitory computer-readable medium of claim 14, wherein the employing a gradient-based approach on the objective function comprises generating a random direction.

16. The non-transitory computer-readable medium of claim 9, wherein the objective function considers a soft-max of the cost associated with a localizer HMM associated with an action against costs associated with other localizer HMMs associated with other actions.

17. A system comprising: a processor; and a memory communicatively coupled to the processor, the memory having stored therein computer-executable instructions, comprising: a HOG-LDA training component configured to: access a set of training videos respectively classified for an action of a plurality of actions; and learn a plurality of exemplar Histogram of Oriented Gradients Linear Discriminant Analysis (HOG-LDA) templates using a HOG-LDA model on the set of training videos; a HOG-LDA scoring component configured to estimate respective sets of candidate bounding boxes for each frame of the set of training videos using the learned exemplar HOG-LDA templates, wherein each candidate bounding box has an associated template matching score; a Viterbi inference component configured to infer respective discrete hidden states and a respective bounding box for each discrete hidden state for a plurality of localizer Hidden Markov Models (HMM) using the estimated sets of candidate bounding boxes, where each localizer HMM is associated with an action of the plurality of actions; and an objective maximization component configured to determine a respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs using an objective function.

18. The system of claim 17, wherein respective videos of the set of training videos comprise a plurality of frames respectively comprising a bounding box around an area of the frame corresponding to the action to which the video is classified.

19. The system of claim 18, wherein the objective function considers an overlap function configured to compare inferred bounding boxes for each frame of the set of training videos with corresponding bounding boxes included with each frame of the set of training videos.

20. The system of claim 17, wherein the set of parameters comprises: a width penalty weight for an inferred bounding box for the discrete hidden state to deviate from an estimated mean width of bounding boxes associated with the discrete hidden state; a height penalty weight for the inferred bounding box for the discrete hidden state to deviate from an estimated mean height of bounding boxes associated with the discrete hidden state; a temporal consistency cost for the inferred bounding box for the discrete hidden state to deviate from a previous inferred bounding box for a previous discrete hidden state immediately preceding the discrete hidden state in a sequence of discrete hidden states associated with a localizer HMM; a template consistency cost for the inferred bounding box for the discrete hidden state to deviate from a candidate bounding box for the discrete hidden state; a scale for calibrating template matching scores associated with the discrete hidden state across the set of actions; and an offset for calibrating template matching scores associated with the discrete hidden state across the set of actions.

21. The system of claim 17, wherein the Viterbi inference component is further configured to employ a Viterbi inference algorithm on the estimated sets of candidate bounding boxes to infer the respective discrete hidden states and the respective bounding box for each discrete hidden state for the plurality of localizer Hidden Markov Models (HMM).

22. The system of claim 17, wherein the objective maximization component is further configured to employ a gradient-based approach on the objective function to determine the respective set of parameters for each inferred discrete hidden state of the plurality of localizer HMMs.

23. The system of claim 22, wherein the gradient-based approach uses a random direction.

24. The system of claim 17, wherein the objective function considers a soft-max of the cost associated with a localizer HMM associated with an action against costs associated with other localizer HMMs associated with other actions.
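Claims 3 through 5 and 8 describe Viterbi inference over per-frame candidate boxes with width and height penalties, a temporal consistency cost, a template consistency cost, per-state score calibration (scale and offset), an overlap function, and a soft-max over per-action costs. The sketch below is a simplified, hypothetical rendering of those ideas, not the patented implementation. It assumes that the inferred box in each frame is restricted to one of that frame's candidates (so the template consistency term is zero and omitted), models temporal consistency as the squared displacement of box centers, does not penalize transitions between discrete states, and uses intersection over union as one common choice of overlap function. All class, function, and parameter names are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Candidate:
    box: Box
    state: int     # discrete hidden state (exemplar template) that produced the box
    score: float   # raw template matching score


@dataclass
class StateParams:
    mean_w: float
    mean_h: float
    width_weight: float   # penalty for deviating from the mean width
    height_weight: float  # penalty for deviating from the mean height
    scale: float          # calibrates template scores across actions
    offset: float


def unary_cost(c: Candidate, p: StateParams) -> float:
    """Per-frame cost of selecting candidate c (lower is better)."""
    _, _, w, h = c.box
    calibrated = p.scale * c.score + p.offset
    return (-calibrated
            + p.width_weight * (w - p.mean_w) ** 2
            + p.height_weight * (h - p.mean_h) ** 2)


def temporal_cost(prev: Box, cur: Box, weight: float) -> float:
    """Penalize movement of the box center between consecutive frames."""
    px, py = prev[0] + prev[2] / 2, prev[1] + prev[3] / 2
    cx, cy = cur[0] + cur[2] / 2, cur[1] + cur[3] / 2
    return weight * ((cx - px) ** 2 + (cy - py) ** 2)


def viterbi_localize(frames: Sequence[Sequence[Candidate]],
                     params: Dict[int, StateParams],
                     temporal_weight: float) -> Tuple[float, List[Candidate]]:
    """Minimum-cost sequence choosing exactly one candidate per frame."""
    costs = [unary_cost(c, params[c.state]) for c in frames[0]]
    back: List[List[int]] = []  # back[t - 1][k]: best predecessor of candidate k at frame t
    for t in range(1, len(frames)):
        prev = frames[t - 1]
        new_costs, pointers = [], []
        for c in frames[t]:
            options = [costs[j] + temporal_cost(prev[j].box, c.box, temporal_weight)
                       for j in range(len(prev))]
            best_j = min(range(len(options)), key=options.__getitem__)
            new_costs.append(options[best_j] + unary_cost(c, params[c.state]))
            pointers.append(best_j)
        costs = new_costs
        back.append(pointers)
    k = min(range(len(costs)), key=costs.__getitem__)
    total, path = costs[k], [k]
    for pointers in reversed(back):  # follow back-pointers to frame 0
        k = pointers[k]
        path.append(k)
    path.reverse()
    return total, [frames[t][i] for t, i in enumerate(path)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, width, height) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def softmax_of_costs(costs: Dict[str, float], label: str) -> float:
    """Probability of `label` when a lower localizer-HMM cost means more likely."""
    m = min(costs.values())  # shift for numerical stability
    exps = {k: math.exp(-(v - m)) for k, v in costs.items()}
    return exps[label] / sum(exps.values())
```

In a training loop along these lines, one would run `viterbi_localize` once per action's localizer HMM, feed the resulting path costs to `softmax_of_costs`, and add an overlap term comparing inferred boxes with annotated boxes via `iou`; the per-state parameters would then be adjusted by a gradient-based search such as the random-direction approach of claims 7, 15, and 23 (not shown here).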

Patents cited by this patent (6)

Liu, Zicheng; Wang, Jiang, Action recognition based on depth maps.
Goronzy-Thomae, Silke; Kemp, Thomas; Kompe, Ralf; Lam, Yin Hay; Marasek, Krzysztof; Tato, Raquel, Apparatus and method for automatic extraction of important events in audio signals.
Ding, Yuanyuan; Xiao, Jing, Contextual boost for object detection.
Vaddadi, Sundeep; Hong, John H.; Hamsici, Onur C.; Reznik, Yuriy; Lee, Chong U., Improving performance of image recognition algorithms by pruning features, image scaling, and spatially constrained feature matching.
Othmezouri, Gabriel; Sakata, Ichiro; Schiele, Bernt; Andriluka, Mykhaylo; Roth, Stefan, Monocular 3D pose estimation and tracking by detection.
Liang, Yiqing; Crnic, Linda; Kobla, Vikrant; Wolf, Wayne, System and method for object identification and behavior characterization using video analysis.

Patents citing this patent (3)

Xu, Ting, Self-attention deep neural network for action recognition in surveillance videos.
Bernal, Edgar A.; Li, Qun; Zhang, Yun; Kumar, Jayant; Bala, Raja, System and method for relevance estimation in summarization of videos of multi-step activities.
De Souza, César Roberto; Gaidon, Adrien; Vig, Eleonora; Lopez, Antonio M., System and method for video classification using a hybrid unsupervised and supervised multi-layer architecture.
