IPC Classification Information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.) |
Application Number | US-0687790 (2010-01-14)
Registration Number | US-8280158 (2012-10-02)
Inventors / Address |
- Adcock, John
- Cooper, Matthew
- Denoue, Laurent
- Pirsiavash, Hamed
Applicant / Address |
Agent / Address | Morgan, Lewis & Bockius LLP
Citation Information | Cited by: 7 / Patents cited: 6
Abstract
A system and method for identifying key frames of a presentation video that include stationary informational content. A sequence of frames is obtained from a presentation video and differences of pixel values between consecutive frames of the sequence of frames are computed. Sets of consecutive frames that are stationary are identified, wherein consecutive frames that are stationary have a proportion of changed pixel values below a first predetermined threshold, and wherein pixel values are deemed to be changed when the difference between the pixel values for corresponding pixels in consecutive frames exceeds a second predetermined threshold. Next, a set of key frames that include stationary informational content is retained. The set of key frames that include stationary informational content is then displayed for user interaction.
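The two-threshold frame-differencing scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold values, the minimum run length, and the choice of the last frame of each run as the key frame are all illustrative assumptions.

```python
import numpy as np

def changed_pixel_ratio(prev, curr, pixel_threshold=25):
    """Fraction of pixels whose absolute difference exceeds the
    second (per-pixel) threshold; the 25 here is an assumed value."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return np.mean(diff > pixel_threshold)

def stationary_runs(frames, ratio_threshold=0.01, pixel_threshold=25, min_len=30):
    """Return (start, end) index pairs for runs of consecutive frames
    whose changed-pixel ratio stays below the first threshold."""
    runs, start = [], 0
    for i in range(1, len(frames)):
        if changed_pixel_ratio(frames[i - 1], frames[i], pixel_threshold) >= ratio_threshold:
            # The stationary run is broken; keep it if it was long enough.
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = i
    if len(frames) - start >= min_len:
        runs.append((start, len(frames) - 1))
    return runs

def key_frames(frames, runs):
    """Pick one representative frame (here: the last) from each stationary run."""
    return [frames[end] for (_, end) in runs]
```

On a synthetic sequence of 40 identical dark frames followed by 40 identical bright frames, this yields two stationary runs and two key frames, one per slide-like segment.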
Representative Claims
1. A computer-implemented method for identifying key frames of a presentation video that include stationary informational content, comprising: at a computer system including one or more processors and memory storing one or more programs, the one or more processors executing the one or more programs to perform the operations of: obtaining a sequence of frames from a presentation video; computing differences of pixel values between consecutive frames of the sequence of frames; identifying sets of consecutive frames that are stationary, wherein consecutive frames that are stationary have a proportion of changed pixel values below a first predetermined threshold, and wherein pixel values are deemed to be changed when the difference between the pixel values for corresponding pixels in consecutive frames exceeds a second predetermined threshold; extracting key frames from the sets of consecutive frames; retaining a set of key frames that include stationary informational content; and displaying the set of key frames that include stationary informational content for user interaction.

2. The method of claim 1, wherein the presentation video is an archived presentation video.

3. The method of claim 2, wherein the archived presentation video is located on a second computer system that is separate and distinct from the computer system.

4. The method of claim 1, wherein only connected regions of changed pixels larger than a predetermined size are considered when determining if consecutive frames are stationary.

5. The method of claim 1, wherein the presentation video is a real-time presentation video.

6. The method of claim 1, wherein the stationary informational content includes one or more of: text; symbols; line drawings; and pictures.

7. The method of claim 6, wherein the stationary informational content is included in one or more presentation slides.

8.
The method of claim 1, wherein obtaining a sequence of frames from the presentation video includes: retrieving the presentation video; and obtaining frames of the presentation video at a predetermined time interval to produce the sequence of frames.

9. The method of claim 1, wherein a respective set of consecutive frames includes a predetermined number of consecutive frames having a proportion of changed pixel values below a first predetermined threshold, and wherein pixel values are deemed to be changed when the difference between the pixel values for corresponding pixels in consecutive frames exceeds a second predetermined threshold.

10. The method of claim 1, wherein extracting a respective key frame from a respective set of consecutive frames includes selecting a predetermined frame from the respective set of consecutive frames.

11. The method of claim 1, wherein retaining the set of key frames that include stationary informational content includes: using a visual appearance model to identify key frames in the set of key frames that include stationary informational content; removing key frames that do not include stationary informational content; and retaining key frames that include stationary informational content.

12. The method of claim 11, wherein at least one frame in the sequence of frames includes a face of a person without stationary informational content.

13.
The method of claim 12, wherein prior to using the visual appearance model to identify key frames in the set of key frames that include stationary informational content, the method further comprises generating the visual appearance model by: identifying a first set of frames in the sequence of frames that do not include stationary informational content; identifying a second set of frames in the sequence of frames that include stationary informational content; and training a support vector machine to identify frames that include stationary informational content using the first set of frames and the second set of frames.

14. The method of claim 13, wherein identifying the first set of frames in the sequence of frames that do not include stationary informational content includes: selecting a third set of frames in the sequence of frames that includes consecutive frames that are not stationary, wherein consecutive frames that are not stationary have differences of pixel values between consecutive frames that are above the predetermined threshold; identifying a fourth set of frames in the third set of frames that include faces using a face detection technique; identifying a fifth set of frames in the third set of frames that do not include stationary informational content using an informational content detection technique; and identifying the first set of frames as frames that are included in both the fourth set of frames and the fifth set of frames.

15. The method of claim 14, wherein the informational content detection technique is an optical character recognition technique that detects text.

16.
The method of claim 13, wherein identifying the second set of frames in the sequence of frames that include stationary informational content includes: selecting a sixth set of frames in the sequence of frames that includes consecutive frames that are stationary; identifying a seventh set of frames in the sequence of frames that include stationary informational content using an informational content detection technique; and identifying the second set of frames as the frames that are included in both the sixth set of frames and the seventh set of frames.

17. The method of claim 13, wherein training the support vector machine to identify frames that include stationary informational content using the first set of frames and the second set of frames includes: computing color histograms for the first set of frames and the second set of frames; and training the support vector machine to identify frames that include stationary informational content using the color histograms for the first set of frames and the second set of frames.

18. The method of claim 13, wherein using the visual appearance model to identify key frames in the set of key frames that include stationary informational content includes using the support vector machine to classify key frames as either key frames that include stationary informational content or key frames that include a face of a person without stationary informational content.

19. The method of claim 12, wherein prior to using the visual appearance model to identify key frames in the set of key frames that include stationary informational content, the method further comprises generating the visual appearance model by: identifying frames in the first set of frames that include faces using a face detection technique; determining color histograms for the frames in the first set of frames that include faces; and generating a template histogram based on the color histograms for the frames in the first set of frames that include faces.

20.
The method of claim 19, wherein using the visual appearance model to identify key frames in the set of key frames that include stationary informational content includes comparing the template histogram to color histograms of the key frames in the set of key frames to identify key frames in the set of key frames that include stationary informational content or key frames that include a face of a person without stationary informational content.

21. The method of claim 11, wherein at least one key frame in the set of key frames includes both a localized face of a person and stationary informational content, and wherein using the visual appearance model to identify key frames in the set of key frames that include stationary informational content includes: using the visual appearance model to identify key frames that include a localized face of a person; using an information detection technique to identify a subset of the key frames that include a localized face of a person that also includes stationary informational content; and identifying key frames in the subset of the key frames that include both a localized face of a person and stationary informational content.

22. The method of claim 1, wherein at least one key frame in the set of key frames includes both a room in which the presentation video is being filmed and stationary informational content, and wherein retaining the set of key frames that include stationary informational content includes: using a room model to identify key frames in the set of key frames that include both the room in which the presentation video is being filmed and stationary informational content; applying a perspective distortion correction factor to key frames that include both the room in which the presentation video is being filmed and stationary informational content; cropping the distortion-corrected key frames so that only the stationary informational content remains; and retaining the cropped distortion-corrected key frames.

23.
The method of claim 22, wherein prior to using the room model to identify key frames in the set of key frames that include both the room in which the presentation video is being filmed and stationary informational content, the method further comprises generating the room model by: receiving a user-selected key frame that includes both the room in which the presentation video is being filmed and stationary informational content; receiving a user-identified bounding area of the user-selected key frame, wherein the user-identified bounding area indicates an area of the user-selected key frame that includes stationary informational content; generating the color histogram for the area of the user-selected key frame that is outside of the user-identified bounding area; and calculating a perspective distortion correction factor.

24. The method of claim 23, wherein using the room model to identify key frames in the set of key frames that include both the room in which the presentation video is being filmed and stationary informational content includes using the color histogram to identify key frames in the set of key frames that include both a room in which the presentation video is being filmed and stationary informational content.

25.
The method of claim 1, wherein at least one sequence of consecutive frames comprises informational content that is built up over a number of frames, and wherein identifying a respective set of consecutive frames that are stationary includes: identifying a current frame and a prior frame in which the differences of the pixel values between the current frame and the prior frame are greater than the predetermined threshold; identifying bounding boxes of regions of the current frame in which the differences of the pixel values between the current frame and the prior frame are greater than the predetermined threshold; determining whether the bounding boxes are in previously blank regions of the prior frame using an edge detection technique; if the bounding boxes are in previously blank regions of the prior frame, repeating the identifying, testing, and determining operations until the differences of the pixel values between the current frame and the prior frame exceed the predefined threshold in regions of the prior frame that were not blank; and identifying the respective set of consecutive frames that are stationary as including the sequence of consecutive frames up to and including the prior frame.

26. The method of claim 25, further comprising providing links into the presentation video at time points corresponding to frames of the respective set of consecutive frames in which the bounding boxes of regions of the frames have differences of the pixel values between consecutive frames that are greater than the predetermined threshold and that have been added to previously blank regions of prior frames.

27. The method of claim 25, wherein the at least one sequence of consecutive frames comprising informational content that is built up over a number of frames is a presentation slide that includes elements that are built up over a period of time.

28.
The method of claim 25, wherein the at least one sequence of consecutive frames comprising informational content that is built up over a number of frames is a handwritten presentation that includes informational content that is built up over a period of time.

29. The method of claim 1, further comprising: using an optical character recognition technique to extract text from the set of key frames; and indexing the extracted text.

30. The method of claim 29, further comprising providing a search interface for user interaction, wherein the search interface allows users to perform searches based on keywords to identify presentation videos including the keywords.

31. The method of claim 1, further comprising providing links into the presentation video at time points corresponding to respective key frames in the displayed set of key frames.

32. A system for identifying key frames of a presentation video that include stationary informational content, comprising: one or more processors; memory; and one or more programs stored in the memory, the one or more programs comprising instructions to: obtain a sequence of frames from a presentation video; compute differences of pixel values between consecutive frames of the sequence of frames; identify sets of consecutive frames that are stationary, wherein consecutive frames that are stationary have a proportion of changed pixel values below a first predetermined threshold, and wherein pixel values are deemed to be changed when the difference between the pixel values for corresponding pixels in consecutive frames exceeds a second predetermined threshold; extract key frames from the sets of consecutive frames; retain a set of key frames that include stationary informational content; and display the set of key frames that include stationary informational content for user interaction.

33.
A computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions to: at a computer system including one or more processors and memory storing one or more programs, obtain a sequence of frames from a presentation video; compute differences of pixel values between consecutive frames of the sequence of frames; identify sets of consecutive frames that are stationary, wherein consecutive frames that are stationary have a proportion of changed pixel values below a first predetermined threshold, and wherein pixel values are deemed to be changed when the difference between the pixel values for corresponding pixels in consecutive frames exceeds a second predetermined threshold; extract key frames from the sets of consecutive frames; retain a set of key frames that include stationary informational content; and display the set of key frames that include stationary informational content for user interaction.
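Claims 13 and 17 describe training a support vector machine on color histograms to separate frames with stationary informational content (slides) from frames without it (e.g. the speaker's face). The sketch below illustrates that idea using scikit-learn; the library choice, histogram bin count, kernel, and the synthetic stand-in frames are all assumptions, not details from the patent.

```python
import numpy as np
from sklearn.svm import SVC

def color_histogram(frame, bins=8):
    """Normalized per-channel color histogram, concatenated into one feature vector."""
    feats = []
    for c in range(frame.shape[2]):
        h, _ = np.histogram(frame[..., c], bins=bins, range=(0, 256))
        feats.append(h / h.sum())
    return np.concatenate(feats)

def train_appearance_model(slide_frames, non_slide_frames):
    """Train an SVM to classify frames: 1 = stationary informational
    content (slide-like), 0 = no informational content (speaker-only)."""
    X = [color_histogram(f) for f in slide_frames + non_slide_frames]
    y = [1] * len(slide_frames) + [0] * len(non_slide_frames)
    clf = SVC(kernel="linear")
    clf.fit(X, y)
    return clf

# Synthetic stand-ins: high-contrast black/white "slides" vs. mid-tone "face" frames.
rng = np.random.default_rng(0)
slides = [rng.choice([0, 255], size=(32, 32, 3)).astype(np.uint8) for _ in range(10)]
faces = [rng.integers(60, 120, size=(32, 32, 3), dtype=np.uint8) for _ in range(10)]
model = train_appearance_model(slides, faces)
```

Because the two synthetic classes occupy disjoint histogram bins, a linear kernel separates them cleanly; real presentation footage would need labeled frames from the video itself, as claims 13–14 describe.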