Methods and systems for real-time user extraction using deep learning networks
IPC Classification
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06K-009/00
G06K-009/46
G06K-009/62
G06N-003/04
G06N-003/08
Application number
US-0333623 (2016-10-25)
Registration number
US-9881207 (2018-01-30)
Inventors / Address
Nguyen, Quang
Nguyen, Cong
Dang, Long
Dang, Gia
Venshtain, Simion
Applicant / Address
Personify, Inc.
Agent / Address
Invention Mine LLC
Citation information
Cited by: 1
Patents cited: 117
Abstract
Methods and systems for real-time user extraction using deep learning networks. In one embodiment, user extraction comprises obtaining a given frame of color pixel data, checking whether a reset flag is cleared or set, and generating a trimap for the given frame. If the reset flag is cleared, generating the trimap comprises: obtaining a user-extraction contour based on a preceding frame; and generating the trimap based on the obtained user-extraction contour. If the reset flag is set, generating the trimap comprises: detecting at least one persona feature in the given frame; generating an alpha mask by aligning an intermediate contour with the detected persona feature(s), wherein the intermediate contour is based on a color-based flood-fill operation performed on a previous frame which was segmented by a machine-learning-segmentation process; and generating the trimap based on the generated alpha mask. The generated trimap is output for extracting a user persona.
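The reset-flag branching described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: persona contours are simplified to binary masks, feature alignment is reduced to a centroid translation, and all names (`generate_trimap`, `_centroid`, `_grow`, `feature_pos`) are hypothetical.

```python
import numpy as np

def _centroid(mask):
    """Integer centroid of a binary mask."""
    ys, xs = np.nonzero(mask)
    return int(round(ys.mean())), int(round(xs.mean()))

def _grow(mask):
    """One-pixel binary dilation, 4-connectivity."""
    g = mask.copy()
    g[1:, :] |= mask[:-1, :]
    g[:-1, :] |= mask[1:, :]
    g[:, 1:] |= mask[:, :-1]
    g[:, :-1] |= mask[:, 1:]
    return g

def generate_trimap(reset_flag, prev_mask, intermediate_mask, feature_pos):
    """Return a trimap (0 = background, 128 = unknown, 255 = foreground)
    for the current frame, following the abstract's two branches."""
    if not reset_flag:
        # Reset flag cleared: seed from the preceding frame's
        # user-extraction contour.
        seed = prev_mask.astype(bool)
    else:
        # Reset flag set: align the flood-fill-derived intermediate
        # contour with the detected persona feature position
        # (alignment reduced here to a pure translation).
        cy, cx = _centroid(intermediate_mask)
        fy, fx = feature_pos
        seed = np.roll(np.roll(intermediate_mask.astype(bool),
                               fy - cy, axis=0), fx - cx, axis=1)
    trimap = np.zeros(seed.shape, dtype=np.uint8)
    trimap[_grow(seed) & ~seed] = 128  # unknown band around the contour
    trimap[seed] = 255
    return trimap
```

In the patent, the unknown band is produced by erosion and dilation operations (claims 2-3) and alignment may also include scaling (claim 4); both are reduced here to their simplest forms.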
Representative Claims
1. A method comprising:
obtaining a first frame of color pixel data;
checking whether a reset flag is cleared or set at a first time;
generating a trimap for the first frame, wherein:
if the reset flag is cleared at the first time, then generating the trimap for the first frame comprises: obtaining a user-extraction contour that is based on an immediately preceding frame; and generating the trimap for the first frame based on the obtained user-extraction contour;
if the reset flag is set at the first time, then generating the trimap for the first frame comprises: detecting at least one persona feature in the first frame; generating an alpha mask at least in part by aligning an intermediate persona contour with the detected at least one persona feature, wherein the intermediate persona contour is based on a result of a color-based flood-fill operation having been performed on a previous frame of color pixel data that had been segmented by a machine-learning-segmentation (MLS) process; and generating the trimap for the first frame based on the generated alpha mask; and
outputting the generated trimap for use in extracting a user persona from the first frame.

2. The method of claim 1, wherein if the reset flag is cleared at the first time, then generating the trimap for the first frame further comprises performing an erosion operation inward from the obtained user-extraction contour and a dilation operation outward from the obtained user-extraction contour.

3. The method of claim 1, wherein if the reset flag is set at the first time, generating the trimap for the first frame further comprises performing an erosion operation inward from the generated alpha mask and a dilation operation outward from the generated alpha mask.

4. The method of claim 1, wherein if the reset flag is set at the first time, aligning the intermediate persona contour with the detected at least one persona feature comprises at least one of scaling and translating the intermediate persona contour to fit the detected at least one persona feature.

5. The method of claim 1, further comprising: evaluating at least one reset condition; if at least a predefined subset of the at least one reset condition is true, responsively setting the reset flag for a following frame; and if each of the at least one reset condition is true, responsively communicating an MLS request to the MLS process.

6. The method of claim 1, wherein the at least one reset condition includes: (i) an elapsed time from a previous machine-learning segmentation request being greater than a time threshold; (ii) a difference between a post-core segmentation user-extraction contour for the first frame and the intermediate persona contour exceeding a difference threshold; (iii) a level of motion in the first frame exceeding a motion threshold; and (iv) at least one user being detected in the first frame.

7. The method of claim 6, wherein the predefined subset includes reset conditions (i) and (ii).

8. The method of claim 6, wherein the predefined subset consists of reset conditions (i) and (ii).

9. The method of claim 6, wherein the MLS request specifies the first frame as an input to the MLS process.

10. The method of claim 1, further comprising:
obtaining a second frame of color pixel data;
checking whether the reset flag is cleared or set at a second time;
generating a second-frame trimap for the second frame, wherein:
if the reset flag is cleared at the second time, then generating the second-frame trimap comprises: obtaining a post-core segmentation user-extraction contour that is based on the first frame; and generating the second-frame trimap based on the obtained post-core segmentation user-extraction contour;
if the reset flag is set at the second time, then generating the second-frame trimap comprises: detecting at least one second-frame persona feature in the second frame; generating a second-frame alpha mask at least in part by aligning the intermediate persona contour with the detected at least one second-frame persona feature; and generating the second-frame trimap based on the generated second-frame alpha mask; and
outputting the generated second-frame trimap for use in extracting a user persona from the second frame.

11. The method of claim 10, wherein aligning the intermediate persona contour with the detected at least one second-frame persona feature comprises at least one of scaling and translating the intermediate persona contour to fit the detected at least one second-frame persona feature.

12. A method comprising:
receiving a first segmented frame of color pixel data, the first segmented frame comprising an initial-segmentation persona contour that was identified by a machine-learning-segmentation (MLS) process;
defining an eroded MLS persona contour and a dilated MLS persona contour, both based on the initial-segmentation persona contour;
defining an intermediate persona contour at least in part by performing a bidirectional color-based flood-fill operation outward from the eroded MLS persona contour and inward from the dilated MLS persona contour;
obtaining a second frame of color pixel data from an input source;
detecting at least one second-frame persona feature in the second frame;
generating a second-frame alpha mask at least in part by aligning the intermediate persona contour with the detected at least one second-frame persona feature;
generating a second-frame trimap based on the generated second-frame alpha mask; and
outputting the generated second-frame trimap for use in extracting a user persona from the second frame.

13. The method of claim 12, wherein the eroded MLS persona contour is defined at least in part by performing an erosion operation inward from the initial-segmentation persona contour, and the dilated MLS persona contour is defined at least in part by performing a dilation operation outward from the initial-segmentation persona contour.

14. The method of claim 12, wherein generating the second-frame trimap comprises eroding and dilating the alpha mask to a lesser extent than is used when defining the eroded MLS persona contour and the dilated MLS persona contour for the first segmented frame.

15. The method of claim 12, wherein aligning the intermediate persona contour with the detected at least one second-frame persona feature comprises at least one of scaling and translating the intermediate persona contour to fit the detected at least one second-frame persona feature.

16. The method of claim 12, further comprising filtering the first segmented frame to exclude pixels whose confidence level, as defined by a confidence map generated by the MLS process, falls below a confidence threshold.

17. The method of claim 12, further comprising: evaluating at least one reset condition; if at least a predefined subset of the at least one reset condition is true, responsively setting a reset flag for a following frame; and if each of the at least one reset condition is true, responsively communicating an MLS request to the MLS process.

18. The method of claim 17, wherein the at least one reset condition includes: (i) an elapsed time from a previous MLS request being greater than a time threshold; (ii) a difference between a post-core segmentation user-extraction contour for the second frame and the intermediate persona contour exceeding a difference threshold; (iii) a level of motion in the second frame exceeding a motion threshold; and (iv) at least one user being detected in the second frame.

19. The method of claim 18, wherein the predefined subset includes reset conditions (i) and (ii).

20. The method of claim 18, wherein the predefined subset consists of reset conditions (i) and (ii).

21. The method of claim 12, further comprising:
obtaining a third frame of color pixel data;
checking whether a reset flag is cleared or set at a third time;
generating a third-frame trimap, wherein:
if the reset flag is cleared at the third time, then generating the third-frame trimap comprises: obtaining a post-core segmentation user-extraction contour that is based on the second frame; and generating the third-frame trimap based on the obtained post-core segmentation user-extraction contour;
if the reset flag is set at the third time, then generating the third-frame trimap comprises: detecting at least one third-frame persona feature in the third frame; generating a third-frame alpha mask at least in part by aligning the intermediate persona contour with the detected at least one third-frame persona feature; and generating the third-frame trimap based on the generated third-frame alpha mask; and
outputting the generated third-frame trimap for use in extracting a user persona from the third frame.

22. An apparatus comprising: a communication interface; a processor; and non-transitory computer-readable data storage containing instructions executable by the processor for causing the system to carry out a set of functions, the set of functions comprising:
obtaining a first frame of color pixel data;
checking whether a reset flag is cleared or set at a first time;
generating a trimap for the first frame, wherein:
if the reset flag is cleared at the first time, then generating the trimap for the first frame comprises: obtaining a user-extraction contour that is based on an immediately preceding frame; and generating the trimap for the first frame based on the obtained user-extraction contour;
if the reset flag is set at the first time, then generating the trimap for the first frame comprises: detecting at least one persona feature in the first frame; generating an alpha mask at least in part by aligning an intermediate persona contour with the detected at least one persona feature, wherein the intermediate persona contour is based on a result of a color-based flood-fill operation having been performed on a previous frame of color pixel data that had been segmented by a machine-learning-segmentation (MLS) process; and generating the trimap for the first frame based on the generated alpha mask; and
outputting the generated trimap for use in extracting a user persona from the first frame.
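The bidirectional color-based flood-fill of claim 12 can be sketched as follows. This is an illustrative reading, not the patented algorithm: the eroded and dilated MLS persona contours are represented as binary masks, the fill grows pixel to pixel wherever the per-channel color difference stays within a tolerance, and band pixels claimed from the inside but not the outside join the intermediate persona contour. All names (`flood_fill_region`, `intermediate_contour`, `tol`) are hypothetical.

```python
from collections import deque
import numpy as np

def flood_fill_region(image, seeds, allowed, tol):
    """Grow from seed pixels into `allowed` pixels whose color differs by
    at most `tol` (per channel) from the neighbor they are reached from."""
    h, w = allowed.shape
    reached = seeds.copy()
    queue = deque(zip(*np.nonzero(seeds)))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and allowed[ny, nx]
                    and not reached[ny, nx]
                    and np.abs(image[ny, nx].astype(int)
                               - image[y, x].astype(int)).max() <= tol):
                reached[ny, nx] = True
                queue.append((ny, nx))
    return reached

def intermediate_contour(image, eroded_fg, dilated_fg, tol=10):
    """Bidirectional fill: outward from the eroded MLS contour and inward
    from the dilated MLS contour, over the unknown band in between."""
    band = dilated_fg & ~eroded_fg
    inside = flood_fill_region(image, eroded_fg, eroded_fg | band, tol)
    outside = flood_fill_region(image, ~dilated_fg, ~dilated_fg | band, tol)
    # Band pixels reached from inside but not from outside join the persona.
    return eroded_fg | (inside & ~outside)
```

Band pixels reachable from both directions are conservatively excluded from the persona in this sketch; the patent leaves the precise reconciliation to the described flood-fill procedure.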
Patents cited by this patent (117)
Cipolla Roberto (Cambridge GBX) Okamoto Yasukazu (Chiba-ken JPX) Kuno Yoshinori (Osaka-fu JPX), 3D human interface apparatus using motion recognition based on dynamic image processing.
Panahpour Tehrani, Mehrdad; Ishikawa, Akio; Sakazawa, Shigeyuki, Apparatus, method and computer program for classifying pixels in a motion picture as foreground or background.
Clanton,Charles H.; Ventrella,Jeffrey J.; Paiz,Fernando J., Cinematic techniques in avatar-centric communication during a multi-user online simulation.
DeMenthon Daniel F. (Columbia MD), Computer vision system for position monitoring in three dimensions using non-coplanar light sources attached to a monito.
Tian, Dihong; Mauchly, J. William; Friel, Joseph T., Generating and rendering synthesized views with multiple video streams in telepresence video conference sessions.
Iwamoto, Masayuki; Fujimura, Koichi, Image processing apparatus, method for processing an image and computer-readable recording medium for causing a computer to process images.
Carter, James; Yaacob, Arik; Darrah, James F., Managing the layout of multiple video streams displayed on a destination display screen during a videoconference.
Bang, Gun; Um, Gi-Mun; Chang, Eun-Young; Kim, Taeone; Hur, Nam-Ho; Kim, Jin-Woong; Lee, Soo-In, Method and apparatus for improving quality of depth image.
Yeh Hwa-Young M ; Lure Yuan-Ming F ; Lin Jyh-Shyan, Method and system for re-screening nodules in radiological images using multi-resolution processing, neural network, and image processing.
Berman Arie ; Vlahos Paul ; Dadourian Arpag, Method for removing from an image the background surrounding a selected subject by generating candidate mattes.
Colmenarez, Antonio J.; Gutta, Srinivas, Person tagging in an image processing system utilizing a statistical model based on both appearance and geometric features.
Haskell, Barin Geoffry; Puri, Atul; Schmidt, Robert Lewis, Scene description nodes to support improved chroma-key shape representation of coded arbitrary images and video objects.
Mackie, David J.; Tian, Dihong; Weir, Andrew P.; Buttimer, Maurice; Friel, Joseph T.; Mauchly, J. William; Chen, Wen-Hsiung, System and method for providing enhanced video processing in a network environment.
Prahlad, Anand; Schwartz, Jeremy A.; Ngo, David; Brockway, Brian; Muller, Marcus S., Systems and methods for classifying and transferring information in a storage network.
Weiser, Reginald; McGravie, Richard; Diouskine, Roman; Teboul, Jeremy, Systems and methods for providing video conferencing services via an ethernet adapter.
Rudolph, Eric; Rui, Yong; Malvar, Henrique S; He, Li Wei; Cohen, Michael F; Tashev, Ivan, Systems and methods for real-time audio-visual communication and data collaboration in a network conference environment.
Iwasaki, Tomoki; Suzuki, Tatsuhiko; Hashimoto, Susumu; Kutsuma, Yuji; Hamada, Toshihiro, Image processing apparatus for generating combined image signal of region-of-interest image signal and second image signal, the region-of-interest image signal being generated based on blank portion and initial region-of-interest of first image signal.