Real-time capturing and generating stereo images and videos with a monoscopic low power mobile device
IPC Classification
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
H04N-013/02
H04N-013/00
H04N-005/222
G06T-007/00
H04N-005/232
Application number
US-0497906
(2006-08-01)
Registration number
US-8970680
(2015-03-03)
Inventors / Address
Wang, Haohong
Li, Hsiang-Tsun
Manjunath, Sharath
Applicant / Address
Qualcomm Incorporated
Agent / Address
Boyd, Brent A.
Citation information
Cited by: 2
Patents cited: 12
Abstract
A monoscopic low-power mobile device is capable of creating real-time stereo images and videos from a single captured view. The device uses statistics from an autofocusing process to create a block depth map of the single captured view. Artifacts in the block depth map are reduced and an image depth map is created. Stereo three-dimensional (3D) left and right views are created from the image depth map using a Z-buffer based 3D surface recover process and a disparity map which is a function of the geometry of binocular vision.
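The first stage described in the abstract can be pictured as follows. This is a minimal sketch, assuming the autofocus sweep records one focus (sharpness) value per block per lens position and that lens positions have been calibrated to approximate object distances; the function name and array shapes are illustrative and not taken from the patent.

```python
import numpy as np

def block_depth_from_autofocus(focus_values, lens_distances):
    """Sketch of the first stage: a block-level depth map from autofocus statistics.

    focus_values   : (num_lens_positions, H_blocks, W_blocks) sharpness score
                     per block at each lens position of the focus sweep.
    lens_distances : (num_lens_positions,) approximate object distance assumed
                     for each lens position (hypothetical calibration).
    """
    # The lens position that maximizes a block's focus value is taken as the
    # position where the dominant object in that block is in focus, so its
    # calibrated distance serves as the block's depth value.
    best_idx = np.argmax(focus_values, axis=0)   # (H_blocks, W_blocks)
    return lens_distances[best_idx]
```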
Representative claims
1. A monoscopic low-power mobile device comprising: a single-sensor camera sensor module operable to capture an image and having an autofocusing sub-module operable to determine a best focus position by moving a lens through an entire focusing range via a focusing process and to select the focus position with a maximum focus value when capturing the image; a depth map generator assembly operable to: in a first stage, develop a block-level depth map automatically using statistics from the autofocusing sub-module; in a second stage, develop an image depth map, the block-level depth map including a depth value for each of a plurality of portions of the captured image, the image depth map including a pixel depth value for a pixel in a portion of the plurality of portions; and during the second stage, obtain a depth value for corner pixels of each block included in the block-level depth map, the depth value for a corner pixel based at least in part on an average of depth values for middle points of neighboring blocks of a respective block, the neighboring blocks included in the block-level depth map, the depth map generator assembly including a bilinear filter configured to generate the pixel depth value for the pixel based at least in part on depth values for each corner pixel, the depth values for each corner pixel weighted based on a ratio between a distance of the pixel to a respective corner pixel and a total distance of the pixel to each corner pixel, the bilinear filter configured to generate the pixel depth value during the second stage; and an image pair generator module operable to create a missing second view from the captured image to create three-dimensional (3D) stereo left and right views.

2. The device of claim 1, wherein the image pair generator module comprises: a disparity map sub-module which calculates a disparity map based on a distance in pixels between image points in the left and right views of binocular vision geometry for the captured image, wherein the captured image represents the left view; a Z-buffer 3D surface recover sub-module operable to construct a 3D visible surface for the captured image from the right view; and a stereo view generator sub-module operable to project the 3D surface of the right view onto a projection plane.

3. The device of claim 1, wherein the focusing process of the autofocusing sub-module in a still image mode performs an exhaustive search focusing process to capture a still image, and in a video mode, to achieve real-time capturing of a video clip, is initiated with the exhaustive search focusing process and follows with a climbing-hill focusing process.

4. The device of claim 3, wherein the depth map generator assembly in the second stage is operable to reduce artifacts with the bilinear filter.
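One way to read the corner-depth step of claim 1 (a corner's depth is an average of depth values for middle points of neighboring blocks) is sketched below. Treating every block that touches a corner as a "neighboring block" is an assumption, and the function is illustrative rather than the patent's exact rule.

```python
import numpy as np

def corner_depths(block_depth):
    """Second-stage sketch: depth values for the corner pixels of each block.

    block_depth: (H_blocks, W_blocks) block-level depth map, where each entry
                 is read as the depth at the block's middle point.
    Returns a (H_blocks + 1, W_blocks + 1) grid of corner depth values, each
    the average of the middle-point depths of the blocks meeting at that corner.
    """
    hb, wb = block_depth.shape
    corners = np.zeros((hb + 1, wb + 1))
    counts = np.zeros((hb + 1, wb + 1))
    for i in range(hb):
        for j in range(wb):
            # Block (i, j) contributes its middle-point depth to its four corners.
            corners[i:i + 2, j:j + 2] += block_depth[i, j]
            counts[i:i + 2, j:j + 2] += 1
    return corners / counts
```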
5. The device of claim 1, wherein generating the pixel depth value for the pixel based at least in part on depth values for each corner pixel comprises generating the pixel depth value $d_p$ for pixel $P(x_p, y_p, d_p)$ in accordance with the equation

$$d_p = \sum_{K \in \{A,B,C,D\}} \frac{(x_p - x_K)^4 + (y_p - y_K)^4}{\sum_{J \in \{A,B,C,D\}} \left[(x_p - x_J)^4 + (y_p - y_J)^4\right]}\, d_K$$

where position values and the depth values for the corner pixels (A, B, C, and D) of the block are denoted as $(x_A, y_A, d_A)$, $(x_B, y_B, d_B)$, $(x_C, y_C, d_C)$, $(x_D, y_D, d_D)$.

6. The device of claim 3, further comprising a video coding module for coding the video clip captured and providing statistics information for calculating the block-level depth map, the video coding module being operable to determine motion estimation, and the depth map generator assembly being operable in the second stage to detect and estimate depth information for real-time capturing and generation of stereo video using the statistics information from the motion estimation, the focusing process, and history data plus heuristic rules to obtain a final block depth map from which the image depth map is derived.

7. The device of claim 1, further comprising a display and a 3D effects generator module for displaying on the display the 3D stereo left and right views.

8. The device of claim 7, wherein the 3D effects generator module is operable to produce a red-blue anaglyph image of the 3D stereo left and right views on the display.

9. The device of claim 1, wherein the monoscopic low-power mobile device comprises one of a hand-held digital camera, a camcorder, and a single-sensor camera phone.

10. A monoscopic low-power mobile device comprising: means for capturing an image with a single sensor; means for autofocusing a lens and determining a best focus position by moving the lens through an entire focusing range and for selecting the focus position with a maximum focus value when capturing the image; means for generating in a first stage a block-level depth map automatically using statistics from the autofocusing means and in a second stage an image depth map, the block-level depth map including a depth value for each of a plurality of portions of the captured image, the image depth map including a pixel depth value for a pixel in a portion of the plurality of portions, wherein during the second stage, a depth value for corner pixels of each block included in the block-level depth map is obtained, the depth value for a corner pixel based at least in part on an average of depth values for middle points of neighboring blocks of a respective block, the neighboring blocks included in the block-level depth map, the means for generating including means for reducing artifacts configured to, during the second stage, generate the pixel depth value for the pixel based at least in part on depth values for each corner pixel, the depth values for each corner pixel weighted based on a ratio between a distance of the pixel to a respective corner pixel and a total distance of the pixel to each corner pixel; and means for creating a missing second view from the captured image to create three-dimensional (3D) stereo left and right views.
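Read literally, the equation recited in claims 5, 13, 21 and 28 weights each corner depth by the quantity $(x_p - x_K)^4 + (y_p - y_K)^4$ normalized over the four corners. A small sketch of that computation follows; variable names are illustrative.

```python
def pixel_depth(xp, yp, corners):
    """Pixel depth d_p from the four block corners A, B, C, D.

    corners: list of four (x_K, y_K, d_K) tuples, one per corner.
    Each corner's depth is weighted by ((x_p - x_K)**4 + (y_p - y_K)**4)
    divided by the sum of that quantity over all four corners, following
    the claim's equation as written.
    """
    weights = [(xp - x) ** 4 + (yp - y) ** 4 for x, y, _ in corners]
    total = sum(weights)
    if total == 0:                      # degenerate case: pixel on every corner
        return corners[0][2]
    return sum(w / total * d for w, (_, _, d) in zip(weights, corners))

# Example: a pixel inside a 16-pixel block whose corners carry depths 10, 10, 20, 20.
print(pixel_depth(4, 4, [(0, 0, 10), (16, 0, 10), (0, 16, 20), (16, 16, 20)]))
```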
11. The device of claim 10, wherein the creating means comprises: means for calculating a disparity map based on a distance in pixels between image points in the left and right views of binocular vision geometry for the captured image, wherein the captured image represents the left view; means for 3D surface recovering with Z-buffering for constructing a 3D visible surface for the captured image from a missing right viewpoint; and means for generating stereo views by projecting the constructed 3D surface onto a projection plane.

12. The device of claim 10, wherein the autofocusing means includes means for performing an exhaustive search focusing process to capture a still image in a still image mode; means for initiating the exhaustive search focusing process in a video mode; and means for climbing-hill focusing in a video mode to capture a real-time video clip.

13. The device of claim 10, wherein generating the pixel depth value for the pixel based at least in part on depth values for each corner pixel comprises generating the pixel depth value $d_p$ for pixel $P(x_p, y_p, d_p)$ in accordance with the equation

$$d_p = \sum_{K \in \{A,B,C,D\}} \frac{(x_p - x_K)^4 + (y_p - y_K)^4}{\sum_{J \in \{A,B,C,D\}} \left[(x_p - x_J)^4 + (y_p - y_J)^4\right]}\, d_K$$

where position values and the depth values for the corner pixels (A, B, C, and D) of the block are denoted as $(x_A, y_A, d_A)$, $(x_B, y_B, d_B)$, $(x_C, y_C, d_C)$, $(x_D, y_D, d_D)$.

14. The device of claim 12, further comprising means for video coding the video clip captured and providing statistics information; wherein the means for video coding includes means for motion estimating; and wherein the generating means includes means for detecting and estimating depth information for real-time capturing and generation of stereo video using statistics information from the motion estimating means, the autofocusing means, and history data plus some heuristic rules to obtain a final block depth map from which the image depth map is derived.

15. The device of claim 10, further comprising a display and means for generating 3D effects of the 3D stereo left and right views on the display.

16. The device of claim 15, wherein the 3D effects generating means produces a red-blue anaglyph image of the 3D stereo left and right views on the display.

17. The device of claim 10, wherein the monoscopic low-power mobile device comprises one of a hand-held digital camera, a camcorder, and a single-sensor camera phone.
18. A method for generating real-time stereo images, the method comprising: capturing an image with a single sensor; autofocusing a lens and determining a best focus position by moving the lens through an entire focusing range and selecting the focus position with a maximum focus value when capturing the image; generating in a first stage a block-level depth map automatically using statistics from the autofocusing and in a second stage generating an image depth map, the block-level depth map including a depth value for each of a plurality of portions of the captured image, the image depth map including a pixel depth value for a pixel in a portion of the plurality of portions, the second-stage generating including: obtaining a depth value for corner pixels of each block included in the block-level depth map, the depth value for a corner pixel based at least in part on an average of depth values for middle points of neighboring blocks of a respective block, the neighboring blocks included in the block-level depth map; and generating the pixel depth value for the pixel based at least in part on depth values for each corner pixel, the depth values for each corner pixel weighted based on a ratio between a distance of the pixel to a respective corner pixel and a total distance of the pixel to each corner pixel; and creating a missing second view from the captured image to create three-dimensional (3D) stereo left and right views.

19. The method of claim 18, wherein creating the missing second view comprises: calculating a disparity map based on a distance in pixels between image points in the left and right views of binocular vision geometry for the captured image, wherein the captured image represents the left view; 3D surface recovering with Z-buffering for constructing a 3D visible surface for the captured image from a missing right viewpoint; and generating a missing right view by projecting the constructed 3D surface onto a projection plane.

20. The method of claim 18, wherein autofocusing includes: performing an exhaustive search focusing process to capture a still image in a still image mode; initiating the exhaustive search focusing process in a video mode; and climbing-hill focusing in a video mode to capture a real-time video clip.

21. The method of claim 18, wherein generating the pixel depth value for the pixel based at least in part on depth values for each corner pixel comprises generating pixel depth value $d_p$ for pixel $P(x_p, y_p, d_p)$ in accordance with the equation

$$d_p = \sum_{K \in \{A,B,C,D\}} \frac{(x_p - x_K)^4 + (y_p - y_K)^4}{\sum_{J \in \{A,B,C,D\}} \left[(x_p - x_J)^4 + (y_p - y_J)^4\right]}\, d_K$$

where position values and the depth values for the corner pixels (A, B, C, and D) of the block are denoted as $(x_A, y_A, d_A)$, $(x_B, y_B, d_B)$, $(x_C, y_C, d_C)$, $(x_D, y_D, d_D)$.

22. The method of claim 20, further comprising video coding the video clip and motion estimating, wherein generating the block-level depth map includes detecting and estimating depth information for real-time capturing and generation of stereo video using statistics from the motion estimating, the autofocusing, and history data plus heuristic rules to obtain a final block depth map from which the image depth map is derived.
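Claims 2, 11 and 19 treat the captured image as the left view and derive the missing right view from the depth map via binocular geometry and Z-buffering. Below is a hedged sketch, assuming the common pinhole relation disparity = focal_px × baseline / depth (the patent only states that the disparity map is a function of the geometry of binocular vision) and using a simple Z-buffered forward warp in place of the full 3D surface recovery and projection steps.

```python
import numpy as np

def synthesize_right_view(left, depth, focal_px, baseline):
    """Sketch: disparity map plus a z-buffered warp of the left view.

    left    : (H, W, 3) captured image, treated as the left view.
    depth   : (H, W) image depth map (larger = farther).
    focal_px: focal length in pixels; baseline: assumed eye separation.
    """
    h, w = depth.shape
    disparity = focal_px * baseline / np.maximum(depth, 1e-6)
    right = np.zeros_like(left)
    zbuf = np.full((h, w), np.inf)              # nearest surface wins (Z-buffer)
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disparity[y, x]))   # shift pixel into the right view
            if 0 <= xr < w and depth[y, x] < zbuf[y, xr]:
                zbuf[y, xr] = depth[y, x]
                right[y, xr] = left[y, x]
    return right, disparity
```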
23. The method of claim 18, further comprising generating 3D effects of the 3D stereo left and right views on a display.

24. The method of claim 23, wherein generating 3D effects includes producing a red-blue anaglyph image of the 3D stereo left and right views on the display.

25. A method of operating a still image processing device, the method comprising: autofocusing processing a captured still image and estimating depth information of remote objects in the image to generate a block-level depth map, the block-level depth map including a depth value for each of a plurality of portions of the captured still image; obtaining a depth value for corner pixels of each block included in the block-level depth map, the depth value for a corner pixel based at least in part on an average of depth values for middle points of neighboring blocks of a respective block, the neighboring blocks included in the block-level depth map; and generating an image depth map based on the block-level depth map, the image depth map including a pixel depth value for a pixel of a portion of the plurality of portions, the pixel depth value of the pixel based at least in part on depth values for each corner pixel, the depth values for each corner pixel weighted based on a ratio between a distance of the pixel to a respective corner pixel and a total distance of the pixel to each corner pixel.

26. The method of claim 25, wherein the autofocusing processing includes processing the image using a coarse-to-fine depth detection process.

27. The method of claim 25, wherein generating the image depth map comprises bilinear filtering the block-level depth map to derive an approximated image depth map.

28. The method of claim 25, wherein position values and the depth values for the corner pixels (A, B, C, and D) of the block are denoted as $(x_A, y_A, d_A)$, $(x_B, y_B, d_B)$, $(x_C, y_C, d_C)$, $(x_D, y_D, d_D)$, and wherein for a respective pixel denoted by a point $P(x_p, y_p, d_p)$, generating the pixel depth value $d_p$ of the respective pixel is in accordance with the equation

$$d_p = \sum_{K \in \{A,B,C,D\}} \frac{(x_p - x_K)^4 + (y_p - y_K)^4}{\sum_{J \in \{A,B,C,D\}} \left[(x_p - x_J)^4 + (y_p - y_J)^4\right]}\, d_K$$
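Claims 8, 16 and 24 display the stereo pair as a red-blue anaglyph. A minimal sketch of one common construction follows; the exact channel mixing is an assumption, since the claims only require a red-blue anaglyph of the left and right views.

```python
import numpy as np

def red_blue_anaglyph(left, right):
    """Combine stereo left/right RGB views into a single red-blue anaglyph image."""
    anaglyph = np.empty_like(left)
    anaglyph[..., 0] = left[..., 0]        # red channel from the left view
    anaglyph[..., 1] = right[..., 1]       # green channel from the right view
    anaglyph[..., 2] = right[..., 2]       # blue channel from the right view
    return anaglyph
```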
29. A still image capturing device comprising: an autofocusing module operable to: process a captured still image and estimate depth information of remote objects in the image to detect a block-level depth map, the block-level depth map including a depth value for each of a plurality of portions of the captured still image; and obtain a depth value for corner pixels of each block included in the block-level depth map, the depth value for a corner pixel based at least in part on an average of depth values for middle points of neighboring blocks of a respective block, the neighboring blocks included in the block-level depth map; an image depth map module operable to approximate from the block-level depth map an image depth map using bilinear filtering, the image depth map including a pixel depth value for a pixel of a portion of the plurality of portions, the pixel depth value of the pixel based at least in part on depth values for each corner pixel, the depth values for each corner pixel weighted based on a ratio between a distance of the pixel to a respective corner pixel and a total distance of the pixel to each corner pixel; and an image pair generator module operable to create a missing second view from the captured image to create three-dimensional (3D) stereo left and right views.

30. The device of claim 29, further comprising a 3D effects generator module operable to display 3D effects of the 3D stereo left and right views.

31. The device of claim 29, wherein a focusing process of the autofocusing module performs an exhaustive search focusing process to capture the still image.

32. The device of claim 29, wherein the image depth map module is operable to reduce artifacts with the bilinear filtering.

33. A video image capturing device comprising: an autofocusing module operable to process a captured video clip and estimate depth information of remote objects in a scene, the autofocusing module configured to generate an autofocusing block depth map and an autofocusing focus value map for frames of the captured video clip; a video coding module operable to code the video clip captured, provide statistics information and determine motion estimation, the video coding module configured to generate a video coding block depth map and a video coding focus value map for frames of the captured video clip based at least in part on the motion estimation; and an image depth map module operable to detect and estimate depth information for real-time capturing and generation of stereo video based on a final block depth map from which an image depth map is derived, the block depth map including a depth value for each of a plurality of portions of the captured video clip, the image depth map including a pixel depth value for a pixel in a portion of the plurality of portions, the image depth map module configured to generate the final block depth map based on values included in the autofocusing block depth map, the autofocusing focus value map, the video coding block depth map, and the video coding focus value map.

34. The device of claim 33, wherein a focusing process of the autofocusing module to achieve real-time capturing of a video clip is initiated with the exhaustive search focusing process and follows with a climbing-hill focusing process.

35. The device of claim 33, further comprising an image pair generator module operable to create a missing second view from the captured image to create three-dimensional (3D) stereo left and right views.
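Claims 3, 20, 31 and 34 distinguish an exhaustive focus search (used for still images and as the first pass in video mode) from a climbing-hill search used afterwards for real-time video. A sketch of the two strategies, assuming a callback measure(p) that returns the focus value at lens position p; both function names are illustrative, not the patent's.

```python
def exhaustive_focus(measure, positions):
    """Exhaustive-search focusing: evaluate the focus value at every lens
    position in the range and keep the position with the maximum value."""
    return max(positions, key=measure)

def hill_climbing_focus(measure, start, step, lo, hi):
    """Climbing-hill refinement: from the previous best position, move the lens
    in whichever direction keeps increasing the focus value, then stop.
    A generic sketch, not the patent's exact update rule."""
    pos, best = start, measure(start)
    for direction in (+step, -step):
        nxt = pos + direction
        while lo <= nxt <= hi and measure(nxt) > best:
            pos, best = nxt, measure(nxt)
            nxt = pos + direction
    return pos
```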
36. The device of claim 35, further comprising a 3D effects generator module operable to display 3D effects of the 3D stereo left and right views.

37. The device of claim 33, wherein the depth map module is operable to predict an internal block depth map $P_n(i, j)$ and a focus value map $T_n(i, j)$ of a current frame $n$ from those of a previous frame by the following equations: $P_n(i,j) = \{\, D_{n-1}(a,b) \ \text{if}\ V_n(i,j) - F_{n-1}(a,b)$ …
Patents cited by this patent (12)
Oh, Teik; Flack, Julien; Harman, Philip Victor, "3D image synthesis from depth encoded source view."
Murata, Haruhiko (JP); Mori, Yukio (JP); Yamashita, Shuugo (JP); Maenaka, Akihiro (JP); Okada, Seiji (JP); Ihara, Kanji (JP), "Device and method for converting two-dimensional video into three-dimensional video."
Nakagawa, Yasuo (Chigasaki, JP); Nayar, Shree K. (Pittsburgh, PA), "Method of detecting solid shape of object with autofocusing and image detection at each focus level."
Eleftheriadis, Alexandros; Anastassiou, Dimitris; Chang, Shih-Fu; Nayar, Shree, "Methods and apparatus for performing digital image and video segmentation and compression using 3-D depth information."