Systems and methods for generating depth maps and corresponding confidence maps indicating depth estimation reliability
IPC Classification
Country/Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition):
G06T-007/00
G02B-027/00
G06T-015/20
H04N-013/02
H04N-009/097
H04N-013/00
Application number: US-0526364 (2014-10-28)
Registration number: US-9123117 (2015-09-01)
Inventors
Ciurea, Florian
Venkataraman, Kartik
Molina, Gabriel
Lelescu, Dan
Applicant
Pelican Imaging Corporation
Attorney/Agent
KPPB LLP
Citation information
Cited by: 57 patents
Cites: 132 patents
Abstract
Systems in accordance with embodiments of the invention can perform parallax detection and correction in images captured using array cameras. Due to the different viewpoints of the cameras, parallax results in variations in the position of objects within the captured images of the scene. Methods in accordance with embodiments of the invention provide an accurate account of the pixel disparity due to parallax between the different cameras in the array, so that appropriate scene-dependent geometric shifts can be applied to the pixels of the captured images when performing super-resolution processing. In a number of embodiments, generating depth estimates considers the similarity of pixels in multiple spectral channels. In certain embodiments, generating depth estimates involves generating a confidence map indicating the reliability of depth estimates.
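The depth-estimation approach summarized in the abstract — comparing corresponding pixels across views at a plurality of candidate depths and keeping the depth with the highest similarity — can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the one-dimensional horizontal disparity model, the absolute-difference cost, and the crude SNR-style confidence factor are all simplifying assumptions.

```python
import numpy as np

def estimate_depth(ref, others, baselines, focal, depths, x, y):
    """Plane-sweep depth estimate for one pixel of the reference image.

    For each candidate depth, find the corresponding pixel in every other
    view using the expected disparity (focal * baseline / depth), accumulate
    an absolute-difference cost, and keep the depth with the lowest cost.
    Also returns a rough SNR-style confidence factor (local mean over local
    standard deviation) for the pixel's neighborhood.
    """
    h, w = ref.shape
    best_depth, best_cost = None, np.inf
    for d in depths:
        cost = 0.0
        for img, b in zip(others, baselines):
            disp = int(round(focal * b / d))  # expected horizontal disparity
            xs = x + disp
            if not (0 <= xs < w):             # correspondence falls off-image
                cost = np.inf
                break
            cost += abs(float(ref[y, x]) - float(img[y, xs]))
        if cost < best_cost:
            best_cost, best_depth = cost, d
    # Crude SNR proxy over a 5x5 window around the pixel.
    win = ref[max(0, y - 2):y + 3, max(0, x - 2):x + 3].astype(float)
    snr = win.mean() / (win.std() + 1e-6)
    return best_depth, best_cost, snr
```

A real array-camera pipeline would use 2-D epipolar geometry, robust multi-pixel cost aggregation, and normalization across spectral channels; this sketch only shows the search-over-depths structure.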
Representative Claims
1. A method of estimating distances to objects within a scene based upon a set of images captured from different viewpoints using a processor configured by an image processing application, the method comprising: selecting the viewpoint of an image from the set of images captured from different viewpoints as a reference viewpoint; normalizing the set of images to increase the similarity of corresponding pixels within the set of images; determining depth estimates for pixel locations in an image from the reference viewpoint using at least a subset of the set of images, wherein generating a depth estimate for a given pixel location in the image from the reference viewpoint comprises: identifying pixels in the at least a subset of the set of images that correspond to the given pixel location in the image from the reference viewpoint based upon expected disparity at a plurality of depths; comparing the similarity of the corresponding pixels identified at each of the plurality of depths; and selecting the depth from the plurality of depths at which the identified corresponding pixels have the highest degree of similarity as a depth estimate for the given pixel location in the image from the reference viewpoint; and generating confidence metrics for the depth estimates for pixel locations in the image from the reference viewpoint, where a confidence metric for a given depth estimate for a given pixel location in the image from the reference viewpoint encodes at least one confidence factor indicating the reliability of the given depth estimate, including a confidence factor determined using a measure of the signal-to-noise ratio (SNR) in a region surrounding the given pixel of the image from the set of images captured from the reference viewpoint.

2. The method of claim 1, wherein the confidence metric encodes at least one binary confidence factor.

3.
The method of claim 1, wherein the confidence metric encodes at least one confidence factor represented as a range of degrees of confidence.

4. The method of claim 1, wherein the confidence metric encodes a plurality of confidence factors.

5. The method of claim 4, wherein generating a depth estimate for a given pixel location in the image from the reference viewpoint further comprises: determining an initial depth estimate for the pixel location in an image from the reference viewpoint based upon the disparity at which the identified corresponding pixels have the highest degree of similarity; comparing the similarity of the identified corresponding pixels to detect mismatched pixels; when an initial depth estimate does not result in the detection of a mismatch between the identified corresponding pixels, selecting the initial depth estimate as the current depth estimate for the pixel location in the image from the reference viewpoint; and when an initial depth estimate results in the detection of a mismatch between the identified corresponding pixels, updating the depth estimate for the pixel location in the image from the reference viewpoint by: determining a set of candidate depth estimates using a plurality of competing subsets of the set of images based upon the disparities at which corresponding pixels have the highest degree of similarity in each of a plurality of competing subsets of images; and selecting the candidate depth of the subset having the corresponding pixels with the highest degree of similarity as the updated depth estimate for the pixel location in the image from the reference viewpoint.

6. The method of claim 5, wherein the confidence metric for the depth estimate for a given pixel location in the image from the reference viewpoint comprises at least one confidence factor determined based upon the number of corresponding pixels used to generate the depth estimate.

7.
The method of claim 5, wherein the confidence metric for the depth estimate for a given pixel location in the image from the reference viewpoint comprises at least one confidence factor generated by comparing the highest degree of similarity of corresponding pixels from different competing subsets of the set of images.

8. The method of claim 5, wherein the confidence metric encodes at least one confidence factor determined by comparing the similarity of the pixels in the set of images that were used to generate the updated depth estimate for a given pixel location in the image from the reference viewpoint.

9. The method of claim 8, wherein: a cost function is utilized to generate a cost metric indicating the similarity of corresponding pixels; and comparing the similarity of the pixels in the set of images that were used to generate the depth estimate for a given pixel location in the image from the reference viewpoint further comprises: applying a threshold to a cost metric of the pixels in the set of images that were used to generate the updated depth estimate for a given pixel location in the image from the reference viewpoint; and when the cost metric exceeds the threshold, assigning a confidence metric that indicates that the updated depth estimate for the given pixel location in the image from the reference viewpoint was generated using at least one pixel in the set of images that is a problem pixel.

10. The method of claim 9, wherein the threshold is modified based upon at least one of: a mean intensity of a region surrounding the given pixel location in the image from the reference viewpoint; and noise statistics for at least one sensor used to capture the set of images.

11. The method of claim 10, wherein the mean intensity of a region surrounding the given pixel location in the image from the reference viewpoint is calculated using a spatial box averaging filter centered around the given pixel.

12.
The method of claim 11, wherein: the set of images are captured within multiple color channels including at least Red, Green and Blue color channels; selecting a reference viewpoint relative to the viewpoints of the set of images captured from different viewpoints comprises selecting one of the images in the Green color channel as the reference viewpoint; and the mean intensity is used to determine the noise statistics for the Green channel using a table that relates a particular mean at a particular exposure and gain to a desired threshold.

13. The method of claim 5, wherein: a cost function is utilized to generate a cost metric indicating the similarity of corresponding pixels; and a confidence metric based upon general mismatch is obtained using the following formula:

Confidence(x, y) = F(Cost_min(x, y), Cost_d(x, y), I(x, y)_cam, Sensor, Camera intrinsics)

where Cost_min(x, y) is the cost metric of the pixels in the set of images that were used to generate the depth estimate; Cost_d(x, y) is the distribution of the cost metrics at pixel location (x, y) for different depths; I(x, y)_cam is image data captured by any camera that can be utilized to augment the confidence; Sensor is the sensor prior, which includes known properties of the sensor; and Camera intrinsics is the camera intrinsic, which specifies elements intrinsic to the camera and camera array that can impact confidence.

14.
The method of claim 5, wherein: a cost function is utilized to generate a cost metric indicating the similarity of corresponding pixels; and a confidence metric based upon general mismatch is obtained using the following formula:

Confidence(x, y) = a × Cost_min(x, y) / Avg(x, y) + offset

where Cost_min(x, y) is the cost metric of the pixels in the set of images that were used to generate the updated depth estimate, Avg(x, y) is the mean intensity of the reference image in a spatial neighborhood surrounding (x, y), and a and offset are empirically chosen scale and offset factors used to adjust the confidence with prior information about the gain and noise statistics of at least one sensor used to capture images in the set of images.

15. The method of claim 4, wherein the confidence metric for the depth estimate for a given pixel location in the image from the reference viewpoint further comprises at least one confidence factor selected from the group consisting of: an indication that the given pixel is within a textureless region within an image; a number of corresponding pixels used to generate the depth estimate; an indication of a number of depths searched to generate the depth estimate; an indication that the given pixel is adjacent a high contrast edge; an indication that the given pixel is adjacent a high contrast boundary; an indication that the given pixel lies on a gradient edge; an indication that corresponding pixels to the given pixel are mismatched; an indication that corresponding pixels to the given pixel are occluded; an indication that depth estimates generated using different reference cameras exceed a threshold for the given pixel; an indication that depth estimates generated using different subsets of cameras exceed a threshold for the given pixel; an indication as to whether the depth of the given pixel exceeds a threshold; an indication that the given pixel is defective; and an indication that corresponding pixels to the given pixel are defective.

16.
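The general-mismatch confidence formula in claim 14 above (a scaled ratio of the minimum matching cost to the local mean intensity, plus an offset) can be sketched as follows. This is a minimal illustration under stated assumptions: the box-filter radius and the default values of the scale `a` and `offset` are hypothetical, not values from the patent.

```python
import numpy as np

def confidence_map(cost_min, ref, a=1.0, offset=0.0, radius=1):
    """Sketch of claim 14: Confidence(x, y) = a * Cost_min(x, y) / Avg(x, y) + offset.

    Avg(x, y) is the mean intensity of the reference image in a spatial
    neighborhood around (x, y), computed here with a simple box filter of
    the given radius (an illustrative choice).  A lower value indicates a
    more reliable estimate: low matching cost relative to local brightness.
    """
    h, w = ref.shape
    conf = np.empty_like(cost_min, dtype=float)
    for y in range(h):
        for x in range(w):
            # Box neighborhood clipped at the image borders.
            patch = ref[max(0, y - radius):y + radius + 1,
                        max(0, x - radius):x + radius + 1]
            avg = patch.mean() + 1e-6  # guard against division by zero
            conf[y, x] = a * cost_min[y, x] / avg + offset
    return conf
```

Dividing by local mean intensity normalizes the cost against brightness, which is what lets fixed, empirically chosen `a` and `offset` account for sensor gain and noise statistics across differently exposed regions.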
The method of claim 1, wherein generating confidence metrics for the depth estimates for pixel locations in the image from the reference viewpoint comprises determining at least one sensor gain used to capture at least one of the set of images and adjusting at least one confidence factor based upon a sensor gain.

17. The method of claim 1, wherein generating confidence metrics for the depth estimates for pixel locations in the image from the reference viewpoint comprises determining at least one exposure time used to capture at least one of the set of images and adjusting at least one confidence factor based upon an exposure time.

18. The method of claim 1, further comprising: outputting a depth map containing the depth estimates for pixel locations in the image from the reference viewpoint; and outputting a confidence map containing confidence metrics for the updated depth estimates contained within the depth map.

19. The method of claim 18, further comprising filtering the depth map based upon the confidence map.

20. The method of claim 1, wherein generating confidence metrics for the depth estimates for pixel locations in the image from the reference viewpoint further comprises encoding a confidence factor for the depth estimate of a specific pixel location in the image from the reference viewpoint by: calculating local statistics of a region of interest around the specific pixel location in the image from the reference viewpoint; and comparing the calculated local statistics to local statistics of a similar region in at least one other image from the set of images.

21. The method of claim 20, wherein the confidence factor for the depth estimate of a specific pixel location in the image from the reference viewpoint is a function of the mean and variance of the regions across images.

22.
A method of synthesizing a higher resolution image from a set of lower resolution images captured from different viewpoints, the method comprising: estimating distances to objects within a scene from a set of images captured from different viewpoints using a processor directed by an image processing application to: select the viewpoint of an image from the set of images captured from different viewpoints as a reference viewpoint; normalize the set of images to increase the similarity of corresponding pixels within the set of images; determine depth estimates for pixel locations in an image from the reference viewpoint using at least a subset of the set of images, wherein generating a depth estimate for a given pixel location in the image from the reference viewpoint comprises: identifying pixels in the at least a subset of the set of images that correspond to the given pixel location in the image from the reference viewpoint based upon expected disparity at a plurality of depths; comparing the similarity of the corresponding pixels identified at each of the plurality of depths; and selecting the depth from the plurality of depths at which the identified corresponding pixels have the highest degree of similarity as a depth estimate for the given pixel location in the image from the reference viewpoint; and generating confidence metrics for the depth estimates for pixel locations in the image from the reference viewpoint using a processor directed by an image processing application, where a confidence metric for a given depth estimate for a pixel location in the image from the reference viewpoint encodes at least one confidence factor indicating the reliability of the given depth estimate; determining the visibility of the pixels in the set of images from the reference viewpoint using a processor directed by an image processing application; and fusing pixels from the set of images using the processor directed by the image processing application based upon the depth estimates that are indicated as being reliable by the confidence metrics to create a fused image having a resolution that is greater than the resolutions of the images in the set of images by: identifying the pixels from the set of images that are visible in an image from the reference viewpoint and that have reliable depth estimates using the visibility information and the confidence metrics; and applying scene-dependent geometric shifts to the pixels from the set of images that are visible in an image from the reference viewpoint and that have reliable depth estimates to shift the pixels into the reference viewpoint, where the scene-dependent geometric shifts are determined using the depth estimates; and fusing the shifted pixels from the set of images to create a fused image from the reference viewpoint having a resolution that is greater than the resolutions of the images in the set of images.

23. The method of claim 22, further comprising synthesizing an image from the reference viewpoint using the processor directed by the image processing application to perform a super-resolution process based upon the fused image from the reference viewpoint, the set of images captured from different viewpoints, the depth estimates, the visibility information, and the confidence metrics.

24. The method of claim 22, wherein a confidence metric for a given depth estimate for a pixel location in the image from the reference viewpoint encodes at least one confidence factor determined using a measure of the signal-to-noise ratio (SNR) in a region surrounding a given pixel of the image from the set of images captured from the reference viewpoint.

25.
The method of claim 22, wherein generating confidence metrics for the depth estimates for pixel locations in the image from the reference viewpoint further comprises encoding a confidence factor for the depth estimate of a specific pixel location in the image from the reference viewpoint by: calculating local statistics of a region of interest around the specific pixel location in the image from the reference viewpoint; and comparing the calculated local statistics to the local statistics of a similar region in at least one other image from the set of images.

26. The method of claim 25, wherein the confidence factor for the depth estimate of a specific pixel location in the image from the reference viewpoint is a function of the mean and variance of the regions across images.

27. The method of claim 22, wherein the confidence metric encodes a plurality of confidence factors.

28. The method of claim 27, wherein the confidence metric for the depth estimate for a given pixel location in the image from the reference viewpoint further comprises at least one confidence factor selected from the group consisting of: an indication that the given pixel is within a textureless region within an image; a number of corresponding pixels used to generate the depth estimate; an indication of a number of depths searched to generate the depth estimate; an indication that the given pixel is adjacent a high contrast edge; an indication that the given pixel is adjacent a high contrast boundary; an indication that the given pixel lies on a gradient edge; an indication that corresponding pixels to the given pixel are mismatched; an indication that corresponding pixels to the given pixel are occluded; an indication that depth estimates generated using different reference cameras exceed a threshold for the given pixel; an indication that depth estimates generated using different subsets of cameras exceed a threshold for the given pixel; an indication as to whether the depth of the given pixel exceeds a threshold; an indication that the given pixel is defective; and an indication that corresponding pixels to the given pixel are defective.

29. An image processing system, comprising: a processor; memory containing a set of images captured from different viewpoints and an image processing application; wherein the image processing application directs the processor to: select a viewpoint of an image from the set of images captured from different viewpoints as a reference viewpoint; normalize the set of images to increase the similarity of corresponding pixels within the set of images; determine depth estimates for pixel locations in an image from the reference viewpoint using at least a subset of the set of images, wherein generating a depth estimate for a given pixel location in the image from the reference viewpoint comprises: identifying pixels in the at least a subset of the set of images that correspond to the given pixel location in the image from the reference viewpoint based upon expected disparity at a plurality of depths; comparing the similarity of the corresponding pixels identified at each of the plurality of depths; and selecting the depth from the plurality of depths at which the identified corresponding pixels have the highest degree of similarity as a depth estimate for the given pixel location in the image from the reference viewpoint; and generate confidence metrics for the depth estimates for pixel locations in the image from the reference viewpoint, where a confidence metric for a given depth estimate for a given pixel location in the image from the reference viewpoint encodes at least one confidence factor indicating the reliability of the given depth estimate, including a confidence factor determined using a measure of the signal-to-noise ratio (SNR) in a region surrounding the given pixel from the reference viewpoint.
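Claims 18 and 19 above describe outputting a depth map together with a companion confidence map and filtering the former by the latter. A minimal sketch, assuming a convention where higher confidence values mean more reliable estimates and using NaN to mark invalidated depths (both are illustrative choices, not specified by the claims):

```python
import numpy as np

def filter_depth_map(depth, confidence, threshold):
    """Keep only depth estimates whose confidence meets the threshold;
    mark the rest invalid with NaN.  The threshold value and the
    'higher is better' convention are assumptions for illustration."""
    filtered = depth.astype(float).copy()
    filtered[confidence < threshold] = np.nan
    return filtered
```

Downstream steps such as super-resolution fusion would then skip (or inpaint) the NaN locations rather than propagate unreliable geometry.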
Patents cited by this patent (132)
Wilburn, Bennett; Joshi, Neel; Levoy, Marc C.; Horowitz, Mark, Apparatus and method for capturing a scene using staggered triggering of dense camera arrays.
Iwase Toshihiro (Nara JPX) Kanekura Hiroshi (Yamatokouriyama JPX), Apparatus for and method of converting a sampling frequency according to a data driven type processing.
Boisvert, David Michael; McMahon, Andrew Kenneth John, CCD output processing stage that amplifies signals from colored pixels based on the conversion efficiency of the colored pixels.
Venkataraman, Kartik; Jabbi, Amandeep S.; Mullis, Robert H., Capturing and processing of images using monolithic camera array with heterogeneous imagers.
Venkataraman, Kartik; Jabbi, Amandeep S.; Mullis, Robert H.; Duparre, Jacques; Hu, Shane Ching-Feng, Capturing and processing of images using monolithic camera array with heterogeneous imagers.
Yamashita, Syugo; Murata, Haruhiko; Iinuma, Toshiya; Nakashima, Mitsuo; Mori, Takayuki, Device and method for converting two-dimensional video to three-dimensional video.
Ward, Gregory John; Seetzen, Helge; Heidrich, Wolfgang, Electronic camera having multiple sensors for capturing high dynamic range images and related methods.
Abell Gurdon R. (West Woodstock CT) Cook Francis J. (Topsfield MA) Howes Peter D. (Sudbury MA), Method and apparatus for arraying image sensor modules.
Sawhney,Harpreet Singh; Tao,Hai; Kumar,Rakesh; Hanna,Keith, Method and apparatus for synthesizing new video and/or still imagery from a collection of real video and/or still imagery.
Alexander David H. (Santa Monica CA) Hershman George H. (Carlsbad CA) Jack Michael D. (Carlsbad CA) Koda N. John (Vista CA) Lloyd Randahl B. (San Marcos CA), Monolithic imager for near-IR.
Hornbaker ; III Cecil V. (New Carrolton MD) Driggers Thomas C. (Falls Church VA) Bindon Edward W. (Fairfax VA), Scanning apparatus using multiple CCD arrays and related method.
Ciurea, Florian; Venkataraman, Kartik; Molina, Gabriel; Lelescu, Dan, Systems and methods for parallax detection and correction in images captured using array cameras that contain occlusions using subsets of images to perform depth estimation.
Venkataraman, Kartik; Jabbi, Amandeep S.; Mullis, Robert H., Systems and methods for parallax measurement using camera arrays incorporating 3 x 3 camera configurations.
Ciurea, Florian; Venkataraman, Kartik; Molina, Gabriel; Lelescu, Dan, Systems and methods for performing depth estimation using image data from multiple spectral channels.
Ludwig, Lester F., Vignetted optoelectronic array for use in synthetic image formation via signal processing, lensless cameras, and integrated camera-displays.
Rieger Albert,DEX ; Barclay David ; Chapman Steven ; Kellner Heinz-Andreas,DEX ; Reibl Michael,DEX ; Rydelek James G. ; Schweizer Andreas,DEX, Watertight body for accommodating a photographic camera.
Venkataraman, Kartik; Gallagher, Paul; Jain, Ankit K.; Nisenzon, Semyon; Lelescu, Dan; Ciurea, Florian; Molina, Gabriel, Autofocus system for a conventional camera that uses depth information from an array camera.
Venkataraman, Kartik; Jabbi, Amandeep S.; Mullis, Robert H.; Duparre, Jacques; Hu, Shane Ching-Feng, Capturing and processing of images including occlusions focused on an image sensor by a lens stack array.
Venkataraman, Kartik; Jabbi, Amandeep S.; Mullis, Robert H.; Duparre, Jacques; Hu, Shane Ching-Feng, Capturing and processing of images using camera array incorporating Bayer cameras having different fields of view.
Srikanth, Manohar; Ramamoorthi, Ravi; Venkataraman, Kartik; Chatterjee, Priyam, System and methods for depth regularization and semiautomatic interactive matting using RGB-D images.
Nayar, Shree; Venkataraman, Kartik; Pain, Bedabrata; Lelescu, Dan, Systems and methods for controlling aliasing in images captured by an array camera for use in super resolution processing using pixel apertures.
Lelescu, Dan; Venkataraman, Kartik, Systems and methods for controlling aliasing in images captured by an array camera for use in super-resolution processing.
Duparre, Jacques; McMahon, Andrew Kenneth John; Lelescu, Dan; Venkataraman, Kartik; Molina, Gabriel, Systems and methods for detecting defective camera arrays and optic arrays.
Ciurea, Florian; Venkataraman, Kartik; Molina, Gabriel; Lelescu, Dan, Systems and methods for estimating depth and visibility from a reference viewpoint for pixels in a set of images captured from different viewpoints.
Venkataraman, Kartik; Lelescu, Dan; Molina, Gabriel, Systems and methods for generating compressed light field representation data using captured light fields, array geometry, and parallax information.
Venkataraman, Kartik; Lelescu, Dan; Molina, Gabriel, Systems and methods for generating compressed light field representation data using captured light fields, array geometry, and parallax information.
Venkataraman, Kartik; Jabbi, Amandeep S.; Mullis, Robert H., Systems and methods for generating depth maps using camera arrays incorporating monochrome and color cameras.
Venkataraman, Kartik; Jabbi, Amandeep S.; Mullis, Robert H., Systems and methods for generating depth maps using camera arrays incorporating monochrome and color cameras.
Venkataraman, Kartik; Jabbi, Amandeep S.; Mullis, Robert H., Systems and methods for generating depth maps using images captured by camera arrays incorporating cameras having different fields of view.
Duparre, Jacques; McMahon, Andrew Kenneth John; Lelescu, Dan, Systems and methods for manufacturing camera modules using active alignment of lens stack arrays and sensors.
Duparre, Jacques; McMahon, Andrew Kenneth John; Lelescu, Dan, Systems and methods for manufacturing camera modules using active alignment of lens stack arrays and sensors.
Venkataraman, Kartik; Jabbi, Amandeep S.; Mullis, Robert H., Systems and methods for measuring depth using images captured by a camera array including cameras surrounding a central camera.
Venkataraman, Kartik; Huang, Yusong; Jain, Ankit K.; Chatterjee, Priyam, Systems and methods for performing high speed video capture and depth estimation using array cameras.
Lelescu, Dan; Duong, Thang, Systems and methods for synthesizing high resolution images using image deconvolution based on motion and depth information.
Lelescu, Dan; Molina, Gabriel; Venkataraman, Kartik, Systems and methods for synthesizing high resolution images using images captured by an array of independently controllable imagers.
Venkataraman, Kartik; Nisenzon, Semyon; Chatterjee, Priyam; Molina, Gabriel, Systems and methods for synthesizing images from image data captured by an array camera using restricted depth of field depth maps in which depth estimation precision varies.
Venkataraman, Kartik; Nisenzon, Semyon; Chatterjee, Priyam; Molina, Gabriel, Systems and methods for synthesizing images from image data captured by an array camera using restricted depth of field depth maps in which depth estimation precision varies.