[Paper] Intra Prediction Method for Depth Picture Using CNN and Attention Mechanism

윤재혁; 이동석; 윤병주; 권순각

doi:10.9723/jksiis.2024.29.2.035

CNN과 Attention을 통한 깊이 화면 내 예측 방법
Intra Prediction Method for Depth Picture Using CNN and Attention Mechanism

Journal of Korea Society of Industrial Information Systems (한국산업정보학회논문지), v.29 no.2, 2024, pp. 35-45

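윤재혁 (Department of Computer Software Engineering, Dong-eui University), 이동석 (AI Grand ICT Research Center, Dong-eui University), 윤병주 (School of Electronics Engineering, Kyungpook National University), 권순각 (Department of Computer Software Engineering, Dong-eui University)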

Abstract (Korean)

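This paper proposes an intra prediction method for depth pictures using a CNN and an attention mechanism. The proposed method allows reference pixels to be selected individually for each pixel in the block to be predicted. A CNN extracts spatial features in the vertical and horizontal directions from the areas above and to the left of the prediction block, respectively. The two spatial features are merged along the feature dimension and along the spatial dimension, respectively, to predict the features of the prediction block and of the reference pixels. The attention mechanism predicts the correlation between the prediction block and the reference pixels from the input spatial features. The correlation predicted by the attention mechanism is restored to the pixel domain through CNN layers to predict the pixel values in the block. When the proposed method was added to the intra modes of VVC, the intra prediction error decreased by 5.8% on average.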

Abstract (English)

In this paper, we propose an intra prediction method for depth pictures using a CNN and an attention mechanism. The proposed method allows each pixel in the block to be predicted to select its own reference pixels from the reference area. Spatial features in the vertical and horizontal directions are extracted through a CNN layer from the top and left areas adjacent to the block, respectively. The two spatial features are merged along the feature dimension and along the spatial dimension to predict features for the prediction block and for the reference pixels, respectively. The correlation between the prediction block and the reference pixels is then predicted through the attention mechanism. The predicted correlations are restored to the pixel domain through CNN layers to predict the pixels in the block. The average intra prediction error is reduced by 5.8% when the proposed method is added to the VVC intra modes.
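
The core idea in the abstract, each pixel of the block choosing its own reference pixels through attention, can be illustrated with a toy sketch. This is not the authors' network. It assumes a single row of reference pixels above the block and a single column to its left. The learned CNN features are replaced by hand-made features (pixel position and normalized depth), and the trained projections are replaced by random matrices. The names `attention_intra_predict`, `Wq`, `Wk` and the feature size `d` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_intra_predict(top, left, block_size, rng, d=8):
    """Toy attention-based intra prediction of one square block.

    top  : (block_size,) reconstructed pixels in the row above the block
    left : (block_size,) reconstructed pixels in the column left of the block
    Returns the predicted block and the attention weights.
    """
    n = block_size
    refs = np.concatenate([top, left]).astype(np.float64)  # (2n,)

    # Hand-made stand-ins for learned spatial features (assumption):
    # reference features = (row, col, normalized depth), block features = (row, col).
    ref_pos = np.array([(-1, x) for x in range(n)] + [(y, -1) for y in range(n)],
                       dtype=np.float64)
    blk_pos = np.array([(y, x) for y in range(n) for x in range(n)],
                       dtype=np.float64)
    ref_feat = np.column_stack([ref_pos, refs / 255.0])  # (2n, 3)

    # Random projections stand in for trained parameters (assumption).
    Wq = rng.standard_normal((2, d))
    Wk = rng.standard_normal((3, d))
    q = blk_pos @ Wq    # one query per block pixel, shape (n*n, d)
    k = ref_feat @ Wk   # one key per reference pixel, shape (2n, d)

    # Scaled dot-product attention: every block pixel gets its own
    # weight distribution over the 2n reference pixels.
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (n*n, 2n)

    # The prediction is a per-pixel weighted average of the reference pixels.
    pred = weights @ refs
    return pred.reshape(n, n), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    top = np.array([100.0, 102.0, 104.0, 106.0])
    left = np.array([100.0, 90.0, 80.0, 70.0])
    pred, weights = attention_intra_predict(top, left, 4, rng)
    print(pred.round(1))
```

Because each row of `weights` is a softmax distribution, every predicted pixel in this sketch is a convex combination of the reference pixels. The method described in the abstract instead passes the attention output through further CNN layers to return to the pixel domain, which this sketch leaves out.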

Tables/Figures (12)

Fig. 1 Flow of the proposed method
Fig. 2 Module structure for spatial feature extraction
Fig. 3 Spatial feature extraction for top and left blocks
Fig. 4 Merging spatial feature maps for target block and reference pixels
Fig. 5 Conversion of spatial features to pixel values
Fig. 6 Samples of pictures for simulation
Fig. 7 Block prediction results through the proposed method
Fig. 8 Depth videos for comparison simulation with VVC
Table 1 Prediction errors for f
Table 2 Prediction errors for m
Table 3 Prediction errors for t
Table 4 Comparison of intra prediction between VVC and the proposed method

