[Paper] A Personal Video Event Classification Method based on Multi-Modalities by DNN-Learning

이유진; 낭종호

doi:10.5626/jok.2016.43.11.1281


Journal of KIISE (정보과학회논문지), Vol. 43, No. 11, 2016, pp. 1281-1297

Abstract

In recent years, personal videos have grown rapidly. This growth is driven by the spread of smart devices and by network services that let users create video content freely and share it quickly and conveniently. A video is inherently multi-modal, and its frame data change over time, so an event classifier for personal videos must account for both properties. This paper proposes a method that classifies personal video events in three stages. It extracts high-level features from the multiple modalities in a video, rearranges them in temporal order, and learns the association between the modalities with a Deep Neural Network (DNN). Concretely, the method first extracts the image and audio data contained in a video with temporal synchronization. It then obtains high-level features from the images with GoogLeNet and from the audio with a Multi-Layer Perceptron (MLP). These features are rearranged in the temporal order in which they appear in the video to produce a single feature per video. Finally, a DNN trained on these per-video features classifies personal video events.
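The abstract describes the pipeline only at a high level, so the following Python sketch illustrates it under stated assumptions and is not the authors' implementation. The main assumption is how "one feature per video" is built. The sketch samples a fixed number T of synchronized time steps, concatenates the image feature and the audio feature at each step, and lays the steps out in temporal order. Real GoogLeNet and audio-MLP outputs are replaced by random stand-in vectors. The feature sizes, the layer sizes, and the helper names (`sample_timestamps`, `build_video_feature`, `EventDNN`) are all hypothetical. Only the forward pass of the DNN is shown. Training it, for example by backpropagation with a cross-entropy loss, is omitted.

```python
# Hypothetical sketch: names, sizes, and the concatenation scheme are
# assumptions for illustration, not taken from the paper.
import math
import random
from typing import List

Vector = List[float]


def sample_timestamps(duration_s: float, num_steps: int) -> List[float]:
    """Return num_steps evenly spaced instants (segment centers) at which a video
    frame and an audio window would both be extracted, keeping them in sync."""
    step = duration_s / num_steps
    return [step * (i + 0.5) for i in range(num_steps)]


def build_video_feature(image_feats: List[Vector], audio_feats: List[Vector]) -> Vector:
    """Concatenate [image_t, audio_t] for t = 0..T-1 into one flat vector per video."""
    if len(image_feats) != len(audio_feats):
        raise ValueError("image and audio features must cover the same time steps")
    video_vec: Vector = []
    for img, aud in zip(image_feats, audio_feats):  # inputs are already in temporal order
        video_vec.extend(img)
        video_vec.extend(aud)
    return video_vec


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class EventDNN:
    """Fully connected net: ReLU hidden layers, softmax output (forward pass only)."""

    def __init__(self, layer_sizes: List[int], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers = []
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            scale = math.sqrt(2.0 / n_in)  # He-style initialization for ReLU layers
            weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
            self.layers.append((weights, [0.0] * n_out))

    def predict_proba(self, x: Vector) -> Vector:
        h = x
        for i, (weights, bias) in enumerate(self.layers):
            z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]
            is_last = i == len(self.layers) - 1
            h = softmax(z) if is_last else [max(0.0, v) for v in z]
        return h


# Usage with stand-in features (hypothetical sizes: 8-d image, 3-d audio, 5 events).
T, IMG_DIM, AUD_DIM, NUM_EVENTS = 4, 8, 3, 5
times = sample_timestamps(duration_s=20.0, num_steps=T)  # [2.5, 7.5, 12.5, 17.5]
rng = random.Random(1)
image_feats = [[rng.random() for _ in range(IMG_DIM)] for _ in times]  # GoogLeNet stand-in
audio_feats = [[rng.random() for _ in range(AUD_DIM)] for _ in times]  # audio-MLP stand-in
x = build_video_feature(image_feats, audio_feats)
model = EventDNN([T * (IMG_DIM + AUD_DIM), 16, NUM_EVENTS])
probs = model.predict_proba(x)
print(len(x), len(probs))  # prints: 44 5
```

Concatenating the per-step features in a fixed order keeps each modality's temporal position explicit in the input. This is one way a fully connected DNN could learn cross-modal, time-dependent associations. The sketch assumes every video is reduced to the same T steps. Handling variable-length videos differently (padding, pooling, or a recurrent model) is outside what it shows.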


References (19)

J.-H. Shin, S.-K. Baek and P.-K. Kim, "Video Event Detection according to Generating of Semantic Unit based on Moving Object," Journal of Korea Multimedia Society, Vol. 11, No. 2, pp. 143-152, 2008.
A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, "Large-Scale Video Classification with Convolutional Neural Networks," Proc. of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725-1732, 2014.
S. Yu et al., "CMU-Informedia@TRECVID 2014 Multimedia Event Detection (MED)," Proc. of the 2014 TRECVID Video Retrieval Evaluation Workshop, 2014.
J. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga and G. Toderici, "Beyond Short Snippets: Deep Networks for Video Classification," Proc. of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 4694-4702, 2015.
Z. Wu, Y.-G. Jiang, J. Wang, J. Pu and X. Xue, "Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification," Proc. of the 22nd ACM International Conference on Multimedia, pp. 167-176, 2014.
C. Szegedy et al., "Going Deeper with Convolutions," Proc. of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
F. Rosenblatt, Principles of Neurodynamics, Spartan Books, 1962.
K. Soomro, A. R. Zamir and M. Shah, "UCF101: A Dataset of 101 Human Action Classes From Videos in The Wild," CRCV-TR-12-01, 2012.
A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Proc. of Neural Information Processing Systems 2012, 2012.
Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, Vol. 86, No. 11, pp. 2278-2324, 1998.
S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, Vol. 9, No. 8, pp. 1735-1780, 1997.
K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," Proc. of International Conference on Learning Representations 2015, 2015.
B. Zhu, W. Li and X. Xue, "A Novel Audio Fingerprinting Method Robust to Time Scale Modification and Pitch Shifting," Proc. of the 18th ACM International Conference on Multimedia, pp. 987-990, 2010.
M. Sahidullah and G. Saha, "Design, Analysis and Experimental Evaluation of Block based Transformation in MFCC Computation for Speaker Recognition," Speech Communication, Vol. 54, No. 4, pp. 543-565, 2012.
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama and T. Darrell, "Caffe: Convolutional Architecture for Fast Feature Embedding," Proc. of the 22nd ACM International Conference on Multimedia, pp. 675-678, 2014.
LISA Lab, Theano, http://deeplearning.net/software/theano/, 2015.
D. Tran, L. Bourdev, R. Fergus, L. Torresani and M. Paluri, "Learning Spatiotemporal Features with 3D Convolutional Networks," Proc. of International Conference on Computer Vision 2015, pp. 4489-4497, 2015.
N. Srivastava, E. Mansimov and R. Salakhutdinov, "Unsupervised Learning of Video Representations using LSTMs," Proc. of the 32nd International Conference on Machine Learning, pp. 843-852, 2015.
H. Ye, Z. Wu, R.-W. Zhao, X. Wang, Y.-G. Jiang and X. Xue, "Evaluating Two-Stream CNN for Video Classification," Proc. of the 5th ACM International Conference on Multimedia Retrieval, pp. 435-442, 2015.
