A Study on Techniques for Summarizing the Audio Information of Video

Investigating the Efficient Method for Constructing Audio Surrogates of Digital Video Data


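This study designed an algorithm that extracts and automatically summarizes the audio information of a video, evaluated the quality of the audio summaries constructed by the proposed algorithm, and proposed ways to implement efficient video summarization. The specific findings are as follows. First, in the intrinsic evaluation, the quality of the proposed audio summaries was higher than that of location-based audio summaries. In the user (extrinsic) evaluation, the proposed summaries outperformed the location-based summaries in summary accuracy, but no performance difference was found between the two in item selection. The study also surveyed user satisfaction with audio summaries for video browsing. Finally, based on these findings, it presented ways to apply the proposed audio summarization technique to the Internet and digital libraries.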


This study proposed an algorithm for automatically summarizing the audio information of a video and then conducted an experiment to evaluate the audio extracts constructed with the proposed algorithm. The results showed, first, that the recall and precision of the proposed audio summarization method were higher than those of a mechanical method that constructs audio extracts based on sentence location. Second, the proposed method outperformed the mechanical method in the summary-making task, although in the gist recognition (multiple-choice) task there was no statistically significant difference between the two methods. In addition, the study surveyed participants' satisfaction with the use of audio extracts for video browsing and discussed the practical implications of the proposed method for Internet and digital library environments.
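The intrinsic comparison described above, recall and precision of an extract against a reference, can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the abstract does not specify the proposed algorithm, so `proposed` is a hand-written stand-in, and the location-based (mechanical) method is assumed to mean keeping the first k sentences of the transcript. The function names and sample data are hypothetical.

```python
def lead_baseline(sentences, k):
    """Assumed location-based extract: indices of the first k sentences."""
    return list(range(min(k, len(sentences))))


def precision_recall(selected, reference):
    """Precision and recall of selected sentence indices against a reference set."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical transcript of six sentences (placeholders, not real data).
transcript = ["s0", "s1", "s2", "s3", "s4", "s5"]

reference = [0, 3, 4]  # hypothetical key sentences chosen by human judges
proposed = [0, 3, 5]   # stand-in for the output of some summarizer
baseline = lead_baseline(transcript, 3)

p_prop, r_prop = precision_recall(proposed, reference)
p_base, r_base = precision_recall(baseline, reference)

print(f"proposed: precision={p_prop:.2f} recall={r_prop:.2f}")
print(f"baseline: precision={p_base:.2f} recall={r_base:.2f}")
```

With these made-up inputs the stand-in extract shares two of three sentences with the reference and the lead baseline shares one, so the numbers only show how the two measures are computed, not the study's actual results.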




