

[International article] Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 5, 2021, pp. 1605-1619

Senocak, Arda (KAIST, School of Electrical Engineering, Daejeon, Republic of Korea), Oh, Tae-Hyun (POSTECH, Pohang, Korea), Kim, Junsik (KAIST, School of Electrical Engineering, Daejeon, Republic of Korea), Yang, Ming-Hsuan (University of California, Merced, CA, USA), Kweon, In So (KAIST, School of Electrical Engineering, Daejeon, Republic of Korea)

Abstract

Visual events are usually accompanied by sounds in our daily lives. However, can the machines learn to correlate the visual scene and sound, as well as localize the sound source only by observing them like humans? To investigate its empirical learnability, in this work we first present a novel unsup...



Open access (OA) type

GREEN

The author has self-archived the published version, post-print, or pre-print in a public repository, so the paper is freely available.
