[International Paper] Speech Reconstruction With Reminiscent Sound Via Visual Voice Memory

IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, 2021, pp. 3654-3667

Hong, Joanna (Korea Advanced Institute of Science and Technology (KAIST), Image and Video Systems Laboratory, School of Electrical Engineering, Daejeon, Korea); Kim, Minsu (Korea Advanced Institute of Science and Technology (KAIST), Image and Video Systems Laboratory, School of Electrical Engineering, Daejeon, Korea); Park, Se Jin (Korea Advanced Institute of Science and Technology (KAIST), Image and Video Systems Laboratory, School of Electrical Engineering, Daejeon, Korea); Ro, Yong Man (Korea Advanced Institute of Science and Technology (KAIST), Image and Video Systems Laboratory, School of Electrical Engineering, Daejeon, Korea)

Abstract

The goal of this work is to reconstruct speech from silent video, in both speaker dependent and independent ways. Unlike previous works that have been mostly restricted to a speaker dependent setting, we propose Visual Voice memory to restore essential auditory information to generate proper speech ...
