IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, 2021, pp. 3654-3667
Hong, Joanna; Kim, Minsu; Park, Se Jin; Ro, Yong Man (all: Korea Advanced Institute of Science and Technology (KAIST), Image and Video Systems Laboratory, School of Electrical Engineering, Daejeon, Korea)
The goal of this work is to reconstruct speech from silent video in both speaker-dependent and speaker-independent settings. Unlike previous works, which have mostly been restricted to the speaker-dependent setting, we propose Visual Voice memory to restore essential auditory information to generate proper speech ...