말소리와 음성과학 = Phonetics and Speech Sciences, vol. 15, no. 3, 2023, pp. 75-82
Oh Changhan, Kim Cheongbin, Park Kiyoung (Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute)
Automatic speech recognition (ASR) has been revolutionized by deep learning-based approaches, among which self-supervised learning methods have proven to be particularly effective. In this study, we aim to enhance the performance of OpenAI's Whisper model, a multilingual ASR system, on the Korean l...
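The abstract does not include code; the following is a minimal sketch of Korean transcription with the openai/whisper package cited below (https://github.com/openai/whisper). The checkpoint name ("small") and the audio path ("sample_ko.wav") are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: transcribing Korean audio with the openai/whisper package.
import whisper

# Load a pretrained multilingual Whisper checkpoint (size chosen for illustration).
model = whisper.load_model("small")

# Transcribe, forcing Korean decoding instead of automatic language detection.
result = model.transcribe("sample_ko.wav", language="ko", task="transcribe")

print(result["text"])
```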
AIHub (2021). AIHub broadcast content Korean speech recognition data. Retrieved from https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=463
Baevski, A., Zhou, Y., Mohamed, A., & Auli, M. (2020, December). wav2vec 2.0: A framework for self-supervised learning of speech representations. Proceedings of the Advances in Neural Information Processing Systems (pp. 12449-12460). Online Conference.
Bang, J. U., Yun, S., Kim, S. H., Choi, M. Y., Lee, M. K., Kim, Y. J., Kim, D. H., ... Kim, S. H. (2020). KsponSpeech: Korean spontaneous speech corpus for automatic speech recognition. Applied Sciences, 10(19), 6936.
Chan, W., Jaitly, N., Le, Q., & Vinyals, O. (2016, March). Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4960-4964). Shanghai, China.
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020, July). A simple framework for contrastive learning of visual representations. Proceedings of the 37th International Conference on Machine Learning (pp. 1597-1607). Online Conference.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. Retrieved from https://arxiv.org/abs/1810.04805
Graves, A., Fernandez, S., Gomez, F., & Schmidhuber, J. (2006, June). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. Proceedings of the 23rd International Conference on Machine Learning (pp. 369-376). Pittsburgh, PA.
Graves, A., & Jaitly, N. (2014, June). Towards end-to-end speech recognition with recurrent neural networks. Proceedings of the 31st International Conference on Machine Learning (pp. 1764-1772). Beijing, China.
Hadsell, R., Chopra, S., & LeCun, Y. (2006, June). Dimensionality reduction by learning an invariant mapping. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (pp. 1735-1742). New York, NY.
Nam, K. (2019). A study on processing of speech recognition Korean?words. The Journal of the Convergence on Culture Technology, 5(4), 407-412.
OpenAI (2023). openai/whisper. Retrieved from https://github.com/openai/whisper
Panayotov, V., Chen, G., Povey, D., & Khudanpur, S. (2015, April). Librispeech: An ASR corpus based on public domain audio books. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5206-5210). South Brisbane, Australia.
Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023, July). Robust speech recognition via large-scale weak supervision. Proceedings of the 40th International Conference on Machine Learning (pp. 28492-28518). Honolulu, HI.
Schneider, S., Baevski, A., Collobert, R., & Auli, M. (2019, September). wav2vec: Unsupervised pre-training for speech recognition. Proceedings of the Interspeech 2019 (pp. 3465-3469). Graz, Austria.
Tsai, Y. H. H., Bai, S., Liang, P. P., Kolter, J. Z., Morency, L. P., & Salakhutdinov, R. (2019, July). Multimodal transformer for unaligned multimodal language sequences. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 6558-6569). Florence, Italy.
van den Oord, A., Li, Y., & Vinyals, O. (2018). Representation learning with contrastive predictive coding. Retrieved from https://doi.org/10.48550/arxiv.1807.03748
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., ... Polosukhin, I. (2017, December). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems. Long Beach, CA.
Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Soplin, N. E. Y., ... Ochiai, T. (2018). ESPnet: End-to-end speech processing toolkit. Retrieved from https://doi.org/10.48550/arXiv.1804.00015
Yadav, H., & Sitaram, S. (2022, June). A survey of multilingual models for automatic speech recognition. Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 5071-5079). Marseille, France.