Journal of the Korea Convergence Society, v.12 no.12, 2021, pp. 39-47
김진성 (Department of Computer Science, Korea University), 김경민 (Department of Computer Science, Korea University), 손준영 (Department of Computer Science, Korea University), 박정배 (Human-inspired AI Research Center, Korea University), 임희석 (Department of Computer Science, Korea University)
The construction of high-quality input features through effective segmentation is essential for increasing the sentence comprehension of a language model. Improving their quality directly affects the performance of the downstream task. This paper comparatively studies the segmentation that eff...
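The abstract's point — that subword segmentation determines the input features a language model sees — can be illustrated with a minimal byte-pair-encoding (BPE) sketch, the merge-based scheme underlying tokenizers such as SentencePiece cited below. This is a hypothetical toy example in pure Python, not the paper's implementation: it repeatedly merges the most frequent adjacent symbol pair in a small word-frequency table.

```python
# Toy BPE merge learning (illustrative sketch, not the paper's method).
from collections import Counter

def bpe_merges(word_freqs, num_merges):
    """Learn `num_merges` BPE merge rules from a {word: frequency} table."""
    # Start with each word split into single characters.
    vocab = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent pair
        merges.append(best)
        merged = best[0] + best[1]
        # Apply the merge everywhere it occurs.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges({"low": 5, "lower": 2, "lowest": 3}, 3)
# After three merges, "low" has become a single subword unit.
```

The choice of how many merges to learn (the vocabulary size) is exactly the kind of segmentation-quality knob whose downstream effect the paper studies comparatively.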
Y. Liu et al. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
J. Devlin, M. W. Chang, K. Lee & K. Toutanova. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Y. Wu et al. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
T. Kudo & J. Richardson. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv preprint arXiv:1808.06226.
M. Kim, Y. Kim, Y. Lim & E. N. Huh. (2019, July). Advanced subword segmentation and interdependent regularization mechanisms for Korean language understanding. In 2019 Third World Conference on Smart Trends in Systems Security and Sustainability (WorldS4) (pp. 221-227). London, UK. DOI: 10.1109/WorldS4.2019.8903977
O. Kwon, D. Kim, S. R. Lee, J. Choi & S. Lee. (2021, April). Handling Out-Of-Vocabulary Problem in Hangeul Word Embeddings. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 3213-3221). DOI: 10.18653/v1/2021.eacl-main.280
S. Lee, H. Jang, Y. Baik, S. Park & H. Shin. (2020). KR-BERT: A small-scale Korean-specific language model. arXiv preprint arXiv:2008.03979. DOI: 10.5626/jok.2020.47.7.682
A. Matteson, C. Lee, Y. Kim & H. S. Lim. (2018, August). Rich character-level information for Korean morphological analysis and part-of-speech tagging. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 2482-2492).
S. Moon & N. Okazaki. (2020, May). Jamo Pair Encoding: Subcharacter Representation-based Extreme Korean Vocabulary Compression for Efficient Subword Tokenization. In Proceedings of the 12th Language Resources and Evaluation Conference (pp. 3490-3497). Marseille, France.
D. B. Cho, H. Y. Lee & S. S. Kang. (2021). An Empirical Study of Korean Sentence Representation with Various Tokenizations. Electronics, 10(7), 845. DOI: 10.3390/electronics10070845
K. Park, J. Lee, S. Jang & D. Jung. (2020). An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks. arXiv preprint arXiv:2010.02534.
E. F. Sang & F. De Meulder. (2003). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. arXiv preprint cs/0306050.
S. Park et al. (2021). KLUE: Korean Language Understanding Evaluation. arXiv preprint arXiv:2105.09680.