한국정보전자통신기술학회논문지 = Journal of Korea Institute of Information, Electronics, and Communication Technology, vol. 16, no. 4, 2023, pp. 193-202
주하영 (KAIST-Megazone Cloud Research Center for Intelligent Cloud Computing Convergence Research), 오현택 (KAIST Institute for Information Technology Convergence & KAIST-Megazone Cloud Research Center for Intelligent Cloud Computing Convergence Research), 양진홍 (Department of Medical IT, Inje University)
In recent years, the outstanding performance of large language models (LLMs) trained on extensive datasets has become a hot topic. Because research on LLMs is increasingly released as open source, the ecosystem is expanding rapidly. Models that are task-specific, lightweight, and high-performing are be...