[Article] Quality Evaluation of Automatically Generated Metadata Using ChatGPT: Focusing on Dublin Core for Korean Monographs

Original title: ChatGPT가 자동 생성한 더블린 코어 메타데이터의 품질 평가: 국내 도서를 대상으로

김선욱; 이혜경; 이용구

doi:10.3743/kosim.2023.40.2.183

정보관리학회지 = Journal of the Korean Society for Information Management, v.40 no.2, 2023, pp.183-209

김선욱 (Department of Library and Information Science, College of Social Sciences, Kyungpook National University), 이혜경 (Department of Library and Information Science, College of Social Sciences, Kyungpook National University), 이용구 (Department of Library and Information Science, College of Social Sciences, Kyungpook National University)

Abstract

This study evaluates the quality of Dublin Core metadata that ChatGPT generated from the covers, title pages, and colophons of books, in order to assess ChatGPT's metadata generation ability and its potential. To this end, cover, title page, and colophon data were collected from 90 books and entered into ChatGPT, which was asked to generate Dublin Core records; the output was then measured for completeness and accuracy. Over the full dataset, completeness was 0.87 and accuracy was 0.71, a satisfactory level. By element, Title, Creator, Publisher, Date, Identifier, Rights, and Language performed better than the other elements. Subject and Description scored somewhat lower on both completeness and accuracy, but these elements demonstrated the generative capability regarded as a strength of ChatGPT. In the DDC main classes of social sciences and technology, the accuracy of the Contributor element was somewhat low. The authors attribute this to ChatGPT's errors in extracting statements of responsibility, the absence from the source data of the bibliographic description needed for some metadata elements, and the English-centered composition of ChatGPT's training data.
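The abstract reports completeness and accuracy scores but does not give the formulas on this page. As a rough sketch only, assume completeness is the share of elements filled in a human-made reference record that the generated record also fills, and accuracy is the share of those elements whose generated value matches the reference after whitespace and case normalization. The function names, the `Record` type, and the sample records below are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

Record = Dict[str, str]  # hypothetical: one string value per element


def _norm(value: str) -> str:
    """Collapse whitespace and ignore case before comparing values."""
    return " ".join(value.split()).casefold()


def _expected(reference: Record) -> List[str]:
    """Elements that the reference record actually fills in."""
    return [e for e in DC_ELEMENTS if reference.get(e, "").strip()]


def completeness(generated: Record, reference: Record) -> float:
    """Assumed definition: filled expected elements / expected elements."""
    expected = _expected(reference)
    if not expected:
        return 0.0
    filled = sum(1 for e in expected if generated.get(e, "").strip())
    return filled / len(expected)


def accuracy(generated: Record, reference: Record) -> float:
    """Assumed definition: matching expected elements / expected elements."""
    expected = _expected(reference)
    if not expected:
        return 0.0
    correct = sum(
        1 for e in expected
        if _norm(generated.get(e, "")) == _norm(reference[e])
    )
    return correct / len(expected)


# Made-up sample data, for illustration only.
reference = {
    "title": "Example Book",
    "creator": "Hong, Gildong",
    "publisher": "Example Press",
    "date": "2023",
}
generated = {
    "title": "example  book",      # matches after normalization
    "creator": "Hong, Gildong",    # exact match
    "publisher": "Sample Press",   # filled but wrong
    "date": "",                    # missing
}

print(completeness(generated, reference))  # 0.75
print(accuracy(generated, reference))      # 0.5
```

Averaging these per-record scores over all records, or grouping them by element or by DDC main class, would give aggregate figures of the kind the abstract quotes. An exact string match is a strict choice; free-text elements such as Subject and Description would likely need human judgment or a looser similarity measure.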


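Tables/Figures (15)

Table 1: The 15 Dublin Core elements
Figure 1: Research procedure
Table 2: Overview of the selected books
Table 3: Preprocessing rules
Figure 2: Process of generating Dublin Core using ChatGPT
Table 4: Mapping between colophon bibliographic description elements and DC elements
Table 5: Patterns for which Dublin Core can be generated
Figure 3: Number of records by completeness and accuracy performance level
Table 6: Quality evaluation over the full dataset
Table 7: Completeness and accuracy by element
Table 8: Mean completeness and accuracy by main class
Figure 4: Completeness and accuracy of the data by main class
Figure 5: Completeness of each Dublin Core element by DDC main class
Figure 6: Accuracy of each Dublin Core element by DDC main class
Table 9: Examples of ChatGPT-generated Dublin Core records grouped by element-level performance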

