Report information
Lead research institution |
Korea Institute of Science and Technology Information (KISTI) |
Principal investigator |
Participating researchers |
Sarah Jones
Report type | Stage 2 report |
Country of publication | Republic of Korea |
Language |
Publication date | 2019-12 |
Project start year |
2019 |
Supervising ministry |
Ministry of Science and ICT |
Research management agency |
Korea Institute of Science and Technology Information (KISTI) |
Registration number |
TRKO202000029576 |
Project ID |
1711097327 |
Program name |
Korea Institute of Science and Technology Information Research Operating Expense Support (R&D) (Major Project Funds) |
DB construction date |
Keywords |
Content Curation. Information Convergence. Entity Identification. Digital Archiving. DOI.
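■ Construction of S&T content (domestic and international scholarly articles, machine learning data) and technology development
ᴼ Research on the KISTI content curation model and development of an execution strategy
ᴼ Development of content curation technology applying artificial intelligence
ᴼ Construction of domestic and international scholarly article DBs
ᴼ Construction and research of S&T terminology
■ Establishment and operation of a management system for national R&D outcomes (articles, reports)
ᴼ Registration and construction of national R&D report full texts
ᴼ Verification and construction of national R&D article outcomes
ᴼ Dataset construction through content analysis of national R&D outcomes
■ Establishment and operation of an S&T content identification and linkage management system
ᴼ Operation of the DOI registration management service
ᴼ Expansion of content eligible for DOI registration and system enhancement
ᴼ Construction and linkage of person/institution identification information
(Source: Abstract, p. 5)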
Ⅳ. Result of R&D
1. S&T content construction and technology development
1) S&T content construction based on content curation model
❍ e-Gate DB construction and management
- Article level metadata 4,000,226 (Total 96,859,361) / Reference data 90,919,348 (Total 754,927,879)
- Paper journal holding information of academic and research libraries : 536,180 (Total 25,874,373 issues of paper journals from 556 libraries)
- Electronic journal licensing information of KESLI consortium member libraries : 1,159,698 journal licensing information from 451 libraries
- Journal title authority data : 3,532 controlled journal titles from 8,989 titles (Total 122,339 controlled journal titles from 331,714 titles)
- Open Access, Free Access journal information : Journal 2,715 / Articles 1,835,486 (Total Journal 34,393 / Articles 15,619,965)
- Linking information : DOI 4,845,964 (Total 56,350,308)
- Electronic full-text of international journals for public service (medical field OA articles) : Journal 6,631 / Article 588,982 (Total 4,089,311)
- DOI validation and duplicate record maintenance : verified the presence of 52 million DOIs in Crossref and DOI centers, detected 158,316 error records, and cleaned up 187,973 duplicate DOI records
- DOI-first checking and measures to prevent input of duplicate records when checking for duplicate articles
- Applied funding information and author ORCID extraction logic to article loading
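The DOI duplicate-record maintenance described above can be illustrated with a minimal sketch. It is not the report's actual procedure: the function names `normalize_doi` and `find_duplicate_dois` and the `(record_id, doi)` record shape are hypothetical, and the sketch relies only on the fact that DOIs are compared case-insensitively and are often stored with a resolver prefix.

```python
from collections import defaultdict

# Common ways a DOI is written in metadata; stripped before comparison.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return a comparison key: trimmed, lower-cased, resolver prefix removed."""
    doi = raw.strip().lower()
    for prefix in _PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi


def find_duplicate_dois(records):
    """Group record ids by normalized DOI and keep only DOIs seen more than once.

    `records` is an iterable of (record_id, raw_doi) pairs (hypothetical shape).
    """
    groups = defaultdict(list)
    for record_id, raw_doi in records:
        groups[normalize_doi(raw_doi)].append(record_id)
    return {doi: ids for doi, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    sample = [
        ("rec-1", "10.1000/ABC"),
        ("rec-2", "https://doi.org/10.1000/abc"),
        ("rec-3", "10.1000/xyz"),
    ]
    print(find_duplicate_dois(sample))
```

Checking the DOI first, before any title or author comparison, is cheap and catches exact re-loads; records without a DOI would still need a separate metadata-based check.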
❍ Domestic S&T journal information construction
- Acquired and built 1,301 domestic journal titles* out of 1,414 journals in the science and technology field
* Total number of journals with article information construction history in KISTI domestic article DB
- Article metadata 58,102 (Total 1,698,828) / Reference data 1,200,023 (Total 15,535,486) / DOI deposit article 21,081 (Total 313,373)
- Refinement of previously constructed bibliographic information : 14,021
- Tables/figures extracted from 18,008 articles and author identification for 58,102 articles
- Scope of DB construction : papers newly published in 2019 and 2018 articles not yet constructed
- DOI registration of scholarly articles : 21,081 (Total 313,373)
- Calculation of IF / ZIF / Immediacy Index for 636 KJCR journals
- Publication and distribution of Korea Journal Citation Reports (KJCR) 2019
- Quality level of newly constructed data : 99.9979% (Article)
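For readers unfamiliar with the journal indicators named in this list, the following sketch shows the conventional two-year definitions. The summary does not state KJCR's exact formulas, so this is an assumption: IF is taken as citations in year Y to items from Y-1 and Y-2 divided by the items published in those two years, ZIF is assumed to be the same ratio with journal self-citations excluded, and the Immediacy Index is citations in year Y to items published in Y divided by the items published in Y. All function names and input numbers are hypothetical.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Safe division: a journal with no countable items gets 0.0."""
    return numerator / denominator if denominator else 0.0


def impact_factor(cites_to_prev_two_years: int, items_prev_two_years: int) -> float:
    """Two-year impact factor (conventional definition, assumed here)."""
    return _ratio(cites_to_prev_two_years, items_prev_two_years)


def zif(cites_to_prev_two_years: int, self_cites: int, items_prev_two_years: int) -> float:
    """Impact factor with self-citations removed (assumed meaning of ZIF)."""
    return _ratio(cites_to_prev_two_years - self_cites, items_prev_two_years)


def immediacy_index(cites_to_current_year: int, items_current_year: int) -> float:
    """Citations received in the publication year per item published that year."""
    return _ratio(cites_to_current_year, items_current_year)


if __name__ == "__main__":
    # Hypothetical journal: 100 items in the two prior years, 150 citations
    # to them this year (30 of which are self-citations); 50 items this year
    # cited 10 times so far.
    print(impact_factor(150, 100))
    print(zif(150, 30, 100))
    print(immediacy_index(10, 50))
```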
❍ Construction and research of S&T terminology
2) Advancement of content curation technology
❍ Advancement of technology for automatically extracting article metadata
- Extraction of 37 metadata items from 40 journals : covers most items needed in practice
- Improved identification of metadata regions and items (accuracy 0.97, 0.89)
❍ Advancement of technology for automatically extracting bibliographic references
- Improved identification of metadata regions by excluding multi-column document structures, tables/figures, etc. (accuracy 0.84)
- Reference extraction performance of F1 0.97, superior to that of the major open source bibliographic parsers
3) Construction and research of machine learning data of S&T information
❍ Construction of a general-purpose machine learning data set based on the medical/biochemistry field
- Ontology type of article summary information : 1,439,683
- Auxiliary information for information service : info-box terminology entry 1,225,363, explanation terminology entry 2,226,176
❍ Development of general-purpose machine learning data construction and management system and API service
❍ Definition of AI shared tasks using articles in Korean
- Korean article abstract machine reading (KorSciQA) shared task data production : 2,490 QnA data set from 498 articles.
- BERT model implementation using KorSciQA / Comparative evaluation of machine reading performance against a KorQuAD model : KorSciQA (66.52%) is better than KorQuAD (45.62%)
※ The 31st Annual Conference on Human & Cognitive Language Technology Excellent Paper Award (2019)
- Production of shared task data of Korean article argument structure : 2,936 QnA data set from 498 articles
❍ Establishment of a strategy for constructing and utilizing general-purpose machine learning data
- Investigated the latest trends and demands through a KISTI internal demand survey, external expert advisory meetings (9 times), and written surveys (5 times) conducted by the machine learning data TFT (8 people)
- Established a 5-year long-term data construction and utilization plan considering the derived general machine learning data items, availability at KISTI, likelihood of use, and urgency
4) Development of content curation execution strategy
❍ In-depth analysis of KISTI S&T content curation model and development of execution strategy
- Joint research with the UK Digital Curation Centre (DCC)
- Gap analysis between existing system and curation applied To-Be model in terms of content/system/organization
- Developing guidelines for content selection, evaluation decision tree, stakeholder analysis
5) KISTI holding information resources development and service
❍ Registration and preservation management of information resources
- Subscription, registration and preservation management for KISTI information materials: Series, books, research reports, etc.
❍ Procurement and service of electronic information for internal service
- Electronic resource subscription management : SCIENCE, NATURE, ACM, IEL, WEB OF SCIENCE, etc.
- User service / Operation and management of material management system
2. National R&D(article, report) outcomes content construction
1) National R&D outcomes article content construction
❍ Reinforcement of SCI(E) article verification and registration system
- Identifying, collecting, verifying, and analyzing R&D articles by domestic researchers by establishing an integrated verification DB for R&D articles covering the major domestic and international index DBs (Web of Science, SCOPUS, KCI)
- Enhancement of construction management for systematic linkage of national R&D metadata, references, and text
- Construction of integrated verification DB for R&D articles
❍ Acquisition of texts of national R&D articles : Achievement 35.4% (Goal 35%)
- The goal was to secure 35% of the R&D article texts available for public disclosure, a 5% increase over the previous year; 167,253 article texts marked as open access (Open Access, Free Access) were obtained from national R&D articles published since 2008
- Cooperation with domestic project management institutes and research institutes to obtain as many of the OA article texts distributed at home and abroad as possible
❍ Improvement of national R&D article collection management system
- Improvement of the R&D article registration function (application of the author identification API)
- Improvement of the R&D article registration management process
❍ Improvement of function of integrated DB for R&D article registration verification
- Improvement of function of integrated DB for R&D article registration verification
- Selecting Korea's top researchers by utilizing the world's top 1% highly cited article data included in the Essential Science Indicators of WoS for the last 5 years
❍ Linking text and metadata with R&D articles
- Acquisition of articles from the 2018 investigation and analysis results and articles entered in NTIS
- Of the 525,049 article records registered in '14 ~ '18, 333,884 were SCIE articles, of which 327,92 are linked with the original text
- Providing data to NTIS-NDSL for national R&D performance information service : 81,491
2) National R&D outcomes registration and construction
❍ Advancement of the shared-use system for acquiring national R&D reports
- Policy improvement and cooperation with project management institutions to maximize the registration rate and utilization of reports at the level of the institution dedicated to national R&D reports
- Conducted a full survey of over 160 project management institutions to confirm the total number of national R&D reports and to check whether a report is generated for each project
❍ Achieved a report registration rate of 94.83% (registration goal 90%)
- Conducted policy improvement and cooperative activities with project management institutions to identify the total number of national R&D reports generated through national R&D projects and to maximize the registration rate and utilization at the level of the institution dedicated to national R&D reports
- Among the 280,779 projects in '14 ~ '18, 57,166 reports were generated, of which 54,208 were registered
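The 94.83% registration rate follows directly from the two report counts given above (registered reports divided by generated reports). A short check, with `registration_rate` as a hypothetical helper name:

```python
def registration_rate(registered: int, generated: int) -> float:
    """Share of generated reports that were registered, as a percentage."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    return registered / generated * 100


if __name__ == "__main__":
    # Counts taken from the summary: 57,166 reports generated, 54,208 registered.
    rate = registration_rate(54_208, 57_166)
    print(f"{rate:.2f}%")  # prints 94.83%
```

Note that the denominator is the number of reports generated, not the 280,779 projects, since many projects do not produce a report.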
❍ Building a high value-added DB of national R&D reports
- Structured DB by classifying chapter/section of report : 16,300
- Detection and removal of personal information in report : 12,236
- Extracting non-text (table, figure) contents of report : 275,141 tables, 558,172 figures
- Selection of reports for DOI assignment and DOI deposit : 5,485 in 2019 (Total 48,612)
- Inserting watermark to enhance copyright and security of report : 16,300
❍ Providing original text data of national R&D reports : NDSL and NTIS linked service
- Provided 12,160 to NDSL and 9,348 to NTIS
❍ Development of technologies to utilize and spread national R&D outcomes
- Improvement of national R&D research outcome report collection management system
- Improvement of automatic personal information detector
- Implementation of non-text API sorting function and non-text search function mapped with project information
- Enhancement of national R&D research outcomes similarity search : development of large capacity search API, text analysis function, research data recommendation function
3. Establishment and operation of S&T content identification and linkage management system
1) Operation of international standardized identification system (DOI) registration management service and system enhancement
❍ Global dissemination of domestic research outcomes through the registration of scientific and technological content such as papers, patents, and reports
- DOI registration status : 1,087,369 registrations (including retroactive portion, registration goal 450,000)
❍ Cooperation between international DOI registration institutions : DOI strategic meeting, GDPR working group
❍ Supporting domestic academic organizations : DOI registration, paper plagiarism prevention service, etc.
❍ Operation of KoreaScience for globally spreading domestic academic papers
❍ Education and promotion for DOI registration institutions : 3 times
2) Expansion of Author/institution identification DB and improvement of accuracy
❍ Author/institution identification data establishment
- Extracting/identifying/establishing author/institution name identification data from domestic and international papers, research reports, and patents by applying machine learning technology : author identification rate 93.43%, institution identification rate 89.72%
- Completion of machine learning author identification by linking approximately 1.9 million global identifiers : 33,308 additional identifications
❍ Expanding linkage and utilization with other identification systems
- Developing a web-based author identification API and providing it to DB construction (2 cases) : domestic academic paper DB construction, paper outcome DB construction
- Improving accuracy and expandability of information service by applying author/institution identification data in academic search service such as NDSL, NTIS, Korea Science, ISNI-KOREA, etc
(Source: SUMMARY, p. 22)
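Contents
- Cover ... 1
- Letter of submission ... 3
- Report abstract ... 5
- Korean summary ... 7
- SUMMARY ... 17
- CONTENTS ... 29
- Table of contents (Korean) ... 31
- List of tables ... 33
- List of figures ... 35
- Chapter 1. Overview of the R&D project ... 39
- Section 1. Purpose and necessity ... 39
- Section 2. Goals and content ... 44
- 1. Goals of the R&D project ... 44
- 2. Content of the R&D project ... 45
- 3. Implementation strategy and methods ... 48
- 4. Expected effects ... 51
- Chapter 2. Status of technology development in Korea and abroad ... 53
- Section 1. Domestic status ... 53
- Section 2. International status ... 56
- Chapter 3. R&D activities and results ... 62
- Section 1. S&T content (domestic and international scholarly articles, machine learning data) construction and technology development ... 62
- 1. S&T content construction based on content curation ... 62
- 2. Advancement of content curation technology ... 91
- 3. Construction of machine learning data for the S&T field ... 93
- 4. Development of content curation execution strategy ... 97
- 5. KISTI holding information resources development and service ... 102
- Section 2. Establishment and operation of a management system for national R&D research outcomes (articles, reports) ... 109
- 1. Overview of national R&D research outcomes ... 109
- 2. Establishment of a management system for national R&D research outcomes (articles) ... 111
- 3. Establishment of a management system for national R&D research outcomes (report full texts) ... 121
- Section 3. Establishment and operation of an S&T content identification and linkage management system ... 134
- 1. Operation of the international standard identifier system (DOI) registration management service ... 134
- 2. Enhancement of the international standard identifier system (DOI) registration management system ... 137
- 3. Construction of author name/institution name identification data ... 144
- Chapter 4. Degree of goal achievement and contribution to related fields ... 147
- Section 1. Achievement of R&D goals ... 147
- 1. S&T content (domestic and international scholarly articles, machine learning data) construction and technology development ... 147
- 2. Establishment and operation of a management system for national R&D research outcomes (articles, reports) ... 150
- 3. Establishment and operation of an S&T content identification and linkage management system ... 153
- Section 2. Contribution to related fields ... 155
- Chapter 5. Plan for utilizing the R&D results ... 156
- Chapter 6. References ... 157
- Appendix 1. Press coverage ... 159
- Appendix 2. Data sharing status ... 160
- Appendix 3. Academic societies covered by the domestic scholarly article DB construction ... 162
- Appendix 4. Member institutions of the Korea DOI Center ... 167
- Appendix 5. Institutions supported by the paper plagiarism check service ... 171
- Last page ... 177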