Report Information
Lead research institution | Korea Institute of Science and Technology Information
Principal investigator |
Participating researchers |
Report type | Phase 1 report
Country of publication | Republic of Korea
Language |
Publication date | 2022-12
Project start year | 2022
Supervising ministry | Ministry of Science and ICT
Research management agency | Korea Institute of Science and Technology Information
Registration number | TRKO202300002681
Project ID | 1711175872
Program | KISTI Research Operating Expenses Support (Major Project Funding)
DB construction date |
Keywords | Cloud Computing; High Performance Computing; Supercomputing; Artificial Intelligence; Big Data
Ⅳ. Research and Development Results
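○ Development of heterogeneous high-performance system technology to meet future infrastructure needs
- Revision of the in-house high-performance server HW and cluster construction
• Construction of a cluster based on the revised server and verification of its measured performance
✓ Cluster system hardware construction and software environment configuration
✓ Cluster system performance verification tests
• Verification of the developed system for the materials platform HW infrastructure
✓ Performance verification of the developed prototype and analysis of the results
- Derivation of management techniques for HPC server components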
• PCIe 4.0
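○ Research on HPC software technology
- Analysis of the execution patterns of HPC applications in major fields
• Analysis of the OpenCL-based numerical library CLBlast
✓ Analysis of the CLBlast source code and function call structure
• Analysis of the CUDA-based open-source numerical library CUTLASS
✓ Analysis of CUTLASS implementation details
✓ Analysis of the GEMM execution structure within CUTLASS
• Analysis of the CPU-centric HPL execution flow using Netlib HPL 2.3
✓ Analysis of the HPL source code and the call structure of its main functions
• Analysis of the HPL algorithm
✓ Analysis of the LU decomposition process built on BLAS routines
• Profiling of a GPU-centric HPL optimized for NVIDIA GPUs
✓ Analysis of how memory requirements, HPL performance, and the share of communication change with problem size
• Survey of development trends in numerical library optimization for exascale computing
✓ Survey of the main numerical libraries used in ECP and PRACE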
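The HPL items above center on LU decomposition built from BLAS routines. Below is a minimal pure-Python sketch of a blocked, right-looking LU factorization with partial pivoting. It only illustrates how the work splits into the kinds of steps HPL maps onto BLAS: panel factorization, row swaps, a triangular solve (TRSM-like), and a trailing-matrix update (GEMM-like). It is not HPL's code. The function name `lu_blocked` and the block size `nb` are illustrative. A real implementation distributes the matrix over processes and calls optimized BLAS.

```python
def lu_blocked(a, nb=2):
    """Blocked right-looking LU with partial pivoting, in place.

    On return, the strict lower triangle of `a` holds L (unit diagonal
    implied) and the upper triangle holds U, so that P*A = L*U, where
    row i of P*A is original row piv[i].
    """
    n = len(a)
    piv = list(range(n))
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1) Panel factorization of columns k0..k1-1 (unblocked)
        for j in range(k0, k1):
            # Partial pivoting: largest |a[i][j]| on or below the diagonal
            p = max(range(j, n), key=lambda i: abs(a[i][j]))
            if a[p][j] == 0.0:
                raise ValueError("matrix is singular")
            if p != j:
                # Swap entire rows, i.e. apply the swap to L, panel and
                # trailing columns at once (what DLASWP does piecewise)
                a[j], a[p] = a[p], a[j]
                piv[j], piv[p] = piv[p], piv[j]
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]  # multipliers (column of L)
                for c in range(j + 1, k1):  # update the rest of the panel only
                    a[i][c] -= a[i][j] * a[j][c]
        # 2) U12 = L11^-1 * A12: forward substitution (TRSM-like)
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                for c in range(k1, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # 3) A22 -= L21 * U12: trailing-matrix update (GEMM-like)
        for i in range(k1, n):
            for c in range(k1, n):
                a[i][c] -= sum(a[i][j] * a[j][c] for j in range(k0, k1))
    return piv


m = [[2.0, 1.0], [4.0, 3.0]]
print(lu_blocked(m, nb=1), m)  # [1, 0] [[4.0, 3.0], [0.5, -0.5]]
```

Step 3 carries almost all of the O(N³) work for large N. This is why GPU-oriented HPL implementations concentrate on making the trailing GEMM update fast.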
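- Analysis of high-level programming models for HPC systems based on heterogeneous architectures
• Analysis of the characteristics of CUDA for NVIDIA GPUs, ROCm for AMD GPUs, and the Level-Zero interface for Intel XPUs
• OpenCL programming model
• Current status and analysis of SYCL
✓ Analysis of Intel LLVM and DPC++
✓ Analysis of hipSYCL
✓ Analysis of Codeplay's ComputeCpp
✓ Analysis of triSYCL
• Runtime optimization of hipSYCL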
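- Analysis of open-source debuggers for heterogeneous architectures
• Analysis of GDB
• Analysis of LLDB
• Analysis of oclgrind
• Analysis of Intel Debugger
• Analysis of ComputeCpp
• Analysis of other open-source projects
- Survey of trends in performance analysis and profiling techniques for heterogeneous architectures
• Development tool portfolio of the ECP project
• PAPI-based profiling tools
• Profiling tools for NVIDIA GPUs (performance counters and CUPTI)
- Development of profiling techniques for heterogeneous architectures
• Development of techniques for analyzing HPC system logs and profiling data
• Development of data migration techniques to optimize hierarchical memory performance in HPC systems
• Design and development of a heterogeneous profiler prototype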
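The data-migration item above concerns hierarchical memory. One common family of policies promotes the most frequently accessed pages into the fast memory tier and demotes cold ones. The sketch below is hypothetical and is not the technique developed in this project. It assumes per-page access counts for one profiling interval are available (for example, from hardware sampling or page-table access bits), and it only computes a migration plan. `plan_migration` and its parameters are illustrative names.

```python
def plan_migration(access_counts, in_fast, fast_capacity):
    """Decide which pages to promote to / demote from the fast memory tier.

    access_counts: dict page_id -> access count sampled in the last interval
                   (hypothetical profiler output; missing pages count as 0)
    in_fast:       set of page IDs currently resident in the fast tier
    fast_capacity: number of pages the fast tier can hold
    Returns (promote, demote) as sets of page IDs.
    """
    candidates = set(access_counts) | set(in_fast)

    def rank(page):
        # Hottest first; on equal counts keep pages already in the fast tier
        # (avoids pointless moves), then order by page ID for determinism.
        return (-access_counts.get(page, 0), page not in in_fast, page)

    target = set(sorted(candidates, key=rank)[:fast_capacity])
    return target - in_fast, in_fast - target


promote, demote = plan_migration({1: 50, 2: 5, 3: 40, 4: 0}, {2, 4}, 2)
print(sorted(promote), sorted(demote))  # [1, 3] [2, 4]
```

A practical policy would also weigh the cost of each migration and add hysteresis so that pages do not bounce between tiers from one interval to the next.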
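- Development of a performance-portability-oriented software package for dense matrix/vector basic operations
• Development and functional verification of FP32 (single-precision) basic computation routines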
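As an illustration of the FP32 dense routines above, here is a minimal sketch of a tiled single-precision GEMM (C = alpha*A*B + beta*C). It is not the project's package. Python has no float32 scalar, so FP32 storage is modeled with `array('f')`, and FP32 arithmetic is modeled by rounding each product and each sum through `struct`. The single level of tiling only hints at the multi-level blocking used by libraries such as CUTLASS. The function names and the `tile` parameter are illustrative.

```python
import struct
from array import array


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sgemm_tiled(m, n, k, alpha, a, b, beta, c, tile=2):
    """C = alpha*A*B + beta*C on row-major array('f') buffers.

    a is m x k, b is k x n, c is m x n. Each tile x tile block of C is
    accumulated while the k dimension is streamed in tile-wide slices.
    """
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            i1, j1 = min(i0 + tile, m), min(j0 + tile, n)
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]  # C tile accumulator
            for p0 in range(0, k, tile):
                p1 = min(p0 + tile, k)
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        s = acc[i - i0][j - j0]
                        for p in range(p0, p1):
                            # FP32 multiply, then FP32 add (separate roundings, no FMA)
                            s = f32(s + f32(a[i * k + p] * b[p * n + j]))
                        acc[i - i0][j - j0] = s
            for i in range(i0, i1):
                for j in range(j0, j1):
                    # Storing into array('f') rounds the result to FP32
                    c[i * n + j] = alpha * acc[i - i0][j - j0] + beta * c[i * n + j]


a = array("f", [float(v) for v in range(1, 10)])  # 3 x 3, row-major
c = array("f", [1.0] * 9)
sgemm_tiled(3, 3, 3, 1.0, a, a, 2.0, c, tile=2)
print(list(c))  # [32.0, 38.0, 44.0, 68.0, 83.0, 98.0, 104.0, 128.0, 152.0]
```

A 3 x 3 matrix with `tile=2` deliberately leaves partial edge tiles, which is the case a performance-portable routine most often gets wrong.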
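- Development of HPL parallel optimization techniques for multi-accelerator environments
• Analysis of multi-accelerator offload techniques in GPU-accelerated HPL
• Execution analysis of the main HPL functions for a multi-accelerator HPL implementation
• Survey of case studies applying HPL parallel optimization on multiple accelerators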
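HPL stores the matrix in NB x NB blocks and distributes them two-dimensionally, block-cyclically, over a P x Q process grid. A common multi-GPU setup runs one MPI process per accelerator. That setup is an assumption here, not a statement about the implementations surveyed above. The minimal sketch below computes which process owns a global entry and how many rows or columns each process stores. `local_extent` follows the same arithmetic as ScaLAPACK's NUMROC, with the distribution starting at process 0. The function names are illustrative.

```python
def owner(i, j, nb, p, q):
    """Grid coordinates (prow, pcol) of the process holding global entry (i, j)."""
    return (i // nb) % p, (j // nb) % q


def local_index(g, nb, nprocs):
    """Local index of global row/column g on its owning process (one grid dimension)."""
    return (g // nb // nprocs) * nb + g % nb


def local_extent(n, nb, iproc, nprocs):
    """Rows (or columns) of an n-long dimension stored on process iproc."""
    nblocks = n // nb                      # number of full blocks
    count = (nblocks // nprocs) * nb       # full rounds of the cyclic deal
    extra = nblocks % nprocs               # leftover full blocks
    if iproc < extra:
        count += nb
    elif iproc == extra:
        count += n % nb                    # the trailing partial block
    return count


# 2 x 2 grid (e.g. four GPUs, one process each), NB = 2, N = 11
print(owner(5, 3, 2, 2, 2), local_index(5, 2, 2))      # (0, 1) 3
print([local_extent(11, 2, r, 2) for r in range(2)])   # [6, 5]
```

The uneven `[6, 5]` split shows why N is usually chosen as a multiple of NB times the grid dimensions: it keeps the per-accelerator load balanced.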
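○ Development of shared-use HPC platform technology and construction of the cloud environment
- Planning for KI Cloud enhancement and development of a shared-use HPC platform
• Setting the direction for KI Cloud enhancement and the shared-use platform
✓ Oriented toward distributed cloud
✓ Use of open source
✓ Support for dynamic software services
✓ Ease of data management and sharing
✓ Consideration of how to implement a resource-sharing framework
• Establishment of the implementation plan for KI Cloud enhancement and the shared-use platform
✓ Design of resource integration and deployment
✓ Distributed/multi-cloud communication technology
✓ Adoption of microservices
✓ Security and network management
✓ HPC job service
✓ Virtual server service
✓ FPGA service
✓ Web application service
✓ Software development platform
✓ Data management service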
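- Construction of the KISTI HPC cloud
• Design and construction of HW infrastructure for the materials research data platform cloud
• Design and construction of a cloud platform based on KI Cloud
• Construction of the KI Cloud back-end and front-end software platforms
- Stable operation of KI Cloud and development of cloud technology
• Operational status of the KI Cloud platform system based on the KAIROS cluster and the Nurion supercomputer
• Maintenance and failure-response status of the KI Cloud platform system based on the KAIROS cluster and the Nurion supercomputer
• User support status of KI Cloud based on the KAIROS cluster and the Nurion supercomputer
- Development of SSH access to virtual machines through a single node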
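The item above concerns SSH access to virtual machines through a single node. One common way to reach VMs that have only private addresses is an SSH jump host, using OpenSSH's `-J` (ProxyJump) option. The report does not say how the KI Cloud feature is implemented, so the sketch below is an assumption-based illustration only. The helper name, host names, and addresses are hypothetical.

```python
import shlex


def ssh_via_gateway(vm_addr, gateway, user, gateway_port=22, vm_port=22):
    """Build an OpenSSH command line that reaches vm_addr through gateway.

    Uses ProxyJump (-J), whose argument has the form [user@]host[:port].
    """
    jump = f"{user}@{gateway}" + (f":{gateway_port}" if gateway_port != 22 else "")
    argv = ["ssh", "-J", jump]
    if vm_port != 22:
        argv += ["-p", str(vm_port)]  # -p applies to the final destination
    argv.append(f"{user}@{vm_addr}")
    return argv


# Hypothetical gateway node and VM address
print(shlex.join(ssh_via_gateway("10.0.0.12", "gw.example.org", "alice", gateway_port=2222)))
# ssh -J alice@gw.example.org:2222 alice@10.0.0.12
```

The persistent equivalent is a `ProxyJump` entry for the VM host in `~/.ssh/config`.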
Ⅳ. Results of the Study
○ Development of heterogeneous high-performance computing system technology to meet future infrastructure needs
- Revise the in-house HPC server HW and build a cluster system
• Build a cluster system based on the revised server HW and evaluate its performance
✓ Build the cluster system and configure its software environment
✓ Perform a performance evaluation of the cluster system
• Application-level performance evaluation of the in-house cluster system for materials R&D
✓ Performance evaluation of the prototype system
- Study of management techniques for HPC server components
• Development of PCIe 4.0 based system bus expansion prototype
✓ Development of PCIe 4.0 backplane boards and HCA cards
• Study on PCIe based system expansion
✓ Study on the characteristics of PCIe 4.0 switch processors
✓ Study on TCP/IP over PCIe for NTB-based communication
• Investigate management techniques for HPC server components
✓ Propose a heterogeneous HW expansion module management scheme for local HW systems
✓ Investigation of requirements for heterogeneous system monitoring
✓ OpenBMC-based firmware build and optimization of the in-house high-performance server
- Research and development of heterogeneous system architecture and management technologies
• Comparison and analysis of the functions of major cluster management tools developed and used by domestic and foreign research institutions
✓ Comparison of key cluster management tools such as OpenHPC, MaaS, BCM, and Grendel
• Collaboration with the infrastructure center on cluster management tools and development of requirements
✓ Derive provisioning requirements
• Design and development of key management technologies for heterogeneous systems based on existing remote management tools
✓ Development of hardware sensor template technology for various heterogeneous system hardware management
✓ Development of BMC-based heterogeneous system operating system resource monitoring technology
✓ Develop user-friendly advanced management technologies
• Analysis and proof-of-concept validation of heterogeneous system management technology in a virtual machine environment
✓ Analysis, design, and automation of integrated remote management systems based on large-scale virtual machine environments
• Design and development of a single operating system installation technology based on existing remote management tools
✓ Development of PXE-based remote operating system installation technology
✓ Support for PXE booting of heterogeneous (multi-vendor) servers and improved performance
• An in-depth analysis of key technologies in open-source remote management tools
✓ MaaS deployment and functional testing for analysis
✓ Analyze the structure of MaaS provisioning services
✓ Analysis of the MaaS node management structure
✓ Analysis of the MaaS image management and operating system deployment structure
• Development of prototype technology for a remote management platform in cluster deployment
✓ Separation of MaaS-based operating system image management and installation capabilities and KiERA porting
- Research on HPC I/O system utilizing non-volatile memory and PIM-based in-memory computing
• Research on performance enhancement of all-flash-based LustreFS
✓ Development of ZFS I/O worker thread optimization to minimize context-switch overhead
✓ Development of ZFS I/O pipeline parallelization to minimize checksum calculation overhead
✓ Development of a split data store using multi-stream SSDs on the Lustre MDS
✓ Development of Lustre MDS I/O bandwidth isolation using Linux cgroups
✓ Development and performance verification of an LRU lock policy optimization to improve parallel write performance on manycore CPU systems
• Research on Linux-based technology analysis and the optimal use of persistent memory in storage system development
✓ Comparison of I/O performance in App Direct mode: thread scalability, remote NUMA node access, and access granularity
✓ I/O performance comparison between PMEM-aware file systems
✓ Analysis of the performance bottlenecks and optimal use of the OdinFS file system
• Research on persistent memory for HPC applications
✓ Performance analysis of persistent memory used as system main memory
✓ Analysis of the memory bus bottleneck when both DRAM and persistent memory are heavily used
✓ Optimization of the STREAM benchmark for persistent memory, using a non-temporal write API to bypass the memory bus bottleneck
• A study of real-hardware implementations of PIM technology for next-generation supercomputing architectures
✓ Comparative analysis of hardware structure, software stack, benchmark performance, etc., targeting Samsung HBM-PIM and UPMEM PIM implemented in real hardware
✓ Promoting the acquisition of the latest PIM prototype products through industry-academia-research NDA agreement between KISTI-KAIST-Samsung Electronics
• A study on how to apply PIM technology to GPU-based systems
✓ An analysis of how address mapping and hashing work on the NVIDIA V100 GPU
✓ Analysis of the impact of hashing on PIM architecture and study of ways to overcome it at the single channel, single module and multi-module levels
• Establishment of UPMEM PIM-based PIM research environment
✓ UPMEM PIM-based testbed establishment for actual product-based PIM research
✓ UPMEM PIM SDK-based PIM programming environment establishment and function analysis
• A study on UPMEM PIM benchmarking, case-based performance analysis, and optimization methods
✓ UPMEM PIM performance analysis based on synthetic workloads from the PrIM benchmark
✓ A study of PIM applicability and optimization techniques for fully homomorphic encryption
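The STREAM optimization listed above relies on non-temporal writes. The reason can be shown with simple byte accounting. STREAM credits each kernel only with the bytes it logically reads and writes. With ordinary write-allocate caching, however, every store first reads the destination cache line (read-for-ownership), which adds traffic the benchmark does not count. Non-temporal (streaming) stores skip that read. The sketch below only does this arithmetic. It does not model persistent-memory specifics, and the function name is illustrative.

```python
# Arrays read and written per element by each STREAM kernel
KERNELS = {"copy": (1, 1), "scale": (1, 1), "add": (2, 1), "triad": (2, 1)}


def stream_traffic(kernel, n, elem_bytes=8, nontemporal=False):
    """Return (bytes STREAM counts, bytes actually moved) for n elements."""
    reads, writes = KERNELS[kernel]
    counted = (reads + writes) * elem_bytes * n
    # Ordinary stores trigger a read-for-ownership of each destination line;
    # non-temporal stores write without reading it first.
    rfo = 0 if nontemporal else writes
    actual = (reads + writes + rfo) * elem_bytes * n
    return counted, actual


for nt in (False, True):
    counted, actual = stream_traffic("triad", 10_000_000, nontemporal=nt)
    print(nt, counted, actual, counted / actual)
# False 240000000 320000000 0.75
# True 240000000 240000000 1.0
```

Under this simple model, when the memory bus is the bottleneck and stores are ordinary, at most about three quarters of the bus bandwidth can appear as reported Triad bandwidth, and two thirds for Copy and Scale.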
○ Study on expanding the adaptability of heterogeneous HPC systems
- Revise the in-house HPC server HW and build a cluster system
• Build a cluster system based on the revised server HW and evaluate its performance
✓ Build the cluster system and configure its software environment
✓ Perform a performance evaluation of the cluster system
• Application-level performance evaluation of the in-house cluster system for materials R&D
✓ Performance evaluation of the prototype system
- Study of management techniques for HPC server components
• Analysis of technology trends in major system buses
✓ Survey on the PCIe Gen 4 bus specification
• Study on PCIe based system expansion
✓ Research on stabilization of self-developed optical PCIe interconnection network
✓ Development of research plan for converging device network and interconnection network based on PCIe bus
• Investigate management techniques for the HPC server components
✓ Propose a heterogeneous HW expansion module management scheme for local HW systems
- R&D on heterogeneous system architecture and management technology
• A survey on trends in domestic and foreign heterogeneous system management technologies
✓ Comparison and analysis of the functions of management tools used by domestic and foreign research institutes
• Analysis of requirements for heterogeneous system management, design, and verification of major technology concepts
✓ Verification of major management technologies for heterogeneous systems using previously developed management systems
✓ Implementation of hardware sensor template technology for various heterogeneous system hardware management
✓ Implementation of resource monitoring technology on BMC-based heterogeneous system operating system
✓ Implementation of network interface-based operating system installation technology
✓ Implementation of advanced management technologies for user convenience
• Analysis and proof-of-concept verification of heterogeneous system management technology in a virtual machine environment
✓ Analysis, design, and automation of methods for interworking with large-scale virtual machine-based remote management integrated systems
✓ Function verification through a large-scale virtual machine-based remote management integrated system interworking experiment
- Research on HPC I/O system utilizing non-volatile memory and PIM-based in-memory computing
• Research on performance enhancement of all-flash-based LustreFS
✓ Performance analysis of LustreFS with ZFS as the underlying file system
✓ Development of ZFS I/O worker thread optimization to minimize context-switch overhead
✓ Development of ZFS I/O pipeline parallelization to minimize checksum calculation overhead
✓ Research on asynchronous ZFS checksum I/O operations
✓ Performance analysis of metadata and small-file I/O as affected by the SSD write amplification factor (WAF)
✓ Use of the Data-on-MDT feature in LustreFS
✓ Development of a split data store using multi-stream SSDs on the Lustre MDS
✓ Development of Lustre MDS I/O bandwidth isolation using Linux cgroups
• Research on HPC I/O technology using next-generation NVRAM
✓ Performance analysis of Intel Optane Persistent Memory
✓ Detection of PMEM I/O bandwidth problems caused by system configuration and device hardware characteristics
• A study of real-hardware implementations of PIM technology for next-generation supercomputing architectures
✓ Comparative analysis of hardware structure, software stack, benchmark performance, etc., targeting Samsung HBM-PIM and UPMEM PIM implemented in real hardware
✓ Promoting the acquisition of the latest PIM prototype products through industry-academia-research NDA agreement between KISTI-KAIST-Samsung Electronics
• A study on how to apply PIM technology to GPU-based systems
✓ An analysis of how address mapping and hashing work on the NVIDIA V100 GPU
✓ Analysis of the impact of hashing on PIM architecture and study of ways to overcome it at the single channel, single module and multi-module levels
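The address-hashing items above matter because a PIM unit sits beside one memory channel and can only compute on the data that the memory controller maps there. NVIDIA does not document the V100 hash. The sketch below therefore uses a purely hypothetical XOR hash, a common style of channel hash, with made-up bit positions. It compares which 256-byte chunks of an 8 KiB buffer land on channel 0 under plain modulo interleaving and under the hash.

```python
def channel_of(addr, num_channels=8, granule_bits=8, xor_shift=12, hashed=True):
    """Memory channel for a physical address under a hypothetical mapping.

    Plain interleaving takes the address bits just above a 2**granule_bits
    byte granule; the hashed variant XORs in a higher group of bits.
    num_channels must be a power of two.
    """
    mask = num_channels - 1
    ch = (addr >> granule_bits) & mask
    if hashed:
        ch ^= (addr >> xor_shift) & mask
    return ch


def chunks_on_channel(size, channel, hashed=True, granule_bits=8):
    """Byte offsets of granule-sized chunks in [0, size) mapped to `channel`."""
    step = 1 << granule_bits
    return [off for off in range(0, size, step)
            if channel_of(off, granule_bits=granule_bits, hashed=hashed) == channel]


print(chunks_on_channel(8192, 0, hashed=False))  # [0, 2048, 4096, 6144]
print(chunks_on_channel(8192, 0, hashed=True))   # [0, 2048, 4352, 6400]
```

In this toy model the hash keeps the channels balanced but breaks the regular stride after the first 4 KiB. Data placement for PIM therefore has to be planned with the (possibly reverse-engineered) hash in mind.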
○ Study on HPC software technology
- Analysis of the execution patterns of well-known HPC applications
• OpenCL-based math library (CLBlast)
✓ Function call structures in CLBlast
• Open source math library based on CUDA (CUTLASS)
✓ Implementation of CUTLASS
✓ Execution flow of GEMM in CUTLASS
• Analysis of the execution flow and algorithms of Netlib HPL 2.3, including function call structures
• HPL algorithms: implementation of LU decomposition based on BLAS
• Profiling HPL optimized for NVIDIA GPUs
✓ Memory consumption, performance and communication pattern according to the size of the input matrix
• Survey of optimization techniques for math libraries used by exascale systems
✓ Math libraries used by ECP and PRACE projects
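The relation between problem size, memory, and performance profiled above follows from two counts. The N x N double-precision matrix occupies about 8N² bytes, and LU factorization costs about (2/3)N³ floating-point operations. The sketch below uses these counts to size N from available memory. It applies the commonly cited rule of thumb of filling roughly 80% of memory and rounds down to a multiple of the block size NB. The lower-order N² terms of HPL's operation count are omitted, and the function names are illustrative.

```python
import math


def hpl_problem_size(total_mem_bytes, nb, mem_fraction=0.8):
    """Largest N, a multiple of nb, whose 8*N^2-byte matrix fits in mem_fraction of memory."""
    n = math.isqrt(int(mem_fraction * total_mem_bytes / 8))
    return (n // nb) * nb


def hpl_flops(n):
    """Leading-order flop count of the LU factorization, (2/3) * N^3."""
    return 2.0 * n**3 / 3.0


def flops_per_matrix_byte(n):
    """Work per byte of matrix storage: (2/3 N^3) / (8 N^2) = N / 12."""
    return hpl_flops(n) / (8.0 * n * n)


n = hpl_problem_size(64 * 2**30, nb=256)  # e.g. 64 GiB of memory in total
print(n)                                   # 82688
print(round(flops_per_matrix_byte(n)))     # 6891
```

Work per byte grows linearly with N. Larger problems therefore generally give each accelerator more computation per unit of data exchanged, which lets computation hide more of the communication.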
- Study on high level programming models for HPC systems of heterogeneous architecture
• CUDA for NVIDIA GPUs, ROCm for AMD GPUs, and the Level-Zero interface for Intel XPUs
• OpenCL programming model
• Current status of SYCL programming model
✓ Intel LLVM and DPC++ / hipSYCL / ComputeCpp / triSYCL
- Study on open source debuggers for heterogeneous architectures
✓ GDB, LLDB, oclgrind, Intel Debugger, etc.
- Profiling techniques for heterogeneous architectures
• Survey on profiling techniques for heterogeneous architectures
○ Study on HPC resource-sharing platform technology and cloud technology
- Planning for KI Cloud advancement and the HPC resource-sharing platform
• Goals of KI Cloud advancement and HPC resource sharing
✓ Distributed cloud
✓ Use of open source
✓ Flexible and dynamic software services
✓ Convenience of data management and sharing
✓ Support for HPC resource sharing
• Plan for KI Cloud advancement and HPC resource sharing
✓ Design of resource integration in KISTI
✓ Technologies of distributed and multi cloud communications
✓ Applying a microservice architecture (MSA)
✓ Management of network and security
✓ Design HPC Job service
✓ Design virtual server service
✓ Design FPGA service
✓ Design web application service
✓ Design software development platform service
✓ Design data management service
- Support for KISTI HPC Cloud
• Design and installation of the hardware infrastructure for the materials data platform project
• Design and installation of the KI Cloud platform
• Installation of the KI Cloud software
- Management of KI Cloud and development of cloud technologies
• Management of the KI Cloud system consisting of the KAIROS cluster and the Nurion cluster
• Support users of KI Cloud and system failures
- Technology for SSH communication with virtual machines
