[논문]최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석

김성수; 김동헌; 우상규; 임인성

초록
AI-Helper

GPU(Graphics Processing Unit)는 범용 CPU와는 달리 다수코어 스트리밍 프로세서(manycore streaming processor) 형태로 특화되어 발전되어 왔으며, 최근 뛰어난 병렬 처리 연산 능력으로 인하여 점차 많은 영역에서 CPU의 역할을 대체하고 있다. 이러한 추세에 따라 최근 NVIDIA 사에서는 GPGPU(General Purpose GPU) 아키텍처인 CUDA(Compute Unified Device Architecture)를 발표하여 보다 유연한 GPU 프로그래밍 환경을 제공하고 있다. 일반적으로 CUDA API를 사용한 프로그래밍 작업시 GPU의 계산구조에 관한 여러 가지 요소들에 대한 특성을 정확히 파악해야 효율적인 병렬 소프트웨어를 개발할 수 있다. 본 논문에서는 다양한 실험과 시행착오를 통하여 획득한 CUDA 프로그래밍에 관한 최적화 기법에 대하여 설명하고, 그러한 방법들이 프로그램 수행의 효율에 어떠한 영향을 미치는지 알아본다. 특히 특정 예제 문제에 대하여 효과적인 계층 구조 메모리의 접근과 코어 활성화 비율(occupancy), 지연 감춤(latency hiding) 등과 같이 성능에 영향을 미치는 몇 가지 규칙을 실험을 통해 분석해봄으로써, 향후 CUDA를 기반으로 하는 효과적인 병렬 프로그래밍에 유용하게 활용할 수 있는 구체적인 방안을 제시한다.

Abstract ▼ AI-Helper

Unlike general-purpose CPUs, the GPUs have been specialized as many-core streaming processors, and are frequently replacing the CPUs in an increasing range of computations thanks to their outstanding parallel computing capacity. In order to respond to such trend, NVIDIA has recently issued a new par...

Unlike general-purpose CPUs, the GPUs have been specialized as many-core streaming processors, and are frequently replacing the CPUs in an increasing range of computations thanks to their outstanding parallel computing capacity. In order to respond to such trend, NVIDIA has recently issued a new parallel computing architecture called CUDA(Compute Unified Device Architecture), offering a flexible GPU programming environment for GPGPU(General Purpose GPU) computing. In general, when programmers use the CUDA API, they should clearly understand many aspects of GPU's computing architecture to produce efficient parallel software. In this article, we explain several optimization techniques for CUDA programming that we have verified through a lot of experiment and trial and error, and review how those techniques affect the performance of code execution. In particular, we use a specific problem as an example to analyze several elements that affect performances, such as effective accesses to hierarchical memory system, processor occupancy, and latency hiding. In conclusion, we present several directions that may be utilized effectively in CUDA-based parallel programming.

주제어

참고문헌 (13)

Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, and kevin Skadron, A Performance Study of General-Purpose Applicaions on Graphics Processors Using CUDA, Journal of Parallel and Distributed Computing, University of Virginia, 2008.
Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, and Wen-mei W. Hwu, Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA, Proc. 13th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, ACM Press, 2008.
NVIDIA. http://www.nvidia.com/object/product_geforc e_gtx_280_us.html, 2009.
NVIDIA. NVIDIA CUDA Compute Unified Device Architecture: Programming Guide (Version 2.3), 2009.
Maryam Moazeni, Alex Bui, and Majid Sarrafzadeh, A Memory Optimization Technique for Software- Managed Scratchpad Memory in GPUs, University of California, 2009.
NVIDIA. NVIDIA CUDA Visual Profiler (Version 2.3), 2009.
Joe Stam, Convolution Soup, NVIDIA, 2009.
NVIDIA. NVIDIA CUDA Compute Unified Device Architecture: Technical Brief NVIDIA GeForce GTX 200 GPU Architectural Overview, 2008.
NVIDIA. Optimizing CUDA, 2009.
B. Parhami. Introduction to Parallel Processing: Algorithms and Architectures, Plenum Press, New York, pp.377-379, 1999.
Sobel, I., Feldman,G., A 3x3 Isotropic Gradient Operator for Image Processing, presented at a talk at the Stanford Artificial Project, 1968.
Victor Podlozhnyuk, Image Convolution with CUDA, NVIDIA CUDA 2.0 SDK document, 2007.
Mark Segal, Kurt Akeley, The OpenGL Graphics System: A Specification(Version 2.1 - December 1), 2006.

이 논문을 인용한 문헌

저자의 다른 논문 :

LOADING...

내보내기 구분	파일저장 인쇄 메일전송
구성항목	기본정보 상세정보 관리번호, 논문명, 저널/프로시딩명, 저자 , 발행년, 권, 호, 시작페이지, 끝페이지, 발행기관 관리번호, 논문명, 대등논문명, 저자 , 저널/프로시딩명, 발행기관, 발행년, 발행언어, 권, 호, 시작페이지, 끝페이지, ISBN, ISSN, 주제분야, 키워드, 초록(한글), 초록(영문), 저자(소속기관)
저장형식	Text(ASCII format) Excel format RefWorks Direct Export RIS format (for Reference Manager, ProCite, EndNote), Scholar's Aids, Mendeley
메일정보	받는사람 (필수) @ 보내는사람 (선택) @ 제목 내용 KISTI 검색결과 이메일 서비스
안내	총 건의 자료가 검색되었습니다. 다운받으실 자료의 인덱스를 입력하세요. (1-10,000) 검색결과의 순서대로 최대 10,000건 까지 다운로드가 가능합니다. 데이타가 많을 경우 속도가 느려질 수 있습니다.(최대 2~3분 소요) 다운로드 파일은 UTF-8 형태로 저장됩니다. 파일의 내용이 제대로 보이지 않을실 때는 웹브라우저 상단의 보기 -> 인코딩 -> 자동선택 여부를 확인하십시오. ~ Text(ASCII format) Excel format

연합인증

최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석
Analysis of Programming Techniques for Creating Optimized CUDA Software 원문보기

초록
AI-Helper

Abstract ▼ AI-Helper

주제어

참고문헌 (13)

이 논문을 인용한 문헌

저자의 다른 논문 :

연구과제 타임라인

관련 콘텐츠

원문 보기

원문 URL 링크

이 논문과 함께 이용한 콘텐츠

AI-Helper ※ AI-Helper는 오픈소스 모델을 사용합니다.

선택된 텍스트

연합인증

최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석 Analysis of Programming Techniques for Creating Optimized CUDA Software 원문보기

초록 용어보기논문에서 용어와 풀이말을 자동 추출한 결과로, 시범 서비스 중입니다. AI-Helper

Abstract ▼ AI-Helper

주제어

참고문헌 (13)

이 논문을 인용한 문헌

저자의 다른 논문 :

임인성 (36)

연구과제 타임라인

전체(0) 논문(0) 특허(0) 보고서(0)

전체(0) 논문(0) 특허(0) 보고서(0)

관련 콘텐츠

원문 보기

원문 URL 링크

이 논문과 함께 이용한 콘텐츠

AI-Helper ※ AI-Helper는 오픈소스 모델을 사용합니다.

선택된 텍스트

최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석
Analysis of Programming Techniques for Creating Optimized CUDA Software 원문보기

초록
AI-Helper