[Report] Optimization of the Ray Tracing Module of the Whole-Core Transport Calculation Code nTRACER

Namjae Choi (최남재)


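Report information
Lead research institution	Seoul National University
Principal investigator	Namjae Choi (최남재)
Report type	Final report
Country of publication	Republic of Korea
Language	Korean
Publication date	2016-06
Project start year	2015
Supervising ministry	Ministry of Science, ICT and Future Planning
Registration number	TRKO201700009903
Project ID number	1711025884
Program name	Nuclear Research Infrastructure Expansion Program (원자력연구기반확충사업)
Database entry date	2017-11-04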

Summary

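nTRACER is a direct whole-core calculation code under development at the Reactor Physics Laboratory of Seoul National University. For three-dimensional core problems it couples radial planar two-dimensional MOC with axial one-dimensional SENM within a three-dimensional CMFD acceleration framework. Of these, the MOC ray tracing calculation consumes considerable time and resources, so practical three-dimensional core calculations require building a computing cluster, which is costly. Optimizing the ray tracing module is therefore key to reducing both computing time and the cost of building computing clusters. In this study, we improved the efficiency of the Group Major algorithm adopted by nTRACER and newly implemented in nTRACER the Node Major algorithm used in OpenMOC. We also devised an original ray tracing algorithm for GPU parallel computing and developed a GPU parallel ray tracing module using OpenACC. We then compared the performance of the algorithms to find the optimal ray tracing algorithm, jointly considering convergence rate, accuracy, computing efficiency (chiefly cache hit ratio), and applicability to GPU parallel computing.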

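In terms of numerical performance, the Group Major and Node Major algorithms showed no meaningful difference. In pure MOC calculations without CMFD acceleration, the Group Major algorithm converged faster than the Node Major algorithm, but under CMFD acceleration the two algorithms showed no difference in convergence rate or accuracy, regardless of the dominance ratio of the problem. This is attributed to CMFD acceleration effectively updating the source distribution. In practice, every MOC core analysis code uses an acceleration scheme such as CMFD, and the three-dimensional core calculation framework of nTRACER must use CMFD acceleration to couple MOC and SENM. Consequently, in practical MOC calculations the Group Major and Node Major algorithms are numerically indistinguishable.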

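In computing efficiency, by contrast, the Node Major algorithm outperformed the Group Major algorithm. It was faster than the Group Major algorithm in every calculation. Although its efficiency varied considerably with the geometry, it was 33.8% to 107.3% faster than the original nTRACER algorithm and 5.0% to 53.0% faster than even the optimized Group Major algorithm. In particular, the computing speed of the Node Major algorithm scaled with the number of energy groups, which favors the 47-group calculation scheme of nTRACER.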

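Because of its high degree of vectorization, the Node Major algorithm was also better suited to GPU parallel computing than the Group Major algorithm. However, the independently devised GPU parallel algorithm performed much worse than a direct GPU parallelization of the Node Major algorithm, and overall the GPU parallel ray tracing calculation did not reach the expected efficiency. This result appears to be dominated by the inherent inefficiency of the directive-based nature of OpenACC and by thread serialization caused by atomic operations. High efficiency is therefore expected if the GPU parallel ray tracing module is optimized with CUDA and the data structures are privatized on the GPU to reduce the frequency of atomic operations.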

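In conclusion, considering numerical performance, computing efficiency, and applicability to GPU parallel computing together, the Node Major algorithm is judged to be the optimal choice. By adopting it, nTRACER can gain a substantial speedup and extensibility to GPU-based parallel computing environments. The remaining tasks are to implement the Node Major algorithm, so far implemented only in the P₀ ray tracing module, in the Pn ray tracing module as well; to optimize the GPU parallel ray tracing module using CUDA; and to implement domain decomposition so that the data structures can be privatized within limited GPU memory.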

(Source: Summary, p. 3)

Abstract

nTRACER, a direct whole-core calculation code being developed by the Reactor Physics Laboratory of Seoul National University (SNURPL), combines the planar two-dimensional method of characteristics (MOC) with the axial one-dimensional source expansion nodal method (SENM) under a three-dimensional coarse mesh finite difference (CMFD) acceleration framework for three-dimensional core calculations. Of these methods, the MOC ray tracing calculation requires substantial computing time and resources, so practical three-dimensional core calculations require a computing cluster, which is expensive to build. Optimizing the ray tracing module is therefore crucial for reducing computing time and the cost of building computing clusters. In this research, we optimized the group major algorithm that nTRACER employs and newly implemented in nTRACER the node major algorithm used by OpenMOC. We also devised an algorithm for GPU parallel computing and developed a GPU parallel ray tracing module with OpenACC. We then compared the performance of the algorithms and sought the optimal ray tracing algorithm, considering convergence rate, accuracy, computing efficiency (which is mostly governed by cache hit ratio), and applicability to GPU parallel computing.

In numerical performance, the group major and node major algorithms did not show meaningful differences. In pure MOC calculations, where CMFD acceleration is not applied, the group major algorithm showed a higher convergence rate than the node major algorithm, but under CMFD acceleration the convergence rate and accuracy of the two algorithms were almost identical regardless of the dominance ratio of the problem. This is thought to be because CMFD acceleration effectively updates the source distribution. In practice, all MOC core analysis codes employ acceleration methods such as CMFD acceleration, and nTRACER has to use CMFD acceleration to couple MOC and SENM. As a result, in practical MOC calculations the group major and node major algorithms are numerically equivalent.

In terms of computing efficiency, however, the node major algorithm outperformed the group major algorithm. It showed higher computing speed than the group major algorithm in all calculation cases. Although the efficiency fluctuated depending on the geometrical structure, the node major algorithm was 33.8% to 107.3% faster than the original nTRACER algorithm, and still 5.0% to 53.0% faster than the optimized group major algorithm. In particular, the computing speed of the node major algorithm appeared to be proportional to the number of energy groups in the problem, which is advantageous for the 47-group calculation system of nTRACER.
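The difference between the two loop orders can be sketched as follows. This is a minimal illustration, not code from nTRACER or OpenMOC: it assumes that "group major" means the energy group is the outermost loop (the ray is traversed once per group) and that "node major" means all groups are processed in the innermost loop at each ray segment. The function names, the single ray, the fixed source, and the sample data are all hypothetical.

```python
import math

def sweep_group_major(segments, sigma_t, source, psi_in):
    """Assumed group major order: loop over groups, then over ray segments."""
    n_groups = len(psi_in)
    tally = [[0.0] * n_groups for _ in range(len(sigma_t))]
    for g in range(n_groups):
        psi = psi_in[g]
        for region, length in segments:  # the ray is re-traversed for every group
            att = 1.0 - math.exp(-sigma_t[region][g] * length)
            delta = (psi - source[region][g] / sigma_t[region][g]) * att
            tally[region][g] += delta    # change of angular flux along the segment
            psi -= delta
    return tally

def sweep_node_major(segments, sigma_t, source, psi_in):
    """Assumed node major order: loop over ray segments, then over groups."""
    n_groups = len(psi_in)
    tally = [[0.0] * n_groups for _ in range(len(sigma_t))]
    psi = list(psi_in)
    for region, length in segments:      # each segment is visited only once
        for g in range(n_groups):        # innermost loop runs over all groups
            att = 1.0 - math.exp(-sigma_t[region][g] * length)
            delta = (psi[g] - source[region][g] / sigma_t[region][g]) * att
            tally[region][g] += delta
            psi[g] -= delta
    return tally

# Hypothetical data: 2 flat source regions, 2 groups, one ray with 3 segments.
segments = [(0, 0.5), (1, 1.2), (0, 0.3)]   # (region index, segment length)
sigma_t = [[0.5, 1.0], [0.8, 1.5]]          # total cross section [region][group]
source = [[1.0, 0.5], [0.2, 0.1]]           # fixed source [region][group]
psi_in = [1.0, 0.0]                         # incoming angular flux per group

group_major = sweep_group_major(segments, sigma_t, source, psi_in)
node_major = sweep_node_major(segments, sigma_t, source, psi_in)
```

With a fixed source the two orders accumulate the same tallies; what changes is the memory access pattern. In the node major sketch the segment data is read once and the inner loop walks over per-group values stored side by side, which is the kind of access that tends to favor cache reuse and vectorization as the number of groups grows.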

The node major algorithm was also better suited to GPU parallel computing than the group major algorithm because of its high level of vectorization. Nonetheless, the devised algorithm performed significantly worse than the GPU-parallelized node major algorithm, and overall the GPU parallel ray tracing calculation did not achieve the expected efficiency. This result is thought to be caused by the intrinsic inefficiency of OpenACC due to its directive-based nature and by the serialization of threads due to atomic operations. It is therefore expected that high efficiency can be achieved by optimizing the GPU parallel ray tracing module with CUDA and by reducing the frequency of atomic operations through privatizing data structures on the GPU.
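The trade-off between atomic updates and privatized data structures can be illustrated with a CPU-thread analogy. This is only a sketch under stated assumptions, not OpenACC or CUDA code and not the module described in the report: a lock stands in for an atomic add, Python threads stand in for GPU threads, and the function names and sample contributions are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def tally_shared(contributions, n_regions, n_workers):
    """All workers add into one shared array; a lock stands in for an atomic add."""
    tally = [0] * n_regions
    lock = threading.Lock()

    def work(chunk):
        for region, value in chunk:
            with lock:                   # every single update is serialized here
                tally[region] += value

    chunks = [contributions[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(work, chunks))
    return tally

def tally_privatized(contributions, n_regions, n_workers):
    """Each worker fills its own private array; one reduction combines them."""
    def work(chunk):
        private = [0] * n_regions        # private copy, so no synchronization
        for region, value in chunk:
            private[region] += value
        return private

    chunks = [contributions[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, chunks))
    return [sum(p[r] for p in partials) for r in range(n_regions)]

# Hypothetical integer contributions of ray segments to 4 regions.
contributions = [(i % 4, i) for i in range(1000)]
shared_result = tally_shared(contributions, 4, 3)
private_result = tally_privatized(contributions, 4, 3)
```

Both functions return the same tallies. The privatized version avoids per-update synchronization at the cost of one array copy per worker, so its memory footprint grows with the number of workers, which is the kind of pressure on limited GPU memory that the abstract's planned domain decomposition is meant to address.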

In conclusion, considering numerical performance, computing efficiency, and applicability to GPU parallel computing, the node major algorithm appears to be the optimal choice. By adopting it in the ray tracing module, nTRACER will obtain a substantial speedup and extensibility to GPU-based parallel computing environments. Future work will therefore consist of implementing the node major algorithm, which we have so far implemented only in the P₀ ray tracing module, in the P_n ray tracing module; optimizing the GPU parallel ray tracing module with CUDA; and implementing a domain decomposition method in order to privatize data structures within limited GPU memory.

(Source: Abstract, p. 4)

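Contents

Cover ... 1
Final Report ... 2
Summary ... 3
Abstract ... 4
1. Introduction ... 5
2. Method of Characteristics ... 5
2-1. Solution of the Characteristic Equation ... 5
2-2. Ray Tracing Calculation Method ... 6
2-3. Iteration Algorithms ... 8
3. GPU Parallel Computing ... 10
3-1. GPU Computing Architecture ... 10
3-2. OpenACC ... 11
3-3. GPU Parallel Iteration Algorithm ... 11
4. Results ... 13
4-1. Numerical Performance ... 13
4-2. Computing Performance ... 16
4-3. GPU Parallel Computing Performance ... 17
5. Conclusion ... 18
End page ... 18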
