[Report] System and Network Architecture for Memory Networks

Dongjun Kim (김동준)

[National R&D Research Report] System and Network Architecture for Memory Networks

Report Information
Lead research institution	Korea Advanced Institute of Science and Technology (KAIST)
Principal investigator	Dongjun Kim (김동준)
Report type	Final report
Country of publication	Republic of Korea
Language	Korean
Publication date	2016-12
Project start year	2015
Supervising ministry	Ministry of Science and ICT
Project management agency	National Research Foundation of Korea
Registration number	TRKO201800005994
Project number	1711029989
Program	Mid-career Researcher Support Program
Database entry date	2018-05-12
Keywords	3D stacked memory, system interconnect, memory network, multi-CPU system, multi-GPU system, routing, bandwidth, GPU scheduling, memory allocation
DOI	https://doi.org/10.23000/TRKO201800005994

Abstract (Korean summary)

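Purpose and Contents
With the recent emergence of 3D stacked memory, 3D stacked memory combined with a logic layer has been proposed to implement the Hybrid Memory Cube (HMC). In this work, we exploit the availability of such memory modules: the logic layer within an HMC is used to offload computation and to provide near-data processing (NDP) that accelerates applications. We also revisit power gating in high-radix networks and propose a distributed power management mechanism. In particular, traffic flows spread across multiple paths are consolidated onto fewer channels, even though this requires non-minimal routing paths, so that the remaining channels can be power-gated and their power consumption minimized.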
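To make the consolidation idea concrete, here is a minimal Python sketch; it is not the report's EPCOT mechanism. It assumes the candidate paths between a source and a destination can be abstracted as a list of parallel links with normalized capacity, and it packs flows onto as few links as possible with a first-fit-decreasing heuristic. Any link left without traffic becomes a power-gating candidate. The names `Link` and `consolidate` and the 0.9 utilization cap are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A channel that can carry traffic or be power-gated (hypothetical model)."""
    name: str
    capacity: float = 1.0          # normalized channel bandwidth
    load: float = 0.0
    flows: list = field(default_factory=list)


def consolidate(flows, links, max_util=0.9):
    """Pack flows onto as few links as possible (first-fit decreasing).

    flows: dict mapping flow id -> bandwidth demand (same units as capacity).
    links: Link objects, tried in a fixed priority order.
    Returns (assignment, gated): flow id -> link name, and the idle link names.
    """
    assignment = {}
    # Place the largest demands first so they still find room.
    for fid, demand in sorted(flows.items(), key=lambda kv: kv[1], reverse=True):
        for link in links:
            # Keep headroom below max_util (an assumed cap) to absorb bursts.
            if link.load + demand <= max_util * link.capacity:
                link.load += demand
                link.flows.append(fid)
                assignment[fid] = link.name
                break
        else:
            raise ValueError(f"flow {fid!r} does not fit on any active link")
    gated = [link.name for link in links if not link.flows]
    return assignment, gated


if __name__ == "__main__":
    links = [Link(f"L{i}") for i in range(4)]
    flows = {"a": 0.3, "b": 0.25, "c": 0.2, "d": 0.2, "e": 0.1}
    assignment, gated = consolidate(flows, links)
    print(assignment)  # {'a': 'L0', 'b': 'L0', 'c': 'L0', 'd': 'L1', 'e': 'L0'}
    print(gated)       # ['L2', 'L3']
```

Routing each of the five flows on its own link would keep every link powered; with these demands the sketch uses only `L0` and `L1`, leaving `L2` and `L3` idle. A real scheme would also have to account for the extra hops of non-minimal routes and the latency of reactivating a gated link.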

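Results
In this work, we propose and evaluate near-data processing (NDP) of linked-list traversal (LLT). In particular, we show experimentally that simply offloading LLT for NDP does not always improve performance, and that the additional off-chip data movement can actually reduce energy efficiency. We therefore propose NDP-aware data localization and the batching of multiple LLT operations, which exploit the parallelism available within NDP to achieve larger performance gains. When a range of big-memory LLT workloads was evaluated on the proposed NDP architecture, it achieved up to a 5.9x performance improvement and a 2.8x reduction in energy. We also propose EPCOT (Energy-efficient Proactive COnsolidation of Traffic), a proactive network channel power-gating scheme that exploits several observed characteristics and consolidates traffic for high energy efficiency in high-radix topologies. In addition, we propose a power-aware, load-balanced adaptive routing algorithm that balances load across the limited number of links that remain available when network links are deactivated. Across various traffic patterns, this algorithm minimizes the performance impact and achieves better energy efficiency than prior work. For real workloads, it reduces network energy by up to 75% compared with the baseline.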
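The benefit of batching can be illustrated with a toy model, sketched here in Python under two stated assumptions: each linked-list node visit is one dependent memory access (one "round"), and the near-memory engine has enough memory-level parallelism to issue one access per in-flight traversal per round. Under those assumptions, walking lists one after another costs the sum of the traversal lengths, while interleaving them costs only the longest traversal. `Node`, `build_list`, `search_unbatched`, and `search_batched` are hypothetical names and do not describe the report's LLT engine.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: int
    next: Optional["Node"] = None


def build_list(keys) -> Optional[Node]:
    """Build a singly linked list holding `keys` in order."""
    head = None
    for k in reversed(list(keys)):
        head = Node(k, head)
    return head


def search_unbatched(heads, targets) -> Tuple[List[bool], int]:
    """Finish each traversal before starting the next one."""
    rounds, results = 0, []
    for head, target in zip(heads, targets):
        node, found = head, False
        while node is not None:
            rounds += 1                  # one dependent memory access
            if node.key == target:
                found = True
                break
            node = node.next
        results.append(found)
    return results, rounds


def search_batched(heads, targets) -> Tuple[List[bool], int]:
    """Advance every in-flight traversal by one node per round."""
    cursors = list(heads)
    results = [False] * len(cursors)
    active = [c is not None for c in cursors]
    rounds = 0
    while any(active):
        rounds += 1                      # all active traversals access memory in parallel
        for i, node in enumerate(cursors):
            if not active[i]:
                continue
            if node.key == targets[i]:
                results[i], active[i] = True, False
            else:
                cursors[i] = node.next
                active[i] = cursors[i] is not None
    return results, rounds


if __name__ == "__main__":
    heads = [build_list(range(10)), build_list(range(5)), build_list(range(8))]
    targets = [9, 4, 100]
    print(search_unbatched(heads, targets))  # ([True, True, False], 23)
    print(search_batched(heads, targets))    # ([True, True, False], 10)
```

Both versions return the same answers; only the number of serialized memory rounds changes (23 versus 10 here), which is the throughput effect that batching targets.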

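Plans for Using the Research Results
Although this work focused on LLT, its contributions, such as the packetized interface and load/store instructions used for NDP, locality-aware data placement, and batching to maximize performance, can readily be extended to other applications that require many memory accesses. Interconnection networks are also widely used, from large-scale networks such as those in data centers to small-scale networks such as those in mobile devices. Because power consumption is a critical budget factor in data centers, energy-efficient systems are increasingly required. In particular, because the interconnect accounts for a very large share of total power in large-scale systems at low utilization, it is important to achieve energy proportionality by power-gating the high-speed channels. By improving the energy efficiency of interconnection networks, this work can therefore serve as a cornerstone for green computing.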
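Why locality-aware placement helps can be seen by counting how often a traversal must leave the memory cube it is running in. This Python sketch assumes a system of `num_cubes` HMC-like memory cubes with one near-memory engine per cube, and charges one off-chip hop whenever consecutive list nodes live in different cubes. The two placement policies (fine-grained interleaving versus keeping each list in the single cube selected by a hypothetical `list_id`) are illustrative assumptions, not the report's exact data layout.

```python
from typing import Callable, List

# A placement maps (list_id, node_index, num_cubes) -> cube index.
Placement = Callable[[int, int, int], int]


def interleaved(list_id: int, node_index: int, num_cubes: int) -> int:
    # Address interleaving: consecutive nodes land in consecutive cubes.
    return node_index % num_cubes


def localized(list_id: int, node_index: int, num_cubes: int) -> int:
    # Locality-aware placement: every node of a list stays in one cube.
    return list_id % num_cubes


def off_chip_hops(list_id: int, length: int, num_cubes: int, place: Placement) -> int:
    """Count cube-to-cube transitions while walking one list of `length` nodes."""
    cubes = [place(list_id, i, num_cubes) for i in range(length)]
    return sum(1 for here, there in zip(cubes, cubes[1:]) if here != there)


def cube_load(num_lists: int, length: int, num_cubes: int, place: Placement) -> List[int]:
    """Number of nodes stored in each cube, to expose load imbalance."""
    load = [0] * num_cubes
    for list_id in range(num_lists):
        for i in range(length):
            load[place(list_id, i, num_cubes)] += 1
    return load


if __name__ == "__main__":
    print(off_chip_hops(3, 16, 4, interleaved))  # 15
    print(off_chip_hops(3, 16, 4, localized))    # 0
    print(cube_load(2, 16, 4, interleaved))      # [8, 8, 8, 8]
    print(cube_load(2, 16, 4, localized))        # [16, 16, 0, 0]
```

Localization removes every off-chip hop in this model, but with only two lists it leaves two of the four cubes idle while interleaving spreads nodes evenly. This is the kind of trade-off the figure list refers to with different data localization degrees; the sketch does not reproduce the report's results.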

(Source: Korean summary, p. 4)

Abstract (English summary)

Purpose & contents
With the recent emergence of 3D memory stacking, 3D stacked memory combined with a logic layer has been proposed to create the Hybrid Memory Cube (HMC). In this work, we exploit the availability of such memory modules; in particular, the logic layer within an HMC provides an opportunity to offload computation and perform near-data processing (NDP) to accelerate different workloads. In addition, we revisit power gating in high-radix networks and propose a distributed power management mechanism. With traffic consolidation, several traffic flows that are distributed across multiple paths can be merged onto fewer links through non-minimal routing, so that the other links can be power-gated.

Result
In this work, we propose and evaluate near-data processing (NDP) of linked-list traversal (LLT). We show that simply offloading LLT for NDP does not necessarily improve performance and can actually degrade energy efficiency because of additional off-chip channel traversals. We therefore propose NDP-aware data localization and batching to fully realize the benefits of offloading to near-memory logic. NDP-aware data localization minimizes off-chip accesses and reduces LLT latency, while batching multiple LLT operations improves overall throughput by exploiting the parallelism available within NDP. We evaluate different big-memory workloads with LLT on our proposed NDP architecture and show up to a 5.9x increase in performance and a 2.8x reduction in energy compared with baseline host-processing. In addition, we propose EPCOT (Energy-efficient Proactive COnsolidation of Traffic), a proactive network channel power-gating scheme that exploits several observed characteristics and consolidates traffic for high energy efficiency in high-radix topologies. We also propose a power-aware adaptive routing algorithm that addresses the challenge of load-balancing across links when path diversity becomes limited as network links are turned off. Our power-aware, load-balanced adaptive routing minimizes the performance impact of power gating across various traffic patterns and achieves better energy proportionality than prior work. For real workloads, EPCOT reduces network energy by up to 75% compared with the baseline, with negligible impact on performance.
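The routing decision described above can be illustrated with a UGAL-style rule for a 1D flattened butterfly, in which every pair of routers is directly connected. This Python sketch assumes each router knows which outgoing channels are powered on and the queue occupancy of each output port; it compares the direct (minimal) channel with two-hop (non-minimal) detours through powered-on intermediate routers, weighting detours by their hop count. It is a generic load-balancing heuristic chosen for illustration, not the report's algorithm, and `route`, `active`, `queue`, and `nonmin_penalty` are hypothetical names.

```python
from typing import Dict, Optional, Set, Tuple

Channel = Tuple[int, int]  # directed channel (from_router, to_router)


def route(src: int, dst: int, num_routers: int, active: Set[Channel],
          queue: Dict[Channel, int], nonmin_penalty: int = 2
          ) -> Optional[Tuple[int, bool]]:
    """Choose the next hop in a 1D flattened butterfly with some links gated.

    active: directed channels that are currently powered on.
    queue:  output-queue occupancy per channel (a missing entry means empty).
    Returns (next_hop, is_minimal), or None if no powered path exists.
    """
    candidates = []  # (estimated cost, next hop, is_minimal)
    if (src, dst) in active:
        candidates.append((queue.get((src, dst), 0), dst, True))
    for mid in range(num_routers):
        if mid in (src, dst):
            continue
        if (src, mid) in active and (mid, dst) in active:
            # A detour uses two channels, so scale its local queue estimate.
            cost = nonmin_penalty * queue.get((src, mid), 0)
            candidates.append((cost, mid, False))
    if not candidates:
        return None  # in this model, a gated link must be reactivated first
    # Lowest cost wins; on a tie, prefer the minimal path.
    _, hop, minimal = min(candidates, key=lambda c: (c[0], not c[2]))
    return hop, minimal


if __name__ == "__main__":
    on = {(0, 1), (1, 3), (0, 2), (2, 3)}             # direct channel 0->3 is gated
    print(route(0, 3, 4, on, {}))                      # (1, False)
    print(route(0, 3, 4, on, {(0, 1): 5, (0, 2): 1}))  # (2, False)
    print(route(0, 3, 4, on | {(0, 3)}, {}))           # (3, True)
    print(route(0, 3, 4, {(0, 1)}, {}))                # None
```

When the direct channel is gated, traffic spreads over whichever detour currently has the shorter queue; when no powered path exists, the sketch returns `None` so that a caller could trigger link reactivation.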

Expected Contribution
While this work focused on LLT, its contributions, including exploiting the packetized interface for NDP and using load/store instructions, locality-aware data placement, and batching to maximize throughput, could be extended to other computations that involve a significant number of memory accesses by providing additional hardware logic near the memory. In addition, interconnection networks are increasingly common, from large-scale networks such as those in data centers to on-chip networks in chip multiprocessors and mobile devices. Power consumption is a critical factor in data-center budgets, so energy-efficient systems are increasingly required; energy efficiency is equally critical for embedded systems such as mobile devices. In particular, because interconnection networks can account for a significant portion of total power in large-scale systems at low utilization, it is important to achieve energy proportionality in the network by properly power-gating the high-speed channels. By improving the energy efficiency of interconnection networks, this work can contribute to green-computing infrastructure.

(Source: SUMMARY, p. 5)

Contents

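Cover ... 1
Table of Contents ... 2
Research Plan Summary ... 3
Research Results Summary ... 4
Korean Summary ... 4
SUMMARY ... 5
Research Content and Results ... 6
1. Overview of the R&D Project ... 6
2. Status of Technology Development in Korea and Abroad ... 7
3. Research Performed and Results ... 9
4. Degree of Goal Achievement and Contribution to Related Fields ... 35
5. Plans for Using the Research Results ... 37
6. Foreign Science and Technology Information Collected During the Research ... 38
7. Representative Research Achievements of the Principal Investigator ... 39
8. References ... 39
9. Research Outcomes ... 43
10. Research Facilities and Equipment Registered with the National Science & Technology Information Service (NTIS) ... 46
11. Laboratory Safety Measures Implemented During the R&D Project ... 46
12. Other Matters ... 46
Appendix 1. Representative Research Outcomes ... 47
Appendix 2. Detailed Goal Outcomes ... 62
Back Page ... 93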

Tables/Figures (20)

(a) HMC logic diagram with additional logic for LLT NDP shaded in grey, (b) offloading command packet format, and (c) simplified finite state machine of the LLT engine.
Data access in host-processing and NDP with different localization degrees.
NDP-aware data localization with 4 memory groups for the four different linked-list types.
NDP (a) without batching and (b) with batching, and (c) support for Memcached batching.
Batching in graphs: vertices at the same distance from the source vertex.
Normalized performance, energy consumption, average LLT data read latency, and CPU-memory link utilization of host-processing (HSP) and NDP as the proposed optimizations are applied one by one.
Scalability of HSP and NDP for the Hash Join (Probe) workload, normalized to HSP with 4 threads.
Trade-off among different data localization degrees.
Average and tail latency of the Memcached (FIX) workload.
Root networks for (a) 1D and (b) 2D flattened butterflies based on a star topology.
Path diversity comparison with (a) concentration and (b) arbitrary distribution of active links.
The total number of available paths, including minimal and non-minimal paths, with concentrated and random distribution of active links to routers.
Comparison of link power-gating cost with different link choices for (a) a given current network state. Aggregate link utilization is shown for power-gating (b) the link with minimally routed traffic and (c) the link with non-minimally routed traffic.
An example of the link deactivation algorithm.
An example of an indirect activation request.
Routing decision based on output port state.
Latency-throughput curves of different power-gating mechanisms for (a) uniform random, (b) tornado, and (c) bitrev traffic patterns.
Energy comparison of different power-gating mechanisms for (a) uniform random, (b) tornado, and (c) bitrev traffic patterns.
Normalized average packet latency with different real workload traces.
Total network energy with different real workload traces.

