[Paper] Bank Stealing for a Compact and Efficient Register File Architecture in GPGPU

Jing, Naifeng; Jiang, Shunning; Chen, Shuang; Zhang, Jingjie; Jiang, Li; Li, Chao; Liang, Xiaoyao

doi:10.1109/TVLSI.2016.2584623


IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 2, 2017, pp. 520-533

Jing, Naifeng (Shanghai Jiao Tong University, Shanghai, China) , Jiang, Shunning , Chen, Shuang , Zhang, Jingjie , Jiang, Li , Li, Chao , Liang, Xiaoyao

Abstract

Modern general-purpose graphics processing units (GPGPUs) have emerged as pervasive alternatives for parallel high-performance computing. The extreme multithreading in modern GPGPUs demands a large register file (RF), which is typically organized into multiple banks to support the massive parallelism. Although a heavily banked structure benefits RF throughput, its area and energy costs, combined with diminishing performance gains, greatly limit future RF scaling. In this paper, we propose an improved RF design with bank stealing techniques, which enable high RF throughput in a compact area. By investigating the GPGPU microarchitecture in depth, we find that state-of-the-art RF designs are far from optimal because of poor bank utilization, which is the intrinsic limitation to high RF throughput and a compact RF area. We investigate the causes of bank conflicts and find that most conflicts can be eliminated by exploiting the fact that a highly banked RF is often underutilized. This is especially true in GPGPUs, where multiple ready warps are available at the scheduling stage and their operand accesses can be carefully coordinated. We propose two lightweight bank stealing techniques that opportunistically fill idle banks and register entries for better operand service. With the proposed architecture, average GPGPU performance can be improved under a smaller energy budget with significant area savings, which makes it promising for sustainable RF scaling.
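The bank-conflict and underutilization problem that the abstract describes can be illustrated with a small sketch. This is not the authors' design or simulator: the interleaved register-to-bank mapping (`reg % num_banks`), the single-read-port-per-bank assumption, and the function names `bank_of`, `read_cycles`, and `idle_bank_slots` are all hypothetical choices made for illustration. The sketch only shows why conflicting operands leave other banks idle, which is the slack that bank stealing aims to use.

```python
from collections import Counter

def bank_of(reg: int, num_banks: int) -> int:
    # Assumed mapping: registers are interleaved across banks by index.
    return reg % num_banks

def read_cycles(operands, num_banks):
    """Cycles needed to read distinct operand registers when each bank
    can serve one read per cycle (assumed single-ported banks)."""
    load = Counter(bank_of(r, num_banks) for r in operands)
    return max(load.values()) if load else 0

def idle_bank_slots(operands, num_banks):
    """Bank-cycle slots that go unused while the operands are read."""
    cycles = read_cycles(operands, num_banks)
    return cycles * num_banks - len(operands)

# Hypothetical instruction reading r0, r4, r8 with 4 banks:
# all three operands map to bank 0 and conflict.
ops = [0, 4, 8]
print(read_cycles(ops, 4))      # 3
print(idle_bank_slots(ops, 4))  # 9
```

In this toy case the three reads serialize on one bank while the other three banks sit idle for all three cycles. Under the same assumptions, operands r0, r1, r2 would be read in a single cycle.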

