[Patent] Retargetting an application program for execution by a general purpose processor

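IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC 7th edition)	G06F-009/38
Application number	US-0407711 (2009-03-19)
Registration number	US-8612732 (2013-12-17)
Inventor / Address	Grover, Vinod; Aarts, Bastiaan Joannes Matheus; Murphy, Michael; Beylin, Boris; Kolhe, Jayant B.; Saylor, Douglas
Applicant / Address	NVIDIA Corporation
Agent / Address	Patterson + Sheridan, L.L.P.
Citation information	Times cited: 8; Patents cited: 18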

Abstract

One embodiment of the present invention sets forth a technique for translating application programs, written using a parallel programming model for execution on a multi-core graphics processing unit (GPU), for execution by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
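The transformation summarized in the abstract can be illustrated with a small sketch. This is not the patent's translator; it is a hand-written Python model under stated assumptions: a hypothetical kernel in which every thread writes a shared slot, waits at a barrier, then reads its neighbour's slot. The function names, the kernel body, and the use of Python lists for shared and replicated storage are all illustrative assumptions.

```python
# Hypothetical GPU-style kernel body, per thread `tid`:
#     tmp = data[tid] * 2            (thread-local value)
#     shared[tid] = tmp
#     barrier()                      (synchronization point)
#     out[tid] = shared[(tid + 1) % n] + tmp

def naive_single_loop(data, n_threads):
    """Incorrect serialization: one loop around the whole body.

    Thread `tid` reads its neighbour's shared slot before the
    neighbour has written it, so the barrier semantics are lost.
    """
    shared = [0] * n_threads
    out = [0] * n_threads
    for tid in range(n_threads):
        tmp = data[tid] * 2
        shared[tid] = tmp
        out[tid] = shared[(tid + 1) % n_threads] + tmp
    return out

def translated_kernel(data, n_threads):
    """Sketch of the partition + thread-loop + replication idea.

    The body is split at the barrier into two regions, each region
    gets its own thread loop, and `tmp` (which differs per thread and
    is live across the barrier) is replicated into one slot per thread.
    """
    shared = [0] * n_threads
    out = [0] * n_threads
    tmp = [0] * n_threads  # replicated thread-local value

    # Region 1: everything before the barrier, for all threads.
    for tid in range(n_threads):
        tmp[tid] = data[tid] * 2
        shared[tid] = tmp[tid]

    # Region 2: everything after the barrier, for all threads.
    for tid in range(n_threads):
        out[tid] = shared[(tid + 1) % n_threads] + tmp[tid]
    return out

if __name__ == "__main__":
    print(translated_kernel([1, 2, 3, 4], 4))  # [6, 10, 14, 10]
    print(naive_single_loop([1, 2, 3, 4], 4))  # [2, 4, 6, 10]
```

Because the first loop finishes for every thread before the second loop starts, each loop boundary plays the role of the original barrier on a single CPU thread.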

Representative claims

1. A computer-implemented method for translating an application program for execution by a general purpose processor, the method comprising: receiving the application program written using a parallel programming model for execution on a multi-core graphics processing unit; partitioning the application program into regions of synchronization independent instructions to produce a partitioned application program; determining a plurality of variance vectors associated with a first region in the regions of synchronization independent instructions, wherein each variance vector indicates dependence of a different statement in the region on zero or more cooperative thread array dimensions; and inserting a loop around the first region to produce a translated application program for execution by the general purpose processor, wherein the loop iterates over only the cooperative thread array dimensions that correspond to the thread dimensions in the plurality of variance vectors.
2. The method of claim 1, further comprising, prior to the step of partitioning, identifying a synchronization barrier instruction within the application program.
3. The method of claim 2, wherein the first region includes instructions that are before the synchronization barrier instruction and a second region of the partitioned application program includes instructions that are after the synchronization barrier instruction.
4. The method of claim 3, wherein the step of inserting a loop includes inserting a first loop around the first region of the partitioned application program to ensure that all threads in a cooperative thread array will complete execution of the first region of the partitioned application program before any one of the threads in the cooperative thread array begins execution of the second region of the partitioned application program.
5. The method of claim 2, wherein the application program is represented as a control flow graph including basic block nodes connected by edges.
6. The method of claim 5, wherein the step of partitioning includes replacing the synchronization barrier instruction with an edge to separate one of the block nodes into a first basic block node corresponding to a first region and a second basic block node corresponding to a second region.
7. The method of claim 1, further comprising the step of classifying the partitioned application program to identify each statement as either convergent or divergent with respect to a cooperative thread array dimension in the cooperative thread array dimensions.
8. A non-transitory computer-readable medium that includes instructions that, when executed by a processing unit, cause the processing unit to translate an application program for execution by a general purpose processor, by performing the steps of: receiving the application program written using a parallel programming model for execution on a multi-core graphics processing unit; partitioning the application program into regions of synchronization independent instructions to produce a partitioned application program; determining a plurality of variance vectors associated with a first region in the regions of synchronization independent instructions, wherein each variance vector indicates dependence of a different statement in the region on zero or more cooperative thread array dimensions; and inserting a loop around the first region to produce a translated application program for execution by the general purpose processor, wherein the loop iterates over only the cooperative thread array dimensions that correspond to the thread dimensions in the plurality of variance vectors.
9. The non-transitory computer-readable medium of claim 8, further comprising, prior to the step of partitioning, identifying a synchronization barrier instruction within the application program.
10. The non-transitory computer-readable medium of claim 9, wherein the first region includes instructions that are before the synchronization barrier instruction and a second region of the partitioned application program includes instructions that are after the synchronization barrier instruction.
11. The non-transitory computer-readable medium of claim 10, wherein the step of inserting the loop includes inserting a first loop around the first region of the partitioned application program to ensure that all threads in a cooperative thread array will complete execution of the first region of the partitioned application program before any one of the threads in the cooperative thread array begins execution of the second region of the partitioned application program.
12. The non-transitory computer-readable medium of claim 9, wherein the application program is represented as a control flow graph including basic block nodes connected by edges.
13. The non-transitory computer-readable medium of claim 12, wherein the step of partitioning includes replacing the synchronization barrier instruction with an edge to separate one of the block nodes into a first basic block node corresponding to a first region and a second basic block node corresponding to a second region.
14. The non-transitory computer-readable medium of claim 8, further comprising the step of classifying the partitioned application program to identify each statement as either convergent or divergent with respect to a cooperative thread array dimension in the cooperative thread array dimensions.
15. A computing system configured to translate an application program for execution by a general purpose processor, comprising: a processor configured to execute a translator; and a system memory coupled to the processor and configured to store the translator, a first application program, and a second application program, the first application program written using a parallel programming model for execution on a multi-core graphics processing unit, the second application program configured for execution by the general purpose processor, and the translator configured to: receive the first application program; partition the first application program into regions of synchronization independent instructions to produce a partitioned application program; determine a plurality of variance vectors associated with a first region in the regions of synchronization independent instructions, wherein each variance vector indicates dependence of a different statement in the region on zero or more cooperative thread array dimensions; and insert a loop around the first region to produce a translated application program for execution by the general purpose processor, wherein the loop iterates over only the cooperative thread array dimensions that correspond to the thread dimensions in the plurality of variance vectors.
16. The computing system of claim 15, wherein the translator is further configured to identify a synchronization barrier instruction within the application program.
17. The computing system of claim 16, wherein the first region includes instructions that are before the synchronization barrier instruction and a second region of the partitioned application program includes instructions that are after the synchronization barrier instruction.
18. The computing system of claim 17, wherein the step of inserting the loop includes inserting a first loop around the first region of the partitioned application program to ensure that all threads in a cooperative thread array will complete execution of the first region of the partitioned application program before any one of the threads in the cooperative thread array begins execution of the second region of the partitioned application program.
19. The computing system of claim 16, wherein the application program is represented as a control flow graph including basic block nodes connected by edges and the step of partitioning includes replacing the synchronization barrier instruction with an edge to separate a one of the block nodes into a first basic block node corresponding to a first region and a second basic block node corresponding to a second region.
20. The computing system of claim 15, further comprising the step of classifying the partitioned application program to identify each statement as either convergent or divergent with respect to a cooperative thread array dimension in the cooperative array dimensions.
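The variance-vector step in claims 1, 8, and 15 can be pictured with a short sketch. The claims do not prescribe a data representation, so the following is an assumption-laden illustration: each statement's variance vector is modelled as a Python set of the cooperative thread array (CTA) dimensions ("x", "y", "z") it depends on, and the inserted loop iterates only over the union of those dimensions. The names `loop_dimensions`, `run_region`, and `cta_shape` are hypothetical.

```python
from itertools import product

CTA_DIMS = ("x", "y", "z")  # assumed fixed ordering of CTA dimensions

def loop_dimensions(variance_vectors):
    """Return the CTA dimensions that at least one statement depends on.

    Each variance vector is modelled as a set of dimension names;
    an empty set stands for a statement that does not vary per thread.
    """
    needed = set().union(*variance_vectors)
    return tuple(d for d in CTA_DIMS if d in needed)

def run_region(body, cta_shape, variance_vectors):
    """Run `body` once per thread index over the needed dimensions only.

    Returns the number of iterations so the saving is observable.
    """
    dims = loop_dimensions(variance_vectors)
    ranges = [range(cta_shape[d]) for d in dims]
    iterations = 0
    for index in product(*ranges):
        body(dict(zip(dims, index)))
        iterations += 1
    return iterations

if __name__ == "__main__":
    cta_shape = {"x": 4, "y": 2, "z": 1}
    # Two statements in the region: one depends on x, one on nothing.
    vectors = [{"x"}, set()]
    seen = []
    count = run_region(lambda tid: seen.append(tid["x"]), cta_shape, vectors)
    print(loop_dimensions(vectors), count, seen)
```

In this sketch a region whose statements depend only on the x dimension runs 4 times rather than once for each of the 8 threads in the 4 x 2 x 1 CTA, and a region with no thread dependence runs exactly once.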

Patents cited by this patent (18)

Iliff, Edwin C., Authoring language translator.
Norton Richard L. (Colorado Springs CO); Norton Karen J. (Colorado Springs CO), Computer simulation technique for predicting program performance.
Larson Brian Ralph, Dance/multitude concurrent computation.
Bernstein David (Bronx NY); So Kimming (Armonk NY), Debugging parallel programs by serialization.
Hardwick Jonathan C.,GBX, Dynamic load balancing among processors in a parallel computer.
Kasahara, Hironori; Kimura, Keiji; Shikano, Hiroaki, Global compiler for controlling heterogeneous multiprocessor.
Ichinose, Katsumi; Moriya, Katsuyoshi, Information processing method and recording medium therefor capable of enhancing the executing speed of a parallel processing computing device.
Reps Thomas (Madison WI); Horwitz Susan (Madison WI); Binkley David (Madison WI), Interprocedural slicing of computer programs using dependence graphs.
Rishi Alok; Masamitsu Jon A., Method and apparatus for run-time memory access checking and memory leak detection of a multi-threaded program.
Morin, Luc, Method for compiling and executing a parallel program.
Pan, Jielin; Yuan, Baosheng, Method, apparatus, and system for building a compact model for large vocabulary continuous speech recognition (LVCSR) system.
Sistare, Steven J.; Plauger, David, Parallel and asynchronous debugger and debugging method for multi-threaded programs.
Callahan, II, Charles David; Shields, Keith Arnett; Briggs, III, Preston Pengra, Parallelism performance analysis based on execution trace information.
Uchihira Naoshi,JPX ; Honiden Shinichi,JPX ; Ohsuga Akihiko,JPX ; Seki Toshibumi,JPX ; Nagai Yasuo,JPX ; Handa Keiichi,JPX ; Ito Satoshi,JPX ; Sawashima Nobuyuki,JPX ; Tahara Yasuyuki,JPX ; Shiotani , Programming method for concurrent programs and program supporting apparatus thereof.
Uchihira Naoshi,JPX ; Honiden Shinichi,JPX ; Ohsuga Akihiko,JPX ; Seki Toshibumi,JPX ; Nagai Yasuo,JPX ; Handa Keiichi,JPX ; Ito Satoshi,JPX ; Sawashima Nobuyuki,JPX ; Tahara Yasuyuki,JPX ; Shiotani , Programming method for concurrent programs and program supporting apparatus thereof.
Uchihira, Naoshi; Honiden, Shinichi; Ohsuga, Akihiko; Seki, Toshibumi; Nagai, Yasuo; Handa, Keiichi; Ito, Satoshi; Sawashima, Nobuyuki; Tahara, Yasuyuki; Shiotani, Hideaki, Programming method for concurrent programs and program supporting apparatus thereof.
Steele, Jr., Guy L., System and method for assisting exact Garbage collection by segregating the contents of a stack into sub stacks.
Tanaka, Yasuyuki, System for controlling assignment of a plurality of modules of a program to available execution units based on speculative executing and granularity adjusting.

Patents citing this patent (8)

Vassiliev, Andrei V., Device array topology configuration and source code partitioning for device arrays.
Grover, Vinod; Aarts, Bastiaan Joannes Matheus; Murphy, Michael; Kolhe, Jayant B.; Pormann, John Bryan; Saylor, Douglas, Execution of retargetted graphics processor accelerated code by a general purpose processor.
Grover, Vinod; Kerr, Andrew; Lee, Sean, Method for compiling a parallel thread execution program for general execution.
Wu, Wai, Parallel signal processing system and method.
Wu, Wai, Parallel signal processing system and method.
Wu, Wai, Parallel signal processing system and method.
Fetterman, Michael; Carlton, Stewart Glenn; Choquette, Jack Hilaire; Gadre, Shirish; Giroux, Olivier; Hahn, Douglas J.; Heinrich, Steven James; Hill, Eric Lyell; McCarver, Charles; Paranjape, Omkar; Rajendran, Anjana; Selvanesan, Rajeshwaran, Pre-scheduled replays of divergent operations.
Vassiliev, Andrei V., Programmable forwarding plane.
