[Patent] Scheduler of program instructions for streaming vector processor having interconnected functional units


IPC classification information
Country / Type	United States (US) Patent, Registered
International Patent Classification (IPC 7th edition)	G06F-009/50 G06F-009/46 G06F-009/44
Application number	US-0184772 (2002-06-28)
Inventor / Address	May, Philip E.; Moat, Kent Donald; Essick, IV, Raymond B.; Chiricescu, Silviu; Lucas, Brian Geoffrey; Norris, James M.; Schuette, Michael Allen; Saidi, Ali
Applicant / Address	Motorola, Inc.
Citation information	Times cited: 9; Patents cited: 61

Abstract

A method for scheduling a computation for execution on a computer with a number of interconnected functional units. The computation is representable by a data-flow graph with a number of nodes connected by edges. A loop-period of the computation is calculated and the nodes are scheduled for throughput by assigning an execution cycle and a functional unit to each node of the data-flow graph. The scheduling of flexible nodes is adjusted to minimize the number of interconnections required in each execution cycle. The edges of the data-flow graph are allocated to one or more of the interconnections between functional units. The scheduling method may be used, for example, to optimize the interconnection fabric design for an ASIC or as part of a compiler for a re-configurable streaming vector processor.
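The first two steps of the abstract (calculate a loop-period, then assign an execution cycle and a functional unit to each node) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the record does not say how the loop-period is calculated, so the sketch assumes a resource-bound lower bound of the kind used in conventional modulo scheduling, a latency of one cycle for every node, and nodes listed in topological order. All names (`loop_period_lower_bound`, `try_schedule`, `schedule_for_throughput`, `node_kinds`, `unit_counts`) are hypothetical.

```python
from collections import Counter
from math import ceil


def loop_period_lower_bound(node_kinds, unit_counts):
    """Assumed bound: for each unit kind, ceil(nodes of that kind / units)."""
    demand = Counter(node_kinds.values())
    return max(ceil(n / unit_counts[kind]) for kind, n in demand.items())


def try_schedule(node_kinds, edges, unit_counts, period, latency=1):
    """Greedy attempt; returns {node: (cycle, (kind, unit))} or None."""
    preds = {n: [] for n in node_kinds}
    for src, dst in edges:
        preds[dst].append(src)
    busy = set()  # (kind, unit index, cycle mod period) slots already taken
    schedule = {}
    # Assumption: node_kinds lists nodes in topological order.
    for node, kind in node_kinds.items():
        earliest = max((schedule[p][0] + latency for p in preds[node]), default=0)
        for cycle in range(earliest, earliest + period):
            unit = next((u for u in range(unit_counts[kind])
                         if (kind, u, cycle % period) not in busy), None)
            if unit is not None:
                busy.add((kind, unit, cycle % period))
                schedule[node] = (cycle, (kind, unit))
                break
        else:
            return None  # no free unit in any cycle of the period
    return schedule


def schedule_for_throughput(node_kinds, edges, unit_counts):
    """Start from the lower bound and grow the period until an attempt succeeds."""
    period = loop_period_lower_bound(node_kinds, unit_counts)
    while True:
        result = try_schedule(node_kinds, edges, unit_counts, period)
        if result is not None:
            return period, result
        period += 1


# Hypothetical data-flow graph: two loads feed a multiply, then an add and a store.
node_kinds = {"ld_a": "mem", "ld_b": "mem", "mul": "alu", "add": "alu", "st": "mem"}
edges = [("ld_a", "mul"), ("ld_b", "mul"), ("mul", "add"), ("add", "st")]
unit_counts = {"mem": 1, "alu": 1}

period, schedule = schedule_for_throughput(node_kinds, edges, unit_counts)
print(period, schedule)
```

With one memory unit and three memory nodes, the assumed bound gives a loop-period of 3, and the store is delayed until the memory unit is free in its cycle modulo the period. For an acyclic graph like this one the first attempt already succeeds; the retry loop is kept only to show where a larger period would be tried.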

Representative claims

What is claimed is:
1. A method for scheduling a computation for execution on a computer comprising a plurality of functional units interconnected by a plurality of interconnections, the computation being representable by a data-flow graph having a plurality of nodes and a plurality of edges and the method comprising: (a) computing a loop-period of the computation; (b) attempting to schedule the plurality of nodes within the loop-period for throughput by assigning an execution cycle and a functional unit to each node of the plurality of nodes; (c) adjusting the scheduling of flexible nodes of the plurality of nodes to reduce the number of interconnections required in any execution cycle if the number of interconnections required exceeds the number of interconnections in the plurality of interconnections; and (d) allocating the plurality of edges to one or more of the plurality of interconnections.
2. A method in accordance with claim 1, wherein one or more of the functional units is partitioned into two or more slices, the method further comprising: mapping nodes of the data-flow graph onto slices of the one or more of the partitioned functional units so as to reduce the number of interconnections required in an execution cycle.
3. A method in accordance with claim 2, wherein the mapping nodes of the data-flow graph onto slices of the one or more of the partitioned functional units so as to reduce the number of interconnections required in an execution cycle comprises: computing a set of execution cycles for which the number of interconnections required is greater than the number of interconnections in the plurality of interconnections; computing tail-times for each node that is the source of an edge that intersects the set of execution cycles; and mapping nodes onto slices of the one or more of the partitioned functional units so as to reduce the number of interconnections required in a cycle.
4. A method in accordance with claim 2, further comprising: computing the set of execution cycles for which the number of interconnections required is greater than the number of interconnections in the plurality of interconnections; computing lead-times for each node that is the destination of an edge that intersects the set of execution cycles allocated to a cycle of the first set of execution cycles; and mapping nodes onto the slices of the one or more of the partitioned functional units so as to reduce the number of interconnections required in a cycle.
5. A method in accordance with claim 2, wherein slugs are used to discard results from unused slices of the one or more of the partitioned functional units.
6. A method in accordance with claim 1, wherein the edges of the plurality of edges are allocated so that values are stored at one or more of: an input of a functional unit; an output of a functional unit; a storage entry in the interconnection fabric; and a trampoline node.
7. A method in accordance with claim 1, wherein the plurality of interconnections comprises a re-configurable interconnect fabric having a plurality of links and wherein the edges of the plurality of edges are allocated so that values live at one or more of: an output of a functional unit; a storage entry in the interconnection fabric; a trampoline node; and an input of a functional unit.
8. A method in accordance with claim 1, wherein the scheduling the plurality of nodes for throughput by assigning an execution cycle and a functional unit to each node of the plurality of nodes comprises: (b1) attempting to schedule the plurality of nodes within the loop-period; and (b2) while the attempt to schedule the plurality of nodes within the loop-period is unsuccessful, increasing the loop-period and repeating from (b1).
9. A method in accordance with claim 1, wherein the allocating the plurality of edges to one or more of the plurality of interconnections comprises (d1) attempting to allocate the plurality of edges to one or more of the plurality of interconnections; and (d2) if the attempt to allocate the plurality of edges is unsuccessful, increasing the loop-period and repeating from (d1).
10. A method in accordance with claim 1, wherein the plurality of interconnections comprises a re-configurable interconnect fabric having a plurality of links and wherein the allocation of an edge of the plurality of edges is ordered as: the input or output of the functional unit to which the node is assigned; a storage entry in the interconnection fabric; and the input or output of a free functional unit.
11. A method in accordance with claim 1, further comprising splitting the data-flow graph into a number of partitions, corresponding to the number of iterations that are executed in parallel when a steady state operation of the computer has been achieved.
12. A method in accordance with claim 1, further comprising overlapping the schedules for two or more adjacent iterations to obtain a higher throughput.
13. A method in accordance with claim 1, wherein consecutive iterations are scheduled to use different functional unit instances.
14. A method in accordance with claim 1, wherein two schedules are computed, one for maximum throughput and one for minimum latency, and wherein a schedule of the two schedules is selected in accordance with the number of iterations to be performed.
15. A method in accordance with claim 1, wherein the resulting schedule is represented as one of a set of very long instruction words and microcode instructions.
16. A method for minimizing the number of interconnections required by a computer to execute a computation, the computer comprising a plurality of functional units interconnected by a plurality of interconnections, the computation being representable by a data-flow graph having a plurality of nodes and a plurality of edges and the method comprising: (a) computing a loop-period of the computation; (b) attempting to schedule the plurality of nodes within the loop-period by assigning an execution cycle and a functional unit to each node of the plurality of nodes; (c) adjusting the scheduling of the plurality of nodes to minimize the number of interconnections required in any execution cycle; (d) adjusting the scheduling of the plurality of nodes to increase throughput if the throughput is below a predetermined minimum throughput; (e) adjusting the scheduling of the plurality of nodes to decrease latency if the latency exceeds a predetermined maximum latency; and (f) allocating the plurality of edges to one or more of the plurality of interconnections.
17. A computer readable medium containing instructions which, when executed on a first computer, carry out a process of scheduling a computation for execution on a second computer, the second computer having a plurality of functional units interconnected by a plurality of interconnections, and the computation being representable by a data-flow graph having a plurality of nodes and a plurality of edges, the process of scheduling comprising: (a) computing a loop-period of the computation; (b) attempting to schedule the plurality of nodes within the loop-period for throughput by assigning an execution cycle and a functional unit to each node of the plurality of nodes; (c) adjusting the scheduling of flexible nodes of the plurality of nodes to reduce the number of interconnections required in each execution cycle if the number of interconnections required is greater than the number of interconnections in the plurality of interconnections; and (d) allocating the plurality of edges to one or more of the plurality of interconnections.
18. A computer readable medium in accordance with claim 17, wherein one or more of the functional units is partitioned into two or more slices, the process further comprising: assigning slices of the one or more of the partitioned functional units so as to reduce the number of interconnections required in an execution cycle.
19. A computer readable medium in accordance with claim 18, wherein the assigning slices of the one or more of the partitioned functional units so as to reduce the number of interconnections required in an execution cycle comprises: computing a set of execution cycles for which the number of interconnections required is greater than the number of interconnections in the plurality of interconnections; computing tail-times for each node allocated to a cycle of the set of execution cycles; and mapping nodes to slices of the one or more of the partitioned functional units so as to reduce the number of interconnections required in a cycle.
20. A computer readable medium in accordance with claim 18, further comprising: computing the set of execution cycles for which the number of interconnections required is greater than the number of interconnections in the plurality of interconnections; computing lead-times for each node that is the destination of an edge that intersects the set of execution cycles allocated to a cycle of the first set of execution cycles; and mapping nodes to the slices of the one or more of the partitioned functional units so as to reduce the number of interconnections required in a cycle.
21. A computer readable medium in accordance with claim 17, wherein the allocating the plurality of edges to one or more of the plurality of interconnections comprises (d1) attempting to allocate the plurality of edges to one or more of the plurality of interconnections; and (d2) if the attempt to allocate the plurality of edges is unsuccessful, increasing the loop-period and repeating from (d1).
22. A computer readable medium in accordance with claim 17, wherein the first and second computers are the same computer.
23. An application specific integrated circuit for performing a computation representable by a data-flow graph having a plurality of nodes and a plurality of edges, the application specific integrated circuit having a plurality of functional units interconnected by a plurality of interconnections, wherein the number of interconnections in the plurality of interconnections is determined by: (a) computing a loop-period of the computation; (b) attempting to schedule the plurality of nodes within the loop-period for throughput by assigning an execution cycle and a functional unit to each node of the plurality of nodes; (c) adjusting the scheduling of flexible nodes of the plurality of nodes to minimize the number of interconnections required in each execution cycle; and (d) allocating the plurality of edges to one or more of the plurality of interconnections.
24. An application specific integrated circuit in accordance with claim 23, wherein the allocating the plurality of edges to one or more of the plurality of interconnections comprises (d1) attempting to allocate the plurality of edges to one or more of the plurality of interconnections; and (d2) if the attempt to allocate the plurality of edges is unsuccessful, increasing the loop-period and repeating from (d1).
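Step (c) of claim 1, adjusting flexible nodes when a cycle needs more interconnections than exist, might look something like the sketch below. The claims do not define how interconnection demand is counted or how a flexible node is moved, so everything here is an assumption for illustration: an edge is taken to occupy one interconnection from the cycle after its source executes through the cycle its destination executes, cycles are folded modulo the loop-period, a node is treated as flexible if it has both predecessors and successors, and functional-unit conflicts are ignored. The names `link_demand`, `peak`, and `adjust_flexible_nodes` are hypothetical.

```python
from collections import Counter


def link_demand(cycles, edges, period):
    """Interconnections needed in each cycle (mod period) under the assumed model."""
    demand = Counter()
    for src, dst in edges:
        for c in range(cycles[src] + 1, cycles[dst] + 1):
            demand[c % period] += 1
    return demand


def peak(cycles, edges, period):
    """Largest number of interconnections needed in any single cycle."""
    return max(link_demand(cycles, edges, period).values(), default=0)


def adjust_flexible_nodes(cycles, edges, period, links_available):
    """Greedily move nodes within their slack until the peak demand fits."""
    cycles = dict(cycles)
    for node in list(cycles):
        if peak(cycles, edges, period) <= links_available:
            break  # demand already fits the available interconnections
        preds = [s for s, d in edges if d == node]
        succs = [d for s, d in edges if s == node]
        if not preds or not succs:
            continue  # assumption: graph inputs and outputs stay fixed
        lo = max(cycles[p] for p in preds) + 1   # earliest legal cycle
        hi = min(cycles[s] for s in succs) - 1   # latest legal cycle
        # Pick the cycle in the slack window that gives the lowest peak demand.
        cycles[node] = min(range(lo, hi + 1),
                           key=lambda c: peak({**cycles, node: c}, edges, period))
    return cycles


# Hypothetical schedule: node "c" has slack between cycle 1 and cycle 3.
cycles = {"a": 0, "b": 0, "c": 3, "d": 4, "e": 1, "f": 3}
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("e", "f")]

adjusted = adjust_flexible_nodes(cycles, edges, period=8, links_available=2)
print(peak(cycles, edges, 8), peak(adjusted, edges, 8), adjusted["c"])
```

In this example, scheduling `c` late keeps both of its inputs in flight alongside the `e` to `f` value, so three interconnections are needed in cycles 2 and 3. Moving `c` to cycle 1 replaces two long-lived inputs with one long-lived output and brings the peak down to the two interconnections assumed to be available. A fuller implementation would also recheck functional-unit availability and fall back to a longer loop-period when no move helps, as claims 8 and 9 describe for the scheduling and allocation steps.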

Patents cited by this patent (61)

Lane Allen Smith, Address generation utilizing an adder, a non-sequential counter and a latch.
Ohta Koichi (Kyoto JPX), Algorithm training system.
Pegatoquet, Alain; Auguin, Michel; Sohier, Olivier, Assembly code performance evaluation apparatus and method.
Hinker, Paul J.; Boucher, Michael, Avoiding gather and scatter when calling Fortran 77 code from Fortran 90 code.
Ansari, Ahmad R., Bus protocol for efficiently transferring vector data.
Cray ; Jr. ; Seymour R., Computer vector register processing.
Franssen Frank,BEX ; van Swaaij Michael,BEX ; Nachtergaele Lode,BEX ; Samsom Hans,BEX ; Catthoor Francky,BEX ; De Man Hugo,BEX, Control flow and memory management optimization.
Glass Simon James,GBX ; Jaggar David Vivian,GBX, Data processing apparatus registers.
Gallup Michael G. ; Goke L. Rodney ; Seaton ; Jr. Robert W. ; Lawell Terry G. ; Osborn Stephen G. ; Tomazin Thomas J., Data processing system and method thereof.
Scales ; III Hunter Ledbetter ; Diefendorff Keith Everett ; Olsson Brett ; Dubey Pradeep Kumar ; Hochsprung Ronald Ray ; Beavers Bradford Byron ; Burgess Bradley G. ; Snyder Michael Dean ; May Cathy , Data processing system for processing vector data and method therefor.
Glass Simon James,GBX ; Jaggar David Vivian,GBX, Data processing system register control.
Kloker Kevin L. (Arlington Heights IL), Data processor execution unit which receives data with reduced instruction overhead.
Komatsu Hideaki,JPX ; Ogasawara Takeshi,JPX, Determining a communication schedule between processors.
Harper ; III David T. (Dallas TX) Linebarger Darel A. (Plano TX), Dynamic address mapping for conflict-free vector access.
Norris Joseph P., Fault tolerant switch fabric with control and data correction by hamming codes.
Stefan Sandstrom SE; Stefan Lundberg SE, Flexible memory channel.
Hunt, Galen C., Interception of unit creation requests by an automatic distributed partitioning system.
Mehrotra, Sanjeev; Wang, Albert S., Intra compression of pixel blocks using predicted mean.
Thomas G. Robertazzi ; Serge Luryi ; Saravut Charcranoon, Load sharing controller for optimizing resource utilization cost.
Guffens, Jan; Pont, Kurt Du, Method and apparatus for compiling source code using symbolic execution.
Taylor Valerie E. (Evanston IL), Method and apparatus for optimized processing of sparse matrices.
Kodosky Jeffrey L. (Austin TX), Method and apparatus for providing autoprobe features in a graphical data flow diagram.
Kodosky Jeffrey L. ; McKaskle Greg ; Kay Meg Fletcher, Method and apparatus for providing improved type compatibility and data structure organization in a graphical data flow.
Kodosky Jeffrey L. ; Shah Darshan K., Method and apparatus for providing stricter data type capabilities in a graphical data flow diagram.
Agarwal Ramesh Chandra ; Groves Randall Dean ; Gustavson Fred G. ; Johnson Mark A. ; Lyon Terry L. ; Olsson Brett ; Shearer James B., Method and system in a data processing system for loading and storing vectors in a plurality of modes.
Hunt Peter D. (Pleasanton CA) Elliott Jon K. (Pleasanton CA) Tobias Richard J. (San Jose CA) Herring Alan J. (San Jose CA) Morgan Craig R. (San Jose CA) Hiller John A. (Palo Alto CA), Method for automated deployment of a software program onto a multi-processor architecture.
Oliver Ibe ; Vick Vaishnavi ; Roger Dev, Method for automatic partitioning of node-weighted, edge-constrained graphs.
Nishiyama Hiroyasu,JPX ; Kikuchi Sumio,JPX ; Mori Noriyasu,JPX ; Nishimoto Akira,JPX ; Takeuchi Yooichi,JPX, Method for controlling a processor for power-saving in a computer for executing a program, compiler medium and processo.
Dworzecki Jozef,FRX, Method of scheduling successive tasks.
Gupta Rajiv (Ossining NY), Method of synchronizing parallel processors employing channels and compiling method minimizing cross-processor data depe.
Ebeling, W. H. Carl; Hogenauer, Eugene B., Method, system and software for programming reconfigurable hardware.
Chow Frederick ; Kennedy Robert ; Liu Shin-Ming ; Lo Raymond ; Tu Peng ; Chan Sun C., Method, system, and computer program product for performing register promotion via load and store placement optimization within an optimizing compiler.
Pechanek Gerald G. ; Revilla Juan Guillermo ; Barry Edwin F., Methods and apparatus for dynamic very long instruction word sub-instruction selection for execution time parallelism in an indirect very long instruction word processor.
Nickerson Brian R., Multi-byte processing of byte-based image data.
Simmons Albert L., Multi-mode, multi-channel communication bus.
Prasanna G.N. Srinivasa, Multiprocessor scheduling and execution.
York Richard,GBX ; Frances Hedley James,GBX ; Symes Dominic,GBX ; Biles Stuart,GBX, Non-instruction base register addressing in a data processing apparatus.
Yamada Kouichi (Tokyo JPX), On demand powering of necesssary portions of execution unit by decoding instruction word field indications which unit is.
Gupta Rajiv ; Berson David A. ; Fang Jesse Z., Optimizing code by exploiting speculation and predication with a cost-benefit data flow analysis based on path profiling information.
Tanaka Kazuhiko,JPX ; Kojima Keiji,JPX ; Nishioka Kiyokazu,JPX ; Nojiri Tohru,JPX ; Fujikawa Yoshifumi,JPX ; Ishiguro Masao,JPX, Parallel processing unit with cache memories storing NO-OP mask bits for instructions.
Ku Charlene S. ; Stearns Charles C. ; Tao Olive T., Partitioned decompression of audio data using audio decoder engine for computationally intensive processing.
Betker Michael R. ; Fernando John S. ; Lemmon Frank ; Whalen Shaun P., Pointer register indirectly addressing a second register in the processor core of a digital processor.
Linda L. Hurd, Power saving by disabling memory block access for aligned NOP slots during fetch of multiple instruction words.
Kung Hsiang-Tsung (Pittsburgh PA) Hsu Feng-Hsiung (Pittsburgh PA) Sussman Alan L. (Pittsburgh PA) Nishizawa Teiji (Nara JPX), Programmable interconnection chip for computer system functional modules.
Jones, Michael B.; Draves, Jr., Richard P.; Rosu, Daniela; Rosu, Marcel-Catalin, Providing predictable scheduling of programs using a repeating precomputed schedule.
Michael B. Jones ; Richard P. Draves, Jr. ; Daniela Rosu ; Marcel-Catalin Rosu, Providing predictable scheduling of programs using a repeating precomputed schedule.
Jones,Michael B.; Regehr,John, Providing predictable scheduling of programs using repeating precomputed schedules on discretely scheduled and/or multiprocessor operating systems.
Shebanow Michael C. (Austin TX) Alsup Mitchell K. (Dripping Springs TX) Scales Hunter L. (Austin TX) Hoekstra George P. (Austin TX), Randomly accessible memory having time overlapping memory accesses.
Blomgren, James S.; Olson, Timothy A.; Harle, Christophe, Rearranging data between vector and matrix forms in a SIMD matrix processor.
Wilkinson Paul Amba ; Dieffenderfer James Warren ; Kogge Peter Michael ; Schoonover Nicholas Jerome, SIMD/MIMD array processor with vector processing.
Dave Bharat P. ; Jha Niraj K. ; Lakshminarayana Ganesh, Scheduling-based hardware-software co-synthesis of heterogeneous distributed embedded systems.
Epstein David A. (Ossining NY) Gilley Glenn G. (Chapel Hill NC) McAuliff Kevin P. (Peekskill NY), System and method for efficiently executing directed acyclic graphs.
Dally William J. ; Rixner Scott Whitney ; Grossman Jeffrey P. ; Buehler Christopher James, System and method for performing compound vector operations.
Sastry Shivakumar, System for and method of allocating processing tasks of a control program configured to control a distributed control system.
Rehg,James M.; Knobe,Kathleen, System for computing the optimal static schedule using the stored task execution costs with recent schedule execution costs.
Kowalczyk Andre (San Jose CA) Yeung Norman K. P. (Fremont CA), Unified floating point and integer datapath for a RISC processor.
Inagami Yasuhiro (Kokubunji JPX) Nagashima Shigeo (Hachiouji JPX), Vector processing apparatus including vector registers having selectively accessible storage locations.
Kinoshita Koji (Tokyo JPX), Vector processing device comprising a single supplying circuit for use in both stride and indirect vector processing mod.
Omoda Koichiro (Sagamihara JPX) Torii Shunichi (Musashino JPX) Nagashima Shigeo (Hachioji JPX) Inagami Yasuhiro (Hadano JPX) Nakagawa Takayuki (Hadano JPX), Vector processor for reordering vector data during transfer from main memory to vector registers.
Ansari, Ahmad R., Vector transfer system generating address error exception when vector to be transferred does not start and end on same memory page.
Ashar, Pranav; Raghunathan, Anand; Bhattacharya, Subhrajit; Gupta, Aarti, Verification of scheduling in the presence of loops using uninterpreted symbolic simulation.

Patents citing this patent (9)

Gschwind, Michael K.; Salapura, Valentina, Identifying instructions for decode-time instruction optimization grouping in view of cache boundaries.
Gschwind, Michael K.; Salapura, Valentina, Instruction group formation techniques for decode-time instruction optimization based on feedback.
Rakvic, Ryan; Hankins, Richard A.; Grochowski, Ed; Wang, Hong; Annavaram, Murali; Poulsen, David K.; Shah, Sanjiv; Shen, John; Chinya, Gautham, Load balancing for multi-threaded applications via asymmetric power throttling.
Rakvic, Ryan; Hankins, Richard A.; Grochowski, Ed; Wang, Hong; Annavaram, Murali; Poulsen, David K.; Shah, Sanjiv; Shen, John; Chinya, Gautham, Load balancing for multi-threaded applications via asymmetric power throttling.
Weinman, Jr., Joseph B., Optimized job scheduling and execution in a distributed computing grid.
Weinman, Jr., Joseph B., Optimized job scheduling and execution in a distributed computing grid.
Weinman, Jr., Joseph B., Optimized job scheduling and execution in a distributed computing grid.
Gschwind, Michael K.; Salapura, Valentina, Techniques for identifying instructions for decode-time instruction optimization grouping in view of cache boundaries.
Gschwind, Michael K.; Salapura, Valentina, Techniques for instruction group formation for decode-time instruction optimization based on feedback.
