Matrix computation framework
원문보기
IPC분류정보
국가/구분 |
United States(US) Patent
등록
|
국제특허분류(IPC7판) |
|
출원번호 |
US-0105915
(2011-05-12)
|
등록번호 |
US-8788556
(2014-07-22)
|
발명자
/ 주소 |
- Zhang, Zheng
- Qian, Zhengping
- Chen, Xiuwei
- Yu, Yuan
|
출원인 / 주소 |
|
대리인 / 주소 |
|
인용정보 |
피인용 횟수 :
2 인용 특허 :
4 |
초록
▼
Described herein are technologies pertaining to matrix computation. A computer-executable algorithm that is configured to execute perform a sequence of computations over a matrix tile is received and translated into a global directed acyclic graph that includes vertices that perform a sequence of ma
Described herein are technologies pertaining to matrix computation. A computer-executable algorithm that is configured to execute perform a sequence of computations over a matrix tile is received and translated into a global directed acyclic graph that includes vertices that perform a sequence of matrix computations and edges that represent data dependencies amongst vertices. A vertex in the global directed acyclic graph is represented by a local directed acyclic graph that includes vertices that perform a sequence of matrix computations at the block level, thereby facilitating pipelined, data-driven matrix computation.
대표청구항
▼
1. A method, comprising: receiving a computer-executable algorithm that is configured to execute a matrix computation on a tile of a matrix;translating the computer-executable algorithm into a computer-implemented global directed acyclic graph that comprises a plurality of vertices and a correspondi
1. A method, comprising: receiving a computer-executable algorithm that is configured to execute a matrix computation on a tile of a matrix;translating the computer-executable algorithm into a computer-implemented global directed acyclic graph that comprises a plurality of vertices and a corresponding plurality of edges, wherein a vertex in the global directed acyclic graph is configured to perform a sequence of operations on the tile of the matrix and the plurality of edges represent data dependencies between coupled vertices, the translating performed by a computer processor; andrepresenting the vertex in the global directed acyclic graph as a local directed acyclic graph that comprises a plurality of vertices, wherein a vertex in the local directed acyclic graph is configured to execute a mathematical operation on a matrix block that is smaller than the tile, wherein the vertex in the local directed acyclic graph executes the mathematical operation in a data-driven manner, the representing performed by the computer processor. 2. The method of claim 1, wherein executing the mathematical operation in the data driven manner comprises executing the mathematical operation on the block responsive to the block being received at the vertex in the local directed acyclic graph. 3. The method of claim 2, wherein executing the mathematical operation in the data driven manner comprises outputting another block for provision to another vertex in the local directed acyclic graph immediately subsequent to executing the mathematical operation on the block. 4. The method of claim 1, wherein size of the block corresponds to size of a cache of a computing device that comprises the local directed acyclic graph. 5. The method of claim 1, wherein the local directed acyclic graph is represented as skeleton code that causes an operator to fire responsive to receipt of the block. 6. The method of claim 1, further comprising scheduling a plurality of different computing devices to perform a plurality of different matrix computations to generate an output. 7. The method of claim 1, further comprising: detecting a failure with respect to the local directed acyclic graph; andcausing the mathematical operation to be re-executed responsive to detecting the failure. 8. The method of claim 7, further comprising: recording identities of blocks output by the vertex in the local directed acyclic graph; andcausing the mathematical operation to be re-executed based at least in part upon the identities of the blocks output by the vertex. 9. The method of claim 8, further comprising: prior to causing the mathematical operation to be re-executed, querying children vertices in the local directed acyclic graph for identities of blocks needed by the children vertices; andcausing the mathematical operation to be re-executed based at least in part upon identities of the block returned responsive to the querying. 10. The method of claim 1 configured for execution in a high performance computing environment. 11. The method of claim 1, wherein output of the global directed acyclic graph is configured for employment in connection with one of facial recognition and three-dimensional modeling. 12. A system that facilitates large-scale distributed matrix computation, the system comprising: a processor; anda memory that comprises a plurality of components that are executed by the processor, the plurality of components comprising:a scheduler component that receives a computation pertaining to a matrix and performs the following actions: causes the computation to be represented as a plurality of vertices that are representative of computations to be undertaken by tiles of the matrix, the plurality of vertices being related by a plurality of edges; andschedules the plurality of vertices to be executed on a corresponding plurality of computing devices; andan executor component that executes the computations in parallel in a data-driven manner. 13. The system of claim 12, wherein the scheduler component causes the computation to be represented as a global directed acyclic graph. 14. The system of claim 13, wherein the scheduler component is further configured, for a vertex in the global directed acyclic graph, cause a matrix computation corresponding to the vertex to be represented as a plurality of child vertices that are configured to execute matrix computations on blocks, wherein the blocks are of a size that is less than the tiles. 15. The system of claim 14, wherein the blocks are the respective sizes of caches of the plurality of computing devices. 16. The system of claim 12, wherein the executor component retrieves at least one matrix computation from a library responsive to computer-executable code represented by a vertex calling the at least one matrix computation from the library. 17. The system of claim 12 being distributed across the plurality of computing devices. 18. The system of claim 12, further comprising a fault detector component that detects that a fault has occurred on at least one computing device and causes the executor component to re-execute at least one computation responsive to detecting that the fault has occurred. 19. The system of claim 18, wherein the fault detector component causes the at least one computation to be re-executed based at least in part upon data retained with respect to child vertices that are dependent upon data from a vertex that represents the at least one computation. 20. A computer-readable medium comprising instructions that, when executed by a processor, cause the processor to perform acts, comprising: receiving at least one computation that is to be executed over a matrix;subsequent to receiving the at least one computation, representing the at least one computation as a sequence of operations that are to be undertaken on tiles of the matrix;subsequent to representing the at least one computation as the sequence of operations, translating at least one operation into a global directed acyclic graph that comprises a plurality of vertices that are configured to perform a corresponding plurality of sequential operations on at least one tile of the matrix and a plurality of edges that represent data dependencies between the plurality of vertices;subsequent to translating the at least one operation in the global directed acyclic graph, representing at least one vertex in the global directed acyclic graph as a local directed acyclic graph that comprises a plurality of vertices that are configured to perform a corresponding plurality of sequential operations on at least one block that corresponds to the matrix, wherein a size of the block is smaller than a size of the at least one tile; andcausing the sequential operations that are represented by the plurality of vertices in the local directed acyclic graph to be executed in a data-driven manner.
이 특허에 인용된 특허 (4)
-
Prasanna G.N. Srinivasa, Multiprocessor scheduling and execution.
-
Little, John N.; Hicklin, Joseph F.; Martin, Jocelyn Luke; Moulana, Nausheen B.; Stefansson, Halldor N.; Dean, Loren; Lurie, Roy E.; Johnson, Stephen C.; Anderson, Penelope L.; Karr, Michael E.; Kinchen, Jason A., Parallel programming interface to dynamicaly allocate program portions.
-
Khan Emdadur R. (San Jose CA), Systolic array for multidimensional matrix computations.
-
Gilbert John R. ; Lamping John O. ; Mendhekar Anurag ; Shpeisman Tatiana, Tools for efficient sparse matrix computation.
이 특허를 인용한 특허 (2)
-
Andrianov, Alexander, Sparse threaded deterministic lock-free cholesky and LDLT factorizations.
-
Andrianov, Alexander, Sparse threaded deterministic lock-free cholesky and LDLT factorizations.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.