A first transaction manager of a partitioned storage group stores a first conditional commit record for a first write of a multi-partition transaction based on a first conflict detection operation. A second transaction manager stores a second conditional commit record for a second write of the transaction based on a second conflict detection operation. A client-side component of the storage group determines that both writes have been conditionally committed, and stores an unconditional commit record in a commit decision repository. A write applier examines the first conditional commit record and the unconditional commit record before propagating the first write to the first partition.
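The flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the patented implementation: the class `LogTransactionManager`, the functions `commit_multi_partition` and `apply_writes`, the record fields, and the use of a log index as the read's position are all hypothetical names chosen for the example. Persistent logs are modeled as in-memory lists and the commit decision repository as a set of transaction identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class LogTransactionManager:
    """Hypothetical stand-in for one transaction manager and its log."""
    log: list = field(default_factory=list)  # persistent log, modeled as a list

    def conditional_commit(self, txn_id, read_key, read_seq, write_key, value):
        # Conflict detection (simplified assumption): the read saw the first
        # `read_seq` log records. Any later record that wrote the key that
        # was read counts as a read-write conflict.
        for seq, record in enumerate(self.log):
            if seq >= read_seq and record["key"] == read_key:
                return False
        # No conflict: store a conditional commit record for this write.
        self.log.append({"txn": txn_id, "key": write_key,
                         "value": value, "kind": "conditional"})
        return True


def commit_multi_partition(txn_id, requests, decision_repo):
    """Client-side component: ask every manager, then record the decision."""
    outcomes = [ltm.conditional_commit(txn_id, *args) for ltm, args in requests]
    if all(outcomes):
        decision_repo.add(txn_id)  # unconditional commit record
        return True
    return False


def apply_writes(ltm, partition, decision_repo):
    """Write applier: propagate only writes whose transaction is committed."""
    for record in ltm.log:
        if record["kind"] == "conditional" and record["txn"] in decision_repo:
            partition[record["key"]] = record["value"]


ltm1, ltm2 = LogTransactionManager(), LogTransactionManager()
repo = set()

# Transaction t1: both reads are current, so both writes commit conditionally.
ok1 = commit_multi_partition(
    "t1", [(ltm1, ("a", 0, "a", 1)), (ltm2, ("b", 0, "b", 2))], repo)

# Transaction t2: the read of "a" is stale (t1 wrote "a" after it), so the
# first manager rejects it even though the second manager accepts its write.
ok2 = commit_multi_partition(
    "t2", [(ltm1, ("a", 0, "a", 9)), (ltm2, ("c", 1, "c", 3))], repo)

partition1, partition2 = {}, {}
apply_writes(ltm1, partition1, repo)
apply_writes(ltm2, partition2, repo)
```

In this sketch the second manager still holds a conditional commit record for `t2`, but the write applier never propagates it because no unconditional commit record for `t2` exists in the repository.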
Representative claim
1. A system, comprising: one or more computing devices configured to: receive, at a first log-based transaction manager (LTM) from a client-side component of a storage group, a first commit request for a first write of a first multi-partition transaction, wherein the first write is dependent on a result of a first read indicated in the first commit request, wherein the first read was directed to a first partition of the storage group; receive, at a second LTM from the client-side component, a second commit request for a second write of the first multi-partition transaction, wherein the second write is dependent on a result of a second read indicated in the second commit request, wherein the second read was directed to a second partition of the storage group; store, within a first persistent log of the first LTM, a first conditional commit record corresponding to the first write, indicating that the first write is committable with respect to read-write conflicts between the first read and a first subset of writes recorded in the first persistent log, wherein the first conditional commit record includes metadata pertaining to the first multi-partition transaction; store, within a second persistent log of the second LTM, a second conditional commit record corresponding to the second write, indicating that the second write is committable with respect to read-write conflicts between the second read and a second subset of writes recorded in the second persistent log, wherein the second conditional commit record includes the metadata pertaining to the first multi-partition transaction; store, by the client-side component in response to determining that (a) the first conditional commit record has been stored and (b) the second conditional commit record has been stored, a first unconditional commit record corresponding to the first multi-partition transaction in a multi-partition commit decision repository (MCDR); and propagate, by a first write applier associated with a particular partition of the storage group, the first write to the particular partition in response to (a) an examination of the metadata pertaining to the first multi-partition transaction included in the first conditional commit record and (b) an examination of the first unconditional commit record.
2. The system as recited in claim 1, wherein the one or more computing devices are further configured to: receive, at the first LTM, a third commit request for a third write of a different multi-partition transaction, wherein the third write is directed to the first partition, and wherein the third commit request comprises a representation of a timeout for the different multi-partition transaction; store, within the first persistent log, a third conditional commit record corresponding to the third write, indicating that the third write is committable with respect to read-write conflicts; detect, subsequent to an expiration of the timeout, that a second unconditional commit record corresponding to the different multi-partition transaction has not been stored in the MCDR; and determine that a propagation of the third write to the first partition is not to be implemented.
3. The system as recited in claim 1, wherein the first commit request includes an indication of the MCDR, and wherein the metadata pertaining to the first multi-partition transaction comprises the indication of the MCDR.
4. The system as recited in claim 1, wherein the one or more computing devices are further configured to: receive, prior to the first commit request, an indication of a target performance level for one or more types of operations directed to the storage group; and determine, based at least in part on the indication of the target performance level, one or more of: (a) a number of LTMs to be established for the storage group, (b) a number of MCDRs to be established for the storage group or (c) a number of write appliers to be established for the storage group.
5. The system as recited in claim 1, wherein the first partition is designated as a master partition of the storage group, wherein the first persistent log comprises at least a first portion of one or more storage devices of a first server, and wherein the MCDR comprises at least a second portion of the one or more storage devices of the first server.
6. A method, comprising: performing, by one or more computing devices: storing, within a first persistent log of a first log-based transaction manager (LTM) of a storage group, a first conditional commit record corresponding to a first write of a first multi-partition transaction, indicating that the first write has been designated committable based at least in part on a first conflict detection analysis performed by the first LTM; storing, within a second persistent log of a second LTM of the storage group, a second conditional commit record corresponding to a second write of the first multi-partition transaction, indicating that the second write has been designated committable based at least in part on a second conflict detection analysis performed by the second LTM; in response to detecting that the first and second writes have been designated committable, storing a first unconditional commit record corresponding to the first multi-partition transaction; and propagating, by a first write applier, the first write to a particular partition of the storage group in response to (a) an examination of the first conditional commit record and (b) an examination of the first unconditional commit record.
7. The method as recited in claim 6, further comprising performing, by the one or more computing devices: receiving, at the first LTM, a commit request for a different write of a different multi-partition transaction, wherein the different write is directed to the first partition, and wherein the commit request comprises a representation of a timeout for the different multi-partition transaction; storing, within the first persistent log, a third conditional commit record corresponding to the different write, indicating that the different write is committable based at least in part on a third conflict detection analysis performed by the first LTM; detecting, subsequent to an expiration of the timeout, that a second unconditional commit record corresponding to the different multi-partition transaction has not been stored in a multi-partition commit decision repository (MCDR); and determining that a propagation of the different write to the first partition is not to be implemented.
8. The method as recited in claim 7, wherein the MCDR has an associated logical clock providing monotonically increasing logical timestamp values, and wherein the timeout comprises a particular future logical timestamp value expected to be obtained from the logical clock.
9. The method as recited in claim 6, wherein said first conditional commit record is stored in response to a first commit request for the first write, and wherein said second conditional commit request is stored in response to a second commit request for the second write, further comprising performing, by the one or more computing devices: receiving, at the first LTM, a third commit request for a third write of a second multi-partition transaction, wherein the third write is dependent upon a result of a third read directed to the first partition; receiving, at the second LTM, a fourth commit request for a fourth write of the second multi-partition transaction, wherein the fourth write is dependent upon a result of a fourth read directed to the second partition; storing, within the first persistent log, a third conditional commit record corresponding to the third write, indicating that the third write is committable based at least in part on a third conflict detection analysis performed by the first LTM; determining that the fourth commit request has been rejected by the second LTM based at least in part on a particular conflict detected by the second LTM; and abandoning propagation of the third write in response to the indication that the fourth commit request has been rejected.
10. The method as recited in claim 6, wherein the first conditional commit record is stored in response to a first commit request received at the first LTM from a client-side component of the storage group, wherein said storing the first unconditional commit record is performed by the client-side component.
11. The method as recited in claim 10, wherein the first commit request includes an indication of an MCDR into which the first unconditional commit record is stored.
12. The method as recited in claim 10, wherein the first commit request comprises an indication of a first read, wherein the first write is dependent upon a result of the first read, and wherein the indication of the first read is used by the first LTM to perform the first conflict detection analysis.
13. The method as recited in claim 6, further comprising performing, by the one or more computing devices: receiving an indication of a target performance level for one or more types of operations directed to the storage group; and determining, based at least in part on the indication of the target performance level, one or more of: (a) a number of LTMs to be established for the storage group, (b) a number of multi-partition commit decision repositories to be established for the storage group or (c) a number of write appliers to be established for the storage group.
14. The method as recited in claim 6, wherein the first partition is designated as a master partition of the storage group, wherein the first persistent log comprises at least a first portion of one or more storage devices of a first server, and wherein the first unconditional commit record is stored at a particular storage device of the one or more storage devices.
15. The method as recited in claim 6, wherein the first unconditional commit record is stored in a multi-partition commit decision repository (MCDR) comprising a plurality of nodes of a replication directed acyclic graph including a first node and a second node, and wherein said storing the first unconditional commit record comprises storing respective replicas of the first unconditional commit record at the first node and the second node.
16. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors implement a client-side component of a storage group, wherein the client-side component is configured to: transmit, to a first log-based transaction manager (LTM) of a storage group, a first commit request for a first write of a first multi-partition transaction, wherein the first commit request comprises an indication of a first read on which the first write depends, wherein the first read was directed to a first partition of the storage group; transmit, to a second log-based transaction manager (LTM) of the storage group, a second commit request for a second write of a first multi-partition transaction, wherein the second commit request comprises an indication of a second read on which the second write depends, wherein the second read was directed to a second partition of the storage group; in response to a determination that (a) the first write has been designated as committable by the first LTM based at least in part on a first commit analysis and (b) the second write has been designated as committable by the second LTM based at least in part on a second commit analysis, store a first unconditional commit record corresponding to the first multi-partition transaction at a selected location.
17. The non-transitory computer-accessible storage medium as recited in claim 16, wherein the first commit request comprises one or more of: (a) an indication of the selected location, or (b) a commit timeout for the first multi-partition transaction.
18. The non-transitory computer-accessible storage medium as recited in claim 16, wherein the client-side component is further configured to: in response to a determination that at least one write of a different multi-partition transaction has been designated as un-committable by the first LTM, store an abort record corresponding to the different multi-partition transaction at a different selected location.
19. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors implement a write applier of a storage group, wherein the write applier is configured to: determine that a particular record stored within a persistent log by a log-based transaction manager of the storage group represents a conditional commit of a first write directed to a first partition of the storage group; identify a commit decision repository designated to store unconditional commit records for multi-partition transactions which include one or more writes directed to first partition; and in response to a determination that the commit decision repository includes an unconditional commit record for a first multi-partition transaction which includes the first write, propagate the first write to the first partition of the storage group.
20. The non-transitory computer-accessible storage medium as recited in claim 19, wherein the write applier is configured to: determine that a second record stored within the persistent log represents a conditional commit of a second write directed to the first partition, wherein the second write is part of a second multi-partition transaction; identify a commit timeout associated with the second multi-partition transaction; and abandon propagation of the second write to the first partition based at least in part on a detection that an unconditional commit record corresponding to the second multi-partition transaction has not been written to the commit decision repository prior to an expiration of the timeout.
21. The non-transitory computer-accessible storage medium as recited in claim 19, wherein the write applier is configured to: determine that a second record stored within the persistent log represents an unconditional commit of a second write directed to the first partition; and propagate the second write to the first partition without examining the commit decision repository.
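The write applier behavior recited in claims 19 to 21, together with the logical-timestamp timeout of claim 8, can be illustrated with a small decision function. This is a hedged sketch rather than the claimed implementation: the `Decision` enum, the `decide` function, and the record fields `kind`, `txn`, and `timeout` are hypothetical. Two further assumptions are made that the claims do not state: the applier waits while the timeout has not yet expired, and the timeout counts as expired once the logical clock reaches the timeout value.

```python
from enum import Enum


class Decision(Enum):
    PROPAGATE = "propagate"
    ABANDON = "abandon"
    WAIT = "wait"  # assumption: the claims do not say what happens before expiry


def decide(record, committed_txns, logical_now):
    """Decide what a write applier does with one persistent-log record.

    record:         dict with hypothetical fields "kind", "txn", "timeout"
    committed_txns: ids of transactions that have an unconditional commit
                    record in the commit decision repository
    logical_now:    current value of the repository's logical clock
    """
    if record["kind"] == "unconditional":
        # Claim 21: propagate without examining the repository.
        return Decision.PROPAGATE
    if record["txn"] in committed_txns:
        # Claim 19: conditional commit plus an unconditional commit record.
        # Assumes the repository itself refuses commit records that arrive
        # after the timeout, so a present record is always valid.
        return Decision.PROPAGATE
    if logical_now >= record["timeout"]:
        # Claim 20: timeout expired with no unconditional commit record.
        return Decision.ABANDON
    return Decision.WAIT


single = {"kind": "unconditional", "txn": "s1", "timeout": None}
pending = {"kind": "conditional", "txn": "t2", "timeout": 10}

d_single = decide(single, set(), 0)
d_committed = decide(pending, {"t2"}, 5)
d_waiting = decide(pending, set(), 5)
d_expired = decide(pending, set(), 10)
```

Checking the repository before the timeout keeps the common case (the commit record already exists) to a single set lookup; the timeout comparison only matters for transactions whose outcome is still unknown.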