System and method for building decision tree classifiers using bitmap techniques
원문보기
IPC분류정보
국가/구분
United States(US) Patent
등록
국제특허분류(IPC7판)
G06F-007/00
G06F-017/30
G06F-017/00
출원번호
UP-0344193
(2006-02-01)
등록번호
US-7571159
(2009-08-24)
발명자
/ 주소
Thomas, Shiby
Li, Wei
Yarmus, Joseph
Jagannath, Mahesh
Mozes, Ari W.
출원인 / 주소
Oracle International Corporation
대리인 / 주소
Hanify & King, P.C.
인용정보
피인용 횟수 :
1인용 특허 :
1
초록▼
A method, system, and computer program product for counting predictor-target pairs for a decision tree model provides the capability to generate count tables that is quicker and more efficient than previous techniques. A method of counting predictor-target pairs for a decision tree model, the decisi
A method, system, and computer program product for counting predictor-target pairs for a decision tree model provides the capability to generate count tables that is quicker and more efficient than previous techniques. A method of counting predictor-target pairs for a decision tree model, the decision tree model based on data stored in a database, the data comprising a plurality of rows of data, at least one predictor and at least one target, comprises generating a bitmap for each split node of data stored in a database system by intersecting a parent node bitmap and a bitmap of a predictor that satisfies a condition of the node, intersecting each split node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps, and counting bits of each intersected bitmap to generate a count of predictor-target pairs.
대표청구항▼
What is claimed is: 1. A computer-implemented method of generating a decision tree model comprising: generating a plurality of bitmaps in the database system, the bitmaps generated from data stored in a database table in the database system, the database table comprising a plurality of rows of data
What is claimed is: 1. A computer-implemented method of generating a decision tree model comprising: generating a plurality of bitmaps in the database system, the bitmaps generated from data stored in a database table in the database system, the database table comprising a plurality of rows of data, the plurality of bitmaps comprising a bitmap for each unique value of each predictor and target and indicating whether or not that unique value of each predictor and target is present in each row of the database table; intersecting each split node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps; counting bits of each intersected bitmap to generate a count of predictor-target pairs; determining a splitter value for the data in the database table using the counts of the predictor-target pairs so as to split the data in the database table into a plurality of child nodes, each child node comprising a portion of the data in the database table; generating child bitmaps for the data in each child node; recursively generating a split node bitmap for each child node by intersecting a parent node bitmap and a bitmap of a predictor that satisfies a condition of the child node; intersecting each child node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps; and counting bits of each intersected bitmap to generate a count of predictor-target pairs; whereby a decision tree model is formed; wherein the bitmap of the predictor that satisfies the condition of the child node is generated by: if the node is a root node, generating a bitmap by ORing each bitmap for each value of the predictor that satisfies a condition of the node split, to form a single bitmap for all values of the predictor that satisfy the condition of the node split; and if the node is below the root node, generating a bitmap by ORing each bitmap for each value of the predictor that satisfies a condition of the node split, to form a single bitmap for all values of the predictor that satisfy the condition of the node split and ANDing the single bitmap with a bitmap for a node above the node. 2. The method of claim 1, wherein each split node bitmap is intersected with each predictor bitmap and with each target bitmap to form intersected bitmaps by: intersecting each target bitmap with each split node bitmap to form a plurality of intermediate bitmaps; and intersecting each intermediate bitmap with each predictor bitmap to form an intersected bitmap. 3. The method of claim 2, wherein the target bitmaps and the split node bitmaps fit in a memory of a computer. 4. The method of claim 1, wherein each split node bitmap is intersected with each predictor bitmap and with each target bitmap to form intersected bitmaps by: for each of a plurality of portions of the split node bitmaps: intersecting each target bitmap with each split node bitmap in the portion of the split node bitmaps to form a plurality of intermediate bitmaps; and intersecting each intermediate bitmap with each predictor bitmap to form the intersected bitmaps. 5. The method of claim 4, wherein the target bitmaps and a portion of the split node bitmaps fit in a memory of a computer. 6. The method of claim 1, wherein the bitmaps are sorted by predictor and predictor value and target and target value. 7. A database system for generating a decision tree model comprising: a processor operable to execute computer program instructions; a memory operable to store computer program instructions executable by the processor; and computer program instructions stored in the memory and executable to perform the steps of: generating a plurality of bitmaps in the database system, the bitmaps generated from data stored in a database table in the database system, the database table comprising a plurality of rows of data, the plurality of bitmaps comprising a bitmap for each unique value of each predictor and target and indicating whether or not that unique value of each predictor and target is present in each row of the database table; intersecting each split node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps; counting bits of each intersected bitmap to generate a count of predictor-target pairs; determining a splitter value for the data in the database table using the counts of the predictor-target pairs so as to split the data in the database table into a plurality of child nodes, each child node comprising a portion of the data in the database table; generating child bitmaps for the data in each child node; recursively generating a bitmap for each child node by intersecting a parent node bitmap and a bitmap of a predictor that satisfies a condition of the child node; intersecting each child node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps; and counting bits of each intersected bitmap to generate a count of predictor-target pairs; whereby a decision tree model is formed, wherein the bitmap predictor that satisfies the condition of the child node is generated by: if the node is a root node, generating a bitmap by ORing each bitmap for each value of the predictor that satisfies a condition of the node split, to form a single bitmap for all values of the predictor that satisfy the condition of the node split; and if the node is below the root node, generating a bitmap by ORing each bitmap for each value of the predictor that satisfies a condition of the node split, to form a single bitmap for all values of the predictor that satisfy the condition of the node split and ANDing the single bitmap with a bitmap for a node above the node. 8. The system of claim 7, wherein each split node bitmap is intersected with each predictor bitmap and with each target bitmap to form intersected bitmaps by: intersecting each target bitmap with each split node bitmap to form a plurality of intermediate bitmaps; and intersecting each intermediate bitmap with each predictor bitmap to form an intersected bitmap. 9. The system of claim 8, wherein the target bitmaps and the split node bitmaps fit in a memory of a computer. 10. The system of claim 7, wherein each split node bitmap is intersected with each predictor bitmap and with each target bitmap to form intersected bitmaps by: for each of a plurality of portions of the split node bitmaps: intersecting each target bitmap with each split node bitmap in the portion of the split node bitmaps to form a plurality of intermediate bitmaps; and intersecting each intermediate bitmap with each predictor bitmap to form the intersected bitmaps. 11. The system of claim 10, wherein the target bitmaps and a portion of the split node bitmaps fit in a memory of a computer. 12. The system of claim 7, wherein the bitmaps are sorted by predictor and predictor value and target and target value. 13. A computer program product for generating a decision tree model in a database system, comprising: a computer readable storage medium; computer program instructions, recorded on the computer readable storage medium, executable by a processor, for performing the steps of generating a plurality of bitmaps in the database system, the bitmaps generated from data stored in a database table in the database system, the database table comprising a plurality of rows of data, the plurality of bitmaps comprising a bitmap for each unique value of each predictor and target and indicating whether or not that unique value of each predictor and target is present in each row of the database table; intersecting each split node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps; counting bits of each intersected bitmap to generate a count of predictor-target pairs; determining a splitter value for the data in the database table using the counts of the predictor-target pairs so as to split the data in the database table into a plurality of child nodes, each child node comprising a portion of the data in the database table; generating child bitmaps for the data in each child node; recursively generating a bitmap for each child node by intersecting a parent node bitmap and a bitmap of a predictor that satisfies a condition of the child node; intersecting each child node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps; and counting bits of each intersected bitmap to generate a count of predictor-target pairs; whereby a decision tree model is formed; wherein the bitmap of the predictor that satisfies the condition of the node is generated by: if the node is a root node, generating a bitmap by ORing each bitmap for each value of the predictor that satisfies a condition of the node Split, to form a single bitmap for all values of the predictor that satisfy the condition of the node split; and if the node is below the root node, generating a bitmap by ORing each bitmap for each value of the predictor that satisfies a condition of the node split, to form a single bitmap for all values of the predictor that satisfy the condition of the node split and ANDing the single bitmap with a bitmap for a node above the node. 14. The computer program product of claim 13, wherein the bitmaps are sorted by predictor and predictor value and target and target value. 15. The computer program product of claim 13, wherein each split node bitmap is intersected with each predictor bitmap and with each target bitmap to form intersected bitmaps by: intersecting each target bitmap with each split node bitmap to form a plurality of intermediate bitmaps; and intersecting each intermediate bitmap with each predictor bitmap to form an intersected bitmap. 16. The computer program product of claim 15, wherein the target bitmaps and the split node bitmaps fit in a memory of a computer. 17. The computer program product of claim 15, wherein the target bitmaps and a portion of the split node bitmaps fit in a memory of a computer.
연구과제 타임라인
LOADING...
LOADING...
LOADING...
LOADING...
LOADING...
이 특허에 인용된 특허 (1)
Eatherton,William N.; Dittia,Zubin D., Tree bitmap data structures and their use in performing lookup operations.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.