IPC classification information
Country / Type |
United States(US) Patent
Granted
|
International Patent Classification (IPC 7th edition) |
|
Application number |
US-0374380
(1999-08-13)
|
Inventor / Address |
- Mao, Jianchang
- Niblack, Carlton Wayne
|
Applicant / Address |
- International Business Machines Corporation
|
Agent / Address |
Gutman, Jose; Fleit, Kain, Gibbons, Gutman & Bongini P.L.
|
Citation information |
Times cited: 144
Patents cited: 17 |
Abstract
A method and apparatus for indexing and searching content in a hardcopy document utilizes a searching assistant computing device (402) with an index table (420) stored in memory (412). The index table (420) is created in memory by scanning a 2-D barcode from a hardcopy document or alternatively by downloading indexing information from a web page via the Internet (430). A search engine (410) in the searching assistant (402) searches the index table (420) to locate a data element found in the content of the hardcopy document. The indexing information corresponding to the data element is displayed to a user as part of the search results to indicate the location of the data element in the hardcopy document.
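The lookup described in the abstract can be illustrated with a minimal sketch. It assumes the decoded indexing information arrives as (term, page, line) tuples. That layout, along with the names `Location`, `build_index_table`, and `search`, is hypothetical and not taken from the patent. Decoding the 2-D barcode or downloading from the web page is out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Assumed location format: a page and a line in the hardcopy document."""
    page: int
    line: int


def build_index_table(entries):
    """Build an in-memory index table from (term, page, line) tuples.

    The tuples stand in for indexing information already decoded from a
    2-D barcode or downloaded from a web page (both steps are assumed).
    """
    table = {}
    for term, page, line in entries:
        # Case-insensitive keys are an illustrative choice, not a requirement.
        table.setdefault(term.lower(), []).append(Location(page, line))
    return table


def search(table, term):
    """Return every recorded location of the term, or an empty list."""
    return table.get(term.lower(), [])


table = build_index_table([
    ("Barcode", 3, 12),
    ("index", 1, 4),
    ("barcode", 7, 2),
])
for hit in search(table, "BARCODE"):
    print(f"page {hit.page}, line {hit.line}")
```

A dictionary keyed by the data element keeps each search a single hash lookup, which suits a small handheld device. A real index might also need stemming or phrase matching, which this sketch leaves out.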
Representative claim
[...] to be obtained by using the prediction of the subordinate model corresponding to the retained mutually exclusive segment; and (k) terminating the open-ended decision list with one of the models built in step (a) or step (h) and resulting in a terminated decision list, thereby solving the problem of model interpretability while mitigating the fragmentation problem.
2. A computer implemented method as recited in claim 1, wherein the step (b) of observing performance further comprises the step of observing performance of the current model when applied to an entire current population of observing points; and wherein the step (d) of sorting further comprises the step of comparing the observed performance of the current model when applied to the entire current population of observing points, as determined in step (b) with sorted estimates as derived in step (d) and continuing with step (j), if an estimate as derived in step (d) is not substantially better than the observed performance.
3. A computer implemented method as recited in claim 2, wherein a counter K is initialized to a value of 1 in step (a) and incremented to K+1 in step (h) when a new predictive model is built so that the current model is called a K-th current model for each value of counter K, the arranging step (j) further comprising the steps: selecting a value J in a range of 1 to the value of counter K; discarding segments and subordinate models that were selected and retained in step (e) for the K-th current model, for all repetitions where K is greater than J; discarding the K-th current model that was built in step (a) or (h), for all repetitions where K is not equal to J, wherein the terminated decision list as generated in step (k) is terminated with the K-th current model, where K is equal to J, as selected in the selecting step, thereby choosing a number of repetitions from an adaptively determined range of possibilities.
4. A computer implemented method as recited in claim 3, wherein the number of repetitions J is chosen from an adaptively determined range of possibilities by optimizing observed performance when applied to an initial population of observing points and the step of selecting a value J in the arranging step (j) further comprises the steps: combining, for each possible choice of values for J in a range of 1 to K, the observed performance of the K-th current model when applied to every mutually exclusive segment of a K-th population of observing points selected and retained in step (e), where the value of counter K is less than J, with the observed performance of the K-th current model when applied to an entire K-th population of observing points, where the value of counter K is equal to J, thereby determining the observed performance when applied to the initial population of observing points that would result from choosing a value for J; and choosing a value for J to optimize observed performance as determined in step (a), and choosing the number of repetitions J from an adaptively determined range of possibilities by optimizing observed performance when applied to the initial population of observing points.
5. A computer implemented method as recited in claim 3, wherein the number of repetitions J is chosen from an adaptively determined range of possibilities by optimizing a reliable statistical estimate bounding future performance, and the step of selecting a value J in the arranging step (j) further comprises the steps: combining, for each possible choice for J in a range from 1 to the value of counter K as selected in step (j), the observed performance of the K-th current model when applied to every mutually exclusive segment of an entire K-th population of observing points that was selected and retained in step (e), where the value of counter K is less than J, with the observed performance of the K-th current model when applied to the entire K-th population of observing points, where the value of counter K is equal to J, thereby determining the observed performance when applied to the initial population of observing points that would result from choosing a value for J; forming an increasing sequence of sets of possible choices for the value J in a range from 1 to the value of counter K and ending the sequence of sets with a set comprising all possible choices; selecting, for each set of possible choices formed in the forming step, the value of J that optimizes observed performance for the set, as determined in the combining step; deriving a reliable estimate bounding future performance with the value of J selected in the selecting step by applying statistical learning theory, for each set of possibilities formed in the forming step, the estimate being derived from parameters, wherein the parameters are selected from a group consisting of the observed performance with J, the number of possibilities in the set, and the number of observing points; and selecting the value of J to optimize estimated performance as determined in the deriving step.
6. A computer implemented method of boosting of predictive models that apply subordinate models to data points in possibly intersecting segments and arbitrate among the predictions of applicable subordinate models whenever a point falls within two or more segments, called cascade boosting, for resolving an interpretability problem of previous boosting methods, said method comprising the steps: (a) building an initial predictive model, which initially is a current model, that applies at least one subordinate model to a plurality of data points in possibly intersecting segments and arbitrates among the predictions of the applicable subordinate models whenever a point falls within two or more segments, the initial predictive model being built from an initial population of training data points; (b) observing performance of the current model, which is initially the initial predictive model, when applied to each segment of a current population of observing data points, which is initially either the initial population of training data points or a separate initial population of data points reserved for observing performance; (c) applying statistical learning theory to derive a reliable estimate bounding future performance of the current model on each segment, the estimate being derived for each segment from the observed performance together with a number of the observing data points falling within the segment; (d) sorting the segments by the estimates; (e) selecting and retaining a fraction of the segments, and also retaining each subordinate model associated with the segment, the selection resulting in retention of segments with better estimates, while separately selecting and retaining each additional segment intersecting the segments with better estimates, and also retaining each subordinate model associated with each additional segment; (f) forming a subpopulation of training points by sampling from the current population of training points so as to exclude, either with certainty or with high probability, each point falling within selected and retained segments having better estimates and the additional segments; (g) forming a subpopulation of observing data points by sampling from the current population of observing data points so as to exclude, either with certainty or with high probability, each point falling within the selected and retained segments having better estimates and the additional segments; (h) building another predictive model which becomes the current model, the current model applying at least one subordinate model to a plurality of data points in possibly intersecting segments and arbitrating among predictions of applicable subordinate models whenever a point falls within two or more segments, and being built from the subpopulation of training points; (i) repeating steps (b) to (h) a desired number of times, with the subpopulation of training points as the current population of training points and with the subpopulation of observing data points as the current population of observing data points; (j) arranging the selected and retained segments having better estimates in an open-ended decision list, where each item in the open-ended decision list specifies a test for membership in one of the selected and retained segments having better estimates and specifies that, if a point passes a membership test, then a prediction for the point is to be obtained by arbitrating among predictions of all pertinent models, the pertinent models being subordinate models corresponding to selected and retained segments within which the point falls, the selected and retained segments having been selected and retained in execution of step (e) that selected and retained the segment in the membership test as one of the segments having better estimates; (k) terminating the open-ended decision list with one of the models built in step (a) or step (h) and resulting in a terminated decision list, thereby solving the problem of model interpretability.
7. A computer implemented method as recited in claim 6, wherein the step (b) of observing performance further comprises the step of observing performance of the current model when applied to an entire current population of observing points; and wherein the step (d) of sorting further comprises the step of comparing the observed performance of the current model when applied to the entire current population of observing points, as determined in step (b) with the sorted estimates as derived in step (d) and continuing with step (j), if an estimate as derived in step (d) is not substantially better than an observed performance.
8. A computer implemented method as recited in claim 7, wherein a counter K is initialized to a value of 1 in step (a) and incremented to K+1 in step (h) when a new predictive model is built so that the current model is called a K-th current model for each value of counter K, the arranging step (j) further comprising the steps: selecting a value J in a range of 1 to the value of counter K; discarding segments and subordinate models that were selected and retained in step (e) for the K-th current model, for all repetitions where K is greater than J; discarding the K-th current model that was built in step (a) or (h), for all repetitions where K is not equal to J, wherein the terminated decision list as generated in step (k) is terminated with the K-th current model, where K is equal to J, as selected in the selecting step, thereby choosing a number of repetitions from an adaptively determined range of possibilities.
9. A computer implemented method as recited in claim 8, wherein the number of repetitions J is chosen from an adaptively determined range of possibilities by optimizing observed performance when applied to an initial population of observing points and the step of selecting a value J in the arranging step (j) further comprises the steps: combining, for each possible choice of values for J in a range of 1 to K, the observed performance of the K-th current model when applied to every segment of a K-th population of observing points selected and retained in step (e) as a segment with a better estimate, where the value of counter K is less than J, with the observed performance of the K-th current model when applied to an entire K-th population of observing points, where the value of counter K is equal to J, thereby determining the observed performance when applied to the initial population of observing points that would result from choosing a value for J; and choosing a value for J to optimize observed performance as determined in step (a), and choosing the number of repetitions J from an adaptively determined range of possibilities by optimizing observed performance when applied to the initial population of observing points.
10. A computer implemented method as recited in claim 8, wherein the number of repetitions J is cho
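As a rough illustration of steps (d), (e), (j), and (k) recited above, the following sketch sorts segments by an estimate, retains a fraction of them, and evaluates a terminated decision list. It covers only the mutually exclusive case, where at most one retained segment applies to a point, so no arbitration is needed. The names `retain_best_segments` and `predict`, and the convention that a lower estimate is better, are assumptions for illustration and are not taken from the claims.

```python
def retain_best_segments(estimates, fraction):
    """Sort segments by estimate and keep the best fraction (steps d and e).

    estimates: dict mapping a segment name to an estimated error bound.
    Assumption: lower is better; at least one segment is always kept.
    """
    ranked = sorted(estimates, key=estimates.get)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def predict(decision_list, final_model, point):
    """Evaluate a terminated decision list (steps j and k).

    decision_list: ordered (membership_test, subordinate_model) pairs.
    final_model: the model that terminates the list and handles any point
    that passes none of the membership tests.
    """
    for in_segment, subordinate_model in decision_list:
        if in_segment(point):
            return subordinate_model(point)
    return final_model(point)


# Hypothetical segments and models on a one-dimensional input.
best = retain_best_segments({"a": 0.3, "b": 0.1, "c": 0.2, "d": 0.4}, 0.5)
rules = [
    (lambda x: x < 0, lambda x: "negative-segment model"),
    (lambda x: x > 10, lambda x: "large-segment model"),
]
print(best)
print(predict(rules, lambda x: "terminating model", 5))
```

Because the list is read top to bottom and each item is a single membership test, the path that produced a prediction can be traced directly, which is the interpretability property the claims refer to.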