A condition for the classification of data is detected. The apparatus detects a set of some constituents as a factor of the classification. The apparatus has means for selecting a first pattern, which is a set of constituents; means for selecting a second pattern formed of the first pattern and at least one constituent added to the first pattern; means for generating an evaluation value for a measure of classification of the plurality of objects under a classification condition including the first pattern but not the second pattern, on the basis of the number of objects satisfying the classification condition among the objects classified into the first group and the number of objects satisfying the classification condition among the objects classified into the second group; and means for outputting the first and second patterns as a factor of classification when the measure indicated by the evaluation value exceeds a reference measure.
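As a rough illustration of the evaluation described above, the following Python sketch counts the objects in each group that contain the first pattern but not the second pattern and scores that condition with a chi-square value. The abstract does not give a formula, so the standard Pearson statistic for a 2x2 table is assumed here. The function names, the representation of objects as sets of strings, and the requirement that the second pattern strictly contains the first are all assumptions made for the example.

```python
from typing import FrozenSet, List, Set


def satisfies(obj: Set[str], first: FrozenSet[str], second: FrozenSet[str]) -> bool:
    """True if obj contains every constituent of `first` but not all of `second`."""
    return first <= obj and not second <= obj


def chi_square_2x2(x: int, y: int, n1: int, n2: int) -> float:
    """Pearson chi-square (no continuity correction) for the 2x2 table

                 satisfies   does not
        group 1      x        n1 - x
        group 2      y        n2 - y

    Returns 0.0 when a row or column total is zero (assumed convention).
    """
    n = n1 + n2
    hits = x + y
    denom = n1 * n2 * hits * (n - hits)
    if denom == 0:
        return 0.0
    return n * (x * (n2 - y) - y * (n1 - x)) ** 2 / denom


def evaluate(group1: List[Set[str]], group2: List[Set[str]],
             first: FrozenSet[str], second: FrozenSet[str]) -> float:
    """Evaluation value for the condition 'first pattern but not second pattern'."""
    x = sum(satisfies(o, first, second) for o in group1)
    y = sum(satisfies(o, first, second) for o in group2)
    return chi_square_2x2(x, y, len(group1), len(group2))


def is_factor(value: float, reference: float) -> bool:
    """The patterns are reported only when the value exceeds the reference measure."""
    return value > reference
```

In this sketch a larger chi-square value is taken to mean a higher measure of classification, so the comparison against the reference measure is a plain greater-than test.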
Representative claim
What is claimed, is: 1. A classification factor detection apparatus which detects, with respect to the results of classification into two groups of a plurality of objects each constituted by a plurality of constituents through analysis as to whether or not each object has a predetermined characteristic, a set of some of the constituents as a factor of the classification, said apparatus comprising: first selection means of selecting a first pattern which is a set of at least one of the plurality of constituents of one of the plurality of objects; second selection means of selecting, from the plurality of constituents in one of the plurality of objects, a second pattern formed of the first pattern and at least one of the constituents added to the first pattern; evaluation value generation means of generating an evaluation value for a measure of classification of the plurality of objects under a classification condition including the first pattern but not including the second pattern on the basis of the number of objects satisfying the classification condition in the plurality of objects classified into the first group and the number of objects satisfying the classification condition in the objects classified into the second group, wherein the evaluation value is a chi-square test value representing the deviation of a probability distribution of the objects satisfying the classification condition based on the first pattern and the second pattern from a probability distribution of the objects satisfying a classification condition of a correlation equal to or lower than a predetermined value with the classification results; and classification factor output means of outputting the constituents in each of the first pattern and the second pattern as a factor of classification when the measure indicated by the evaluation value exceeds a reference measure determined in advance. 2. 
The classification factor detection apparatus according to claim 1, wherein said evaluation value generation means generates as the evaluation value a value determined by a downwardly convex function with respect to each of a first argument number which is the number of objects satisfying the classification condition in the first group and a second argument number which is the number of objects satisfying the classification condition in the second group, said apparatus further comprising: upper limit estimation means of generating, as an upper limit value of the evaluation value in a possible region for the first argument number and the second argument number when one of the constituents is added to the first pattern and/or the second pattern, the maximum of values of the evaluation function at a plurality of end points of the region; and constituent addition means of performing processing for adding the same constituent to each of the first pattern and the second pattern or processing for adding the constituent to the second pattern when the measure indicated by the upper limit value is higher than the reference measure, and wherein said evaluation value generation means further generates the evaluation value with respect to the first pattern and/or the second pattern to which one of the constituents has been added by said constituent addition means. 3. 
The classification factor detection apparatus according to claim 2, wherein if the number of objects including the first pattern in the plurality of objects classified into the first group is a; the number of objects including the second pattern in the objects classified into the first group is b; the number of objects including the first pattern in the plurality of objects classified into the second group is c; and the number of objects including the second pattern in the objects classified into the second group is d, said evaluation value generation means generates, as the evaluation value, a value determined by f(a-c, b-d) which is the evaluation function generating the chi-square test value on the basis of (a-c) which is the first argument number and (b-d) which is the second argument number; and said upper limit value estimation means generates, as an upper limit value of the chi-square test value when one of the constituents is added to each of the first pattern and the second pattern or to the second pattern, the maximum of f(a-c, b) which is the chi-square test value in the case where the number of objects including the second pattern in the second group is 0 and f(a, b-d) which is the chi-square test value in the case where the number of objects including the first pattern in the second group is 0. 4. 
The classification factor detection apparatus according to claim 2, wherein said upper limit value estimation means generates an upper limit value of the evaluation value in the case of adding the same constituent to each of the first pattern and the second pattern and in the case of adding the constituent to the second pattern while maintaining the same contents of the first pattern each time the evaluation value is generated by said evaluation value generation means; when the measure indicated by the upper limit value is higher than the reference measure, said constituent addition means performs first addition processing for generating each of constituent-added second patterns formed by adding to the second pattern unevaluated constituents which are constituents not included in the second pattern in the plurality of constituents in one of the plurality of objects, and, if the first pattern and the second pattern are identical to each other, performs second addition processing for generating each of constituent-added first patterns and constituent-added second patterns formed by adding the unevaluated constituents to the first pattern and the second pattern; and said evaluation value generation means generates the evaluation value with respect to the constituent-added first pattern and constituent-added second pattern after the first or second addition processing. 5. 
The classification factor detection apparatus according to claim 2, further comprising: reference measure storage means of storing the reference measure; and reference measure updating means of storing, as the reference measure, in the reference measure storage means, the measure indicated by the evaluation value generated by said evaluation value generation means by relating the measure to the first pattern and the second pattern at the time of generation of the evaluation value if the measure indicated by the evaluation value exceeds the reference value, wherein said classification factor output means outputs, as a factor of classification, the first pattern and the second pattern stored in said reference measure storage means. 6. The classification factor detection apparatus according to claim 2, wherein said classification factor output means outputs, as a factor of classification, a classification condition corresponding to each of a predetermined number of evaluation values determined in advance in descending order of measure in a plurality of the evaluation values generated by said evaluation value generation means and indicating measures exceeding the reference measure. 7. 
The classification factor detection apparatus according to claim 1, wherein said evaluation value generation means generates as the evaluation value a value determined by an evaluation function which determines a value with respect to each of a first argument number which is the number of objects satisfying the classification condition in the first group and a second argument number which is the number of objects satisfying the classification condition in the second group, and the maximum of which corresponds to one of end points in a possible region for the first argument number and the second argument number, said apparatus further comprising: upper limit estimation means of generating, as an upper limit value of the evaluation value in a possible region for the first argument number and the second argument number when one of the constituents is added to the first pattern and/or the second pattern, the maximum of values of the evaluation function at a plurality of end points of the region; and constituent addition means of performing processing for adding the same constituent to each of the first pattern and the second pattern or processing for adding the constituent to the second pattern when the measure indicated by the upper limit value is higher than the reference measure, and wherein said evaluation value generation means further generates the evaluation value with respect to the first pattern and/or the second pattern to which one of the constituents has been added by said constituent addition means. 8. 
The classification factor detection apparatus according to claim 1, wherein the objects are sentences each formed of a plurality of words and/or phrases; a plurality of the sentences are classified into two groups according to genres indicating the contents of the sentences; said first selection means selects as the first pattern a set of at least one word or phrase in the words and phrases in one of the plurality of sentences; said second selection means selects as the second pattern a set of words and/or phrases formed by adding at least one word or phrase to the first pattern; said evaluation value generation means generates the evaluation value according to the number of sentences satisfying the classification condition in the plurality of sentences classified into the first group and the number of sentences satisfying the classification condition in the sentences classified into the second group; and when the measure indicated by the evaluation value exceeds the reference measure, said classification factor output means outputs the set of words and/or phrases in each of the first pattern and the second pattern as a factor of classification of the plurality of sentences into the predetermined genres. 9. 
A classification factor detection method in which, with respect to the results of classification into two groups of a plurality of objects each constituted by a plurality of constituents through analysis as to whether or not each object has a predetermined characteristic, a set of some of the constituents is detected as a factor of the classification by a computer, said method comprising as steps performed by the computer: a first selection step of selecting a first pattern which is a set of at least one of the plurality of constituents of one of the plurality of objects; a second selection step of selecting, from the plurality of constituents in one of the plurality of objects, a second pattern formed of the first pattern and at least one of the constituents added to the first pattern; an evaluation value generation step of generating an evaluation value for a measure of classification of the plurality of objects under a classification condition including the first pattern but not including the second pattern on the basis of the number of objects satisfying the classification condition in the plurality of objects classified into the first group and the number of objects satisfying the classification condition in the objects classified into the second group, wherein the evaluation value is a chi-square test value representing the deviation of a probability distribution of the objects satisfying the classification condition based on the first pattern and the second pattern from a probability distribution of the objects satisfying a classification condition of a correlation equal to or lower than a predetermined value with the classification results; and a classification factor output step of outputting the constituents in each of the first pattern and the second pattern as a factor of classification when the measure indicated by the evaluation value exceeds a reference measure determined in advance. 10. 
The classification factor detection method according to claim 9, wherein, in said evaluation value generation step, the computer generates as the evaluation value a value determined by a downwardly convex function with respect to each of a first argument number which is the number of objects satisfying the classification condition in the first group and a second argument number which is the number of objects satisfying the classification condition in the second group, said method further comprising as steps performed by the computer: an upper limit estimation step of generating, as an upper limit value of the evaluation value in a possible region for the first argument number and the second argument number when one of the constituents is added to the first pattern and/or the second pattern, the maximum of values of the evaluation function at a plurality of end points of the region; and a constituent addition step of performing processing for adding the same constituent to each of the first pattern and the second pattern or processing for adding the constituent to the second pattern when the measure indicated by the upper limit value is higher than the reference measure, and wherein, in said evaluation value generation step, the computer further generates the evaluation value with respect to the first pattern and/or the second pattern to which one of the constituents has been added in said constituent addition step. 11. 
A computer program product comprising a tangible storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a classification factor detection method for detecting through analysis, with respect to results of classification into two groups of a plurality of objects, where each of said groups is constituted by a plurality of constituents, as to whether or not each object of the groups of objects has a predetermined characteristic, a set of some of the constituents as a factor of the classification, said method comprising the steps of: selecting a first pattern which is a set of at least one of the plurality of constituents of one of the plurality of objects; selecting, from the plurality of constituents in one of the plurality of objects, a second pattern formed of the first pattern and at least one of the constituents added to the first pattern; generating an evaluation value for a measure of classification of the plurality of objects under a classification condition including the first pattern but not including the second pattern on the basis of the number of objects satisfying the classification condition in the plurality of objects classified into the first group and the number of objects satisfying the classification condition in the objects classified into the second group, wherein the evaluation value is a chi-square test value representing the deviation of a probability distribution of the objects satisfying the classification condition based on the first pattern and the second pattern from a probability distribution of the objects satisfying a classification condition of a correlation equal to or lower than a predetermined value with the classification results; and outputting the constituents in each of the first pattern and the second pattern as a factor of classification when the measure indicated by the evaluation value exceeds a reference measure determined in advance. 12. 
The computer program product according to claim 11, wherein said step of generating an evaluation value generates the evaluation value by determining a downwardly convex function with respect to each of a first argument number which is the number of objects satisfying the classification condition in the first group, and a second argument number which is the number of objects satisfying the classification condition in the second group, said method further including the steps of: generating, as an upper limit value of the evaluation value in a possible region for the first argument number and the second argument number when one of the constituents is added to the first pattern and/or the second pattern, the maximum of values of the evaluation function at a plurality of end points of the region; and processing for adding the same constituent to each of the first pattern and the second pattern, or processing for adding the constituent to the second pattern, when the measure indicated by the upper limit value is higher than the reference measure. 13. 
A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for classification factor detection, said method steps comprising the steps of: selecting a first pattern which is a set of at least one of the plurality of constituents of one of the plurality of objects; selecting, from the plurality of constituents in one of the plurality of objects, a second pattern formed of the first pattern and at least one of the constituents added to the first pattern; generating an evaluation value for a measure of classification of the plurality of objects under a classification condition including the first pattern but not including the second pattern on the basis of the number of objects satisfying the classification condition in the plurality of objects classified into the first group and the number of objects satisfying the classification condition in the objects classified into the second group, wherein the evaluation value is a chi-square test value representing the deviation of a probability distribution of the objects satisfying the classification condition based on the first pattern and the second pattern from a probability distribution of the objects satisfying a classification condition of a correlation equal to or lower than a predetermined value with the classification results; and outputting the constituents in each of the first pattern and the second pattern as a factor of classification when the measure indicated by the evaluation value exceeds a reference measure determined in advance. 14. 
A computer program product comprising: a computer-usable medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a classification factor detection method for analyzing, detecting and classifying objects into two groups of a plurality of objects, each of the two groups constituted by a plurality of constituents, based on whether or not each object has a predetermined characteristic, including using a set of some of the constituents as a factor of the classification, the classification factor detection method comprising the steps of: selecting a first pattern which is a set of at least one of the plurality of constituents of one of the plurality of objects; selecting, from the plurality of constituents in one of the plurality of objects, a second pattern formed of the first pattern and at least one of the constituents added to the first pattern; generating an evaluation value for a measure of classification of the plurality of objects under a classification condition including the first pattern but not including the second pattern on the basis of the number of objects satisfying the classification condition in the plurality of objects classified into the first group and the number of objects satisfying the classification condition in the objects classified into the second group, wherein the evaluation value is a chi-square test value representing the deviation of a probability distribution of the objects satisfying the classification condition based on the first pattern and the second pattern from a probability distribution of the objects satisfying a classification condition of a correlation equal to or lower than a predetermined value with the classification results; and outputting the constituents in each of the first pattern and the second pattern as a factor of classification when the measure indicated by the evaluation value exceeds a reference measure determined in advance.
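The upper limit estimation recited in claims 2, 7, 10 and 12 relies on the fact that a downwardly convex function over a bounded region reaches its maximum at an end point of that region. The Python sketch below is one possible reading of that idea, not the claimed implementation: it assumes the standard 2x2 Pearson chi-square as the evaluation function and assumes the possible region is a rectangle given by a (low, high) range for each argument number. All names are hypothetical. How the ranges are derived is left to the caller. For instance, if constituents are added only to the second pattern, the count in each group can only grow from its current value up to the number of objects containing the first pattern.

```python
from itertools import product
from typing import Tuple


def chi_square_2x2(x: int, y: int, n1: int, n2: int) -> float:
    """Pearson chi-square for x of n1 objects (group 1) and y of n2 objects
    (group 2) satisfying the condition; 0.0 for a degenerate table (assumed)."""
    n = n1 + n2
    hits = x + y
    denom = n1 * n2 * hits * (n - hits)
    if denom == 0:
        return 0.0
    return n * (x * (n2 - y) - y * (n1 - x)) ** 2 / denom


def upper_limit(x_range: Tuple[int, int], y_range: Tuple[int, int],
                n1: int, n2: int) -> float:
    """Largest evaluation value at the four corners of the rectangular region.

    Because the function is convex in (x, y), no interior point of the
    rectangle can exceed the largest corner value.
    """
    return max(chi_square_2x2(x, y, n1, n2)
               for x, y in product(x_range, y_range))


def should_add_constituent(x_range: Tuple[int, int], y_range: Tuple[int, int],
                           n1: int, n2: int, reference: float) -> bool:
    """Extend the patterns only if the upper limit can still beat the reference."""
    return upper_limit(x_range, y_range, n1, n2) > reference
```

When the upper limit does not exceed the reference measure, no extension of the patterns within that region can exceed it either, so that branch of the search can be skipped without evaluating it.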