IPC분류정보
국가/구분 |
United States(US) Patent
등록
|
국제특허분류(IPC7판) |
|
출원번호 |
US-0290584
(1999-04-13)
|
우선권정보 |
JP0165030 (1998-06-12) |
발명자
/ 주소 |
- Deligne Sabine
- Sagisaka Yoshinori,JPX
- Nakajima Hideharu,JPX
|
출원인 / 주소 |
- ATR Interpreting Telecommunications Research, JPX
|
인용정보 |
피인용 횟수 :
94 인용 특허 :
2 |
초록
▼
An apparatus generates a statistical class sequence model called A class bi-multigram model from input training strings of discrete-valued units, where bigram dependencies are assumed between adjacent variable length sequences of maximum length N units, and where class labels are assigned to the seq
An apparatus generates a statistical class sequence model called A class bi-multigram model from input training strings of discrete-valued units, where bigram dependencies are assumed between adjacent variable length sequences of maximum length N units, and where class labels are assigned to the sequences. The number of times all sequences of units occur are counted, as well as the number of times all pairs of sequences of units co-occur in the input training strings. An initial bigram probability distribution of all the pairs of sequences is computed as the number of times the two sequences co-occur, divided by the number of times the first sequence occurs in the input training string. Then, the input sequences are classified into a pre-specified desired number of classes. Further, an estimate of the bigram probability distribution of the sequences is calculated by using an EM algorithm to maximize the likelihood of the input training string computed with the input probability distributions. The above processes are then iteratively performed to generate statistical class sequence model.
대표청구항
▼
[ What is claimed is:] [1.]1. An apparatus for generating a statistical class sequence model called a class bi-multigram model from input strings of discrete-valued units, where bigram dependencies are assumed between adjacent multigrams, where each said multigram is a variable length sequence of ma
[ What is claimed is:] [1.]1. An apparatus for generating a statistical class sequence model called a class bi-multigram model from input strings of discrete-valued units, where bigram dependencies are assumed between adjacent multigrams, where each said multigram is a variable length sequence of maximum length N units, and where class labels are assigned to the sequences, said apparatus comprising:initialization means for taking as an input a training string of units, registering in an inventory of sequences all the combinations of 1 to N units occurring in the input training string, counting the number of times all sequences of units occur and the number of times all pairs of sequences of units co-occur in the input training strings of units, computing an initial bigram probability distribution of all the pairs of sequences as the counted number of times the two sequences co-occur divided by the counted number of times the first sequence occurs in the input training string, and outputting the inventory of sequences and the initial bigram probability distribution of the sequences in the inventory;classification means for taking as an input the inventory of sequences and the bigram probability distribution of the sequences in the inventory, classifying the input sequences into a pre-specified desired number of classes, by first assigning each sequence to its own class, and then repeatedly updating a class conditional probability distribution of the sequences and the bigram probability distribution of the classes and merging the pairs of classes for which a loss in mutual information computed with the current class probability distributions is minimal, until a desired number of classes is obtained, and outputting the inventory of sequences with the class label assigned to each sequence, the class conditional probability distribution of the sequences, and the bigram probability distribution of the classes;reestimation means for taking as an input the training string of units, the inventory of sequences with the class label assigned to each sequence, the current class conditional probability distribution of the sequences, and the current bigram probability distribution of the classes which are outputted from said classification means, calculating an estimate of the bigram probability distribution of the sequences by using an EM algorithm to maximize the likelihood of the input training string computed with associated input probability distributions, and outputting the inventory of sequences with the bigram probability distribution of the sequences, the process of said reestimation means being performed with a forward-backward algorithm, by using an equation where a bigram probability between a sequence to be processed and a preceding sequence is calculated from a forward likelihood of the input training string which can be taken forward in a time series, the class conditional probability of the sequences to be processed, the probability of the class of the sequence to be processed knowing the class of the preceding sequence, and a backward likelihood of the input training string which can be taken backward in the time series; andcontrol means for controlling said classification means and said reestimation means to iteratively execute the process of said classification means and said reestimation means, the input of said classification means being, at a first iteration, the output of the said initialization means, and, during subsequent iterations, the output of said reestimation means, and the input of said reestimation means being the output of said classification means, until a predetermined ending condition is satisfied, thereby generating a statistical class sequence model.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.