Method and apparatus for factoring finite state transducers with unknown symbols
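Patent information
Country / Type: United States (US) Patent, Published
International Patent Classification (IPC, 7th edition): (not listed)
Application number: US-0737758 (2000-12-18)
Publication number: US-0198702 (2002-12-26)
Inventor / Address: (not listed)
Applicant / Address: (not listed)
Agent / Address: (not listed)
Citation information: cited by 0; patents cited 0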
Abstract
A method factors an input finite state transducer (FST) with unknown symbols into a left-sequential FST and a right-sequential FST while avoiding direct factorization of the unknown symbols. The left-sequential FST is formed by replacing each occurrence of the unknown symbol in the input FST with a sequence of the unknown symbol and a diacritic. The right-sequential FST is formed by replacing each occurrence of the diacritic with a symbol representative of an empty string and an output symbol.
Representative claims
1. A method for factoring an input finite-state transducer (FST) including an unknown symbol, comprising the steps of: replacing each occurrence of the unknown symbol in the input FST with the unknown symbol and a diacritic to define a left-sequential finite-state transducer (FST); and replacing each occurrence of the diacritic with a symbol representative of an empty string and an output symbol to define a right-sequential finite-state transducer (FST); wherein said replacing steps avoid direct factorization of the unknown symbol.
2. The method of claim 1, further comprising the step of factoring the unknown symbol in the input FST into arc label sequences ┌?, δ:λ┐ and ┌λ:ε, ?:σ_out┐, where: λ is a diacritic, σ_out is an output symbol, and δ is a deterministic empty string.
3. The method of claim 2, further comprising the step of copying the arc label sequence ┌?, δ:λi┐ to the left-sequential FST.
4. The method of claim 2, further comprising the step of copying the arc label sequence ┌λ:ε, ?:σ_out┐ to the right-sequential FST.
5. The method of claim 1, wherein the left-sequential FST and the right-sequential FST are adapted for performing language processing.
6. The method of claim 5, wherein the language processing comprises one of tokenization, phonological analysis, morphological analysis, disambiguation, spelling correction, and shallow parsing.
7. The method of claim 1, wherein the left-sequential FST and the right-sequential FST are lexical transducers.
8. An apparatus for factoring an input finite-state transducer (FST) including an unknown symbol, comprising: means for replacing each occurrence of the unknown symbol in the input FST with the unknown symbol and a diacritic to define a left-sequential finite-state transducer (FST); and means for replacing each occurrence of the diacritic with a symbol representative of an empty string and an output symbol to define a right-sequential finite-state transducer (FST); wherein said replacing means avoid direct factorization of the unknown symbol.
9. The apparatus of claim 8, further comprising means for factoring the unknown symbol in the input FST into arc label sequences ┌?, δ:λ┐ and ┌λ:ε, ?:σ_out┐, where: λ is a diacritic, σ_out is an output symbol, and δ is a deterministic empty string.
10. The apparatus of claim 9, further comprising means for copying the arc label sequence ┌?, δ:λ┐ to the left-sequential FST.
11. The apparatus of claim 9, further comprising means for copying the arc label sequence ┌λ:ε, ?:σ_out┐ to the right-sequential FST.
12. The apparatus of claim 8, wherein the left-sequential FST and the right-sequential FST are adapted for performing language processing.
13. The apparatus of claim 12, wherein the language processing comprises one of tokenization, phonological analysis, morphological analysis, disambiguation, spelling correction, and shallow parsing.
14. The apparatus of claim 8, wherein the left-sequential FST and the right-sequential FST are lexical transducers.
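Illustrative note (not part of the patent text): the label sequences named in claims 2 to 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The function names, the (input, output) tuple representation, the reading of a bare "?" label as an identity mapping on the unknown symbol, and the right-to-left reading order of the second factor are all assumptions made for illustration.

```python
from typing import List, Tuple

UNKNOWN = "?"   # unknown symbol: stands for any symbol outside the known alphabet
EPSILON = "ε"   # empty string
DELTA = "δ"     # deterministic empty string: consumes no input

Label = Tuple[str, str]  # (input symbol, output symbol); hypothetical representation


def factor_unknown(sigma_out: str, diacritic: str = "λ") -> Tuple[List[Label], List[Label]]:
    """Build the label sequences [?, δ:λ] and [λ:ε, ?:σ_out] described in claim 2."""
    left = [(UNKNOWN, UNKNOWN), (DELTA, diacritic)]    # assumed: bare "?" is identity
    right = [(diacritic, EPSILON), (UNKNOWN, sigma_out)]
    return left, right


def apply_left(symbol: str, left: List[Label]) -> List[str]:
    """Read one unknown symbol left to right: copy it, then emit the diacritic."""
    out: List[str] = []
    for inp, outp in left:
        if inp == UNKNOWN:
            out.append(symbol)   # the unknown symbol passes through unchanged
        elif inp == DELTA:
            out.append(outp)     # nothing is consumed; the diacritic is emitted
    return out


def apply_right(intermediate: List[str], right: List[Label]) -> List[str]:
    """Read the intermediate string right to left (assumed order) along `right`."""
    out: List[str] = []
    for sym, (inp, outp) in zip(reversed(intermediate), right):
        if inp != UNKNOWN and sym != inp:
            raise ValueError(f"expected {inp!r}, got {sym!r}")
        if outp != EPSILON:
            out.append(outp)     # the diacritic maps to ε, the unknown symbol to σ_out
    return out


left, right = factor_unknown("X")
middle = apply_left("q", left)
print(middle)                      # ['q', 'λ']
print(apply_right(middle, right))  # ['X']
```

In this sketch the diacritic carries the choice of output symbol from the first factor to the second, so neither factor has to enumerate the symbols that "?" may stand for.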