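Country / Type | United States (US) Patent, Granted |
---|---|
International Patent Classification (IPC, 7th edition) | |
Application number | US-0135815 (1993-10-12) |
Inventor / Address | |
Applicant / Address | |
Agent / Address | |
Citation information | Times cited: 421; Patents cited: 32 |
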
A natural language processing system uses unformatted naturally occurring text and generates a subject vector representation of the text, which may be an entire document or a part thereof such as its title, a paragraph, clause, or a sentence therein. The subject codes which are used are obtained from a lexical database and the subject code(s) for each word in the text is looked up and assigned from the database. The database may be a dictionary or other word resource which has a semantic classification scheme as designators of subject domains. Various meanings or senses of a word may have assigned thereto multiple, different subject codes and psycholinguistically justified sense meaning disambiguation is used to select the most appropriate subject field code. Preferably, an ordered set of sentence level heuristics is used which is based on the statistical probability or likelihood of one of the plurality of codes being the most appropriate one of the plurality. The subject codes produce a weighted, fixed-length vector (regardless of the length of the document) which represents the semantic content thereof and may be used for various purposes such as information retrieval, categorization of texts, machine translation, document detection, question answering, and generally for extracting knowledge from the document. The system has particular utility in classifying documents by their general subject matter and retrieving documents relevant to a query.
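
As a rough illustration of the fixed-length property described in the abstract, the following minimal sketch maps text of any length onto a vector with one weight per subject code. The code inventory `CODES`, the toy `LEXICON`, and the function `subject_vector` are hypothetical names invented for this example, not taken from the patent. The sketch also skips the disambiguation step and simply splits each word's weight evenly across its candidate codes, which is a simplifying assumption.

```python
from collections import Counter

# Hypothetical subject-code inventory; a real lexical database defines its own.
CODES = ["EC", "LW", "MD", "SP"]

# Hypothetical toy lexicon: word -> subject codes of its senses.
LEXICON = {
    "bank": ["EC"],
    "loan": ["EC"],
    "court": ["LW", "SP"],
    "judge": ["LW"],
    "doctor": ["MD"],
}

def subject_vector(text, lexicon=LEXICON, codes=CODES):
    """Return a fixed-length list of normalized subject-code weights."""
    counts = Counter()
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'()")
        senses = lexicon.get(word, [])
        # Assumption: no disambiguation, so one unit of weight is split
        # evenly across all candidate codes of the word.
        for code in senses:
            counts[code] += 1.0 / len(senses)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(codes)
    # One slot per code, in a fixed order, regardless of document length.
    return [counts[c] / total for c in codes]

short_vec = subject_vector("The judge left the court.")
long_vec = subject_vector("The bank approved the loan. " * 50)
print(len(short_vec), len(long_vec))
```

Both vectors have the same number of slots, so a one-sentence text and a much longer one can be compared directly, for example with a dot product or cosine similarity.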
We claim:

1. A method of generating a subject field code vector representation of a document which comprises the steps of assigning subject codes to each of the words of the document which codes express the semantic content of the document, said codes corresponding to the meanings of each of said words in accordance with the various senses thereof; disambiguating said document to select a specific subject code for each of said words heuristically in order first from the occurrence of like codes within each sentence of said documents which occur uniquely and at or with greater than a certain frequency within each sentence, then second correlating the codes for each word with the codes occurring uniquely (unique code) and with greater than or equal to the given frequency in the sentence to select for each word the code having the highest correlation, and then third in accordance with the frequency of usage of the meaning of the word represented by the code; and arranging said codes into a weighted vector representing the content of said document.
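
One possible reading of the ordered disambiguation in claim 1 can be sketched as follows. This is an interpretation under stated assumptions, not the patented implementation: the function names, the `min_freq` threshold, the toy lexicon, and the correlation table are all hypothetical. Each word's candidate codes are assumed to be listed from most to least frequently used sense, and a code is assumed to correlate perfectly with itself.

```python
from collections import Counter

def corr(a, b, table):
    # Assumption: a code correlates perfectly with itself; other pairs
    # come from a hypothetical symmetric correlation table.
    return 1.0 if a == b else table.get(frozenset((a, b)), 0.0)

def disambiguate(words, lexicon, correlation, min_freq=3):
    """Choose one subject code per known word with three ordered heuristics."""
    known = [w for w in words if w in lexicon]
    # Number of words in the sentence that list each code as a candidate.
    freq = Counter(code for w in known for code in set(lexicon[w]))
    unique = {lexicon[w][0] for w in known if len(lexicon[w]) == 1}
    frequent = {c for c, n in freq.items() if n >= min_freq}
    anchors = unique | frequent

    result = []
    for w in known:
        codes = lexicon[w]
        hits = [c for c in codes if c in frequent]
        if len(codes) == 1:
            choice = codes[0]                      # 1st: unique code
        elif hits:
            choice = max(hits, key=freq.get)       # 1st: frequent code
        else:
            scores = {
                c: max((corr(c, a, correlation) for a in anchors), default=0.0)
                for c in codes
            }
            best = max(codes, key=scores.get)
            if scores[best] > 0:
                choice = best                      # 2nd: highest correlation
            else:
                choice = codes[0]                  # 3rd: most frequent sense
        result.append((w, choice))
    return result

# Hypothetical toy data; codes are ordered by assumed sense frequency.
toy_lexicon = {
    "bank": ["EC", "GE"],
    "river": ["GE"],
    "deposit": ["EC", "GE"],
    "plant": ["BO", "IN"],
    "star": ["AS", "EN"],
}
toy_correlation = {
    frozenset(("GE", "BO")): 0.6,
    frozenset(("GE", "IN")): 0.2,
}

chosen = disambiguate(["the", "river", "bank", "plant", "star"],
                      toy_lexicon, toy_correlation)
print(chosen)
```

In this toy sentence, "river" has a single code, "bank" and "plant" are resolved by correlation with that unique code, and "star" falls through to its first-listed sense because nothing in the sentence correlates with either candidate.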
Copyright KISTI. All Rights Reserved.