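IPC classification information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition) |
Application number | US-0510186 (2009-07-27)
Registration number | US-8782805 (2014-07-15)
Inventor / Address |
- Zhang, Benyu
- Zeng, Hua-Jun
- Ma, Wei-Ying
- Chen, Zheng
Applicant / Address |
Agent / Address |
Citation information | Times cited: 3; Patents cited: 8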
Abstract
A method and system for detecting whether an outgoing communication contains confidential information or other target information is provided. The detection system is provided with a collection of documents that contain confidential information, referred to as “confidential documents.” When the detection system is provided with an outgoing communication, it compares the content of the outgoing communication to the content of the confidential documents. If the outgoing communication contains confidential information, then the detection system may prevent the outgoing communication from being sent outside the organization. The detection system detects confidential information based on the similarity between the content of an outgoing communication and the content of confidential documents that are known to contain confidential information.
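The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the Jaccard word overlap, the 0.5 threshold, and the names `words`, `similarity`, and `should_block` are all hypothetical choices made for the example, since the abstract does not specify a similarity measure.

```python
import re


def words(text):
    """Lowercased word set of a text (a simplifying assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def similarity(a, b):
    """Jaccard overlap of word sets; a stand-in for the unspecified measure."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def should_block(outgoing, confidential_docs, threshold=0.5):
    """True if the outgoing text is similar enough to any confidential document."""
    return any(similarity(outgoing, doc) >= threshold for doc in confidential_docs)


# Hypothetical sample data for illustration only.
confidential_docs = [
    "Project Falcon launch date is March 3 and the budget is two million dollars.",
]
print(should_block("The Project Falcon launch date is March 3.", confidential_docs))
print(should_block("Lunch on Friday?", confidential_docs))
```

Comparing every outgoing message against every confidential document in this way scales poorly, which is presumably why the claims narrow the search to candidate documents first.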
Representative claims
1. A method in a computer system having a memory and a processor for identifying whether an outgoing communication contains target information, the method comprising: providing documents that contain target information; for each of at least some words in the provided documents, with the processor, determining an importance value for the word, in response to determining that the importance value determined for the word exceeds a threshold, designating the word as a keyword, and in response to determining that the importance value determined for the word does not exceed the threshold, not designating the word as a keyword; generating a keyword index that maps keywords to documents that contain the keywords; receiving an outgoing communication; using the keyword index to locate candidate documents from among the provided documents based on keywords of the received outgoing communication; and with the processor, comparing the received outgoing communication to at least a portion of the located candidate documents to determine whether the received outgoing communication contains target information.
2. The method of claim 1, further comprising: upon determining that the received outgoing communication contains target information, suppressing the delivery of the received outgoing communication to its intended recipient.
3. The method of claim 1 wherein words are identified as being keywords based on a term frequency by inverted document frequency metric.
4. The method of claim 1 wherein the outgoing communication is an electronic mail message.
5. The method of claim 1 wherein the outgoing communication is an attachment to an electronic mail message.
6. The method of claim 1 wherein the outgoing communication is a voice communication.
7. The method of claim 1 wherein the outgoing communication is an Internet posting.
8. The method of claim 1 wherein the threshold is determined prior to determining the importance values.
9. A method in a computer system having a memory and a processor for identifying whether an outgoing communication contains target information, the method comprising: providing documents that contain target information; for each of at least some words in the provided documents, with the processor, determining an importance value for the word, in response to determining that the importance value determined for the word exceeds a threshold, designating the word as a keyword, and in response to determining that the importance value determined for the word does not exceed the threshold, not designating the word as a keyword; generating a sentence hash table that, for each of at least some sentences, maps a hash code derived from the sentence to documents that contain the sentence; receiving an outgoing communication; using the sentence hash table to locate documents of the provided documents that contain one or more sentences that match one or more sentences of the received outgoing communication; and with the processor, comparing the received outgoing communication to at least a portion of the located documents to determine whether the received outgoing communication contains target information.
10. The method of claim 9 wherein the sentence hash table maps each of one or more of the provided documents to a key sentence of the document.
11. A method in a computer system having a memory and a processor for identifying whether an outgoing communication contains target information, the method comprising: providing documents that contain target information; for each of at least some words in the provided documents, with the processor, determining an importance value for the word, in response to determining that the importance value determined for the word exceeds a threshold, designating the word as a keyword, and in response to determining that the importance value determined for the word does not exceed the threshold, not designating the word as a keyword; generating a keyword index that, for each of at least some keywords, maps the keyword to sentences of the provided documents that contain the keyword; receiving an outgoing communication; using the keyword index to locate documents of the provided documents that contain one or more sentences containing at least one keyword of the received outgoing communication; and with the processor, comparing the received outgoing communication to at least a portion of the located documents to determine whether the received outgoing communication contains target information.
12. The method of claim 11 wherein the received outgoing communication is determined to contain target information when at least one located sentence of the provided documents is similar to at least one sentence of the received outgoing communication.
13. A computer-readable storage device containing instructions for controlling a computer system to identify whether a first document contains content similar to content of target documents, by operations comprising: for each of at least some words of the target documents, with a processor, determining an importance value for the word, comparing the determined importance value to a threshold, and when the determined importance value exceeds the threshold, designating the word as a keyword; creating a keyword index that, for each of at least some keywords of the target documents, maps the keyword to one or more target documents that contain the keyword; identifying keywords of the first document; using the created keyword index to locate candidate documents, from among the target documents, based on a similarity between keywords of the target documents and the identified keywords of the first document; and comparing the located candidate documents to the first document to determine whether the first document contains content similar to at least one candidate document.
14. The computer-readable storage device of claim 13 wherein when the first document is an outgoing communication that contains target information, suppressing the sending of the outgoing communication.
15. A computer-readable storage device containing instructions for controlling a computer system to identify whether a first document contains content similar to content of target documents, by operations comprising: for each of at least some words of the target documents, with a processor, determining an importance value for the word, comparing the determined importance value to a threshold, and when the determined importance value exceeds the threshold, designating the word as a keyword; generating a sentence hash table that, for each of at least some sentences of the target documents, maps a hash code derived from the sentence to target documents that contain the sentence; using the sentence hash table to locate candidate documents that contain sentences that match sentences of the first document; and comparing the located candidate documents to the first document to determine whether the first document contains content similar to at least one candidate document, wherein the sentence hash table is generated prior to identifying sentences of the first document.
16. A computer system having a memory and a processor for determining whether a first electronic mail message contains target information, comprising: a document store containing target electronic mail messages that contain target information; a component configured to, for each of at least some words in the target electronic mail messages, determine an importance value for the word, and when the importance value determined for the word exceeds a first threshold, designate the word as a keyword, a component configured to, for each of at least some of the target electronic mail messages, calculate a similarity value for the target electronic mail message based at least in part on common keywords between the target electronic mail message and the first electronic mail message, compare the calculated similarity value to a second threshold, and designate the target electronic mail message as a candidate electronic mail message if the comparison indicates that the calculated similarity value is greater than the second threshold; and a component configured to compare the first electronic mail message to the designated candidate electronic messages to determine whether the first electronic mail message contains target information, wherein the determination of whether the first electronic mail message contains target information is based at least in part on the designated keywords, and wherein at least one of the components comprises computer-executable instructions stored in memory for execution by the processor.
17. The computer system of claim 16, further comprising: a component configured to, when it is determined that the first electronic mail message contains target information, suppress delivery of the first electronic mail message to an intended recipient.
18. The computer system of claim 16 wherein at least one of the target electronic mail messages is not designated as a candidate electronic mail message.
19. The computer system of claim 16 wherein the similarity value for at least one target electronic mail message is calculated based at least in part on the number of common keywords between the at least one target electronic mail message and the first electronic mail message.
20. A computer-readable storage device containing instructions for controlling a computer system to identify whether a first document contains content similar to content of target documents, by operations comprising: for each of at least some words of the target documents, with a processor, determining an importance value for the word, comparing the determined importance value to a threshold, and when the determined importance value exceeds the threshold, designating the word as a keyword; selecting from the target documents candidate documents based on a similarity between keywords of the target documents and keywords of the first document; and comparing the candidate documents to the first document to determine whether the first document contains content similar to a candidate document at least in part by, for each candidate document, determining a number of exact matches between sentences of the first document and sentences of the candidate document, and in response to determining that the number of exact matches between sentences of the first document and sentences of the candidate document does not exceed an exact match threshold, determining a number of fuzzy matches between sentences of the first document and sentences of the candidate document.
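The steps recited in claims 1, 3, and 20 might be sketched as below. This is a hedged illustration that loosely follows the claim language, not the patentee's implementation. Every function name, the particular TF-IDF formula, the SHA-1 sentence hash, the use of `difflib.SequenceMatcher` for fuzzy matching, and all threshold values are assumptions. In particular, the claims do not say what is done with the fuzzy match count, so comparing it to a second threshold is a guess, and the hashes here are compared per candidate document rather than through a global sentence hash table as in claim 9.

```python
import hashlib
import math
import re
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]


def extract_keywords(docs, threshold):
    """Per document, keep words whose TF-IDF score exceeds the threshold."""
    doc_tokens = [tokenize(d) for d in docs]
    df = Counter(w for toks in doc_tokens for w in set(toks))
    n = len(docs)
    return [
        {w for w, c in Counter(toks).items()
         if (c / len(toks)) * math.log(n / df[w]) > threshold}
        for toks in doc_tokens
    ]


def build_keyword_index(keywords_per_doc):
    """Map each keyword to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, keywords in enumerate(keywords_per_doc):
        for w in keywords:
            index[w].add(doc_id)
    return index


def find_candidates(message, index):
    """Assumption: any message word present in the index counts as a keyword."""
    hits = set()
    for w in set(tokenize(message)):
        hits |= index.get(w, set())
    return hits


def sentence_hash(sentence):
    return hashlib.sha1(sentence.encode("utf-8")).hexdigest()


def contains_target(message, doc, exact_threshold=0, fuzzy_threshold=0, fuzzy_ratio=0.8):
    """Count exact sentence matches first; fall back to fuzzy matches."""
    msg_sents = split_sentences(message)
    doc_sents = split_sentences(doc)
    doc_hashes = {sentence_hash(s) for s in doc_sents}
    exact = sum(1 for s in msg_sents if sentence_hash(s) in doc_hashes)
    if exact > exact_threshold:
        return True
    fuzzy = sum(
        1 for s in msg_sents
        if any(SequenceMatcher(None, s, d).ratio() >= fuzzy_ratio for d in doc_sents)
    )
    return fuzzy > fuzzy_threshold


def detect(message, docs, keyword_threshold=0.05):
    index = build_keyword_index(extract_keywords(docs, keyword_threshold))
    return any(contains_target(message, docs[i]) for i in find_candidates(message, index))


# Hypothetical sample documents for illustration only.
docs = [
    "The merger with Contoso closes in April. The purchase price is confidential.",
    "The cafeteria menu changes weekly. The soup is popular.",
]
print(detect("FYI, the merger with Contoso closes in April.", docs))
print(detect("Soup for lunch today?", docs))
```

In this sketch the index is rebuilt on every call to `detect` for brevity; claim 15 indicates that such structures are generated before the first document is examined, so a real system would presumably build them once and reuse them.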