IPC classification information
Country / Type |
United States(US) Patent
Granted
|
International Patent Classification (IPC, 7th edition) |
|
Application number |
US-0410367
(1999-09-30)
|
Inventor / Address |
- Saffer,Jeffrey D.
- Calapristi,Augustin J.
- Miller,Nancy E.
- Scarberry,Randall E.
- Thurston,Sarah J.
- Havre,Susan L.
- Decker,Scott D.
- Payne,Deborah A.
- Sofia,Heidi J.
- Thomas,Gregory S.
- Stillwell,Li
|
Applicant / Address |
- Battelle Memorial Institute
|
Agent / Address |
Finnegan, Henderson, Farabow, Garrett &
|
Citation information |
Times cited: 123
Patents cited: 105 |
Abstract
A system or method consistent with an embodiment of the present invention is useful in analyzing large volumes of different types of data, such as textual data, numeric data, categorical data, or sequential string data, for use in identifying relationships among the data types or different operations that have been performed on the data. A system or method consistent with the present invention determines and displays the relative content and context of related information and is operative to aid in identifying relationships among disparate data types. Various data types, such as numerical data, protein and DNA sequence data, categorical information, and textual information, such as annotations associated with the numerical data or research papers, may be correlated for visual analysis. A variety of user-selectable views may be correlated for user interaction to identify relationships that exist among the different types of data or various operations performed on the data. Furthermore, the user may explore the information contained in sets of records and their associated attributes through the use of interactive 2-D line charts and interactive summary miniplots.
Representative claims
What is claimed is:
1. A computer-implemented method for simultaneous visualization of disparate data types, the method comprising: (1) selecting a set of attributes associated with an object, the attributes selected comprising a text data type and one other data type chosen from a biopolymer sequence data type, a numerical data type, and a categorical data type; (2) creating a high dimensional vector representing the object by applying transformation operations to the selected attributes; and (3) projecting the high dimensional vector thereby visualizing the object based on the attributes selected; wherein the transformation operations for the attributes of the text data type comprise: (a) semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality; (b) defining a topic set, said topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, said topic set being defined based on at least one of word frequency, overlap and topicality; (c) forming a matrix with the semantic concepts contained within the topic set defining one dimension of said matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of said matrix; (d) calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents; and (e) providing said matrix entries from step (d) for creating the high dimensional vector.
2. A computer-implemented method for simultaneous visualization of disparate data types, the method comprising: (1) selecting a set of attributes associated with an object, the attributes selected comprising a biopolymer sequence data type and one other data type chosen from a text data type, a numerical data type, and a categorical data type; (2) creating a high dimensional vector representing the object by applying transformation operations to the selected attributes; and (3) projecting the high dimensional vector thereby visualizing the object based on the attributes selected; wherein the transformation operations for the attributes of the biopolymer sequence data type comprise: (i) comparing a sequence of each biopolymer material to a sequence of each other biopolymer material to provide respective comparison results; (ii) arranging the comparison results in a square matrix indexed by the plurality of biopolymer materials; and (iii) providing the square matrix entries for creating the high dimensional vector.
3. The computer-implemented method of claim 2, wherein the attributes selected in step (1) comprise a text data type and a biopolymer sequence data type.
4. The computer-implemented method of claim 2, wherein the transformation operations for the attributes of the text data type comprise: (a) semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality; (b) defining a topic set, said topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, said topic set being defined based on at least one of word frequency, overlap and topicality; (c) forming a matrix with the semantic concepts contained within the topic set defining one dimension of said matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of said matrix; (d) calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents; and (e) providing said matrix entries from step (d) for creating the high dimensional vector.
5. The computer-implemented method of claim 3, wherein the attributes selected in step (1) comprise a text data type, a biopolymer sequence data type, and one other data type chosen from a numerical data type and a categorical data type.
6. The computer-implemented method of claim 3, wherein the attributes selected in step (1) comprise a text data type, a biopolymer sequence data type, a numerical data type, and a categorical data type.
7. A computer-implemented method for simultaneous visualization of disparate data types, the method comprising: (1) selecting a set of attributes associated with an object, the attributes selected comprising any three data types chosen from a text data type, a biopolymer sequence data type, a numerical data type, and a categorical data type; (2) creating a high dimensional vector representing the object by applying transformation operations to the selected attributes; and (3) projecting the high dimensional vector thereby visualizing the object based on the attributes selected; wherein the transformation operations for the attributes of the text data type, if selected, comprise: (a) semantically filtering a set of documents in a database to extract a set of semantic concepts, to improve an efficiency of a predictive relationship to its content, based on at least one of word frequency, overlap and topicality; (b) defining a topic set, said topic set being characterized as the set of semantic concepts which best discriminate the content of the documents containing them, said topic set being defined based on at least one of word frequency, overlap and topicality; (c) forming a matrix with the semantic concepts contained within the topic set defining one dimension of said matrix and the semantic concepts contained within the filtered set of documents comprising another dimension of said matrix; (d) calculating matrix entries as the conditional probability that a document in the database will contain each semantic concept in the topic set given that it contains each semantic concept in the filtered set of documents; and (e) providing said matrix entries from step (d) for creating the high dimensional vector; and wherein the transformation operations for the attributes of the biopolymer sequence data type, if selected, comprise: (i) comparing a sequence of each biopolymer material to a sequence of each other biopolymer material to provide respective comparison results; (ii) arranging the comparison results in a square matrix indexed by the plurality of biopolymer materials; and (iii) providing the square matrix entries for creating the high dimensional vector.
8. A computer-readable medium storing a software that when executed by a computer performs the method of any one of claims 1-7.
9. A device adapted to perform the method of any one of claims 1-7.
10. The method of any of claims 1-7, wherein said application of transformation operations on said selected attributes produces a vector representation of said object in correspondence with a uniform data structure.
11. A computer-readable medium storing a software that when executed by a computer performs the method of claim 10.
12. A device adapted to perform the method of claim 10.
13. The method of claim 10, further comprising using said representation to identify cluster groups of related objects.
14. A computer-readable medium storing a software that when executed by a computer performs the method of claim 13.
15. A device adapted to perform the method of claim 13.
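The transformation steps recited in the claims can be illustrated with a minimal Python sketch. It is not the patented implementation, and several parts are assumptions made only for illustration: documents are modeled as sets of already-filtered concepts, the function names (`conditional_probability_matrix`, `sequence_comparison_matrix`, `object_vector`) are hypothetical, the standard-library `difflib.SequenceMatcher` ratio stands in for whatever sequence comparison the method actually uses, and averaging matrix rows to form the text part of the vector is one possible reading of "providing said matrix entries for creating the high dimensional vector". The projection step (3) is left out.

```python
from difflib import SequenceMatcher


def conditional_probability_matrix(docs, concepts, topics):
    """Steps (c)-(d): rows are filtered concepts, columns are topic concepts.

    Entry [i][j] estimates P(document contains topics[j] | it contains
    concepts[i]) by counting over `docs`, each modeled as a set of concepts.
    """
    matrix = []
    for concept in concepts:
        containing = [doc for doc in docs if concept in doc]
        row = []
        for topic in topics:
            if containing:
                hits = sum(1 for doc in containing if topic in doc)
                row.append(hits / len(containing))
            else:
                # Assumption: a concept that occurs in no document gets 0.
                row.append(0.0)
        matrix.append(row)
    return matrix


def sequence_comparison_matrix(sequences):
    """Steps (i)-(ii): square matrix of pairwise comparison results.

    SequenceMatcher.ratio() is only a stand-in similarity score here; a real
    system would more likely use an alignment score.
    """
    n = len(sequences)
    return [
        [
            SequenceMatcher(None, sequences[i], sequences[j], autojunk=False).ratio()
            for j in range(n)
        ]
        for i in range(n)
    ]


def object_vector(doc, concepts, text_matrix, sequence_row):
    """Step (2), under an assumption: average the matrix rows of the concepts
    present in the object's document, then append the object's row of the
    sequence comparison matrix."""
    rows = [text_matrix[i] for i, c in enumerate(concepts) if c in doc]
    width = len(text_matrix[0]) if text_matrix else 0
    if rows:
        text_part = [sum(col) / len(rows) for col in zip(*rows)]
    else:
        text_part = [0.0] * width
    return text_part + list(sequence_row)
```

Each object ends up as one fixed-length vector (topic dimensions followed by one dimension per sequence), which is the kind of uniform representation that a projection or clustering step, as in claims 10 and 13, could then consume.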