Methods and systems for customizable clustering of sub-networks for bioinformatics and health care applications
원문보기
IPC분류정보
국가/구분 |
United States(US) Patent
등록
|
국제특허분류(IPC7판) |
|
출원번호 |
US-0604433
(2015-01-23)
|
등록번호 |
US-9690844
(2017-06-27)
|
우선권정보 |
IN-310/CHE/2014 (2014-01-24); KR-10-2015-0006118 (2015-01-13) |
발명자
/ 주소 |
- Mukherjee, Subhankar
- Ahn, Taejin
- Bopardikar, Ajit S.
- Bhaduri, Anirban
- Mallavarapu, Srikanth Rama
|
출원인 / 주소 |
- SAMSUNG ELECTRONICS CO., LTD.
|
대리인 / 주소 |
Leydig, Voit & Mayer, Ltd.
|
인용정보 |
피인용 횟수 :
0 인용 특허 :
4 |
초록
▼
Methods and devices for clustering a plurality of sub-networks of a larger interaction network using an enhanced hierarchical clustering algorithm are disclosed. The methods provide expression based sub-network generation using differentially expressed markers. The enhanced hierarchical clustering a
Methods and devices for clustering a plurality of sub-networks of a larger interaction network using an enhanced hierarchical clustering algorithm are disclosed. The methods provide expression based sub-network generation using differentially expressed markers. The enhanced hierarchical clustering algorithm clusters the generated sub-networks based on a user defined customizable similarity coefficient. The methods use non-Boolean links to cluster similar sub-networks. This provides consideration of indirect relationships among sub-networks. The customizable similarity coefficient enables the methods to be used for diverse applications such as biomarker detection, patient stratification, personalized therapy, drug efficacy prediction, genetic similarity analysis in genetic diseases. The methods enable patient grouping based on the enhanced hierarchical clustering algorithm.
대표청구항
▼
1. A method for clustering a plurality of sub-networks comprising: receiving as input in a computing device, one or more expression data sets of one or more samples;preprocessing the input expression data for obtaining a plurality of seed markers; wherein the seed markers are biomarkers or input mar
1. A method for clustering a plurality of sub-networks comprising: receiving as input in a computing device, one or more expression data sets of one or more samples;preprocessing the input expression data for obtaining a plurality of seed markers; wherein the seed markers are biomarkers or input marker genes, and obtained by methods such but not limited to thresholding normalized expression data based on a predefined threshold value;extracting a set of sub-networks for the input samples using expression values, set of seed markers obtained as above and the interaction network;selecting sub networks among the plurality of the extracted or input sub-networks;building a plurality of local heaps for each cluster among a plurality of clusters by computing a first link between each cluster and remaining clusters of the plurality of clusters, wherein each of the plurality of clusters correspond to the selected sub-networks;building a global heap by computing a second link between each cluster among the plurality of clusters and a highest ranked cluster of each of the local heap among the plurality of local heaps;merging the highest ranked cluster of each local heap and a highest ranked cluster of the global heap to form a plurality of intermediate clusters;calculating a similarity coefficient between each intermediate cluster among the plurality of intermediate clusters and each cluster in the global heap and each cluster corresponding to one of the local heap; andreturning each intermediate cluster as a final cluster, if each the calculated similarity coefficients are below a predefined link cutoff value. 2. The method as in claim 1, wherein a value of the link is based on a user defined customizable similarity coefficient used for computing a functional relationship quantifier. 3. The method as in claim 1, wherein the method further comprises pushing the each intermediate cluster into the global heap if the calculated link is above a predefined link cutoff value. 4. The method as in claim 1, wherein the method comprises building the local heap for the each cluster by adding each cluster from the remaining clusters to each local heap if the computed link for the cluster is above a predefined link cutoff value. 5. The method as in claim 1, wherein the method comprises ranking at least one cluster in each local heap and at least one cluster in the global heap to determine the highest ranked cluster in each local heap and the highest ranked cluster in the global heap based on a value of the computed link for the at least one cluster in the each local heap and the at least one cluster in the global heap. 6. The method as in claim 1, wherein the method further comprises performing grouping based on the enhanced hierarchical clustering algorithm by: generating the plurality of sub-networks for each sample among a plurality of samples;clustering sub-networks within the plurality of sub-networks of each sample using the enhanced hierarchical clustering algorithm based on the customizable similarity coefficient; andgenerating a data set of clusters by pooling clusters across the plurality of samples. 7. The method as in claim 5, wherein the method further comprises: initializing a plurality of clusters-of-interest from the data set of clusters;growing the clusters-of-interest using the data set of clusters;determining membership of each sample in each the plurality of cluster-of-interest based on the clustered sub-networks for the each sample; andgrouping the plurality of samples into a group among a plurality of groups based on the determined membership of the sample, wherein samples in the group exhibit identical cluster memberships. 8. The method as in claim 5, wherein the method further comprises generating the plurality of sub-networks by: generating a set of first level sub-networks around a plurality of seed markers based on differential marker expression;growing the set of generated first level sub-networks based on a predefined scoring function; wherein the predefined scoring function is defined as but not limited to the first derivative of a log likelihood function, andmerging the set of grown first level sub-networks based on one of: the enhanced clustering algorithm and a predefined similarity coefficient to generate the plurality of sub-networks; andmerging of the sub-networks using a highest differentially expressed marker in a neighborhood. 9. The method as in claim 6, wherein the similarity coefficient can be based on similarity measures such as but not limited to Jaccard coefficient, Edge interaction coefficient (EIC) and Common neighborhood interaction coefficient (CNIC). 10. A device for clustering a plurality of sub-networks derived from a larger network using an enhanced hierarchical clustering algorithm, wherein the device comprises: an integrated circuit further comprising at least one processor;at least one memory having a computer program code within the circuit;the at least one memory and the computer program code with the at least one processor cause the device, when the computer program code is executed by the processor, to:receive a data set representing a plurality of sub-networks derived from a network;select sub networks among the plurality of sub-networks;build a plurality of local heaps for each cluster among a plurality of clusters by computing a first link between each cluster and remaining clusters of the plurality of clusters, wherein the plurality of clusters correspond to a plurality of selected sub-networks among the plurality of sub-networks;build a global heap by computing a second link between each cluster among the plurality of clusters and a highest ranked cluster of each the local heap among the plurality of local heaps;merge the highest ranked cluster of each local heap and a highest ranked cluster of the global heap to form a plurality of intermediate clusters;calculate a similarity coefficient between each intermediate cluster among the plurality of intermediate clusters and each cluster in the global heap, each cluster corresponding to one of the local heap; andreturn each intermediate cluster as a final cluster, if each the calculated link is below a predefined link cutoff value. 11. The device as in claim 10, wherein a value of the link is based on a user defined customizable similarity coefficient used for computing a functional relationship quantifier. 12. The device as in claim 10, wherein the device is further configured to push each intermediate cluster into the global heap if each the calculated link is above the predefined link cutoff value. 13. The device as in claim 10, wherein the device is configured to build the local heap for the each cluster by adding each cluster from the remaining clusters to each local heap if the computed link for the cluster is above the predefined link cutoff value. 14. The device as in claim 10, wherein the device is configured to rank at least one cluster in each local heap and at least one cluster in the global heap to determine the highest ranked cluster in each local heap and the highest ranked cluster in the global heap based on a value of the computed link for the at least one cluster in each local heap and the at least one cluster in the global heap. 15. The device as in claim 10, wherein the device is further configured to perform patient grouping based on the enhanced hierarchical clustering algorithm by: generating the plurality of sub-networks for each patient among a plurality of patients;clustering sub-networks within the plurality of sub-networks of each patient using the enhanced hierarchical clustering algorithm based on the customizable similarity coefficient; andgenerating a data set of clusters by pooling clusters across the plurality of patients. 16. The device as in claim 15, wherein the device is further configured to: initialize a plurality of clusters-of-interest from the data set of clusters;grow the clusters-of-interest using the data set of clusters;determine membership of the each patient in each the plurality of cluster-of-interest based on the clustered sub-networks for the each patient; andgroup the plurality of patients into a group among a plurality of groups based on the determined membership of the each patient, wherein patients in the group exhibit identical cluster membership. 17. The device as in claim 15, wherein the device is further configured to generate the plurality of sub-networks by: generating a set of first level sub-network around a plurality of seed markers based on differential marker expression;growing the set of generated first level sub-networks based on a predefined scoring function; andmerging the set of grown first level sub-networks based on one of: the enhanced hierarchical clustering algorithm and the predefined scoring function to generate the plurality of sub-networks, wherein the predefined scoring function is defined as the first derivative of a log likelihood function. 18. The device as in claim 10, wherein the device is further configured to refine at least one biomarker by clustering sub-networks generated from an incomplete set of disease specific input marker genes, wherein the clustering is based on a customizable similarity coefficient. 19. A computer program that is implemented by hardware and is stored in a medium to execute the method of claim 1.
이 특허에 인용된 특허 (4)
-
Singh, Ambuj Kumar; He, Huahai, Graph querying, graph motif mining and the discovery of clusters.
-
Jin,Yaochu; Sendhoff,Bernhard, Reduction of fitness evaluations using clustering techniques and neural network ensembles.
-
Asgekar, Amogh; Tawari, Sandesh, Social network node clustering system and method.
-
Nucci, Antonio; Keralapura, Ram, System and method for content-aware co-clustering algorithm based on hourglass model.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.