IPC Classification Information
Country / Type |
United States(US) Patent
Registered
|
International Patent Classification (IPC, 7th edition) |
|
Application number |
UP-0295225
(2005-12-05)
|
Registration number |
US-7822555
(2010-11-15)
|
Inventors / Address |
- Huang, Jing
- Jones, Keith W.
- Shapero, Michael H.
|
Applicant / Address |
|
Agent / Address |
|
Citation information |
Times cited: 43
Cited patents: 39 |
Abstract
Methods of identifying allele-specific changes in genomic DNA copy number are disclosed. Methods for identifying homozygous deletions and genetic amplifications are disclosed. An array of probes designed to detect presence or absence of a plurality of different sequences is also disclosed. The probes are designed to hybridize to sequences that are predicted to be present in a reduced complexity sample. The methods may be used to detect copy number changes in cancerous tissue compared to normal tissue. The methods may be used to diagnose cancer and other diseases associated with chromosomal anomalies.
Representative claims
What is claimed is:

1. A method of estimating in a sample the copy number of a plurality of genomic regions in a genome, wherein each genomic region contains at least one single nucleotide polymorphism (SNP) from a plurality of SNPs, wherein each SNP in the plurality has an A and a B allele in a population, said method comprising:
(a) genotyping the sample using a high density genotyping array comprising a plurality of perfect match and mismatch probes for the A allele of each SNP in the plurality of SNPs (PMA and MMA) and a plurality of perfect match and mismatch probes for the B allele (PMB and MMB) to obtain a raw intensity measurement for each PMA, MMA, PMB and MMB probe for each SNP in the plurality of SNPs, and to obtain a genotyping call for each SNP in the plurality of SNPs;
(b) transforming each raw intensity measurement to its natural log to obtain a transformed intensity value for each probe;
(c) normalizing the transformed intensity values using the MMB transformed intensity values for all SNPs from the plurality of SNPs that are called BB in the sample to obtain normalized PMA intensities;
(d) normalizing each PMB transformed intensity value using the MMA transformed intensities for all SNPs from the plurality that are called AA in the sample to obtain normalized PMB intensities;
(e) using a plurality of reference samples, identifying a set of PMA probes and a set of PMB probes for each SNP in the plurality of SNPs that show linear correlation between copy number and intensity;
(f) calculating for each SNP in the plurality of SNPs an average of the PMA probes in the set of PMA probes and an average of the PMB probes in the set of PMB probes to obtain a PMA average intensity and a PMB average intensity for each SNP in the plurality of SNPs;
(g) performing linear regression against a model equation derived from a plurality of reference samples to obtain an estimated A allele copy number and an estimated B allele copy number for each SNP in the plurality of SNPs;
(h) adding the estimated A allele copy number to the estimated B allele copy number to obtain an estimated total copy number of the genomic region of each SNP in the plurality of SNPs, thereby calculating an estimated total copy number for each of a plurality of genomic regions in a genome; and
(i) applying regression tree analysis to the estimated total copy numbers obtained in (h) to partition the genome into genomic regions having the same estimated total copy number,
wherein steps (b)-(i) are performed by a computer and wherein the computer outputs the estimated total copy number of a plurality of genomic regions in a computer readable format.

2. The method of claim 1 wherein the high density genotyping array comprises a plurality of probe sets comprising at least 100,000 different probe sets, wherein a probe set comprises at least 3 perfect match probes for allele A, at least 3 perfect match probes for allele B, at least 3 mismatch probes for allele A and at least 3 mismatch probes for allele B.

3. The method of claim 2 wherein a probe set comprises at least 7 perfect match probes for allele A, at least 7 perfect match probes for allele B, at least 7 mismatch probes for allele A and at least 7 mismatch probes for allele B.

4. The method of claim 1 wherein probes are selected for the set of PMA probes and for the set of PMB probes by identifying SNPs that show a correlation greater than 0.6 between allelic dosages based on genotype calls and probe intensity using the equation
$\bar{S}_{a;lm} = \frac{\sum_{n \in A_m} S_{a;lmn}}{\#\{A_m\}}$ for the A alleles and
$\bar{S}_{b;lm} = \frac{\sum_{n \in B_m} S_{b;lmn}}{\#\{B_m\}}$ for the B alleles, wherein
$A_m = \{\, n \mid \mathrm{Cor} > 0.6 \text{ between } S_{a;rmn} \text{ and genotype } G_{rm},\ r = 1, \ldots, R \text{ is the reference set} \,\}$,
$B_m = \{\, n \mid \mathrm{Cor} > 0.6 \text{ between } S_{b;rmn} \text{ and genotype } G_{rm},\ r = 1, \ldots, R \text{ is the reference set} \,\}$.

5. The method of claim 1 wherein the step of estimating the A allele copy number and the B allele copy number in step (g) further comprises calculating a value for C in the following equation
$I_m = \alpha_{m,0} + \alpha_{m,1} \ln(\delta_m + C) + A_m + \varepsilon$
for each SNP in the plurality of SNPs, where $m = 1, \ldots, M$ is the SNP index, $I$ is the probe intensity, $\alpha_{m,0}$ is the SNP-specific optical background, $\alpha_{m,1}$ is the scaling factor, $\delta_m$ is the non-specific hybridization, $C$ is the DNA target concentration, $A_m$ is an affinity term determined by probe and target fragment sequences, and $\varepsilon$ is a random noise term.

6. The method of claim 1 wherein the step of estimating the copy number of the A allele in the unknown sample in step (g) further comprises using the equation:
$\hat{C}_{a,lm} = \max\left(\exp(\hat{\gamma}_{a0,m} + \hat{\gamma}_{a1,m} I_{a,lm}) - \hat{\delta}_{a,lm},\ 0\right)$,
and the step of estimating the copy number of the B allele in the unknown sample in step (g) further comprises using the equation:
$\hat{C}_{b,lm} = \max\left(\exp(\hat{\gamma}_{b0,m} + \hat{\gamma}_{b1,m} I_{b,lm}) - \hat{\delta}_{b,lm},\ 0\right)$,
wherein $\delta$ is fixed and $\gamma_{a0,m}$, $\gamma_{a1,m}$, $\gamma_{b0,m}$, $\gamma_{b1,m}$ are estimated using least squares regression with the normal reference as the training set.

7. The method of claim 6 further comprising performing regression tree analysis to partition the genome further based on allele-specific copy number for a plurality of genomic regions into regions that share the same allele-specific copy number and to assign allele-specific copy number to regions that show alteration from the diploid state.

8. A method for estimating the copy number of a genomic region in an experimental sample comprising:
(a) isolating nucleic acid from the experimental sample;
(b) fragmenting the nucleic acid sample with a restriction enzyme;
(c) ligating an adaptor to the fragments;
(d) amplifying at least some of the adaptor ligated fragments;
(e) labeling the amplified products;
(f) hybridizing the labeled amplified products to an array to obtain a hybridization pattern, wherein the array comprises a plurality of genotyping probe sets for a plurality of SNPs, wherein a probe set comprises: (i) a plurality of perfect match probes to a first allele of a SNP, (ii) a plurality of perfect match probes to a second allele of the SNP, (iii) a plurality of mismatch probes to the first allele of the SNP, and (iv) a plurality of mismatch probes to the second allele of the SNP;
(g) obtaining a raw intensity measurement for each perfect match and each mismatch probe in each probe set for each SNP;
(h) calculating the natural log (ln) of the raw intensity measurement for each probe;
(i) standardizing the natural log of the raw intensity measurement for each probe using as background the mismatch probe intensities from the opposite allele in SNPs with a genotype call homozygous for the opposite allele;
(j) obtaining a standardized measurement for each perfect match probe by a method comprising obtaining a first background intensity by calculating an average of a plurality of B allele mismatch probes for a plurality of SNPs called homozygous A in the sample, and obtaining a second background intensity by calculating an average of a plurality of A allele mismatch probes for a plurality of SNPs called homozygous B in the sample;
(k) standardize the PMa probes so that the MMa probes for SNPs with BB genotype calls have a variance of one and a mean of zero;
(l) standardize the PMb probes so that the mismatch B probes for homozygous A SNPs have a variance of one and a mean of zero;
(m) select probes to be included in calculation by identifying probes that show a linear response between copy number and intensity above a threshold and calculate an average intensity across selected probes in a probe set;
(n) perform regression analysis on the reference set mean intensities for a given probe set and genotype for each SNP;
(o) compare the intensity of the target sample against the mean intensity values of samples from the reference set with the same genotype call;
(p) apply a linear regression to adjust the target intensity so that it falls on the line Y = X, performed separately on PMA and PMB probe intensities;
(q) model copy number from the reference samples using the following equations
$\ln(C_{a,rm} + \delta_{a,m}) = \gamma_{a0,m} + \gamma_{a1,m} I_{a,rm} + \varepsilon_{a,rm}$
$\ln(C_{b,rm} + \delta_{b,m}) = \gamma_{b0,m} + \gamma_{b1,m} I_{b,rm} + \varepsilon_{b,rm}$;
(r) use the values of $\gamma_{a0,m}$, $\gamma_{a1,m}$, $\gamma_{b0,m}$, $\gamma_{b1,m}$ obtained in (q) to estimate copy number of the unknown sample for each allele of each SNP using the following equations
$\hat{C}_{a,lm} = \max\left(\exp(\hat{\gamma}_{a0,m} + \hat{\gamma}_{a1,m} I_{a,lm}) - \hat{\delta}_{a,lm},\ 0\right)$ and
$\hat{C}_{b,lm} = \max\left(\exp(\hat{\gamma}_{b0,m} + \hat{\gamma}_{b1,m} I_{b,lm}) - \hat{\delta}_{b,lm},\ 0\right)$;
(s) perform kernel smoothing on the estimated copy number applying significance with a 1 Mb window and Gaussian kernel.
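The numerical core of the claims (background standardization, the log-linear reference regression, the clipped copy number estimate, and Gaussian kernel smoothing) can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes NumPy; all function names, the toy data, and the value of delta are hypothetical; and treating the 1 Mb window as a kernel bandwidth is an assumption, since the claim does not say how the window maps onto the kernel.

```python
import numpy as np

def standardize(log_pm, log_mm_background):
    """Shift and scale log PM intensities so that the chosen background
    mismatch probes have mean zero and variance one."""
    mu = np.mean(log_mm_background)
    sigma = np.std(log_mm_background)
    return (np.asarray(log_pm, dtype=float) - mu) / sigma

def fit_gammas(intensity_ref, copy_ref, delta):
    """Least-squares fit of ln(C + delta) = g0 + g1 * I on reference samples,
    with delta held fixed (assumed known here)."""
    y = np.log(np.asarray(copy_ref, dtype=float) + delta)
    # np.polyfit returns coefficients from highest degree to lowest.
    g1, g0 = np.polyfit(np.asarray(intensity_ref, dtype=float), y, 1)
    return g0, g1

def estimate_copy(intensity, g0, g1, delta):
    """C_hat = max(exp(g0 + g1 * I) - delta, 0), applied elementwise."""
    pred = np.exp(g0 + g1 * np.asarray(intensity, dtype=float)) - delta
    return np.maximum(pred, 0.0)

def gaussian_smooth(positions, values, bandwidth):
    """Kernel-weighted average of values along the genome using a Gaussian
    kernel; bandwidth is in the same units as positions (assumption)."""
    pos = np.asarray(positions, dtype=float)
    val = np.asarray(values, dtype=float)
    weights = np.exp(-0.5 * ((pos[:, None] - pos[None, :]) / bandwidth) ** 2)
    return (weights @ val) / weights.sum(axis=1)

# Hypothetical toy data for one SNP: reference samples with known
# A-allele dosage (0, 1 or 2 copies) and their standardized intensities.
delta = 0.5  # arbitrary illustrative value, not taken from the patent
ref_copies = np.array([0.0, 1.0, 1.0, 2.0, 2.0])
ref_intensity = np.array([-0.9, 0.1, 0.2, 0.8, 0.9])
g0, g1 = fit_gammas(ref_intensity, ref_copies, delta)
print(estimate_copy([0.15, 1.4], g0, g1, delta))
```

Running the same fit and estimate separately for the A and B alleles and adding the two results would give a total copy number per SNP, which could then be passed to `gaussian_smooth` together with the SNP positions.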