[US Patent] Nucleic acid sequence assembly

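IPC classification
Country/Type	United States (US) Patent, granted
International Patent Classification (IPC, 7th ed.)	G06F-019/22 C12Q-001/68
Application No. (filing date)	US-0045818 (2016-02-17)
Patent No. (grant date)	US-9715573 (2017-07-25)
Inventors / Address	Putnam, Nicholas H.; Stites, Jonathan C.; Rice, Brandon J.
Applicant / Address	DOVETAIL GENOMICS, LLC
Agent / Address	Wilson Sonsini Goodrich & Rosati
Citation information	Times cited: 0; Patents cited: 52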

Abstract

Disclosed herein are compositions, systems and methods related to sequence assembly, such as nucleic acid sequence assembly of single reads and contigs into larger contigs and scaffolds through the use of read pair sequence information, such as read pair information indicative of nucleic acid sequen
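The abstract describes linking contigs into scaffolds using read-pair information. The minimal Python sketch below illustrates two preprocessing steps that the claims name. The first estimates "standard" distance frequency data from read pairs whose two reads map to the same contig (claim 11). The second groups contigs that share at least one read pair (claim 3(d)). It is not the patentee's implementation. The tuple layout `(contig_a, pos_a, contig_b, pos_b)`, the fixed `bin_edges`, and both function names are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical read-pair layout: (contig_a, pos_a, contig_b, pos_b), 0-based positions.


def standard_distance_histogram(pairs, bin_edges):
    """Binned distance frequencies from pairs whose reads map to the same contig."""
    counts = [0] * (len(bin_edges) - 1)
    for ca, pa, cb, pb in pairs:
        if ca != cb:
            continue  # only intra-contig pairs have a known separation
        d = abs(pa - pb)
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    # Normalise counts to frequencies; leave all-zero counts unchanged.
    return [c / total for c in counts] if total else counts


def group_contigs(pairs):
    """Union-find grouping of contigs linked by at least one read pair."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ca, _, cb, _ in pairs:
        ra, rb = find(ca), find(cb)
        if ra != rb:
            parent[ra] = rb  # merge the two groups
    groups = defaultdict(set)
    for c in parent:
        groups[find(c)].add(c)
    return sorted(sorted(g) for g in groups.values())
```

With union-find, grouping stays close to linear in the number of read pairs. The sketch omits mapping-quality filtering and any minimum link-count threshold.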

Representative claims

1. A method for nucleic acid sequence data assembly, comprising: (a) obtaining purified DNA; (b) binding the purified DNA with a DNA binding agent to form DNA/chromatin complexes; (c) incubating the DNA-chromatin complexes with restriction enzymes to leave sticky ends; (d) performing ligation to join ends of DNA; (e) sequencing ligated DNA junctions to generate paired end reads; (f) obtaining standard paired-end read distance frequency data; (g) obtaining grouped contig sequences; and (h) scaffolding the grouped contig sequences such that read pair distance frequency data for read pairs that map to separate contigs approximates the standard paired-end read distance frequency data, thereby assembling the sequence data of the nucleic acid.

2. The method of claim 1, wherein read pair distance frequency data for read pairs that map to separate contigs more closely approximates the standard paired-end read distance frequency data when read pair distance likelihood is improved by at least 5% compared to read pair distance likelihood calculated for unscaffolded contig sequences.

3. A method for scaffolding contigs of nucleic acid sequence information obtained from a biological sample, said method comprising: (a) obtaining a set of contig sequences having an initial configuration, wherein the contig sequences are obtained by extracting DNA from a biological material and sequencing the DNA; (b) obtaining a set of paired end reads, wherein the set of paired-end reads is obtained by digesting sample DNA to generate internal double strand breaks within the nucleic acid, allowing the double strand breaks to re-ligate randomly to form a plurality of re-ligation junctions, and sequencing across the plurality of re-ligation junctions; (c) obtaining standard paired-end read distance frequency data; (d) grouping contig pairs sharing sequence that coexists in at least one paired end read, thereby generating grouped contigs; and (e) scaffolding the grouped contigs such that read pair distance frequency data for read pairs that map to separate contigs more closely approximates the standard paired-end read distance frequency data by at least 5% relative to the read pair frequency data of the grouped contigs in the initial configuration.

4. The method of claim 3, wherein the sample DNA is crosslinked to at least one DNA binding agent.

5. The method of claim 3, wherein the sample DNA comprises isolated naked DNA.

6. The method of claim 5, wherein the isolated naked DNA is reassembled into reconstituted chromatin.

7. The method of claim 6, wherein the reconstituted chromatin is crosslinked.

8. The method of claim 6, wherein the reconstituted chromatin comprises a DNA binding protein.

9. The method of claim 6, wherein the reconstituted chromatin comprises a nanoparticle.

10. The method of claim 3, wherein the set of paired-end reads are obtained by digesting sample DNA to generate internal double strand breaks within the nucleic acid, allowing the double strand breaks to re-ligate randomly to form a plurality of re-ligation junctions, and sequencing on each side of the plurality of re-ligation junctions.

11. The method of claim 3, wherein standard paired-end read distance frequency data is obtained from paired-end reads where both reads map to a common contig.

12. The method of claim 3, wherein standard paired-end read distance frequency data is obtained from previously generated curves.

13. The method of claim 3, wherein said scaffolding comprises selecting a first set of putative adjacent contigs of said grouped contigs, determining a minimal distance order of said first set of putative adjacent contigs that reduces an aggregate measure of the read-pair distances for said read pairs, and scaffolding said first set of putative adjacent contigs so as to reduce said aggregate measure of the read-pair distance.

14. The method of claim 13, wherein said determining a minimal distance order comprises determining an expected read-pair distance for all possible contig configurations for at least one read pair that comprises reads mapping to two contigs of said first set of putative adjacent contigs.

15. The method of claim 14, comprising selecting a contig orientation from said all possible contig configurations that most improves the read pair distance likelihood compared to other possible configurations from said all possible contig configurations.

16. The method of claim 3, wherein the biological sample comprises a genome.

17. The method of claim 3, wherein the biological sample is a heterogeneous sample comprising a plurality of genomes.

18. A method for scaffolding contigs of nucleic acid sequence information comprising: (a) obtaining a set of contig sequences having an initial configuration; (b) obtaining a set of paired end reads; (c) obtaining standard paired-end read distance frequency data; (d) grouping contig pairs sharing sequence that coexists in at least one paired end read, thereby generating grouped contigs; and (e) scaffolding the grouped contigs such that read pair distance frequency data for read pairs that map to separate contigs more closely approximates the standard paired-end read distance frequency data by at least 5% relative to the read pair distance frequency data of the grouped contigs in the initial configuration.

19. The method of claim 18, wherein the scaffolding comprises at least one of ordering the grouped contigs, orienting the grouped contigs, merging at least two contigs end to end, inserting one contig into a second contig, and cleaving a contig into at least two constituent contigs.

20. The method of claim 18, wherein read pair distance frequency data for read pairs that map to separate contigs more closely approximates the paired-end read distance frequency data when read pair distance likelihood increases by at least 5% relative to the read pair distance frequency data of the grouped contigs in the initial configuration.

21. The method of claim 20, wherein read-pair distance likelihood is maximized.

22. The method of claim 18, wherein read pair distance frequency data for read pairs that map to separate contigs more closely approximates the standard paired-end read distance frequency data when a statistical measure of difference between the read pair distance frequency data for read pairs that map to separate contigs and the standard paired-end read distance frequency data decreases by at least 5% relative to the read pair distance frequency data of the grouped contigs in the initial configuration.

23. The method of claim 22, wherein the statistical measure of distance between the read pair distance frequency data for read pairs that map to separate contigs and the standard paired-end read distance frequency data comprises at least one of ANOVA, a t-test, and a chi-squared (χ²) test.

24. The method of claim 23, wherein read pair distance for read pairs that map to separate contigs more closely matches the paired-end read distance frequency data when deviation of read pair distance distribution among ordered contigs obtained as compared to standard paired-end read distance frequency decreases.

25. The method of claim 24, wherein deviation of read pair distance distribution among ordered contigs obtained as compared to standard paired-end read distance frequency is minimized.

26. A method of assembling contig sequence information into at least one scaffold, comprising: (a) obtaining sequence information corresponding to a plurality of contigs, obtaining paired-end read information from a nucleic acid sample represented by the plurality of contigs, and (b) configuring the plurality of contigs such that deviation of a read pair distance parameter from a predicted read pair distance data set is decreased by at least 5% compared to the read pair distance parameter of plurality of contigs in an initial configuration, wherein the configuring occurs in less than 8 hours.

27. The method of claim 26, wherein the predicted read pair distance data set comprises a read pair distance likelihood curve.

28. The method of claim 26, wherein the read pair distance parameter is maximum distance likelihood relative to a read pair distance likelihood curve.

29. The method of claim 26, wherein the read pair distance parameter is variation relative to a read pair distance likelihood curve.

30. The method of claim 26, wherein 70% of the plurality of contigs are ordered and oriented so as to match the relative order and orientation of their sequences in the nucleic acid sample in no more than 8 hours.

31. The method of claim 1, wherein an N50 value is increased by at least 1% relative to unscaffolded sequence data.

32. The method of claim 3, wherein an N50 value is increased by at least 1% relative to the grouped contigs in the initial configuration.

33. The method of claim 18, wherein an N50 value is increased by at least 1% relative to the grouped contigs in the initial configuration.

34. The method of claim 26, wherein an N50 value is increased by at least 1% relative to the plurality of contigs in the initial configuration.
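Claims 13 to 15 describe evaluating all possible configurations of two putatively adjacent contigs and keeping the one that most improves read-pair distance likelihood. Claims 31 to 34 measure the result with N50. The Python sketch below illustrates both ideas under stated assumptions and is not the patent's method. The two contigs are assumed to abut with a fixed gap (`gap=0` by default). Distances are scored against a binned distribution (`bin_edges`, `freqs`), and distances outside every bin get a small `floor` probability. All function names, the floor value, and the binning are hypothetical.

```python
import math
from itertools import product


def log_likelihood(distances, bin_edges, freqs, floor=1e-6):
    """Sum of log-probabilities of distances under a binned distance distribution."""
    total = 0.0
    for d in distances:
        p = floor  # distances outside every bin get the floor probability
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                p = max(freqs[i], floor)
                break
        total += math.log(p)
    return total


def best_orientation(len_a, len_b, links, bin_edges, freqs, gap=0):
    """Try the four relative orientations of contig A placed before contig B.

    links: (pos_on_a, pos_on_b) for read pairs joining A and B, 0-based.
    Returns (orient_a, orient_b, log_likelihood) for the best configuration.
    """
    best = None
    for oa, ob in product("+-", repeat=2):
        dists = []
        for pa, pb in links:
            # A reversed contig maps position p to (length - 1 - p).
            xa = pa if oa == "+" else len_a - 1 - pa
            xb = len_a + gap + (pb if ob == "+" else len_b - 1 - pb)
            dists.append(xb - xa)
        score = log_likelihood(dists, bin_edges, freqs)
        if best is None or score > best[2]:
            best = (oa, ob, score)
    return best


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0
```

Consider three read pairs linking the end of one 1,000 bp contig to the start of another, scored with `best_orientation(1000, 1000, [(990, 10), (980, 20), (950, 5)], [0, 100, 1000, 100000], [0.6, 0.3, 0.1])`. The function selects the `("+", "+")` configuration, because only that configuration places every pair in the short-distance bin. For N50, `n50([2, 3, 4, 5, 6])` is 5. Joining the length-5 and length-6 contigs into a single scaffold (gap length ignored) makes `n50([2, 3, 4, 11])` equal 11.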

Patents cited by this patent (52)

Barany, Francis; Kiu, Jianzhao; Kirk, Brian W.; Zirvi, Monib; Gerry, Norman P.; Paty, Philip B., Accelerating identification of single nucleotide polymorphisms and alignment of clones in genomic sequencing.
Su, Xing; Dong, Helin; Ryder, Thomas B., Amplification of nucleic acids.
Balasubramanian, Shankar; Klenerman, David; Bentley, David, Arrayed biomolecules and their use in sequencing.
Chee, Mark; Cronin, Maureen T.; Fodor, Stephen P. A.; Huang, Xiaohua X.; Hubbell, Earl A.; Lipshutz, Robert J.; Lobban, Peter E.; Morris, MacDonald S.; Sheldon, Edward L., Arrays of nucleic acid probes on biological chips.
De Laat, Wouter; Grosveld, Frank, Capture and characterized co-localized chromatin (4C) technology.
De Laat, Wouter; Grosveld, Frank, Circular chromosome conformation capture (4C).
Ausubel, Frederick M.; Mindrinos, Michael, Cleaved amplified modified polymorphic sequence detection methods.
Greenfield, I. Lawrence, Compositions, methods, and kits for isolating nucleic acids using surfactants and proteases.
Hubbell, Earl A.; Lipshutz, Robert J.; Morris, Macdonald S.; Winkler, James L., Computer-aided engineering system for design of sequence arrays and lithographic masks.
Hubbell, Earl A.; Morris, MacDonald S.; Winkler, James L., Computer-aided engineering system for design of sequence arrays and lithographic masks.
Tom, Henry K.; Rowley, Gerald L., Concentrating zone method in heterogeneous immunoassays.
Letsinger, Robert L.; Herrlein, Mathias K., Covalent lock for self-assembled oligonucleotide constructs.
Nolan, John P.; White, P. Scott; Cai, Hong, DNA polymorphism identity determination using flow cytometry.
Hawkins, Trevor, DNA purification and isolation using magnetic particles.
Lohse, Peter; Kurz, Markus, DNA-protein fusions and uses thereof.
Whiteley, Norman M.; Hunkapiller, Michael W.; Glazer, Alexander N., Detection of specific sequences in nucleic acids.
Ullman, Edwin F.; Schwarzberg, Moshe, Fluorescence quenching with immunological pairs in immunoassays.
Kronick, Melvyn N.; Little, William A., Fluorescent immunoassay employing total reflection for activation.
Juncosa, Robert D.; Bongard, Rene; Dapprich, Johannes; Scribner, Richard, Genetic analysis device.
Chen, Lin; Kalhor, Reza, Genome-wide chromosome conformation capture.
Church, George M.; Zhang, Kun, Genomic library construction.
Jayasena, Sumedha; Gold, Larry, Homogeneous detection of a target through nucleic acid ligand-ligand beacon interaction.
Higuchi, Russell G., Homogeneous methods for nucleic acid amplification and detection.
Maggio, Edward T., Kit for carrying out chemically induced fluorescence immunoassay.
Pirrung, Michael C.; Read, J. Leighton; Fodor, Stephen P. A.; Stryer, Lubert, Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof.
Litman, David J.; Harel, Zvi; Ullman, Edwin F., Macromolecular environment control in specific receptor assays.
Pfeifer, Gerd P.; Rauch, Tibor, Method for detecting methylated CpG islands.
Fu, Rongdian; Brenner, Sydney; Albrecht, Glenn, Method for determining relative abundance of nucleic acid sequences.
Wang, Chang-Ning J.; Wu, Kai-Wuan, Method for reducing non-specific priming in DNA amplification.
Landegren, Ulf; Hood, Leroy, Method of detecting a nucleotide change in nucleic acids.
Herman, James G.; Baylin, Stephen B., Method of detection of methylated nucleic acid using agents which modify unmethylated cytosine and distinguishing modifi.
Higuchi, Russell G., Methods and devices for hemogeneous nucleic acid amplification and detector.
Whitcombe, David Mark; Theaker, Jane; Gibson, Neil James; Little, Stephen, Methods for detecting target nucleic acid sequences.
Liu, Guoying; Cawley, Simon; Matsuzaki, Hajime; Hubbell, Earl A.; Yang, Geoffrey; Webster, Teresa A.; Mei, Rui; Di, Xiaojun; Chiles, Richard, Methods for genotyping polymorphisms in humans.
Wang, Chang-Ning J.; Wu, Kai-Yuan, Methods for reducing non-specific priming in DNA detection.
Oleinikov, Andrew V., Microarray synthesis and assembly of gene-length polynucleotides.
Wittwer, Carl T.; Ririe, Kirk M.; Rasmussen, Randy P., Monitoring amplification of DNA during PCR.
Chandler, Van S.; Fulton, Jerrold R.; Chandler, Mark B., Multiplexed analysis of clinical specimens apparatus and method.
Iwashita, Jun, Negative resist composition and method for forming resist pattern.
Letsinger, Robert L.; Gryaznov, Sergei M., Non-enzymatic ligation of oligonucleotides.
Nazarenko, Irina A.; Bhatnagar, Satish K.; Winn-Deen, Emily S.; Hohman, Robert J., Nucleic acid amplification oligonucleotides with molecular energy transfer labels and methods based thereon.
Drmanac, Radoje, Nucleic acid analysis by random mixtures of non-overlapping fragments.
Drmanac, Radoje, Nucleic acid analysis by random mixtures of non-overlapping fragments.
Saito, Isao; Okamoto, Akimitsu; Saito, Yoshio; Yoshida, Yasuko; Niwa, Kousuke, Nucleotide derivative and DNA microarray.
Balasubramanian, Shankar, Polynucleotide sequencing.
Boom, Willem R.; Adriaanse, Henritte M. A.; Kievits, Tim; Lens, Peter F., Process for isolating nucleic acid.
Kurnit, David M.; Chiang, Pei-Wen; Wang, Chang-Ning J., Quantitative PCR using blocking oligonucleotides.
Shokat, Kevan; Simon, Matthew D., Site-specific installation of methyl-lysine analogues into recombinant histones.
Granéli, Annette; Reimhult, Erik; Svedhem, Sofia; Pfeiffer, Indriati; Höök, Fredik, Surface immobilised multilayer structure of vesicles.
Bridgham, John; Corcoran, Kevin; Golda, George; Brenner, Sydney; Pallas, Michael C., System and apparatus for sequential processing of analytes.
Barany, Francis; Zebala, John; Nickerson, Deborah; Kaiser, Robert J., Jr.; Hood, Leroy, Thermostable ligase-mediated DNA amplifications system for the detection of genetic disease.
Drmanac, Radoje, Using non-overlapping fragments for nucleic acid sequencing.
