[Paper] Reinterpretation of the protein identification process for proteomics data

Kwon, Kyung-Hoon; Lee, Sang-Kwang; Cho, Kun; Park, Gun-Wook; Kang, Byeong-Soo; Park, Young-Mok

doi:10.4051/ibc.2009.3.0009

Abstract

Introduction: In mass spectrometry-based proteomics, biological samples are analyzed by mass spectrometry and database search to identify proteins. Database search is the process of selecting, from an amino acid sequence database, the best matches to the experimental mass spectra; the protein is then identified as the matched sequence. A match score is defined to find matches in the database, and the highest-scoring hit is declared the most probable protein. The search result therefore varies with the score definition. In this study, the differences among the search results of different search engines and different databases were investigated in order to suggest a better way to identify more proteins with higher reliability.

Materials and Methods: The protein extract of human mesenchymal stem cells was separated into several bands by one-dimensional electrophoresis. The bands were excised from the one-dimensional gel one by one, digested with trypsin and analyzed by an LTQ-FT mass spectrometer. The tandem mass (MS/MS) spectra of the peptide ions were searched with the X!Tandem, Mascot and Sequest search engines against the IPI human database and the SwissProt database. The search results were filtered at several threshold probability values using the Trans-Proteomic Pipeline (TPP) of the Institute for Systems Biology, and the output generated by TPP was analyzed.

Results and Discussion: For each MS/MS spectrum, the peptide sequences identified under different conditions (search engine, threshold probability and sequence database) were compared. At high threshold probability, the main differences in peptide identification were caused not by the difference in sequence database but by the difference in score. As the threshold probability decreased, the missed peptides appeared. Conversely, at extremely high threshold levels, many true assignments were missed.

Conclusion and Prospects: The different identification results of the search engines were mainly caused by their different scoring algorithms. In proteomics, high-scoring peptides are usually selected and low-scoring peptides are discarded. Many of the discarded peptides are true negatives. By integrating the search results from different parameters and different search engines, the protein identification process can be improved.

AI summary of the main text

* Sentences identified automatically by AI may include unsuitable ones, so please use them with care.

Proposed method

The gel band was digested into peptides with trypsin and analyzed by tandem mass (MS/MS) spectrometry. All MS/MS experiments for peptide identification were performed on a nano-LC/MS system consisting of a Surveyor HPLC system and a 7-tesla LTQ-FT mass spectrometer (Finnigan, San Jose) equipped with a nano-ESI source. Ten microliters of each sample of digested peptides was separated on a homemade microcapillary column, 100 mm long, packed with C₁₈ in 75 µm silica tubing.
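
The tryptic digestion step can be mimicked in silico, which is roughly how a search engine derives candidate peptides from a protein sequence. The sketch below assumes the commonly used rule that trypsin cleaves on the C-terminal side of K or R unless the next residue is P. The function name, the `missed_cleavages` parameter and the example sequence are hypothetical and are not taken from the paper.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from an in silico tryptic digest (illustrative sketch).

    Assumed rule: cleave after K or R, but not when the next residue is P.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if the sequence does not end in K/R.
    if start < len(sequence):
        fragments.append(sequence[start:])

    # Join neighbouring fragments to model up to `missed_cleavages` skipped sites.
    peptides = []
    for i in range(len(fragments)):
        for extra in range(missed_cleavages + 1):
            if i + extra < len(fragments):
                peptides.append("".join(fragments[i:i + extra + 1]))
    return peptides


# Hypothetical sequence, for illustration only.
print(tryptic_digest("AKRPGKTTR"))
print(tryptic_digest("AKRPGKTTR", missed_cleavages=1))
```

Real search engines expose further digestion settings (for example semi-specific cleavage), which this sketch does not model.
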
In this study, we compared the peptides and proteins that were identified by different search engines and filtered at different threshold probabilities. First, we checked whether two search engines identify different sequences for one MS/MS spectrum.
In this study, we compared the peptide sequences identified for one MS/MS spectrum by different search engines or at different threshold probabilities. First, we checked whether one MS/MS spectrum could be assigned different peptide sequences, each with a low error rate, by different search engines.
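
The per-spectrum comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes each engine's result has already been reduced to a mapping from spectrum ID to its best peptide and an assigned probability. The spectrum IDs, peptide strings and probabilities are invented for the example, and the function names are hypothetical.

```python
def confident_hits(results, threshold):
    """Keep spectra whose best peptide has probability >= threshold."""
    return {
        spectrum: peptide
        for spectrum, (peptide, probability) in results.items()
        if probability >= threshold
    }


def compare_engines(results_a, results_b, threshold):
    """Classify spectra by how two engines' confident assignments relate."""
    a = confident_hits(results_a, threshold)
    b = confident_hits(results_b, threshold)
    shared = a.keys() & b.keys()
    agree = {s for s in shared if a[s] == b[s]}
    return {
        "agree": sorted(agree),              # same peptide from both engines
        "disagree": sorted(shared - agree),  # both confident, different peptides
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
    }


# Invented example data: spectrum ID -> (best peptide, probability).
engine_a = {
    "scan_001": ("AGFAGDDAPR", 0.99),
    "scan_002": ("LVNELTEFAK", 0.95),
    "scan_003": ("DLGEEHFK", 0.60),
}
engine_b = {
    "scan_001": ("AGFAGDDAPR", 0.98),
    "scan_002": ("LVNEITEFAK", 0.92),
    "scan_004": ("YLYEIAR", 0.97),
}

for threshold in (0.9, 0.5):
    print(threshold, compare_engines(engine_a, engine_b, threshold))
```

Running the comparison at several thresholds shows how spectra move between the categories as the cutoff is relaxed, which is the kind of behaviour the study examines.
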

Target data

In this analysis, three major search engines, Mascot, Sequest and X!Tandem, were used. The IPI human database v3.49 (EBI, UK) and the Swiss-Prot database v51.6 (EBI, UK) were chosen as the sequence databases. They are less redundant than the NCBI nr database and therefore better suited to database searches of proteomics experimental data.
