"Refining Transcriptome Gene Catalogs by MS‐Validation of Expressed Proteins"

Protein identification by liquid chromatography‐tandem mass spectrometry (LC‐MS/MS) detects thousands of protein sequences even in complex mixtures, and provides valuable insight into the biological functions of different cells. For non‐model organisms, transcriptomes are generally used to allow peptide identification, an important addition to their use as a gene catalog from which the potential metabolic activities of cells can be determined. We used LC‐MS/MS data to identify which of the six possible reading frames of each transcript was actually used by the cell to make protein, and asked whether this would affect downstream analyses using the dataset. We combined results from several LC‐MS/MS experiments designed to identify peptide sequences in extracts from the dinoflagellate Lingulodinium polyedra using a 74 655‐sequence transcriptome. We compiled a list of 6628 translated nucleic acid sequences that contained the ensemble of peptide matches (termed MS‐validated sequences) and compared downstream analyses of this data set with those of the 6628 nucleic acid sequences from which they were derived. When compared with BLASTx analyses of the DNA sequences, the MS‐validated protein sequences analyzed using BLASTp showed differences in gene ontology, had more identified BLAST hits, and contained more KEGG pathway enzymes. The MS‐validated protein sequences also differ from datasets containing longest open reading frame (ORF) protein sequences. We also note a poor correlation between protein and mRNA abundance, a comparison not previously performed for dinoflagellates. The differences observed between analyses of MS‐validated protein sequence and nucleic acid sequence datasets suggest that the former may provide a more accurate representation of cellular capacity than the latter. Developing MS‐validated protein sequence datasets may also speed interpretation of MS/MS spectra in bottom‐up proteomics experiments.
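
The idea of choosing, among the six possible reading frames of a transcript, the one supported by observed peptides can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`six_frame_translations`, `ms_validated_frame`), the use of the standard genetic code, exact substring matching of peptides, and the "most peptide matches wins" rule are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code in TCAG order (assumption: the organism's nuclear
# transcripts use the standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translations(seq):
    """Translate a DNA sequence in all six frames (+1..+3, -1..-3).

    Stop codons are written as '*'; codons with ambiguous bases as 'X'.
    """
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = (s[i:i + 3] for i in range(offset, len(s) - 2, 3))
            frames[f"{strand}{offset + 1}"] = "".join(
                CODON_TABLE.get(c, "X") for c in codons
            )
    return frames


def ms_validated_frame(seq, peptides):
    """Pick the frame whose translation contains the most peptides.

    Hypothetical selection rule: exact substring matches are counted per
    frame. Returns (frame_name, protein), or (None, None) if no peptide
    matches any frame.
    """
    frames = six_frame_translations(seq)
    counts = {
        name: sum(pep in prot for pep in peptides)
        for name, prot in frames.items()
    }
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        return None, None
    return best, frames[best]
```

Calling `ms_validated_frame(transcript, peptides)` on each transcript and keeping only those with a non-`None` result would yield a set analogous in spirit to the MS‐validated sequences described above; a real analysis would rely on a search engine's peptide‐spectrum matches rather than plain string matching.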

Keywords: protein; validated protein; protein sequence; protein sequences; gene; sequence

Journal Title: PROTEOMICS
Year Published: 2018
