Monoisotopic mass is an important feature in top-down proteomics. Knowing the exact monoisotopic mass helps identify proteins precisely and quickly in large protein databases. However, the monoisotopic peak is visible only in spectra of small molecules. For bigger molecules such as proteins, it is hidden in noise or not detected at all, so its position has to be predicted. By improving the prediction of this peak, we contribute to a more accurate identification of molecules, which is crucial in fields such as chemistry and medicine. In this work, we present the envemind algorithm, a two-step procedure for predicting the monoisotopic masses of proteins. The prediction is based on the isotopic envelope. Therefore, envemind is dedicated to spectra in which the isotopic variants separated by one dalton can be resolved. Furthermore, only single-molecule spectra are allowed, that is, spectra that do not require prior deconvolution. The algorithm addresses the problem of off-by-one-dalton errors, which are common in monoisotopic mass prediction. A novel aspect of this work is a mathematical exploration of the space of molecules, in which we identify chemical formulas with their theoretical spectra. Since the space of molecules consists of all possible chemical formulas, this approach is not limited to known substances. This makes optimization faster and makes it possible to approximate a theoretical spectrum for a given experimental one. The algorithm is available as the Python package envemind on our GitHub page https://github.com/PiotrRadzinski/envemind.
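To illustrate why the monoisotopic peak fades as molecules grow, the following minimal sketch computes the monoisotopic mass of a chemical formula and the probability that a molecule consists only of the lightest isotopes. It does not use the envemind package and is not the algorithm described above; the function names, the dictionary layout, and the two example formulas (glycine and human insulin) are assumptions chosen for illustration, and the isotope masses and abundances are rounded standard reference values.

```python
# Illustrative sketch only; not part of the envemind package.
# Lightest-isotope mass (Da) and natural abundance for each element,
# rounded from standard reference tables.
LIGHTEST = {
    "C": (12.0, 0.9893),
    "H": (1.00782503, 0.999885),
    "N": (14.00307401, 0.99636),
    "O": (15.99491462, 0.99757),
    "S": (31.97207100, 0.9499),
}


def monoisotopic_mass(formula):
    """Sum of lightest-isotope masses over all atoms in the formula."""
    return sum(LIGHTEST[element][0] * count for element, count in formula.items())


def monoisotopic_probability(formula):
    """Probability that every atom in the molecule is its lightest isotope."""
    probability = 1.0
    for element, count in formula.items():
        probability *= LIGHTEST[element][1] ** count
    return probability


# Hypothetical examples: a small molecule and a small protein.
glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
insulin = {"C": 257, "H": 383, "N": 65, "O": 77, "S": 6}  # human insulin

for name, formula in (("glycine", glycine), ("insulin", insulin)):
    mass = monoisotopic_mass(formula)
    share = monoisotopic_probability(formula)
    print(f"{name}: monoisotopic mass {mass:.4f} Da, peak share {share:.4f}")
```

For glycine the monoisotopic variant carries most of the signal, while for insulin it accounts for only a few percent of the isotopic envelope. The share shrinks further with each added atom, which is why the peak position has to be predicted for larger proteins.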
               