Algorithmic challenges of high-resolution mass spectrometry

Research project objectives

Mass spectrometry (MS) is the method of choice for high-throughput analysis of biological compounds, such as proteins, lipids, or metabolites. The rich insights it provides into the composition of biological samples come, however, at the cost of both the complexity and the size of the acquired data. Recent advances in instrument design show a clear trend towards more affordable mass spectrometers with ever higher resolution. We plan to study the fine isotopic distribution and its potential in ultra-high-resolution mass spectrometry. Moreover, we will explore a new approach to the comparison of mass spectra using a metric known in computer science as the Earth Mover’s Distance and in mathematics as the Wasserstein distance. It has direct applications in mass spectrometry imaging (MSI) and in metabolomics. We will address four specific questions:

  • WassersteinCompare: verify the usefulness of a novel distance measure in MS processing
  • MsDeconvolute: propose an efficient deconvolution method based on the Wasserstein distance
  • IsoAnnotate: validate automatic chemical formula inference based on the fine isotopic distribution from high-resolution spectra
  • SparseFT-ICR: check whether a more efficient version of the FFT method (sparse FFT) would speed up massive MSI processing

Research project methodology

Mass spectrometers are used to separate the components of a given sample by their mass-to-charge ratio (m/z). The WassersteinCompare subproject proposes a new measure that quantifies differences in both the intensities and the m/z values of peaks in a continuous way. As such, the measure is more robust to changes in chemical formulas than the most common measures, which are based on peak matching. The measure is based on the concept of transporting the ion current between the spectra: the distance between two spectra is equal to the total distance in the m/z domain covered by the current. We show that, under certain assumptions, it can be computed in time linear in the number of distinct peaks in both spectra.
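The transport idea can be illustrated with a minimal sketch. It assumes that each spectrum is a list of (m/z, intensity) pairs already sorted by m/z and that intensities are normalized to a unit total; these are our assumptions for the illustration and not necessarily those of the project. Under them, the distance equals the area between the two cumulative intensity curves, which a single merge pass computes. The function name is hypothetical.

```python
def wasserstein_distance(spec_a, spec_b):
    """1-D Wasserstein distance between two spectra.

    Each spectrum is a list of (mz, intensity) pairs sorted by mz
    (assumption). Intensities are normalized to sum to 1 inside.
    """
    total_a = sum(intensity for _, intensity in spec_a)
    total_b = sum(intensity for _, intensity in spec_b)
    i = j = 0
    cdf_diff = 0.0   # difference of cumulative intensities so far
    prev_mz = None
    distance = 0.0
    # Merge the two sorted peak lists: one step per peak, so linear time.
    while i < len(spec_a) or j < len(spec_b):
        take_a = j >= len(spec_b) or (
            i < len(spec_a) and spec_a[i][0] <= spec_b[j][0]
        )
        if take_a:
            mz, delta = spec_a[i][0], spec_a[i][1] / total_a
            i += 1
        else:
            mz, delta = spec_b[j][0], -spec_b[j][1] / total_b
            j += 1
        if prev_mz is not None:
            # Ion current still "in transit" times the m/z gap it crosses.
            distance += abs(cdf_diff) * (mz - prev_mz)
        cdf_diff += delta
        prev_mz = mz
    return distance
```

For example, moving a single peak from m/z 100 to m/z 101 gives a distance of 1, whereas a peak-matching measure would report two unmatched peaks regardless of how far apart they are.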

We argue that this metric allows for natural and robust solutions to various problems in the analysis of mass spectra. MsDeconvolute shows an application to the problem of deconvolution, in which we infer proportions of several overlapping isotopic envelopes of similar compounds. Combined with the previously proposed generator of isotopic envelopes, IsoSpec, our approach should work for a wide range of masses and charges in the presence of several types of measurement inaccuracies.
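As a rough illustration of the deconvolution problem, the sketch below estimates the proportion of two overlapping envelopes by a simple grid search that minimizes the Wasserstein distance to the observed spectrum. The grid search is a stand-in chosen for brevity, not the method proposed in the project; the function names are hypothetical, spectra are assumed to be dictionaries mapping m/z to intensity, and the reference envelopes are assumed to be known (in practice they could come from a generator such as IsoSpec).

```python
def w1(spec_a, spec_b):
    """1-D Wasserstein distance between spectra given as {mz: intensity}."""
    total_a, total_b = sum(spec_a.values()), sum(spec_b.values())
    cdf_diff, prev_mz, dist = 0.0, None, 0.0
    for mz in sorted(set(spec_a) | set(spec_b)):
        if prev_mz is not None:
            dist += abs(cdf_diff) * (mz - prev_mz)
        cdf_diff += spec_a.get(mz, 0.0) / total_a - spec_b.get(mz, 0.0) / total_b
        prev_mz = mz
    return dist


def mix(env_1, env_2, p):
    """Mixture p * env_1 + (1 - p) * env_2 of two envelopes."""
    out = {}
    for mz, intensity in env_1.items():
        out[mz] = out.get(mz, 0.0) + p * intensity
    for mz, intensity in env_2.items():
        out[mz] = out.get(mz, 0.0) + (1.0 - p) * intensity
    return out


def estimate_proportion(observed, env_1, env_2, steps=100):
    """Grid search for the proportion of env_1 that best explains `observed`."""
    best_p, best_dist = 0.0, float("inf")
    for k in range(steps + 1):
        p = k / steps
        dist = w1(observed, mix(env_1, env_2, p))
        if dist < best_dist:
            best_p, best_dist = p, dist
    return best_p


# Made-up envelopes, for illustration only.
envelope_1 = {100.0: 0.9, 101.0: 0.1}
envelope_2 = {100.5: 0.8, 101.5: 0.2}
observed = mix(envelope_1, envelope_2, 0.3)
estimated = estimate_proportion(observed, envelope_1, envelope_2)
```

Because the distance is a convex function of the proportion in this one-dimensional setting, the grid search could be replaced by a faster one-dimensional optimizer; more than two envelopes would call for a proper optimization formulation.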

De novo identification of chemical formulas in IsoAnnotate will be based on the analysis of graphs whose vertices are the signals in the spectrum and whose edges correspond to mass shifts (changing the isotope of one element). The distribution of the different isotopes is modelled as a multinomial distribution, and the estimation of its parameters will allow us to approximate the number of atoms of each element in the studied molecule. The automatic annotation tool will be validated as an input to the challenging problem of metabolic pathway inference.
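The following sketch illustrates both ideas in the simplest case of a single element, carbon, where the multinomial reduces to a binomial. For n carbon atoms and a 13C abundance q, the ratio of the singly substituted peak to the monoisotopic peak is n q / (1 - q), so n can be read off the intensity ratio. The constants are approximate literature values, the peak list is made up, singly charged ions are assumed, and all function names are hypothetical.

```python
C13_C12_MASS_SHIFT = 1.003355  # Da, approximate mass difference 13C - 12C
C13_ABUNDANCE = 0.0107         # approximate natural abundance of 13C


def isotope_edges(peaks, shift=C13_C12_MASS_SHIFT, tol=0.001):
    """Edges (i, j) between peaks whose m/z differ by `shift` within `tol`.

    `peaks` is a list of (mz, intensity) pairs; charge 1 is assumed.
    """
    edges = []
    for i, (mz_i, _) in enumerate(peaks):
        for j, (mz_j, _) in enumerate(peaks):
            if abs((mz_j - mz_i) - shift) <= tol:
                edges.append((i, j))
    return edges


def estimate_atom_count(mono_intensity, substituted_intensity,
                        abundance=C13_ABUNDANCE):
    """Binomial estimate of the atom count from the peak intensity ratio."""
    ratio = substituted_intensity / mono_intensity
    return ratio * (1.0 - abundance) / abundance


# Illustrative peak list: (m/z, intensity).
peaks = [(180.0634, 100.0), (181.0668, 6.49), (182.0677, 1.2)]
edges = isotope_edges(peaks)
carbon_counts = [
    round(estimate_atom_count(peaks[i][1], peaks[j][1])) for i, j in edges
]
```

Here only the first two peaks are separated by the carbon mass shift, and their intensity ratio suggests about six carbon atoms. Handling several elements at once requires the full multinomial model and resolved fine isotopic structure.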

Nowadays, the instruments that offer the highest attainable resolution are Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometers, in which the recorded signal is a time series composed of the overlapping simple harmonic motions of the different ions in the sample. The problem of establishing which particular harmonics are present in the signal is solved by applying the Fast Fourier Transform (FFT) to the observed signal. The SparseFT-ICR method is based on the recently proposed efficient sparse FFT, which should significantly speed up the calculations.
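To make the role of the Fourier transform concrete, the sketch below recovers the frequencies of two superimposed harmonics from a synthetic time series. It uses a plain radix-2 FFT, not a sparse FFT, and assumes a signal length that is a power of two; the signal, sampling rate, and function names are invented for the illustration. A sparse FFT would target the same output while exploiting the fact that only a few frequencies carry significant energy.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequencies(signal, sample_rate, count):
    """Return the `count` strongest frequencies (in Hz), sorted ascending."""
    n = len(signal)
    magnitudes = [abs(c) for c in fft(signal)[: n // 2]]
    # Skip bin 0 (the constant offset) and rank the remaining bins.
    top_bins = sorted(range(1, n // 2), key=lambda k: magnitudes[k],
                      reverse=True)[:count]
    return sorted(k * sample_rate / n for k in top_bins)


# Synthetic transient: two harmonics at 50 Hz and 120 Hz.
SAMPLE_RATE = 1024
signal = [
    math.sin(2 * math.pi * 50 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 120 * t / SAMPLE_RATE)
    for t in range(1024)
]
found = dominant_frequencies(signal, SAMPLE_RATE, 2)
```

In a real FT-ICR experiment each recovered frequency would then be converted to an m/z value through the instrument calibration, a step omitted here.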

Expected impact of the research project on the development of science

Mass spectrometry has become a fundamental tool in proteomics, metabolomics, lipidomics, and other high-throughput studies of complex biochemical samples. The possibility of investigating complex samples such as tissues, foods, and body fluids is crucial in both fundamental research and applied biomedical sciences. Computational mass spectrometry belongs to a technology-driven branch of science. It is clear that advances in spectrometer resolution should be followed by the development of adequate computational and statistical methods. Our project takes up this challenge, aiming at new algorithms for ultra-high-resolution MS.