CSAMA 2024

Content

  • Introduction to metabolomics
  • Introduction to (LC-)MS
  • Handling and processing metabolomics data in R
  • Lab overview: Main preprocessing steps
  • Annotation of metabolomics data

Metabolite? Metabolism?

  • Glycolysis: key metabolic pathway common to all cells.
  • Yields energy by converting glucose to pyruvate.

  • Metabolites: intermediates and products of cellular processes.

Metabolomics?

  • Large-scale study of small molecules in a system.

What are we measuring? That depends on the context:

  • (human) metabolomics: metabolites, lipids, …
  • plant and food metabolomics: metabolites, polyphenols, …
  • exposomics: chemicals, pharmaceuticals, …
  • environmental sciences: chemicals, metals, …

How are we measuring these small compounds?

  • Nuclear Magnetic Resonance (NMR): highly specific, not very sensitive.
  • Mass Spectrometry (MS): less specific, highly sensitive, high throughput.

(Human) Metabolomics

Where can we measure small compounds/metabolites?

  • Blood (serum):
    • insights into the general physiological state of the organism.
    • venous blood or capillary (arterial) blood.
  • Urine, stool samples (also food, dust): external influences.
  • Cell culture experiments:
    • supernatant: what did cells consume/produce?
    • cell extracts: insights into mitochondrial metabolism.

Metabolomics

Putting metabolomics into context:

  • Genome: what can happen.
  • Transcriptome: what appears to be happening.
  • Proteome: what makes it happen.
  • Metabolome: what actually happened.

Properties of the metabolome:

  • Metabolome is highly dynamic.
  • Metabolome influenced by genetic and environmental factors.

Influence from genetic factors

  • mGWAS (metabolite genome-wide association studies): associations between genetic variants and metabolite concentrations.

  • Significant association between a genetic variant and carnitine, acetylcarnitine and butyrylcarnitine.
  • SLC22A5: carnitine transporter.
  • The genetic variant in this gene influences its function.

How can we measure metabolites?

  • Metabolites are small enough to be measured directly by mass spectrometry.
  • Most metabolites are uncharged: ions need to be created first.

Two main setups to measure metabolites:

  • targeted: quantitative measurement of selected metabolites.
  • untargeted: semi-quantitative measurement of all metabolites (detectable with the setup) in a sample.

Mass Spectrometry (MS)

Glucose, fructose and mannose: all C6H12O6, i.e. the same mass.

  • Problem: unable to distinguish between metabolites with the same/similar mass-to-charge ratio (m/z).
  • Solution: additional separation of metabolites prior to MS.
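To make the problem concrete, the monoisotopic mass can be computed from the elemental formula. The sketch below is plain base R; the helper `formula_mass()` is hypothetical and the element masses are the standard monoisotopic values of 12C, 1H and 16O. Because the three sugars share the formula C6H12O6, they necessarily end up with the same mass (about 180.0634 Da), and hence the same m/z for any given ion type.

```r
## Monoisotopic masses (Da) of the most abundant isotopes of C, H and O
mono <- c(C = 12, H = 1.00782503207, O = 15.99491461956)

## Hypothetical helper: mass = sum of element masses weighted by atom counts
formula_mass <- function(counts) sum(counts * mono[names(counts)])

## Glucose, fructose and mannose share the elemental composition C6H12O6
hexoses <- list(
  glucose  = c(C = 6, H = 12, O = 6),
  fructose = c(C = 6, H = 12, O = 6),
  mannose  = c(C = 6, H = 12, O = 6)
)
masses <- vapply(hexoses, formula_mass, numeric(1))
round(masses, 4)
```

MS alone sees three identical values; only an additional separation (such as LC) can tell these isomers apart.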

Liquid chromatography

  • Sample is dissolved in a fluid (mobile phase).

  • Mobile phase carries analytes through column (stationary phase).

  • Separation based on affinity for the column’s stationary phase.

  • Commonly used: RPLC (reversed-phase LC) and HILIC (hydrophilic interaction liquid chromatography).

Liquid Chromatography Mass Spectrometry (LC-MS)

We gain an additional dimension:

  • retention time.
  • LC-MS: analyze data along retention time.

Liquid Chromatography Mass Spectrometry (LC-MS)

Signal measured along retention time (150-175 seconds).

Signal along the retention time (chromatogram) of the [M+Na]+ ion of C6H12O6.
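A chromatogram like this is obtained by summing, for each spectrum (i.e. each retention time), the intensities of the peaks within a small m/z window around the ion of interest. A minimal base R sketch on made-up toy spectra; the data, the helper `xic()` and the 10 ppm window are assumptions for illustration, and packages such as xcms provide this functionality for real data. The target m/z is the C6H12O6 monoisotopic mass plus the mass of a sodium ion.

```r
## Hypothetical toy data: one centroided MS1 spectrum per retention time (s)
spectra <- list(
  list(rt = 150, mz = c(150.02, 203.0526, 250.10), int = c(100,  500, 80)),
  list(rt = 160, mz = c(150.02, 203.0527, 250.10), int = c(110, 4000, 90)),
  list(rt = 170, mz = c(150.02, 203.0525, 250.10), int = c(105,  900, 85)),
  list(rt = 175, mz = c(150.02, 250.10),           int = c(100, 80))
)

## Extracted ion chromatogram: per spectrum, sum the intensities of all
## peaks whose m/z lies within +/- ppm of the target m/z
xic <- function(spectra, target_mz, ppm = 10) {
  tol <- target_mz * ppm / 1e6
  data.frame(
    rt = vapply(spectra, function(s) s$rt, numeric(1)),
    intensity = vapply(spectra, function(s) {
      sum(s$int[abs(s$mz - target_mz) <= tol])
    }, numeric(1))
  )
}

## [M+Na]+ of C6H12O6: monoisotopic mass + Na+ (sodium minus one electron)
mz_na <- 180.0633881 + 22.9892207
chr <- xic(spectra, mz_na)
chr
```

The spectrum at 175 s contains no peak in the window and contributes an intensity of 0; the maximum of the resulting trace marks the apex of the chromatographic peak.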

Mass Spectrometry Data in R: An overview

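  • Data are typically stored in mzML files (an open file format).
  • To load such data into R, e.g. as an MsExperiment or as a Spectra object:

library(MsExperiment)
library(Spectra)
## fl: path(s) to the mzML file(s)
ms <- readMsExperiment(fl)

sps <- Spectra(fl, backend = MsBackendMzR())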

LC-MS data analysis

  • Untargeted metabolomics (label-free proteomics).

  • LC-MS preprocessing (xcms functions):
    • chromatographic peak detection: findChromPeaks
    • alignment: adjustRtime
    • correspondence: groupChromPeaks

  • Extract the feature values (abundances): featureValues

## DataFrame with 779 rows and 6 columns
##               mz        rt  sample_1  sample_2  sample_3  sample_4
##        <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## FT001    326.378    25.409  4654.057  4814.874  4793.317  4734.862
## FT002    134.096    16.513   767.239   782.878   803.749   913.539
## FT003    134.096    19.089  1054.229  1159.738  1004.752  1044.921
## FT004    134.096    22.305   980.392  1007.868  1051.315   980.127
## FT005    134.097    25.409   525.633   579.745   645.511   594.787
## ...          ...       ...       ...       ...       ...       ...
## FTM933   441.298    24.896  28202.47  28263.68  28454.65  28219.58
## FTM934   447.345    14.577   1356.29   1432.32   1337.77   1391.90
## FTM935   495.265    12.009   3218.93   3303.66   3037.58   3279.49
## FTM936   501.383   137.340   7767.00   7854.74   7804.17   7927.17
## FTM937   612.404    28.094   1667.49   1658.53   1779.24   1671.64
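The idea behind correspondence can be sketched with a deliberately simplified, greedy grouping in base R: peaks from different samples with similar m/z and retention time are assigned to the same feature, and their per-sample intensities then form the feature matrix. This is not the algorithm used by groupChromPeaks (xcms offers e.g. a peak density based method); the toy peak table, the helper `group_peaks()` and the tolerances (10 ppm, 1 s) are hypothetical.

```r
## Hypothetical chromatographic peaks from three samples (toy values)
pks <- data.frame(
  mz     = c(134.0960, 134.0961, 134.0959, 326.3780, 326.3782, 134.0960),
  rt     = c(16.5, 16.6, 16.4, 25.4, 25.5, 22.3),
  sample = c(1, 2, 3, 1, 2, 1),
  into   = c(767, 783, 804, 4654, 4815, 980)
)

## Simplified, greedy correspondence: a peak joins the first existing
## feature whose seed peak is within `ppm` (m/z) and `rt_tol` (seconds);
## otherwise it becomes the seed of a new feature
group_peaks <- function(pks, ppm = 10, rt_tol = 1) {
  seed_mz <- numeric(0)
  seed_rt <- numeric(0)
  ft <- integer(nrow(pks))
  for (i in seq_len(nrow(pks))) {
    hit <- which(abs(seed_mz - pks$mz[i]) <= pks$mz[i] * ppm / 1e6 &
                 abs(seed_rt - pks$rt[i]) <= rt_tol)
    if (length(hit) > 0) {
      ft[i] <- hit[1]
    } else {
      seed_mz <- c(seed_mz, pks$mz[i])
      seed_rt <- c(seed_rt, pks$rt[i])
      ft[i] <- length(seed_mz)
    }
  }
  ft
}

pks$feature <- group_peaks(pks)
## Feature matrix (features x samples); NA where a sample has no peak
fmat <- tapply(pks$into, list(pks$feature, pks$sample), sum)
fmat
```

The two peaks at m/z 134.096 end up in different features because their retention times differ, similar to FT002 to FT004 in the table above. Samples without a detected peak get NA; xcms can fill such gaps with `fillChromPeaks()`.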

LC-MS data analysis

  • Data exploration
  • Data normalization
  • Differential abundance analysis

R is an amazing tool for this kind of statistical analysis.
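One common minimal workflow for these steps: log2 transformation, median centering of each sample as a simple normalization, and a per-feature Welch t-test with Benjamini-Hochberg correction. This is a base R sketch on a hypothetical toy matrix with two assumed groups, not the analysis used in the course; dedicated packages (e.g. limma) are usually preferred for real data.

```r
## Hypothetical abundances: 6 features x 6 samples (3 control, 3 treated)
x <- rbind(
  F1 = c( 100,  110,  105,  400,  420,  410),
  F2 = c( 500,  520,  480,  510,  495,  505),
  F3 = c(1000, 1020,  980, 1010,  990, 1000),
  F4 = c(2000, 1950, 2050, 2010, 1990, 2000),
  F5 = c(3000, 3100, 2900, 3050, 2950, 3000),
  F6 = c(6000, 6100, 5900, 6050, 5950, 6000)
)
grp <- factor(rep(c("ctrl", "trt"), each = 3))

## log2-transform, then center each sample (column) on its median
lx <- log2(x)
lx <- sweep(lx, 2, apply(lx, 2, median))

## Per-feature Welch t-test and log2 fold change, BH-adjusted p-values
p <- apply(lx, 1, function(v) t.test(v[grp == "trt"], v[grp == "ctrl"])$p.value)
res <- data.frame(
  log2FC = rowMeans(lx[, grp == "trt"]) - rowMeans(lx[, grp == "ctrl"]),
  p      = p,
  adj_p  = p.adjust(p, method = "BH")
)
res[order(res$p), ]
```

Median centering assumes that most features do not change between samples; here only F1 differs between the groups (roughly 4-fold, a log2 fold change close to 2).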

But wait - what are we actually measuring?

  • LC-MS features characterized by their m/z and retention time.
  • Goal: annotate these features to metabolites (compounds).

One step back: ionization

name      formula    exactmass  [M+H]+  [M+Na]+
Caffeine  C8H10N4O2  194.1      195.1   217.1

  • The molecule is not charged and therefore cannot be detected by MS.
  • Electrospray ionization (ESI) creates ions, potentially several different ones (adducts) per molecule.
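The adduct m/z values in the table follow from simple arithmetic: for a singly charged ion of one molecule, the m/z is the neutral monoisotopic mass plus the mass of the added charged species (a proton or a sodium ion, i.e. the atom mass minus one electron). A base R sketch for caffeine; the monoisotopic mass 194.0804 Da is computed from the formula C8H10N4O2. Packages such as MetaboCoreUtils provide this conversion (e.g. `mass2mz()`).

```r
## Monoisotopic mass of caffeine, C8H10N4O2 (Da)
M <- 194.0803756

## Mass added by each adduct: a proton (H+) or a sodium ion (Na+),
## i.e. the atom's mass minus the mass of one electron
adduct_shift <- c("[M+H]+" = 1.00727646688, "[M+Na]+" = 22.98922070)

## Singly charged ions of one molecule: m/z = M + shift (charge z = 1)
mz <- M + adduct_shift
round(mz, 4)
```

Rounded to one decimal, the results (195.0877 and 217.0696) agree with the 195.1 and 217.1 in the table.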

Annotation: using m/z values

  • Match measured m/z values against the reference m/z values of the expected ions (e.g. [M+H]+, [M+Na]+) of known compounds:

library(MetaboAnnotation)  # matchValues(), Mass2MzParam()
mtch <- matchValues(query, target,
                    Mass2MzParam(c("[M+H]+", "[M+Na]+")))
  • query: experimental m/z values.
  • target: reference masses (e.g. from HMDB, ChEBI, PubChem, …).
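The core of such m/z-based matching can be sketched in base R: compute the expected m/z of every reference compound for every adduct, then keep the pairs within a ppm tolerance of a measured m/z. This only illustrates the principle and is not how matchValues() is implemented; the reference table, the query values, the helper `match_mz()` and the 5 ppm tolerance are hypothetical.

```r
## Hypothetical reference table (target) with exact masses (Da)
target <- data.frame(
  name      = c("Caffeine", "Enprofylline", "Glucose"),
  exactmass = c(194.0803756, 194.0803756, 180.0633881),
  stringsAsFactors = FALSE
)
## Hypothetical measured m/z values (query)
query <- c(195.0877, 203.0526, 300.1234)
## Mass shifts of the adducts considered (H+ and Na+, electron mass removed)
adducts <- c("[M+H]+" = 1.00727646688, "[M+Na]+" = 22.98922070)

## Hypothetical helper: report every compound/adduct whose expected m/z is
## within `ppm` parts per million of a query m/z
match_mz <- function(query, target, adducts, ppm = 5) {
  ## one row per compound and adduct, with the expected m/z
  ref <- expand.grid(i = seq_len(nrow(target)), adduct = names(adducts),
                     stringsAsFactors = FALSE)
  ref$name <- target$name[ref$i]
  ref$mz <- target$exactmass[ref$i] + unname(adducts[ref$adduct])
  res <- lapply(query, function(q) {
    hit <- abs(ref$mz - q) <= q * ppm / 1e6
    if (!any(hit)) return(NULL)
    data.frame(query_mz = q, ref[hit, c("name", "adduct", "mz")],
               row.names = NULL)
  })
  do.call(rbind, res)
}

mtch <- match_mz(query, target, adducts)
mtch
```

The first query m/z matches both caffeine and enprofylline: compounds with the same formula have the same m/z, so m/z alone cannot tell them apart. The last query has no match within 5 ppm and is not reported.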

A little additional complication

name          formula    exactmass  [M+H]+  [M+Na]+
Caffeine      C8H10N4O2  194.1      195.1   217.1
Enprofylline  C8H10N4O2  194.1      195.1   217.1

  • Enprofylline: asthma treatment agent.
  • Compounds can have the same formula: how can we distinguish them?
  • They differ in their structure and will thus separate in the LC, i.e. they have different retention times.

Annotation: using m/z and retention time

  • Annotation using m/z and retention time:

mtch <- matchValues(query, target,
                    Mass2MzRtParam(c("[M+H]+", "[M+Na]+")))
  • Requires reference retention times for the compounds.
  • These are specific to the instrument setup/lab.
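Adding a retention time tolerance to the m/z matching resolves such isomers, provided reference retention times measured on the same LC setup exist. A base R sketch that illustrates the principle, not the implementation of Mass2MzRtParam; the reference retention times (180 s and 240 s), the query features, the helper `match_mz_rt()` and the tolerances (5 ppm, 10 s) are all hypothetical.

```r
## Hypothetical reference: [M+H]+ m/z of the two isomers plus reference
## retention times (seconds), assumed to be measured on the same LC setup
target <- data.frame(
  name = c("Caffeine", "Enprofylline"),
  mz   = c(195.0876520, 195.0876520),
  rt   = c(180, 240),
  stringsAsFactors = FALSE
)
## Hypothetical LC-MS features with measured m/z and retention time
query <- data.frame(mz = c(195.0877, 195.0876), rt = c(182, 238))

## Hypothetical helper: names of target compounds within `ppm` in m/z
## and within `rt_tol` seconds in retention time of each query feature
match_mz_rt <- function(query, target, ppm = 5, rt_tol = 10) {
  lapply(seq_len(nrow(query)), function(q) {
    ok <- abs(target$mz - query$mz[q]) <= query$mz[q] * ppm / 1e6 &
          abs(target$rt - query$rt[q]) <= rt_tol
    target$name[ok]
  })
}

match_mz_rt(query, target)                # rt resolves the isomers
match_mz_rt(query, target, rt_tol = Inf)  # m/z only: both isomers match
```

With `rt_tol = Inf` the retention time is effectively ignored and both isomers match every feature, which is the situation of m/z-only annotation.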

Annotation: using MS2 spectra

  • We can fragment ions to get some information on their structure.

  • LC-MS/MS data: MS1 for quantification, MS2 for annotation.

Annotation: using MS2 spectra

  • If we have MS2 spectra associated with features, we can match them against reference spectra.
  • Compare spectra by calculating similarity scores.

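library(MetaboAnnotation)  # matchSpectra(), CompareSpectraParam()
## query, target: Spectra objects with experimental and reference MS2 spectra
mtch <- matchSpectra(query, target,
                     CompareSpectraParam())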
  • Limitation: availability of reference spectra.
  • Public reference databases are growing, collecting data shared by researchers.
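Spectral matching boils down to pairing fragment peaks between two spectra and summarizing their agreement in a score. A simplified cosine similarity in base R, assuming centroided spectra and a fixed 0.01 Da tolerance; the spectra and the helper `cosine_sim()` are hypothetical, and the similarity functions used by Spectra/MetaboAnnotation (by default a normalized dot product) handle peak matching and intensity weighting in their own way.

```r
## Hypothetical centroided MS2 spectra: m/z and intensity of fragment peaks
query <- data.frame(mz = c(100.05, 120.08, 150.10, 195.09),
                    int = c(10, 30, 100, 60))
ref_same  <- data.frame(mz = query$mz + 0.002, int = query$int)
ref_other <- data.frame(mz = c(60.02, 85.04), int = c(50, 100))

## Simplified cosine similarity: pair each query peak with the closest
## unused target peak within `tol` (Da); unpaired peaks are matched to 0
cosine_sim <- function(x, y, tol = 0.01) {
  used <- logical(nrow(y))
  ix <- numeric(0)
  iy <- numeric(0)
  for (i in seq_len(nrow(x))) {
    d <- abs(y$mz - x$mz[i])
    d[used] <- Inf
    j <- which.min(d)
    ix <- c(ix, x$int[i])
    if (length(j) > 0 && d[j] <= tol) {
      used[j] <- TRUE
      iy <- c(iy, y$int[j])
    } else {
      iy <- c(iy, 0)
    }
  }
  ## target peaks without a partner in the query
  ix <- c(ix, rep(0, sum(!used)))
  iy <- c(iy, y$int[!used])
  sum(ix * iy) / sqrt(sum(ix^2) * sum(iy^2))
}

cosine_sim(query, ref_same)   # same fragments and relative intensities
cosine_sim(query, ref_other)  # no shared fragments
```

For non-negative intensities the score lies between 0 (no shared fragments) and 1 (identical relative intensities at matching m/z values); the first call returns 1 and the second 0.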

Workshops

Thank you for your attention