Methods in automated glycosaminoglycan tandem mass spectra analysis
Abstract
Glycosylation is the process by which a glycan is enzymatically attached to a protein, and is one of the most common post-translational modifications in nature. One class of glycans is the glycosaminoglycans (GAGs), which are long, linear polysaccharides that are variably sulfated and make up the glycan portion of proteoglycans (PGs). PGs are located on the cellular surface and in the extracellular matrix (ECM), making them important molecules for cell signaling and ligand binding. The GAG sulfation sequence is a determining factor for the signaling capacity of binding complexes, so accurate determination of the sequence is critical. Historically, GAG sequencing using tandem mass spectrometry (MS2) has been a difficult, manual process; however, with the advent of faster computational techniques and higher-resolution MS2, high-throughput GAG sequencing is within reach.
Two steps in the MS2 biomolecule sequencing pipeline are the discovery and the interpretation of spectral peaks. The discovery step is traditionally performed using methods that rely on the concept of averagine, the average molecular building block of the analyte in question. These methods were developed for protein sequencing and perform considerably worse on GAGs, because sulfur atoms are distributed non-uniformly along the chain and ³⁴S has a relatively high isotopic abundance. The interpretation step is traditionally performed manually, which takes time and introduces potential user error. To address these problems, I developed GAGfinder, the first GAG-specific MS2 peak finding and annotation software. GAGfinder is described in detail in chapter two.
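To illustrate the averagine idea mentioned above, the following minimal Python sketch scales the commonly cited protein averagine unit to a target mass and reports the estimated elemental composition. The function name `averagine_composition` and the 2000 Da example mass are illustrative assumptions; this is not GAGfinder's code.

```python
# Protein averagine as commonly cited in the literature:
# average number of atoms per residue and average residue mass in Da.
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254


def averagine_composition(mass):
    """Estimate an elemental composition by scaling averagine to `mass` (Da).

    Hypothetical helper for illustration; returns fractional atom counts.
    """
    units = mass / AVERAGINE_MASS  # number of averagine units that fit in the mass
    return {element: count * units for element, count in AVERAGINE.items()}


if __name__ == "__main__":
    estimate = averagine_composition(2000.0)  # example mass, chosen arbitrarily
    for element, count in estimate.items():
        print(f"{element}: {count:.2f}")
```

For a 2000 Da analyte the protein model predicts fewer than one sulfur atom, whereas a sulfated GAG oligosaccharide of similar mass can carry several. This mismatch is one way to see why protein-derived averagine models fit GAG isotope patterns poorly.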
Another step in MS2 sequencing is determining the sequence from the fragments found in the spectrum. For a given GAG composition there are many possible sequences, and peak finding algorithms such as GAGfinder return a list of the peaks in the MS2 spectrum. Each candidate sequence can produce many fragments, and each fragment can arise from many sequences; this many-to-many relationship can be represented as a bipartite network, and node-ranking techniques can then be used to generate likelihood scores for the possible sequences. I developed GAGrank, a sequencing tool based on a bipartite network extension of Google’s PageRank algorithm for ranking websites. GAGrank is described in detail in chapter three.
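The bipartite ranking idea can be sketched in a few lines of Python. The code below is a generic PageRank-style alternating update on a sequence-fragment network, not necessarily the update rule GAGrank uses. The function `rank_bipartite`, the damping value, and the toy sequences and fragments are all hypothetical.

```python
def rank_bipartite(edges, frag_prior, damping=0.85, iters=100):
    """Score candidate sequences on a bipartite sequence-fragment network.

    edges:      dict mapping each candidate sequence to the set of fragments
                it could produce (every set must be non-empty).
    frag_prior: dict mapping fragment -> prior weight, e.g. observed intensity
                (an assumption made for this sketch).
    """
    seqs = list(edges)
    frags = sorted({f for fs in edges.values() for f in fs})
    # Number of candidate sequences that can explain each fragment.
    frag_deg = {f: sum(f in edges[s] for s in seqs) for f in frags}

    total = sum(frag_prior.get(f, 0.0) for f in frags)
    if total <= 0:
        raise ValueError("at least one fragment needs a positive prior")
    prior = {f: frag_prior.get(f, 0.0) / total for f in frags}

    seq_score = {s: 1.0 / len(seqs) for s in seqs}
    for _ in range(iters):
        # Each sequence splits its score evenly over its fragments; the
        # damping term pulls fragment scores back toward the observed prior.
        frag_score = {
            f: (1 - damping) * prior[f]
            + damping * sum(seq_score[s] / len(edges[s]) for s in seqs if f in edges[s])
            for f in frags
        }
        # Each fragment splits its score evenly over the sequences it supports.
        seq_score = {
            s: sum(frag_score[f] / frag_deg[f] for f in edges[s]) for s in seqs
        }
    return seq_score


if __name__ == "__main__":
    # Toy network: seqA explains three observed fragments, seqB only one.
    toy_edges = {"seqA": {"f1", "f2", "f3"}, "seqB": {"f1", "f4"}}
    toy_prior = {"f1": 1.0, "f2": 1.0, "f3": 1.0, "f4": 0.0}
    print(rank_bipartite(toy_edges, toy_prior))
```

In this toy case the sequence whose theoretical fragments match more of the observed peaks ends up with the higher score, and the scores sum to one, so they can be read as relative likelihoods under the sketch's assumptions.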
License
Attribution-NonCommercial 4.0 International