De novo sequencing of heparan sulfate saccharides using high-resolution tandem mass spectrometry
Heparan sulfate (HS) is a class of linear, sulfated polysaccharides located on cell surface, secretory granules, and in extracellular matrices found in all animal organ systems. It consists of alternately repeating disaccharide units, expressed in animal species ranging from hydra to higher vertebrates including humans. HS binds and mediates the biological activities of over 300 proteins, including growth factors, enzymes, chemokines, cytokines, adhesion and structural proteins, lipoproteins and amyloid proteins. The binding events largely depend on the fine structure - the arrangement of sulfate groups and other variations - on HS chains. With the activated electron dissociation (ExD) high-resolution tandem mass spectrometry technique, researchers acquire rich structural information about the HS molecule. Using this technique, covalent bonds of the HS oligosaccharide ions are dissociated in the mass spectrometer. However, this information is complex, owing to the large number of product ions, and contains a degree of ambiguity due to the overlapping of product ion masses and lability of sulfate groups; as a result, there is a serious barrier to manual interpretation of the spectra. The interpretation of such data creates a serious bottleneck to the understanding of the biological roles of HS. In order to solve this problem, I designed HS-SEQ - the first HS sequencing algorithm using high-resolution tandem mass spectrometry. HS-SEQ allows rapid and confident sequencing of HS chains from millions of candidate structures and I validated its performance using multiple known pure standards. In many cases, HS oligosaccharides exist as mixtures of sulfation positional isomers. I therefore designed MULTI-HS-SEQ, an extended version of HS-SEQ targeting spectra coming from more than one HS sequence. I also developed several pre-processing and post-processing modules to support the automatic identification of HS structure. These methods and tools demonstrated the capacity for large-scale HS sequencing, which should contribute to clarifying the rich information encoded by HS chains as well as developing tailored HS drugs to target a wide spectrum of diseases.