Boston University Libraries OpenBU

    Statistical methods & algorithms for autonomous immunoglobulin repertoire analysis

    License
    Attribution-NonCommercial-ShareAlike 4.0 International
    Date Issued
    2020
    Author(s)
    Norwood, Katherine Frances
    Permanent Link
    https://hdl.handle.net/2144/41875
    Abstract
    Investigating the immunoglobulin repertoire is a means of understanding the adaptive immune response to infectious disease or vaccine challenge. The data examined are typically generated by high-throughput sequencing of immunoglobulin variable-region genes present in blood or tissue samples collected from human or animal subjects. Analysis of these large, diverse collections provides insight into the specific molecular mechanisms involved in generating and maintaining a protective immune response. It involves the characterization of distinct clonal populations, specifically through inference of the founding alleles for germline gene segment recombination and of the lineage of mutations accumulated during the development of each clone. Germline gene segment inference is currently performed by aligning immunoglobulin sequencing reads against an external reference database and assigning each read to the entry that provides the best score according to the metric used. The problem with this approach is that allelic diversity is greater than can be usefully accommodated in a static database. When the alleles actually used are absent from the database, single-nucleotide polymorphisms are often misclassified as somatic mutations acquired during affinity maturation. This problem is especially evident with the rhesus macaque, but it also affects the comparatively well-catalogued human databases, whose collections are biased towards samples from individuals of European descent. Our project presents novel statistical methods for immunoglobulin repertoire analysis that allow de novo inference of germline gene segment libraries directly from next-generation sequencing data, without the need for external reference databases. These methods follow a Bayesian paradigm, which uses an information-theoretic modelling approach to iteratively improve upon internal candidate gene segment libraries. Both candidate libraries and trial analyses given those libraries are incorporated as components of the machine learning evaluation procedure, allowing for the simultaneous optimization of model accuracy and simplicity. Finally, the proposed methods are evaluated using synthetic data designed to mimic known mechanisms for repertoire generation, with pre-designated parameters. We also apply these methods to known biological sources with unknown repertoire generation parameters, and conclude with a discussion of how this method can be used to identify potential novel alleles.
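
    Illustrative note (not part of the thesis): the misclassification problem the abstract describes for database-based germline assignment can be sketched with a toy example. Everything here is an assumption made for illustration: the allele names and sequences are invented, and a plain mismatch count on equal-length strings stands in for a real alignment score. The sketch only shows why a read from an unlisted allele picks up a spurious "mutation" when it is assigned to the closest database entry.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_best(read: str, database: dict[str, str]) -> tuple[str, int]:
    """Assign a read to the database entry with the fewest mismatches.

    Returns the entry name and the mismatch count, which a
    database-based pipeline would interpret as somatic mutations.
    """
    name = min(database, key=lambda n: hamming(read, database[n]))
    return name, hamming(read, database[name])


# Hypothetical reference database (invented names and sequences).
database = {
    "V1*01": "ACGTACGTAC",
    "V2*01": "TTGTACCTAA",
}

# Hypothetical allele carried by the subject but missing from the
# database; it differs from V1*01 by a single polymorphism.
unlisted_allele = "ACGAACGTAC"

# An unmutated read derived from the unlisted allele.
read = unlisted_allele

# Static database: the polymorphism is counted as one mutation.
static_result = assign_best(read, database)

# Library that includes the subject's allele: no mutation is inferred.
extended = {**database, "V1*02": unlisted_allele}
extended_result = assign_best(read, extended)

print(static_result, extended_result)
```

    With the static database the read is assigned to the closest listed allele and its single polymorphism is reported as a mismatch; once the subject's own allele is in the library, the same read matches exactly. This is the motivation the abstract gives for inferring the germline library from the sequencing data itself; the sketch does not attempt to reproduce the thesis's Bayesian inference procedure.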
    Collections
    • Boston University Theses & Dissertations [6891]

