
dc.contributor.advisor | Kepler, Thomas B. | en_US
dc.contributor.author | Norwood, Katherine Frances | en_US
dc.date.accessioned | 2021-01-13T14:23:49Z
dc.date.available | 2021-01-13T14:23:49Z
dc.date.issued | 2020
dc.identifier.uri | https://hdl.handle.net/2144/41875
dc.description.abstract | Investigating the immunoglobulin repertoire is a means of understanding the adaptive immune response to infectious disease or vaccine challenge. The data examined are typically generated by high-throughput sequencing of immunoglobulin variable-region genes from blood or tissue samples collected from human or animal subjects. Analyzing these large, diverse collections offers insight into the specific molecular mechanisms involved in generating and maintaining a protective immune response. This analysis involves characterizing distinct clonal populations, specifically by inferring the founding alleles used in germline gene segment recombination and the lineage of mutations each clone accumulates during its development. Germline gene segment inference is currently performed by aligning immunoglobulin sequencing reads against an external reference database and assigning each read to the entry with the best score under the chosen metric. The problem with this approach is that allelic diversity is greater than a static database can usefully accommodate. When the alleles actually present in a subject are missing from the database, single-nucleotide polymorphisms are often misclassified as somatic mutations acquired during affinity maturation. This problem is especially evident in the rhesus macaque, but it also affects the comparatively well-catalogued human databases, whose collections are biased towards samples from individuals of European descent. Our project presents novel statistical methods for immunoglobulin repertoire analysis that allow de novo inference of germline gene segment libraries directly from next-generation sequencing data, without the need for external reference databases. These methods follow a Bayesian paradigm and use an information-theoretic modelling approach to iteratively improve internal candidate gene segment libraries.
Both the candidate libraries and the trial analyses conditioned on those libraries are incorporated as components of the machine learning evaluation procedure, allowing model accuracy and simplicity to be optimized simultaneously. Finally, the proposed methods are evaluated using synthetic data designed to mimic known mechanisms of repertoire generation, with pre-specified parameters. We also apply these methods to biological samples from known sources whose repertoire generation parameters are unknown, and conclude with a discussion of how the methods can be used to identify potential novel alleles. | en_US
dc.language.iso | en_US
dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International | en_US
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject | Bioinformatics | en_US
dc.subject | Bayesian | en_US
dc.subject | Dirichlet | en_US
dc.subject | Immunoglobulin | en_US
dc.subject | Repertoire | en_US
dc.title | Statistical methods & algorithms for autonomous immunoglobulin repertoire analysis | en_US
dc.type | Thesis/Dissertation | en_US
dc.date.updated | 2021-01-13T05:03:06Z
etd.degree.name | Doctor of Philosophy | en_US
etd.degree.level | doctoral | en_US
etd.degree.discipline | Bioinformatics GRS | en_US
etd.degree.grantor | Boston University | en_US
dc.identifier.orcid | 0000-0001-7505-6293
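The abstract describes reference-based germline assignment (each read goes to the best-scoring database entry) and its failure mode, where a missing allele turns a single-nucleotide polymorphism into an apparent somatic mutation. The following is a minimal illustrative sketch of that failure mode, not the thesis's code. It assumes pre-aligned, equal-length toy sequences, uses matching-position identity as the scoring metric, and labels its alleles with hypothetical names (`IGHV-toy*01`, `IGHV-toy*02`).

```python
"""Toy illustration of best-score germline assignment (hypothetical data).

Assumptions: sequences are pre-aligned to equal length and scored by identity.
"""
from __future__ import annotations


def identity(a: str, b: str) -> int:
    """Count matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b))


def assign_germline(read: str, library: dict[str, str]) -> tuple[str, list[int]]:
    """Assign the read to the highest-identity library entry.

    Returns the chosen entry and the mismatch positions, which a naive
    pipeline would report as somatic mutations.
    """
    best = max(library, key=lambda name: identity(read, library[name]))
    germline = library[best]
    mismatches = [i for i, (x, y) in enumerate(zip(read, germline)) if x != y]
    return best, mismatches


if __name__ == "__main__":
    # Read derived from *02 with one true somatic mutation at position 7.
    read = "ACGAACGGAC"
    full = {"IGHV-toy*01": "ACGTACGTAC", "IGHV-toy*02": "ACGAACGTAC"}
    partial = {"IGHV-toy*01": "ACGTACGTAC"}  # *02 missing from the reference
    print(assign_germline(read, full))     # ('IGHV-toy*02', [7])
    print(assign_germline(read, partial))  # ('IGHV-toy*01', [3, 7])
```

With the complete library only the real mutation at position 7 is reported. When `*02` is absent, the allelic difference at position 3 is also counted as a "mutation", which is the misclassification the abstract describes.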

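The abstract says candidate libraries are scored with an information-theoretic approach that trades off model accuracy against simplicity. This record does not describe the actual Bayesian (Dirichlet-based) model, so the sketch below swaps in a generic two-part minimum-description-length score purely to illustrate that trade-off. The bit costs (2 bits per library base, position plus substituted base per mismatch) are assumptions chosen for illustration, and the sequences are hypothetical toy data.

```python
"""Generic two-part description-length score for a candidate germline library.

Illustrative assumption, not the thesis's model:
total bits = L(library) + L(reads | library).
Assumes all reads and alleles are pre-aligned and share one length.
"""
import math


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def description_length(library: list[str], reads: list[str]) -> float:
    """Return the approximate code length, in bits, of library plus reads."""
    length = len(library[0])
    model_bits = 2.0 * sum(len(s) for s in library)  # 2 bits per base (A/C/G/T)
    choice_bits = math.log2(len(library))             # which allele each read uses
    count_bits = math.log2(length + 1)                # mismatch count, 0..length
    per_mismatch = math.log2(length) + math.log2(3)   # position + substituted base
    data_bits = 0.0
    for read in reads:
        m = min(mismatches(read, s) for s in library)  # best-scoring allele
        data_bits += choice_bits + count_bits + m * per_mismatch
    return model_bits + data_bits


if __name__ == "__main__":
    a01, a02 = "ACGTACGTAC", "ACGAACGTAC"  # hypothetical toy alleles
    many = [a02] * 10 + [a01] * 5   # variant allele is common
    few = [a02] * 1 + [a01] * 14    # variant allele seen once
    print(description_length([a01, a02], many) < description_length([a01], many))  # True
    print(description_length([a01], few) < description_length([a01, a02], few))    # True
```

Adding an allele costs bits up front but saves bits on every read it explains. Under this toy score, the larger library wins only when enough reads support the extra allele, which is one simple way to realize the "accuracy versus simplicity" balance.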



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International