dc.contributor.advisor: Benson, Gary (en_US)
dc.contributor.author: Eslami Rasekh, Marzieh (en_US)
dc.date.accessioned: 2021-10-14T15:17:14Z
dc.date.available: 2021-10-14T15:17:14Z
dc.date.issued: 2021
dc.identifier.uri: https://hdl.handle.net/2144/43144
dc.description.abstract: Over half the human genome consists of repetitive sequences. One major class is tandem repeats (TRs), each defined by its location in the genome, repeat unit, and copy number. TR loci that exhibit variant copy numbers are called Variable Number Tandem Repeats (VNTRs). The high mutation rate of VNTRs, approximately 0.0001 per generation, makes them suitable for forensic studies and of interest for their potential roles in gene regulation and disease. TRs are generally divided into three classes: 1) microsatellites or short tandem repeats (STRs) with patterns <7 bp; 2) minisatellites with patterns of seven to hundreds of base pairs; and 3) macrosatellites with patterns of >100 bp. To date, mini- and macrosatellites have been poorly characterized, mainly due to a lack of computational tools. In this thesis, I used a tool, VNTRseek, to identify human minisatellite VNTRs in short-read sequencing data from nearly 2,800 individuals, and I developed a new computational tool, MaSUD, to identify human macrosatellite VNTRs in data from 2,504 individuals. MaSUD is the first high-throughput tool to genotype macrosatellites using short reads. I identified over 35,000 minisatellite VNTRs and over 4,000 macrosatellite VNTRs, most of them previously unknown. A small subset of each VNTR class was validated experimentally and in silico. The detected VNTRs were further studied for their effects on gene expression, their ability to distinguish human populations, and functional enrichment. Unlike STRs, mini- and macrosatellite VNTRs are enriched in regions of functional importance, e.g., introns, promoters, and transcription factor binding sites. A study of VNTRs across 26 populations shows that minisatellite VNTR genotypes can be used to predict super-populations with >90% accuracy. In addition, genotypes for 195 minisatellite VNTRs and 22 macrosatellite VNTRs were shown to be associated with differential expression of nearby genes (eQTLs).
Finally, I developed a computational tool, mlZ, to infer undetected VNTR alleles and to detect false positive predictions. mlZ is applicable to other tools that use read support for predicting short variants. Overall, these studies provide the most comprehensive analysis of mini- and macrosatellites in human populations and will facilitate the application of VNTRs for clinical purposes. (en_US)
dc.language.iso: en_US
dc.rights: Attribution-NonCommercial 4.0 International (en_US)
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Bioinformatics (en_US)
dc.subject: Genomic variation (en_US)
dc.subject: Genotyping (en_US)
dc.subject: Macrosatellite (en_US)
dc.subject: Minisatellite (en_US)
dc.subject: Tandem repeats (en_US)
dc.subject: VNTR (en_US)
dc.title: Characterizing VNTRs in human populations (en_US)
dc.type: Thesis/Dissertation (en_US)
dc.date.updated: 2021-10-04T19:30:02Z
etd.degree.name: Doctor of Philosophy (en_US)
etd.degree.level: doctoral (en_US)
etd.degree.discipline: Bioinformatics GRS (en_US)
etd.degree.grantor: Boston University (en_US)
dc.identifier.orcid: 0000-0003-0046-158X


Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International