Show simple item record

dc.contributor.author: Manimaran, Solaiappan (en_US)
dc.date.accessioned: 2017-03-21T17:54:07Z
dc.date.available: 2017-03-21T17:54:07Z
dc.date.issued: 2017
dc.identifier.uri: https://hdl.handle.net/2144/20879
dc.description.abstract: Sequencing technologies have advanced tremendously: the rate at which sequencing data can be generated has increased multifold, while the cost of sequencing continues to fall. Sequencing data provide novel insights into the ecological environment of microbes as well as human health and disease status, but challenge investigators with a variety of computational issues. This thesis focuses on three common problems in the analysis of high-throughput data. The goals of the first project are to (1) develop a statistical framework and a complete software pipeline for metagenomics that identifies microbes to the strain level, thereby facilitating personalized drug treatment that targets the strain; and (2) estimate the relative content of microbes in a sample as accurately and as quickly as possible. The second project focuses on the analysis of microbiome variation across multiple samples. Studying the variation of microbiomes under different conditions within an organism or environment is key to diagnosing diseases and providing personalized treatments. The goals are to (1) identify various statistical diversity measures; (2) develop confidence regions for the relative abundance estimates; (3) perform multi-dimensional and differential expression analysis; and (4) develop a complete pipeline for multi-sample microbiome analysis. The third project focuses on batch effect analysis. In high-dimensional data, non-biological experimental variation, or “batch effects”, confounds the true associations between the conditions of interest and the outcome variable. Batch effects persist even after normalization. Hence, unless batch effects are identified and corrected, downstream analyses will likely be error-prone and may produce false-positive results. The goals are to (1) analyze the effect of correlation in batch-adjusted data and develop new techniques to account for that correlation in a two-step hypothesis testing approach; and (2) develop a software pipeline that identifies whether batch effects are present in the data and adjusts for them in a suitable way. In summary, we developed software pipelines called PathoScope, PathoStat and BatchQC as part of these projects and validated our techniques using simulated and real data sets. (en_US)
dc.language.iso: en_US
dc.subject: Biostatistics (en_US)
dc.subject: Batch Effects BatchQC R package (en_US)
dc.subject: Bayesian modeling (en_US)
dc.subject: Diagnostics and personalized medicine (en_US)
dc.subject: Metagenomics PathoScope software (en_US)
dc.subject: Microbiome analysis PathoStat (en_US)
dc.subject: Next-gen sequencing cancer biomarkers (en_US)
dc.title: Statistical methods for analyzing sequencing data with applications in modern biomedical analysis and personalized medicine (en_US)
dc.type: Thesis/Dissertation (en_US)
dc.date.updated: 2017-03-13T22:08:40Z
etd.degree.name: Doctor of Philosophy (en_US)
etd.degree.level: doctoral (en_US)
etd.degree.discipline: Biostatistics (en_US)
etd.degree.grantor: Boston University (en_US)

