Structured clustering representations and methods
Heilbut, Adrian Mark
Rather than designing focused experiments to test individual hypotheses, scientists now commonly acquire measurements using massively parallel techniques, for post hoc interrogation. The resulting data is both high-dimensional and structured, in that observed variables are grouped and ordered into related subspaces, reflecting both natural physical organization and factorial experimental designs. Such structure encodes critical constraints and clues to interpretation, but typical unsupervised learning methods assume exchangeability and do not account for the structure of the data in a flexible and interpretable way. In this thesis, I develop computational methods for exploratory analysis of structured high-dimensional data, and apply them to study gene expression regulation in Parkinson’s disease (PD) and Huntington’s disease (HD).
BOMBASTIC (Block-Organized, Model-Based, Tree-Indexed Clustering) is a methodology to cluster and visualize data organized in pre-specified subspaces, by combining independent clusterings of blocks into hierarchies. BOMBASTIC provides a formal specification of the block-clustering problem and a modular implementation that facilitates integration, visualization, and comparison of diverse datasets and rapid exploration of alternative analyses.
These tools, along with standard methods, were applied to study gene expression in mouse models of neurodegenerative diseases, in collaboration with Dr. Myriam Heiman and Dr. Robert Fenster. In PD, I analyzed cell-type-specific expression following levodopa treatment to study mechanisms underlying levodopa-induced dyskinesia (LID). I identified likely regulators of the transcriptional changes leading to LID and implicated signaling pathways amenable to pharmacological modulation (Heiman, Heilbut et al., 2014).
In HD, I analyzed multiple mouse models (Kuhn, 2007), cell-type-specific profiles of medium spiny neurons (Fenster, 2011), and an RNA-Seq dataset profiling multiple tissue types over time and across an mHTT allelic series (CHDI, 2015). I found evidence suggesting that altered activity of the PRC2 complex significantly contributes to the transcriptional dysregulation observed in striatal neurons in HD.
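The abstract's description of block clustering (cluster each pre-specified block of variables independently, then combine the per-block clusterings into a hierarchy) can be illustrated with a minimal sketch. This is not the BOMBASTIC implementation: the function names `kmeans` and `block_cluster_tree`, the toy data, and the use of a tiny deterministic k-means as a stand-in for a model-based clusterer are all assumptions made purely for illustration.

```python
from collections import defaultdict

def kmeans(points, k, iters=20):
    """Tiny deterministic k-means (assumed stand-in for a model-based clusterer)."""
    centers = [list(p) for p in points[:k]]  # naive init: the first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def block_cluster_tree(rows, blocks, k=2):
    """Cluster each block of columns independently, then nest the labels.

    rows   -- list of observations (e.g. genes), each a list of measurements
    blocks -- ordered list of (block name, column indices); hypothetical layout
    Returns (per-block labels, nested dict indexed by (block, label) pairs).
    """
    per_block = {}
    for name, cols in blocks:
        sub = [[row[c] for c in cols] for row in rows]
        per_block[name] = kmeans(sub, k)

    def build(indices, depth):
        if depth == len(blocks):
            return sorted(indices)  # leaf: row indices sharing all block labels
        name = blocks[depth][0]
        groups = defaultdict(list)
        for i in indices:
            groups[per_block[name][i]].append(i)
        return {(name, lab): build(members, depth + 1)
                for lab, members in sorted(groups.items())}

    return per_block, build(range(len(rows)), 0)

# Toy data: 4 rows, two blocks ("A" = columns 0-1, "B" = columns 2-3).
rows = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 0.1],
    [5.0, 5.1, 5.0, 5.0],
]
blocks = [("A", [0, 1]), ("B", [2, 3])]
labels, tree = block_cluster_tree(rows, blocks)
print(labels)
print(tree)
```

In this sketch the order of `blocks` determines the nesting order of the tree, so reordering the blocks gives an alternative hierarchical view of the same per-block clusterings without re-clustering anything.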