Bayesian regression for network data
Upton, Elizabeth Leighton Mary
The research contained in this dissertation extends modeling methods for network data. Networks are widely used across a number of disciplines to represent objects and their interconnectedness. The prevalence of this data structure is one of our motivations for developing novel modeling methods and computational tools that improve our understanding of network-indexed data. We first consider the problem of statistical inference and prediction for processes defined on networks. We assume that the network of interest is known, and we would like to learn more about an attribute associated with its vertices. Drawing on ideas from functional data analysis, our proposed model consists of node-indexed predictors and a basis expansion of their coefficients, allowing the coefficients to vary over the network. We employ a regularization procedure, cast as a prior distribution on the regression coefficients in a Bayesian setup, so that predicted responses vary smoothly according to the topology of the network. We present a novel variable selection technique, introduce efficient expectation-maximization fitting algorithms and Markov chain Monte Carlo sampling schemes, and provide computationally friendly methods for eliciting hyper-prior parameters. Turning to an application, we study occurrences of residential burglary in Boston, Massachusetts. Noting that crime rates are not spatially homogeneous, and that rates appear to vary sharply across regions or hot zones in the city, we construct a hierarchical model that addresses these issues and gives insight into the spatial patterns and dynamics of residential burglary in Boston. We then address the computational challenges of performing inference on network structure. With the goal of understanding the processes behind edge formation within a network of given size, we present algorithms and data representations that allow for more efficient inference on large-scale networks.
Through a regression framework, the tools allow for investigating a variety of effects that may shape a network's structure, such as degree heterogeneity and clustering. Finally, with the goal of exploring the relationship between a set of predictor variables and a vertex-pair-indexed response, we introduce a flexible approach to modeling network ties. Through a generalized linear model framework, we are able to model weighted and binary edges while investigating a variety of effects or features commonly found in networks. We present algorithms and data representations that allow for efficient inference, and we illustrate and evaluate the benefits of our work on both simulated and real-world networks.
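To make the idea of a generalized linear model for network ties concrete, the following is a minimal sketch and not the dissertation's own algorithm or data representation. It treats each vertex pair of a small undirected binary network as one observation and fits a logistic regression with an intercept and a single dyadic covariate. The two-group setup, the "same group" covariate, and the function names `simulate_network` and `fit_dyadic_logit` are all assumptions introduced for illustration.

```python
import math
import random
from itertools import combinations


def simulate_network(n, beta0, beta1, seed=0):
    """Simulate an undirected binary network on n vertices (hypothetical setup).

    Vertices alternate between two groups. Each vertex pair gets an edge with
    probability sigmoid(beta0 + beta1 * same_group). Returns one
    (same_group, edge) tuple per vertex pair.
    """
    rng = random.Random(seed)
    group = [i % 2 for i in range(n)]
    dyads = []
    for i, j in combinations(range(n), 2):
        same = 1.0 if group[i] == group[j] else 0.0
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * same)))
        edge = 1.0 if rng.random() < p else 0.0
        dyads.append((same, edge))
    return dyads


def fit_dyadic_logit(dyads, n_iter=25):
    """Fit edge ~ intercept + covariate by Newton-Raphson (logistic GLM).

    Only two coefficients are estimated, so the 2x2 Newton system is solved
    in closed form instead of calling a linear algebra library.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of X'WX (negative Hessian)
        for x, y in dyads:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = y - p
            w = p * (1.0 - p)
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:         # degenerate design, stop updating
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


if __name__ == "__main__":
    dyads = simulate_network(60, beta0=-2.0, beta1=1.5, seed=1)
    b0, b1 = fit_dyadic_logit(dyads)
    print(f"intercept estimate: {b0:.3f}, same-group effect estimate: {b1:.3f}")
```

Because this toy model has one binary covariate, its maximum likelihood estimates coincide with the log-odds of the observed edge densities among different-group and same-group pairs, which gives a simple check on the fit. Looping over all vertex pairs scales quadratically in the number of vertices, which is the kind of cost that motivates more efficient algorithms and data representations for large networks.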