Welcome To OpenBU

OpenBU is Boston University’s digital institutional repository for scholarly articles, theses and dissertations, preprints, and grey literature. This repository enables BU researchers to share, disseminate, and preserve their scholarship, and makes their research more accessible.
If you are looking for information on BU's opt-out open access policy, please visit the BU Open Access Policy page.
 

Recent Submissions

Item
Essays on econometrics and development economics
(2024) Tian, Xunkang; Kaido, Hiroaki
This dissertation discusses econometrics and development economics, blending advanced methodologies to address distinct yet interconnected issues across these fields. In the first chapter, I utilize the Bayesian inference framework to explore the determinants of pairwise stable network formation in rural Indian villages, highlighting the influence of financial accessibility on social relationships. In the second chapter, I introduce an econometric approach for estimating partially identified parameters in models with moment inequalities, showcasing its application through examples from the US vehicle market and hospital referral models. In the third chapter, a novel method is presented for inferring social network structures using Aggregate Relational Data, offering a cost-effective strategy for network analysis with limited detailed data. Each chapter contributes to understanding complex economic and social networks, with implications for policy and further research.
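The second chapter concerns parameters that are only partially identified by moment inequalities. The notation here is generic and assumed, not taken from the dissertation: observed data W and moment functions m_1, ..., m_J. Under such a model the data identify a set rather than a point. The standard interval-data example shows this:

```latex
% Identified set under moment inequalities
\Theta_I = \{\, \theta \in \Theta : \mathbb{E}[m_j(W, \theta)] \le 0 \ \text{for } j = 1, \dots, J \,\}

% Example: Y is only known to lie in [Y_L, Y_U], and \theta = \mathbb{E}[Y]
% m_1 = Y_L - \theta, \quad m_2 = \theta - Y_U
\Theta_I = \bigl[\, \mathbb{E}[Y_L],\ \mathbb{E}[Y_U] \,\bigr]
```

Inference then targets the whole set Θ_I (or points within it) rather than a single estimate.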
Item
Understanding accretion variability in young, low-mass stars through multi-epoch, multi-wavelength observations
(2024) Wendeborn, John Carlos; Espaillat, Catherine C.
Young, low-mass stars, many of which are precursors to our own solar system, may hold the key to understanding how stellar systems throughout the galaxy form. These systems possess a large, gas- and dust-rich disk within which planets form out of disk material. An important part of disk evolution is accretion, in which disk material is funneled onto the star. Some systems, known as FUors, have undergone enormous, sudden bursts of accretion, brightening by a factor of 1000 within months. I present evidence of variability at 2.7 mm in only the third FUor system, which I attribute to free-free emission produced by recent accretion. I show that this millimeter emission may lead to overestimated disk masses, which complicates our understanding of how mass is accreted early in star formation. I then present the largest multi-wavelength, multi-epoch monitoring campaign to date of four Classical T Tauri Stars (CTTSs). Using accretion shock modeling and Hubble Space Telescope (HST) UV spectra, I show that accretion in these systems is highly time variable, varying by up to a factor of 5 within several days. I also demonstrate that UV emission lines are a poor direct tracer of accretion in CTTSs. Next, I compare the above accretion with contemporaneous uBgVriz and Transiting Exoplanet Survey Satellite (TESS) light curves. I show that while the connection between accretion and photometry is strong, it varies from target to target, and photometry should be used cautiously as a direct measurement of accretion. I also use these light curves to measure color variability and periodicity. Finally, I use contemporaneous optical spectra and an accretion flow model to further estimate accretion properties, including magnetospheric geometries. Overall, this flow model is able to recover accretion rates typical of each system but fails to reproduce the variability seen in our shock modeling and photometry.
In most cases, it estimates R_in, the innermost truncation radius of the disk, to lie between 2.5 and 4 R⋆, less than the typically assumed 5 R⋆. This work shows that young stars, specifically FUors and CTTSs, are highly variable on all timescales and at many wavelengths. To best understand accretion, a critical process in star formation, future studies of young stars should be contemporaneous multi-wavelength, multi-epoch campaigns.
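The abstract contrasts the fitted truncation radius with the commonly assumed 5 R⋆. The sketch below shows why that choice matters. It uses a widely used magnetospheric-accretion relation, Ṁ = (1 − R⋆/R_in)^-1 L_acc R⋆ / (G M⋆). Whether the dissertation uses exactly this conversion is an assumption, and the stellar values are hypothetical. For a fixed accretion luminosity, moving R_in from 5 R⋆ to 2.5 R⋆ raises the inferred Ṁ by a factor of 4/3.

```python
G_CGS = 6.674e-8  # gravitational constant in cm^3 g^-1 s^-2


def truncation_factor(r_in_over_rstar: float) -> float:
    """Return (1 - R*/R_in)^-1, with R_in expressed in units of R*."""
    if r_in_over_rstar <= 1.0:
        raise ValueError("R_in must lie outside the stellar surface (R_in > R*)")
    return 1.0 / (1.0 - 1.0 / r_in_over_rstar)


def accretion_rate(l_acc: float, m_star: float, r_star: float,
                   r_in_over_rstar: float = 5.0) -> float:
    """Mass accretion rate in g/s from L_acc (erg/s), M* (g) and R* (cm)."""
    return truncation_factor(r_in_over_rstar) * l_acc * r_star / (G_CGS * m_star)


# Same (hypothetical) accretion luminosity and star, two truncation radii.
mdot_assumed = accretion_rate(1e32, 1e33, 1e11, r_in_over_rstar=5.0)
mdot_smaller = accretion_rate(1e32, 1e33, 1e11, r_in_over_rstar=2.5)
print(mdot_smaller / mdot_assumed)  # about 1.333
```

Only the ratio matters for the point being made. The absolute Ṁ depends on the assumed L_acc, M⋆ and R⋆.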
Item
Methods for reproducible evaluation of transcriptomic biomarkers in tuberculosis
(2024) Wang, Xutao; Johnson, W. Evan; Patil, Prasad
In the wake of the transition of COVID-19 from a pandemic to an endemic problem, tuberculosis (TB) has reemerged as the most common cause of mortality worldwide due to an infectious disease. To address diagnostic challenges in TB, researchers have developed blood-based gene expression signatures/biomarkers over the past decade. While these signatures show promise for point-of-care testing, further research is necessary to establish their efficacy and reproducibility in clinical settings. In this dissertation, I developed curatedTBData, an R package comprising over 49 curated transcriptomic datasets related to TB host biomarkers. This resource facilitates meta-analysis, alleviating the challenges associated with harmonizing heterogeneous studies. Additionally, I conducted a comprehensive study of multiple gene set scoring methods, establishing their statistical equivalence as reliable computational tools for comparing TB biomarkers. I proposed a signature-splitting strategy to mitigate limitations introduced by certain gene set scoring methods. Lastly, I developed a novel multi-study adaptive learning framework, an ensemble method that improves predictive ability by averaging predictions from multiple biomarkers. I showed the benefits of this framework by comparing it with other methods and derived its theoretical behavior in the context of the Naïve Bayes classifier. The findings presented in this dissertation offer an extensive resource for various TB biomarker research questions and address issues of reproducibility in biomarker utilization. Furthermore, this dissertation provides insights for working with heterogeneous datasets, with applications to enhancing the generalizability of biomarkers across different cohorts.
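The ensemble idea described above averages predictions from several biomarkers. A minimal sketch of the simplest version follows. It uses an unweighted mean of per-signature probabilities, scored by AUC. The dissertation's framework is described as adaptive and multi-study, so its weighting is likely more involved. The function names and scores here are hypothetical.

```python
from statistics import mean


def ensemble_scores(per_signature_probs: list[list[float]]) -> list[float]:
    """Unweighted average of each sample's predicted probability across signatures."""
    n = len(per_signature_probs[0])
    if any(len(p) != n for p in per_signature_probs):
        raise ValueError("every signature must score the same samples")
    return [mean(p[i] for p in per_signature_probs) for i in range(n)]


def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((1.0 if p > q else 0.5 if p == q else 0.0) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two signatures on four samples (1 = case, 0 = control).
sig_a = [0.9, 0.3, 0.4, 0.2]
sig_b = [0.5, 0.8, 0.6, 0.1]
labels = [1, 1, 0, 0]
print(auc(sig_a, labels), auc(sig_b, labels))        # 0.75 0.75
print(auc(ensemble_scores([sig_a, sig_b]), labels))  # 1.0
```

In this toy case each signature misranks a different sample, and averaging cancels those errors. Real gains depend on how correlated the signatures' mistakes are.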
Item
Forecasting the soil microbiome
(2024) Werbin, Zoey; Bhatnagar, Jennifer M
The soil microbiome provides vital ecosystem services ranging from food provisioning to carbon sequestration, but we have limited ability to predict the shifts in composition and function of microbial communities that result from global change. My dissertation tests the overarching hypothesis that we can predict changes in microbial composition and function before they occur, using environmental and genomic information collected from soil microorganisms over the past two centuries. To accomplish this, I leveraged paired genomic and environmental data from the National Ecological Observatory Network (NEON), encompassing thousands of soil samples across the continental U.S., as well as genome-based analyses of potential microbial activity. I began by constructing the first ecological forecasts of soil microorganisms, evaluating our ability to use current knowledge about soil systems to predict the composition of the soil microbiome at new places and into the future. I created Bayesian statistical models for 173 soil microbial taxonomic and functional groups, which showed that grouping taxa by ecological function improved predictability at the landscape scale, with higher predictability overall for soil bacteria. Forecasts also revealed ubiquitous seasonal cycles in microbial abundances. Next, I developed an analysis pipeline for NEON soil metagenomes to quantify nitrogen cycle pathway gene abundances across samples and efficiently assemble short sequencing reads into bacterial genomes. Finally, I showed that modeling the metabolism of individual soil organisms could be used to predict nitrogen transformation rates across biomes. I evaluated spatiotemporal and non-spatiotemporal approaches for modeling flux rates from microbial communities including 67 key taxa and found that spatiotemporal community modeling improved predictions of nitrification and ammonification rates compared to models including commonly measured soil physical, chemical, and microbial variables.
I also showed that predicted ratios of nitrification and mineralization rates were roughly consistent with Earth System Model outputs. Taken together, this dissertation advances our understanding of the processes shaping the soil microbiome from the scale of individual molecules to entire landscapes, and how we may use this understanding to anticipate changes at a broad scale.
Item
Two variants of Kleene Algebra and their applications
(2024) Zhang, Cheng; Gaboardi, Marco
Kleene Algebra (KA) is an equational system celebrated for its decidability and completeness with respect to regular language equalities. Because of these desirable properties, numerous extensions have been developed to reason about network systems [45, 48, 60], concurrent programs [38, 61, 57], probabilistic systems [32, 44], relational verification [76], and program schematology [24]. In this thesis, we focus on two variants of Kleene Algebra with real-world applications. The first system, Kleene Algebra with tests and top (TopKAT), was developed to perform domain and reachability reasoning. We showed that the conventional extension, Kleene Algebra with tests (KAT), is inadequate for domain reasoning despite being able to encode Hoare logic. This led to our development of TopKAT, which is complete for domain reasoning. TopKAT soundly encodes both propositional incorrectness logic and Hoare logic [65, 7], offering a better complexity bound than alternative frameworks [69, 79]. Our completeness proof for TopKAT relies heavily on a technique called reduction. We showed that the reduction from TopKAT to KAT satisfies useful properties that let us obtain complete interpretations for TopKAT for free, and it also yields a complete decision procedure with minimal effort. The second system, control-flow Guarded Kleene Algebra with Tests (CF-GKAT), verifies control-flow transformations. Guarded Kleene Algebra with Tests (GKAT) [67] provides a robust equational system that is sound and complete with respect to trace equivalence and also enjoys an efficient decision procedure. Yet GKAT remains insufficient for verifying several well-known control-flow algorithms [13, 52, 37]. It lacks important control-flow structures such as indicator variables, and non-local control flow such as break, return, and goto. To obtain CF-GKAT, we extended the syntax and semantics of GKAT to incorporate these essential features.
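The abstract notes that KAT can encode Hoare logic. For reference, here is the standard KAT encoding due to Kozen. Tests b and c are Boolean elements with complements written with an overbar, and p and q are program terms. TopKAT's own encodings of domain and incorrectness reasoning are not reproduced here.

```latex
% Partial-correctness triple as a KAT equation
\{b\}\; p\; \{c\} \iff b\,p\,\overline{c} = 0 \iff b\,p = b\,p\,c

% Standard encodings of structured control flow
\texttt{if } b \texttt{ then } p \texttt{ else } q \;=\; b\,p + \overline{b}\,q
\texttt{while } b \texttt{ do } p \;=\; (b\,p)^{*}\,\overline{b}
```

Reasoning about the triple thereby reduces to checking an equation between KAT terms, which is the sense in which KAT "encodes" Hoare logic.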
We have developed an efficient decision procedure for CF-GKAT programs using CF-GKAT automata, an automaton model that closely emulates CF-GKAT programs and can be efficiently lowered into GKAT automata. Furthermore, this decision procedure is sound and complete: the algorithm outputs true if and only if the two input programs are trace equivalent.
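The completeness statement above (true if and only if the programs are trace equivalent) mirrors the classic equivalence check for deterministic automata. The sketch below is that generic check, and it is explicitly a stand-in. It works on plain DFAs over a finite alphabet, not on CF-GKAT or GKAT automata. Those operate on guarded strings, and CF-GKAT adds indicator variables and non-local jumps. The example automata are hypothetical encodings of the Kleene-algebra sliding law (ab)*a = a(ba)*.

```python
from collections import deque

# A DFA is (start, accepting, delta) with delta[state][symbol] -> state,
# total over the alphabet (a dead state absorbs rejected prefixes).
DFA = tuple[str, frozenset[str], dict[str, dict[str, str]]]


def equivalent(a: DFA, b: DFA, alphabet: list[str]) -> bool:
    """True iff the two DFAs accept exactly the same language.

    Breadth-first search over reachable pairs of the product automaton:
    the languages differ iff some reachable pair disagrees on acceptance.
    """
    start_a, acc_a, delta_a = a
    start_b, acc_b, delta_b = b
    seen = {(start_a, start_b)}
    queue = deque(seen)
    while queue:
        qa, qb = queue.popleft()
        if (qa in acc_a) != (qb in acc_b):
            return False  # the word leading to this pair separates the languages
        for sym in alphabet:
            pair = (delta_a[qa][sym], delta_b[qb][sym])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


# (ab)*a, with states s, t and dead state d
AB_STAR_A: DFA = ("s", frozenset({"t"}), {
    "s": {"a": "t", "b": "d"},
    "t": {"a": "d", "b": "s"},
    "d": {"a": "d", "b": "d"},
})
# a(ba)*, built with a redundant state 2 that behaves like state 0
A_BA_STAR: DFA = ("0", frozenset({"1"}), {
    "0": {"a": "1", "b": "x"},
    "1": {"a": "x", "b": "2"},
    "2": {"a": "1", "b": "x"},
    "x": {"a": "x", "b": "x"},
})
# (ab)*, a different language sharing the transitions of AB_STAR_A
AB_STAR: DFA = ("s", frozenset({"s"}), AB_STAR_A[2])

print(equivalent(AB_STAR_A, A_BA_STAR, ["a", "b"]))  # True
print(equivalent(A_BA_STAR, AB_STAR, ["a", "b"]))    # False
```

The two equivalent automata have different numbers of states, which is why the check compares behavior pair by pair rather than structure.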
Item
Understanding visual and linguistic content by structured representations
(2024) Zhang, Zhongping; Plummer, Bryan A.
Understanding visual and linguistic content is essential in different vision and language tasks (e.g., text-to-image editing, article comprehension). While large-scale models like the GPT-Series or Stable Diffusion exhibit impressive performance in these areas, their effectiveness in certain domains can be limited due to an inaccurate or incomplete understanding of input content. For instance, in text-to-image editing, state-of-the-art models (e.g., Imagic, Dreambooth) typically interpret input images in a global manner. In this case, they may fail to fully capture the complex details of scenes involving multiple objects and their relationships. To spur research in this field, this dissertation explores the use of structured representations in visual and linguistic content, aiming for a comprehensive and precise understanding of semantic information. We first introduce scene-graph representations and attribute-excluded features to text-to-image editing, where scene graphs can indicate multiple object interactions and attribute-excluded features can disentangle desired content from text-irrelevant content. We then discuss the applications of structured representations in comprehending linguistic content. We show that splitting long articles into simpler segments (e.g., metadata, named entities, paragraphs) and processing articles based on these segments helps our models analyze articles more accurately. In addition, we present structured representations in broader applications including video-based movie genre analysis and fashion compatibility prediction. To summarize, this work discusses the use of structured representations to understand visual and linguistic content. When directly understanding the entire input is challenging, segmenting the input into simpler parts and processing these parts hierarchically can yield improved performance. 
The enhancements achieved through structured representations demonstrate their value across multiple applications, making this field worth further exploration.
Item
Evaluation of multivariate longitudinal data accounting for missingness: methods and applications
(2024) Wang, Xuzhi; Liu, Chunyu
Missing data are frequently encountered in biomedical research, especially in longitudinal studies. Multiple imputation (MI) is widely used to handle data that are missing at random (MAR). Two-stage MI is a flexible method that accounts for two types of missing data in a two-step process, allowing for different assumptions about the missingness mechanisms, such as MAR and missing not at random (MNAR). This method has immense potential, but its current applications and extensions are limited. Joint models provide another framework for addressing MNAR by simultaneously modeling the longitudinal and missingness processes. Joint models have been used in longitudinal studies of dementia progression to handle MNAR missingness, such as informative dropout due to dementia or death. Nonlinear mixed-effects models with latent time shifts have been proposed to investigate long-term dementia progression. However, few studies incorporate these models into joint models to handle informative dropout. Furthermore, joint models with changepoints have been proposed to identify the acceleration of cognitive decline before dementia onset while accounting for informative dropout. Nevertheless, few joint models with changepoints consider semi-competing risks by distinguishing transitions between various health states. To address these knowledge gaps, this dissertation focuses on methods and applications for handling missing data in multivariate longitudinal studies. This focus is reflected in three distinct projects. In project 1, we evaluate the performance of two-stage MI in a novel context. Specifically, we impute a longitudinal composite variable for cardiovascular health constructed from several continuous and binary components, while handling missing data due to MAR and MNAR. In project 2, we propose a joint model for cognitive decline that incorporates a multivariate nonlinear mixed-effects model with latent time shifts.
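Project 1 builds on multiple imputation. For orientation, the sketch below shows the standard single-stage combining step, Rubin's rules, which pools the estimates from m completed data sets. Two-stage MI, which the dissertation evaluates, uses modified combining rules that are not shown here. The input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Combine m completed-data analyses with Rubin's rules.

    Returns the pooled point estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B is the between-imputation variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with one variance each")
    q_bar = mean(estimates)
    w = mean(variances)
    b = variance(estimates)  # sample variance, divisor m - 1
    return q_bar, w + (1 + 1 / m) * b


# Hypothetical results from m = 3 imputed data sets.
print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # roughly (2.0, 1.833)
```

The (1 + 1/m) B term is what carries imputation uncertainty into the final standard error. Ignoring it, by analyzing a single imputed data set, understates variance.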
We investigate different association structures between the longitudinal and missingness processes across various simulation settings. We also compare the proposed joint model with separate models that ignore the association between the longitudinal and missingness processes. In project 3, we propose a joint model that accounts for both changepoints and semi-competing risks by combining a multivariate random changepoint model for cognitive decline with an illness-death model for estimating health state transitions. We examine the proposed model with various types of random changepoint formulations and association structures. Overall, these projects provide insights into assessing cardiovascular and cognitive health in the presence of missingness.
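Project 3 combines a random changepoint model with an illness-death model. A generic single-outcome form of each is shown below, in assumed notation: subject i, visit j, and a subject-specific changepoint τ_i. The dissertation's multivariate model, and the association structures that link the two processes, are not reproduced.

```latex
% Broken-stick trajectory with a random changepoint \tau_i
y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,t_{ij}
       + (\beta_2 + b_{2i})\,(t_{ij} - \tau_i)_{+} + \varepsilon_{ij},
\qquad (x)_{+} = \max(x, 0)

% Illness-death model: 0 = healthy, 1 = dementia, 2 = death;
% transition intensities, with t_1 the time of entering state 1
\lambda_{01}(t), \qquad \lambda_{02}(t), \qquad \lambda_{12}(t \mid t_1)
```

Here β_2 + b_{2i} is the change in slope after τ_i, so a negative value indicates accelerated decline. Death can preclude observing dementia onset, but not the reverse, which is what makes the risks semi-competing.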
Item
Sound affects: remaking Taiwan through traditional religious practices
(2024) Tischer, Jacob F.; Weller, Robert P.
This dissertation problematizes the notion of “tradition” at the intersection of religion, gender, and nation-building in Taiwan through an ethnographic investigation of popular religion. Drawing on data collected during nineteen months in 2017 and 2019-20, I trace how younger generations of Taiwanese take ownership of religious practices, which are typically framed as traditional, and adapt them to a changing, modern environment. Having access to the means of producing cultural value enables these religiously active individuals to develop a sense of affective belonging to Taiwan as their geographical, political, and culturally grounded home. Theoretically, I engage popular-religious practices as forms of building affective relations both with the divine and with social others. This relation-building dynamic relies on the mediational capacities of material things to substantiate the co-presence of divine beings. My interlocutors, members of two voluntary religious associations, seek to actively foster such co-presence by building ongoing relationships with the divine, using the human body as an interface. I interpret embodiment as animating divine co-presence, that is, rendering it experientially real through a combination of spirit possession, playing music, and carrying items that contain divinity, such as sedan chairs. The dissertation unfolds the concept of mediation to highlight the affective and aesthetic components of such embodied practices. The ability of material things to repeatedly mediate between divine intervention and its social affects renders them indexical or iconic signs laden with emotional significance. Having worked with religious musicians in Taipei, I pay particular attention to sound as a powerful but variable material index.
While in the context of traditional religious festivals, music and the sight of animated divine beings produce a desired atmosphere and mediate affective place-making among participants, the same activities come to be perceived as unwanted noise in other settings. Beyond their sonic imprint, religious practices often sustain gendered prejudices even while mediating reinterpretations of gender on the basis of the increasing diversification of gender expressions and sexual orientations in Taiwan. Religious opera offers a variety of gendered role archetypes through which my young interlocutors express—or animate—unconventional gender identities in negotiation with their social environment. Traditional religion thus shows itself adaptable to social change. In a case study, I investigate how the popular goddess Mazu becomes an intermediary adopted by a new generation of Taiwanese worshipers, a queer icon who embodies their unique contributions to the ongoing project of Taiwanese nation-building. While traditional practices like Mazu pilgrimages symbolically invoke a discourse of historic depth, in actual practice they instead take on a decidedly contemporary and Taiwanese face. 
Item
Grounding language in image, video and audio modalities
(2024) Tan, Reuben; Saenko, Kate; Plummer, Bryan A.
The sub-field of multimodal learning in artificial intelligence involves the problem of jointly learning representations of the audio, language, and visual modalities for automated applications including virtual assistants, chatbots, and robotics. In particular, learning a multimodal representation that facilitates grounding of language in multiple modalities is of paramount importance in AI applications such as video captioning and embodied navigation. Just as humans communicate using natural language, machines endowed with the ability to perform grounded language reasoning can interact more effectively with diverse and complex real-world scenarios. In light of this, this thesis addresses open challenges in language grounding along three axes: learning effective multimodal representations from noisy data as well as from pretrained unimodal representations, spatiotemporal grounding in videos, and grounding language in a new modality that is unseen during training. An abstract formulation of grounding language in other modalities involves encoding data samples from different modalities into a common latent space. Along the first theme of grounding language in images, multimodal approaches often assume that the language modality has a literal relation to other modalities, but this is not realistic in domains such as news and Wikipedia. To address this, we introduce an approach that uses a pretrained vision-language model (VLM) to associate the illustrative relationships of multiple complementary images with different parts of long text sequences. In domains where the language modality has a literal relation to data from other modalities, conventional approaches often learn multimodal representations by learning transformation functions that project pretrained unimodal representations, but they fail to encode fine-grained context between different modalities.
As such, we introduce our work along the second theme of grounding language in videos and audio. We propose a novel language-conditioned visual graph that hierarchically fuses contextual information between video and language representations to address the task of weakly-supervised video moment retrieval. While the above-mentioned approaches are effective, their learnt multimodal representations are often global in nature, with each data sample encoded as a single vector. Representing each video as a single vector makes it challenging to ground language spatiotemporally within a video clip. In light of this, we propose two approaches to learn multimodal representations for grounding language spatiotemporally, not only in videos but also in their corresponding audio. Additionally, the problem of encoding a video as a single vector is compounded as the length of the video increases. This results in a loss of the fine-grained visual information that is important for grounding language effectively. Thus, we adapt pretrained multimodal large language models (LLMs) to ground language in long temporal and visual information holistically for long-form video question answering. Finally, we briefly transition from learning multimodal representations that ground language in static visual scenes to the third theme, active perception, to identify interesting directions for future work. The open-ended generative capability of multimodal LLMs provides a promising avenue for grounding language in dynamic and evolving scenes. Existing multimodal approaches that ground language in action policies for interacting with their environment often learn these policies by relying on fine-grained step-by-step annotations for training supervision, which limits their generalizability to unseen tasks and environments.
Given the demonstrated effectiveness of in-context learning, we begin by studying the effectiveness of using pretrained LLMs for high-level task planning in the few-shot setting, with a few examples from the training set. Based on the preliminary results and on the observations in our previous work, we wrap up this thesis by drawing insights that lay the groundwork for future research directions in multimodal learning and video understanding.
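The abstract describes grounding as encoding samples from different modalities into a common latent space. It also notes that many approaches encode each sample as a single global vector. The sketch below covers only that global setting. It assumes the text and video encoders already exist and produce vectors of equal dimension, and it ranks candidates by cosine similarity; the embeddings are made up. It illustrates the single-vector limitation the thesis moves beyond, not its graph-based or LLM-based methods.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embeddings of equal dimension."""
    if len(u) != len(v):
        raise ValueError("embeddings must share the latent dimension")
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / norm


def retrieve(query: list[float], candidates: list[list[float]]) -> list[int]:
    """Indices of candidate embeddings, most similar to the query first."""
    return sorted(range(len(candidates)),
                  key=lambda i: cosine(query, candidates[i]),
                  reverse=True)


# Hypothetical 3-d embeddings: one text query and three video clips.
text_query = [1.0, 0.0, 1.0]
clips = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.0]]
print(retrieve(text_query, clips))  # [1, 2, 0]
```

Because each clip is a single vector, this ranking can say which clip matches but not where or when within the clip the described content appears.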
Item
Computationally secure quantum cryptography without one-way functions
(2024) Qian, Luowen; Canetti, Ran
Most interesting cryptography happens in the computational realm, where all parties can only perform efficient computations. On the one hand, without this constraint many practically relevant cryptographic tasks are infeasible; on the other, this constraint still captures the power of any physical attacker with limited resources. Classically, if attacking almost any interesting computational cryptosystem is provably hard, then we would also derive from it the existence of one-way functions, and thus P ≠ NP as well. Quantum mechanics offers a more complete model of physical information in the real world. Recent works have proved oracle separations (Kretschmer’21; Kretschmer, Qian, Sinha, Tal’23; Lombardi, Ma, Wright’23). These suggest that computationally secure quantum cryptography could exist even if P = NP, that is, even in a world where classical computational cryptography is infeasible. This hints at the possibility of constructing a quantum cryptosystem and establishing its security without unproven assumptions, liberating cryptographers from the perpetual cat-and-mouse game. In this dissertation, we make initial progress towards this goal by developing a theory of quantum computational cryptography without assuming the existence of one-way functions. More specifically, we establish the following results:
• We introduce a simple and natural quantum cryptographic primitive called EFI pairs, and prove its existential equivalence to various other quantum cryptographic primitives, including commitment schemes, oblivious transfer, and zero knowledge.
• We establish the “minimality” of EFI pairs by showing that their existence is implied by many natural cryptographic primitives. EFI pairs thus appear easier to construct than other primitives.
• We show that the existence of EFI pairs (and the existentially equivalent primitives) is robust by showing how to upgrade various notions of “weak” security to standard security without any additional assumption.
• We finally construct commitment schemes, oblivious transfer, and zero knowledge unconditionally if a piece of quantum advice is allowed. In contrast, classical analogues of these are impossible without first settling P ≠ NP even if we allow randomized advice.
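EFI pairs are pairs of quantum states that are Efficiently generatable, statistically Far, and computationally Indistinguishable. One common formalization is sketched below. It uses a security parameter λ and a quantum polynomial-time (QPT) generator G. The inverse-polynomial threshold for "far" is stated as an assumption, and the dissertation's exact parameters may differ.

```latex
% EFI pair (\rho_{0,\lambda}, \rho_{1,\lambda}), with b \in \{0,1\}
\text{(E)}\quad \rho_{b,\lambda} \leftarrow G(1^{\lambda}, b) \text{ for a QPT algorithm } G
\text{(F)}\quad \mathrm{TD}(\rho_{0,\lambda}, \rho_{1,\lambda}) \ge 1/p(\lambda) \text{ for some polynomial } p
\text{(I)}\quad \forall\, \text{QPT } \mathcal{A}:\;
  \bigl|\Pr[\mathcal{A}(1^{\lambda}, \rho_{0,\lambda}) = 1] - \Pr[\mathcal{A}(1^{\lambda}, \rho_{1,\lambda}) = 1]\bigr| \le \mathrm{negl}(\lambda)
```

Conditions (F) and (I) are compatible only because the distinguisher in (I) is computationally bounded. An unbounded distinguisher could exploit the trace-distance gap in (F).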