A piRNA regulation landscape in C. elegans and a computational model to predict gene functions
Investigating the mechanisms that regulate genes, and the functions of those genes, is essential to understanding a biological system. This dissertation consists of two research projects under these aims: one to understand how piRNAs regulate their targets, and one to predict gene functions computationally.
The first project maps a piRNA regulation landscape in C. elegans. piRNAs (Piwi-interacting small RNAs) form complexes with Piwi Argonautes to maintain fertility and silence transposons in animal germlines. In C. elegans, previous studies have suggested that piRNAs tolerate mismatched pairing and could, in principle, target all transcripts. In this project, computational analysis of chimeric reads, captured in vivo by cross-linking piRNAs to their targets, shows that piRNAs target all germline mRNAs with microRNA-like pairing rules. The number of targeting chimeric reads correlates better with binding energy than with piRNA abundance, suggesting that piRNA concentration does not limit targeting. Furthermore, in mRNAs silenced by piRNAs, secondary small RNAs accumulate at the center and ends of piRNA binding sites. In germline-expressed mRNAs, by contrast, the presence of the CSR-1 Argonaute correlates with reduced piRNA binding density and with suppressed targeting by piRNA-associated secondary small RNAs. These findings reveal a physiologically important and nuanced regulation of piRNA targets and provide evidence for a comprehensive post-transcriptional regulatory step in germline gene expression.
The second project develops a computational model to predict gene functions. Predicting the genes involved in a biological function facilitates many kinds of research, such as prioritizing candidates in a screening project. Following the “guilt by association” principle, multiple datasets are treated as biological networks and integrated under a multi-label learning framework to predict gene functions.
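To make the microRNA-like pairing rule from the first project concrete, the following minimal sketch checks whether a candidate target site pairs with a piRNA in a miRNA-like way. It is an illustration only, not the dissertation's pipeline: the seed window (positions 2 to 7), skipping position 1, the cap of three mismatches 3' of the seed, and every function name are hypothetical assumptions; the actual rules were inferred from chimeric reads and may differ.

```python
"""Minimal sketch of a microRNA-like pairing check for a piRNA and a target site.

Hypothetical rule (an assumption, not the dissertation's exact criteria):
perfect Watson-Crick pairing across a seed window, position 1 ignored,
and at most a few mismatches 3' of the seed.
"""

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Return the perfectly complementary target site, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(rna)))


def pairing_profile(pirna: str, site: str) -> list[str]:
    """Label each piRNA position 'wc', 'gu' (wobble) or 'mm' (mismatch).

    Both sequences are written 5'->3' and have equal length. The duplex is
    antiparallel, so piRNA position i pairs with site position len - 1 - i.
    """
    pirna, site = to_rna(pirna), to_rna(site)
    if len(pirna) != len(site):
        raise ValueError("piRNA and target site must have equal length")
    profile = []
    for i, base in enumerate(pirna):
        pair = (base, site[len(site) - 1 - i])
        if pair in WATSON_CRICK:
            profile.append("wc")
        elif pair in WOBBLE:
            profile.append("gu")
        else:
            profile.append("mm")
    return profile


def is_mirna_like_site(pirna: str, site: str, seed=(2, 7),
                       max_nonseed_mismatches: int = 3) -> bool:
    """Apply the hypothetical miRNA-like rule to one piRNA/site pair.

    `seed` is a 1-based inclusive window; it and the mismatch cap are
    illustrative defaults, not values taken from the dissertation.
    """
    profile = pairing_profile(pirna, site)
    start, end = seed
    # Position 1 is skipped: the guide's 5' nucleotide is typically anchored
    # by the Argonaute protein, so miRNA-style rules rarely require it to pair.
    if any(p != "wc" for p in profile[start - 1:end]):
        return False
    # Wobble pairs 3' of the seed are tolerated; only true mismatches count.
    return profile[end:].count("mm") <= max_nonseed_mismatches
```

For example, `is_mirna_like_site(p, reverse_complement(p))` is `True` for any piRNA `p`, while a single mismatch inside the assumed seed makes it `False`. A real scan would also slide the piRNA along each transcript and score duplex energy, which this sketch leaves out.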
Specifically, the functional labels are propagated and smoothed on the networks using a label propagation method and then integrated using an error-correcting output codes (ECOC) multi-label learning framework, in which a “codeword” encodes all the labels annotated to a given gene. The model is then trained by finding the optimal projections between the code matrix and the biological datasets using canonical correlation analysis. Its performance is benchmarked against a state-of-the-art algorithm and against the results of a large-scale screen for piRNA pathway genes in D. melanogaster. Finally, the dissertation discusses the roles of piRNA targeting in epigenetics and physiology and its cross-talk with the CSR-1 pathway, together with a survey of additional biological datasets and a discussion of benchmarking methods for gene function prediction.
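The label propagation step can be illustrated with a minimal sketch on a single network. It assumes the common normalized-graph variant (iterating F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2)); the dissertation's exact propagation method, the value of `alpha`, and the function names here are assumptions. Each row of the label matrix `Y` plays the role of a gene's binary codeword.

```python
"""Minimal sketch of label propagation of gene-function labels on one network.

Assumes the normalized-graph ("local and global consistency") variant;
the dissertation's exact propagation method may differ.
"""
import numpy as np


def normalize_network(W: np.ndarray) -> np.ndarray:
    """Return S = D^{-1/2} W D^{-1/2} for a symmetric, non-negative network W."""
    W = np.asarray(W, dtype=float)
    if not np.allclose(W, W.T):
        raise ValueError("network must be symmetric (undirected)")
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    connected = degree > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])  # isolated genes stay 0
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]


def propagate_labels(W, Y, alpha: float = 0.8) -> np.ndarray:
    """Smooth a gene-by-label matrix Y over network W.

    Returns the fixed point of F <- alpha * S @ F + (1 - alpha) * Y, i.e.
    F = (1 - alpha) * (I - alpha * S)^{-1} @ Y, which exists for 0 <= alpha < 1.
    Row i of Y is gene i's binary label vector (its "codeword").
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    S = normalize_network(W)
    Y = np.asarray(Y, dtype=float)
    # Solve the linear system instead of forming the inverse explicitly.
    return (1.0 - alpha) * np.linalg.solve(np.eye(S.shape[0]) - alpha * S, Y)
```

On a chain of genes where only the two ends are annotated with different functions, interior genes receive higher scores for the function of the nearer annotated end, which is the “guilt by association” intuition in matrix form. With several networks, one hypothetical option is to average the propagated matrices; the dissertation instead integrates them through the ECOC framework and canonical correlation analysis, which this sketch does not reproduce.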
Rights: Attribution 4.0 International