A module based approach for identifying driver genes and expanding pathways from integrated biological networks
Each gene or protein has its own function and, in combination with others, enables the group to carry out more complex behaviors, e.g. performing a particular cellular task (functional module) or affecting a particular disease phenotype (disease module). One of the major challenges in systems biology is to reveal the roles of genes or proteins in functional modules or disease modules. In the first part of the dissertation, I present a data-driven method, Correlation Set Analysis (CSA), for comprehensively detecting active regulators in disease populations by integrating co-expression analysis with specific types of literature-derived causal relationships. Instead of investigating the level of co-expression between regulators and their targets, I focus on the coherence among the regulatees of a regulator, e.g. the downstream targets of a transcription factor. Using simulated datasets, I show that my method achieves high true positive and true negative rates (>80%) even when the regulatory relationship is weak (only 20% of regulatees are co-expressed). Using three separate real biological datasets, I was able to recover both well-known and as-yet-undescribed active regulators for each disease population. In the second part of the dissertation, I develop and apply a new computational algorithm for detecting modules of functionally related genes that are likely to drive malignant transformation. The algorithm takes as input the identities and locations of a small number of known oncogenes (a seed set) on a human genome functional linkage network (FLN). It then searches for a boundary surrounding a gene set that encompasses the seed, such that the magnitude of the difference in linkage weights between interior-interior gene pairs and interior-exterior gene pairs is maximized. Starting with small seed sets for breast and ovarian cancer, I successfully identify known and novel drivers in both cancer types.
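To make the boundary criterion of the second part concrete, the following is a minimal illustrative sketch and not the dissertation's implementation. It assumes a greedy search that adds one gene at a time, scores a candidate module as the mean interior-interior linkage weight minus the mean interior-exterior weight, and treats unlisted gene pairs as having weight 0. The function names (`boundary_score`, `expand_seed`), the toy gene labels, and the weights are all hypothetical.

```python
from itertools import combinations


def mean(values):
    """Arithmetic mean; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def weight(fln, a, b):
    """Linkage weight of an unordered gene pair (assumed 0.0 if unlisted)."""
    return fln.get(frozenset((a, b)), 0.0)


def boundary_score(fln, genes, module):
    """Mean interior-interior weight minus mean interior-exterior weight."""
    inside = [weight(fln, a, b) for a, b in combinations(sorted(module), 2)]
    crossing = [weight(fln, a, b) for a in module for b in genes - module]
    return mean(inside) - mean(crossing)


def expand_seed(fln, genes, seed):
    """Greedily grow the seed while the boundary score keeps improving."""
    module = set(seed)
    score = boundary_score(fln, genes, module)
    while genes - module:
        # Pick the exterior gene whose addition yields the highest score.
        best_gene = max(
            sorted(genes - module),
            key=lambda g: boundary_score(fln, genes, module | {g}),
        )
        best_score = boundary_score(fln, genes, module | {best_gene})
        if best_score <= score:
            break  # no single addition improves the boundary
        module.add(best_gene)
        score = best_score
    return module, score


# Toy network (hypothetical genes and weights, for illustration only).
toy_fln = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.85,
    frozenset(("C", "D")): 0.1,
    frozenset(("A", "D")): 0.05,
    frozenset(("D", "E")): 0.7,
}
toy_genes = {"A", "B", "C", "D", "E"}

module, score = expand_seed(toy_fln, toy_genes, seed={"A", "B"})
print(sorted(module), round(score, 3))
```

On this toy network the seed {A, B} absorbs the tightly linked gene C and then stops, because adding D or E would lower the contrast between interior and boundary-crossing weights. A greedy search of this kind can stall at a local optimum; it is used here only because it is the simplest way to show the objective.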
In the third part of the dissertation, I propose a module-based approach for expanding manually curated functional modules. Using the KEGG pathway database as an example, I show that my approach can successfully suggest both validated pathway members (genes that other manually curated pathway databases assign to the same pathway) and novel candidate pathway genes.