Systematic approaches to mine, predict and visualize biological functions
With advances in high-throughput technologies and next-generation sequencing, the amount of genomic and proteomic data is increasing dramatically in the post-genomic era. One of the biggest resulting challenges is connecting sequences to their activities and understanding their cellular functions and interactions. In this dissertation, I present three different strategies for mining, predicting and visualizing biological functions.
In the first part, I present the COMputational Bridges to Experiments (COMBREX) project, which facilitates the functional annotation of microbial proteins by leveraging the power of the scientific community. The goal is to bring computational biologists and biochemists together to expand our knowledge. A database-driven web portal has been built to serve as a hub for the community. Predicted annotations are deposited into the database, and a recommendation system guides biologists to the predictions whose experimental validation would be most beneficial to our knowledge of microbial proteins. In addition, taking advantage of the portal's rich content, we have developed a web service to help community members enrich their genome annotations.
In the second part, I focus on identifying the genes for enzyme activities that lack genetic details in the major biological databases. Protein sequences are unknown for about one-third of the characterized enzyme activities listed in the EC system, the so-called orphan enzymes. Our approach considers the similarities between enzyme activities, enabling us to handle a broad range of orphan enzymes in eukaryotes. I apply our framework to human orphan enzymes and show that it can successfully fill knowledge gaps in the human metabolic network.
In the last part, I construct a platform for visually analyzing eco-system level metabolic networks. Most microbes live in multi-species environments, and the underlying nutrient exchange can be seen as a dynamic eco-system level metabolic network. The complexity of this network poses new visualization challenges. Using data predicted by Computation Of Microbial Ecosystems in Time and Space (COMETS), I demonstrate that our platform is a powerful tool for investigating interactions within a microbial community. We apply it to the exploration of a simulated microbial eco-system in the human gut. The results reflect both established knowledge and novel mutualistic interactions, such as nutrient exchanges among E. coli, C. difficile and L. acidophilus.