Molecular Biology, Cell Biology and Biochemistry Program: Scholarly Papers
Recent Submissions
A Unique Family of Mrr-Like Modification-Dependent Restriction Endonucleases (Oxford University Press, 2010-05-05)
Zheng, Yu; Cohen-Karni, Devora; Xu, Derrick; Chin, Hang Gyeong; Wilson, Geoffrey; Pradhan, Sriharsa; Roberts, Richard J.
The Mrr superfamily of homologous genes in microbial genomes restricts modified DNA in vivo. However, the biochemical properties of these enzymes in vitro have remained obscure. Here, we report the experimental characterization of MspJI, a remote homolog of Escherichia coli Mrr, and show that it is a DNA modification-dependent restriction endonuclease. Our results suggest that MspJI recognizes mCNNR (R = G/A) sites and cleaves DNA at fixed distances (N12/N16) from the modified cytosine on its 3′ side (or N9/N13 from R). Besides 5-methylcytosine, MspJI also recognizes 5-hydroxymethylcytosine but is blocked by 5-glucosylhydroxymethylcytosine. Several other close homologs of MspJI show similar modification-dependent endonuclease activity and display substrate preferences different from those of MspJI. A unique feature of these modification-dependent enzymes is that they can extract small DNA fragments containing modified sites from genomic DNA, for example ~32 bp around symmetrically methylated CG sites and ~31 bp around methylated CNG sites. The digested fragments can be directly selected for high-throughput sequencing to map the location of the modification on the genomic DNA.
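The ~32 bp and ~31 bp fragment sizes quoted above follow from the N12/N16 cleavage rule once both strands of a symmetric site are cut. The sketch below is our own arithmetic reading of that rule, not code from the paper. It assumes 0-based top-strand coordinates, that a cut index k falls between positions k-1 and k, and that the N12 cut lies on the strand carrying the modified C (N16 on the complementary strand). It does not check the NNR context; all helper names are hypothetical.

```python
from dataclasses import dataclass

# Assumed reading of the MspJI rule: cut 12 nt 3' of the modified C on the
# same strand and 16 nt 3' of it on the complementary strand.
# Coordinates are 0-based top-strand positions; a cut index k lies between
# positions k-1 and k.
NEAR, FAR = 12, 16


@dataclass
class Cuts:
    top: int     # cut index on the top strand
    bottom: int  # cut index on the bottom strand


def cuts_for_top_site(c: int) -> Cuts:
    # Modified C on the top strand: its 3' side runs toward higher coordinates.
    return Cuts(top=c + NEAR + 1, bottom=c + FAR + 1)


def cuts_for_bottom_site(c: int) -> Cuts:
    # Modified C on the bottom strand, paired with top-strand position c:
    # its 3' side runs toward lower top-strand coordinates.
    return Cuts(top=c - FAR, bottom=c - NEAR)


def released_fragment(top_c: int, bottom_c: int) -> tuple[int, int, int]:
    """Return (top-strand length, bottom-strand length, total span in bp)
    of the piece excised when both sites of a symmetric pair are cut."""
    right = cuts_for_top_site(top_c)
    left = cuts_for_bottom_site(bottom_c)
    top_len = right.top - left.top
    bottom_len = right.bottom - left.bottom
    span = max(right.top, right.bottom) - min(left.top, left.bottom)
    return top_len, bottom_len, span


# Symmetric mCG: top-strand C at 100, bottom-strand C opposite the G at 101.
print(released_fragment(100, 101))  # (28, 28, 32)
# mCNG: bottom-strand C opposite the G at 102.
print(released_fragment(100, 102))  # (27, 27, 31)
```

Under these assumptions each strand piece carries a 4-nt overhang at both ends, which is why the total span exceeds the single-strand length by four.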
The MspJI enzyme family, with its different recognition specificities and cleavage properties, provides a basis on which many future methods can build to decode the epigenomes of different organisms.

PC3 prostate tumor-initiating cells with molecular profile FAM65Bhigh/MFI2low/LEF1low increase tumor angiogenesis (BioMed Central, 2010-12-29)
Zhang, Kexiong; Waxman, David J.
BACKGROUND. Cancer stem-like cells are proposed to sustain solid tumors by virtue of their capacity for self-renewal and for differentiation into the cells that comprise the bulk of the tumor. They have been identified for a variety of cancers based on characteristic clonal morphologies and patterns of marker gene expression. METHODS. Single-cell cloning and spheroid culture studies were used to identify a population of cancer stem-like cells in the androgen-independent human prostate cancer cell line PC3. RESULTS. We demonstrate that, under standard culture conditions, ~10% of PC3 cells form holoclones with cancer stem cell characteristics. These holoclones display high self-renewal capability in spheroid formation assays under low-attachment and serum-free culture conditions, retain their holoclone morphology when passaged at high cell density, exhibit moderate drug resistance, and show high tumorigenicity in scid immunodeficient mice. PC3 holoclones readily form spheres, and PC3-derived spheres yield a high percentage of holoclones, further supporting their cancer stem cell-like nature. We identified one gene, FAM65B, whose expression is consistently upregulated in PC3 holoclones compared with paraclones (the major cell morphology in the parental PC3 cell population), and two genes, MFI2 and LEF1, that are consistently downregulated. This molecular profile, FAM65Bhigh/MFI2low/LEF1low, also characterizes spheres generated from parental PC3 cells. The PC3 holoclones did not show significantly enriched expression of the putative prostate cancer stem cell markers CD44 and integrin α2β1.
PC3 tumors seeded with holoclones showed dramatic downregulation of FAM65B and dramatic upregulation of MFI2 and LEF1 and, unexpectedly, a marked increase in tumor vascularity compared with parental PC3 tumors, suggesting a role for cancer stem cells in tumor angiogenesis. CONCLUSIONS. These findings support the proposal that PC3 tumors are sustained by a small number of tumor-initiating cells with stem-like characteristics, including strong self-renewal and pro-angiogenic capability, marked by the expression pattern FAM65Bhigh/MFI2low/LEF1low. These markers may serve as targets for therapies designed to eliminate cancer stem cell populations associated with aggressive, androgen-independent prostate tumors such as PC3.

Machine Learning for Regulatory Analysis and Transcription Factor Target Prediction in Yeast (Kluwer Academic Publishers, 2006-10-31)
Holloway, Dustin T.; Kon, Mark; DeLisi, Charles
High-throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps: the identity and location of regulatory binding sites within genomes. Still, the identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification that uses support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification, position-specific scoring matrices (PSSMs), for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73% vs. 20%) when specificity and positive predictive value are held equal. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier overfitting.
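For comparison, the PSSM baseline mentioned above scores a candidate site as a sum of per-position log-odds values. A minimal sketch, assuming a uniform background, a pseudocount of 1, scanning of one strand only, and made-up aligned sites resembling a TGACTCA motif; the function names are hypothetical and this is not the authors' implementation (their SVM instead combines many data types in one classifier).

```python
import math

# Minimal PSSM sketch: log2-odds against a uniform background, with a
# pseudocount so that unseen bases never produce log(0).
BASES = "ACGT"


def build_pssm(sites: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Return one {base: log2-odds} column per aligned position."""
    background = 1.0 / len(BASES)
    pssm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm


def score(pssm: list[dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds for one window of the motif's width."""
    return sum(col[base] for col, base in zip(pssm, window))


def best_hit(pssm: list[dict[str, float]], promoter: str) -> tuple[int, float]:
    """Scan one strand of a promoter; return (offset, score) of the best window."""
    width = len(pssm)
    windows = ((i, score(pssm, promoter[i:i + width]))
               for i in range(len(promoter) - width + 1))
    return max(windows, key=lambda hit: hit[1])


sites = ["TGACTCA", "TGACTCA", "TGACTAA", "TGAGTCA"]  # toy aligned sites
pssm = build_pssm(sites)
print(best_hit(pssm, "CCGATGACTCATTGC"))  # offset 4: the embedded TGACTCA
```

A site is then called by thresholding the score; where that threshold sits sets the sensitivity/specificity trade-off that the abstract's comparison holds fixed.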
For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors, SVM predictions match the known biology of their control mechanisms well, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W and show that targets of this TF have potentially unique physical properties that distinguish them from other genes. The SVM output automatically provides a means to rank dataset features and thereby identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. Targets of Fhl1 are differentially expressed in the chosen conditions compared with average and negative-set genes. SVM-based classifiers provide a robust framework for the analysis of regulatory networks. Processing of classifier outputs can provide high-quality predictions and biological insight into the functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made for only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors that have available binding data. ELECTRONIC SUPPLEMENTARY MATERIAL. Supplementary material is available in the online version of this article at http://dx.doi.org/10.1007/s11693-006-9003-3 and is accessible to authorized users.

In silico regulatory analysis for exploring human disease progression (BioMed Central, 2008-06-18)
Holloway, Dustin T.; Kon, Mark; DeLisi, Charles
BACKGROUND.
An important goal in bioinformatics is to unravel the network of transcription factors (TFs) and their targets. This is important in the human genome, where many TFs are involved in disease progression. Here, classification methods are applied to identify new targets for 152 transcriptional regulators using publicly available targets as training examples. Three types of sequence information are used: composition, conservation, and overrepresentation. RESULTS. Starting with 8817 TF-target interactions, we predict an additional 9333 targets for 152 TFs. Randomized classifiers make few predictions (~2/18660), indicating that our predictions for many TFs are significantly enriched for true targets. An enrichment score is calculated and used to filter new predictions. Two case studies, for the TFs OCT4 and WT1, illustrate the usefulness of our predictions:
• Many predicted OCT4 targets fall into the Wnt pathway. This is consistent with known biology, as OCT4 is developmentally related and the Wnt pathway plays a role in early development.
• Beginning with 15 known targets, 354 predictions are made for WT1. WT1 has a role in the formation of Wilms' tumor. Chromosomal regions previously implicated in Wilms' tumor by cytological evidence are statistically enriched in predicted WT1 targets. These findings may shed light on Wilms' tumor progression, suggesting that the tumor progresses either by loss of WT1 or by loss of regions harbouring its targets.
• Targets of WT1 are statistically enriched for cancer-related functions, including metastasis and apoptosis. Among the new targets are BAX and PDE4B, which may help mediate the established anti-apoptotic effects of WT1.
• Of the thirteen TFs found to co-regulate genes with WT1 (p ≤ 0.02), 8 have been previously implicated in cancer.
The regulatory network for WT1 targets in genomic regions relevant to Wilms' tumor is provided. CONCLUSION.
We have assembled a set of features for the targets of human TFs and used them to develop classifiers for the identification of new regulatory targets. Many predicted targets are consistent with the known biology of their regulators, and new targets for the Wilms' tumor regulator WT1 are proposed. We speculate that Wilms' tumor development is mediated by chromosomal rearrangements in the locations of WT1 targets. REVIEWERS. This article was reviewed by Trey Ideker, Vladimir A. Kuznetsov (nominated by Frank Eisenhaber), and Tzachi Pilpel.

Classifying Transcription Factor Targets and Discovering Relevant Biological Features (BioMed Central, 2008-05-30)
Holloway, Dustin T.; Kon, Mark; DeLisi, Charles
BACKGROUND. An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements, including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties. PRINCIPAL FINDINGS. (1) Application of the method yields an amplification of information about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several having larger gains: Ash1 (4), Ino2 (2.6), Yaf1 (2.4), and Yap6 (2.4). (2) Many predicted targets for TFs match well with the known biology of their regulators.
As a case study we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in the DNA damage response and perhaps in cell-cycle progression. (3) A procedure based on recursive feature elimination is able to uncover, from the large initial datasets, those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in the metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties. (4) An analysis of global network properties highlights the transcriptional network hubs: the factors that control the most genes and the genes that are bound by the largest set of regulators. Cell-cycle and growth-related regulators dominate the former; genes involved in carbon metabolism and energy generation dominate the latter. CONCLUSION. Postprocessing of regulatory-classifier results can provide high-quality predictions, and feature ranking strategies can deliver insight into the regulatory functions of TFs. Predictions are available at an online web server, including the full transcriptional network, which can be analyzed using the VisANT network analysis suite. REVIEWERS. This article was reviewed by Igor Jouline, Todd Mockler (nominated by Valerian Dolja), and Sandor Pongor.

VisANT: Data-Integrating Visual Framework for Biological Networks and Modules (Oxford University Press, 2005-06-27)
Hu, Zhenjun; Mellor, Joe; Wu, Jie; Yamada, Takuji; Holloway, Dustin; DeLisi, Charles
VisANT is a web-based software framework for visualizing and analyzing many types of networks of biological interactions and associations. Networks are a useful computational tool for representing many types of biological data, such as biomolecular interactions, cellular pathways and functional modules.
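Networks of this kind are commonly held as edge lists, from which simple topological properties follow directly. A minimal sketch, not VisANT's data model or API, assuming a directed TF→target network with made-up placeholder edges: it computes the two hub measures named in the Swi6 entry, the regulators controlling the most genes (out-degree) and the genes bound by the most regulators (in-degree).

```python
from collections import Counter

# Hypothetical directed TF -> target edges; names are placeholders.
edges = [
    ("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_A", "gene3"),
    ("TF_B", "gene1"), ("TF_B", "gene2"),
    ("TF_C", "gene1"),
]


def hubs(edge_list: list[tuple[str, str]], top: int = 1) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Return the top regulators by out-degree and the top targets by in-degree."""
    unique = set(edge_list)  # count each TF-target pair once
    out_degree = Counter(tf for tf, _ in unique)
    in_degree = Counter(target for _, target in unique)
    return out_degree.most_common(top), in_degree.most_common(top)


regulator_hubs, target_hubs = hubs(edges)
print(regulator_hubs)  # [('TF_A', 3)]
print(target_hubs)     # [('gene1', 3)]
```

Deduplicating edges first keeps a repeated interaction record from inflating a node's degree.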
Given user-defined sets of interactions or groupings between genes or proteins, VisANT provides: (i) a visual interface for combining and annotating network data, (ii) supporting function and annotation data for different genomes from the Gene Ontology and KEGG databases, and (iii) the statistical and analytical tools needed for extracting topological properties of the user-defined networks. Users can customize, modify, save and share network views with other users, and import basic network data representations from their own data sources and from standard exchange formats such as PSI-MI and BioPAX. The software framework also supports the development of more sophisticated visualization and analysis functions through its open API for Java-based plug-ins. VisANT is distributed freely via the web at and can also be downloaded for individual use.

Gyrase Inhibitors Induce an Oxidative Damage Cellular Death Pathway in Escherichia Coli (2007-03-13)
Dwyer, Daniel J.; Kohanski, Michael A.; Hayete, Boris; Collins, James J.
Modulation of bacterial chromosomal supercoiling is a function of DNA gyrase-catalyzed strand breakage and rejoining. This reaction is exploited by both antibiotic and proteic gyrase inhibitors, which trap the gyrase molecule at the DNA cleavage stage. As a result, double-stranded DNA breaks are introduced and the replication machinery is arrested at blocked replication forks. This immediately results in bacteriostasis and ultimately induces cell death. Here we demonstrate, through a series of phenotypic and gene expression analyses, that superoxide and hydroxyl radical oxidative species are generated following gyrase poisoning and play an important role in cell killing by gyrase inhibitors. We show that superoxide-mediated oxidation of iron–sulfur clusters promotes a breakdown of iron regulatory dynamics; in turn, iron misregulation drives the generation of highly destructive hydroxyl radicals via the Fenton reaction.
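For reference, the Fenton reaction named here, written in its standard textbook form (not taken from the paper), is the oxidation of ferrous iron by hydrogen peroxide, which releases the hydroxyl radical:

```latex
\mathrm{Fe^{2+} + H_2O_2 \longrightarrow Fe^{3+} + OH^{-} + {}^{\bullet}OH}
```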
Importantly, our data reveal that blocking hydroxyl radical formation increases the survival of gyrase-poisoned cells. Together, this series of biochemical reactions appears to constitute a maladaptive response that serves to amplify the primary effect of gyrase inhibition by oxidatively damaging DNA, proteins and lipids.

Portraits of Breast Cancer Progression (BioMed Central, 2007-08-06)
Dalgin, Gul S.; Alexe, Gabriela; Scanfeld, Daniel; Tamayo, Pablo; Mesirov, Jill P.; Ganesan, Shridar; DeLisi, Charles; Bhanot, Gyan
BACKGROUND. Clustering analysis of microarray data is often criticized for giving ambiguous results because of sensitivity to data perturbation or to the clustering technique used. In this paper, we describe a new method based on principal component analysis and ensemble consensus clustering that avoids these problems. RESULTS. We illustrate the method on a public microarray dataset from 36 breast cancer patients, of whom 31 were diagnosed with at least two of three pathological stages of disease (atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC)). Our method identifies an optimum set of genes and divides the samples into stable clusters that correlate with the clinical classification into Luminal, Basal-like and Her2+ subtypes. Our analysis reveals a hierarchical portrait of breast cancer progression and identifies genes and pathways for each stage, grade and subtype. An intriguing observation is that the disease phenotype is distinguishable in ADH and progresses along distinct pathways for each subtype. The genetic signature for disease heterogeneity across subtypes is greater than the heterogeneity of progression from DCIS to IDC within a subtype, suggesting that the disease subtypes have distinct progression pathways. Our method identifies six disease-subtype clusters and one normal cluster. The first split separates the normal samples from the cancer samples.
Next, the cancer cluster splits into low grade (pathological grades 1 and 2) and high grade (pathological grades 2 and 3), while the normal cluster is unchanged. Further, the low-grade cluster splits into two subclusters and the high-grade cluster into four. The final six disease clusters are mapped to one Luminal A, three Luminal B, one Basal-like and one Her2+ cluster. CONCLUSION. We confirm that the cancer phenotype can be identified at an early stage, because the genes altered at this stage progressively alter further as the disease progresses through DCIS into IDC. We identify six subtypes of disease that have distinct genetic signatures and remain separated in the clustering hierarchy. Our findings suggest that the heterogeneity of disease across subtypes is higher than the heterogeneity of disease progression within a subtype, indicating that the subtypes are in fact distinct diseases.

Identification and Characterization of Renal Cell Carcinoma Gene Markers (Libertas Academica, 2007-02-09)
Dalgin, Gul S.; Holloway, Dustin T.; Liou, Louis S.; DeLisi, Charles
Microarray gene expression profiling has been used to distinguish histological subtypes of renal cell carcinoma (RCC), and consequently to identify specific tumor markers. The analytical procedures currently in use find sets of genes whose average expression differs significantly between the two categories. In general, an individual marker identified in this way does not distinguish tumor from normal with 100% accuracy, although the group as a whole might be able to do so. For the purpose of developing a widely used, economically viable diagnostic signature, however, large groups of genes are not likely to be useful. Here we use two different methods, one a support vector machine variant and the other an exhaustive search, to reanalyze data previously generated in our lab (Lenburg et al. 2003).
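The per-gene criterion reported in the results (expression higher, or lower, in every tumor sample than in any normal sample) reduces to comparing the extremes of the two groups. A minimal sketch of one simple reading of that exhaustive check, assuming made-up expression values and hypothetical gene and function names; the authors' additional 0.01 significance filter and their SVM variant are not reproduced here.

```python
from typing import Optional

# Hypothetical expression values: gene -> (tumor samples, normal samples).
expression = {
    "GENE_UP":   ([8.1, 7.9, 9.2], [5.0, 6.1, 5.7]),
    "GENE_DOWN": ([2.0, 1.5, 2.2], [4.8, 5.1, 4.9]),
    "GENE_MIX":  ([6.0, 3.9, 7.1], [4.2, 5.5, 4.0]),
}


def separation(tumor: list[float], normal: list[float]) -> Optional[str]:
    """'up' if every tumor value exceeds every normal value,
    'down' if every tumor value is below every normal value, else None."""
    if min(tumor) > max(normal):
        return "up"
    if max(tumor) < min(normal):
        return "down"
    return None


def perfect_separators(data: dict[str, tuple[list[float], list[float]]]) -> dict[str, str]:
    """Test every gene and keep those that separate the groups perfectly."""
    result = {}
    for gene, (tumor, normal) in data.items():
        direction = separation(tumor, normal)
        if direction is not None:
            result[gene] = direction
    return result


print(perfect_separators(expression))  # {'GENE_UP': 'up', 'GENE_DOWN': 'down'}
```

Comparing only the group minimum and maximum keeps the check linear in the number of samples per gene, so scanning every gene on an array stays cheap.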
We identify 158 genes, each having an expression level that is higher (or lower) in every tumor sample than in any normal sample, and each differentially expressed between the two categories at a significance level of 0.01. The set is highly enriched in cancer-related genes (p = 1.6 × 10⁻¹²), containing 43 genes previously associated with either RCC or other types of cancer. Many of the biomarkers appear to be associated with the central alterations known to be required for cancer transformation. These include the oncogenes JAZF1, AXL and ABL2; the tumor suppressors RASD1, PTPRO, TFAP2A and CDKN1C; and genes involved in proteolysis or cell adhesion, such as WASF2 and PAPPA.

Data perturbation independent diagnosis and validation of breast cancer subtypes using clustering and patterns (Libertas Academica, 2007-02-19)
Alexe, Gabriela; Dalgin, Gul S.; Ramaswamy, R.; DeLisi, Charles; Bhanot, Gyan
Molecular stratification of disease based on expression levels of sets of genes can help guide therapeutic decisions if such classifications can be shown to be stable against variations in sample source and data perturbation. Classifications inferred from one set of samples in one lab should be able to consistently stratify a different set of samples in another lab. We present a method for assessing such stability and apply it to the breast cancer (BCA) datasets of Sorlie et al. 2003 and Ma et al. 2003. We find that, among the now commonly accepted BCA categories identified by Sorlie et al., Luminal A and Basal are robust, but Luminal B and ERBB2+ are not. In particular, 36% of the samples identified as Luminal B and 55% of those identified as ERBB2+ cannot be assigned an accurate category because the classification is sensitive to data perturbation. We identify a "core cluster" of samples for each category, and from these we determine "patterns" of gene expression that distinguish the core clusters from each other.
We find that the best markers for Luminal A and Basal are (ESR1, LIV1, GATA-3) and (CCNE1, LAD1, KRT5), respectively. Pathways enriched in the patterns regulate apoptosis, tissue remodeling and the immune response. We use a different dataset (Ma et al. 2003) to test the accuracy with which samples can be allocated to the four disease subtypes. We find, as expected, that the classification of samples identified as Luminal A and Basal is robust but classification into the other two subtypes is not.
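The exact stability procedure is not spelled out in the abstract. One common way to operationalize a "core cluster", sketched here under stated assumptions, is to cluster several perturbed versions of the data, build a co-association matrix (the fraction of runs in which two samples share a cluster), and keep, within each reference category, the samples that co-cluster with their category-mates in at least a chosen fraction of runs. The 0.9 threshold, the toy labels "A"/"B"/"C", the run labelings and the function names are all hypothetical.

```python
from itertools import combinations


def co_association(runs: list[list[int]]) -> list[list[float]]:
    """Fraction of clustering runs in which each pair of samples shares a label."""
    n = len(runs[0])
    m = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(run[i] == run[j] for run in runs) / len(runs)
        m[i][j] = m[j][i] = together
    return m


def core_members(reference: list[str], runs: list[list[int]],
                 threshold: float = 0.9) -> dict[str, list[int]]:
    """Per reference category, keep samples whose mean co-association with
    the other members of that category is at least `threshold`."""
    m = co_association(runs)
    core = {}
    for label in dict.fromkeys(reference):  # categories in first-seen order
        members = [i for i, lab in enumerate(reference) if lab == label]
        kept = []
        for i in members:
            others = [j for j in members if j != i]
            if not others or sum(m[i][j] for j in others) / len(others) >= threshold:
                kept.append(i)
        core[label] = kept
    return core


# Toy data: 7 samples, 4 clusterings of perturbed data (labels are arbitrary ints).
reference = ["A", "A", "A", "B", "B", "C", "C"]
runs = [
    [0, 0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 2, 0],
    [0, 0, 0, 1, 1, 0, 2],
    [0, 0, 0, 1, 1, 2, 1],
]
print(core_members(reference, runs))  # {'A': [0, 1, 2], 'B': [3, 4], 'C': []}
```

Because co-association compares samples only to each other, the cluster labels from different runs never need to be matched to one another, which is why this formulation is often preferred for ensemble clustering.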