Characterizing and reassembling the COPD and ILD transcriptome using RNA-seq
Brothers, John Frederick
Chronic Obstructive Pulmonary Disease (COPD) is the third leading cause of death in the US, and idiopathic pulmonary fibrosis (IPF), a type of Interstitial Lung Disease (ILD), is a fast-acting, irreversible disease that leads to mortality within 3-5 years. RNA-sequencing (RNA-seq) provides the opportunity to quantitatively examine the sequences of millions of mRNAs, and offers the potential to gain unprecedented insights into the structure of the transcriptome in chronic non-malignant lung disease. By identifying changes in splicing and in the expression of novel loci associated with disease, we may be able to gain a better understanding of the pathogenesis of these diseases, identify novel disease-specific biomarkers, and find better targets for therapy. Using RNA-seq data that our group generated on 281 human lung tissue samples (47 control, 131 COPD, 103 ILD), I first defined the transcriptomic landscape of lung tissue by identifying which genes were expressed in each tissue sample. I used a mixture model to separate genes into those that were reliably expressed and those that were not. Next, I used reads that overlapped splice junctions in a linear model with an interaction term to identify disease-specific differential splicing. I identified genes that were alternatively spliced between control and disease tissues and validated three of them (PDGFA, NUMB, SCEL) with qPCR and NanoString (a hybridization-based barcoding technique used to quantify transcripts). Finally, I implemented and improved a pipeline to perform transcriptome assembly using Cufflinks, which led to the identification of 1,855 novel loci that did not overlap with UCSC, Vega, or Ensembl annotations. The loci were classified into potentially coding and non-coding loci (191 and 1,664, respectively). Expression analysis revealed 120 IPF-associated and 10 emphysema-associated differentially expressed (q < 0.01) novel loci. RNA-seq provides a high-resolution, transcript-level view of the pulmonary transcriptome and its modification in lung disease.
It has enabled a new understanding of the structure of the lung transcriptome because it measures not only annotated transcripts but also previously unannotated ones. The approaches and improvements I employed identified these novel loci and make possible downstream functional analyses that could identify better targets for therapy and lead to a better understanding of chronic lung disease pathogenesis.
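The mixture-model step mentioned above can be illustrated with a minimal sketch. The abstract does not state the form of the model, so this example assumes a two-component Gaussian mixture fitted by expectation-maximization to log-scale expression values, with the higher-mean component treated as "reliably expressed". All function names and the simulated data are hypothetical and are not taken from the thesis.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_reliable(x, params):
    """Posterior probability that value x belongs to the high-mean component."""
    weights, means, sds = params
    p0 = weights[0] * normal_pdf(x, means[0], sds[0])
    p1 = weights[1] * normal_pdf(x, means[1], sds[1])
    total = p0 + p1
    return p1 / total if total > 0 else 0.5


def fit_two_component_mixture(values, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture with EM.

    Returns (weights, means, sds); index 0 is the low component and
    index 1 the high component (assumed here to be reliable expression).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    # Initialise the two means from the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    sds = [1.0, 1.0]
    weights = [0.5, 0.5]
    n = len(values)
    for _ in range(n_iter):
        # E-step: responsibility of the high component for each value.
        resp = [posterior_reliable(x, (weights, means, sds)) for x in values]
        # M-step: update weights, means and standard deviations.
        n1 = max(sum(resp), 1e-12)
        n0 = max(n - sum(resp), 1e-12)
        means = [sum((1 - r) * x for r, x in zip(resp, values)) / n0,
                 sum(r * x for r, x in zip(resp, values)) / n1]
        var0 = sum((1 - r) * (x - means[0]) ** 2 for r, x in zip(resp, values)) / n0
        var1 = sum(r * (x - means[1]) ** 2 for r, x in zip(resp, values)) / n1
        # Floor the standard deviations to avoid a degenerate component.
        sds = [max(math.sqrt(var0), 1e-3), max(math.sqrt(var1), 1e-3)]
        weights = [n0 / n, n1 / n]
    return weights, means, sds


def simulate_log_expression(seed=0, n_low=300, n_high=700):
    """Simulated log-scale expression values (illustrative, not real data)."""
    rng = random.Random(seed)
    low = [rng.gauss(-2.0, 1.0) for _ in range(n_low)]    # noise-level genes
    high = [rng.gauss(5.0, 1.0) for _ in range(n_high)]   # expressed genes
    return low + high


if __name__ == "__main__":
    values = simulate_log_expression()
    params = fit_two_component_mixture(values)
    reliable = [x for x in values if posterior_reliable(x, params) > 0.5]
    print("fitted means:", params[1])
    print("genes called reliably expressed:", len(reliable))
```

A posterior cutoff of 0.5 is an arbitrary choice for this sketch; a stricter threshold would trade sensitivity for fewer false calls among low-abundance genes.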