The prediction of HLA genotypes from next generation sequencing and genome scan data
Farrell, John J.
MetadataShow full item record
Genome-wide association studies have very successfully found highly significant disease associations with single nucleotide polymorphisms (SNP) in the Major Histocompatibility Complex for adverse drug reactions, autoimmune diseases and infectious diseases. However, the extensive linkage disequilibrium in the region has made it difficult to unravel the HLA alleles underlying these diseases. Here I present two methods to comprehensively predict 4-digit HLA types from the two types of experimental genome data widely available. The Virtual SNP Imputation approach was developed for genome scan data and demonstrated a high precision and recall (96% and 97% respectively) for the prediction of HLA genotypes. A reanalysis of 6 genome-wide association studies using the HLA imputation method identified 18 significant HLA allele associations for 6 autoimmune diseases: 2 in ankylosing spondylitis, 2 in autoimmune thyroid disease, 2 in Crohn's disease, 3 in multiple sclerosis, 2 in psoriasis and 7 in rheumatoid arthritis. The EPIGEN consortium also used the Virtual SNP Imputation approach to detect a novel association of HLA-A*31:01 with adverse reactions to carbamazepine. For the prediction of HLA genotypes from next generation sequencing data, I developed a novel approach using a naïve Bayes algorithm called HLA-Genotyper. The validation results covered whole genome, whole exome and RNA-Seq experimental designs in the European and Yoruba population samples available from the 1000 Genomes Project. The RNA-Seq data gave the best results with an overall precision and recall near 0.99 for Europeans and 0.98 for the Yoruba population. I then successfully used the method on targeted sequencing data to detect significant associations of idiopathic membranous nephropathy with HLA-DRB1*03:01 and HLA-DQA1*05:01 using the 1000 Genomes European subjects as controls. Using the results reported here, researchers may now readily unravel the association of HLA alleles with many diseases from genome scans and next generation sequencing experiments without the expensive and laborious HLA typing of thousands of subjects. Both algorithms enable the analysis of diverse populations to help researchers pinpoint HLA loci with biological roles in infection, inflammation, autoimmunity, aging, mental illness and adverse drug reactions.