Two-Stage Approach for Identifying Single-Nucleotide Polymorphisms Associated with Rheumatoid Arthritis Using Random Forests and Bayesian Networks

Date
2007-12-18
Authors
Meng, Yan
Yang, Qiong
Cuenco, Karen T.
Cupples, L. Adrienne
DeStefano, Anita L.
Lunetta, Kathryn L.
Citation
Meng, Yan, Qiong Yang, Karen T. Cuenco, L. Adrienne Cupples, Anita L. DeStefano, and Kathryn L. Lunetta. "Two-stage approach for identifying single-nucleotide polymorphisms associated with rheumatoid arthritis using random forests and Bayesian networks." BMC Proceedings 1(Suppl 1):S56 (2007).
Abstract
We used the simulated data set from Genetic Analysis Workshop 15 Problem 3 to assess a two-stage approach for identifying single-nucleotide polymorphisms (SNPs) associated with rheumatoid arthritis (RA). In the first stage, we used random forests (RF) to screen large amounts of genetic data with the RF variable importance measure, which captures SNP interaction effects as well as main effects without requiring a model to be specified. The input to the RF analyses consisted of the 9187 simulated SNPs, which mimic a 10K SNP chip, together with the covariates DR (the simulated DRB1 genotype), smoking, and sex; the training set comprised 750 unrelated RA cases and 750 controls. We applied an iterative RF screening procedure to reduce these inputs to a smaller set of variables for further analysis. In the second stage, we used CaMML, a software program for learning Bayesian networks, to develop complex etiologic models for RA risk from the variables identified by the RF screening. We evaluated the performance of this method on independent test data sets for up to 100 replicates.
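The iterative screening step can be sketched as a loop that re-scores the surviving variables and keeps only the top-ranked fraction until a target number remains. The record does not give the exact rule, so the retention fraction (keep_fraction), the stopping size (target_size), and all function names below are hypothetical. To keep the sketch dependency-free, a simple case-control allele-frequency difference stands in for the RF variable importance. Unlike RF importance, this stand-in reflects main effects only; in a real analysis, importance_fn would refit a random forest on the current variables at each pass.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: importance_fn(variables, data) -> {variable: score}.
# In the paper this role is played by random-forest variable importance.
ImportanceFn = Callable[[Sequence[str], dict], Dict[str, float]]


def iterative_screen(variables: List[str], data: dict, importance_fn: ImportanceFn,
                     keep_fraction: float = 0.5, target_size: int = 10) -> List[str]:
    """Re-score the surviving variables and keep the top fraction until
    target_size remain. Defaults are illustrative, not from the paper."""
    if not 0 < keep_fraction < 1:
        raise ValueError("keep_fraction must be strictly between 0 and 1")
    current = list(variables)
    while len(current) > target_size:
        scores = importance_fn(current, data)  # re-score only the survivors
        ranked = sorted(current, key=lambda v: scores[v], reverse=True)
        n_keep = max(target_size, int(len(ranked) * keep_fraction))
        current = ranked[:n_keep]
    return current


def allele_freq_difference(variables: Sequence[str], data: dict) -> Dict[str, float]:
    """Stand-in score: |case allele frequency - control allele frequency|.
    NOT random-forest importance; it ignores interactions."""
    scores = {}
    for v in variables:
        case_g, ctrl_g = data["cases"][v], data["controls"][v]
        scores[v] = abs(sum(case_g) / (2 * len(case_g))
                        - sum(ctrl_g) / (2 * len(ctrl_g)))
    return scores


def simulate(n_snps: int = 50, n_per_group: int = 200,
             causal=("snp0", "snp1"), seed: int = 0):
    """Toy case-control genotypes coded as 0/1/2 copies of one allele."""
    rng = random.Random(seed)
    names = [f"snp{i}" for i in range(n_snps)]
    data = {"cases": {}, "controls": {}}
    for name in names:
        p_case = 0.5 if name in causal else 0.3
        p_ctrl = 0.2 if name in causal else 0.3
        data["cases"][name] = [sum(rng.random() < p_case for _ in range(2))
                               for _ in range(n_per_group)]
        data["controls"][name] = [sum(rng.random() < p_ctrl for _ in range(2))
                                  for _ in range(n_per_group)]
    return names, data


names, data = simulate()
selected = iterative_screen(names, data, allele_freq_difference,
                            keep_fraction=0.5, target_size=5)
print(selected)
```

Because the stand-in score does not change when variables are dropped, this loop simply returns the five highest-scoring SNPs. With RF importance, refitting on the reduced set can reorder the ranking from one pass to the next, which is what makes iterating worthwhile.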
License
Copyright 2007 Meng et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.