Boston University Libraries OpenBU

    Impact of new variables on discrimination of risk prediction models

    Date Issued
    2012
    Author(s)
    Demler, Olga V.
    Embargoed until:
    Indefinite
    Permanent Link
    https://hdl.handle.net/2144/31532
    Abstract
    Risk prediction models for binary outcomes (such as the Framingham Risk Score for cardiovascular disease or the Gail Model for 5-year risk of breast cancer) have become standard tools for health practitioners and policy makers. Rapid scientific progress in genetics and biochemistry has led to numerous new variables being proposed as candidates to improve existing models. The quality of a risk prediction model is usually measured by the area under the receiver operating characteristic curve (AUC), and the increase in AUC is used to evaluate how much a newly added variable contributes to model performance. However, the following paradox has often been reported in the literature: the new predictor is statistically significant in the multivariable model, but does not lead to a statistically significant change in the AUC. In the first part of this thesis we prove that this paradox does not hold when the data are normally distributed. We demonstrate that in this setting statistical significance of the new predictor(s) is always equivalent to statistical significance of the increase in the AUC. In the second part, we show rigorously that the DeLong test, which is typically used to compare two AUCs, is invalid for nested models, for any distribution of the data and for a general class of risk prediction models, including logistic regression. This invalidity is the likely explanation for the paradox and results in the DeLong test being overly conservative. In the third part of the thesis we focus on understanding which statistical properties of a new predictor are beneficial for model performance. Using multivariate normal data, we prove that, contrary to common wisdom, new variables uncorrelated with the old risk score are not always the strongest contributors to discrimination, while negatively correlated ones are always beneficial. We also show that a new predictor with a very high multiple R-squared when linearly regressed on the old predictors can also be beneficial for a risk prediction model. All results are illustrated using real-life Framingham data, and conclusions and future directions are presented at the end.
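The third-part claim about correlation can be illustrated with a small numerical sketch. It is not taken from the thesis. It assumes the textbook setting of two unit-variance predictors that are bivariate normal in events and non-events, with a common within-group correlation rho, where the AUC of the best linear score is Phi(sqrt(M^2 / 2)) and M^2 is the squared Mahalanobis distance between the group means. The function names and the effect sizes d_old and d_new are hypothetical, chosen only for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mahalanobis_sq(d_old: float, d_new: float, rho: float) -> float:
    """Squared Mahalanobis distance between the group means for two
    unit-variance predictors with within-group correlation rho."""
    return (d_old**2 - 2.0 * rho * d_old * d_new + d_new**2) / (1.0 - rho**2)


def binormal_auc(m2: float) -> float:
    """AUC of the optimal linear score under the equal-covariance normal model."""
    return normal_cdf(math.sqrt(m2 / 2.0))


# Hypothetical standardized mean differences for the old and new predictor.
d_old, d_new = 0.8, 0.3

auc_old = binormal_auc(d_old**2)  # model with the old predictor only
for rho in (-0.5, 0.0, 0.3, 0.9):
    auc_both = binormal_auc(mahalanobis_sq(d_old, d_new, rho))
    print(f"rho={rho:+.1f}  AUC old={auc_old:.3f}  AUC old+new={auc_both:.3f}")
```

Under these assumptions, a negative rho gives a larger AUC than rho = 0, and so does a strongly positive rho, while a moderately positive rho gives the smallest gain. This agrees in spirit with the abstract's statements about negatively correlated and highly collinear predictors.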
    Description
    Thesis (Ph.D.)--Boston University
     
    PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at open-help@bu.edu. Thank you.
     
    Collections
    • Boston University Theses & Dissertations [6905]


    Boston University
    Contact Us | Send Feedback | Help
     

     
