A study assessing the characteristics of big data environments that predict high research impact: application of qualitative and quantitative methods
BACKGROUND: Big data offers new opportunities to enhance healthcare practice. While researchers have shown increasing interest in using it, little is known about what drives research impact. We explored predictors of research impact across three major sources of healthcare big data derived from the government and the private sector.
METHODS: This study used a mixed methods approach. For the quantitative analysis, we first applied social network analysis to cluster peer-reviewed original research that used government data from the Veterans Health Administration (VHA) or private-sector data from IBM MarketScan and Optum. We analyzed a battery of research impact measures as a function of the data sources. Other main predictors were topic clusters and authors’ social influence. Additionally, we conducted key informant interviews (KIIs) with a purposive sample of high-impact researchers who have knowledge of the data. We then compiled the KII findings into two case studies to provide a rich understanding of the drivers of research impact.
RESULTS: Analysis of 1,907 peer-reviewed publications using VHA, IBM MarketScan and Optum data found that the overall research enterprise was highly dynamic and growing over time. In fewer than 4 years of observation, research productivity, use of machine learning (ML) and natural language processing (NLP), and the Journal Impact Factor showed substantial growth. Studies that used ML and NLP, however, showed limited visibility. After adjustments, VHA studies had generally higher impact (10% and 27% higher annualized Google citation rates) compared to MarketScan and Optum (p<0.001 for both). Analysis of co-authorship networks showed that no single social actor, whether a community of scientists or an institution, dominated. Based on the KIIs, other key opportunities to achieve high impact include methodological innovations, under-studied populations, and predictive modeling based on rich clinical data.
CONCLUSIONS: The use of big data for research analytics grew within the three data sources studied between 2013 and 2016. Despite important challenges, the research community is reacting favorably to the opportunities offered by both big data and advanced analytic methods. Big data may be a logical and cost-efficient choice for emulating research initiatives where randomized controlled trials (RCTs) are not possible.