On identifying polycystic ovary syndrome in the Clinical Data Warehouse at Boston Medical Center
Cheng, Jay Jojo
INTRODUCTION: Polycystic ovary syndrome (PCOS) is characterized by hyperandrogenemia, oligoanovulation, and numerous ovarian cysts. Although PCOS is the most common cause of female factor infertility, its characteristics and metabolic risks are difficult to study because of its heterogeneity. In addition, ethnicity-specific data are scarce. Electronic medical records at Boston Medical Center (BMC), which serves a diverse patient population, may provide an avenue for investigating the longitudinal nature of PCOS and its race-specific characteristics.

OBJECTIVES:
1. Describe the Clinical Data Warehouse (CDW) dataset available for studying PCOS.
2. Develop an automated method for extracting ovarian features from written ultrasound reports.
3. Identify PCOS patients from their records of the three cardinal PCOS features.

METHODS: Patients evaluated for at least one of the three cardinal PCOS features between October 1, 2003 and September 30, 2015 were queried from the BMC CDW. This thesis describes methods for cleaning the data and the development of an ultrasound classifier based on natural language processing techniques.

RESULTS: On a validation set of 1000 randomly selected ultrasound reports, the automated ultrasound classifier detected polycystic ovarian morphology (PCOM) with a recall of 99.0% and a precision of 94.2%. Overall, 2421 cases of PCOS were identified, 1010 of which had not received a diagnosis. Black patients had twice the odds of being underdiagnosed compared with White patients (OR: 2.09; 95% CI: 1.69–2.59).

CONCLUSIONS: Ascertaining PCOS through the medical record offers advantages over self-reported PCOS, including documentation of the disease and recorded measurements. In the future, this PCOS dataset could be used together with cardiovascular and metabolic outcomes to develop a predictive model.
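The abstract does not describe the internals of the ultrasound classifier, so the following is only a minimal rule-based sketch of the idea, not the thesis's method. It assumes simplified Rotterdam 2003 morphology thresholds (12 or more follicles, or an ovarian volume above 10 mL, in either ovary) plus a non-negated "polycystic" description. It also shows how recall and precision are computed against manual labels. The regular expressions, the function names `flags_pcom` and `precision_recall`, and the example reports and labels are all hypothetical.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical patterns for illustration only; real reports use far more phrasings.
FOLLICLE_COUNT = re.compile(r"(\d+)\s+(?:small\s+)?follicles", re.IGNORECASE)
OVARIAN_VOLUME = re.compile(
    r"volume\s*(?:of\s*|[:=]\s*)?(\d+(?:\.\d+)?)\s*(?:ml|cc)\b", re.IGNORECASE
)
PCO_PHRASE = re.compile(r"polycystic[- ]appearing|polycystic ovar", re.IGNORECASE)
# Crude negation: "no", "not", or "without" earlier in the same sentence.
NEGATION = re.compile(r"\b(?:no|not|without)\b[^.]*polycystic", re.IGNORECASE)


def ovary_sentences(report: str) -> List[str]:
    """Split on sentence punctuation and keep only sentences that mention an ovary."""
    sentences = re.split(r"(?<=[.;])\s+", report)
    return [s for s in sentences if re.search(r"ovar", s, re.IGNORECASE)]


def flags_pcom(report: str) -> bool:
    """Flag PCOM with assumed, simplified Rotterdam-style rules: >= 12 follicles
    or volume > 10 mL in an ovary sentence, or a non-negated 'polycystic'
    description. Follicle diameter (2-9 mm) is not checked."""
    for sentence in ovary_sentences(report):
        if any(int(n) >= 12 for n in FOLLICLE_COUNT.findall(sentence)):
            return True
        if any(float(v) > 10 for v in OVARIAN_VOLUME.findall(sentence)):
            return True
        if PCO_PHRASE.search(sentence) and not NEGATION.search(sentence):
            return True
    return False


def precision_recall(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up reports with hypothetical manual labels (not study data).
    reports = [
        ("Right ovary contains 15 follicles measuring 2-8 mm.", True),
        ("Left ovary volume 6.2 mL. No polycystic morphology.", False),
        ("Both ovaries are polycystic appearing.", True),
        ("Uterus volume 80 mL. Ovaries are unremarkable.", False),
        ("Ovaries enlarged with peripheral follicles.", True),
    ]
    predicted = [flags_pcom(text) for text, _ in reports]
    actual = [label for _, label in reports]
    print(precision_recall(predicted, actual))
```

Restricting matches to sentences that mention an ovary keeps measurements such as uterine volume from triggering a false positive. The last toy report shows the opposite failure: a PCOM description without a count or the word "polycystic" is missed, which lowers recall. A practical classifier would also need to handle laterality, more negation patterns, and many more report phrasings.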