Comparison of self-reported patient survey data and curated electronic health record data in the osteosarcoma and leiomyosarcoma Count Me In study

Abstract
BACKGROUND AND PURPOSE: Supplementing data from electronic health records (EHRs) with self-reported patient survey data enables the collection of a more comprehensive dataset. Drawing on multiple sources offers further insight, particularly for rare conditions, by broadening what is known about the patient population and disease characteristics. The Count Me In (CMI) Project engages patients with osteosarcoma (OS) and leiomyosarcoma (LMS), rare sarcomas for which there has been limited progress in identifying new treatments and improving outcomes. Because the study obtains clinical data from both participant EHRs and surveys, it offers an opportunity to assess whether combining these sources improves data accuracy and completeness in this setting.

METHODS: Electronic medical records (EMRs) collected from treating institutions were abstracted by trained staff into a REDCap system using guidelines outlined in a curation manual. Participants completed surveys online upon study registration. Our analytic cohort consisted of participants who had a completed clinical data abstraction with a diagnosis date and at least one clinical characteristic from the survey. From consented participants, we collected demographics, race, ethnicity, health literacy, native language, education level, baseline disease characteristics, and cancer-directed therapy information. Data collected from the EMR were coded using a limited list of terms generated for the patient cohort. Survey data were mapped to the EMR variables obtained from the cohort and compared for agreement. Mapped terms that matched were considered in agreement, and unmatched terms were assessed for agreement individually.

RESULTS: By the project data freeze on October 31, 2024, the CMI study had received 141 OS EMRs and 358 LMS EMRs for clinical data abstraction. In total, 103/124 (83%) OS and 151/153 (98%) LMS participants met the criteria for inclusion in the analytic cohort for this project. The demographics in this dataset are consistent with findings from previous studies. Information on date of diagnosis and primary site of cancer was mostly concordant across data sources. Survey-reported stage at diagnosis matched the EMR stage data in 90% of OS and 87% of LMS cases. For both diagnoses, the EMR reported a higher rate of metastatic disease, whereas the survey listed a higher number of chemotherapy drugs. For chemotherapy treatment data, only 221/486 (47.2%) of listed OS drugs and 160/363 (44.1%) of listed LMS drugs matched between the EMR and the survey.

CONCLUSIONS: Patient-reported survey data are a valuable source of information that complements data abstracted from electronic medical records. In this study, we used both data sources to report on patient clinical and demographic characteristics and found that each contains gaps that the other can help fill.
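The term-matching comparison described in the methods can be illustrated with a minimal Python sketch. It is not the study's actual pipeline: the `SURVEY_TO_EMR` lookup table, the function names, and the choice to use every distinct listed term as the denominator are all assumptions made for illustration, and the drug names are example inputs rather than study data.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical lookup from free-text survey terms to EMR coded terms.
# The study's real mapping is not given in the abstract.
SURVEY_TO_EMR: Dict[str, str] = {
    "adriamycin": "doxorubicin",
    "doxorubicin": "doxorubicin",
    "cisplatin": "cisplatin",
    "gemzar": "gemcitabine",
    "gemcitabine": "gemcitabine",
}


def map_survey_terms(terms: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split survey terms into mapped EMR codes and terms with no mapping."""
    mapped: Set[str] = set()
    unmapped: Set[str] = set()
    for term in terms:
        key = term.strip().lower()
        if key in SURVEY_TO_EMR:
            mapped.add(SURVEY_TO_EMR[key])
        else:
            unmapped.add(key)
    return mapped, unmapped


def compare_sources(emr_terms: Iterable[str], survey_terms: Iterable[str]) -> dict:
    """Compare one participant's EMR-coded terms with mapped survey terms."""
    emr = {t.strip().lower() for t in emr_terms}
    mapped, unmapped = map_survey_terms(survey_terms)
    return {
        "matched": emr & mapped,        # in agreement across both sources
        "emr_only": emr - mapped,       # recorded in the EMR only
        "survey_only": mapped - emr,    # reported in the survey only
        "needs_review": unmapped,       # assessed individually, as in the methods
        # Assumed denominator: every distinct term listed in either source.
        "n_listed": len(emr | mapped | unmapped),
    }


def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when nothing is listed."""
    return round(100 * numerator / denominator, 1) if denominator else 0.0


result = compare_sources(
    ["Doxorubicin", "Cisplatin", "Ifosfamide"],    # illustrative EMR entries
    ["Adriamycin", "cisplatin", "mystery drug"],   # illustrative survey entries
)
agreement = percent(len(result["matched"]), result["n_listed"])
```

Terms that land in `needs_review` correspond to the unmatched terms that the methods describe as being assessed for agreement individually. In practice that step would require manual curation rather than a lookup table.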
Description
2025