Interrater variability between local and central pathologists in an industry sponsored adjudication program
Occhiuti, Alison Michele
BACKGROUND: Adjudication is a standardized, objective, and often blinded mechanism designed to assess clinical events with increased accuracy. It is performed by a centralized committee of independent reviewers: specialized, expert physicians who have no involvement with either the treatment of study subjects or the trial sponsor. Adjudication can decrease variability and bias in study results and increase the likelihood of correct identification, assessment, and categorization of clinical events such as potential malignancies diagnosed through histopathology. Histopathologic diagnosis is highly variable because the assessments are subjective.

THESIS: If there are clinically significant discrepancies between local and central diagnoses, and central adjudication yields more accurate diagnoses than a local pathologist does, then adjudication should be more widely used in clinical trials to assess histopathology-related safety outcomes and endpoints.

METHODS AND STATISTICS: This is a retrospective cross-sectional study assessing interrater variability between local and central diagnoses of biopsy samples in a clinical trial setting using kappa scores and percent agreement. Certified Professional Coders (CPCs) and central pathologists used the International Classification of Diseases for Oncology, Third Edition (ICD-O-3) to codify the local and central assessments to permit comparison.
Three statistical groups (group A: the full dataset, group B: pathology sub-specialty reading groups, and group C: non-melanoma skin cancers (NMSC) versus all other malignancies) were assessed for interrater variability in seven separate analyses: neoplasm versus non-neoplasm (analysis 1), benign versus malignant including non-neoplasms (analysis 2.1), benign versus malignant excluding non-neoplasms (analysis 2.2), discrepancies in morphology and/or behavior including non-neoplasms (analysis 3.1), discrepancies in morphology and/or behavior excluding non-neoplasms (analysis 3.2), all discrepancies leading to differences in treatment (analysis 4.1), and all discrepancies leading to differences in treatment with round 1 matches removed (analysis 4.2).

RESULTS: The dataset comprised 602 cases. Based on kappa scores, agreement between the central and local lab diagnoses was near perfect in analyses 1, 2.1, and 2.2 in group A (all cases in the dataset). The percent agreement for these analyses was above 90%. The group A kappa score and percent agreement decreased to 0.59 and 68.3%, respectively, in analysis 3.1 (discrepancies in morphology and/or behavior codes, including non-neoplasms). When non-neoplasms were removed (analysis 3.2), the kappa score and percent agreement were 0.52 and 57.0%, respectively. In group C, NMSC had substantial kappa agreement in analyses 1, 2.1, and 2.2, whereas all other malignancies had near perfect kappa agreement. All percent agreements were above 88% and surpassed the minimally acceptable threshold for interrater percent agreement in healthcare (80%). Group B divided the dataset into 10 sub-specialty reading groups. Kappa scores ranged from 0.66 (GYN) to 1.00 (lung) in analysis 1; the analysis 1 kappa score for lymphoma was 0.55, but this was not statistically significant. In analysis 2.1, lung and sarcoma had the highest kappa scores (1.00) and dermatology and GYN had the lowest (0.71).
As in analysis 1, the kappa score for lymphoma was 0.55 but was not statistically significant. When non-neoplasms were removed (analysis 2.2), 6 of the 10 sub-groups had kappa scores of 1.00, but all 6 had sample sizes of fewer than 10 cases. Percent agreement ranged from 80% to 100%. When all cases were considered regardless of the number of rounds of review (analysis 4.1), about 90% of diagnoses would have similar courses of treatment. All sub-groups except sarcoma reached the minimally acceptable agreement rate in healthcare (80%). Among the 33% of cases that did not have matching diagnoses in round 1 (analysis 4.2), 34% may have different courses of treatment depending on whether the local or central diagnosis was used. Mid-study updates to the charter and CPC/reviewer manuals and to the processing of specimens did not have a significant impact on results.

CONCLUSION: Although there is little discrepancy between local and central pathologists on whether malignancies exist among samples, there is discord regarding specific diagnoses and their associated treatments. Adjudication can assist in decreasing this discordance in order to develop the most specific and accurate safety profile for a compound.
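
For readers unfamiliar with the two agreement statistics reported above, the following is a minimal Python sketch of how percent agreement and a kappa score can be computed for two raters. It rests on several assumptions not stated in the abstract: that the kappa statistic is the unweighted Cohen's kappa, and that the verbal labels ("substantial", "near perfect") follow the commonly cited Landis and Koch bands, where the top band is usually worded "almost perfect". The function names and the ten-case toy data are hypothetical illustrations and are not taken from the study.

```python
from collections import Counter


def percent_agreement(local, central):
    """Percentage of cases where the two raters assigned the same category."""
    if not local or len(local) != len(central):
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(local, central))
    return 100.0 * matches / len(local)


def cohens_kappa(local, central):
    """Unweighted Cohen's kappa for two raters (assumed statistic)."""
    if not local or len(local) != len(central):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(local)
    # Observed agreement: proportion of cases with identical categories.
    p_o = sum(a == b for a, b in zip(local, central)) / n
    # Chance agreement: from each rater's marginal category frequencies.
    local_counts = Counter(local)
    central_counts = Counter(central)
    p_e = sum(local_counts[c] * central_counts[c] for c in local_counts) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def agreement_band(kappa):
    """Verbal label for a kappa value (assumed Landis and Koch style bands)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "near perfect"


# Hypothetical toy data: ten cases coded benign (B) or malignant (M).
local_dx = ["M", "M", "M", "M", "M", "M", "M", "B", "B", "B"]
central_dx = ["M", "M", "M", "M", "M", "M", "M", "M", "B", "B"]

kappa = cohens_kappa(local_dx, central_dx)
print(f"percent agreement: {percent_agreement(local_dx, central_dx):.1f}%")
print(f"kappa: {kappa:.2f} ({agreement_band(kappa)})")
```

Under these assumed bands, the group A kappa scores of 0.59 (analysis 3.1) and 0.52 (analysis 3.2) would fall in the "moderate" range. The sketch also shows why kappa can sit well below percent agreement: kappa discounts the agreement expected by chance from each rater's category frequencies.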