Crowdsourcing as a novel technique for retinal fundus photography classification: analysis of images in the EPIC Norfolk cohort on behalf of the UK Biobank Eye and Vision Consortium.
PLoS ONE 2013 ; 8: e71154.
Mitry D, Peto T, Hayat S, Morgan JE, Khaw KT, Foster PJ
DOI : 10.1371/journal.pone.0071154
PubMed ID : 23990935
PMCID : PMC3749186
URL : https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071154
Abstract
Crowdsourcing is the process of outsourcing numerous tasks to many untrained individuals. Our aim was to assess the performance and repeatability of crowdsourcing for the classification of retinal fundus photography.
One hundred retinal fundus photograph images with pre-determined disease criteria were selected by experts from a large cohort study. After reading brief instructions and an example classification, knowledge workers (KWs) from a crowdsourcing platform were asked to classify each image as normal or abnormal with grades of severity. Each image was classified 20 times by different KWs. Four study designs were examined to assess the effect of varying incentive and KW experience on classification accuracy. All study designs were conducted twice to examine repeatability. Performance was assessed by comparing the sensitivity, specificity and area under the receiver operating characteristic curve (AUC).
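The three performance measures named above can be illustrated with a minimal sketch. It assumes each image is scored by the fraction of KWs who voted "abnormal" and is called abnormal when that fraction exceeds 0.5; the scoring rule, the threshold, the function names and the six-image data set are illustrative assumptions, not the study's data or its analysis code.

```python
# Hypothetical illustration only: the data and the 0.5 threshold are assumptions.

def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary labels (1 = abnormal)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)

def auc(truth, scores):
    """AUC as the probability that a random abnormal image scores higher
    than a random normal image (ties count as one half)."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Expert reference grade per image (1 = abnormal, 0 = normal); made-up values.
truth = [1, 1, 1, 0, 0, 0]
# Fraction of KWs who voted "abnormal" for each image; made-up values.
scores = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]
# Call an image abnormal when more than half of the KWs said so.
predicted = [1 if s > 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(truth, predicted)
area = auc(truth, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

The pairwise form of the AUC is used here because it needs no external library; for real data a statistics package would normally be used and would also supply the confidence interval.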
Without restriction on eligible participants, two thousand classifications of 100 images were received in under 24 hours at minimal cost. In trial 1, all study designs had an AUC (95% CI) of 0.701 (0.680-0.721) or greater for classification of normal/abnormal. In trial 1, the highest AUC (95% CI) for normal/abnormal classification was 0.757 (0.738-0.776) for KWs with moderate experience. Comparable results were observed in trial 2. In trial 1, between 64% and 86% of abnormal images were correctly classified by over half of all KWs. In trial 2, this ranged from 74% to 97%. Sensitivity was ≥ 96% for normal versus severely abnormal detections across all trials. Sensitivity for normal versus mildly abnormal varied between 61% and 79% across trials.
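One way to read the "correctly classified by over half of all KWs" figure is as a per-image majority check. The sketch below assumes that reading; the function name and the vote lists are hypothetical and are not taken from the study.

```python
# Hypothetical illustration only: the votes below are made up.

def majority_correct_fraction(votes_per_image, truth):
    """Fraction of images whose reference label was chosen by a strict
    majority (more than half) of the KWs who classified that image."""
    correct = 0
    for votes, label in zip(votes_per_image, truth):
        agree = sum(1 for v in votes if v == label)
        if agree > len(votes) / 2:
            correct += 1
    return correct / len(truth)

# Four abnormal images (reference label 1), each classified by four KWs.
abnormal_truth = [1, 1, 1, 1]
abnormal_votes = [
    [1, 1, 1, 0],  # 3 of 4 correct: majority
    [1, 0, 0, 0],  # 1 of 4 correct: no majority
    [1, 1, 0, 0],  # 2 of 4 correct: a tie is not "over half"
    [1, 1, 1, 1],  # 4 of 4 correct: majority
]

fraction = majority_correct_fraction(abnormal_votes, abnormal_truth)
print(f"{fraction:.0%} of abnormal images were correct by majority")
```

In the study each image received 20 classifications, so under this reading an image would count only if at least 11 KWs graded it correctly.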
With minimal training, crowdsourcing represents an accurate, rapid and cost-effective method of retinal image analysis that demonstrates good repeatability. Larger studies with more comprehensive participant training are needed to explore the utility of this compelling technique in large-scale medical image analysis.
Study : EPIC-Norfolk: The European Prospective Investigation into Cancer Norfolk Cohort