How to review results

There are three recommended strategies for reviewing results. The first batch of data should be reviewed comprehensively by your team and/or experts contracted by Centaur. For subsequent batches, review at least 10 to 15 cases per category to get a good sense of labeler performance.

| Strategy | How to find | What they show | What to look for |
| --- | --- | --- | --- |
| Low agreement cases | Filter by Labeled cases and sort by ascending agreement. Go to States and select Labeled. Then go to More Columns and select Agreement. | Cases where labelers did not converge on similar answers | Any patterns in the types of cases with low agreement (e.g., edge cases that may not be covered in the instructions) |
| High difficulty cases | Filter by Gold Standard cases and sort by descending difficulty. Go to States and select Gold Standard. Then go to More Columns and select Difficulty. | Cases where labelers did not agree with the gold standard answer | Any patterns in the kinds of cases labelers are getting wrong and/or any incorrect gold standard answers |
| Random sample | Multiple ways (e.g., filter by Labeled and pick a random page to review) | Random sample of cases to get a general sense of labeler performance | How labelers are doing on the task; any other interesting trends |
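If you prefer to work from exported results rather than the dashboard, the three review strategies can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the field names (`state`, `agreement`, `difficulty`), the sample records, and the helper functions are hypothetical and are not Centaur's export format or API.

```python
import random

# Hypothetical exported cases; field names and values are assumptions for illustration.
cases = [
    {"id": "c1", "state": "Labeled", "agreement": 0.92, "difficulty": None},
    {"id": "c2", "state": "Labeled", "agreement": 0.41, "difficulty": None},
    {"id": "c3", "state": "Gold Standard", "agreement": 0.75, "difficulty": 0.60},
    {"id": "c4", "state": "Gold Standard", "agreement": 0.88, "difficulty": 0.15},
    {"id": "c5", "state": "Labeled", "agreement": 0.67, "difficulty": None},
]


def low_agreement(cases, n=15):
    """Labeled cases, lowest agreement first."""
    labeled = [c for c in cases if c["state"] == "Labeled"]
    return sorted(labeled, key=lambda c: c["agreement"])[:n]


def high_difficulty(cases, n=15):
    """Gold Standard cases, highest difficulty first."""
    gold = [c for c in cases if c["state"] == "Gold Standard"]
    return sorted(gold, key=lambda c: c["difficulty"], reverse=True)[:n]


def random_sample(cases, n=15, seed=0):
    """Random sample of Labeled cases; the seed makes the sample repeatable."""
    labeled = [c for c in cases if c["state"] == "Labeled"]
    rng = random.Random(seed)
    return rng.sample(labeled, min(n, len(labeled)))


if __name__ == "__main__":
    print([c["id"] for c in low_agreement(cases)])
    print([c["id"] for c in high_difficulty(cases)])
    print([c["id"] for c in random_sample(cases, n=2)])
```

The default of `n=15` simply mirrors the upper end of the 10 to 15 cases suggested above; adjust it to your own review budget.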

If any significant errors are found in the results, consider these strategies to improve your results moving forward. Share your findings with your project manager, and they can assist you as well.

To use Centaur accuracy metrics when reviewing your results, see Case level metrics and Task level metrics.