How to review results
There are three recommended strategies for reviewing results. Your team and/or experts contracted by Centaur should comprehensively review the first batch of data. For subsequent batches, review at least 10-15 cases per category to get a good sense of labeler performance.
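If you export labeled cases for offline review, a short script can draw the per-category sample for you. The sketch below is a minimal example, assuming a hypothetical CSV export named `labeled_cases.csv` with `case_id` and `category` columns; Centaur's actual export format may differ.

```python
import csv
import random
from collections import defaultdict

SAMPLE_SIZE = 15  # aim for 10-15 reviewed cases per category

def sample_per_category(path: str, k: int = SAMPLE_SIZE) -> dict[str, list[str]]:
    """Group exported cases by category and draw up to k at random from each."""
    by_category: dict[str, list[str]] = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            by_category[row["category"]].append(row["case_id"])
    return {
        category: random.sample(ids, min(k, len(ids)))
        for category, ids in by_category.items()
    }

if __name__ == "__main__":
    for category, case_ids in sample_per_category("labeled_cases.csv").items():
        print(f"{category}: {case_ids}")
```

The table below describes three ways to surface the most informative cases in the Centaur interface.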
| Strategy | How to find | What they show | What to look for |
|---|---|---|---|
| Low agreement cases | Filter by Labeled cases and sort by ascending agreement: go to States and select Labeled, then go to More Columns and select Agreement. | Cases where labelers did not converge on similar answers | Any patterns in the types of cases with low agreement, e.g., edge cases that may not be covered in the instructions |
| High difficulty cases | Filter by Gold Standard cases and sort by descending difficulty: go to States and select Gold Standard, then go to More Columns and select Difficulty. | Cases where labelers did not agree with the gold standard answer | Any patterns in the kinds of cases labelers are getting wrong and/or any incorrect gold standard answers |
| Random sample | Multiple ways, e.g., filter by Labeled and pick a random page to review, or sample offline as sketched below | A random sample of cases to get a general sense of labeler performance | How labelers are doing on the task; any other interesting trends |
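If you prefer to reproduce these three views outside the interface, the sketch below builds them from a hypothetical export. The file name `cases_export.csv` and the `state`, `agreement`, `difficulty`, and `case_id` columns are assumptions for illustration, not a documented Centaur schema.

```python
import pandas as pd

# Hypothetical export; the column names used here are assumptions.
cases = pd.read_csv("cases_export.csv")
labeled = cases[cases["state"] == "Labeled"]
gold = cases[cases["state"] == "Gold Standard"]

# Low agreement cases: labeled cases, lowest agreement first.
low_agreement = labeled.sort_values("agreement", ascending=True).head(15)

# High difficulty cases: gold standard cases, highest difficulty first.
high_difficulty = gold.sort_values("difficulty", ascending=False).head(15)

# Random sample: labeled cases drawn at random for a general check.
random_sample = labeled.sample(n=min(15, len(labeled)), random_state=0)

for name, view in [("Low agreement", low_agreement),
                   ("High difficulty", high_difficulty),
                   ("Random sample", random_sample)]:
    print(f"--- {name} ---")
    print(view[["case_id", "agreement", "difficulty"]].to_string(index=False))
```

Sorting ascending on agreement and descending on difficulty mirrors the in-product sorts described in the table above.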
If you find significant errors in the results, consider these strategies to improve your results moving forward. Share your findings with your project manager, who can also assist you.
If you are interested in leveraging Centaur accuracy metrics to review your results, check out Case level metrics and Task level metrics.
