Case-level metrics

Agreement and difficulty provide insight into labeler confidence for each case. This information can be used to identify particularly challenging cases, Gold Standard cases with incorrect answers, and/or edge cases that were not adequately addressed in labeler instructions.

What are they?

We calculate two measures of case-level labeler performance: agreement and difficulty.

Agreement indicates how often labelers agree with one another on a given case, that is, how similar their reads are. Agreement scores are calculated for any case that has reads, whether it is Labeled, Gold Standard, or In Progress. Agreement scores can be interpreted in the following ways:

  • High agreement score: Labelers understand and interpret the concept similarly.
  • Low agreement score: Labelers may not understand the task well and/or the case is difficult to assess.
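To make the idea concrete, here is a minimal sketch of one common way to measure agreement on a classification case: the fraction of labeler pairs that chose the same label. This is an illustrative assumption, not the formula the platform uses, and the function name `pairwise_agreement` is hypothetical.

```python
from itertools import combinations

def pairwise_agreement(reads):
    """Fraction of labeler pairs that gave the same label.

    Assumed, illustrative definition only; the platform's actual
    agreement formula is not specified here.
    Returns None when there are fewer than two reads to compare.
    """
    pairs = list(combinations(reads, 2))
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)

# Four reads: three labelers chose "cat", one chose "dog".
# 3 of the 6 possible pairs match.
print(pairwise_agreement(["cat", "cat", "cat", "dog"]))  # 0.5
```

Under this definition, a score near 1 means almost every pair of labelers matched, and a score near 0 means almost none did.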

Difficulty indicates how often labelers disagree with the Correct Label on a Gold Standard case. Difficulty scores can be interpreted in the following ways:

  • Low difficulty score: Labelers generally agree with the Correct Label.
  • High difficulty score: Labelers disagree with the Correct Label, which may indicate that the case is difficult to assess and/or the Correct Label is wrong.
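As a rough illustration, difficulty on a Gold Standard case can be thought of as the share of reads that differ from the Correct Label. The sketch below assumes that simple definition for a classification case; the platform's actual calculation may weight reads differently, and the function name `difficulty` is hypothetical.

```python
def difficulty(reads, correct_label):
    """Share of reads that disagree with the Correct Label.

    Assumed, illustrative definition only; not the platform's
    documented formula. Returns None when the case has no reads.
    """
    if not reads:
        return None
    return sum(read != correct_label for read in reads) / len(reads)

# One of four reads disagrees with the Correct Label "cat".
print(difficulty(["cat", "cat", "cat", "dog"], "cat"))  # 0.25

# Every read disagrees, so the Correct Label itself may deserve a second look.
print(difficulty(["dog", "dog", "dog"], "cat"))  # 1.0
```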

Where can I see them?

You can find agreement and difficulty in three places: the task view (the Cases tab of your task), the case view for an individual case, and your results download.

Task view

In the Cases tab of your task, select More Columns to add the Difficulty and Agreement columns. Agreement values appear for In Progress, Gold Standard, and Labeled cases with reads. Difficulty values appear only for Gold Standard cases with reads.

Case view

Once you have selected a specific case, you can see its agreement and difficulty, where applicable, on the left-hand side of the screen under Analysis.

Case view also displays the Q-score (also known as trailing average accuracy) of all Qualified Reads used to create the Majority and Correct labels.

Note: For Gold Standard cases, this Q-score information applies only to the Majority label, because the Correct label is provided by your team or consultants.

Results download

The results download includes agreement and difficulty columns, which list the values where applicable.
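If you want to review flagged cases in bulk, a short script over the download can help. The sketch below assumes a CSV export with columns named `case_id`, `agreement`, and `difficulty`, and uses arbitrary thresholds; the real file format, column names, and sensible cutoffs may differ, so treat all of these as placeholders.

```python
import csv
import io

# Hypothetical sample standing in for a results download.
# Real column names and file format may differ.
SAMPLE = """case_id,agreement,difficulty
101,0.95,0.05
102,0.40,
103,0.90,0.85
"""

def flag_cases(csv_text, min_agreement=0.5, max_difficulty=0.7):
    """Return (case_id, reasons) for cases worth a manual review.

    Thresholds are illustrative defaults, not recommended values.
    Empty cells (for example, difficulty on non-Gold Standard cases)
    are skipped rather than treated as zero.
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        agreement = float(row["agreement"]) if row["agreement"] else None
        difficulty = float(row["difficulty"]) if row["difficulty"] else None
        reasons = []
        if agreement is not None and agreement < min_agreement:
            reasons.append("low agreement")
        if difficulty is not None and difficulty > max_difficulty:
            reasons.append("high difficulty")
        if reasons:
            flagged.append((row["case_id"], reasons))
    return flagged

print(flag_cases(SAMPLE))
# [('102', ['low agreement']), ('103', ['high difficulty'])]
```

In this sample, case 103 has high agreement but high difficulty: labelers agree with one another yet disagree with the Correct Label, which is the pattern that may indicate an incorrect Gold Standard answer.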