Cohen's Kappa (κ) is a statistic that measures inter-rater agreement for categorical items. It accounts for the possibility of agreement occurring by chance, providing a more robust measure than simple percent agreement.
κ = (Pₒ − Pₑ) / (1 − Pₑ), where Pₒ = observed agreement and Pₑ = expected agreement by chance.
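As a rough sketch of how this is computed in practice, the function below derives κ from a square confusion matrix of paired rater counts (NumPy and the function name are illustrative choices, not part of CrossTabs.com):

```python
import numpy as np

def cohens_kappa(confusion) -> float:
    """Cohen's kappa from a square confusion matrix of paired rater counts."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    p_o = np.trace(confusion) / total        # observed agreement Pₒ (diagonal = both raters agree)
    rows = confusion.sum(axis=1) / total     # rater A's marginal proportions
    cols = confusion.sum(axis=0) / total     # rater B's marginal proportions
    p_e = float(np.dot(rows, cols))          # chance agreement Pₑ
    return (p_o - p_e) / (1 - p_e)
```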
| Kappa | Agreement Level |
|---|---|
| < 0.00 | Less than chance |
| 0.00 - 0.20 | Slight |
| 0.21 - 0.40 | Fair |
| 0.41 - 0.60 | Moderate |
| 0.61 - 0.80 | Substantial |
| 0.81 - 1.00 | Almost perfect |
Guidelines from Landis & Koch (1977)
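The same thresholds can be expressed as a small helper when reporting results programmatically (a sketch that simply encodes the table above):

```python
def interpret_kappa(kappa: float) -> str:
    """Return the Landis & Koch (1977) agreement label for a kappa value."""
    if kappa < 0.00:
        return "Less than chance"
    if kappa <= 0.20:
        return "Slight"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Almost perfect"

print(interpret_kappa(0.571))  # "Moderate"
```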
For ordinal data where some disagreements are worse than others, use weighted kappa: κ_w = 1 − (Σ wᵢⱼ Oᵢⱼ) / (Σ wᵢⱼ Eᵢⱼ), where wᵢⱼ is the disagreement weight for cell (i, j), Oᵢⱼ the observed proportion, and Eᵢⱼ the proportion expected by chance.
For example, two doctors independently classify 100 X-rays as Normal, Suspicious, or Abnormal. Because the categories are ordered, a Normal/Abnormal disagreement is more serious than a Normal/Suspicious one, and weighted kappa penalizes it more heavily. In either form, kappa measures how much better the raters' agreement is than what would be expected by chance.
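When the individual ratings are available as two label lists, scikit-learn's cohen_kappa_score supports both the unweighted and weighted forms; the ratings below are made-up placeholders for the three-category X-ray scenario:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical paired ratings: 0 = Normal, 1 = Suspicious, 2 = Abnormal
doctor_a = [0, 0, 1, 2, 2, 1, 0, 2, 1, 0]
doctor_b = [0, 1, 1, 2, 1, 1, 0, 2, 2, 0]

plain = cohen_kappa_score(doctor_a, doctor_b)                           # all disagreements count equally
linear = cohen_kappa_score(doctor_a, doctor_b, weights="linear")        # Normal vs Abnormal counts double
quadratic = cohen_kappa_score(doctor_a, doctor_b, weights="quadratic")  # distant disagreements penalised even more

print(plain, linear, quadratic)
```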
CrossTabs.com calculates Cohen's Kappa for you. The worked example below shows the calculation by hand:
Two doctors classify 100 X-rays as "Normal" or "Abnormal":
| | Doctor B: Normal | Doctor B: Abnormal | Total |
|---|---|---|---|
| Doctor A: Normal | 70 | 5 | 75 |
| Doctor A: Abnormal | 10 | 15 | 25 |
| Total | 80 | 20 | 100 |
Step 1: Observed agreement (Pₒ) = (70 + 15) / 100 = 0.85
Step 2: Expected agreement (Pₑ) = (75×80 + 25×20) / 100² = (6000 + 500) / 10000 = 0.65
Step 3: Cohen's kappa = (Pₒ − Pₑ) / (1 − Pₑ) = (0.85 − 0.65) / (1 − 0.65) = 0.20 / 0.35 = 0.571
Interpretation: κ = 0.57 indicates moderate agreement beyond chance.
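A quick programmatic check of the three steps above (self-contained; it simply re-derives the same numbers from the 2×2 table):

```python
import numpy as np

table = np.array([[70, 5],
                  [10, 15]], dtype=float)  # rows: Doctor A, columns: Doctor B
n = table.sum()                            # 100 X-rays

p_o = np.trace(table) / n                                          # (70 + 15) / 100 = 0.85
p_e = float(np.dot(table.sum(axis=1), table.sum(axis=0))) / n**2   # (75×80 + 25×20) / 100² = 0.65
kappa = (p_o - p_e) / (1 - p_e)                                    # 0.20 / 0.35 ≈ 0.571

print(round(kappa, 3))  # 0.571
```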