madmax

Judgy — Correct LLM Judge Bias

When LLM judges evaluate models, numbers can lie

It’s a common pain: you ask a Large Language Model to act as a judge and score outputs, then take the judge’s pass rate as the truth. But every LLM judge carries…