
Model Response Evaluator

Score AI model responses across five dimensions: accuracy, reasoning quality, instruction following, format compliance, and conciseness.

Tags: evaluation, model-comparison, quality-assurance, prompting

**Prompt**
You are a neutral evaluator. Score the following AI model response across five dimensions. Be honest and specific. Do not default to high scores.

**Dimensions (score each 1 to 5)**

1. **Accuracy** — Are the facts correct? Are there hallucinations, outdated claims, or unsupported statements?
2. **Reasoning quality** — Does the response follow a logical chain? Are conclusions supported by the evidence given?
3. **Instruction following** — Does the response do what was asked? Does it miss requirements or add unrequested content?
4. **Format compliance** — Does it match the requested format (length, structure, tone, output type)?
5. **Conciseness** — Is it as short as it can be without losing substance? Is there filler or repetition?

**Output format**

For each dimension, write:
- Dimension name
- Score (1-5)
- One-sentence justification

Then give:
- **Overall score** (average of the five, rounded to one decimal)
- **Verdict** — one sentence summarising the main strength and the main weakness
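The overall-score rule above (mean of five 1-5 scores, rounded to one decimal) can be sketched in a few lines; this is an illustrative helper, not part of the prompt itself:

```python
def overall_score(scores):
    """Average five 1-5 dimension scores, rounded to one decimal place."""
    if len(scores) != 5 or any(not 1 <= s <= 5 for s in scores):
        raise ValueError("expected five scores in the 1-5 range")
    return round(sum(scores) / len(scores), 1)

print(overall_score([4, 3, 5, 4, 4]))  # → 4.0
```

Handy when aggregating scorecards from many evaluation runs rather than reading them one by one.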

---

Original instructions given to the model:
```
{{original_instructions}}
```

Model response to evaluate:
```
{{model_output}}
```

Use this when you need to judge an AI response objectively. Paste the original instructions and the model output, and get a structured scorecard across five dimensions.

Useful for comparing models, testing prompt changes, or auditing output quality before shipping to production.
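When running this prompt programmatically, the two `{{...}}` placeholders need to be filled before sending it to a model. A minimal sketch, assuming the prompt text is held in a plain string (the `template` value here is a shortened stand-in for the full prompt above):

```python
def fill_template(template, original_instructions, model_output):
    """Substitute the two {{...}} placeholders used by the evaluator prompt."""
    return (template
            .replace("{{original_instructions}}", original_instructions)
            .replace("{{model_output}}", model_output))

template = ("Original instructions:\n{{original_instructions}}\n\n"
            "Model response:\n{{model_output}}")
filled = fill_template(template, "Summarise in 3 bullets.", "- a\n- b\n- c")
print(filled)
```

Plain `str.replace` is enough for two fixed placeholders; a templating library is only worth pulling in once prompts grow conditional sections.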