strict-intent-bench

A static inspection demo for wrong intent inference cases. It lets you inspect benchmark cases, expected actions, success criteria, and side-by-side illustrative failures without calling an API or running a backend.
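
To make "benchmark cases, expected actions, success criteria" concrete, here is a minimal sketch of how one case record might be modeled. Every name in it (`BenchCase`, `Action`, the field names, and the sample values) is a hypothetical stand-in for illustration; the repository's actual schema may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical action labels; the benchmark's own labels may differ."""
    ANSWER = "answer"      # act on the request the user actually made
    CLARIFY = "clarify"    # ask a question before acting


@dataclass(frozen=True)
class BenchCase:
    """One static benchmark case (assumed shape, not the real schema)."""
    case_id: str
    context: str                    # earlier turn(s) the reply depends on
    user_reply: str                 # short, quoted, corrective, or context-dependent reply
    expected_action: Action         # what a correct assistant should do
    success_criteria: tuple[str, ...] = ()  # human-readable checks for a pass


# A made-up case for illustration only; it is not taken from the benchmark.
example = BenchCase(
    case_id="demo-001",
    context="Assistant offered: (A) summarize the file, (B) rewrite the file.",
    user_reply="A",
    expected_action=Action.ANSWER,
    success_criteria=("Summarizes the file", "Does not rewrite the file"),
)
```

Keeping the record frozen and free of any network fields fits the static, no-backend nature of the demo: a case is just data to be read and displayed.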

Failure mode

Wrong intent inference occurs when an assistant answers a plausible implied request rather than the request the user actually made.

Intervention

Strict / Precision behavior aims to reduce wrong intent inference by avoiding unsupported assumptions about short, quoted, corrective, or context-dependent replies.

Current weakness

The trade-off is over-clarification: strict behavior can ask too many questions when the user's selection is already clear.
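
The trade-off can be illustrated with a toy decision rule. This is only a sketch under stated assumptions, not the behavior the benchmark evaluates: the function `choose_action`, the `options` mapping, and the return strings are all hypothetical. The rule acts when a reply selects exactly one offered option and asks for clarification otherwise, so a clear selection does not trigger an unnecessary question.

```python
def choose_action(user_reply: str, options: dict[str, str]) -> str:
    """Toy rule: act only when the reply picks exactly one offered option.

    `options` maps an option key (e.g. "A") to its label (e.g. "summarize").
    Returns "act:<key>" for an unambiguous selection, otherwise "clarify".
    """
    # Normalize: trim whitespace and surrounding quotes, ignore case.
    reply = user_reply.strip().strip("\"'").casefold()
    matches = [
        key
        for key, label in options.items()
        if reply in (key.casefold(), label.casefold())
    ]
    if len(matches) == 1:
        return f"act:{matches[0]}"   # selection is already clear: do not over-clarify
    return "clarify"                 # no supported reading: do not guess


offered = {"A": "summarize", "B": "rewrite"}
clear_choice = choose_action("a", offered)       # unambiguous selection
quoted_choice = choose_action("'rewrite'", offered)
vague_reply = choose_action("ok", offered)       # matches no option
```

A stricter rule would return "clarify" more often and lower the risk of wrong intent inference; a looser one would do the opposite. The weakness described above is the cost of moving too far in the strict direction.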

Existing measured summary

The bars are drawn from the existing v0.2 80-case run summaries in this repository. For both wrong intent inference and unnecessary clarification, lower is better.
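
As a rough sketch of how such bar values could be derived, the snippet below assumes each case receives a single outcome label and that the bars report per-case rates. The label names, the `failure_rates` helper, and the toy input are assumptions made for illustration; the toy numbers are not the v0.2 results.

```python
from collections import Counter


def failure_rates(outcomes: list[str]) -> dict[str, float]:
    """Compute the two lower-is-better rates from per-case outcome labels.

    Assumed labels: "correct", "wrong_intent", "unnecessary_clarification".
    """
    if not outcomes:
        raise ValueError("at least one case outcome is required")
    counts = Counter(outcomes)
    total = len(outcomes)
    return {
        "wrong_intent_rate": counts["wrong_intent"] / total,
        "unnecessary_clarification_rate": counts["unnecessary_clarification"] / total,
    }


# Toy input of 10 made-up outcomes, NOT taken from the v0.2 80-case run.
toy_outcomes = (
    ["correct"] * 6
    + ["wrong_intent"] * 1
    + ["unnecessary_clarification"] * 3
)
rates = failure_rates(toy_outcomes)
```

Reporting the two rates side by side matters because an intervention can lower one while raising the other.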

Side-by-side examples

These are illustrative, manually written examples, not measured API outputs. They show the intended failure pattern before a full evaluation is run.