A simple way to answer the hard question: is this agent good?
Creating and using an agent only pays off if you can tell whether it's working. I designed an evaluation experience that turns scattered signals into a clear, honest read.
Role · Product & UX design — research, information design, prototyping
The problem
Quality was a gut feeling. There was no shared, trustworthy way to judge whether an agent was performing well or where it failed.
There was also no way to tell whether a change made things better or worse.
Understanding why
I worked to define what 'good' even means for an agent, and which signals teams would actually trust.
The challenge was honesty: showing real weaknesses clearly without drowning people in metrics.
The solution
An evaluation view that pairs a clear headline read with the specific examples behind it.
A team can see the score, understand why, and jump straight to what to fix.
Outcome
Placeholder for shareable results — e.g. broader adoption of evaluation and faster diagnosis.
Quality moved from a gut feeling to something teams could point at.
Continuous evaluation — quality tracked over time so regressions surface on their own, turning evaluation from an event into a safety net.
An honest 'here's where it's weak' earns more trust than a dashboard full of green.
Privacy note — screens are recreated with dummy data and details simplified; the real product evolves with business decisions and user needs.