Xiaoyu Liu


Overview

Led the internal AI-powered Quality Assurance (QA) platform at The Farmer’s Dog (TFD): eliminated ~$400K in annual licensing costs, achieved the largest satisfaction improvement (+0.57), and is projected to scale conversation coverage from 3% to 75%+.

My role

Designed complete user experience and co-defined product strategy and roadmap across two phases, leading end-to-end UI/UX, user research, and design language for AI transparency.

Drove cross-functional alignment across eng, legal, data, and ops while navigating compliance requirements and technical ambiguity.

Context

TFD Customer Care (CC) at a glance

400+ CC advisors | ~49–50K contacts/week | 2.5M+ customer contacts/year

QA is how we evaluate conversations and coach advisors to improve

Relied on EchoAI, which lacked TFD’s domain knowledge and cost ~$400K/year
~2% of conversations were manually reviewed

Audience

CC Managers

Manage a group of Advisors
Lead coaching conversations
Grade in low volume

QA Specialists

Own QA measurement and standards
Support coaching opportunities and plans
Grade in high volume

CC Advisors

Interact with customers
Receive feedback and coaching

Challenges

Several constraints hit at once: legal compliance, user trust, technical ambiguity, and an EchoAI contract ending soon.
So, how do we build an AI-powered QA platform that:

Provides automation at scale without sacrificing human judgment
Stays legally compliant under state-specific laws
Gives users transparency and genuine control
Empowers Advisors with better coaching opportunities
Earns adoption rather than triggering resistance

And how do we ship it with half a designer's time and limited frontend resources, without creating long-term maintenance debt?

Phased approach

Phase 1 (May–Oct 2025): 0→1 Foundation

Manual grading only, no AI.
Build trust, collect baseline data.
Lock in technical foundations.

Phase 2 (Oct 2025–Mar 2026): AI Integration

Introduce AI once users trust the platform.
Baseline data in place for training and comparison.
Legal compliance is settled.

Phase 3: Personalized insights & coaching intelligence

Phase 1

Approach

Combined my onboarding introductions with early information-gathering, getting to know the team while learning how QA actually worked in practice.

Designed core manual grading flows. Shipped Phase 1 MVP (Sep 2025) covering 6 core workflows across 3 roles, completing all fast-follows by Oct 2025.

Every fidelity stage had a purpose: to get buy-in, facilitate decision-making, align with cross-functional partners, validate core flows, test for usability issues, or enable a smooth rollout.

Technical strategy

Figma MCP integration enabled AI-assisted development

I pushed the team to start experimenting with Figma MCP right away (June 2025) to accelerate Phase 1 delivery.

Material UI over TFD design system - deliberate trade-off

Accessibility (advisors with color blindness flagged contrast issues on our internal design system).
Sustainability & speed (MUI is robust, well-documented, solid for B2B products, and AI tools generate its code more accurately than custom systems).
Team efficiency (another internal team was already using it).

Intentionally scaled-back UI following MUI standards

Clean, functional layouts; a focus on information hierarchy and usability; typography and spacing chosen for clarity.
High-volume QA work needs clarity and speed.
Eng can ship features without deep designer involvement.
Scales better with limited resources.

Phase 1 / Key Flow 1 / Select Agent to Review

Phase 1 / Key Flow 2 / Grade Manually

Phase 1 / Key Flow 3 / Edit Scores

Outcomes

Scaled from dozens to hundreds of graded conversations/week within weeks of launch

87 respondents described EvalPal as a coaching reinforcement tool, matching the design intent (330 respondents, post-launch survey, Sep 2025)

4.05/5 satisfaction vs. EchoAI's 3.48/5, +0.57 improvement (337 respondents, Q4 2025)

Strong baselines for a 3-month-old product: 4.20/5 usability, 4.18/5 effectiveness.

Phase 2

Approach

Before Phase 2, I ran a 330-person survey on AI sentiment and the current product to make sure we weren't walking in blind.

Designed Phase 2 AI integration: LLM grading workflows, Guru article suggestions, note generation, feedback mechanism, patterns for AI transparency and human-centered AI collaboration.

Collaborated with Legal on AI grading policies and Eng on technical ambiguity.

Started to build custom components and establish AI interaction patterns and design language.

Challenges

Technical ambiguity

LLM evaluation quality and consistency
Real-time vs. delayed vs. batch processing (cost, accuracy, user expectations)
Handling cases where AI output is low-quality or off-base

Business & user tensions

AI automation needs to run at scale to deliver value
Employees need agency and transparency, not surveillance
Human review still matters; AI can't replace context

Legal reality

California has some of the strictest employee monitoring laws in the US
AI-powered evaluation of employee conversations requires specific disclosure
"Coaching" vs. "surveillance" is a legally significant distinction
Results need to be unbiased

Phase 2 / Key Flow 1 / View & Filter Conversations

Phase 2 / Key Flow 2 / View Graded Rubrics

Phase 2 / Key Flow 3 / Live Grade Manually

Measure success

Our target metrics, with Phase 1 as the baseline

Conversations graded: ~3% → 75%+
Avg time to agent feedback: 5 days → <1 day
Complete & Customized Resolution Rate: 85% → 88%
First contact resolution: 82% → 85%
QA scores: 93% → 95%

Business impact

Replacing EchoAI with an in-house platform not only reduces expenses (~$400K/year in licensing) but also gives us direct control over how we measure and surface conversation quality and performance.

Workflow transformation

I later hosted an internal Lunch & Learn with one of our engineers to share our learnings and experiments across the prod-eng group.

Several teams have reached out and adopted Figma MCP in their design-eng handoff process.

