Xiaoyu Liu







Overview


Led the internal AI-powered Quality Assurance (QA) platform at The Farmer’s Dog (TFD): eliminated ~$400K in annual licensing costs, achieved the largest satisfaction improvement (+0.57), and scaled conversation coverage from 3% to a projected 75%+.


My role


Designed the complete user experience and co-defined product strategy and roadmap across two phases, leading end-to-end UI/UX, user research, and the design language for AI transparency.

Drove cross-functional alignment across eng, legal, data, and ops while navigating compliance requirements and technical ambiguity.












Context

TFD Customer Care (CC) at a glance

  • 400+ CC advisors | ~49–50K contacts/week | 2.5M+ customer contacts/year
  • QA is how we evaluate conversations and coach advisors to improve
  • Relied on EchoAI, which lacked TFD’s domain knowledge and cost ~$400K/year
  • ~2% of conversations were manually reviewed



Audience

CC Managers

  • Manage a group of Advisors
  • Lead coaching conversations
  • Grade in low volume

QA Specialists

  • Own QA measurement and standards
  • Support coaching opportunities and plans
  • Grade in high volume

CC Advisors

  • Interact with customers
  • Receive feedback and coaching



Challenges

Several constraints hit at once: legal compliance, user trust, technical ambiguity, and an EchoAI contract that was ending soon.
So how do we build an AI-powered QA platform that:

  • Provides automation at scale without sacrificing human judgment
  • Stays legally compliant under state-specific laws
  • Gives users transparency and genuine control
  • Empowers Advisors with better coaching opportunities
  • Earns adoption rather than triggering resistance

And ship it with 0.5 designer and limited frontend resources without creating long-term maintenance debt?




Phased approach

Phase 1 (May–Oct 2025): 0→1 Foundation

  • Manual grading only, no AI. 
  • Build trust, collect baseline data.
  • Lock in technical foundations.

Phase 2 (Oct 2025–Mar 2026): AI Integration

  • Introduce AI once users trust the platform.
  • Baseline data in place for training and comparison.
  • Legal compliance is settled.

Phase 3: Personalized insights & coaching intelligence










Phase 1







Approach

Combined my onboarding introductions with early information-gathering, getting to know the team while learning how QA actually worked in practice.

Designed core manual grading flows. Shipped Phase 1 MVP (Sep 2025) covering 6 core workflows across 3 roles, completing all fast-follows by Oct 2025.

Every fidelity stage had a purpose: to get buy-in, facilitate decision-making, align with cross-functional partners, validate core flows, surface usability issues, or enable a smooth rollout.









Technical strategy

Figma MCP integration enabled AI-assisted development

  • I pushed the team to start experimenting with Figma MCP right away (June 2025) to accelerate Phase 1 delivery.

Material UI over TFD design system - deliberate trade-off

  • Accessibility (advisors with color blindness flagged contrast issues on our internal design system).
  • Sustainability & speed (MUI is robust, well-documented, solid for 2B, and AI tools generate its code more accurately than custom systems).
  • Team efficiency (another internal team was already using it).

Intentionally scaled-back UI following MUI standards

  • Clean, functional layouts that prioritize information hierarchy and usability, with typography and spacing tuned for clarity.
  • High-volume QA work needs clarity and speed. 
  • Eng can ship features without deep designer involvement.
  • Scales better with limited resources.








Phase 1 / Key Flow 1 / Select Agent to Review












Phase 1 / Key Flow 2 / Grade Manually










Phase 1 / Key Flow 3 / Edit Scores








Outcomes

Scaled from dozens to hundreds of graded conversations/week within weeks of launch

87 of 330 respondents described EvalPal as a coaching reinforcement tool, which matches the design intent (post-launch survey, Sep 2025)

4.05/5 satisfaction vs. EchoAI's 3.48/5, +0.57 improvement (337 respondents, Q4 2025)

Strong baselines for a 3-month-old product: 4.20/5 usability, 4.18/5 effectiveness.









Phase 2








Approach

Before Phase 2, I ran a 330-person survey on AI sentiment and the current product to make sure we weren't walking in blind.

Designed Phase 2 AI integration: LLM grading workflows, Guru article suggestions, note generation, feedback mechanism, patterns for AI transparency and human-centered AI collaboration.

Collaborated with Legal on AI grading policies and Eng on technical ambiguity.

Started to build custom components and establish AI interaction patterns and design language.



Challenges

Technical ambiguity

  • LLM evaluation quality and consistency
  • Real-time vs. delayed vs. batch processing (cost, accuracy, user expectations)
  • Handling cases where AI output is low-quality or off-base
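One common way to handle low-quality or off-base AI output is to route uncertain grades to a human reviewer instead of publishing them automatically. The sketch below is illustrative only and is not the platform's actual logic: the `AIGrade` fields, the 0.8 threshold, and the `route` function are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical cutoff: grades below this confidence go to a human reviewer.
REVIEW_THRESHOLD = 0.8


@dataclass
class AIGrade:
    conversation_id: str
    score: float       # rubric score from the model, 0-100 (assumed scale)
    confidence: float  # model-reported confidence, 0.0-1.0 (assumed field)


def route(grade: AIGrade, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return "auto_accept" or "human_review" for a single AI grade."""
    if not 0.0 <= grade.confidence <= 1.0:
        # Malformed output is treated as off-base and always reviewed.
        return "human_review"
    if grade.confidence >= threshold:
        return "auto_accept"
    return "human_review"


grades = [
    AIGrade("c-1", 92.0, 0.95),
    AIGrade("c-2", 61.0, 0.40),
    AIGrade("c-3", 88.0, 1.70),  # out-of-range confidence
]
routed = {g.conversation_id: route(g) for g in grades}
print(routed)
```

A threshold like this is a tunable trade-off: raising it sends more conversations to human review, while lowering it increases automated coverage at the cost of more unreviewed AI grades.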

Business & user tensions

  • AI automation needs to run at scale to deliver value
  • Employees need agency and transparency, not surveillance
  • Human review still matters; AI can't replace context

Legal reality

  • California has some of the strictest employee monitoring laws in the US
  • AI-powered evaluation of employee conversations requires specific disclosure 
  • "Coaching" vs. "surveillance" is a legally significant distinction
  • Results need to be unbiased







Phase 2 / Key Flow 1 / View & Filter Conversations










Phase 2 / Key Flow 2 / View Graded Rubrics












Phase 2 / Key Flow 3 / Live Grade Manually









Measure success

Our target metrics, with Phase 1 as the baseline

  • Conversations graded: ~3% → 75%+
  • Avg time to agent feedback: 5 days → <1 day 
  • Complete & Customized Resolution Rate: 85% → 88% 
  • First contact resolution: 82% → 85% 
  • QA scores: 93% → 95%
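To give a sense of what the coverage target means in volume, here is a back-of-envelope calculation. It assumes the ~50K contacts/week from the Context section, a baseline of about 3% coverage as quoted in the Overview, and that every contact counts as one gradable conversation. These are rough illustrative figures, not reported results, and `graded_per_week` is a hypothetical helper.

```python
# Upper end of the ~49-50K contacts/week figure from the Context section.
WEEKLY_CONTACTS = 50_000


def graded_per_week(coverage_pct: int, weekly_contacts: int = WEEKLY_CONTACTS) -> int:
    """Conversations graded per week at a given coverage percentage."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return weekly_contacts * coverage_pct // 100


baseline = graded_per_week(3)    # assumed ~3% baseline coverage
target = graded_per_week(75)     # 75% target coverage
multiplier = target // baseline  # how many times more conversations get graded

print(baseline, target, multiplier)
```

Under these assumptions, the target corresponds to grading roughly 25 times as many conversations per week as the baseline.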
















Business impact

Bringing QA in-house not only eliminates the ~$400K annual licensing cost but also gives us direct control over how we measure and surface conversation quality and performance.



Workflow transformation

Later, I hosted an internal Lunch & Learn with one of our engineers to share our learnings and experiments across the product-engineering group.

Several teams have reached out and adopted Figma MCP in their design-eng handoff process.



