Comparing User Testing: Uxia's AI vs. Human
This report compares AI and human user testing, using the Amsterdam Public Transit App (GVB) as the case study.



About the Report
The report focused on a high-stakes, real-world scenario: a first-time tourist attempting to purchase a 1-hour travel ticket using a Mastercard and requesting a receipt. To ensure a fair fight, both Uxia’s AI and the human panel used identical prototypes, missions, and audience demographics (UK-based, ages 25–45).
Main Findings
Massive Gains in Speed and Cost
Uxia significantly outpaced traditional methods in terms of time and budget:
30x Faster Delivery: The full testing cycle (setup, execution, and analysis) took just 25 minutes with Uxia, compared to 748 minutes (over 12 hours) for the human panel.
Automated Analysis: While researchers spent over 4.5 hours manually reviewing human recordings, Uxia’s analysis time was effectively 0 minutes because it generates a ready-to-read report immediately.
Significant Savings: At a volume of 15 tests per month, Uxia costs $299/month, a saving of $550 compared to the $849/month required for a human-panel platform.
Superior Insight Detection
The AI testers proved to be far more observant than their human counterparts:
4.25x More Issues: Uxia surfaced 17 real usability issues, while the human testers detected only 4.
Zero Unique Human Insights: Every single issue flagged by the human panel had already been independently identified by the AI; the humans brought no unique findings to the table.
Critical "Blind Spots": All 10 AI testers flagged a serious trust issue regarding an external Dutch-language payment redirect. In contrast, not a single human tester commented on it, likely because they were rushing to complete the task for compensation.
Reliability and Engagement Depth
The quality of feedback revealed a stark "attentiveness gap" between the two groups:
7x More Commentary: AI transcripts averaged 2,200 words per session, compared to just 300 words from humans.
100% Success Rate: All AI tests were valid and usable, whereas the human panel suffered a 10% failure rate due to a technical audio issue that made one transcript unusable.
Active vs. Passive Testing: Human testers often clicked through screens in "automatic mode," spending only 3–5 seconds on onboarding slides. AI testers "thought out loud," questioning ambiguous labels and identifying multi-layered friction points.


