Usability Myths Need Reality Checks
Not so very long ago, it was agreed that five to eight users was enough for a good usability test. Somehow, this idea achieved mythic status. We believed it. We preached it to everyone who would listen. It survived in areas where it had been disproved, and was introduced into new situations where it didn’t even apply.
What gives some ideas such staying power?
We challenge new ideas as we should, especially if they might translate into a change in the way that we do our work. An idea must be strong and clearly stated to survive. When this idea was first propounded, it was tested and evaluated. This single idea, which grew from testing certain kinds of software, became an “industry standard.”
Unfortunately, accepted ideas don’t always get the same careful scrutiny that new ideas receive. When the world-wide web came along, we carried the five-user idea over to web site testing. Evidence that this was a mistake slowly accumulated. Today the idea of five users being enough to test a web site seems like the most naive sort of wishful thinking.
How did we ever come to believe it?
In 1993, usability was between a rock and a hard place. On one side, marketing people complained that data based on just a few users was too insubstantial, so why pay attention to it?
On the other side, time and money pressures made it necessary to fight for every single test. No one could say what constituted a reasonable and sufficient number of test subjects. As a result, it was hard to estimate, let alone justify, the cost of a meaningful test, and consequently hard to get testing done at all.
Two papers, one published by Robert Virzi in 1992 and the other by Jakob Nielsen and Thomas Landauer in 1993, made a pretty good case that five users would uncover 70% of major usability problems and the next three would get most of the rest, but only for certain types of software. The usability community took the idea and ran with it.
Arguments advanced in the articles justified five to eight users as a significant number. It was also an affordable number. All sides were satisfied. Let the testing begin!
The two papers gave five to eight users a scientific basis for the testing of small software applications. However, extending the “five user theory” to larger software applications and the world-wide web turned a useful rule into a myth with no scientific basis.
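To see why a rule that holds for small applications can fail for large ones, consider the cumulative-discovery model usually associated with this line of research: if each test user independently finds a given problem with probability p, then n users are expected to find a share 1 - (1 - p)^n of the problems. The sketch below assumes that model. The function names are hypothetical, and the values of p are illustrative assumptions, not measurements from either paper.

```python
import math


def proportion_found(p: float, n: int) -> float:
    """Expected share of problems found by n users, assuming each user
    independently finds any given problem with probability p."""
    return 1.0 - (1.0 - p) ** n


def users_needed(p: float, target: float) -> int:
    """Smallest number of users whose expected share of problems found
    reaches the target, under the same independence assumption."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # Illustrative per-user detection rates (assumed, not measured).
    for p in (0.30, 0.20, 0.05):
        print(f"p={p:.2f}: 5 users find {proportion_found(p, 5):.0%}, "
              f"85% takes {users_needed(p, 0.85)} users")
```

Under this model, everything depends on p. If any one user is likely to trip over a given problem, a handful of users goes a long way. If p is small, as it plausibly is on a large site where each visitor touches only a fraction of the pages, the number of users required climbs well past five.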
Good myths are plausible explanations that serve a purpose. Feel free to chuckle, but at one time more people believed that a big guy in the sky with a hammer caused thunder and lightning than ever believed five users was enough to test anything, even though praying to Thor never did stop the rain. The myth served its purpose, which was not to control the weather, but to keep the Nordic priesthood in power.
What did the five-user myth accomplish? It reconciled test plans with testing budgets! If five to eight users are enough, then it’s safe to act on the results of a test series with only five to eight users.
Back off, marketers! Back off, CFOs! Look at these papers full of nice chewy math! We know what we’re doing!
The five-user theory was so plausible, and went over so well, that we applied it in places the original authors never intended. It appeared to keep working, and no one challenged its broader and broader application.
Like a cartoon character running blindly off a cliff, the five-user myth defied gravity for quite a distance before looking down and discovering just how far it had overshot its scientific basis. Myths do that. This makes them dangerous company for real people.
The facts are these: Neither article ever actually said that five users were sufficient for all software testing, let alone web site testing, and five users are not sufficient. Tests of web sites and complex software will continue to discover new and serious problems long after the fifth, tenth, or fortieth test. For most usability testing, the five-user myth has no visible means of support. It was long overdue for a reality check.
If we give up on the five-user myth, what do we lose? We lose what we always lose when giving up on a myth: false confidence. That’s actually the worst of it.
We need a better justification for the number of test users, but we never really had one; we only thought we did. We have to accept that our past results were partial, not complete, but they are still accurate. Since we cannot “cover” our agenda with five users, we must find other ways to make testing more effective, such as better-focused tasks or improved user screening. This is all to the good, really. We give up arrogance, but we gain humility.
We embraced the myth of five because it gave us an answer we needed to get down to testing, without considering its impact on test results. What other forces tempt us to take such long walks on short piers?
You have undoubtedly heard that users give up because pages take too long to download. This is also a myth. Testing shows no correlation between page download time and users giving up. How does this myth continue to defy gravity?
A large part of its strength has roots in an appeal to our feelings. The idea seems automatically plausible because we have all been impatient and in a hurry. Of course, we forget about the delicious impatience of waiting for something good, like Christmas morning, the Super Bowl, or the next Lord of the Rings movie.
Beware of appeals to your feelings in defense of an idea! Plausibility is not science and feelings are not observations.
Here’s another myth—users will leave a site if they don’t find what they want after three clicks. In fact, on every site we have tested in the last three years, it takes more than three clicks (except for featured content) to reach any content at all. Not a single user has left any of these sites within three clicks, and only a handful chose featured content links. “Three clicks” turns out to be a false constraint, focusing designers on objectives that will not necessarily benefit users or improve the site.
The three clicks and page download myths both give designers target criteria for site development that don’t require usability testing to measure. This is very tempting.
Developers don’t have the time and energy to reinvent the wheel every time they lay out a page. It’s a luxury to have criteria to simplify choices between design alternatives. (A dangerous luxury.) In this way real-world pressure accounts for the persistence of a lot of myths which turn out to be pretty flimsy when you look at them closely.
Following myths doesn’t mean that we are doing the right thing—we are just doing what the majority—at the time—thinks is the right thing.
Myths are most destructive when they displace expertise and testing. This happens over and over because they are so simple to understand and easy to accept and apply. Practically anyone armed with a couple of myths (your CEO, your mother-in-law, the guru in the next cube) can criticize a project and direct its development without the support of a serious user testing effort. Myths seem to make science superfluous. Until you look down and gravity takes over.