
Usability Testing Best Practices: An Interview with Rolf Molich

by Christine Perfetti
on July 24, 2003

You may have never heard of Rolf Molich. Yet, if you’ve done any usability testing, design evaluations, or heuristic inspections,
then you’ve been affected by his pioneering work.

Since entering the field in 1983, Rolf has produced some of the most impressive
and forward-thinking research on effective discount usability techniques. Two
of Rolf’s more renowned contributions include the co-invention of the
Heuristic Inspection process with Jakob Nielsen and the more recent CUE (Comparative
Usability Evaluation) studies.

The Heuristic Inspection approach turned the usability world on its head when
Rolf and Jakob suggested that you could get value by having experts review
interface designs. However, in recent years, Rolf has revisited his thinking
on this method and now is questioning its effectiveness for all projects.

The CUE studies are the first of their kind. Usability practitioners from
all over the world were asked to evaluate the same interface, using their standard
practices. Rolf, along with Robin Jeffries and other collaborators (including
the interface’s design teams), then compared the different results, looking
to see which practices were most effective at discovering and reporting usability
problems.
The most famous study, CUE-2, had nine teams conduct usability tests of Microsoft’s
Hotmail interface. More recently, CUE-4 had 18 evaluators (using both expert
inspections and usability testing) looking at iHotelier’s Flash-based
hotel reservation system.

While we’re still waiting for many of the results from the CUE-4 analysis,
the CUE-2 study has changed the way we think about usability testing practice.
(For example, many questions arose about how “scientific” usability
practices really are when there wasn’t one problem that all nine teams
reported.)

While preparing for Rolf’s full-day seminar at the User Interface 8
Conference, we had the opportunity to ask Rolf about some of his thoughts on
the best practices surrounding usability testing. Here’s what we talked
about:

UIE: Many critics of usability testing argue that usability testing
can’t make up for a bad design. Do you agree that if a design team starts
with a deeply flawed design, usability testing will diagnose many of the
problems, but won’t necessarily point to a cure?

Rolf Molich: Alan Cooper has wisely said “If you want to create a beautifully
cut diamond, you cannot begin with a lump of coal. No amount of chipping, chiseling,
and sanding will turn that coal into a diamond.”

That said, I have helped a lot of my clients produce rather usable pieces
of coal based on simple rules for how to write good error messages, re-phrase
key messages, tune a local search engine, or make other kinds of quick-and-dirty
last-minute changes.

Many usability practitioners believe that “eight users is enough” to
find the majority of usability problems on web sites. In your experience,
how many users is enough for testing?

It depends. The number of users needed for web testing depends on the goal
of the test. If you have no goal, then anything (including nothing) will do.

If your goal is to “sell” usability in your organization, then
I believe 3-4 users will be sufficient. Much more important than the number
of users is the sensible involvement of your project team in the test process
and proper consensus-building after the test.

If your goal is to find catastrophic problems to drive an iterative
development process, then 5-6 users are enough with the current
state of the art.

However, if you want to find all usability problems in an interface,
then a large number of users and facilitators will be necessary, as shown by
the CUE studies and UIE’s research. In the CUE-2 and CUE-4 studies, tests
with more than 50 users brought us large numbers of valid problems, but
nothing close to an exhaustive problem list.

Since you and Jakob first started promoting Heuristic Inspections,
you’ve indicated you’re no longer as optimistic about the technique.
What are your thoughts on that particular method today?

Heuristic inspections are cheap, simple to explain, and deceptively simple
to execute. However, I don’t use this method very often and I don’t
recommend it to my clients. In my opinion, the idea that anyone can conduct
a useful heuristic inspection after a crash course is rubbish. The results
from my studies showed that inexperienced inspectors working on their own often
produce disastrous amounts of “false alarms”.

Another problem is that heuristic inspection is based solely on opinions.
No one has given me a good answer to the question that I’ve heard several
times from disbelieving designers: “Why are your opinions better than
mine?” I think that’s an excellent question, particularly knowing
that users often prove me wrong whenever my heuristic predictions are put to
a real usability test.

What prompted the CUE studies?

Curiosity and the need for solid data. With the CUE studies, I wanted to offer
designers and usability practitioners a summary of current, state-of-the-art
usability testing practices. At the same time, I wanted to give the participating
usability labs an opportunity to assess their strengths and weaknesses in the
core practices of the usability profession.

What were the biggest surprises when you compared the processes and
reports from each of the nine teams in CUE-2?

What surprised me most was that many of the tests did not fully live up to
what I consider to be sound usability practices.

In the CUE-2 study, nine teams tested the Hotmail website. Each team had three
weeks to run the study, which included recruiting their own test participants
and creating their own test tasks. We imposed as few constraints on the teams
as possible, to ensure that the teams did the tests exactly as they would have
done if they had been ordinary client projects.

Many of the teams failed at creating professional test tasks that were realistic,
frequently occurring tasks and free of hidden clues and jargon. Some teams
also failed to distinguish between user data and personal opinions. Even more
surprising was how unusable some of the usability teams’ reports were.

What elements were lacking in the test reports? 

Above all, a good usability report must be usable. The main recommendations
I give clients for creating a usable usability report are:

  • Keep it short.
    No more than approximately 50 comments and 30 pages. It’s the job of
    the good usability professional to limit the comments to the ones that are
    really important.
  • Provide a one-page executive summary on page 2.
    Include the top three positive comments and the top three problems. Four
    of the nine CUE-2 teams did not include an executive summary in their reports.
  • Include positive findings.
    The ideal ratio between positive findings and problems is 1:1, but I have
    to admit that I rarely do better than 1:3. The CUE-2 teams ranged from
    no positive comments at all to an excellent ratio of 7:10.
  • Classify the comments.
    Distinguish between disasters, serious problems, minor problems, positive
    findings, bugs, and suggestions for improving the interface. Three of the
    nine CUE-2 teams did not classify their comments at all. The remaining
    six each invented their own classification scheme.
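Rolf notes that the CUE-2 teams each invented their own classification scheme. As a thought experiment, one possible way to encode his recommended categories as a shared vocabulary is a small data model; the names and example comments below are illustrative, not any standard scheme:

```python
from dataclasses import dataclass
from enum import Enum

# One possible encoding of the comment categories Rolf lists
# (disasters, serious problems, minor problems, positive findings,
# bugs, and suggestions). Hypothetical, for illustration only.
class Severity(Enum):
    DISASTER = "disaster"
    SERIOUS = "serious problem"
    MINOR = "minor problem"
    POSITIVE = "positive finding"
    BUG = "bug"
    SUGGESTION = "suggestion"

@dataclass
class Comment:
    severity: Severity
    description: str

# A tiny example report; real reports would hold up to ~50 comments.
report = [
    Comment(Severity.SERIOUS, "Sign-up form rejects valid e-mail addresses"),
    Comment(Severity.POSITIVE, "Search results are fast and easy to scan"),
]

# "Keep it short": enforce the ~50-comment guideline from the list above.
assert len(report) <= 50
```

Agreeing on a shared scheme like this up front would make reports from different teams directly comparable, which is exactly what CUE-2 lacked.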

Reports are of course useful, but even a perfect report is useless if it doesn’t
cause beneficial changes to the user interface. In fact, good communication
with the development team through effective consensus building is far more
important than a good test report.

(Rolf Molich offers a sample usability test report that attempts to follow
these recommendations.)

In the CUE-2 study, there wasn’t a single usability problem that every
team reported. The findings indicate a strong need for improvement in the
usability testing process. Don’t you think your findings undermine
the effectiveness of usability testing?

In my experience, usability testing is very effective for showing your colleagues
what a usability problem looks like in their interface. But I think the study
results indicate that usability testing is ineffective for finding all usability
problems in an interface. Our results also indicate that it’s ineffective
even for finding all the serious usability problems in an interface.

The CUE-2 teams reported 310 different usability problems. The most frequently
reported problem was reported by seven of the nine teams. Only six problems
were reported by more than half of the teams, while 232 problems (75%) were
reported only once. Many of the problems that were classified as “serious” were
only reported by a single team.
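The overlap arithmetic behind these figures can be sketched as a simple tally of how many teams reported each problem. The data below is hypothetical (the real CUE-2 set had nine teams and 310 problems), but the counting logic is the same:

```python
from collections import Counter

# Hypothetical report data: which problems each team found.
# (Illustrative only -- not the actual CUE-2 data.)
reports = {
    "team_a": {"p1", "p2", "p3"},
    "team_b": {"p1", "p4"},
    "team_c": {"p2", "p5", "p6"},
}

# Count how many teams reported each problem.
counts = Counter(p for problems in reports.values() for p in problems)

# Problems reported only once, and problems reported by a majority of teams.
unique = sum(1 for n in counts.values() if n == 1)
majority = sum(1 for n in counts.values() if n > len(reports) / 2)

print(f"{unique} of {len(counts)} problems reported only once")
print(f"{majority} problems reported by a majority of teams")
```

With the toy data above, four of the six problems surface only once, mirroring in miniature the 75% uniqueness rate Rolf describes.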

Even the tasks used by most or all teams produced very different results –
around 70% of the findings for each of these common tasks were unique.

My main conclusion is that the assumption that all usability professionals
use the same methods and get the same results in a usability test is plain
wrong.

Given your findings, how can development teams confidently conclude
they are fixing the *right* problems on their web sites?

It’s very simple: They can’t be sure!

But if they are humble, listen to their critics, learn from their mistakes,
avoid voodoo-methods, and use regular external coaching to catch bad habits,
they may eventually detect so many real problems that it will drive the iteration
forward in a useful way.

Given your results from the CUE studies, do you think usability testing
will play a major role in creating usable web sites in the future?

Usability tests are spectacular. They are excellent for convincing skeptical
colleagues that serious usability problems exist in an interface. But they
are also inefficient and costly. We should use them mainly in an intermediate
phase to establish trust with our colleagues, and then use much more cost-efficient
preventive methods such as usable interface building blocks, reviews based
on standards and proven guidelines, and contextual inquiry.

I hope that we will one day have huge libraries of generic interface building
blocks that are thoroughly tested with real users and proven usable. I also
hope that we will show how assembling such building blocks into full-blown
websites by usability-conscious specialists will yield websites with a high
degree of usability.

Thanks Rolf!

(All reports from the CUE-1 and CUE-2 studies are publicly available.)