A Counter-Intuitive Approach to Evaluating Design Alternatives
Last week, a client called looking for advice on their first usability study. The client is a large consumer information site with millions of visitors each month. (A similar site might be a large financial information site, with details of individual stocks, investment strategies, and “celebrity” investor/analysts that people like to follow.)
They are about to redesign their home page and navigation. They have three home page design alternatives and five navigation alternatives, created by an outside firm that didn’t evaluate any of the designs.
To help figure out which design to pick, the team has (finally!) received approval for their first usability testing study. While their site has been around for years, they’ve never watched visitors use it before now.
Until now, management has seen usability testing as a luxury it couldn’t afford, primarily because of the time it takes. The team called us because they want everyone to see their very first test as an overwhelming success.
They fought long and hard to get this project approved. If it’s a success, it will be easier to approve future studies. If anyone thinks that it didn’t help pick the right design, it will be a huge political challenge to convince management to conduct a second project.
The Challenges of Comparing Design Alternatives
When we started our conversation, the first thing the team members asked was how to compare the design alternatives. Ideally, they thought, we’d have each participant try each of the home page designs and each of the navigation designs, then somehow decide which one is “best.” After two days of testing, we’d tally up the scores and declare a winner.
Comparing designs is tricky under the best of circumstances. First, you have to assume the alternatives are truly different from each other. If they aren’t, they may all share a core assumption that could make every one of them a poor choice.
Assuming the team has done a good job creating the alternatives, the next problem is evaluating them with users. To do this, you’d need to run each alternative through a series of realistic tasks.
Choosing tasks is difficult in any study, but it’s more complicated when the team has never really studied their users in the past. They’ve collected some data from market research and site analytics, but, as we talked to them, it was clear they weren’t confident they understood why people came to the site.
Even if the team could come up with realistic tasks, there’s still one more big challenge: evaluating all the alternatives. Because they want to test new designs, the best approach is to test them against a benchmark: the current design.
A minimal study design would have each alternative (along with the current design) go first for some of the participants, to correct for “learning effects.” (Learning effects happen in studies where the tasks and design alternatives are similar. How do you know if the second design succeeds because it’s better or because the user learned something from the first design?)
For ratings, we wouldn’t recommend fewer than four people evaluating each alternative in the first slot. That means that for six alternatives, we’d need a minimum of 24 users.
This presented the problem: there’s no effective way to test all these alternatives with 24 users in the allotted two days and within their budget. We needed to think creatively.
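To make the counterbalancing arithmetic concrete, here is a minimal Python sketch that assigns presentation orders by simple rotation, so each design appears in the first slot equally often. The labels `Design 1` through `Design 6` and the function name `rotated_orders` are hypothetical. Rotation is only one way to counterbalance. A balanced Latin square (Williams design) would also even out which design follows which.

```python
from collections import Counter


def rotated_orders(designs: list[str], per_first_slot: int) -> list[list[str]]:
    """Return one presentation order per participant.

    Each rotation starts with a different design, so repeating the full
    cycle `per_first_slot` times puts every design in the first slot
    exactly `per_first_slot` times.
    """
    orders: list[list[str]] = []
    for _ in range(per_first_slot):
        for start in range(len(designs)):
            # Rotate the list so that designs[start] is shown first.
            orders.append(designs[start:] + designs[:start])
    return orders


# Hypothetical labels for the six designs in the article's count.
designs = [f"Design {i}" for i in range(1, 7)]
orders = rotated_orders(designs, per_first_slot=4)

print(len(orders))                    # 24
print(Counter(o[0] for o in orders))  # every design appears first 4 times
```

With six designs and four participants per first slot, the rotation produces the 24 sessions the study would need. That is the number that didn’t fit into two days of testing.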
Looking at the Problem Differently
What would happen if we didn’t ask users to pick the best alternative? The team would need to decide on the alternative themselves. Instead of testing all the design alternatives, we suggested focusing on the current design and using the insights gleaned to inform the decision process. Rather than a study that compared designs, we recommended the following steps:
Step 1: Build a Weighted Differences Matrix
First, we suggested the team use some of the planning time to build a matrix of the differences between the alternatives. Each row of the matrix would capture one difference between the original design and the alternatives.
The group would then assign each difference a weight, from one to five, representing how important it is to the user’s success. Under each alternative, they could put a similar rating showing how well that variant meets the need. Finally, the team could add up the scores for each alternative to see which one comes out “best.”
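The scoring step can be sketched in a few lines of Python. This is a minimal illustration that assumes each alternative’s total is the sum of weight times rating over all rows. The article only says to “add up the scores,” so the multiplication is our assumption. The `Difference` class, the row descriptions, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Difference:
    """One row of the matrix: a way the alternatives differ from the current design."""

    description: str
    weight: int              # 1-5: how important this difference is to user success
    ratings: dict[str, int]  # alternative -> 1-5: how well it meets the need

    def __post_init__(self) -> None:
        # Enforce the one-to-five scale on the weight and on every rating.
        for value in [self.weight, *self.ratings.values()]:
            if not 1 <= value <= 5:
                raise ValueError(f"{self.description}: {value} is outside 1-5")


def weighted_totals(rows: list[Difference]) -> dict[str, int]:
    """Sum weight * rating for each alternative across all rows."""
    totals: dict[str, int] = {}
    for row in rows:
        for alt, rating in row.ratings.items():
            totals[alt] = totals.get(alt, 0) + row.weight * rating
    return totals


# Hypothetical matrix comparing three home page alternatives.
matrix = [
    Difference("Search box placement", 5, {"Home A": 4, "Home B": 2, "Home C": 3}),
    Difference("Number of featured stories", 2, {"Home A": 2, "Home B": 5, "Home C": 3}),
    Difference("Link to account settings", 3, {"Home A": 3, "Home B": 4, "Home C": 5}),
]

totals = weighted_totals(matrix)
best = max(totals, key=totals.get)
print(totals)  # {'Home A': 33, 'Home B': 32, 'Home C': 36}
print(best)    # Home C
```

Totals this close should be read with caution. In this example, raising the weight of the featured-stories row from 2 to 3 is enough to put Home B ahead of Home A. For that reason, the team might treat the result as a starting point for discussion rather than a verdict.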
Step 2: Recruit from 2 User Groups
We recommended the team recruit both loyal and new users as study participants. The first day of testing would be with loyal users of the site and the second day with users new to the site. The loyal users would help identify the important tasks. The new users would help determine what’s important to people unfamiliar with the site, such as how they figure out the basics.
Step 3: Use an Inherent Value Test Protocol
The Inherent Value Test uses loyal users to find out what is valuable about the current design. Then it uses new users to identify whether the design communicates that value effectively.
For each participant from the loyal user group, the moderator would ask about their current usage: why they come to the site, what they last tried to use it for, and how that worked out. The moderator would then ask the participant to repeat a recent activity, showing the team the value that keeps them loyal. The team would learn what makes the site great for the loyal users who visit repeatedly.
For each participant from the new user group, the moderator would interview the participant to learn which of the loyal users’ tasks they’d most likely do themselves. The moderator would then ask the participant to perform the chosen task, watching as they discover the site’s value. This would help the team learn how well the current design communicates the site’s value.
Step 4: Add in the “Best” Alternative
After each participant uses the current design to perform their tasks, have them spend time with the best of the new alternatives, according to the Weighted Differences Matrix. This would be more of a critique than actual use, since the alternative design isn’t functional yet.
However, because the user had just used the existing site, they’d be ready to share how they’d do the same tasks with the alternative. The team would learn the user’s perspective on the differences between the current design and the best alternative.
Step 5: If Possible, Add a Competitor Alternative
If there’s time in each session, spend a few minutes performing the same tasks on a competitor’s site. This will help the team see where they stand competitively and provide insights into design directions they hadn’t considered. If the participant regularly uses a competitor, we recommended using that site, which has the extra benefit of letting the team discuss that competitor’s advantages with the participant.
Step 6: Evaluate the Design Alternatives
When analyzing the study results, we recommended the team revisit their Weighted Differences Matrix. We suggested they add the participants’ tasks and values as new rows in the matrix, rate each alternative against them, and adjust their original weightings accordingly. When done, the matrix will help the team decide which design alternatives to pursue.
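Revisiting the matrix after the study mostly means adding rows, changing weights, and recomputing the totals. The Python sketch below uses hypothetical rows, findings, and numbers to show how that can reorder the alternatives. Summing weight times rating for each alternative is our assumption about how the scores are combined.

```python
# Each row maps a difference to (weight 1-5, {alternative: rating 1-5}).
Matrix = dict[str, tuple[int, dict[str, int]]]


def weighted_totals(matrix: Matrix) -> dict[str, int]:
    """Sum weight * rating for each alternative across every row."""
    totals: dict[str, int] = {}
    for weight, ratings in matrix.values():
        for alt, rating in ratings.items():
            totals[alt] = totals.get(alt, 0) + weight * rating
    return totals


def ranking(matrix: Matrix) -> list[str]:
    """Alternatives ordered from highest to lowest weighted total."""
    totals = weighted_totals(matrix)
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical matrix built during planning.
before: Matrix = {
    "Search box placement": (5, {"Alt A": 4, "Alt B": 2}),
    "Featured stories": (3, {"Alt A": 2, "Alt B": 4}),
}

# Hypothetical findings: loyal users rarely searched, so that weight drops,
# and a task they repeated often becomes a new row with its own ratings.
after: Matrix = dict(before)
after["Search box placement"] = (2, before["Search box placement"][1])
after["Returning to a saved list"] = (5, {"Alt A": 2, "Alt B": 5})

print(weighted_totals(before), ranking(before))  # {'Alt A': 26, 'Alt B': 22} ['Alt A', 'Alt B']
print(weighted_totals(after), ranking(after))    # {'Alt A': 24, 'Alt B': 41} ['Alt B', 'Alt A']
```

Copying the matrix before editing it keeps the pre-study version intact. That way, the team can see exactly which findings changed the ranking.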
Making Informed Decisions
Teams have to make decisions. The most successful teams make informed decisions.
While it may be counter-intuitive, focusing the study on the current design may be the best approach for this client. Asking each participant to somehow rank every design alternative would take more time and produce confusing results. We felt a study that looks primarily at the current design would give the team the most insight into which alternatives, if any, to choose.