Front End Concerns When Implementing Faceted Search – Part 2

by Daniel Tunkelang

This article is an excerpt from Daniel Tunkelang’s book Faceted Search.

This is the second part of a two-part article. We recommend reading part 1 first.

The Search Box

Although our discussion of faceted search has focused on the use of facets for refinements, it is important to remember that faceted search is still a type of search—and that often the entry point into a faceted search system is the search box. Without the search box, we would only have a faceted navigation system—a useful interface for many applications but too limited to enjoy the broad success of faceted search.

Combining free-text search and faceted refinement is powerful: it allows users to create semi-structured queries and thus access structured and unstructured content. However, the search box also raises significant design challenges for application developers. The designer of a faceted search system must make a number of choices about how the search box behaves:

Should a search query adhere to the current query filters?
Should search look at all of the text in each document, or should search be restricted to specific fields?
How should the search handle multiword queries by default? Should the words be combined as an OR (i.e., match any word), as an AND (i.e., match all words), or as a phrase (i.e., the words must all occur in a document and in that exact sequence), and should the user be given a choice in this matter?
Should search queries be subject to query expansion, such as matching words that are variants of query terms?
Should systems present multiple search boxes, a parameterizable search box, or an advanced search interface?

These are open-ended questions and only represent a subset of the questions about search behavior that face designers of faceted search applications. We will try to supply some answers—or, at least, guidance.

First, let us consider the question of whether the search query should respect the current query filters. In the most common use case for faceted search, a user initially enters a free-text search query and then follows it up by one or more refinements using the facets. For example, a user types in “digital cameras” and refines on a specific megapixel range. But how do we handle deviations from this common case, for example, when the user first narrows the document collection by selecting a facet value and then performs a free-text search? Does the text search adhere to the faceted refinement or start a new query from scratch?

The conventional and probably safest approach is for free-text search to default to clearing all other filters or to offer users the option to search within the current results, for example, through a check box by which the user explicitly chooses to search only within the set of results currently being viewed. The interface for the Triangle Research Libraries Network in Section 5.1 illustrates this check-box approach.

Figure 7.4 suggests an alternate approach, preserving query context by default and signaling this behavior to users by labeling the search box with “search within these results” before a user starts typing.
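The two behaviors can be contrasted in a minimal sketch. The QueryState class and the submit_search function are hypothetical names introduced only for illustration; they assume that query state consists of a list of free-text queries plus a set of selected facet values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryState:
    """Hypothetical query state: free-text queries plus selected facet values."""
    texts: tuple = ()                 # every free-text query must match
    filters: frozenset = frozenset()  # e.g. {("megapixels", "10-12")}


def submit_search(state, new_text, search_within):
    """Return the query state after the user submits the search box."""
    if search_within:
        # Preserve context: the new text narrows the current result set.
        return QueryState(state.texts + (new_text,), state.filters)
    # Conventional default: a new search starts from scratch.
    return QueryState((new_text,), frozenset())


current = QueryState(("digital cameras",), frozenset({("megapixels", "10-12")}))
fresh = submit_search(current, "waterproof", search_within=False)
scoped = submit_search(current, "waterproof", search_within=True)
```

Whichever branch an application takes by default, the interface should signal it, whether through a check box or through the label on the search box.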

Figure 7.4: Search within results at www.forrester.com.

Now let us consider the question of whether we should match a search query against the full document text or only a restricted set of text fields. A common approach for search engines is to perform search against full text and to rely on relevance ranking to push more relevant results to the top of the results.

Although this approach may work well in a conventional search engine, it can undermine the effectiveness of faceted search. As we discussed in Chapter 3, faceted search builds on a set retrieval model: the faceted refinements reflect all of the results, not just the results that the system judges to be most relevant. If most of the results are not relevant, then the faceted refinements may not be very useful, particularly if they are presented with counts indicating their distribution over the result set.

An alternative approach is for the default behavior to err on the side of precision, for example, searching only against the title field unless the user explicitly asks to include other fields. The key design question is whether the benefit of increased precision usually outweighs the cost of decreased recall. It is also possible to hedge: to obtain search results from a broader search query that favors recall, and to derive the faceted refinements from a narrower search query that favors precision.
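The hedge can be sketched as follows, assuming a tiny in-memory collection. The documents, the field names, and the functions search and hedged_search are all hypothetical; a real system would run both queries against its index rather than scan a list.

```python
import re
from collections import Counter

# Hypothetical collection; "category" plays the role of a facet.
DOCS = [
    {"id": 1, "title": "Vegetable steamer", "body": "Stainless steel basket.", "category": "Cookware"},
    {"id": 2, "title": "Garment steamer", "body": "Removes wrinkles fast.", "category": "Appliances"},
    {"id": 3, "title": "Wallpaper remover", "body": "Works with any steamer.", "category": "Paint"},
]


def tokens(text):
    """Lowercased set of words in a text field."""
    return set(re.findall(r"\w+", text.lower()))


def search(term, fields):
    """Documents in which the term occurs in at least one of the given fields."""
    term = term.lower()
    return [d for d in DOCS if any(term in tokens(d[f]) for f in fields)]


def hedged_search(term):
    results = search(term, ("title", "body"))  # broad query: favors recall
    facet_basis = search(term, ("title",))     # narrow query: favors precision
    counts = Counter(d["category"] for d in facet_basis)
    return results, counts


results, counts = hedged_search("steamer")
```

Here all three documents are returned as results, but only the two whose titles match contribute to the facet counts, so the marginal match does not add a "Paint" refinement.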

The search box can also serve as a way for the user to search the set of facets, not just the documents themselves, as shown in Figure 7.5 (the “categories” are facet values). An advantage of this approach is that the set of documents assigned a particular facet value is often a more accurate result set than the set of documents containing those words in their text. It is even possible to search against the set of combinations of facet values [103].

Figure 7.5: Results for “steamer” at www.homedepot.com.
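Matching a query against facet values rather than document text might look like the sketch below. The facet names and values are invented for illustration, and matching each query word as a prefix of a word in the value label is our own assumption, not a description of how any particular site works.

```python
import re

# Hypothetical facets and their values.
FACET_VALUES = {
    "category": ["Garment Steamers", "Steam Cleaners", "Wallpaper Tools"],
    "brand": ["Acme", "SteamerCo"],
}


def words(text):
    """Lowercased list of words in a string."""
    return re.findall(r"\w+", text.lower())


def match_facet_values(query, facet_values=FACET_VALUES):
    """Return (facet, value) pairs whose label matches every query word.

    A query word matches when it is a prefix of some word in the label.
    """
    terms = words(query)
    hits = []
    for facet, values in facet_values.items():
        for value in values:
            label = words(value)
            if terms and all(any(w.startswith(t) for w in label) for t in terms):
                hits.append((facet, value))
    return hits


hits = match_facet_values("steamer")
```

Each hit can then be offered as a refinement, so that choosing it retrieves the documents assigned that value rather than the documents that merely mention the word.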

Multiword search queries and query expansion also raise issues of precision and recall. For multiword queries, however, familiarity with web search engines has accustomed most users to an unordered conjunction (an AND, not an OR). For example, a search for faceted navigation returns documents containing both words, but not necessarily in that order or even next to each other. While we should not be fatalistic about conventions, we must recognize that flouting them will incur some amount of user confusion.
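The three interpretations of a multiword query can be made concrete with a small sketch. The function names are hypothetical, and matching is done on whole lowercased words with no query expansion.

```python
import re


def tokenize(text):
    """Lowercased list of words, in document order."""
    return re.findall(r"\w+", text.lower())


def matches(doc_text, query, mode="and"):
    """Test one document against a multiword query.

    mode="or":     at least one query word occurs
    mode="and":    every query word occurs, in any order (unordered conjunction)
    mode="phrase": the query words occur consecutively, in the same order
    """
    doc = tokenize(doc_text)
    terms = tokenize(query)
    if mode == "or":
        return any(t in doc for t in terms)
    if mode == "and":
        return all(t in doc for t in terms)
    if mode == "phrase":
        n = len(terms)
        return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))
    raise ValueError(f"unknown mode: {mode}")


doc = "Navigation by facets: a faceted approach"
as_and = matches(doc, "faceted navigation", mode="and")
as_phrase = matches(doc, "faceted navigation", mode="phrase")
```

The sample document contains both words but not as a phrase, so it matches under AND and fails under phrase matching. Defaulting to AND follows the convention users bring from web search.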

The conventions for query expansion are less established, but most users expect, at a minimum, that they do not need to worry whether they use the singular or plural form of a noun (i.e., that both will return the same results). More aggressive query expansion (e.g., employing a thesaurus to obtain additional matches for words related to the query terms) is again a precision/recall trade-off. More importantly, it calls for transparency to avoid confusing users with unexpected and unexplained results. Any expansion that is unintuitive to a user is not worth the risk of confusing users and thus undermining their faith in the system.
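As a rough sketch of expansion with transparency, the example below folds singular and plural forms together silently and reports any thesaurus-based expansion so the interface can explain it. The RELATED table is invented, the plural handling is deliberately naive (it only strips a trailing "s"), and all names are hypothetical.

```python
import re

# Hypothetical thesaurus, keyed by normalized term.
RELATED = {"camera": ["camcorder"]}


def normalize(word):
    """Naive singular/plural folding: drop one trailing 's' (but not 'ss')."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and not w.endswith("ss") else w


def expand(term, use_thesaurus=False):
    """Return the terms to match, plus the expansions the user should be told about."""
    base = normalize(term)
    related = RELATED.get(base, []) if use_thesaurus else []
    return {"match": [base] + related, "notice": related}


def doc_matches(doc_text, expansion):
    """True when any expanded term occurs in the document after normalization."""
    doc = {normalize(w) for w in re.findall(r"\w+", doc_text)}
    return any(t in doc for t in expansion["match"])


plain = expand("cameras")
aggressive = expand("cameras", use_thesaurus=True)
```

The "notice" list is what an interface could surface, for instance as a line stating that results also include matches for the related term, so that users are not surprised by them.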

Finally, there is the question of whether to offer users multiple search boxes, a parameterizable search box, or an advanced search interface. There are no hard and fast rules here, but multiple search boxes with different behaviors have the potential to confuse users. A few users will configure a parameterizable search box, but most will never change the default search behavior. Hence, it is critical that the default search behavior be reasonable. Similarly, whereas a minority of users will appreciate the opportunity to use an advanced (typically parametric) search interface, many will never even discover it exists. To avoid confusing the majority of users, many successful retrieval applications place their advanced search interface on a separate page.

Multiple Selection from a Facet

The most common use case for faceted search or navigation is to select at most one value per facet, but there are at least two ways in which a user might select multiple values from the same facet:

Disjunctive (OR) selection. Selecting a range (e.g., a price or date range) may be a kind of disjunctive selection, depending on how the values are represented.
Conjunctive (AND) selection.

The design challenge is to communicate to users whether selecting multiple values from a particular facet is disjunctive or conjunctive—particularly if the site offers both behaviors. Users are notoriously bad at inferring Boolean logic from subtle cues.

It is important to use an interface that not only is self-consistent but also adheres to familiar conventions. For example, the check boxes in Figure 7.6 adhere to the convention for disjunctive selection.

Figure 7.6: Results for “ribs” in New York at www.yelp.com.

There are fewer interfaces that allow conjunctive selection from the same facet, but a convention for those that do is to present the selections as ordinary links, as with the topic facet in Figure 7.7.

Figure 7.7: Results for “tax law” at www.fcla.edu.

The approach may make the user think that he or she is drilling down a hierarchy, but fortunately that misinterpretation is consistent with the narrowing effect of conjunctive selection.

Perhaps most importantly, we urge caution in combining disjunctive and conjunctive selection in the same interface. Users who can understand such a complex process will be better served by the ability to construct Boolean queries at a command line.

A rule of thumb is that facets that are typically singly assigned to documents (e.g., brand, document type) work well with disjunctive selection, whereas facets that are often multiply assigned to documents (e.g., consumer electronics features, topic) work well with conjunctive selection.
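The difference between the two kinds of selection, and the reason for this rule of thumb, can be seen in a short sketch. The documents and the select function are hypothetical; each facet is modeled as a set of assigned values, with brand singly assigned and features multiply assigned.

```python
# Hypothetical documents: each facet maps to the set of values assigned.
DOCS = [
    {"id": 1, "brand": {"Canon"}, "features": {"wifi", "gps"}},
    {"id": 2, "brand": {"Nikon"}, "features": {"wifi"}},
    {"id": 3, "brand": {"Sony"}, "features": {"gps"}},
]


def select(docs, facet, values, mode):
    """Filter documents by several selected values of a single facet."""
    chosen = set(values)
    if mode == "or":
        # Disjunctive: the document has at least one selected value.
        return [d for d in docs if d[facet] & chosen]
    if mode == "and":
        # Conjunctive: the document has every selected value.
        return [d for d in docs if chosen <= d[facet]]
    raise ValueError(f"unknown mode: {mode}")


either_brand = select(DOCS, "brand", ["Canon", "Nikon"], mode="or")
both_features = select(DOCS, "features", ["wifi", "gps"], mode="and")
both_brands = select(DOCS, "brand", ["Canon", "Nikon"], mode="and")
```

Conjunctive selection on the singly assigned brand facet returns nothing, because no document carries two brands, whereas the same operation on the multiply assigned features facet narrows the results usefully.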

Design Patterns

The previous sections have focused on specific aspects of faceted search interfaces that raise front-end usability concerns. A more holistic approach to these concerns is to follow design patterns.

A concept that originated in software engineering, a design pattern is a general, reusable solution to a commonly occurring design problem. The definitive textbook on design patterns is a book of that title, written by the illustrious “Gang of Four” [104]. Today, the term architectural pattern refers more to a generally established solution for a problem in software architecture.

Although faceted search is relatively new, particularly in terms of mainstream adoption, there are already collections of design patterns for it. In particular, Peter Morville maintains a collection of search patterns that address faceted search [105].

Other designers who have built pattern libraries include Martijn Van Welie [106] and Janne Lammi [107]. The Yahoo! Design Pattern Library [108] more generally addresses the design of Web sites, but some of its patterns are useful for faceted search applications in particular.

Front-end design is a complicated, open-ended challenge, and the best we can hope to offer are rough guidelines. Pattern libraries assemble best practices from deployed applications and thus complement theoretically motivated advice with the wisdom of practical experience. Moreover, because they use concrete examples, they may supply details that, although minor, make crucial differences to the look and feel of a faceted search site. We encourage readers to learn from and contribute to pattern libraries.

Take-aways

The power of faceted search can overwhelm and confuse users if it is implemented with a poor design.
Choosing the correct layout of facets and results in an interface can be a trade-off between the work required for users to see results and the likelihood they notice the facets.
For ambiguous queries, users may benefit from a facet-driven clarification dialog.
Strategies to avoid information overload by filtering facets and facet values also offer ways to rank and organize them.
Users generally expect that initiating a new free-text search will clear current query filters, but some applications provide the option to search within current results.
A number of decisions about search behavior involve a precision/recall trade-off: consider computing results to favor recall but computing the utility of facets and facet values based on a narrower search query that favors precision.
An interface allowing multiple selections within a single facet should not only be self-consistent but also adhere to familiar conventions.
Consider the use of design patterns to take a holistic approach and learn from the collective wisdom of practitioners.

Published here on August 10, 2009.