For an Edge Condition, Seeing the Problem is a Problem

by Jared M. Spool

Originally published on medium.com on November 24, 2015.

Suddenly, there was a collective gasp from the crowd behind me. I knew immediately there had been a disturbance in The Force.

I turned around and noticed the other 50 or so JetBlue passengers staring intently out of the airport window, their gaze fixed on a plane—our plane—which was now pulling away from the gate with none of us on it. I turned back to the gate agent in front of me and said, “Our plane is pushing back.”

“Yes,” he said without looking up from his terminal, “it’s on the way to the maintenance hangar. We have to get the gate area ready for the 8 o’clock, which just landed.”

None of this surprised me. At 7:15, when the captain and crew from the 7 o’clock flight—the flight I and all these other passengers were booked on—unceremoniously walked out the jetway door and away from the gate, I figured we weren’t going anywhere. While I noticed their quiet egress, most of the other passengers hadn’t. Without any additional announcements from the gate agents, everybody was still in the dark about whether the 7 o’clock would ever leave. Now the disappearing plane seemed to confirm their worst fears.

I kept wondering: Could the design team at JetBlue create a solution that would have dealt with this? How would you design a great experience around a plane that suddenly needs maintenance? (These days, I spend a lot of time waiting for planes to be fixed. I think about this a lot.)

Small Percentages and the Rule of Large Numbers

JetBlue, like all airlines, will tell you this is rare. They’ll say 99% of their flights take off on time without any maintenance delays.

And they are correct. In a typical day, JetBlue puts their planes in the air 1,000 times. In that day, they might only have 10 maintenance issues that cause a serious delay of more than a few minutes. (When talking about delayed flights, the takeoff isn’t as important as the landing. A serious delay is one that puts passengers at risk for missing a connection or other important event.)

But, across a year, that’s 3,650 seriously delayed planes. With an average of 170 passengers per flight, that’s 620,500 JetBlue passengers each year who will likely have a less-than-desirable flying experience. And JetBlue is only one airline, one which has amongst the smallest number of flights and amongst the newest fleets (which should be more reliable).

When dealing with large numbers, small percentages are also large numbers. (That’s the rule of large numbers.) One percent maintenance failure, when applied to every passenger on every plane that has that failure, is still a large number of passengers.
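The arithmetic behind those figures is simple enough to check in a few lines. Here is a minimal sketch in Python; the constant and function names are made up for illustration, and the inputs are the rough estimates used in this article, not official JetBlue statistics.

```python
# Rough estimates from the article, not official airline data.
FLIGHTS_PER_DAY = 1_000          # daily departures
SERIOUS_DELAYS_PER_DAY = 10      # flights with a serious maintenance delay
AVG_PASSENGERS_PER_FLIGHT = 170  # average passengers on one flight
DAYS_PER_YEAR = 365


def delay_rate(delays_per_day: int, flights_per_day: int) -> float:
    """Fraction of daily flights that suffer a serious delay."""
    return delays_per_day / flights_per_day


def delayed_flights_per_year(delays_per_day: int, days: int = DAYS_PER_YEAR) -> int:
    """Seriously delayed flights across a whole year."""
    return delays_per_day * days


def affected_passengers_per_year(delayed_flights: int, passengers_per_flight: int) -> int:
    """Passengers caught up in those delayed flights."""
    return delayed_flights * passengers_per_flight


rate = delay_rate(SERIOUS_DELAYS_PER_DAY, FLIGHTS_PER_DAY)
flights = delayed_flights_per_year(SERIOUS_DELAYS_PER_DAY)
passengers = affected_passengers_per_year(flights, AVG_PASSENGERS_PER_FLIGHT)

print(f"Delay rate: {rate:.0%}")                    # Delay rate: 1%
print(f"Delayed flights per year: {flights:,}")     # Delayed flights per year: 3,650
print(f"Affected passengers per year: {passengers:,}")  # Affected passengers per year: 620,500
```

Change any one input, say a larger airline with 5,000 daily flights at the same 1% rate, and the number of affected passengers scales right along with it.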

It makes sense that an airline design team would want to help their 620,500 customers have a better experience when problems come from a maintenance delay. It’s not hard to see where you could improve the experience. You could deliver better communication of what’s going on, automatic rebooking to the final destination (even when it’s through a different airline), automatic baggage routing and notification, and automatic “distressed traveler” accommodations for lodging and meals (which the airline may be obligated to provide). Computers could do these things, as long as someone designed and built a system to do so.

What would those solutions look like? How would the airline’s already heavily-burdened IT systems know to make these arrangements and communicate them to the passengers?

Complexity from Systems

I made it onto the 8 o’clock flight. Other passengers weren’t as lucky. According to the flight status, the original 7 o’clock finally left the airport at 9:42.

There were several groups of JetBlue people involved in this flight: the gate agents, the pilots, and the flight attendants were the ones I saw. There were also maintenance people and airport operations teams involved. JetBlue’s flight operations team was dealing with the inevitable downstream effects of the delay. And there were the passengers. (We can’t forget the passengers.)

Part of what made the experience frustrating for the passengers was how these groups weren’t talking to each other. The pilot and crew left the flight, not saying a word to the gate agents. The digital flight board, 20 minutes after the planned departure time, still said the flight was leaving at 7 o’clock; nobody from flight operations had updated it with a new time.

Each group is an independent system, doing its own thing to make this flight happen. The mechanics are working with one set of procedures, rules, and constraints. The pilots have their own, as do the gate agents and flight operations. Each of these independent systems has been finely coordinated to work well when everything happens as planned.

These independent systems are not finely coordinated when they encounter the maintenance delay. The delay is an edge condition, an instance of system coordination that happens outside the norm.

While it’s relatively simple to design processes, procedures, and support tools for independent systems working well together, designing for edge conditions is a level of complexity most teams aren’t ready for. More importantly, they don’t have the tools to make it happen.

Falling in Love with a Complex Problem

There’s an old saying: Great designers don’t fall in love with their solution. Great designers fall in love with the problem.

It would be easy to imagine a collaborative tool, on smart phones and other portable devices, that each team could use. Mechanics could describe what they think the problem is. They could put in an estimate of what it will take to fix it. Pilots and gate agents could see that information and relay it to the passengers. Or the information could find its way directly to the passengers’ own devices.

We’d also need to help with the rebooking issue. How do we get passengers on their way with a minimum of disruptions? Rebooking passengers en masse creates a further communication problem (how does a passenger know to run to a gate elsewhere?) and puts strain on the system. Plus, the mechanics’ estimates are known to be fallible. A problem that looks bad can often be fixed by simply rebooting the plane. (A comforting thought, eh?)

If the design team focuses on solutions for when passengers experience a delay before the team figures out how the independent systems work, their solutions will fail. They need to spend time researching the independent systems to see where the failure points are and form a rich understanding of the constraints and interactions. They need to fall in love with each of the independent systems and the problems they create.

Observations and Interviews Can’t Get Us There

Our go-to tools for user research are the old standards of observation and interviews. We march into the field and see the problem as it happens. We interview the people involved, to understand their context and how they think of it.

For edge conditions, seeing the problem occur is, well, a problem. The JetBlue team could wait in an airport concourse until one of their flights suffers a maintenance delay, but that’s cost-inefficient. Being there to observe the problem in progress, even when so many happen, is near impossible.

If they understood the situation, they could interview the mechanics, pilots, gate agents, and operations personnel after the fact. However, that presents its own problems. It’s hard to ask the right questions to get to a deep understanding when you don’t know what questions you need to ask.

The old tried-and-true methods of observation and interviews won’t cut it. We need something else to get us the deep insights that will tell us how to design for complex independent systems dealing with infrequent, edge condition issues.

Collective Story Harvest

Marc Rettig has seen problems like this JetBlue problem before. In his work at Fit Associates, he regularly works with organizations that need to solve gnarly independent system interaction issues. His clients have a good handle on how to make things work when all the systems are functioning normally. It’s the edge conditions that get them, and that’s where Marc comes in.

I asked Marc how he’d try to understand what’s happening in this problem. He told me he’d employ one of his favorite techniques: a Collective Story Harvest.

When Marc facilitates this technique, he invites all the project influencers together, including the design team, stakeholders, implementers, and others who will have a say in how the final solutions will look. He then invites storytellers—individuals who have experienced the problems the group is trying to solve. In the JetBlue problem, storytellers would be mechanics, pilots, gate agents, flight operations staff, and even passengers who have experienced a maintenance delay.

Marc then divides the project influencer group into smaller groups, with one storyteller in each. He’ll ask each of the project influencers to take on a lens to listen through. One lens might be the story narrative, while another might be the breakthrough moments, and another might be the principles that could be learned from the story.

As the storytellers relive their experience of a maintenance delay, each group member listens intently while focusing on their assigned lens. After the storytellers finish, the group reflects what they heard, repeating back to the storytellers the important elements they learned.

Marc next asks each project member assigned to each lens to compare their notes with everyone else assigned that lens. All the narrative folks share the different narratives they heard, while all the folks who were listening for breakthrough moments compare theirs. Each team member shares the story they heard with the rest, looking for similarities and contrasts, and strong patterns emerge.

What comes from a Collective Story Harvest session is a deep, rich understanding of the problem. The storytellers, after hearing what the team learned, report they also learned new things about their experiences. And the project influencers walk away with a much better understanding of the dimensions of the problem and how the independent systems interact (or don’t) when the edge condition occurs.

Storytelling: Unleashing the Power of Retrospection

We’re taught in User Research 101 that we should not listen to our users. That what our users say isn’t as important as what they do. That’s why we focus on observing their behavior.

Yet, when dealing with edge conditions, our observations won’t work. This is why techniques like Collective Story Harvest are so important. (And there’s a bunch of techniques that are just like a Collective Story Harvest.)

These techniques don’t ask the users to solve the problem. Instead, the techniques use the power of story to take apart what makes these problems complicated. By asking a storyteller to relive a difficult experience, you hear it through their point of view.

It’s when we compare the points of view of different storytellers that we start to see the patterns. When we ask people to reflect on what has happened, and apply different lenses while we listen, we create an environment of retrospection that’s very rich. From that environment emerge the details, subtleties, and nuances that will help us come to a shared understanding of the problems we’re trying to solve.

We’re not asking our participants to tell us what to build. We’re integrating the stories they tell of their experiences into our collective understanding of the problem. We’re on our way to loving the problem more than we can ever love the solution.

Published here on December 2, 2015.