Wednesday, September 26, 2012

RITE testing brings the team together


Rapid Iterative Testing and Evaluation - the RITE method - is a way to run lab-based studies that identify and fix as many issues as possible and then verify the effectiveness of those fixes in the shortest possible time. Testing and fixing happens in near real-time, so the whole team feels more involved in the outcome of the sessions.

Traditional usability testing techniques have been one of the biggest stumbling blocks in the move to agile UX. Running eight participants and then crunching the data to arrive at a set of findings takes too much time. In fact, it can take more elapsed time than a sprint, which means the UX people have to disengage from the team and work one or more sprints ahead of or behind the developers.

Being disengaged from the team creates its own issues. Development team members aren't going to want to attend a usability session that is looking at features they aren't working on. As a result, the UX team members have to create more documentation to communicate the results of the study. If the dev team think they are finished with the work, they aren't going to want to revisit it to fix issues that were found. If the test is on prototypes for future sprints, the whole product could have changed direction by the time the dev team come to work on the tested features.

Traditional user testing protocol focuses on the wrong things 

On commercial teams, usability testing is a way to find and fix the big issues that stop customers successfully achieving their goals. The aim is to ship an improved interface as rapidly and cheaply as possible. It's generally more important to discover the big issues than to find every potential issue.

Traditional usability testing methodology comes from academia and was designed to meet a different set of goals. Speed and cost were much less important considerations than accuracy and reliability. Traditional test protocols are designed to remove any experimenter bias. Unfortunately, this also removes the ability of expert observers to influence the study while it is underway, for instance by fixing something that is very obviously broken.

Enter RITE testing - a tool for expert observers

The RITE method was developed at Microsoft partly as a formalization of something that had been going on for some time anyway. As soon as you get product decision makers (developers, program managers, UX folks) in the room watching a study, they start to brainstorm solutions. Often the issue is so glaring and a fix would be so easy to implement that it makes sense to make the change and see how it works with subsequent participants.
Formalizing that process involves several procedural and philosophical changes:
  • Setting the expected success rate for tasks before the study starts
  • All decision makers attend the sessions and see the issues for themselves
  • Developers commit to coding fixes "on the fly" wherever possible
  • Usability feedback from the UX team happens immediately after each session and is inclusive (everyone gets a say, and the UX team keeps it real by explaining psychological or design rationales to estimate issue severity)
  • Enough participants are run to ensure that any changes really fix the problems found.
With the team's buy-in and up-front goal setting, it is much easier to iterate the product during the evaluation time while still keeping the test environment unbiased for each participant.

How RITE works

RITE testing is very much like "traditional" usability testing with the following exceptions:
  • Development team and other stakeholders commit to attending and actively observing.
  • More time is allowed between participants for issue discussion and resolution - add at least an extra hour between participants. You'll probably only get through three a day rather than five. 
  • Consider adding a resolution coding day between testing days (test Monday, code Tuesday, test Wednesday, code Thursday, test Friday).
  • The absolute number of participants might not be known up front - you might need to schedule some fallback participants.
  • The development team agrees the tasks that users must be able to do, and what constitutes "success."
After each participant session, the observers discuss the issues they saw and categorize them:
  1. Issues with obvious cause and solution, quick fix - Fix and test with next participant
  2. Issues with obvious cause and solution, big fix - Start fix now, test with fixed code when stable
  3. Issues with no obvious cause (or solution) - Keep collecting data, upgrade the issue to category 1 or 2 once the cause becomes clear
  4. Issues caused by other factors (test script, participant) - Keep collecting data, learn from mistakes
Developers make the changes they can implement before the next session (Category 1), and start working on any bigger fixes (Category 2) with the aim of getting them into the code for testing later in the week. It obviously helps to have a development platform and framework that allows for quick fixes!
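
In practice the four-way triage is a judgment call made in discussion. The order of the questions the team asks can still be sketched in Python. This is a minimal sketch, not part of the published RITE method. The `Issue` fields, the `categorize` function and the `ACTIONS` table are hypothetical names. `fixable_before_next_session` stands in for whatever estimate the developers in the room actually make.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The four triage categories described in the list above."""
    QUICK_FIX = 1  # obvious cause and solution, quick fix
    BIG_FIX = 2    # obvious cause and solution, big fix
    UNCLEAR = 3    # no obvious cause (or solution) yet
    EXTERNAL = 4   # caused by other factors (test script, participant)


# What the team does next for each category.
ACTIONS = {
    Category.QUICK_FIX: "Fix and test with next participant",
    Category.BIG_FIX: "Start fix now, test with fixed code when stable",
    Category.UNCLEAR: "Keep collecting data until the cause is clear",
    Category.EXTERNAL: "Keep collecting data, learn from mistakes",
}


@dataclass
class Issue:
    # Hypothetical fields: the observers' judgments after watching a session.
    description: str
    caused_by_product: bool      # False if the test script or participant caused it
    cause_and_fix_obvious: bool  # does everyone agree on the cause and solution?
    fixable_before_next_session: bool  # developers' estimate, not a measurement


def categorize(issue: Issue) -> Category:
    """Return the triage category for one observed issue."""
    if not issue.caused_by_product:
        return Category.EXTERNAL
    if not issue.cause_and_fix_obvious:
        return Category.UNCLEAR
    if issue.fixable_before_next_session:
        return Category.QUICK_FIX
    return Category.BIG_FIX


if __name__ == "__main__":
    issue = Issue("Save button hidden below the fold", True, True, True)
    category = categorize(issue)
    print(category.value, ACTIONS[category])  # prints: 1 Fix and test with next participant
```

The questions are ordered so that issues caused by the study itself are set aside first. Unclear causes are held back until more participants have been seen. Only then does the size of the fix matter.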

Plotting the observed issues on a graph might give you something like this:

Graph redrawn from original data in the Medlock et al. paper (link below)
Here, errors are shown as red squares. These are mistakes that the participant could recover from but shouldn't have experienced in the first place. The blue diamonds are failures. These are bugs, design issues or conceptual problems that stopped the participant from completing their task at all.

You can see how a change to the code after the first participant actually increased the number of errors and failures observed. That is because the change removed a big issue that had prevented people from even moving forward in the product. Now participants were able to experience more of the product and thus find more issues.

The code was changed six times in all. Sometimes the change came immediately after a session to remove a blocking issue. Sometimes it came after a couple of sessions, for a category 2 issue (one that took more time to fix) or a category 3 issue (one that needed more observations before the fix was apparent). What is important is that six more participants were tested after the final code change. That allowed the team to be confident they had found and fixed all the issues they were going to observe with this code.
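
The stopping rule in that example can also be sketched in Python: keep running participants until enough of them have used the final version without another fix being needed. This is a hypothetical illustration rather than anything taken from the Medlock et al. paper. The `Session` record, the function names and the threshold of confirming sessions are all assumptions; six was simply the number used in that study. Using the task success rate agreed before the study as a second check is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical record of one participant session.
    participant: int
    code_changed_before: bool  # was a fix deployed just before this session?
    tasks_attempted: int
    tasks_succeeded: int


def sessions_on_final_code(log: list[Session]) -> list[Session]:
    """Return the sessions run since the most recent code change."""
    start = 0
    for i, session in enumerate(log):
        if session.code_changed_before:
            start = i  # this session was the first to see the new code
    return log[start:]


def ready_to_stop(log: list[Session], min_confirming_sessions: int,
                  target_success_rate: float) -> bool:
    """True once enough participants have used the final code and, together,
    they meet the success rate the team agreed before the study started."""
    recent = sessions_on_final_code(log)
    if len(recent) < min_confirming_sessions:
        return False
    attempted = sum(s.tasks_attempted for s in recent)
    succeeded = sum(s.tasks_succeeded for s in recent)
    return attempted > 0 and succeeded / attempted >= target_success_rate


if __name__ == "__main__":
    # Made-up example log, not study data: one fix deployed before participant 2.
    log = [Session(1, False, 5, 2), Session(2, True, 5, 4)]
    log += [Session(p, False, 5, 5) for p in range(3, 8)]
    print(ready_to_stop(log, min_confirming_sessions=6, target_success_rate=0.8))  # True
```

The sketch counts only the sessions since the last change, not the total. Participants who saw an earlier version of the design say nothing about whether the final fixes worked.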

RITE benefits and cautions

RITE is not an excuse to do sloppy coding or UX work. If anything, it takes more coding discipline to make a product that can be changed quickly, and more UX experience to run a study where variables are changing on you with each participant. It also takes commitment from a team who might initially be wary about spending time observing users instead of writing code.

However, RITE testing is a great way to get an engaged team to improve the product in a short space of time. The efficiency of the technique can make user testing viable on agile teams who would otherwise have refused to spend the extra time on a lab test. It's also a wonderful way to get everyone on the same page about user issues. Because the decision makers on the team are seeing participants struggle, they are extra-invested in making sure the product improves. They have a shared vocabulary around issues and will often refer back to "that participant who..." in other conversations.

Learn more

You can read more about RITE in a paper by its creators, Michael Medlock, Dennis Wixon and others on the Playtest team for Microsoft Games. Disclaimer: I worked with these guys while they were formalizing the process, so I may be a bit positively biased about the technique.

