Why Analyzing Qualitative User Tests Is Complicated
Analyzing data from qualitative usability tests is often presented in textbooks as a straightforward process of cataloging usability issues, summarizing task performance, tallying errors, and reporting time on task. However, it’s not always so easy. This is especially the case when you have:
- Unusual study designs (such as multiple prototype versions shown to each participant in a session)
- Complex research questions related to topics such as discoverability, comprehension, or how people solve problems
- Low-fidelity prototypes that don’t have all the features, content, or screens available
Let’s imagine we’ve designed a new product-details page that we’d like to test. Participants will be asked to find specific products on a prototype and choose one that’s best for them. Below are some of our research questions.
- Is the product-comparison feature discoverable?
- What information about the product is important to users and what isn’t?
- Do people understand the product-overview information?
- Do people understand how the product works? Do they form the right mental model?
It might be hard to answer such research questions by studying performance in one task. To answer these questions successfully, we’d need to:
- Gather multiple pieces of data (what people did and said at multiple points in the session)
- Weigh the data against the study design, recruitment particulars (who participated in our research), and even facilitation events (i.e., what the researcher did or said)
- Triangulate all this information to provide a trustworthy answer
Analysis and Synthesis
Moving from data to insights and recommendations requires a combination of two activities: analysis and synthesis.
Analysis refers to the breaking down and inspection of complex information, whereas synthesis refers to the recombination of information into new meaningful forms, namely insights. (Note that when we talk about “analysis” as a stage in the research process, we’re really referring to both of these activities.)
When we analyze qualitative data, analysis and synthesis don’t happen in a neat linear fashion. Sometimes, we move back and forth between the two.
4 Steps to Analyze Usability-Test Data
To illustrate how we reach insights from usability-test data and what the process of analysis and synthesis looks like in this context, we propose a 4-step framework.
- Collect relevant data: Select data points from each session (observations and quotes) that are relevant to a research question. This step involves analysis and reducing the data set to something more manageable.
- Assess for accuracy: Examine each data point to assess its relevancy and accuracy. We’re still in the analysis phase here.
- Explain the data: Combine the data points (step 1), our assessments (step 2), and our expertise to provide reasonable explanations or answers to our research question. This step involves synthesis.
- Check for good fit: Check our explanations or answers to our research questions against the data we collected to ensure good fit. Does all the data we collected support the explanation? If not, we iterate upon the explanation (by revisiting step 3). Step 4 is where analysis occurs again.
Even though this process seems linear, steps 3 and 4 are often iterative. In practice, we tend to form initial explanations based on a small subset of data — often data points that are easy to recall or stand out. However, when we test those explanations against the broader dataset, we may notice inconsistencies or overlooked patterns that challenge our initial thinking. This prompts us to refine our explanations. This back and forth between steps 3 and 4 can occur many times.
Step 1: Collect Relevant Data
Usability tests generate a lot of data. In the Collect step, we gather all the data points or observations that may help us answer our research questions. (This is a little bit like picking apples in an orchard: we pick the apples that look promising.) To do this, we inspect session notes, transcripts, and recordings, if available. We note or code relevant observations and quotes.
For example, if we were looking to understand whether a comparison feature on the product-details page is discoverable, we might revisit notes and recordings to collect data points on what participants did, what they said in the moment, and how they answered the facilitator’s follow-up questions during the relevant tasks. Questions we might ask include:
- Did participants use or notice the feature (as indicated, for instance, by hovering the mouse cursor over the feature)?
- Did participants mention the comparison feature or the need to compare while thinking out loud or in answer to some of the facilitator’s follow-up questions?
- If they didn’t use the comparison feature, how did participants perform the comparison task?
This step involves analysis since we’re breaking down the full data set and extracting a smaller set of useful items.
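As a rough illustration, the Collect step can be sketched as filtering coded session notes by research question. The sketch below is only one possible way to organize notes, not a prescribed method; the DataPoint structure, the code labels, and the sample notes are all invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataPoint:
    """One observation or quote from a session (hypothetical structure)."""
    participant: str
    kind: str                 # "behavior", "comment", or "follow-up answer"
    note: str
    codes: Tuple[str, ...]    # research-question codes assigned by the analyst


def collect(data_points: List[DataPoint], research_question: str) -> List[DataPoint]:
    """Keep only the data points coded as relevant to one research question."""
    return [dp for dp in data_points if research_question in dp.codes]


# Invented sample notes, for illustration only.
session_notes = [
    DataPoint("P1", "behavior", "Hovered over the comparison feature, did not click", ("discoverability",)),
    DataPoint("P1", "comment", "Said the overview text was confusing", ("comprehension",)),
    DataPoint("P2", "behavior", "Opened several product pages in separate tabs", ("discoverability",)),
    DataPoint("P2", "follow-up answer", "Said they did not see a way to compare", ("discoverability",)),
]

relevant = collect(session_notes, "discoverability")
for dp in relevant:
    print(dp.participant, "|", dp.kind, "|", dp.note)
```

In practice, many teams do the same thing with tags in a spreadsheet or a research repository; the point is only that each data point is linked to the research questions it may help answer.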
Step 2: Assess for Accuracy
In step 2, the Assess step, we’re still doing analysis rather than synthesis. We scrutinize each data point to judge how relevant it is and how much weight to place on it. Not every data point is treated equally. To continue our apple-picking analogy, this is like the apple picker inspecting each apple for bruises or other imperfections.
For example, maybe a participant commented that they liked the comparison feature but never used it. Or perhaps the comment came in answer to a leading question from the facilitator (e.g., “Did you like the comparison feature?”). Knowing these details makes us trust the data point a little more or less.
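One way to make this assessment explicit is to record the concerns attached to each data point and derive a coarse trust label from them. The sketch below assumes just two concerns, taken from the example above (a leading question, and a mismatch between what was said and what was done). The field names, the three labels, and the sample quotes are hypothetical, and a real assessment would rely on the researcher’s judgment rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class AssessedComment:
    """A participant comment plus the analyst's notes about it (hypothetical)."""
    quote: str
    leading_question: bool   # was the comment prompted by a leading question?
    matches_behavior: bool   # does what they said agree with what they did?


def trust_level(comment: AssessedComment) -> str:
    """Return a coarse label: the more concerns, the less weight we place."""
    concerns = int(comment.leading_question) + int(not comment.matches_behavior)
    if concerns == 0:
        return "higher"
    if concerns == 1:
        return "medium"
    return "lower"


# Invented examples, for illustration only.
unprompted = AssessedComment("I used compare to decide", False, True)
never_used = AssessedComment("I liked the comparison feature", False, False)
led_and_unused = AssessedComment("Yes, I liked it", True, False)

for c in (unprompted, never_used, led_and_unused):
    print(trust_level(c), "|", c.quote)
```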
Step 3: Explain the Data
In step 3, we begin the synthesis process by pooling our observations, assessments, and domain knowledge to generate likely explanations (or hypotheses) for the data we’ve collected.
Sometimes several explanations are possible given the data we collected. For example, if all our participants missed the comparison feature we designed, the following explanations might be generated.
- Possible explanation 1: The feature was not in the place people expected.
- Possible explanation 2: It was hard to see or notice the feature.
- Possible explanation 3: The feature wasn’t useful to participants in the task, so they didn’t look for it.
Coming up with explanations requires UX knowledge and experience, as well as an element of imagination. For example, if a researcher has conducted research on similar designs in the past, they might have lots of memories of human behavior and knowledge of design best practices to draw from.
Step 4: Check for Good Fit
To narrow down our possible explanations and build confidence in our explanations for user behavior, we test our explanations against the existing data for a good fit. Does the existing data support the explanation? Can we find data points that conflict with it or cause us to doubt its accuracy? This step is like seeing if a puzzle piece fits neatly.
For example, if participants didn’t use the comparison feature on the product-details page and our interpretation was that users don’t need it, then we would expect participants to have used another successful strategy to compare products. However, if participants struggled to compare, opened many product-details pages in multiple tabs to compare, or complained that it was hard to compare the available products, we might reject the explanation since it conflicts with the data.
What’s happening here is a version of hypothesis testing. Our explanation drives predictions; if the data does not support our predictions, we reject or refine our explanation.
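The logic of this check can be sketched in a few lines: an explanation predicts some observations and is contradicted by others, and we compare both against what we actually saw. This is a simplified illustration, not a formal method. The dictionary layout, the function name check_fit, and the observation labels are assumptions made for the example, with the labels loosely based on the comparison scenario above.

```python
def check_fit(explanation: dict, observed: set) -> dict:
    """Compare an explanation's predictions against the observed data."""
    conflicts = explanation["contradicted_by"] & observed
    unmet = explanation["predicts"] - observed
    return {
        "conflicts": conflicts,            # observations that cast doubt on it
        "unmet_predictions": unmet,        # things we expected but did not see
        "fits": not conflicts and not unmet,
    }


# Hypothetical encoding of the "users don't need the feature" explanation.
not_needed = {
    "name": "Participants did not need the comparison feature",
    "predicts": {"compared successfully another way"},
    "contradicted_by": {
        "struggled to compare",
        "opened many tabs to compare",
        "complained comparing was hard",
    },
}

# Invented observations, for illustration only.
observed = {"opened many tabs to compare", "complained comparing was hard"}

result = check_fit(not_needed, observed)
print(not_needed["name"], "-> fits:", result["fits"])
print("Conflicting observations:", sorted(result["conflicts"]))
```

Here the explanation does not fit, so we would return to step 3 and refine it or try another one.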
Qualitative research often uncovers valuable insights and answers — but it can also raise just as many new questions that need further research! It’s okay for your analysis to end with reporting that you don’t really know the reasons behind the data and that more research is needed. However, the mark of a good analyst is exploring their data from multiple angles, testing multiple explanations against the data, and putting forward some sensible hypotheses to explore in future research.
Conclusion
Analyzing data from qualitative usability testing is often more complex than it’s portrayed. This type of data is rich, nuanced, and messy. A good analyst collects all the right data, evaluates it from multiple angles, seeks explanations that fit the data, and tests those explanations for good fit.