

As an organization refines the user experience of a product, a common question is whether the newer version of the product is better than what came before. The answer to that question is usually provided by UX benchmarking, a practice that aims to measure and compare the UX of different versions of the product. UX benchmarking involves running a set of iterations of the same summative study (called a benchmark study) on various product versions, with each iteration aiming to capture the UX of the current version of the product. To gauge UX improvements from one version to the next, researchers have to compare metrics such as success rate, task time, and user satisfaction for some important tasks. They also have to go beyond the observed values of the metrics obtained in the two studies and run a statistical analysis to judge whether the observed differences are statistically significant or due to chance.

To soundly compare the results from two studies, the company needs to compare apples to apples; in other words, the studies need to have the same methodology and collect the same metrics. Imagine that the first iteration of the benchmark study, which evaluated the original version of the design, was conducted in person, whereas the second iteration was remote and unmoderated. People often stay more on task in in-person studies than in remote unmoderated ones, so times from remote unmoderated studies tend to be longer. Thus, if we notice a difference in task time or success rates between the two design versions, it could be due not to a better design but to the way in which the study was conducted. Differences in study methodology impact the results of the study and ultimately bias our comparison in one direction or the other.

Imagine, for example, that the time on task (as obtained in two iterations of a benchmark study) was higher for the original version of the design than for the redesign. The difference between these numbers, however, should not be taken at face value. First, for the comparison to be fair, the two iterations of the benchmark study must have used the exact same protocol and the same definitions of the metrics collected. Second, a statistical analysis should be run to determine whether the difference is statistically significant or simply due to noise.
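
As a rough illustration of that second step, a check along the following lines could be run. The sample sizes, task times, and success counts below are hypothetical, and real task-time data are often log-transformed before testing, so treat this as a sketch of the idea rather than a prescribed analysis.

```python
# Illustrative sketch only: the task times (seconds) and success counts are
# hypothetical placeholders, not data from a real study.
import numpy as np
from scipy import stats

original_times = np.array([62, 75, 58, 91, 70, 66, 84, 73, 69, 80])  # iteration 1
redesign_times = np.array([55, 49, 63, 58, 71, 52, 60, 57, 66, 54])  # iteration 2

# Welch's t-test: is the difference in mean task time larger than expected noise?
t_stat, p_time = stats.ttest_ind(original_times, redesign_times, equal_var=False)
print(f"Task time: t = {t_stat:.2f}, p = {p_time:.3f}")

# Success rates: chi-squared test on a 2x2 table of successes and failures.
#                    success  failure
counts = np.array([[14, 6],    # original design, 20 participants
                   [18, 2]])   # redesign, 20 participants
chi2, p_success, _, _ = stats.chi2_contingency(counts)
print(f"Success rate: chi2 = {chi2:.2f}, p = {p_success:.3f}")

# A p-value below the chosen threshold (commonly 0.05) suggests the observed
# difference is unlikely to be due to chance alone.
```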

Any company that considers a benchmarking practice should first define the benchmark study in minute detail and document it, so that subsequent versions of the same design can be evaluated using the exact same methodology and metrics. Usually, the study details and the raw data collected in the study are stored in a benchmark research repository. This repository could be part of the research repository maintained by the company, or it could be separate; either way, it should be easily findable and accessible. People who are involved in the UX evaluation of subsequent product versions should be able to refer to both the study information and the raw data from the study whenever they need them.
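
One lightweight way to keep that information findable is to file each benchmark iteration under a predictable path. The folder layout, file names, and fields below are hypothetical, meant only to illustrate the shape such a repository entry might take.

```python
# Minimal sketch of filing one benchmark iteration in a findable location.
# The directory names, file names, and fields are assumptions for illustration,
# not a prescribed repository structure.
import csv
import json
from pathlib import Path

iteration_dir = Path("benchmark_repo") / "checkout_study" / "iteration_02"
iteration_dir.mkdir(parents=True, exist_ok=True)

# Study information: what a future researcher needs in order to replicate the study.
study_details = {
    "study_type": "remote unmoderated",
    "platform": "UserZoom",
    "think_aloud": False,
    "screener": "screener_v1.pdf",
    "metrics": ["success_rate", "time_on_task", "satisfaction"],
}
(iteration_dir / "study_details.json").write_text(json.dumps(study_details, indent=2))

# Raw, participant-level data, so later iterations can be compared against it.
raw_rows = [
    {"participant": "P01", "task": "find_order_status", "success": 1, "time_s": 62},
    {"participant": "P02", "task": "find_order_status", "success": 0, "time_s": 118},
]
with open(iteration_dir / "raw_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=raw_rows[0].keys())
    writer.writeheader()
    writer.writerows(raw_rows)
```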

There are several aspects of the study that need to be well documented so that anybody will be able to replicate the study in the future:

- The screener used for recruiting participants.
- The type of study (moderated or not, in-person vs. remote). This is particularly important in unmoderated tests; in such tests, the study is usually run within an unmoderated platform (e.g., UserZoom), and sticking to the same platform ensures a sound base for comparing the results of different studies.
- Whether participants were asked to think out loud. Even though it is not necessary to use think-aloud in quantitative studies, do not assume that everybody will know that; clearly specify whether participants were asked to think out loud.
- The dependent variables (or metrics) of interest and their exact definitions (one way to pin these definitions down is sketched after this list).

All the methodology details, ranging from the type of study to task phrasing, need to be well documented so that researchers who run subsequent iterations of your benchmark study will be able to replicate it.
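
For the dependent variables in particular, ambiguity creeps in easily: does time on task include failed attempts, and does success allow facilitator hints? The sketch below shows one way to record such definitions unambiguously; the specific rules are hypothetical examples of decisions to document, not recommendations.

```python
# Sketch of writing metric definitions down unambiguously, so that every
# iteration of the benchmark computes them the same way. The rules below
# (what counts as success, when the timer starts and stops) are hypothetical
# choices to document, not a standard.

def task_success(reached_goal: bool, needed_assistance: bool) -> bool:
    """Success = the participant reached the goal without facilitator assistance."""
    return reached_goal and not needed_assistance

def time_on_task(start_s: float, end_s: float) -> float:
    """Time on task = seconds from starting the task to submitting the final screen."""
    return end_s - start_s

def success_rate(outcomes: list) -> float:
    """Proportion of task attempts counted as successes."""
    return sum(outcomes) / len(outcomes)

# The same raw observations always yield the same metric values.
outcomes = [task_success(True, False), task_success(True, True), task_success(False, False)]
print(success_rate(outcomes))    # 0.333...
print(time_on_task(12.0, 74.5))  # 62.5
```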
