To move beyond the days of website changes based purely on hunches, A/B testing borrows the framework of statistical hypothesis testing: a theory about visitor behavior is put to the test to determine whether there is enough statistical evidence to support it.
Using a very basic framework for statistical inference, the procedure for hypothesis testing goes as follows:
- Start with the existing version of the web page or the tested element within it. That existing version is now termed the “baseline” (or variation A).
- Set up the alternative variation, a.k.a. the “treatment” (or variation B).
- Calculate the required sample size in advance using a sample-size calculator. This calculation is based on the baseline’s current conversion rate (which must already be known), the minimum difference in performance you wish to detect, and the desired statistical power (roughly, how likely the test is to detect that difference if it truly exists; higher reliability requires a larger sample size).
- Launch the test and let it run until the required sample size per variation is reached. In practice, people often skip the sample-size calculation and cannot resist peeking at the data as it accumulates. But acting on apparently significant results before the planned sample size is reached inflates the false-positive rate and significantly degrades the reliability of the results.
- Once the samples are collected, compare the performance of the two variations and test whether the stronger performer is, in fact, better than its competitor to a statistically significant degree.
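The sample-size step above can be sketched in code. The following is a minimal illustration, not a replacement for a proper calculator: it uses the standard normal-approximation formula for a two-sided, two-proportion test, and the 10% baseline rate, 2-point lift, α = 0.05, and 80% power are hypothetical inputs chosen for the example.

```python
import math
from statistics import NormalDist

def sample_size_per_variation(p_baseline, p_treatment, alpha=0.05, power=0.80):
    """Approximate sample size per variation for a two-sided
    two-proportion z-test, using the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    p_bar = (p_baseline + p_treatment) / 2         # average of the two rates
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_baseline * (1 - p_baseline)
                                      + p_treatment * (1 - p_treatment))) ** 2
    return math.ceil(numerator / (p_treatment - p_baseline) ** 2)

# Hypothetical scenario: baseline converts at 10%, and we want
# enough visitors to reliably detect a lift to 12%.
n = sample_size_per_variation(0.10, 0.12)
print(n)  # roughly 3,800 visitors per variation
```

Note how the required sample size grows as the minimum detectable difference shrinks: detecting a lift to 11% instead of 12% would demand roughly four times as many visitors per variation.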
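The final comparison step is commonly done with a two-proportion z-test. Here is a minimal sketch under the same normal approximation; the conversion counts are invented purely for illustration.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for the difference between two conversion
    rates. Returns the z statistic and the p-value."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: 400 vs. 470 conversions out of 3,841 visitors each.
z, p = two_proportion_z_test(400, 3841, 470, 3841)
print(z, p)
```

If the resulting p-value falls below the chosen significance level (typically 0.05), the difference between the variations is declared statistically significant; otherwise the test is inconclusive, not proof that the variations perform equally.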