A/B Testing & Split Testing: The Definitive Guide (2020)
A/B testing allows you to optimize your website with evidence, rather than guesswork. This guide provides a simple framework for successful A/B tests and answers the most frequently asked questions about statistics, website loading speed and SEO.
A/B Testing & Split Testing Guide
A/B Testing is always based around a goal. For most people, this goal is to increase their website’s conversion rate. As long as you have a specific and measurable goal, you can collect data from your users. However, a successful test also requires a few other ingredients:
- Traffic – You can’t test your website without a large enough test sample.
- Hypothesis – A proposed change to compare against your original webpage.
- A/B Testing Tool – A software tool to allocate your traffic and collect your results.
A/B Testing procedures vary between agencies. However, most combine a few key features. This is the five-step plan we use at Convertize:
At the end of each testing cycle, it is important to revisit and evaluate the whole process.
Step 1 – Analysis
Before you’ve even begun to think about what version B might look like, version A needs a thorough examination. The key here is data. Google Analytics is an indispensable tool for web marketers, as it tells you how visitors are using your site. By examining your visitors’ behaviour, and identifying weaknesses in your Conversion Funnel, you will be confident about what needs to be optimized. To understand the why and the how, other CRO tools are necessary.
Free online software allows you to transform mouse-tracking data into visual heatmaps. Scrollmaps tell you how far down a page visitors get before leaving. These tools can be combined with customer segmentation and exit surveys to show you how visitors experience your site. But you’re not done yet!
In order to change how people experience your site, you need to think about consumer psychology. The final part of the research stage is a comprehensive heuristic analysis.
Step 2 – Hypothesis
This is the fun part! You have to put your neck on the line and find a way to change Version A in order to achieve a particular goal. So, what are you going to change? Your call-to-action buttons, your copy, the colours or structure of your sections? Does your page need a facelift or full cosmetic surgery? Anyone can create a hypothesis, but a good hypothesis takes careful thought.
Step 3 – Design
It is important to be precise about the parameters of your experiment. Before launching a test, you will need to decide on:
- Your goal – In order for an A/B testing platform to compare the number of conversions resulting from version A or B, you need to specify an action that it can register. Most commonly this is the URL of a “thank you” page (following a purchase), as it guarantees that you will only receive data from completed purchases. Occasionally, you might want to set a different kind of goal. Clicking on a particular CTA or visiting another page on your site might be the best goal for you to measure.
- Which pages you want to target – Testing each product page one by one would take forever, but targeting the wrong pages will obscure your results. Your targeting is usually defined by the URLs on which your changes will apply. Like with online searches, you can define the limits of your test in terms of: “URL Contains”, “URL Ends With” or “URL Equals.”
- How you want your traffic to be divided – Some software features a “Multi-Armed Bandit” algorithm that directs the majority of your traffic towards the best-performing version of your webpage. This has two advantages. Firstly, it can help to reduce the time taken to achieve significant results. Secondly, it means conversions are not lost by sending valuable traffic to a less-optimal page. However, if your A/B testing tool does not provide this sort of algorithm, you will need to think about how your traffic should be allocated.
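To make the idea concrete, here is a minimal epsilon-greedy sketch of how a bandit-style allocator might divide traffic. The variant names, running totals and 10% exploration rate are all hypothetical, not taken from any particular tool:

```python
import random

def choose_variant(stats, epsilon=0.1):
    """Epsilon-greedy allocation: usually exploit the best-performing
    variant, but keep exploring the others a fraction of the time."""
    if random.random() < epsilon:
        return random.choice(list(stats))                      # explore
    return max(stats, key=lambda v: stats[v]["conversions"]
               / max(stats[v]["visitors"], 1))                 # exploit

# Hypothetical running totals for two versions of a page
stats = {
    "A": {"visitors": 1000, "conversions": 30},   # 3.0% observed
    "B": {"visitors": 1000, "conversions": 45},   # 4.5% observed
}

random.seed(42)
picks = [choose_variant(stats) for _ in range(10_000)]
print(f"share of traffic sent to B: {picks.count('B') / len(picks):.0%}")
```

With these figures the allocator sends the large majority of visitors to B while still gathering some data on A, which is exactly the trade-off described above.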
Step 4 – Experiment
Running an experiment is like sitting in the passenger seat. No matter how much you want to tweak, adjust and alter the process, you have to let the driver take control. However, there is one important decision you have to make: when to end the test. Before starting your test you should decide on the level of “Confidence” you will have to reach before concluding the experiment. 95% confidence is the industry benchmark, but experienced testers might decide to end a test early in certain circumstances.
If there is no variation between A and B, arriving at statistical significance will take a long time. It might be better to try a more substantial “Treatment.” If A is significantly outperforming B, it might be worth cutting your losses and trying something else.
Step 5 – Interpretation
Statistical significance is the foundation for drawing your conclusions. Even so, a significant uplift for version B may not lead you to make sweeping changes to your site. For example, version B might lead visitors to make a purchase more frequently, but it might also lead visitors to make a less profitable kind of purchase. It might reduce the number of returning customers, or cause other unanticipated problems. It may be that you only showed version B to a segment of your customers. In that case, the next step would be to try it on the other segments.
A/B Testing Strategy
One recurring piece of advice is known as the “no-peeking” rule. When analysing the value of a hypothesis, marketers sometimes view their data before the full sample size is reached. It’s easily done; we all want results as soon as possible! The problem with this is that statistical significance within a sample does not necessarily make your results representative. It is all too easy to jump to conclusions!
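The damage done by peeking can be demonstrated with a small A/A simulation: both “variants” share the same true conversion rate, so every “significant” result is a false positive. The conversion rate, sample sizes and number of interim checks below are illustrative:

```python
import math
import random

def z_test(conv_a, n_a, conv_b, n_b):
    """Return True if a two-proportion z-test is 'significant' at 95%."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = (conv_a / n_a - conv_b / n_b) / se
    return abs(z) > 1.96

random.seed(1)
RATE, N, CHECKS, SIMS = 0.05, 3000, 10, 300   # illustrative A/A setup
peeking_fp = fixed_fp = 0
for _ in range(SIMS):
    a = [random.random() < RATE for _ in range(N)]
    b = [random.random() < RATE for _ in range(N)]
    step = N // CHECKS
    # Peeking: stop the first time any interim look appears significant
    if any(z_test(sum(a[:k]), k, sum(b[:k]), k)
           for k in range(step, N + 1, step)):
        peeking_fp += 1
    # No peeking: a single check at the planned sample size
    if z_test(sum(a), N, sum(b), N):
        fixed_fp += 1

print(f"false positives with peeking: {peeking_fp / SIMS:.0%}")
print(f"false positives without:     {fixed_fp / SIMS:.0%}")
```

Checking ten times and stopping at the first “significant” reading produces far more false winners than the single planned check, which is why the no-peeking rule exists.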
The demand for A/B testing has led to a wide selection of tools designed to make the testing process as smooth as possible. However, not all testing tools work the same way, and most are aimed at enterprise customers (with enterprise prices).
We recently conducted a survey of the 26 best A/B testing tools on the market in 2020. Very few of the tools available combined a user-friendly interface with the flexibility required for effective testing.
These are 10 of the most popular A/B testing tools available in 2020:
- Convertize A tool designed with marketing professionals and mid-sized companies in mind. It is a robust system and comes with expert support. The software offers intuitive editing, unique speed and safety features, and a library of neuromarketing tactics.
- Optimizely An enterprise tool for major eCommerce platforms. It provides advanced testing options. Optimizely supports multivariate testing and was built to manage large traffic volumes.
- VWO (Visual Website Optimizer) Another tool popular with marketing agencies and large multi-national companies. Along with A/B testing, customers have access to a full suite of additional analytics (such as heatmaps).
- AB Tasty Originally built for medium-sized enterprises, the tool has been repositioned as an enterprise solution. AB Tasty now specialises in re-marketing features.
- Google Optimize Google’s free A/B testing system. Optimize 360 provides a paid-for service that can test up to 10 variations of a page. The tool can be integrated with Google Analytics.
- Kameleoon An expensive option aimed at medium-sized companies. It is based around the use of AI and machine learning and has a focus on personalisation.
- Convert Positioned as the alternative to Optimizely. The tool has been rebranded for an enterprise audience but is nonetheless cheaper than its famous rival. It is one of the few enterprise tools to offer expert support.
- Omniconvert Targeted at small and medium enterprises. It is built to integrate with customer segmentation and personalisation. The software offers affordable multivariate testing.
- Adobe Target One of the oldest A/B testing solutions, Omniture Test & Target, was absorbed into Adobe’s marketing suite. Adobe provides an enterprise service for major businesses (HSBC, for example). It is the most expensive, and most comprehensive, A/B testing package.
- FreshMarketer Zarget was purchased by Freshworks in 2017 and renamed. The tool runs within a Chrome extension and has an impressive list of features for such an affordable solution.
There is no standard time for an A/B test because a test is only considered reliable when the results are significant. Until then, it is dangerous to draw any conclusions (even from seemingly clear data). Statisticians are wary of a phenomenon called Regression to the Mean.
- Regression to the mean – When seemingly clear results become less pronounced as the sample size increases. If the variation between A and B appears significant to begin with but settles into a more moderate difference, the initial results were probably caused by outliers, and the variation will keep shrinking the longer your test continues.
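A quick simulation illustrates the idea. A single page with a fixed 10% true conversion rate can show misleading early numbers that drift back toward 10% as the sample grows (the rate and checkpoints are hypothetical):

```python
import random

random.seed(7)
TRUE_RATE = 0.10          # the page's real conversion rate (hypothetical)
checkpoints = (100, 1_000, 100_000)
results = []
conversions = 0
for n in range(1, checkpoints[-1] + 1):
    # Each visitor converts with probability TRUE_RATE
    conversions += random.random() < TRUE_RATE
    if n in checkpoints:
        results.append((n, conversions / n))

for n, rate in results:
    print(f"after {n:>7} visitors: observed rate {rate:.3f}")
```

The estimate at 100 visitors can be noticeably off, while the estimate at 100,000 visitors sits very close to the true 10%: early extremes regress toward the mean.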
In order to reach significance, your test requires sufficient Statistical Power. This is determined by the Effect Size and your Sample Size.
- Sample Size – Before starting an A/B test, you must calculate the sample size needed. Your sample is composed of visitors to your website, so your test duration is directly related to the amount of traffic your site receives. In some cases it might be sensible not to run an A/B test because the volume of traffic available on the site (or the page tested) is not high enough.
- Effect Size – This is the change caused by your variable. In the case of A/B testing it is measured in terms of conversion rate. A dramatic Uplift in conversions on Version B of your page would constitute a large Effect. The bigger the difference between versions, the more likely you are to reach statistical significance.
- Statistical Power – The chance that your experiment will detect an effect, if that effect exists. Two factors determine the statistical power of your test: the magnitude of the effect your test creates and the number of visitors your site receives.
These factors combine to give your test a degree of Representativeness (or statistical significance). This is the likelihood that your results demonstrate a real effect.
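As a rough sketch of the sample-size calculation, the standard two-proportion formula can be written in a few lines. The baseline rate and uplift below are illustrative, and the z-values correspond to the conventional 95% confidence and 80% power defaults:

```python
import math

def sample_size_per_variant(base_rate, relative_uplift,
                            z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per variant for a two-proportion test
    (normal approximation; 1.96 and 0.84 correspond to the conventional
    95% confidence and 80% power)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_uplift)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# e.g. detecting a 20% relative uplift on a 3% baseline conversion rate
print(sample_size_per_variant(0.03, 0.20))   # roughly 14,000 per variant
```

Note how the required sample shrinks sharply as the detectable uplift grows: small effects on low-traffic pages can be impractical to test.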
Calculating statistical significance is an important step in any experiment. There are two main approaches to calculating significance: Bayesian and Frequentist. They are not simply alternative methods, but actually reflect different interpretations of probability.
The Frequentist approach examines the number of times an event occurs in a volume of tests. The result is a statement only about frequency in a given sample.
The Bayesian approach starts with an estimate of a real-world effect and updates this as data is accumulated. The result is a new estimate of the real-world effect and a number describing how much it can be trusted.
When calculating the statistical significance of an A/B test, both approaches contribute important information. A/B testing software often combines the two approaches in a single statistics package. Using your experimental data (the number of visitors to A, the number of visitors to B, the number of conversions from A, the number of conversions from B) the software will tell you the relative uplift observed between A and B and the likelihood that this is a result of the changes you have made.
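As an illustration of both approaches (not the specific method used by any one tool), the sketch below computes a Frequentist p-value and a Bayesian estimate of the probability that B beats A, from hypothetical visitor and conversion counts:

```python
import math
import random

def frequentist_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a two-proportion z-test (normal approx.)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1,1) priors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        > rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(draws)
    )
    return wins / draws

# Hypothetical results: 10,000 visitors per version
p_value = frequentist_p(300, 10_000, 360, 10_000)
p_b_wins = prob_b_beats_a(300, 10_000, 360, 10_000)
print(f"Frequentist p-value:      {p_value:.3f}")
print(f"Bayesian P(B beats A):    {p_b_wins:.3f}")
```

For this hypothetical data the two approaches agree: the p-value falls below the 0.05 threshold and the Bayesian estimate puts B ahead with high probability.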
You can use the Convertize interactive significance calculator here.
The method used by Convertize for analysing statistical significance is described in AB Testing: The Hybrid Statistical Approach.
Google clarified its position regarding A/B Testing in an article published on its blog. The important points to remember are:
- Use Canonical Content Tags: Search engines find it difficult to rank content when it appears in two places (“duplicate content”). As a result, web crawlers penalise duplicate content and reduce its SERP ranking. When two URLs displaying alternate versions of a page are live (during A/B tests, for example) it is important to specify which of them should be ranked. This is done by attaching a rel=canonical tag to the alternative (“B”) version of your page, directing web crawlers to your preferred version.
- Do Not Use Cloaking: In order to avoid penalties for duplicate content, some early A/B testers resorted to blocking Google’s site crawlers on one version of a page. However, this technique can lead to SEO penalties. Showing one version of content to humans and another to Google’s site indexers is against Google’s rules. It is important not to exclude Googlebot (by editing your site’s robots.txt file) whilst conducting A/B tests.
- Use 302 redirects: Redirecting traffic is central to A/B testing. However, a 301 redirect can trick Google into thinking that an “A” page is old content. In order to avoid this, traffic should be redirected using a 302 link (which indicates a temporary redirect).
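These rules can be sketched in a few lines of Python. The minimal WSGI app below (with purely illustrative URLs) issues a temporary 302 redirect to the “B” page and serves a canonical tag pointing back to the original:

```python
def ab_redirect_app(environ, start_response):
    """Minimal WSGI sketch of SEO-safe A/B routing (illustrative URLs)."""
    path = environ.get("PATH_INFO", "/")
    if path == "/landing":
        # 302 = temporary redirect, so Google keeps treating "A" as current
        start_response("302 Found", [("Location", "/landing-b")])
        return [b""]
    # The "B" page declares the original as canonical, so crawlers
    # rank the preferred version instead of penalising the duplicate
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b'<link rel="canonical" href="https://example.com/landing">']

# Quick check without running a server:
captured = {}
body = ab_redirect_app({"PATH_INFO": "/landing"},
                       lambda s, h: captured.update(status=s, headers=h))
print(captured["status"])  # 302 Found
```

The same logic applies whatever framework you use: the redirect status must be temporary (302), and the canonical tag lives on the alternative page.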
A/B testing software can reduce loading speed due to the way in which it hosts competing versions of a page. A testing tool can create scenarios in two ways:
- From the client side (front-end)
- From the server side (back-end scripts)
Server-side testing is faster and more secure. However, it is also more expensive and more complicated to implement.
Most A/B testing software operates on a Client-side basis. This is to make editing a site as easy as possible. In order to reduce the impact of testing on a page, the best A/B testing solutions have found ways to speed up page loading.
Provided your site has enough traffic, A/B testing is an essential technique for any eCommerce business’s marketing team. Not only will it provide unexpected insights about your customers and your site, it will allow you to market your business with confidence. For websites with lower traffic volumes, there are alternative ways to optimize your conversion rates. For example, you might begin by exploring our 2020 guides to CRO and neuromarketing.