A/B Testing Guide 2019

A/B Testing 

A/B Testing (similar to ‘split testing’) is a real-world experiment that compares two versions of a webpage: version “A” and version “B”. Surveys suggest that 71% of international companies perform multiple tests each month. Around 12% of tests produce significant results.

When running an A/B test, two versions of a webpage operate simultaneously and traffic is divided between them. The way visitors respond tells web-marketers which version of the page is best. 

To help guide you through the technical details and industry jargon, we have created a series of comprehensive guides to digital marketing and web optimization. This is the Convertize A/B Testing Guide for 2019.


We have split our 2019 guide into four sections to make it easy for both first-time and veteran testers.

The first section, ‘What is A/B testing’, provides definitions for the technical jargon and background information on A/B testing. The second section, ‘Risks and Benefits’, addresses common concerns and misconceptions surrounding website optimization. It also contains a list of webpage features commonly selected for testing. Section three, ‘How to do A/B testing’, gives a five-step plan for performing your own A/B tests. It also describes the best tools available for different A/B testing needs. The final section, ‘Ideas and Advice’, contains hints, tips and advice for getting the most out of your tests.

What is A/B testing

Performing A/B tests has become a major part of digital marketing for most eCommerce businesses. With the rising cost of traditional advertising, traffic acquisition has become an expensive activity. Making the most out of existing traffic is an effective alternative. However, CEOs and Marketing Directors want to make informed decisions about how to increase their conversion rates and the only way to get reliable data on website performance is through testing.

A/B Testing involves two versions of a single webpage. Version A is the currently used version (the ‘Control’), while Version B is the modified page (the ‘Treatment’). By running both pages simultaneously, their performance data can be easily compared.

The modification of Version A should be based on a Hypothesis about how visitors use your website. This might relate to the design, the structure, or the content. By comparing the two versions (the Treatment and the Control) you can either prove or disprove your Hypothesis.

Performing numerous A/B tests is the best way to gain a real understanding of how a webpage’s design affects its performance. For large eCommerce companies, the process is continuous and involves many versions of each page.

Testing variables in a real-world situation is not new; some of the most familiar scientific ideas have been established by analysing the effect of variables through statistics. Statistical Hypothesis Testing, in which the significance of a test is measured, was formalised in the early twentieth century by statisticians such as Ronald Fisher and Jerzy Neyman. They established ideas such as Statistical Significance and Null Hypotheses.

In the world of advertising, early copywriters such as Claude Hopkins experimented with ways of testing public engagement. Hopkins used promotional coupons, introduced with different advertising copy, to measure the impact of competing versions. He described his techniques for testing copy in his Scientific Advertising (1923). 

Since the turn of the century, A/B testing has been used as a key resource for software providers and internet services. In the year 2000, engineers working for Google ran a test to find the optimum number of results to display on a search engine results page. The answer (10 results per page) has remained the default ever since.

In 2009, a Microsoft employee suggested trialling a new way of opening Hotmail links from the MSN homepage. 900,000 UK users were involved in the test, which opened links in a new tab rather than redirecting the browser. User engagement (measured by the number of clicks on the MSN homepage) increased by 8.9%.

Similarly, in 2012, an engineer for Bing wrote a simple test for comparing two ways of displaying Ad headlines. The winning variant resulted in a 12% increase in revenue.

Today, companies such as Amazon run thousands of A/B tests every day. Despite the difficulty of obtaining reliable data, and the relatively minor impact of most modifications, A/B Testing is the most reliable way to improve website performance.

In order to perform effective tests, website editors need three things: 

  • A hypothesis
  • A/B Testing Software
  • A way of analysing their results

A website editor’s hypothesis is simply their idea for changing one element of a webpage in order to improve its performance. This might be the location of a call-to-action, the layout of a page, or even the colour of an add-to-cart button. 

A/B Testing Software monitors and records the effect of the change on visitors’ behaviour. The software divides traffic between the ‘treatment’ and the ‘control’ and measures the different responses. The most sophisticated tools use algorithms to send more visitors to the best-performing version of a page. That way, businesses don’t lose out on customers whilst the test is running.
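To make the traffic split concrete, here is a minimal Python sketch of the two ideas described above. It is an illustration under our own assumptions, not the way any particular tool works: the function names, the hashing scheme and the choice of Thompson sampling (one common algorithm for sending more visitors to the better-performing version) are all hypothetical.

```python
import hashlib
import random


def assign_variant(visitor_id, experiment, treatment_share=0.5):
    """Return 'control' or 'treatment'; the same visitor always gets the same answer."""
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    # Turn the first 8 hex digits of the hash into a number in [0, 1).
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) / 16**8
    return "treatment" if bucket < treatment_share else "control"


def choose_variant_thompson(stats, rng):
    """Pick a version by Thompson sampling (an assumed algorithm, for illustration).

    stats maps each version name to (conversions, visitors). Each version's
    conversion rate is drawn from a Beta distribution; the highest draw wins,
    so versions that convert better are shown more often.
    """
    draws = {
        name: rng.betavariate(1 + conversions, 1 + visitors - conversions)
        for name, (conversions, visitors) in stats.items()
    }
    return max(draws, key=draws.get)


if __name__ == "__main__":
    # Hypothetical visitor id and experiment name.
    print(assign_variant("visitor-42", "add-to-cart-colour"))
    observed = {"control": (30, 1000), "treatment": (45, 1000)}
    print(choose_variant_thompson(observed, random.Random()))
```

Hashing the visitor id keeps a returning visitor on the same version, which matters because a visitor who sees both pages would blur the comparison.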

Once the website has received enough visits, the editor will end their experiment. However, there is another important step to make before the changes can be made permanent. Analysing the statistical significance of the experimental data is a crucial phase in the A/B Testing process.

A/B Testing involves a single variable, with two versions of a page. Testing multiple versions of a page simultaneously is known as A/B/n Testing. Supposing a further version (X) is added, the page versions tested would be: A, B, X. This can involve any number of versions (depending on the available traffic).

Multivariate testing works the same way as A/B testing, but tests more than one variable at a time, both separately and in combination. This gives information on how each individual Treatment works and how variations work together. Supposing a second variable (X) is added to a test, the versions tested would be: A, B, A-X, B-X.

Split testing is the same as A/B testing, except the two pages, A and B, are assigned their own URLs. This makes the pages load faster, and allows for more extensive changes. However, it is also a more complicated procedure.
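The number of versions in a multivariate test grows quickly, because every option of one variable is combined with every option of the others. The short Python sketch below lists the combinations for the example above (two page versions, with and without a second change X). The function and variable names are our own, chosen for illustration.

```python
from itertools import product


def multivariate_versions(variables):
    """List every combination of options, one dict per page version."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


if __name__ == "__main__":
    # Two page versions, each tested with and without the extra change X.
    for version in multivariate_versions({"page": ["A", "B"], "extra_change": ["none", "X"]}):
        print(version)
```

Each extra variable multiplies the number of versions, and every version needs its own share of traffic. This is why multivariate tests are usually only practical on high-traffic pages.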

Benefits and risks of A/B testing

A/B Testing has one major advantage over alternative ways of designing and redesigning a website: It is based on evidence! Whilst UX design, best-practice guidelines and customer journey analysis can provide hints and suggestions, A/B testing offers certainty.

CEOs and Marketing Directors prefer to base their decisions on data. Split Testing tells them how to achieve their goals:

  1. E-commerce websites use it to strengthen their conversion funnel
  2. SaaS websites use it to improve their home page and enhance their sign-up process
  3. Lead generation websites use it to optimise their landing pages

The same process is also used to help redesign websites. In 2018, for example, British Airways launched a new website. However, before releasing the new design, they trialled new versions of each webpage with A/B Testing software. By the time the finished website was published, each page had been tested over several months and thousands of visitors.

Any element of a webpage can be tested by comparing the Control with a Treatment. Common features selected for testing include:

Advanced tools allow you to test your online forms, pricing structure and site navigation. 

Google clarified its position regarding A/B Testing in an article published on its blog. The important points to remember are:

  • Use Canonical Content Tags: Search engines find it difficult to rank content when it appears in two places (“duplicate content”). As a result, web crawlers penalise duplicate content and reduce its SERP ranking. When two URLs displaying alternate versions of a page are live (during A/B tests, for example) it is important to specify which of them should be ranked. This is done by attaching a rel=canonical tag to the alternative (“B”) version of your page, directing web crawlers to your preferred version.
  • Do Not Use Cloaking: In order to avoid penalties for duplicate content, some early A/B testers resorted to blocking Google’s site crawlers on one version of a page. However, this technique can lead to SEO penalties. Showing one version of content to humans and another to Google’s site indexers is against Google’s rules. It is important not to exclude Googlebot (by editing your site’s robots.txt file) whilst conducting A/B tests.
  • Use 302 redirects: Redirecting traffic is central to A/B testing. However, a 301 redirect can trick Google into thinking that an “A” page is old content. In order to avoid this, traffic should be redirected using a 302 redirect (which indicates that the move is temporary).
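As a rough illustration of the first and third points, the Python sketch below builds the two pieces involved: the canonical link tag that would go in the head of the “B” page, and the status code and header for a temporary redirect. The helper names and the example.com URLs are hypothetical; in practice your testing tool or web framework will normally generate both for you.

```python
from html import escape
from http import HTTPStatus


def canonical_link_tag(preferred_url):
    """Build the tag placed in the <head> of the alternative ("B") page."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


def temporary_redirect(variant_url):
    """Return the status code and headers for a 302 (temporary) redirect."""
    status = HTTPStatus.FOUND  # 302, as opposed to 301 (MOVED_PERMANENTLY)
    return status.value, {"Location": variant_url}


if __name__ == "__main__":
    print(canonical_link_tag("https://example.com/pricing"))
    print(temporary_redirect("https://example.com/pricing-b"))
```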

By following these guidelines, you can make sure that your tests have no negative SEO impact on your site. If you are using A/B testing software, these steps will be followed automatically. All major A/B Testing tools follow these guidelines and use JavaScript in such a way that testing will not influence your SEO.

A/B testing software can create test scenarios in two ways: 

  • From the client’s side (front-end)
  • Using server-side scripts.

Server-side A/B Testing solutions are faster and are more secure. However, they are also more expensive and complicated to implement.

Client-side software uses JavaScript code that makes the changes directly in the visitor’s browser. This can create a loading delay of a fraction of a second, which is occasionally visible to the naked eye.

The loading delay traditionally associated with client-side software is known as the “flickering effect”. The best A/B Testing solutions have found ways to remedy this.

How to do A/B testing

A/B Testing Strategy

A/B Testing procedures vary between agencies. However, most combine a few key features. This is the five-step plan we use at Convertize:

  1. Analysis
  2. Hypothesis
  3. Design
  4. Experiment
  5. Interpretation

Our blog articles provide a wide range of insights and practical tips on running your own A/B tests.

One recurring piece of advice is known as the “no-peeking” rule.

When analysing the value of a hypothesis, marketers sometimes view their data before the full sample size is reached. It’s easily done; we all want results as soon as possible! The problem is that a result which looks statistically significant part-way through a test is not necessarily representative: the more often you check, the more likely you are to catch a chance fluctuation. It is all too easy to jump to conclusions!
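The effect of peeking can be demonstrated with a small simulation. The Python sketch below runs many “A/A” tests, in which both versions have exactly the same conversion rate, so any “significant” result is a false alarm. It compares how often a false alarm appears if you check at regular intervals with how often it appears if you only check at the end. The traffic figures, the 10% conversion rate and the use of a two-proportion z-test are our own assumptions for the example.

```python
import random
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(simulations=300, visitors=1000, peek_every=100,
                         rate=0.10, alpha=0.05, seed=1):
    """Return (false alarm rate with peeking, false alarm rate at the end only)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(simulations):
        conv_a = conv_b = 0
        significant_at_any_look = False
        for n in range(1, visitors + 1):
            # Both versions convert at the same rate: there is no real effect.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % peek_every == 0 and p_value(conv_a, n, conv_b, n) < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        final_hits += p_value(conv_a, visitors, conv_b, visitors) < alpha
    return peeking_hits / simulations, final_hits / simulations


if __name__ == "__main__":
    peeking, final = false_positive_rates()
    print(f"False alarms when peeking every 100 visitors: {peeking:.1%}")
    print(f"False alarms when checking only at the end:  {final:.1%}")
```

With settings like these, the end-only rate should sit close to the chosen 5% threshold, while the peeking rate is typically noticeably higher. Exact figures depend on the random seed and the number of looks.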

The demand for A/B testing has led to a wide selection of tools designed to make the testing process as smooth as possible. However, not all testing tools work the same way, and most are aimed at executive customers (with executive prices). 

We conducted a survey of the 26 best A/B testing tools on the market in 2019. Very few of the tools available combined a user-friendly interface with the flexibility required for effective testing. 

The wide selection of A/B testing software makes it difficult to decide which solution is best for your business. These ten questions provide a guide to the most important factors to consider:

  1. What is the skill level of my team?
  2. What technical resources will I need to use this software?
  3. What skills are required to use this A/B testing solution?
  4. What level of support is provided with this solution?
  5. How much volume does this software require to perform tests? 
  6. How long will we have to set aside for testing? And how often?
  7. Will the software increase my site’s loading time?
  8. Would I be better-off recruiting a CRO agency? 
  9. How much will A/B testing cost me over a 12 month period?
  10. What other tools will be needed to complete the testing?

By working through this list, you will be able to choose the most appropriate solution for your organisation. A/B testing software must provide a user-friendly and efficient service, whilst in no way burdening your team.

Ideas and Advice

Choosing an A/B Testing tool is relatively simple. It is far more difficult to use that tool effectively. A common complaint among marketing executives during their first experience of running tests is the ‘Blank Page Effect’.

My analytics tell me I should optimise this page, but what should I test?

Convertize has put together an extensive library of Neuromarketing principles and A/B Testing tactics. These ideas can be applied to any business website.

Some of the tactics are specific to particular types of page (such as the pricing-page tactic #29 “select a price with the smallest number of letters”). Others relate to general marketing principles (such as #9 “appeal to Loss Aversion with a limited-time offer”).

We have organised our A/B Testing ideas according to:

  • Website type
  • Page type
  • Psychological Principles

We have identified 13 classic A/B Testing mistakes and compiled a guide to avoiding them.


There is no standard time for an A/B test because a test is only considered reliable when the results are significant. Until then, it is dangerous to draw any conclusions (even from seemingly clear data). Statisticians are wary of a phenomenon called Regression to the Mean.

  • Regression to the mean – When seemingly clear results become less pronounced as the sample size increases. If the variation between A and B appears significant to begin with, but regresses to a more moderate difference, then the initial results were probably the result of outlying phenomena. In this case, the variation will become less pronounced the longer your test continues.

In order to reach significance, your test requires sufficient Statistical Power. This is determined by the Effect Size and your Sample Size.

  • Sample Size – Before starting an A/B test, you must calculate the sample size needed. Your sample is composed of visitors to your website, so your test duration is directly related to the amount of traffic your site receives. In some cases it might be sensible not to run an A/B test because the volume of traffic available on the site (or the page tested) is not high enough.
  • Effect Size – This is the change caused by your variable. In the case of A/B testing it is measured in terms of conversion rate. A dramatic Uplift in conversions on Version B of your page would constitute a large Effect. The bigger the difference between versions, the more likely you are to reach statistical significance.
  • Statistical power – The chance that your experiment will detect an effect, if the effect exists. There are two significant factors that determine the statistical power of your test: the magnitude of the effect your test creates and the number of visitors your site receives.
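The relationship between these three factors can be sketched in a few lines of Python. The function below uses a standard textbook approximation for comparing two conversion rates; it is not necessarily the formula used by any particular calculator or testing tool, and the default thresholds (5% significance, 80% power) are conventional assumptions rather than recommendations from this guide. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_version(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate visitors needed per version to detect the given change."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)
    mean_rate = (baseline_rate + expected_rate) / 2
    spread = (
        z_alpha * sqrt(2 * mean_rate * (1 - mean_rate))
        + z_power * sqrt(baseline_rate * (1 - baseline_rate)
                         + expected_rate * (1 - expected_rate))
    )
    return ceil(spread ** 2 / (expected_rate - baseline_rate) ** 2)


def estimated_days(per_version, daily_visitors, versions=2):
    """Rough test duration, assuming traffic is split evenly between versions."""
    return ceil(per_version * versions / daily_visitors)


if __name__ == "__main__":
    # Hypothetical example: a 10% conversion rate that you hope to lift to 12%.
    needed = visitors_per_version(0.10, 0.12)
    print(needed, "visitors per version")
    print(estimated_days(needed, daily_visitors=500), "days at 500 visitors per day")
```

Halving the expected uplift roughly quadruples the sample needed, which is why small changes on low-traffic pages can take impractically long to test.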

These factors combine to give your test a degree of Representativeness (or statistical significance). This is a measure of how confident you can be that your results demonstrate a real effect rather than chance variation.

Calculating statistical significance is an important step in any experiment. There are two main approaches to calculating significance: Bayesian and Frequentist.
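To show how the two approaches differ in practice, the Python sketch below analyses the same results both ways. The Frequentist function is a pooled two-proportion z-test, which answers “how surprising would this difference be if there were no real effect?”. The Bayesian function estimates the probability that Version B truly converts better than Version A. Both are common textbook choices rather than the method used by any specific tool, and the uniform prior, the number of random draws and the example figures are our own assumptions.

```python
import random
from statistics import NormalDist


def frequentist_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no variation in the data, so nothing to test
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bayesian_prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate B > rate A) by sampling from Beta posteriors.

    Assumes a uniform Beta(1, 1) prior on each conversion rate.
    """
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        > rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(draws)
    )
    return wins / draws


if __name__ == "__main__":
    # Hypothetical results: A converts 100 of 1000 visitors, B converts 130 of 1000.
    print("p-value:", round(frequentist_p_value(100, 1000, 130, 1000), 4))
    print("P(B beats A):", round(bayesian_prob_b_beats_a(100, 1000, 130, 1000), 3))
```

The two numbers are not interchangeable: a low p-value and a high probability that B beats A often point the same way, but they answer different questions.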

Many online tools allow you to calculate the significance of your A/B Test:

Convertize has developed a new method for analysing the significance of your A/B Test: the Hybrid Method. In this way, you can arrive at reliable A/B test results more quickly. The principles underpinning the Hybrid Method are outlined in A/B Testing: The Hybrid Statistical Approach.


by Philippe Aimé

Philippe is the CEO of Convertize. He is based in London. Philippe created his first website in 1998 and worked for many years as a Cost Optimisation Consultant for several important groups. He now heads a team of Conversion Optimisation consultants.