A/B Testing Guide 2019
When running an A/B test, two versions of a webpage operate simultaneously and traffic is divided between them. The way visitors respond tells web-marketers which version of the page is best.
To help guide you through the technical details and industry jargon, we have created a series of comprehensive guides to digital marketing and web optimization. This is the Convertize A/B Testing Guide for 2019.
A/B Testing Guide Contents
What is A/B Testing?
- A/B Testing in 2019
- A history of A/B Testing
- How does A/B Testing work?
- A/B/n vs Multivariate Testing – what’s the difference?
A/B Testing Risks and Benefits
- Why is A/B testing so popular?
- What should you test?
- Does A/B Testing affect SEO?
- Will A/B Testing slow down my website?
How to do A/B Testing
- A/B Testing: A five-step plan
- What are the best A/B Testing tools for 2019?
- How do I choose the best A/B testing solution for me?
- Common Mistakes in A/B Testing
- How long should you run an A/B test for?
- How do you calculate statistical significance?
Conclusion, Ideas and Examples
We have divided our 2019 guide into four sections. The first section, ‘What is A/B Testing?’, provides definitions, explanations and background. The second section, ‘Risks and Benefits’, addresses common concerns and misconceptions. Section three, ‘How to do A/B Testing’, gives a five-step plan for performing your own A/B tests. It also describes the best tools available for different A/B testing needs. The Conclusion contains hints, tips and advice for getting the most out of your tests.
With the rising cost of traditional advertising, traffic acquisition has become an expensive activity. Making the most out of existing traffic is an effective alternative. However, CEOs and Marketing Directors want to make informed decisions about how to increase their conversion rates and the only way to get reliable data on website performance is through testing.
A/B Testing involves two versions of a single webpage. Version A is the currently used version (the ‘Control’), while Version B is the modified page (the ‘Treatment’). By running both pages simultaneously, their performance data can be easily compared.
The modification of Version A should be based on a Hypothesis about how visitors use your website. This might relate to the design, the structure, or the content. By comparing the two versions (the Treatment and the Control) you can either prove or disprove your Hypothesis.
Performing numerous A/B tests is the best way to gain a real understanding of how a webpage’s design affects its performance. For large eCommerce companies, the process is continuous and involves many versions of each page.
Testing variables in a real-world situation is not new; some of the most familiar scientific ideas have been established by analysing the effect of variables through statistics. “Statistical hypothesis testing,” in which the significance of a test is measured, was formalised in the early twentieth century by statisticians. Applied scientists such as Ronald Fisher and Jerzy Neyman established theories about Statistical Significance based on the Null Hypothesis.
In the world of advertising, early copywriters such as Claude Hopkins experimented with ways of testing public engagement. Hopkins used promotional coupons, introduced with different advertising copy, to measure the impact of competing versions. He described his techniques for testing copy in his Scientific Advertising (1923).
Since the turn of the century, A/B testing has been used as a key resource for software providers, eCommerce platforms and internet services. Writing in the Harvard Business Review, Ron Kohavi (Microsoft engineer) and Stefan Thomke (Business Administration Professor) noted the following examples:
- In the year 2000, engineers working for Google ran a test to find the optimum number of results to display on a search engine results page. The answer (10 results per page) has remained consistent ever since.
- In the early 2000s, Amazon discovered that moving credit card offers from its homepage to its shopping cart pages boosted revenue by tens of millions of dollars annually.
- In 2009 an employee for Microsoft suggested a new way of opening links from the MSN homepage. 900 000 UK users were involved in the test, which directed browsers via a new tab, so that their session was not interrupted. The engagement of users (measured according to the number of clicks on the MSN homepage) increased by 8.9% and the practice has become a standard homepage tactic.
- Similarly, in 2012, an engineer for Bing wrote a simple test for comparing two ways of displaying Ad headlines. The winning variant resulted in a 12% increase in revenue.
Today, companies such as Amazon and Booking.com each run over 10 000 A/B tests annually. Despite the difficulty of obtaining reliable data, and the relatively minor impact of most modifications, A/B Testing is the most reliable way to improve website performance.
In order to perform effective tests, website editors need three things:
- A hypothesis
- A way of editing their site
- A way of analysing their results
A website editor’s hypothesis is simply their idea for changing one element of a webpage in order to improve its performance. This might be the location of a call-to-action, the layout of a page, or even the colour of an add-to-cart button.
A/B Testing software monitors and records the effect of the change on visitors’ behaviour. The software divides traffic between the ‘treatment’ and the ‘control’ and measures the different responses. The most sophisticated tools use algorithms to send more visitors to the best-performing version of a page. That way, businesses don’t lose out on customers whilst the test is running.
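As an illustration of how traffic might be divided, the sketch below assigns each visitor to a version by hashing a visitor identifier. This is only one possible approach, not a description of how any particular tool works; the function name `assign_variant`, the experiment name and the visitor IDs are all hypothetical.

```python
import hashlib


def assign_variant(visitor_id, experiment, variants=("control", "treatment")):
    """Map a visitor to one version of the page, with equal odds for each version."""
    # Hashing the experiment name together with the visitor ID means the same
    # visitor always sees the same version, without storing any state.
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# Hypothetical visitor and experiment names
print(assign_variant("visitor-42", "cta-colour"))
```

Because the assignment depends only on the identifier, a returning visitor is shown the same version on every visit, which keeps the two groups cleanly separated.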
Once the website has received enough visits, the editor will end their experiment. However, there is another important step to make before the changes can be made permanent. Analysing the statistical significance of the experimental data is a crucial phase in the A/B Testing process.
A/B Testing involves a single variable, with two versions of a page. Testing more than two versions of a page simultaneously is known as A/B/n Testing. Supposing a third version (X) is added, the page versions tested would be: A, B, X. The “n” reflects the fact that this can involve any number of versions (depending on the available traffic).
Multivariate testing works the same way as A/B testing, but tests more than one variable at a time, both separately and in combination. This gives information on how each individual Treatment works and how the changes work together. Supposing a second variable (X) is added to a test, the versions tested would be: A, B, A-X, B-X.
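The way versions multiply in a multivariate test can be sketched in a few lines. The example below is a minimal illustration; the factor names (`page`, `extra_element`) are made up for the purpose.

```python
from itertools import product


def multivariate_versions(factors):
    """List every combination of the options given for each variable."""
    names = list(factors)
    combos = product(*(factors[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Two variables with two options each give four versions: A, A-X, B, B-X
versions = multivariate_versions({
    "page": ["A", "B"],
    "extra_element": ["absent", "X"],
})
for version in versions:
    print(version)
```

Each added variable multiplies the number of versions, so the traffic needed for a multivariate test grows quickly.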
Split testing is the same as A/B testing, except the two pages, A and B, are assigned their own URLs. This makes the loading speed of the pages faster, and allows for more extensive changes. However, it is also a more complicated procedure.
A/B Testing has one major advantage over alternative ways of designing and redesigning a website: It is based on evidence! Whilst UX design, best-practice guidelines and customer journey analysis can provide hints and suggestions, A/B testing offers certainty.
CEOs and Marketing Directors prefer to base their decisions on data. Split Testing tells them how to achieve their goals:
- E-commerce websites use it to strengthen their conversion funnel
- SaaS websites use it to improve their home page and enhance their sign-up process
- Lead generation websites use it to optimise their landing pages.
The same process is also used to help redesign websites. In 2017, for example, British Airways launched a new website. However, before releasing the new design, they trialled new versions of each webpage with A/B Testing software. By the time the finished website was published, each page had been tested over several months and thousands of visitors.
Any element of a webpage can be tested by comparing the Control with a Treatment. Common features selected for testing include:
- Titles – These are the first and most direct form of communication you will have with your visitor
- Product Pages – The most beautiful shop in the world won’t make an ugly jumper sell
- Copy – Tell a story, make a pitch, tell your visitor how you feel … words matter
- Prices – You can’t offer different prices to different customers, but you can change how they appear
- Images – The way a viewer will relate to an image is impossible to predict, so data is essential
- Colours – Google is famous for testing the colours on its website
- Forms – Form completion rates vary far more than conversion rates, so optimization is key
Advanced tools allow you to test more complex elements, such as your site structure.
Google clarified its position regarding A/B Testing in an article published on its blog. The important points to remember are:
- Use Canonical Content Tags: Search engines find it difficult to rank content when it appears in two places (“duplicate content”). As a result, web crawlers penalise duplicate content and reduce its SERP ranking. When two URLs displaying alternate versions of a page are live (during A/B tests, for example) it is important to specify which of them should be ranked. This is done by attaching a rel=canonical tag to the alternative (“B”) version of your page, directing web crawlers to your preferred version.
- Do Not Use Cloaking: In order to avoid penalties for duplicate content, some early A/B testers resorted to blocking Google’s site crawlers on one version of a page. However, this technique can lead to SEO penalties. Showing one version of content to humans and another to Google’s site indexers is against Google’s rules. It is important not to exclude Googlebot (by editing your site’s robots.txt file) whilst conducting A/B tests.
- Use 302 redirects: Redirecting traffic is central to A/B testing. However, a 301 redirect can trick Google into thinking that an “A” page is old content. In order to avoid this, traffic should be redirected using a 302 redirect (which indicates a temporary redirect).
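To make the redirect advice concrete, here is a minimal sketch of the status code and header a server might send when moving a visitor to a test page. The helper `variant_redirect` and the example.com URLs are hypothetical; the status codes come from Python’s standard `http.HTTPStatus`.

```python
from http import HTTPStatus


def variant_redirect(variant, variant_urls):
    """Build the status code and headers for a temporary redirect to a test page."""
    # HTTPStatus.FOUND is 302 (temporary); HTTPStatus.MOVED_PERMANENTLY would be 301.
    return int(HTTPStatus.FOUND), {"Location": variant_urls[variant]}


# Hypothetical URLs for the original page and its variation
urls = {
    "A": "https://www.example.com/pricing",
    "B": "https://www.example.com/pricing-b",
}
status, headers = variant_redirect("B", urls)
print(status, headers["Location"])
```

In practice only visitors assigned to the variation would be redirected; visitors assigned to the original page simply stay where they are.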
A/B testing software can reduce loading speed due to the way in which it hosts competing versions of a page. An A/B testing tool can create test scenarios in two ways:
- From the client’s side (front-end)
- Using server-side scripts.
Server-side A/B Testing is faster and more secure. However, it is also more expensive and complicated to implement.
Most A/B testing software operates on a Client-side basis. This is to make editing a site as easy as possible. In order to reduce the impact of testing on a page, the best A/B Testing solutions have found ways to speed up page loading.
A/B Testing procedures vary between agencies. However, most combine a few key features. This is the five-step plan we use at Convertize:
At the end of each testing cycle, it is important to revisit and evaluate the whole process.
Before you’ve even begun to think about what version B might look like, version A needs a thorough examination. The key here is data. Google Analytics is an indispensable tool for web marketers, as it tells you how visitors are using your site. By examining your visitors’ behaviour, and identifying weaknesses in your Conversion Funnel, you will be confident about What needs to be optimized. To understand the Why and the How, other tools are necessary.
Free online software allows you to transform mouse-tracking data into visual heatmaps. Scrollmaps tell you how far down a page visitors get before leaving. These tools can be combined with customer segmentation and exit surveys to show you how visitors experience your site. But you’re not done yet! In order to change how people experience your site, you need to think about consumer psychology. The final part of the research stage is a comprehensive heuristic analysis.
This is the fun part! You have to put your neck on the line and find a way to change Version A in order to achieve a particular goal. So, what are you going to change? Your call-to-action buttons, your copy, the colours or structure of your sections? Does your page need a facelift or full surgery? Anyone can create a hypothesis, but a good hypothesis takes careful thought.
It is important to be precise about the parameters of your experiment. Before launching a test, you will need to decide on:
- Your goal – In order for an A/B testing platform to compare the number of conversions resulting from version A or B, you need to specify an action that it can register. Most commonly this is the URL of a “thank you” page (following a purchase), as it guarantees that you will only receive data from completed purchases. Occasionally, you might want to set a different kind of goal. Clicking on a particular CTA or visiting another page on your site might be the best goal for you to measure.
- Which pages you want to target – Testing each product page one by one would take forever, but targeting the wrong pages will obscure your results. Your targeting is usually defined by the URLs on which your changes will apply. Like with online searches, you can define the limits of your test in terms of: “URL Contains”, “URL Ends With” or “URL Equals.”
- How you want your traffic to be divided – Some software features a “Multi-Armed Bandit” algorithm that directs the majority of your traffic towards the best performing version of your webpage. This has two advantages. Firstly, it can help to reduce the time taken to achieve significant results. Secondly, it means conversions are not lost by sending valuable traffic to a less-optimal page. However, if your A/B testing tool does not provide this sort of algorithm, you will need to think about how your traffic should be allocated.
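To give a feel for how a Multi-Armed Bandit shifts traffic, the sketch below uses Thompson sampling, which is one common bandit strategy. It is an assumption for illustration, not necessarily the algorithm used by any given tool, and the class name, function name and conversion rates are all made up.

```python
import random


class ThompsonSamplingBandit:
    """Keeps a Beta posterior per page version and picks versions by sampling."""

    def __init__(self, variants, seed=None):
        self.rng = random.Random(seed)
        # [conversions, non-conversions] observed so far for each version
        self.counts = {name: [0, 0] for name in variants}

    def choose(self):
        # Draw a plausible conversion rate for each version, then show the best draw
        draws = {
            name: self.rng.betavariate(1 + wins, 1 + losses)
            for name, (wins, losses) in self.counts.items()
        }
        return max(draws, key=draws.get)

    def record(self, variant, converted):
        self.counts[variant][0 if converted else 1] += 1

    def visits(self, variant):
        return sum(self.counts[variant])


def simulate(true_rates, visitors, seed=0):
    """Run the bandit against made-up conversion rates; return visits per version."""
    bandit = ThompsonSamplingBandit(true_rates, seed=seed)
    world = random.Random(seed + 1)
    for _ in range(visitors):
        variant = bandit.choose()
        bandit.record(variant, world.random() < true_rates[variant])
    return {name: bandit.visits(name) for name in true_rates}


# Hypothetical rates: version B converts better, so it should attract more traffic
print(simulate({"A": 0.05, "B": 0.15}, visitors=5000))
```

Early on, both versions receive traffic because little is known about either. As conversions accumulate, the better-performing version wins more of the random draws and so receives a growing share of visitors.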
Running an experiment is like sitting in the passenger seat. No matter how much you want to tweak, adjust and alter the process, you have to let the driver take control. However, there is one important decision you have to make: when to end the test. Before starting your test you should decide on the level of “Confidence” you will have to reach before concluding the experiment. 95% confidence is the industry benchmark, but experienced testers might decide to end a test early in certain circumstances.
If there is no variation between A and B, arriving at statistical significance will take a long time. It might be better to try a more substantial “Treatment.” If A is significantly outperforming B, it might be worth cutting your losses and trying something else.
Statistical significance is the foundation for drawing your conclusions. Even so, a significant uplift for version B may not lead you to make sweeping changes to your site. For example, version B might lead visitors to make a purchase more frequently, but it might also lead visitors to make a less profitable kind of purchase. It might reduce the number of returning customers, or cause other unanticipated problems. It may be that you only showed version B to a segment of your customers. In that case, the next step would be to try it on the other segments.
Our expert articles provide a wide range of insights and practical tips on running your own A/B tests.
One recurring piece of advice is known as the “no-peeking” rule. When analysing the value of a hypothesis, marketers sometimes view their data before the full sample size is reached. It’s easily done; we all want results as soon as possible! The problem with this is that statistical significance within a sample does not necessarily make your results representative. It is all too easy to jump to conclusions!
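The risk of peeking can be illustrated with a small simulation. The sketch below runs repeated A/A tests (both versions have the same made-up conversion rate, so any “significant” result is a false alarm) and compares checking the data at regular intervals with checking only once at the end. The function names and default numbers are assumptions chosen for illustration, and the significance check is a standard two-proportion z-test.

```python
import random
from statistics import NormalDist


def p_value(conv_a, conv_b, n):
    """Two-sided p-value for a difference in conversion rate, n visitors per version."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = (pooled * (1 - pooled) * 2 / n) ** 0.5
    if se == 0:
        return 1.0  # no conversions yet (or all converted): nothing to compare
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_peeking(runs=300, visitors=2000, peek_every=100, rate=0.1, alpha=0.05, seed=1):
    """Return (false-alarm rate with peeking, false-alarm rate at the planned end)."""
    rng = random.Random(seed)
    stopped_early = 0
    significant_at_end = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        peeked_significant = False
        for n in range(1, visitors + 1):
            conv_a += int(rng.random() < rate)
            conv_b += int(rng.random() < rate)
            # A "peek": would we have declared a winner at this point?
            if n % peek_every == 0 and p_value(conv_a, conv_b, n) < alpha:
                peeked_significant = True
        final_significant = p_value(conv_a, conv_b, visitors) < alpha
        stopped_early += int(peeked_significant or final_significant)
        significant_at_end += int(final_significant)
    return stopped_early / runs, significant_at_end / runs


with_peeking, at_end_only = simulate_peeking()
print(f"False alarms with peeking: {with_peeking:.0%}, at the end only: {at_end_only:.0%}")
```

Every extra look at the data is another chance for random noise to cross the significance threshold, so the false-alarm rate with peeking can never be lower than the rate from a single check at the planned sample size.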
The demand for A/B testing has led to a wide selection of tools designed to make the testing process as smooth as possible. However, not all testing tools work the same way, and most are aimed at executive customers (with executive prices).
We recently conducted a survey of the 26 best A/B testing tools on the market in 2019. Very few of the tools available combined a user-friendly interface with the flexibility required for effective testing.
These are 10 of the most popular A/B testing tools available in 2019:
- Convertize – A tool designed with marketing professionals and mid-sized companies in mind. It is a robust system and comes with expert support. The software offers intuitive editing, unique speed and safety features, and a library of neuromarketing tactics.
- Optimizely – An executive tool for major eCommerce platforms. It provides advanced testing options. Optimizely supports multivariate testing and was built to manage large traffic volume.
- VWO (Visual Website Optimizer) – Another popular tool with marketing agencies and large multi-national companies. Along with A/B testing, customers have access to a full suite of additional analytics (such as heatmaps).
- AB Tasty – Originally built for medium-sized enterprises, the tool has been repositioned as an executive solution. AB Tasty now specialises in re-marketing features.
- Google Optimize – Google’s free A/B testing system. Optimize 360 provides a paid-for service that can test up to 10 variations of a page. The tool can be integrated with Google Analytics.
- Kameleoon – An expensive option aimed at medium-sized companies. It is based around the use of AI and machine learning and has a focus on personalisation.
- Convert – Positioned as the alternative to Optimizely. The tool has been rebranded for an executive audience but is nonetheless cheaper than its famous rival. It is one of the few executive tools to offer expert support.
- Omniconvert – Targeted at small and medium enterprises. It is built to integrate with customer segmentation and personalisation. The software offers affordable multivariate testing.
- Adobe Target – One of the oldest A/B testing solutions, Omniture Test & Target, was absorbed into Adobe’s marketing suite. Adobe provides an executive service for major businesses (HSBC, for example). It is the most expensive, and most comprehensive, A/B testing package.
- FreshMarketer – Zarget was purchased by Freshworks in 2017 and renamed. The tool runs within a Chrome extension and has an impressive list of features for such an affordable solution.
The wide selection of A/B testing software makes it difficult to decide which solution is best for your business. These ten questions provide a guide to the most important factors to consider:
- What is the skill level of my team?
- What technical resources will I need to use this software?
- What skills are required to use this A/B testing solution?
- What level of support is provided with this solution?
- How much volume does this software require to perform tests?
- How long will we have to set aside for testing? And how often?
- Will the software increase my site’s loading time?
- Would I be better-off recruiting a CRO agency?
- How much will A/B testing cost me over a 12 month period?
- What other tools will be needed to complete the testing?
By working through this list, you will be able to choose the most appropriate solution for your organisation. A/B testing software must provide a user-friendly and efficient service, whilst in no way burdening your team.
We have identified 13 classic A/B Testing mistakes and compiled a guide to avoiding them. To run an A/B test effectively, three things must be present: 1) Traffic 2) Time 3) Uplift
Traffic and time are essential for any test, as they are the only way to produce statistically significant results. The size of your uplift (the increase in conversion rate between version A and B) is also important. A more dramatic effect, when witnessed in a large sample size, gives your experiments greater statistical power.
- Error 1: You are testing the wrong things
- Error 2: You do not have a methodology
- Error 3: You do not prioritize your tests
- Error 4: You are not using the right tool
- Error 5: Your A/B Testing subscription is not appropriate
- Error 6: You are not optimizing for the right KPIs
- Error 7: You are ignoring mobile traffic
- Error 8: You are ignoring your current (and returning) customers
- Error 9: You are doing too much
- Error 10: You are managing A/B tests in parallel
- Error 11: The results of your A/B tests are not valid
- Error 12: You stop your test too early
- Error 13: You should not do an A/B test
Most A/B testing mistakes (for example, errors 3, 4, 5, 9, 11, 12 and 13) come from trying to complete an experiment without one of the three things that must always be present.
There is no standard time for an A/B test because a test is only considered reliable when the results are significant. Until then, it is dangerous to draw any conclusions (even from seemingly clear data). Statisticians are wary of a phenomenon called Regression to the Mean.
- Regression to the mean – When seemingly clear results become less pronounced as the sample size increases. If the variation between A and B appears significant to begin with, but regresses to a more moderate difference, then the initial results were probably the result of outlying phenomena. In this case, the variation will become less pronounced the longer your test continues.
In order to reach significance, your test requires sufficient Statistical Power. This is determined by the Effect Size and your Sample Size.
- Sample Size – Before starting an A/B test, you must calculate the sample size needed. Your sample is composed of visitors to your website, so your test duration is directly related to the amount of traffic your site receives. In some cases it might be sensible not to run an A/B test because the volume of traffic available on the site (or the page tested) is not high enough.
- Effect Size – This is the change caused by your variable. In the case of A/B testing it is measured in terms of conversion rate. A dramatic Uplift in conversions on Version B of your page would constitute a large Effect. The bigger the difference between versions, the more likely you are to reach statistical significance.
- Statistical power – The chance that your experiment will detect an effect, if the effect exists. There are two significant factors that determine the statistical power of your test: the magnitude of the effect your test creates and the number of visitors your site receives.
These factors combine to give your test a degree of Representativeness (or, statistical significance). This is the likelihood that your results demonstrate a real effect.
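The relationship between sample size, effect size and statistical power can be sketched with a common textbook approximation for comparing two conversion rates. The function below is an illustration only: the name `sample_size_per_variant`, the default 95% confidence and 80% power, and the example figures are assumptions, and dedicated calculators may use slightly different formulas.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_uplift, alpha=0.05, power=0.8):
    """Approximate visitors needed per version to detect a given relative uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical page converting at 5%, hoping to detect a 20% relative uplift
visitors_needed = sample_size_per_variant(0.05, 0.20)
print(visitors_needed)
```

Dividing the result by the daily traffic each version receives gives a rough test duration. Because the effect size is squared in the denominator, halving the uplift you hope to detect roughly quadruples the traffic required.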
Calculating statistical significance is an important step in any experiment. There are two main approaches to calculating significance: Bayesian and Frequentist. They are not simply alternative methods, but actually reflect different interpretations of probability.
The Frequentist approach examines the number of times an event occurs in a volume of tests. The result is a statement only about frequency in a given sample.
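A typical Frequentist calculation for an A/B test is a two-proportion z-test. The sketch below is one standard way of doing it, not necessarily the method used by any specific tool; the function name and the visitor and conversion counts are invented for the example.

```python
from statistics import NormalDist


def two_proportion_z_test(visitors_a, conversions_a, visitors_b, conversions_b):
    """Return (relative uplift of B over A, z-score, two-sided p-value)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate: the conversion rate if A and B were really the same page
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = (pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    relative_uplift = (rate_b - rate_a) / rate_a
    return relative_uplift, z, p_value


# Hypothetical results: 10 000 visitors per version, 500 vs 580 conversions
uplift, z, p = two_proportion_z_test(10000, 500, 10000, 580)
print(f"uplift={uplift:.1%}, z={z:.2f}, p={p:.4f}")
```

A p-value below 0.05 corresponds to the 95% confidence level commonly used as a benchmark.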
The Bayesian approach starts with an estimate of a real-world effect and updates this as data is accumulated. The result is a new estimate of the real-world effect and a number describing how much it can be trusted.
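A simple Bayesian calculation can be sketched by simulation. The example below assumes a uniform Beta(1, 1) starting estimate for each version’s conversion rate and estimates the probability that B’s true rate is higher than A’s. The function name, the prior and the figures are assumptions for illustration.

```python
import random


def probability_b_beats_a(visitors_a, conversions_a, visitors_b, conversions_b,
                          draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling from each version's Beta posterior."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior = uniform prior updated with observed conversions and non-conversions
        sample_a = rng.betavariate(1 + conversions_a, 1 + visitors_a - conversions_a)
        sample_b = rng.betavariate(1 + conversions_b, 1 + visitors_b - conversions_b)
        wins += int(sample_b > sample_a)
    return wins / draws


# Hypothetical results: 10 000 visitors per version, 500 vs 580 conversions
chance = probability_b_beats_a(10000, 500, 10000, 580)
print(f"Probability that B beats A: {chance:.1%}")
```

Unlike a p-value, this number is read directly as the estimated chance that the variation is genuinely better, given the data and the starting assumption.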
When calculating the statistical significance of an A/B test, both approaches contribute important information. A/B testing software often combines the two approaches in a single statistics package. Using your experimental data (the number of visitors to A, the number of visitors to B, the number of conversions from A, the number of conversions from B) the software will tell you the relative uplift observed between A and B and the likelihood that this is a result of the changes you have made.
Many online tools allow you to calculate the significance of your A/B test manually. Convertize also offers an interactive significance calculator.
A common complaint among marketing executives during their first experience of running tests is the ‘Blank Page Effect.’
My analytics tell me I should optimise this page, but what should I test?
Some of the tactics are specific to particular types of page (such as the pricing-page tactic #29, “select a price with the smallest number of letters”). Others relate to general marketing principles (such as #9, “appeal to Loss Aversion with a limited-time offer”).
We have organised our A/B Testing ideas according to website type and page type:
Providing your site has enough traffic, A/B testing is an essential technique for any eCommerce business’s marketing team. Not only will it provide unexpected insights about your customers and your site, it will allow you to market your business with certainty. For websites with lower traffic volumes, there are alternative ways to optimize your conversion rates. For example, you might begin by exploring our 2019 guides to CRO and neuromarketing: