# AB Testing Sample Size: The 4 Levels of Difficulty

We get lots of different questions from people who are interested in A/B testing, but there’s one we hear more than any other: “Can I do A/B testing with under 100,000 visitors a month?” Many of the tests run by online businesses end up being useless, and the root cause is simple: **AB testing sample size**.

Finding a way to get **Uplift** is not easy, and achieving **statistical significance** is downright difficult. That’s why you need to think about your AB testing sample size before you launch an experiment. To give you a sense of the sample size needed to run AB tests on your website, we have created a simple chart showing how many visitors your website will need to produce significant test results.

## A/B Testing & Split Testing Guide

- **What is A/B Testing?**
- **How To Calculate Your A/B Testing Sample Size**
- **A/B Testing Sample Size: The 4 Levels of Difficulty**

**1. Fear Factor zone**

**2. Thrilling zone**

**3. Exciting zone**

**4. Safe zone**

AB testing has become a major part of web-marketing. Even so, many tests are completely useless. This post will show you when AB testing actually works and when it is a waste of time and money. If you do not have enough traffic for the AB testing sample size you need, experiments are not an option.

AB testing is a type of experiment that compares two versions of the same webpage: A and B. Your visitors are split into two groups, each being shown one version. The aim is to see which version of the webpage produces the highest conversion rate.

To make sure your conclusions are **reliable**, you need to ensure that your experiments are **statistically significant**. Unreliable data could lead you to waste your marketing budget on a false positive.

### It’s important to think about:

- The **duration** of the test,
- The **confidence** level (or statistical significance),
- The “**power**” of the test.

Unless you’re a statistician, those terms might not mean much to you, but don’t worry: **carry on reading this article** and all will become clear!

There are plenty of online tools that can calculate the statistical significance of a test you **have already run**.

For most purposes, you can use this simple calculator.
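For context, such a calculator is essentially performing a two-proportion z-test behind the scenes. Here is a minimal sketch of that calculation in Python (the visitor and conversion counts below are hypothetical, purely for illustration):

```python
import math

def ab_significance(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-sided, two-proportion z-test: returns the p-value.
    A p-value below 0.05 means the difference between A and B is
    statistically significant at the 95% confidence level."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled conversion rate under the null hypothesis (no real difference)
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical test: 2.0% vs 2.4% conversion on 10,000 visitors per variation
print(round(ab_significance(10_000, 200, 10_000, 240), 3))
# ≈ 0.054: a 20% relative uplift, yet not quite significant at 95%
```

Note how a seemingly large relative uplift can still fail to reach significance at this traffic level, which is exactly why planning your sample size in advance matters.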

However, working out how many visitors will be required for a test you plan to run **in the future** is more complicated. To work out your necessary AB testing sample size, you will need a sample size calculator.

You will also need to provide 3 things:

### 1. Your Analytics Data

You need to know the **conversion rate** of the page you want to test (how many sales/sign-ups you get for each visitor) and the **number of unique weekly visitors** it receives.

Providing this information will allow you to see how many days your test will need to run for.

### 2. Your Expected “Uplift”

Next, you need to predict the increase in your conversion rate (also known as “uplift”) that you expect to see. This is the tricky part. Remember, it’s not the uplift you want – it’s the uplift you expect your changes to produce.

Conversion optimisation experts agree that it is very difficult to see an improvement of more than **10%** on your overall conversion rate (Sales/Visitors). Achieving anything beyond 10% uplift would require “**innovative**” changes to your site. More “**iterative**” changes (such as adjusting a button colour, headline or image) generally produce an increase of **less than 5%**. Therefore, in most cases, we would recommend “5%” as a realistic projected increase.

### 3. A/B Test Parameters

These are the settings you should enter into a significance calculator, if it gives you the option.

**Hypothesis**: “2-sided” (this means you are testing whether version B is *different* from A, either better *or* worse)

**Power:** 80% (the probability that your experiment will detect an uplift of the expected size, if it really exists)

**Confidence level:** 95% (this limits the probability of a false positive, i.e. declaring a winner when there is no real difference, to 5%)

By entering these values into the AB test sample size calculator, you can estimate the duration required for your test. It’s always important to verify that the test will last at least 1 sales cycle, and ideally 2.

In general, it’s best not to set a test to run for longer than 30 days, with 60 days being the absolute maximum. Running a test for longer is risky because many of the parameters will change over such a period (on average, 10% of internet users erase their cookies each month, which will contaminate your test results).

## AB Testing Sample Size: The 4 Levels of Difficulty


The graph below can help you to estimate the level of difficulty related to the AB Testing sample size required to carry out reliable **AB tests**.

In other words: what is the minimum increase in conversion rate needed to be confident that test results are reliable in relation to the number of unique monthly visitors?

We have fixed certain parameters for this graph:

- Site conversion rate: 2%
- Test duration: 30 days (you should NOT test for longer: cookie deletion will make your results unreliable)
- 1 control page (A) and 1 variation for testing (B)
- Confidence level: 95%
- Power: 80%

Visitor numbers are organised into 4 zones representing the level of difficulty: in other words, the **Uplift** (increase in conversion rate) that would be required for an accurate result.

**What Are The 4 Difficulty Levels?**

Make sure to always consider the **data of the page tested** rather than that from the site as a whole.

With less than **10,000 visitors** a month, AB testing will be very **unreliable** because it’s necessary to improve the conversion rate by more than 30% in order to have a “winning” variation.

With between **10,000 and 100,000** visitors a month, AB testing can be a real challenge, as an improvement in conversion rate of at least 9% is needed for reliable results.

With between **100,000 and 1,000,000** visitors a month, we’re entering the “Exciting” zone: it’s necessary to improve the conversion rate by between 2% and 9%, depending on the number of visitors.

Beyond **one million** visitors a month, we’re in the “Safe” zone, which allows us to carry out a number of iterative tests.

AB tests are really worthwhile for validating hypotheses, so even if your traffic is low, it’s important to find a way of testing at least certain elements.

If your traffic is less than 10,000 visitors a month, however, it’s going to be very difficult to optimise using “iterative” methods.

**Here are two things you should be doing in parallel:**

- Trying to increase your traffic!
- Working on optimising your site:
  - solidify your USPs: why visitors should buy from you rather than your competitors, and why they should do it now!
  - test functionality: does your site work well on all browsers, devices, etc.?
  - gain a better understanding of your visitors and customers (mini-surveys, user testing, etc.).

In the 10,000 to 100,000 visitors a month zone, it’s still quite difficult: iterative tests aren’t very powerful and so it becomes necessary to create “innovative” tests in order to obtain reliable results. Using **Neuroscience** and **Consumer Behavioural** techniques can help to boost a test’s performance on top of those elements mentioned above.

Micro conversions are steps that occur during your conversion funnel, for example when someone adds a product to their basket, but the final conversion is still making the sale (which is a macro conversion).

It’s pretty easy to increase the rate of micro conversions and so it can be tempting to try and base tests on these. However, you must always also measure the impact the test is having on macro conversions to have an objective understanding of the true performance of your tests.

If you optimise “too much” on one side, then it could result in a decrease in conversion on the other side: for example, if you over-optimise one page through the use of motivating messages but then the promised incentive isn’t realised on the following pages, your overall conversion rate is likely to drop. This is called the **Roberval Balance**.

**AB testing is an important technique** for validating ideas. When used alongside a coherent statistical method it can provide robust information about how to optimize a website.

However, to run tests successfully, your sample size must be sufficient.

Subscribing to an AB testing solution doesn’t guarantee results: the main objective of many vendors is to sell you a 12-month subscription, which can lead to misinformation and confusion about the results you are given. The best way to optimise your conversions is to build wider knowledge and support so you can make informed and effective changes.

**xvideos** says:

Google’s 41 shades of blue is a good example of this. In 2009, when Google could not decide which shades of blue would generate the most clicks on their search results page, they decided to test 41 shades. At a 95% confidence level, the chance of getting a false positive was 88%. If they had tested 10 shades, the chance of getting a false positive would have been 40%, 9% with 3 shades, and down to 5% with 2 shades.
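Most of those figures follow from the standard family-wise error rate formula for independent comparisons, 1 - (1 - alpha)^m. A quick sketch:

```python
def false_positive_chance(num_comparisons, alpha=0.05):
    """Chance of at least one false positive across several
    independent comparisons, each run at significance level alpha."""
    return 1 - (1 - alpha) ** num_comparisons

for m in (1, 10, 41):
    print(f"{m:>2} comparisons: {false_positive_chance(m):.0%}")
# 1 → 5%, 10 → 40%, 41 → 88%
```

This is why multi-variant tests need either far more traffic per variant or a correction (such as Bonferroni) to the significance threshold.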

**Stephen Courtney** says:

That’s a great example. It’s not easy to do AB tests well – but testing is one reason why Google is as big as… well, Google!