Martin Betz

AB testing 101

Martin Betz • September 18, 2020

AB testing is easy to understand and hard to master. The concept is simple: form a hypothesis about something in your system and test your current implementation against a treatment. But it is easy to get lost in a sea of definitions, poorly set-up tests and hard-to-interpret results. Such tests will not give you a clear winner, only numbers you cannot make sense of. Let's walk through an example together, and I will tell you the bare minimum you need to know about AB testing as a product or tech person. If you are a data scientist, you will probably not learn anything new.

Contacts: Example with all definitions

Take this example of two contact cards. Each card has one call-to-action: "Contact". The original version uses a generic avatar, and we call the original "control". Our hypothesis is that replacing the generic avatar with a real profile picture will increase the share of people who click the "Contact" button, a rate often called conversion. The version with the real picture is called "treatment".
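
To make these definitions concrete, here is a minimal Python sketch that models each version as a "variant" and computes its conversion rate. The `Variant` class and all visitor and click counts are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """One version of the contact card in the test (hypothetical model)."""
    name: str
    visitors: int  # people who saw this version
    clicks: int    # people who clicked "Contact"

    @property
    def conversion_rate(self) -> float:
        # Conversion rate = share of visitors who clicked the call-to-action.
        return self.clicks / self.visitors if self.visitors else 0.0


# Hypothetical numbers, for illustration only.
control = Variant("control (generic avatar)", visitors=7000, clicks=350)
treatment = Variant("treatment (real photo)", visitors=3000, clicks=180)

for v in (control, treatment):
    print(f"{v.name}: {v.conversion_rate:.1%}")
```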

We now split the traffic on our website into two groups: say, 70% see the control and 30% see the treatment. If a higher share of visitors click "Contact" on the treatment, we choose it and replace the control implementation. This is what we mean by better conversion.
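
A common way to implement such a split is to hash a stable visitor ID into a bucket, so that a returning visitor always sees the same version. The sketch below assumes you have such an ID (for example from a first-party cookie); the function name `assign_variant` and the experiment name are hypothetical, and testing tools normally handle this assignment for you.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str = "contact-card-photo",
                   treatment_share: float = 0.30) -> str:
    """Deterministically assign a visitor to 'control' or 'treatment'.

    Hashing the experiment name together with the visitor ID gives each
    visitor a stable pseudo-random number in [0, 1), so the same visitor
    always lands in the same group for this experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    # Use the first 8 hex digits (32 bits) as an integer, scaled to [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return "treatment" if bucket < treatment_share else "control"


# Rough check: over many visitors, about 30% should land in the treatment group.
groups = [assign_variant(f"visitor-{i}") for i in range(100_000)]
print(groups.count("treatment") / len(groups))
```

Including the experiment name in the hash keeps assignments independent across experiments, so a visitor in one test's treatment group is not automatically in every other test's treatment group.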

The probability that a difference at least as large as the one we observed would show up by chance alone, if the treatment made no real difference, is what significance is about: the smaller that probability, the more significant the result. How far you can trust the result depends on two things. The first is sample size (the visitors on control and the visitors on treatment are each a sample). The more people in each sample, say 70,000 in the control group and 30,000 in the treatment group, the better. The second is sample composition: both samples should contain the same mix of people. For example, it would be bad if everyone in the group of 70,000 were unfamiliar with technology and preferred offline contact, while the group of 30,000 consisted entirely of developers who like to do everything online. Assigning visitors to the groups at random is what keeps the mix comparable. In summary, larger samples and comparable groups make your result more reliable.
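
How much sample size matters can be estimated up front with the textbook normal-approximation formula for comparing two proportions. The sketch below is not what any particular tool computes; the baseline rate, the target rate, the 5% significance level, the 80% power and the function name `sample_sizes` are all assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_sizes(p_control: float, p_treatment: float, alpha: float = 0.05,
                 power: float = 0.80, ratio: float = 1.0) -> tuple[int, int]:
    """Visitors needed in (control, treatment) for a two-sided two-proportion test.

    ratio = treatment size / control size, e.g. 30 / 70 for a 70/30 split.
    Uses the normal approximation, so the result is an estimate, not a guarantee.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    # Pooled rate under the "no difference" hypothesis, weighted by group size.
    p_pooled = (p_control + ratio * p_treatment) / (1 + ratio)
    diff = abs(p_treatment - p_control)
    term_null = z_alpha * math.sqrt(p_pooled * (1 - p_pooled) * (1 + 1 / ratio))
    term_alt = z_beta * math.sqrt(p_control * (1 - p_control)
                                  + p_treatment * (1 - p_treatment) / ratio)
    n_control = math.ceil(((term_null + term_alt) / diff) ** 2)
    return n_control, math.ceil(ratio * n_control)


# Hypothetical goal: detect a lift from a 5% to a 6% click rate with a 70/30 split.
print(sample_sizes(0.05, 0.06, ratio=30 / 70))
# A smaller hypothetical lift, from 5% to 5.5%, needs far more visitors.
print(sample_sizes(0.05, 0.055, ratio=30 / 70))
```

Because the required sample grows with one over the squared difference, halving the lift you want to detect roughly quadruples the number of visitors you need.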

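Significance is usually reported as a confidence level, such as 95%. That is the common threshold: if the treatment had no real effect, a difference as large as the one you observed would show up less than 5% of the time. It makes a chance result unlikely, not impossible.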
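
One common textbook way to run this check is a two-sided two-proportion z-test, sketched below. It returns a p-value, and at the 95% level you call the result significant when the p-value is below 0.05. The function name and the counts are hypothetical, and testing tools may use other methods (some report Bayesian probabilities instead).

```python
import math


def two_proportion_z_test(clicks_a: int, visitors_a: int,
                          clicks_b: int, visitors_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    # Pooled rate, assuming both groups share the same true conversion rate.
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: the same 5% vs 6% click rates at two sample sizes.
experiments = {
    "700 vs 300 visitors": (35, 700, 18, 300),
    "70,000 vs 30,000 visitors": (3_500, 70_000, 1_800, 30_000),
}
for label, counts in experiments.items():
    z, p = two_proportion_z_test(*counts)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: z = {z:.2f}, p = {p:.4f} ({verdict} at 95%)")
```

The conversion rates are identical in both runs; only the sample size changes, and with it whether the difference clears the 95% threshold.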

So what do you need to set up a test, and what do you need to take care of?

What you need for an AB test…

How to set up your test

You don't need to know a lot about statistics. Tools like Google Optimize do the heavy lifting with the numbers for you.

You only need a sample group (e.g. "All visitors on your page"), a split (70% on control, 30% on treatment) and a clear goal (e.g. clicks on the button, which trigger the Google Analytics event button_click_contact, or a goal that you defined in Google Analytics). Google Optimize then calculates the expected value of your goal (in our case the conversion rate) for each version and the probability that the treatment beats the original.
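
The "probability that the treatment beats the original" is a Bayesian quantity. The sketch below estimates it with a simple Beta-Binomial model and Monte Carlo sampling, using only Python's standard library. It illustrates the idea under a uniform-prior assumption and is not Google Optimize's actual model; the function name and the counts are hypothetical.

```python
import random


def prob_treatment_beats_control(clicks_c: int, visitors_c: int,
                                 clicks_t: int, visitors_t: int,
                                 draws: int = 100_000, seed: int = 42) -> float:
    """Estimate P(treatment rate > control rate) with a Beta-Binomial model.

    With a uniform Beta(1, 1) prior (an assumption), the posterior for a
    conversion rate after `clicks` out of `visitors` is
    Beta(1 + clicks, 1 + visitors - clicks).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        rate_c = rng.betavariate(1 + clicks_c, 1 + visitors_c - clicks_c)
        rate_t = rng.betavariate(1 + clicks_t, 1 + visitors_t - clicks_t)
        wins += rate_t > rate_c  # count draws where the treatment wins
    return wins / draws


# Hypothetical counts: 5% vs 6% click rate with a 70/30 split of 10,000 visitors.
print(f"{prob_treatment_beats_control(350, 7_000, 180, 3_000):.1%}")
```

The fixed seed only makes the estimate reproducible; more draws make it more precise.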

This is an example result from a test where the treatment beat the original:

[Image: example test result]

QA