Sunday, March 02, 2014

A/B Testing Experience Report

First I must start out with a huge apology to Lisa Crispin.  I had promised her a brief experience report on A/B testing a few weeks ago and I failed to deliver.

I thought it would be a good topic for a blog post.

First I would like to clear up some potential confusion.  Recently I have heard people refer to A/B testing as Test Driven Development.  Although I can see why the phrase is applied, it confused me, because I think of Test Driven Development, TDD, as writing your tests before your code.

So in the spirit of TDD, with A/B testing you write your experimental design first and then execute against that design.  The two certainly seem analogous, but I got confused in conversation and thought perhaps others might too.

I first learned about A/B testing in 2002.  I was with a small start-up and we were trying to follow XP practices at the time.  I recall running a couple of successful A/B tests on new features, but I really do not recall the actual mechanics.  We did leverage a BigLever Software product called Gears to rapidly establish feature sets, but I was not privy to the server mechanics, or I simply do not recall them.  The tool set did allow us to expose a customer base to feature A in a controlled fashion.

Today, in my opinion, A/B tests take a far more sophisticated approach to scientific design.  You will also hear this technique referred to as multivariate testing (MVT).  I am amazed at how much design takes place today in order to have a successful A/B test.  The key piece, in my opinion, is having robust mechanics for measurement.  Sometimes a small but statistically significant difference can make a huge difference in the success of a business.

Here are the key components to a successful A/B test:

  • Hypothesis
  • Tools for Measurement
  • Mechanism for Traffic Control of the User Experience

A/B tests can be extremely simple or extremely complex.  Most of the time the experiment is designed to evaluate user behavior in the hope of directing that behavior to improve a business result.  I think a guiding principle is to adjust only one variable at a time.
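
To make the one-variable-at-a-time principle concrete, the experimental design can be written down before any traffic is diverted.  Here is a minimal sketch in Python; the field names are my own illustration, not taken from any particular tool:

    from dataclasses import dataclass

    @dataclass
    class Experiment:
        """A minimal record of an A/B test design (illustrative field names)."""
        hypothesis: str      # what we believe and why
        variable: str        # the ONE thing that differs between variants
        control: str         # variant A, the current experience
        treatment: str       # variant B, the proposed change
        metric: str          # how success is measured
        traffic_to_b: float  # fraction of traffic diverted to variant B

    exp = Experiment(
        hypothesis="A larger product image will increase clicks by 10%",
        variable="image size",
        control="200 x 400 image",
        treatment="400 x 800 image",
        metric="clicks per day (Google Analytics)",
        traffic_to_b=0.10,
    )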

A hypothesis can come from anywhere in an organization, but in my opinion it takes a diverse team to evaluate the data and draw a meaningful conclusion that might improve the business.

Example hypotheses:
  • If we increase the button size, will more people click it?
  • If we change the checkout flow from vertical to horizontal, will the experience be better and will sales increase?
  • If we change the color from blue to green, will we retain more customers on the web page?
  • If we use a larger image, will more people buy the product?
  • If we move widget A above the fold, will customers be more likely to use it?

Example tools for measurement:

  • Google Analytics

Mechanics to control traffic flow:

  • Load balancer
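
A load balancer does the split at the infrastructure level, but the idea is easy to sketch in application code: bucket each user deterministically so they see a consistent experience on every visit.  This is purely an illustration, not how any particular load balancer works; the experiment name and the 10% split are assumptions:

    import hashlib

    def assign_variant(user_id: str, experiment: str, percent_b: int = 10) -> str:
        """Deterministically bucket a user into variant A or B.

        Hashing the user id together with the experiment name keeps each
        user's experience stable across visits and keeps separate
        experiments independent of one another.
        """
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100  # a number from 0 to 99
        return "B" if bucket < percent_b else "A"

    # Roughly 10% of users land in variant B, and a given user always
    # gets the same answer for the same experiment.
    for uid in ["alice", "bob", "carol"]:
        print(uid, assign_variant(uid, "larger-image"))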

There are many more tools, and even companies whose business model is based on multivariate testing.  However, these are the tools I am familiar with today.

The ability to conduct A/B tests depends on many variables, but here is the basic approach.

You know from Google Analytics that X clicks on image A happen per day.  You would like to increase the number of clicks by 10%.  Your hypothesis is that image size will make a difference.  So you design one web page with a 200 x 400 image and another with a 400 x 800 image, and you deploy each page to a separate web server.  In order to get a statistically significant sample you divert 10% of the web traffic to web server B, the one with the larger image.  You believe one week's worth of data will be statistically significant.  You measure the clicks per day over the course of that week to determine whether the larger image increased the number of clicks.
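
To sketch the measurement step, here is one common way to check whether the difference in click-through rate between the two pages is statistically significant: a two-proportion z-test.  The counts below are invented for illustration; real numbers would come from your measurement tool:

    import math

    def two_proportion_z_test(clicks_a, visits_a, clicks_b, visits_b):
        """Two-sided z-test for a difference in click-through rate."""
        p_a = clicks_a / visits_a
        p_b = clicks_b / visits_b
        # Pooled proportion under the null hypothesis of no difference.
        p = (clicks_a + clicks_b) / (visits_a + visits_b)
        se = math.sqrt(p * (1 - p) * (1 / visits_a + 1 / visits_b))
        z = (p_b - p_a) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
        return p_a, p_b, z, p_value

    # Hypothetical week of data: 90% of traffic on A, 10% diverted to B.
    p_a, p_b, z, p_value = two_proportion_z_test(
        clicks_a=4500, visits_a=90000,  # 5.0% click-through on the control
        clicks_b=560, visits_b=10000,   # 5.6% click-through on the variant
    )
    print(f"CTR A={p_a:.1%}  CTR B={p_b:.1%}  z={z:.2f}  p={p_value:.4f}")

A p-value below your chosen threshold (0.05 is conventional) suggests the larger image really did change behavior, rather than the difference being noise.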

Unfortunately I am not able to share specific experimental designs or results, but I think multivariate testing is an important aspect of modern rapid software development.  If you get the desired result from your experimental design, it is just a matter of adjusting a switch and your customers are on the new design.
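
In terms of the bucketing sketch above, "adjusting a switch" could be as simple as a feature flag that routes everyone to the winning design.  Again, purely illustrative:

    # Flip the flag once the experiment shows variant B is the winner.
    FEATURE_FLAGS = {"larger-image": True}

    def product_image_size() -> str:
        """Every customer now gets the winning design."""
        return "400x800" if FEATURE_FLAGS["larger-image"] else "200x400"

    print(product_image_size())  # -> 400x800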

Another cool aspect that I forgot to mention: part of your traffic is the control group, seeing the normal experience.  There is a ton of literature on the topic and I will never claim to be an expert.  But I do believe that well-thought-out tests coupled with accurate measurement can improve your business and your customer experience.
