Multivariate Testing – Test Design

Part 2: Choosing an MVT Test Array
Also see Part 1: Introduction to Factorial Designs


The results of your tests are often predetermined by your test designs. Good designs provide measurable increases in performance and show which tested elements matter most in driving that lift. Good test designs also teach you something even when you don’t get a large lift or statistical confidence in the results. As some of you may know, designing a good multivariate test is harder than it seems, and it takes a fair amount of experience to understand the inputs and drivers involved. Hopefully sharing some of my experience will help you get better results.

Two parts of a multivariate test design need to be considered and determined:

1) The test array
2) The test elements

I always pick the array first and then select the elements based on that array. Sometimes people do this in reverse, and I’ve seen it lead to problems more often than not, usually because they don’t have the data (traffic) to support the test. Also, once you know your array, there is sometimes flexibility in the number of elements and alternatives you can use.

MVT Test Arrays
NOTE: As mentioned in Part 1 of this series, MVT arrays are created using fractional factorial or full factorial design of experiments. I will only be covering how to choose a fractional factorial array.

An array (sometimes referred to as an orthogonal array) is the sequence of elements that make up the test designs. The array is determined by how many elements you want to test and how many variations of each element. The array you choose also tells you how many designs you need to create and exactly how those pages need to be designed.

Example of an L4 MVT Array
3 Elements x 2 Variations = L4
Note: The control (the existing version) always counts as a variation.


In the example above, three new designs would need to be created in addition to the control, for a total of four. These designs are usually created by placing JavaScript around the areas of the page you decide matter most to what you are trying to improve. Reading across each row of the array shows the mix of element variations each design needs.
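As a rough sketch, the standard L4 layout can be generated and checked in a few lines of Python. The function names are hypothetical and not part of any testing tool; 0 stands for the control version of an element and 1 for the alternative. The sketch also lists the full factorial for comparison, which is where the count of 2 to the power of 3 = 8 designs comes from.

```python
from itertools import combinations, product


def l4_array():
    """Return the 4 runs of an L4 array for 3 two-variation elements.

    0 = control version of the element, 1 = alternative.
    """
    runs = []
    for a, b in product((0, 1), repeat=2):
        c = a ^ b  # third column is derived from the first two (XOR)
        runs.append((a, b, c))
    return runs


def full_factorial(n_elements):
    """Every possible combination of n two-variation elements."""
    return list(product((0, 1), repeat=n_elements))


def is_orthogonal(runs):
    """True if every pair of columns shows each of the 4 level
    combinations the same number of times."""
    expected = len(runs) / 4
    for i, j in combinations(range(len(runs[0])), 2):
        pairs = [(r[i], r[j]) for r in runs]
        for combo in product((0, 1), repeat=2):
            if pairs.count(combo) != expected:
                return False
    return True


for number, run in enumerate(l4_array(), start=1):
    print(f"Design {number}: elements A, B, C = {run}")
print("Full factorial designs:", len(full_factorial(3)))
print("L4 is orthogonal:", is_orthogonal(l4_array()))
```

Design 1 (all zeros) is the control page. The other three rows are the new designs to build.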

In some instances the array needed to test more elements is the same as the one needed to test fewer. For instance, 5×2, 6×2 and 7×2 are all L8 arrays. Here is a great tool for determining orthogonal arrays. Some of the most successful arrays in my experience have been the 7×2 L8 and the 4×3 L9 and, when there is less traffic or faster results are needed, the 3×2 L4. The nature of these arrays also reduces the margin for error because they use fewer aliased interactions, as described in Part 1.
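One way to see why several element counts share an array: a two-variation array with N designs has at most N - 1 columns, so an L8 can hold up to seven elements. Here is a small sketch of that rule, assuming only the power-of-two arrays (L4, L8, L16 and so on) are considered. Other arrays, such as L12, exist and are not covered, and the function name is hypothetical.

```python
def two_level_array_runs(n_elements):
    """Smallest power-of-two array (L4, L8, L16, ...) that can hold
    n_elements elements with two variations each.

    Assumption: only power-of-two arrays are considered.
    """
    if n_elements < 1:
        raise ValueError("need at least one element")
    runs = 4
    # An array with `runs` designs has at most `runs - 1` columns.
    while runs - 1 < n_elements:
        runs *= 2
    return runs


for n in (3, 5, 6, 7):
    print(f"{n}x2 -> L{two_level_array_runs(n)}")
```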

That brings us to how to find the right array for your test. The main factors in determining the proper array are:

• Time available to run test
• Estimated traffic into test
• Estimated or baseline conversion rate

Let’s explore each of these.

Time Needed for MVT

The main considerations for testing time are the amount of data that must be collected to achieve statistical confidence and making sure temporal changes in behavior are accounted for so the results are stable.

Data means visits and visits mean time. Depending on what you are trying to accomplish you can choose to run a shorter test with less traffic and fewer elements and alternatives or a longer test with more traffic and more elements and alternatives. This is one of the many trade-offs we encounter while running tests, especially MVT. It’s where the art of testing meets the science of testing and experience with these decisions can be very helpful.

It is important to understand statistical confidence, as it is part of all testing. Statistical confidence is a score, expressed as a percentage, of how sure you can be that the measured result is accurate and not due to chance. It says nothing about how the design will perform in the future (this is where people sometimes get confused). Statistical confidence is the result of two factors: the amount of data collected and the gap between the performances of the designs. For example, if I have 10,000 tested visits and the best-performing design has a 42% lift over the next best one, I will have 99.9% confidence that those results are accurate. If the amount of traffic and/or the difference in performance is reduced, the confidence will also drop. With less data or closer results, the margin of error increases.
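The calculation behind a confidence figure depends on the testing tool. One common approach, used here purely as an illustration, is a two-proportion z-test. The sketch below assumes a 5% conversion rate for the weaker design and an even split of the 10,000 visits between two designs. Both numbers are assumptions added for the example, and the function name is hypothetical.

```python
import math


def confidence_percent(conv_a, visits_a, conv_b, visits_b):
    """Two-sided confidence (0-100) that designs A and B really differ,
    using a pooled two-proportion z-test."""
    rate_a = conv_a / visits_a
    rate_b = conv_b / visits_b
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if std_err == 0:
        return 0.0
    z = abs(rate_b - rate_a) / std_err
    # For a standard normal, P(|Z| < z) = erf(z / sqrt(2)).
    return 100 * math.erf(z / math.sqrt(2))


# Assumed example: 5.0% vs 7.1% conversion (a 42% lift), 5,000 visits each.
large = confidence_percent(250, 5000, 355, 5000)
# Same rates with one tenth of the traffic.
small = confidence_percent(25, 500, 36, 500)
print(f"10,000 visits: {large:.2f}%")
print(f" 1,000 visits: {small:.2f}%")
```

With the same lift and a tenth of the traffic the score falls well short of the usual 95% bar, which is the effect of data volume described above.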

As for time considerations, you want to give yourself a minimum of one week to run an MVT so you can capture any temporal changes in day-of-week behavior. Two weeks is ideal. Before launching a test, check that no unusual events, such as special offers, seasonality or holidays, will cause different behavior during the estimated test period.

NOTE: Estimating the time needed to run a test and determining if a test is over are two different things. I will go into greater detail in a future post on determining when the test is complete.

Traffic and Conversions Needed for MVT

Along with time, the amount of traffic and the resulting volume of conversions will determine which array you should use. The best way to determine the traffic needed is to back into it from conversion data. A general rule of thumb is 100 conversions for each design (also called recipes, branches or experiments). Using our L4 example from earlier, here’s the calculation for a sample landing page test:

L4 = 4 designs × 100 conversions = 400 conversions needed for test
Baseline conversion rate: 5%
Traffic needed for 400 conversions: 400 / 0.05 = 8,000 visitors

So if I average 4,000 clicks per week to this landing page, the estimated length of testing to reach statistical confidence and stability is 2 weeks.
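The arithmetic above is easy to wrap in a small helper so you can try different arrays and traffic levels. This is a minimal sketch with hypothetical names. It uses the 100-conversions-per-design rule of thumb and the one-week minimum from earlier as defaults.

```python
def estimate_test(designs, baseline_rate, weekly_visitors,
                  conversions_per_design=100, min_weeks=1.0):
    """Return (conversions needed, visitors needed, weeks to run).

    designs:         number of designs in the array (4 for L4, 8 for L8)
    baseline_rate:   conversion rate as a fraction, e.g. 0.05 for 5%
    weekly_visitors: average visitors (clicks) per week into the test
    """
    conversions = designs * conversions_per_design
    visitors = conversions / baseline_rate
    # Never plan for less than the minimum run time, however much traffic.
    weeks = max(min_weeks, visitors / weekly_visitors)
    return conversions, visitors, weeks


for name, designs in (("L4", 4), ("L8", 8)):
    conversions, visitors, weeks = estimate_test(designs, 0.05, 4000)
    print(f"{name}: {conversions} conversions, "
          f"{visitors:,.0f} visitors, {weeks:.1f} weeks")
```

Changing `weekly_visitors` shows how much extra traffic you would need to shorten a test.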

With this information, it’s pretty easy to see that you could run a 7×2 L8 for this AdGroup if you can wait another two weeks for results. Or maybe you can raise spend caps to drive more traffic and get results in one week. Once you understand all the inputs, there are a lot of levers you can pull. Still, at the end of the day, even with the right array, your test results are only as good as the elements you selected and the messaging/creative used for your alternatives. I’ll cover that in Part 3.

3 thoughts on “Multivariate Testing – Test Design”

  1. I am a little bit confused by the 3×2 array. Shouldn’t 3×2 have a total of 8 designs instead of 4? Shouldn’t it be 2 to the power of 3? Can you explain why? Thanks

