Designing your MVT tests, developing your creative and implementing code are all just precursors to getting your test live. Once your tests are live, though, success will be determined by how well you did all those things. At this point you are just monitoring results, so relax and kick your feet up… if only it were that easy.
Over three years of doing MVT, most of the questions I've received have been about determining when a test is over. That's because while a test is live, the results are always changing. But testing forever defeats the purpose: every test needs a time to cut the cord.
So, how do you figure out when you're finished? You need to weigh the following factors:
Confidence: Confidence measures how sure you can be that the difference between results is real. The bigger the performance gap between one test recipe and another, the more statistical confidence you get from the same amount of data. For example, if our success metric is conversion rate and recipe A is at 5.0% while recipe B is at 5.5%, there will not be a lot of confidence that B is really 10% better. However, if A were at 5.0% and B at 10.0%, there would be a great deal of confidence that B is 100% better than A.
Important Point: The confidence metric is based on the data that has been collected. This is not a predictive calculation.
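To make the idea concrete, here is a minimal sketch of how that confidence number can be computed with a standard two-proportion z-test. This is not Test&Target's published math, and the visitor counts below are hypothetical; it just shows why a 5.0% vs. 10.0% gap produces far more confidence than 5.0% vs. 5.5% on the same traffic.

```python
from statistics import NormalDist

def confidence(conv_a, n_a, conv_b, n_b):
    """Two-tailed confidence that recipes A and B really differ,
    via a standard two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * NormalDist().cdf(abs(z)) - 1

# A at 5.0% vs. B at 5.5% on 10,000 visitors each: weak confidence
print(f"{confidence(500, 10_000, 550, 10_000):.1%}")    # well under 95%
# A at 5.0% vs. B at 10.0% on the same traffic: near-certain
print(f"{confidence(500, 10_000, 1_000, 10_000):.1%}")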
Margin of Error: MOE takes the confidence stats and factors in the amount of data that has been collected; the more data, the smaller the margin of error. I generally don't pay much attention to MOE, as the swings can be very wide. I know some stat heads might get their panties in a bunch about this, but as a marketer who relies on speed I find it a paralyzing metric, since so much data is needed in most cases, even with fractional factorial testing.
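As a sketch of why more data shrinks the margin of error, here is the standard normal-approximation interval around an observed conversion rate. The visitor counts are hypothetical, and this is the textbook formula rather than any particular tool's implementation.

```python
from statistics import NormalDist

def margin_of_error(conversions, visitors, level=0.95):
    """Half-width of the normal-approximation confidence interval
    around an observed conversion rate."""
    p = conversions / visitors
    z = NormalDist().inv_cdf(0.5 + level / 2)  # ~1.96 for 95%
    return z * (p * (1 - p) / visitors) ** 0.5

# The same 5.0% conversion rate gets a tighter interval as data piles up
for n in (1_000, 10_000, 100_000):
    moe = margin_of_error(int(0.05 * n), n)
    print(f"{n:>7} visitors: 5.0% +/- {moe:.2%}")
```

Note how each tenfold increase in traffic only cuts the margin of error by about a factor of three, which is exactly why waiting for a tiny MOE can take so long.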
Stability: Stability, coupled with confidence, is the most important thing to look at in determining whether your test is over. There are two graphs you want to look at to judge test stability: one is cumulative stability and the other is daily results. Let's see what these reports look like in the Omniture Test&Target tool.
The main things we’re looking for in the cumulative reports are trending and consistency. Once things seem to level off for a period of a week or so, we’re looking good.
The main things we're looking for in the daily results reports are outliers and fluctuation. Once we have a recipe that wins most of the days, we're looking good.
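The two views above can be sketched in a few lines of code if you want to sanity-check them outside the tool. The daily numbers here are made up for illustration, and these are my own helper names, not Test&Target functionality:

```python
def cumulative_rates(daily_conversions, daily_visitors):
    """Running conversion rate after each day -- the cumulative
    stability curve should flatten as the test settles."""
    rates, conv, vis = [], 0, 0
    for c, v in zip(daily_conversions, daily_visitors):
        conv, vis = conv + c, vis + v
        rates.append(conv / vis)
    return rates

def days_won(daily_rates_a, daily_rates_b):
    """How many days recipe B beat recipe A in the daily report."""
    return sum(b > a for a, b in zip(daily_rates_a, daily_rates_b))

# one hypothetical week of data, 1,000 visitors per recipe per day
daily_a = [c / 1000 for c in [50, 48, 55, 52, 49, 51, 50]]
daily_b = [c / 1000 for c in [55, 60, 57, 58, 61, 56, 59]]
print(days_won(daily_a, daily_b))  # B wins all 7 days
print(cumulative_rates([50, 48, 55, 52, 49, 51, 50], [1000] * 7))
```

If the cumulative curve has flattened and one recipe is winning nearly every day, you have the stability half of the stopping decision.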
Account for Temporal Changes!
Generally, a best practice is to let your multivariate tests run a minimum of two weeks. This way you can get week-over-week results and see whether any strange temporal behaviors could be skewing the results. Here it is helpful to look at the daily results. I'm hoping Omniture's Test&Target will soon be able to graph results week over week (or in other comparative timeframes) the way Google can.
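Until the tool graphs this for you, a week-over-week comparison is easy to put together by hand. A quick sketch, using hypothetical daily conversion rates for a single recipe:

```python
def week_over_week(daily_rates):
    """Pair up the same weekday across consecutive 7-day weeks
    (day 1 of week 1 vs. day 1 of week 2, and so on)."""
    weeks = [daily_rates[i:i + 7] for i in range(0, len(daily_rates), 7)]
    return list(zip(*[w for w in weeks if len(w) == 7]))

# two weeks of hypothetical daily conversion rates for one recipe
rates = [0.050, 0.052, 0.049, 0.051, 0.055, 0.061, 0.060,
         0.051, 0.050, 0.048, 0.052, 0.054, 0.062, 0.059]
for day, (w1, w2) in enumerate(week_over_week(rates), start=1):
    print(f"day {day}: week 1 {w1:.1%} vs. week 2 {w2:.1%}")
```

If the same weekdays line up across weeks (a weekend bump in both weeks, say), the pattern is temporal rather than a real recipe effect.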
Don’t look back!
Successful multivariate testing is about speed (how quickly), velocity (how many) and iteration (how intelligently), all based on analytic data. I've never regretted stopping a test with a big winner, because even after a test is done you are going to keep monitoring results. More often than not, early results hold up as winners even if the overall improvement levels subside a little. For best results, I'd much rather run 10 small, quick tests in a month than 2 large ones.
This post effectively wraps up my multivariate testing overview in six parts. My final thoughts:
Multivariate testing can be a tremendous amount of fun and get you great results, but it requires highly dedicated marketers and a great creative methodology. Matt Roche, the founder of Offermatica, once shared three learnings from his time building the most successful multivariate testing tool. I'll end with his great advice for digital marketers:
1. Great marketing comes from great marketers, machines help them aim better
2. Engaged marketers lead to engaged customers
3. Speed is everything