Rules for a successful multivariate test (Billy’s Optimization Guide Part 3)
Methodology, Testing Concerns, Testing Techniques
If you missed it, see Part 1 (A/B Split Testing) and Part 2 (Multivariate Test Basics).
With the basics from Part 2 down, it’s time to start designing a multivariate test. Every optimization project has different challenges and goals; luckily, though, a few rules apply to every multivariate test design. These rules fall into two categories: technical rules and content rules.
Technical rules:
- Choose the appropriate multivariate test type (full or fractional factorial)
- Determine the number of factors and levels that can be tested based on estimated conversion traffic (choose a test array)
- Stop the test when it has stabilized, not based on your earlier estimations
These rules help ensure statistically meaningful results by constraining the test to an appropriate size at the beginning and then letting it gather enough data at the end.
If your traffic supports it, running a full factorial test may be a good choice when you’re testing content you believe has many interactions, or when you only want to test 2 factors with 2 levels each. (Note: the smallest fractional factorial test is 3 factors with 2 levels each.) Typically, though, you’ll want to run a fractional factorial test to save time and expand the number of factors and levels you can test.
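To make the size difference concrete, here is a minimal sketch comparing a full factorial design with the smallest fractional design mentioned above: a half fraction of three two-level factors (often written 2^(3-1), the same size as the L4 orthogonal array). The function names are illustrative and not part of any testing tool. The half fraction sets factor C’s level from A and B (the generator C = AB), which is why it needs only 4 experiments instead of 8. It is also why it cannot separate C’s effect from the A×B interaction, and that aliasing is the reason full factorial is preferable when you expect strong interactions.

```python
from collections import Counter
from itertools import combinations, product
from math import prod

def full_factorial_size(levels_per_factor):
    """One experiment per combination of levels."""
    return prod(levels_per_factor)

def full_factorial(levels_per_factor):
    """Enumerate every combination, coding each factor's levels as 0..n-1."""
    return list(product(*(range(n) for n in levels_per_factor)))

def half_fraction_three_factors():
    """2^(3-1) half fraction coded -1/+1. C is generated as A*B,
    so C's main effect is aliased with the A x B interaction."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

def is_balanced_pairwise(runs):
    """True if every pair of columns contains every level combination,
    each equally often (the orthogonality that lets main effects be
    estimated independently of one another)."""
    cols = list(zip(*runs))
    for i, j in combinations(range(len(cols)), 2):
        counts = Counter(zip(cols[i], cols[j]))
        expected_pairs = len(set(cols[i])) * len(set(cols[j]))
        if len(counts) != expected_pairs or len(set(counts.values())) != 1:
            return False
    return True
```

For three two-level factors, `full_factorial_size([2, 2, 2])` is 8, while `half_fraction_three_factors()` returns 4 runs that are still balanced pairwise.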
To find out how many factors and levels you can test, you need some idea of your predicted page views and conversions, as well as an estimate of lift. Lift matters because a larger lift means a bigger difference between levels, which takes fewer conversions to detect, so the test stabilizes sooner. Because of this, I would be conservative with lift estimates to ensure the test is not designed too large. At Widemile, we have a large list of arrays available to our tool and have calculated the approximate conversions each one needs to stabilize, which lets me take the three criteria above and find the arrays that are statistically viable for testing. Look for something similar in your tool of choice.
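Widemile’s own conversion estimates aren’t described here, so as a stand-in, here is a minimal sketch that screens arrays using the standard sample-size approximation for a two-sided, two-proportion z-test. It assumes every experiment (run) in an array must individually reach that sample size, which is a conservative rule of thumb rather than any tool’s actual method. The array list uses a few standard orthogonal arrays as hypothetical examples, and the names `visitors_per_experiment` and `viable_arrays` are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def visitors_per_experiment(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Visitors per experiment for a two-sided two-proportion z-test to
    detect `relative_lift` over `baseline_rate` (standard approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Hypothetical menu of arrays: name -> number of experiments (runs).
ARRAYS = {
    "L4 (3 factors x 2 levels)": 4,
    "L8 (up to 7 factors x 2 levels)": 8,
    "L9 (up to 4 factors x 3 levels)": 9,
    "L16 (up to 15 factors x 2 levels)": 16,
}

def viable_arrays(daily_visitors, baseline_rate, relative_lift, max_days):
    """Return (array name, estimated days) for arrays that fit in max_days,
    assuming traffic is split evenly across the array's experiments."""
    n = visitors_per_experiment(baseline_rate, relative_lift)
    results = []
    for name, runs in ARRAYS.items():
        days = ceil(n * runs / daily_visitors)
        if days <= max_days:
            results.append((name, days))
    return results
```

Because the required sample size scales roughly with 1 / (p2 - p1)², halving the lift estimate roughly quadruples the traffic each experiment needs, which is why an optimistic lift estimate can lead you to pick an array that is too large.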
To figure out when a test has stabilized, I primarily look at level influence stabilization, with experiment conversion rate stabilization as support. Widemile Optimize shows this using graphs, so I simply look for lines trending horizontally: winning levels and experiments stay winners, and their level of influence or conversion rates stay fairly constant over 3-5 days. If you don’t have graphs available, look at the historical cumulative conversion rate for your experiments and see whether there is much variance over the latest few days of your test.
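If you are checking the numbers by hand instead of reading graphs, the “horizontal lines” test can be approximated with a simple heuristic. This minimal sketch assumes you have daily visitor and conversion counts per experiment. It calls the test stable when the same experiment leads on every day of the window and each experiment’s cumulative conversion rate moves by no more than a relative `tolerance`. The 3-day window matches the low end of the 3-5 days above; the 5% tolerance is an arbitrary assumption, and the function names are hypothetical. It covers experiment conversion rates only, not level influence.

```python
from itertools import accumulate

def cumulative_rates(daily_visitors, daily_conversions):
    """Cumulative conversion rate at the end of each day (0.0 before any traffic)."""
    return [c / v if v else 0.0
            for v, c in zip(accumulate(daily_visitors), accumulate(daily_conversions))]

def looks_stable(experiments, window=3, tolerance=0.05):
    """Heuristic stabilization check.

    experiments maps an experiment name to (daily_visitors, daily_conversions).
    Returns True if, over the last `window` days, the same experiment leads
    every day and every cumulative rate varies by at most `tolerance`
    relative to its latest value.
    """
    curves = {name: cumulative_rates(v, c) for name, (v, c) in experiments.items()}
    days = min(len(rates) for rates in curves.values())
    if days < window:
        return False  # not enough history to judge
    recent = {name: rates[days - window:days] for name, rates in curves.items()}
    # The winning experiment must not change within the window.
    leaders = {max(recent, key=lambda name: recent[name][d]) for d in range(window)}
    if len(leaders) != 1:
        return False
    # Each curve must be roughly flat ("horizontal") within the window.
    for rates in recent.values():
        latest = rates[-1]
        if latest == 0:
            return False  # treat a curve with no conversions as not yet stable
        if (max(rates) - min(rates)) / latest > tolerance:
            return False
    return True
```

A stricter or looser tolerance is a judgment call; low-traffic pages will naturally show more day-to-day movement.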
Content rules:
- Every item you test should answer an important question
- Test variety not quantity
- Test opposites first then refine
- Remember you can run more than one test
The content rules are closely tied together. In effect, they ensure that every item selected for testing has a purpose and doesn’t needlessly expand the size of your test, which would reduce its efficiency. I begin designing tests by forming hypotheses about issues with the page and then choosing factors and designing levels to address those issues.
An example hypothesis is “Having a hero shot on the right side of the page causes users to ignore the important value proposition on the left side.” To test this, I would choose hero shot position as a factor, with “left side hero shot” as the baseline level and “right side hero shot” as the second level. This example also illustrates that, beyond headlines and images, you can test layout with creative use of CSS and sometimes JavaScript. As long as you can switch cleanly from one level to another and each level works alongside every level of the other factors, you are at liberty to test anything.
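One way to keep the content rules honest is to write the test plan down as data, so every factor carries the question it answers. This minimal sketch assumes a plan is just a list of factors, each with a hypothesis and its levels (baseline first). The `Factor` class and `check_plan` function are hypothetical, not part of any testing tool, and the headline factor in the usage below is invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod

@dataclass
class Factor:
    name: str
    hypothesis: str    # the question this factor is meant to answer
    levels: list[str]  # baseline level first

def check_plan(factors):
    """Reject factors that answer no question or don't vary, then return
    how many combinations a full factorial of this plan would contain."""
    for f in factors:
        if not f.hypothesis.strip():
            raise ValueError(f"{f.name}: no hypothesis, so no reason to test it")
        if len(f.levels) < 2:
            raise ValueError(f"{f.name}: needs a baseline and at least one alternative")
    return prod(len(f.levels) for f in factors)

hero = Factor(
    name="Hero shot position",
    hypothesis="A right side hero shot causes users to ignore the value proposition on the left.",
    levels=["left side hero shot", "right side hero shot"],
)
# Hypothetical second factor, for illustration only.
headline = Factor(
    name="Headline",
    hypothesis="A benefit-led headline explains the offer faster than the current one.",
    levels=["current headline", "benefit-led headline"],
)
```

`check_plan([hero, headline])` returns 4. Adding four more hero shot variations would raise that to 12, which is the kind of growth the “variety not quantity” rule is meant to prevent.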
Coming back to the rules, make sure you are testing as few items as possible to find out what you need. Before testing a collection of lifestyle hero shots, choose one and test it against an iconic hero shot. If the lifestyle approach loses, you’ve saved yourself the time of going down a path that doesn’t work.
Lastly, you probably won’t find the best page on the first run, or even the second or third. If you knew what your audience liked 100% of the time, you wouldn’t need testing. Think of your overall test plan beyond the first run, so that you can answer all the questions you need without forcing everything into one test.
In summary, determine what you’re trying to achieve, select the proper testing method to meet those goals, and then be purposeful and efficient with the content you put in front of your visitors. Testing and optimization are not difficult, although getting started can be tough. Follow these rules and you’ll be on your way to conquering conversion rates, bounce rates, funnel drop-offs and many other metrics.
Photo credit: Aranda\Lasch (CC)