Gamble with your conversions to raise them
By Billy | August 19, 2008
You and your competitors all have the same landing pages. You have a hero shot of the product, a big call to action button and short, punchy copy. Or maybe you’re already ahead of your competitors and have run a few tests on your page, picking up more conversions on the way. In either situation, you’ll eventually hit a wall and struggle to get additional lift. So how do you continue to improve?
Go for broke. Try something you’ve never tried before. It might end up being a total failure, but it also might give you the lift you want.
The gamble you make with optimization can end in 2 ways:
- You lose X amount of conversions over the week or two that the test is running
- You gain X amount of conversions for the effective lifetime of the page
The possible upside dwarfs the downside by a large margin and, either way, you learn something new and can optimize the next test more successfully based on what you learned.
Luckily, skill and experience minimize the risks of testing. However, beating a strong page is never easy or guaranteed. But when you do find something new that works, or see that your current page is still a champ, you can rest assured that you’re doing all you can to drive conversions.
Topics: Methodology, Testing Concerns, Testing Techniques, Why Test? | No Comments »
SES San Jose: Landing Page Optimization Roundtable
By Billy | August 14, 2008
If you’re going to SES San Jose and want to really learn about optimization, check out the Landing Page Utopia: Expert Roundtable. My boss and Director of Optimization, Frans Keylard, will be on the panel. Everything I know was taught to me by Frans, so if you’re serious about talking to an optimization expert here’s your chance. He has a wealth of testing experience and is a fun guy in general.
In addition, Jonathan Mendez will be on the same panel. If you don’t know him, he used to run OTTO Digital, a former division of Offermatica. He is on top of the optimization game as well. It should be a great panel!
Topics: Industry News | No Comments »
An Essential Primer on Full and Fractional Factorial Test Design
By Billy | July 24, 2008

What are full and fractional factorial test designs? How do they relate to optimization and what about interactions?
Once you get down and dirty with testing, these questions matter. Whether selecting an optimization platform or trying to thoroughly understand the tests you are building, grasping these concepts will put you in greater control and allow you to design and analyze your tests more effectively.
As simply as possible, I hope to educate you and other marketers about full and fractional factorial test designs and why fractional factorial is the best choice for multivariate testing of online campaigns.
Note: “Partial factorial” and “fractional factorial” are the same. Also, if you don’t have a thorough understanding of experiments and interactions, please read those first.
The tests used in optimization are from the design of experiments field. (From Wikipedia: “Design of experiments is the design of all information-gathering exercises where variation is present, whether under the full control of the experimenter or not.”) The two types of tests I will focus on are fractional factorial and full factorial.
Here is an example I will use to explain these concepts. Below is a test matrix outlining a test for a landing page with 5 factors with 2 levels each. Don’t let the vocabulary scare you away: this means that there are 5 parts of the page being tested and 2 variations of each.
Recipe Matrix: 5 factors = 5 parts (hero shot, headline, etc.) and 2 levels = 2 variations
These factors and their respective levels make up the possible combinations for a landing page. The combinations displayed are called experiments.
Let’s calculate the total number of experiments possible. (Even if you know how to do this already, it is important for understanding the distinction between fractional and full factorial.) There are 2 levels for each factor, so you can have 2×2×2×2×2 (2 to the 5th power) = 32 possible experiments. This means there are exactly 32 combinations of hero shots, headlines, sub headlines, button text and main copy from our matrix outlined above. Note that if we add another factor, it becomes 2 to the 6th power, or 64 possible experiments. Additionally, if you add 2 more levels to any one of the existing 5 factors, it will also increase from 32 to 4×2×2×2×2 = 64 experiments.
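The same arithmetic can be checked by enumerating the combinations. This is a small Python sketch; the level names "A" and "B" and the sixth factor "form layout" are placeholders I made up, since only the five factor names come from the example above.

```python
from itertools import product
from math import prod

# Five factors with two levels each. Level names are placeholders.
factors = {
    "hero shot": ["A", "B"],
    "headline": ["A", "B"],
    "sub headline": ["A", "B"],
    "button text": ["A", "B"],
    "main copy": ["A", "B"],
}


def count_experiments(factors):
    """Multiply the number of levels of every factor."""
    return prod(len(levels) for levels in factors.values())


# Every experiment is one choice of level per factor.
experiments = list(product(*factors.values()))
print(len(experiments))  # 32

# Adding a sixth two-level factor (hypothetical name) doubles the count.
with_sixth = {**factors, "form layout": ["A", "B"]}
print(count_experiments(with_sixth))  # 64

# Giving one existing factor four levels instead of two also doubles it.
four_levels = {**factors, "hero shot": ["A", "B", "C", "D"]}
print(count_experiments(four_levels))  # 64
```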
In testing, each experiment must get a minimum amount of measurable conversions, known as the sample size per experiment. This ensures that there is enough data for a solid statistical analysis. Therefore the more experiments you have, the more conversions you need. You can think of conversion data as time also, since the longer you leave your web page up, the more data you get.
Now we’re ready to go back to the difference between the two test designs. Full factorial testing requires that every possible experiment combination is shown, so our 5-factor test would need to display all 32 experiments. This means that if the sample size is 100 conversions per experiment, 3,200 conversions will be required. Fractional factorial works differently: it displays a much smaller number of experiments, about 8 in this case, so it would need about 800 conversions.
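Here is that comparison as a short Python sketch. The 32 and 8 experiments and the sample size of 100 come from the example above; the 40 conversions per day used to turn conversions into test days is an assumption made up for illustration.

```python
import math


def conversions_needed(num_experiments, sample_size_per_experiment):
    """Total conversions required so every experiment reaches its sample size."""
    return num_experiments * sample_size_per_experiment


def days_needed(total_conversions, conversions_per_day):
    """Test duration in whole days at a steady conversion rate."""
    return math.ceil(total_conversions / conversions_per_day)


SAMPLE_SIZE = 100          # conversions per experiment, from the example
CONVERSIONS_PER_DAY = 40   # hypothetical traffic level

full = conversions_needed(32, SAMPLE_SIZE)
fractional = conversions_needed(8, SAMPLE_SIZE)

print(full, days_needed(full, CONVERSIONS_PER_DAY))              # 3200 80
print(fractional, days_needed(fractional, CONVERSIONS_PER_DAY))  # 800 20
```

At the assumed traffic level, the full factorial test would run for roughly 80 days against 20 days for the fractional one.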
Since full factorial gathers additional data, it reveals all possible interactions, but as seen by the numbers above, there is a trade-off. More data equals more information but more data also equals a longer test duration. The minimum data requirements for full factorial are very high since you are showing every experiment.
Even if you are using full factorial to get the same amount of information as a fractional factorial test, it will take more time since you need more data to see statistically relevant differences between the many experiments.
You might be wondering how fractional factorial can be accurate if interactions are possible.
Random interactions of high relevance are very rare, especially when looking for interactions of more than 2 factors. You really need to design tests where you look for meaningful interactions that are based on true business requirements rather than hoping for a random and low influence interaction between a red button, a hero shot and a headline.
Whatever the interaction is, you need to be able to understand your audience and infer why there was an interaction in the first place. Only then are you ready to start designing for interactions.
Tests should not be filled with random levels; they should be carefully designed for success by focusing on testable hypotheses around the audience. Could a 1 pixel drop shadow on a button interacting with the copyright statement ever be truly significant, and not a victim of random error? Is it worth sacrificing thousands of conversions to learn a lesson that won’t result in any relevant increase of real world conversions?
Some interactions might make sense to measure, while others should be left out because of the amount of testing time they add.
This brings me to fractional factorial. It is possible for fractional factorial tests to detect interactions. How so? Using our example of a 5-factor test, fractional factorial can include everything from only main-effects all the way to 4-factor interaction effects. Full factorial’s only difference is that it is the full extension and includes the 5-factor interaction effects.
Fractional factorial is not a one-trick pony; it is a continuum ranging from testing for no interactions (only main effects) to one factor less than full factorial. It is exactly what the name fractional implies; even one less is a “fraction” of full factorial. It gives you the power to make trade-offs, from testing only main effects to testing for interactions, based on intelligent test design.
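As a rough illustration of the main-effects end of that continuum, the Python sketch below builds an 8-experiment design for the five factors in our example. This post does not describe how any particular platform generates its designs, so the construction is an assumption: a textbook approach in which the first three factors take every combination and the last two are derived from them (button text = hero shot × headline, main copy = hero shot × sub headline). Levels are coded as -1 and +1.

```python
from itertools import combinations, product
from math import comb

FACTORS = ["hero shot", "headline", "sub headline", "button text", "main copy"]


def fractional_design():
    """Build 8 experiments for five 2-level factors (levels coded -1 / +1).

    The generators for the last two columns are an assumed, textbook choice.
    """
    runs = []
    for a, b, c in product([-1, 1], repeat=3):
        runs.append((a, b, c, a * b, a * c))
    return runs


def is_balanced_and_orthogonal(design):
    """Check each level appears equally often and columns are uncorrelated."""
    columns = list(zip(*design))
    balanced = all(sum(col) == 0 for col in columns)
    orthogonal = all(
        sum(x * y for x, y in zip(col_i, col_j)) == 0
        for col_i, col_j in combinations(columns, 2)
    )
    return balanced and orthogonal


design = fractional_design()
for run in design:
    print(dict(zip(FACTORS, run)))
print("balanced and orthogonal:", is_balanced_and_orthogonal(design))

# How many effects exist at each order for 5 factors:
# 1 = main effects, 2 = two-factor interactions, ... 5 = the five-factor one.
effects_by_order = {order: comb(len(FACTORS), order) for order in range(1, 6)}
print(effects_by_order)
```

The price of using 8 experiments instead of 32 is that, in a design like this one, each main effect is mixed together with some interactions and cannot be separated from them. That is the trade-off: it is only a good deal if those interactions are small.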
Once you decide to test for all possible interactions, you are committing to a full-factorial test and incurring the associated traffic requirements. I’d love to see a test design that is designed for full interactions and still makes sense! Not having the ability to reduce the number of interactions is a huge detriment, rather than a benefit, of solutions limited to full-factorial testing.
Radically shorter test times allow for many more smart marketing ideas to be tested and adapted based on what you learn from each test run. You, the marketer, have the ability to analyze your results and tweak follow-on tests to capitalize on what you learn. This common-sense approach is what hypothesis-based testing is all about and is very powerful. Focus on testing smart ideas to increase your conversion rate – that’s what matters most.
The graph below illustrates how much information is gained and the amount of testing needed, based on the number of interactions tested.
In my experience, the red area shows how valuable the data is based on which effects are being tested, while the blue area shows the amount of data (or time) needed to gather the data to confirm those effects. The x-axis goes from left to right, from main effects to full factorial (5-factor effects).
At Widemile, we believe it is more effective to perform quick, successive tests detecting only main effects rather than randomly hoping for interactions. While interactions might give you small or even large gains, they likely will never trump the gains from additional testing, nor make up for the time and money lost looking for random interactions. The additional time required for full factorial tests is large, and not many marketers want to wait more than a month for a test to complete.
Fractional factorial is preferred by a few camps, including Widemile, Omniture’s Test&Target (formerly Offermatica) and Interwoven’s Optimost. Full factorial is used in Google’s free Website Optimizer and some tools offered by smaller providers.
Testing for all interactions sacrifices a lot of time. With the speed that audiences, marketing campaigns and seasons can change, it is important to get the most testing done in the least amount of time without sacrificing the quality of the data. Fractional factorial allows you to do just that, making it the wisest choice for multivariate testing.
Topics: Methodology, Terminology, Testing Techniques | 4 Comments »
How to do efficient optimization
By Billy | July 2, 2008

A beginner’s mistake is to test every idea with every test. It seems like the most obvious way of being efficient: if I can test 50 things in a week, why not?
In my experience, efficiency has more to do with careful test design and doing things right the first time than with trying to test everything and rushing the process. By testing a few big ideas quickly and then designing the next test based on those results, you can do a set of small tests and get answers fast without exposing your page to many bad ideas.
Every test should have specific questions it’s trying to answer. Not just “What’s the best performing page?” but questions that lead to that. A car salesman doesn’t blindly try every tactic in the book to get you to buy a car. A real salesman probes you with a few questions and changes their technique accordingly.
That’s how you should design your tests.
Here’s an example test plan that works for most clients:
- Step 1 (Split Test) - Find an optimal template/design: What template and/or design effectively gets visitors to stick, click and convert? At this stage, you aren’t testing messaging yet; you’re merely re-skinning and moving elements around to find a good design. Some techniques to use are simplifying the page by de-emphasizing unimportant content (shrinking the company logo, moving ads to the bottom of the page), emphasizing core content (moving 3rd party validation near the call to action) and adding more whitespace to the page to enhance readability. These are in addition to a well done creative design. This test usually has the greatest impact. However, it all depends on your current page and the audience. (Read more on template testing)
- Step 2 (Multivariate Test) - Find the biggest converting segment: This test focuses on finding the correct messaging by appealing to different segments that you know or hypothesize visit your page. If your product were Google Apps, you might test appealing to business users and freelancers. Or if you are selling a cell phone, you might test features against benefits.
- Step 3 (Multivariate Test) - Find the perfect way to communicate to the segment: Step 2 points you in the right direction, but this step helps you find the exact place you should be with your page. Use what you learned (freelance messaging won) and try variations on that winning theme to really grab your audience and give them what they want. Also, step 2 may have revealed 2 or more segments that are worth targeting. If you can segment them out, run multiple tests that are customized for each segment, and you’ll raise conversions even higher.
The alternative is to test 50 ideas, many of which overlap. Why test any ideas that are remotely similar until you know that they work in general? If I go to a dealership wanting a sports car and the dealer offers me 5 colors of minivans, I’m still not going to buy a minivan. Show me 4 types of cars, let me pick the one I like and then we might talk about color.
Let your visitors lead you!
This really is a simple process, but it drives results. Be methodical to be efficient. By course correcting in each test, you get closer and closer to what you need and don’t spend a lot of time testing losing elements. Follow a test plan like this and you’ll get results and learn a lot about your core converting visitors.
Topics: Landing Page Optimization, Methodology, Testing Techniques | No Comments »
Updates coming…
By Billy | June 6, 2008
I was on vacation and have been swamped in catching up with client work, but I promise I’ll finish my article on fractional factorial test design soon! It’s a very long article and I want to make sure I get it right.
Hope some of you got to see the Widemile booth at SMX Advanced in Seattle (I didn’t since I was on vacation unfortunately.) I heard good things about the panel Widemile’s CEO, Robert Bergquist, was on and that Jonathan Mendez was kind enough to recommend my blog as a resource, so thanks goes out to him.
Thanks for sticking with me!
Topics: Site News | No Comments »
Interactions
By Billy | May 23, 2008

I’ve written an extended definition for interactions in preparation for a long post about full and fractional factorial. Understanding interactions is a critical part of understanding full and fractional factorial too. I look forward to clearing up some misconceptions about fractional factorial test design. Hopefully I’ll have the post done next week.
Topics: Site News, Terminology | No Comments »
Optimization Glossary
By Billy | May 15, 2008
Someone at the office has put together a great glossary, so I modified it slightly and have posted the glossary as its own page (it has a tab dedicated to it above now.) It is in a usable but not optimal form right now, so I’ll be updating it every now and then. I have also decided to add expanded definitions, in the form of separate pages dedicated to a single word. The first word to be done was “experiment.” Please check it out and let me know what you think.
In the past, I have stepped away from using technical language and jargon but, with this glossary, I will begin using the language I use at the office. My hope is to acclimate others and help them understand the terminology used by myself and others at Widemile and around the industry.
Topics: Site News, Terminology | No Comments »
Billy’s Twitter
By Billy | May 2, 2008
If any of you use Twitter, I’ve opened an account, Billysblog. I’ll be posting interesting links related to optimization, marketing and Widemile, as well as any thoughts I have about online marketing. I’ve included a feed on the left nav bar of this blog too.
Topics: Site News | No Comments »
3 difficult optimization results and what you can learn from them (3 of 3)
By Billy | April 30, 2008
Note: This is the third post of a 3 part series, each focusing on one type of test result that is tough to deal with. Read the other 2 articles on highly mixed data and the original page beating the new variations.

Ready for the toughest of all test results? I brought in Widemile’s Chief Scientist, Vladimir Brayman, for this post to help me with some of the concepts around this topic. The last of the three results is when the results just won’t stabilize.
How does this happen?
As long as you have homogeneous traffic and enough time, a test should stabilize. Unfortunately, this is not always possible and I don’t know anyone with unlimited time. The most obvious way this occurs is when a test is designed too large, meaning you don’t have enough conversion traffic for the number of variations you are trying to test.
Additionally, getting homogeneous traffic is not always easy. If your sources are too different, you can have problems. Text, banner and e-mail ads, and even Yahoo vs Google traffic, may behave differently. The worst case is when these sources of traffic are added mid-test. I have had tests where an e-mail campaign was done at the end of a test without my knowledge (until I asked about the huge spike in traffic!)
You can’t control all traffic coming to your page from some sources like PR, blogs, seasonal events and news. This goes back to part 1, about highly mixed data; everything there applies to this case too.
A test also may not stabilize because the test is designed with elements that are too similar. The same thing can happen when 2 elements are different but have approximately the same amount of impact. In these situations, your data will go back and forth on which of them is the winner.
Anything outside of your page that has a large influence can destabilize your test, and this includes pieces of your funnel. One symptom of this is when your clickthroughs are fairly consistent but the full conversions are not. If you are testing a landing page and the sign-up process after it is kludgy and difficult for users, it can have a large impact on your test’s ability to stabilize. This is especially true if the experience for visitors changes. An example of this is visitors bailing from a purchase funnel because shipping to their area is prohibitively more expensive than to other areas. Although they would have converted if shipping had been within the average price range, they ended up not converting because of something encountered outside of the landing page, skewing your results. This happens in almost every test, but the magnitude of its impact depends on what exactly occurs.
What can you do to prevent this?
If you are using a testing tool different from what you normally track your conversions with, make sure you run a baseline test so that you can compare the numbers your testing tool gives you with the ones your conversion analytics produces. They should be within about 10%-15% of each other over about a week or so. Finding a large discrepancy here will save you from headaches down the line. This essentially double checks the expected traffic numbers by ensuring you are measuring your current conversion correctly, which allows you to design a test of the appropriate size. By size, I mean ensure that you have enough testing time and within that time you will get enough traffic.
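The baseline comparison is simple enough to script. This is a minimal Python sketch, assuming you can export a conversion count for the same week from both systems. The function names and the example counts are hypothetical, and the 15% tolerance is just the upper end of the range mentioned above.

```python
def relative_discrepancy(tool_conversions, analytics_conversions):
    """Difference between the two counts, as a fraction of the analytics count."""
    if analytics_conversions <= 0:
        raise ValueError("analytics_conversions must be positive")
    return abs(tool_conversions - analytics_conversions) / analytics_conversions


def baseline_ok(tool_conversions, analytics_conversions, tolerance=0.15):
    """True when the testing tool and the analytics agree within the tolerance."""
    return relative_discrepancy(tool_conversions, analytics_conversions) <= tolerance


# Hypothetical one-week baseline counts.
print(baseline_ok(tool_conversions=460, analytics_conversions=500))  # True, 8% apart
print(baseline_ok(tool_conversions=350, analytics_conversions=500))  # False, 30% apart
```

A failed check does not say which system is wrong, only that the tracking needs investigating before a test is sized from either number.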
While easier said than done, it is important to look for new traffic that may be driven to your page and to segment it out. Since this shares some of the same problems as highly mixed data, those solutions apply here too.
What can you do if this happens?
First, don’t cut your tests short unless you think more data won’t solve the problem. If you don’t reach stabilization, you are wasting all the time you tested since you have inconclusive data. Always try to be as conservative as possible and end tests only when you are very confident that the test is stabilized or that there is no other choice.
Think about restarting the test if it isn’t stable. Use a smaller design. Pick the important factors (pieces) and the levels (variations) that you think will perform and are drastically different from each other. This prevents elements from looking unstable as they flip-flop over which one is optimal.
If your only problem is that 2 variations are vying for the winning position, then they likely perform about the same. It probably is not worth your time to wait for them to stabilize, so stopping the test and going with either of them will likely make little difference to your conversion rates.
The problem of outside funnel influence is a bit harder, but not impossible, to solve. The best solution is to segment out the users that are determined to be unqualified. For example, if you only ship to or work with US customers and businesses, then filter out any users that are outside of the US and do your analysis from there. This can be done at the data level if you can tell where the data came from; otherwise, it can be done with a splitter or qualification page that leads people into the appropriate funnel first. This may itself impact your overall conversions though, so these methods should be carefully tested as well.
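Filtering at the data level might look something like the Python sketch below. The record layout (a recipe name, a country code and a converted flag per visit) and the sample visits are assumptions made up for illustration; your testing tool’s export will look different.

```python
from collections import defaultdict

# Hypothetical visit records. Field names and values are illustrative only.
visits = [
    {"recipe": "original", "country": "US", "converted": True},
    {"recipe": "original", "country": "US", "converted": False},
    {"recipe": "original", "country": "CA", "converted": False},
    {"recipe": "original", "country": "CA", "converted": False},
    {"recipe": "variation", "country": "US", "converted": True},
    {"recipe": "variation", "country": "US", "converted": False},
]


def conversion_rates(visits, qualified_countries=frozenset({"US"})):
    """Conversion rate per recipe, counting only visits from qualified countries.

    Pass qualified_countries=None to include every visit.
    """
    totals = defaultdict(lambda: [0, 0])  # recipe -> [visits, conversions]
    for visit in visits:
        if qualified_countries is not None and visit["country"] not in qualified_countries:
            continue
        totals[visit["recipe"]][0] += 1
        totals[visit["recipe"]][1] += int(visit["converted"])
    return {recipe: conv / count for recipe, (count, conv) in totals.items()}


print(conversion_rates(visits, qualified_countries=None))  # all traffic
print(conversion_rates(visits))                            # US visitors only
```

In this made-up data the original looks like it converts at half the rate of the variation until the unqualified visitors are removed, at which point the two are identical.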
From my experience, the problems I’ve listed in these three posts are either preventable or unlikely to occur. The value of having an optimization expert is that they can avoid these situations or, at the very least, extract useful lessons when they do happen. Having said that, don’t be scared to test. Once you get the hang of it, it is a lot of fun and one of the keys to effectively growing and maturing your online marketing campaigns.
CC photo credit #1: ryaninc - CC photo credit #2: jurvetson
Topics: Methodology, Testing Concerns, Testing Techniques | No Comments »
Google Website Optimizer officially launched, no AdWords required
By Billy | April 18, 2008
I just got news that Google Website Optimizer is out of beta. In addition, it doesn’t even require an AdWords account to use it anymore. This is great news for the testing industry and for all online marketers. Check it out here. In addition, there now is a dedicated Official Google Website Optimizer blog.
I’ll see if I can get some tests running just to see what the isolated tool looks like versus the integrated one. They also upgraded the setup of multivariate tests for all versions.
On another note, it’s good to see Google saying things like “it’s hard to find a serious advertiser who doesn’t at least plan to do content testing this year.” They even mention some best practices that I’ve talked about on this blog:
- “don’t be shy: big changes generally yield big differences in performance”
- “We recommend letting your experiments run for at least two weeks, no matter how much traffic you get and how strong the results initially appear, just so the data has enough time to normalize.” - I recommended the same things in my Multivariate Testing Primer.
Also, there’s a forum for Google Website Optimizer users, which isn’t new, but expect it to grow quickly with this latest announcement.
If you’re waiting for the last post in my 3 part series about difficult test results, I apologize. I’ve been sick all week and wanted to go over my last post with Vladimir Brayman, Widemile’s chief scientist, before I posted it for the world to see. It’s a very important topic and a challenging one too. I’ll try to get it out next week for sure.
Topics: Industry News | No Comments »
















