This post discusses the Growth Marketing Research & Testing course as part of Track 2 (of 7) of the Growth Marketing program from the CXL Institute. The course is taught by optimization expert and business builder Peep Laja. It covers the research and testing strategies and skills required to bake experimentation into a growth marketing process, including the ResearchXL process and the PXL Testing Prioritization Framework.
Why Do We Test?
Laja says testing programs aren’t about following overly simplistic ‘best practice’ listicles, design trends, category-leader behavior, or competitor benchmarking. They are about discovering what matters most to your customers so you can solve their problems:
- Whose problem are we solving?
- What do they need? Why?
- What do they think they want? Why?
- How are they making a decision? Why?
- What are they doing (or not doing) on the website? Why?
- What are they thinking when they see our offer?
- What leads more people to take action X?
- How is what we sell clearly different?
- Where is our website leaking money?
The ResearchXL Process
Laja introduces the ResearchXL process, designed to help answer the questions that inform testing. The process covers six types of data gathering: Technical Analysis, Heuristic Analysis, Digital Analytics, Mouse Tracking Analysis, User Testing, and Qualitative Research.
Research – Technical Analysis.
This is relatively straightforward. It’s all about assessing the current technical state of a website or app. This includes speed testing, cross-browser testing, and cross-device testing – looking for slow pages, browser compatibility issues, and conversion differences, to determine if they are caused by technical bugs or something else. A common tool is crossbrowsertesting.com.
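As a rough illustration (not a tool from the course), even a small script can surface obviously slow pages before you move on to full cross-browser and cross-device audits. The URLs and the 3-second threshold below are placeholders, and this only measures server response time, not full page rendering.

```python
# Illustrative speed check: flag slow server responses for deeper investigation.
# URLs and the threshold are placeholders; dedicated tools measure real
# rendering across browsers and devices.
import time
import requests

urls = ["https://example.com/", "https://example.com/checkout"]

for url in urls:
    start = time.perf_counter()
    response = requests.get(url, timeout=10)
    elapsed = time.perf_counter() - start
    print(f"{url}: HTTP {response.status_code} in {elapsed:.2f}s")
    if elapsed > 3:  # arbitrary threshold for this sketch
        print("  -> flag this page for a closer speed/bug investigation")
```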
Research – Heuristic Analysis.
This is a little more on the ‘touchy-feely’ side of the analysis spectrum, where you and/or your team look to identify ‘areas of interest’. It’s not overly scientific, but it can be a good theorizing exercise for checking relevancy, clarity, motivation, and friction.
Laja suggests simple exercises like assessing clarity around a single action for each page. He references the Fogg Behavior Model, which requires three things to be in place for a conversion to happen:
- High motivation (do customers truly want it?)
- Easy to take action (how quick and simple is it?)
- Clear trigger (when is the call to action presented?)
Research – Digital Analytics.
You can use established digital analytics tools as your baseline health check, and specifically to identify leaks and confirm or refute theories established during heuristic analysis.
- Where are the leaks? Where are the opportunities?
- Which segments (devices, traffic channels, demographics)?
- What are customers doing?
- Which actions correlate to conversion? Do these confirm or refute your thinking around the golden path?
Common tools are Google Analytics (paired with Google Tag Manager) and Heap Analytics (a good fit for product companies).
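As a rough sketch of what this looks like in practice (the CSV export and column names below are assumptions, not a specific Google Analytics or Heap API), a segment-level conversion breakdown quickly surfaces where the leaks might be:

```python
# Hedged sketch of a segment-level leak check using pandas.
import pandas as pd

# Assume an analytics export with one row per session:
# columns: device, channel, converted (0/1).
sessions = pd.read_csv("sessions.csv")

# Conversion rate by device and channel: large gaps point at possible leaks
# worth confirming or refuting against the heuristic-analysis theories.
leaks = (sessions
         .groupby(["device", "channel"])["converted"]
         .agg(sessions="count", conversion_rate="mean")
         .sort_values("conversion_rate"))
print(leaks)
```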
Research – Mouse Tracking Analysis.
Mouse tracking, as a general term, refers to recorded observations of how customers interact with your website, apps, etc. It includes heat maps, click maps, scroll maps, form interaction, and user session video replays. Ultimately, this type of analysis moves beyond the theoretical into an evidence-based approach, confirming things like where customers click, how far they scroll, and how activity differs across devices and browsers.
Popular tools with wide offerings include Hotjar and VWO, with Zuko specializing in form analytics (although both Hotjar and VWO offer solid form analytics tools as well).
Research – User Testing.
This is the practice of using people in a controlled environment to test your website, apps, etc. while you observe in order to identify usability and clarity issues. What’s hard to understand? What tasks are hard to do? What goes wrong?
Generally, user testing focuses on key tasks, asking testers to perform certain tasks that will confirm or refute theories previously developed through things like heuristic analysis. Common task types include:
- Specific task (find info about size 34 men’s jeans)
- Broad task (find a pair of jeans)
- Total funnel to completion (find men’s size 34 jeans, apply a discount, and complete the purchase)
Testing can be conducted in person, but it has become commonplace (and significantly faster and more affordable) to use remote testing tools like UserTesting and TryMyUI.
Research – Qualitative Research.
This research area is primarily driven by the deployment of surveys intended to unlock the ‘why’ behind the ‘what’ discovered through quantitative approaches like analytics and testing.
- How are they deciding to buy?
- What’s holding them back?
- What’s the biggest frustration?
- What else do they want to know?
Laja suggests surveying ‘highly qualified’ people who either just successfully completed a purchase (he says recent first-time buyers respond at a well-above-average rate of around 10%) or who visited but did not complete a purchase.
He also recommends open-ended surveys rather than multiple-choice (which is typically considered the best practice), as he believes open-ended questions present the best opportunity to gather raw feedback without steering customers toward preconceived answers.
The PXL Testing Prioritization Framework
With the data gathered through the ResearchXL process, we are in a position to use a scoring framework to prioritize the tests most likely to create business impact.
Common scoring frameworks like ICE (Impact, Confidence, and Effort) or PIE (Potential, Importance, and Ease) can be effective. But Laja says the problem with these frameworks is they force you to project the impact/potential before you run an experiment – meaning you score things based on guessing what is possible.
The PXL framework instead scores each test idea against a set of criteria with purely objective inputs (yes/no answers and numbers based on facts), so the ideas most likely to produce a result rise to the top.
Experiment ideas entered into the PXL model with no hard facts behind them will score a zero, which takes the guesswork out of prioritization.
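As a minimal sketch of that idea, the snippet below scores hypothetical test ideas against a handful of criteria. These criteria and weights are simplified stand-ins for the actual PXL checklist; the point is that every input is a verifiable yes/no or a fact-based number, so ideas without supporting evidence naturally sink to the bottom.

```python
# Illustrative PXL-style scoring pass (simplified stand-in for the real checklist).
ideas = [
    {"name": "clarify pricing page headline",
     "above_fold": True, "noticeable_in_5s": True,
     "found_in_user_testing": True, "backed_by_analytics": False,
     "ease": 2},  # ease: 0 (hard) to 3 (trivial), from a dev estimate
    {"name": "redesign footer",
     "above_fold": False, "noticeable_in_5s": False,
     "found_in_user_testing": False, "backed_by_analytics": False,
     "ease": 3},
]

def pxl_style_score(idea: dict) -> int:
    binary_criteria = ["above_fold", "noticeable_in_5s",
                       "found_in_user_testing", "backed_by_analytics"]
    # Every criterion is a verifiable yes/no or a fact-based number.
    return sum(int(idea[c]) for c in binary_criteria) + idea["ease"]

for idea in sorted(ideas, key=pxl_style_score, reverse=True):
    print(pxl_style_score(idea), idea["name"])
```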
Testing Programs Overview.
A/B testing is vitally important to any growth program, and as such it has its own course as part of the Growth Marketing program at the CXL Institute. Here, Laja just covers the basics of testing operations.
Overview Metrics of a Testing Program:
It can be tempting to track and report every nuance of a testing program, but to paint the most concise picture of program effectiveness, focus measurement on three things (see the sketch after this list):
- Test Quantity (per week or month).
- Win Rate (% of tests that win).
- Average Uplift per test (a higher average uplift can offset a lower win rate).
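A minimal sketch of how those three numbers might be computed from a log of completed tests; the test names and figures below are made up for illustration.

```python
# Hypothetical log of completed tests (names and numbers are illustrative).
tests = [
    {"name": "headline copy",   "won": True,  "uplift": 0.06},
    {"name": "checkout layout", "won": False, "uplift": 0.00},
    {"name": "social proof",    "won": True,  "uplift": 0.03},
]

test_quantity = len(tests)                               # tests run this period
win_rate = sum(t["won"] for t in tests) / test_quantity  # share of tests that win
wins = [t["uplift"] for t in tests if t["won"]]
avg_uplift = sum(wins) / len(wins) if wins else 0.0      # average lift of winning tests

print(f"{test_quantity} tests, {win_rate:.0%} win rate, {avg_uplift:.1%} avg uplift")
```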
Statistical Significance.
In order to accurately report on individual tests and on your testing program as a whole, you need to reach statistical significance:
- Adequate sample size, calculated from the page’s current conversion rate and the minimum uplift you want to detect (free tools like Optimizely’s sample size calculator tell you how many visitors you need); a sample-size sketch follows this list.
- Adequate business cycles (Laja recommends 4 weeks, starting each cycle at the beginning of the week).
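As a sketch of the underlying math (assuming a standard two-proportion power calculation, which may differ slightly from what a given calculator uses), sample size per variant grows quickly as the baseline conversion rate and the minimum detectable uplift shrink. The baseline rate and uplift in the example are illustrative, not figures from the course.

```python
# Rough two-proportion sample-size estimate per variant (illustrative inputs).
from scipy.stats import norm

def sample_size_per_variant(baseline_cr: float, min_relative_uplift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Visitors needed per variant to detect the given relative uplift."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + min_relative_uplift)
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = ((z_alpha + z_beta) ** 2 * variance) / (p1 - p2) ** 2
    return int(round(n))

# Example: 3% baseline conversion, aiming to detect a 10% relative lift.
print(sample_size_per_variant(0.03, 0.10))  # roughly 53,000 visitors per variant
```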
Recommended Test Sizes.
- 500 transactions per month is the minimum recommended volume for a new venture to run one legitimate test.
- 1,000 transactions per month are recommended before beginning a testing program.
- Do not make the mistake of sizing a single-page test using a site-wide sample calculation.
- If you don’t have enough traffic to get to the needed sample size to ensure a test is valid, then DO NOT run the test.
- You can look to test micro-conversions or start implementing theoretical improvements in order to set a baseline.
Recommended Test Durations.
- Test in full weekly cycles (Monday to Sunday).
- 2 weeks is the recommended minimum, and 4 weeks is ideal.
- High volume sites (10K+ transactions per month) can run shorter duration tests while still maintaining validity.
- Do not test longer than 4 weeks (things like cookie expiration, multiple browser usage, and multiple device usage can result in polluted samples).
- For especially important, business-critical tests (such as new product launches or pivots), do not run any concurrent tests, and ensure the test runs for a minimum of 4 weeks.
General Test Recommendations.
- Run tests constantly, even a few per week if traffic supports it.
- Track and celebrate the small gains – their compound effect is what usually creates actual impact.
- Validate results with 3rd party analytics rather than relying exclusively on testing tool measurements.
- Do not give up after your first hypothesis fails – keep trying unless you have absolutely no data to support your theory.
- Do not run the same tests across mobile and desktop (the user experience differs, and the traffic split between devices can be inconsistent).
- Do not assume you know test outcomes in advance – this is almost never the case.
- Always be on the lookout for validity threats:
  - Instrumentation effect – tests not set up correctly.
  - Selection effect – things like new ad traffic being driven to a test.
  - History effect – something happening in the outside world that impacts results.
Testing Tools.
Hotjar, VWO, and Zuko were mentioned above. A few other tools of note include the machine-learning-focused evolv.ai and Conductrics, plus testing-program-management tools like EffectiveExperiments, Miaprova, and GrowthHackers Projects.
Send your suggestions for Growth Marketing resources to .