- Type 1 and Type 2 Errors
- Statistical significance
- Hypothesis testing
- Type 1 error
- Consequence of type 1 errors
- Type 2 error
- Consequence of type 2 errors
- Let’s sum up.
- CRO glossary: type 1 error
- Type 1 errors vs. type 2 errors
- What causes type 1 errors?
- Why are type 1 errors important?
- How do you minimize type 1 errors?
- 6 ways to find the most important elements to test
- Find the perfect elements to A/B test

## Type 1 and Type 2 Errors

In any experiment, we rely on probabilities to support (or reject) a hypothesis.

When carrying out an A/B test, for example, we are often seeking statistically significant results.

We are great advocates of testing in production and so A/B testing is one effective way to test your features on a select number of users to make sure that they’re working as they should before rolling them out to everyone else.

However, because such tests rest on probabilities, no hypothesis test can be 100% certain. **This is why we may sometimes arrive at wrong conclusions, known as type I and type II errors.**

## Statistical significance

We mentioned the term ‘statistical significance’ which is what any experiment is seeking to find. In the experiments you run, you want to make sure that a relationship actually exists between the variables proposed in your hypothesis, which is the purpose of an A/B test.

You are ultimately seeking to ensure that your A/B tests achieve statistical significance before making any decisions.

If you’ve often carried out A/B tests, then you’re probably familiar with this term as it gives you the tools necessary to make informed decisions to meet your business goals.

For the sake of further clarification, a statistically significant result in such tests means that the result is highly unlikely to have occurred randomly and is instead attributed to a specific cause or trend.

Simply put, statistical significance indicates that the gap between a variation and the control is unlikely to be due to chance alone. The threshold you choose reflects your risk tolerance and confidence level.

In other words, when you run an A/B test at a 95% confidence level, you accept a 5% risk of declaring a winner when there is no real difference between the variations.

However, as with any hypothesis test based on statistics and probabilities, two types of errors can show up in your results.

## Hypothesis testing

Before we delve deeper into type I errors, it would be worthwhile to give an overview of what hypothesis testing is.

Hypothesis testing is when a hypothesis is tested against its opposite to determine whether the data support it. In this case, you have two competing hypotheses: the null hypothesis and the alternative hypothesis.

Therefore, a statistical hypothesis test is used to determine a possible conclusion from two different and conflicting hypotheses.

The null hypothesis posits that there is no relationship between the two proposed phenomena, while the alternative hypothesis is the opposite of what is stated in the null hypothesis.

P-values used in statistical testing help you decide whether to reject the null hypothesis. The smaller the p-value, the stronger the evidence against the null hypothesis. In other words, it tells you how likely it is that data at least as extreme as yours would have occurred under the null hypothesis.

The significance threshold is most commonly set at p < 0.05: if the p-value falls below it, the null hypothesis is rejected.
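To make the p-value concrete, here is a minimal sketch of a two-sided two-proportion z-test, one common way to compare conversion rates. The function name and the visitor and conversion counts are hypothetical, and the test relies on a normal approximation that assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: both variations share one conversion rate."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the best estimate of the shared rate if H0 is true.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a z-score at least this extreme in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: 200/2000 conversions for control, 250/2000 for the variation.
p_value = two_proportion_p_value(200, 2000, 250, 2000)
print(f"p-value: {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

With a threshold of 0.05, a p-value below it leads you to reject the null hypothesis; otherwise you fail to reject it.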

## Type 1 error

One such error is type 1 (or type I) error, also referred to as false positive, which is the wrong rejection of a null hypothesis even though it’s true. In other words, you conclude that the results are statistically significant when they are simply a result of chance or due to unrelated factors.

Simply put, a type 1 error occurs when the tester validates a statistically significant difference when there isn’t one.

In an A/B test, a type 1 error is when you declare a variation the winner even though it is not actually better than the control. In other words, as a false positive, you wrongly believe that a variation has made a statistically significant difference.

Type 1 errors have a probability of “α” (alpha), which is determined by the confidence level you set. For example, if you set a confidence level of 95%, then there is a 5% chance of a type 1 error when the null hypothesis is actually true.
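One way to see what α means is to simulate many A/A tests, where both groups share the same true conversion rate, and count how often a test still comes out significant. The sketch below is illustrative only: the trial count, sample size, conversion rate, and function names are all assumptions, and it uses a two-proportion z-test as the significance check.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, alpha):
    """Two-sided two-proportion z-test with equal group sizes."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0, 1):
        return False  # no variation in the data, nothing to test
    std_err = sqrt(pooled * (1 - pooled) * 2 / n)
    z = ((conv_b - conv_a) / n) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def false_positive_rate(trials=2000, n=500, rate=0.10, alpha=0.05, seed=42):
    """Share of A/A tests (identical variations) wrongly flagged as significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        conv_a = sum(rng.random() < rate for _ in range(n))
        conv_b = sum(rng.random() < rate for _ in range(n))
        hits += is_significant(conv_a, conv_b, n, alpha)
    return hits / trials


fp_rate = false_positive_rate()
print(f"false positive rate: {fp_rate:.3f}")
```

Because the null hypothesis is true in every simulated test, each significant result is a type 1 error, and the observed rate should land close to the chosen α.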

## Consequence of type 1 errors

A type 1 error means wrongly concluding that your variation had an effect when it didn’t. Consequently, the main reason to remain on the lookout for such errors is that they may end up costing your company a lot of money, as they could lead to a loss in sales.

Suppose, for example, that you tested a change in the color of a button on your homepage and noticed early on that the new button led to more clicks. Convinced that this variation made a difference, you end the test early and wrongly conclude that the change in color affects conversion rates.

Thus, you end up deploying this variation to all your users to find that, surprise, it didn’t actually have an impact. The end result is that you could risk hurting your customer conversion rate in the long run.

The best way to avoid such errors may be to increase the test duration and sample size, to ensure that your variation really does outperform the control in the long run.
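The cost of ending a test early can be illustrated with a small simulation. The sketch below runs A/A tests (no real difference), checks significance after every batch of visitors, and compares a tester who stops at the first significant peek with one who only looks at the end. All parameters and names are hypothetical, and the significance check is a two-proportion z-test.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, conv_b, n):
    """Two-sided two-proportion z-test p-value with equal group sizes."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0, 1):
        return 1.0
    std_err = sqrt(pooled * (1 - pooled) * 2 / n)
    z = ((conv_b - conv_a) / n) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(trials=500, looks=10, batch=200, rate=0.10, alpha=0.05, seed=7):
    """Return (false positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        would_stop_early = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if p_value(conv_a, conv_b, n) < alpha:
                would_stop_early = True  # a peeking tester declares a winner here
        peeking_hits += would_stop_early
        final_hits += p_value(conv_a, conv_b, n) < alpha
    return peeking_hits / trials, final_hits / trials


peeking_rate, final_rate = simulate()
print(f"stop at first significant peek: {peeking_rate:.3f}")
print(f"single look at the end:         {final_rate:.3f}")
```

In this setup the peeking tester's false positive rate comes out well above the nominal α, which is why fixing the sample size in advance matters.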

## Type 2 error

Type 2 (or type II) errors, also referred to as false negatives, occur when you fail to reject the null hypothesis even though it is actually false, so you end up wrongly discarding your own hypothesis and variation. Type 2 errors have a probability of β (beta).

In an A/B test, this means that you fail to conclude there was an effect when there indeed was and so no conclusive winner is declared among the control and variations even though there should be one.

In other words, you believe that a variation has made no statistically significant difference: you mistakenly accept the null hypothesis and conclude that a relationship doesn’t exist when it does.

The probability of a type 2 error is inversely related to the statistical power of a test (power = 1 − β), where power is the probability that a test can detect an effect that actually exists. The higher the statistical power, the lower the probability of committing a type 2 error.

Statistical power usually depends on three factors: sample size, significance level, and the “true” value of your tested parameter (that is, the size of the real effect).
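How these three factors interact can be sketched with the standard normal-approximation formula for the power of a two-sided two-proportion z-test. The function name and the conversion rates below are assumptions chosen for illustration, and the formula ignores the negligible chance of a significant result in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_variation, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    pooled = (p_control + p_variation) / 2
    # Standard error if H0 is true (one shared rate) and if H1 is true.
    se_null = sqrt(2 * pooled * (1 - pooled) / n_per_group)
    se_alt = sqrt(
        (p_control * (1 - p_control) + p_variation * (1 - p_variation))
        / n_per_group
    )
    diff = abs(p_variation - p_control)
    # Probability that the observed difference clears the critical value.
    return 1 - norm.cdf((z_crit * se_null - diff) / se_alt)


# Hypothetical scenario: 10% baseline conversion, 12% for the variation.
for n in (500, 2000, 4000, 8000):
    print(n, round(power_two_proportions(0.10, 0.12, n), 3))
```

Power rises with the sample size, with a looser significance level, and with a larger true difference between the variations.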

## Consequence of type 2 errors

Just like type I errors, type II errors can lead to false assumptions and poor decision making by concluding the test too early.

Furthermore, getting false negatives and failing to notice the effect of your variations may lead to missed opportunities to increase your conversion rate.

To reduce the risk of such an error, make sure you increase the statistical power of your test, for example by having a big enough sample size. This would entail gathering more data over a longer period of time to help avoid reaching the false conclusion that your experiment didn’t have an impact when the opposite is true.
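A "big enough" sample size can be estimated before the test starts. The sketch below uses a common normal-approximation formula for comparing two proportions; the function name, baseline rate, and expected uplift are hypothetical inputs you would replace with your own.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_control, p_variation, alpha=0.05, power=0.80):
    """Approximate visitors needed per group for a two-sided z-test."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # guards against type 1 errors
    z_beta = norm.inv_cdf(power)           # guards against type 2 errors
    pooled = (p_control + p_variation) / 2
    numerator = (
        z_alpha * sqrt(2 * pooled * (1 - pooled))
        + z_beta * sqrt(
            p_control * (1 - p_control) + p_variation * (1 - p_variation)
        )
    ) ** 2
    return ceil(numerator / (p_variation - p_control) ** 2)


# Hypothetical scenario: detect a lift from 10% to 12% with 80% power.
n_needed = sample_size_per_group(0.10, 0.12)
print(f"visitors needed per group: {n_needed}")
```

Smaller expected effects or higher target power both push the required sample size up, which is why low-traffic sites often need longer tests.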

The probability of making type I and type II errors is depicted in the image below, where the null hypothesis distribution shows all possible results if the null hypothesis is true while the alternative hypothesis shows all possible results if the alternative hypothesis is true:

As can be seen, type I and type II errors occur where these two distributions overlap.

## Let’s sum up.

Let’s consider these two scenarios:

- If your results demonstrate statistical significance, the data suggest that there is a difference between the variations, and you may reject the null hypothesis. However, this could sometimes be a type 1 error.
- If your results don’t show statistical significance then the null hypothesis cannot be rejected. This could also sometimes be a type 2 error.

In the end, it’s important to strike a balance between the risks of type 1 and type 2 errors. Many argue that type 1 errors may be more damaging, as they could lead to changes that end up wasting resources and costing time and money, while type 2 errors are more about ‘missed opportunities’ (though they can also have significant consequences).

The essential thing to remember is that A/B tests are based on statistical probabilities meaning that the results obtained are never 100% certain.

Nevertheless, these tests serve as a valuable tool to help marketers increase sales and conversion rate so even if your results may not be as certain as you’d like them to be, you can still increase the probability of the test result being true by avoiding the aforementioned errors.

To reduce the probability of error, the key is to increase the sample size and run the test long enough to collect sufficient data, which makes your test results more credible.


## CRO glossary: type 1 error

What is a type 1 error?

Type 1 error is a term statisticians use to describe a false positive—a test result that incorrectly affirms a false statement about the nature of reality.

In __A/B testing__, type 1 errors occur when experimenters falsely conclude that any variation of an A/B or __multivariate test__ outperformed the other(s) due to something more than random chance. Type 1 errors can hurt conversions when companies make website changes based on incorrect information.

### Type 1 errors vs. type 2 errors

While a type 1 error implies a false positive—that one version outperforms another—a type 2 error implies a false negative. In other words, a type 2 error falsely concludes that there is no __statistically significant__ difference between conversion rates of different variations when there actually *is* a difference.
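The two error types can be laid out as a small decision table that compares what is actually true with what the test concluded. This is a minimal sketch, and the function name is hypothetical.

```python
def classify_outcome(effect_is_real: bool, declared_winner: bool) -> str:
    """Label an A/B test conclusion given the (normally unknown) truth."""
    if declared_winner and not effect_is_real:
        return "type 1 error (false positive)"
    if not declared_winner and effect_is_real:
        return "type 2 error (false negative)"
    return "correct decision"


# Print all four combinations of truth and conclusion.
for effect_is_real in (False, True):
    for declared_winner in (False, True):
        label = classify_outcome(effect_is_real, declared_winner)
        print(f"real effect={effect_is_real!s:<5} "
              f"declared winner={declared_winner!s:<5} -> {label}")
```

In practice you never observe `effect_is_real` directly, which is why the error rates have to be controlled through the significance level and statistical power.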


### What causes type 1 errors?

Type 1 errors can result from two sources: random chance and improper research techniques.

**Random chance:** no random sample, whether it’s a pre-election poll or an A/B test, can ever perfectly represent the population it intends to describe. Since researchers sample a small portion of the total population, it’s possible that the results don’t accurately predict or represent reality—that the conclusions are the product of random chance.

__Statistical significance__ measures how unlikely it is that the results of an A/B test were produced by random chance. For example, let’s say you’ve run an A/B test that shows Version B outperforming Version A with a statistical significance of 95%. That means that, if there were no real difference between the versions, a result this strong would show up only about 5% of the time.

You can raise your level of statistical significance by increasing the sample size, but this requires more traffic and therefore takes more time. In the end, you have to strike a balance between your desired level of accuracy and the resources you have available.

**Improper research techniques**: when running an A/B test, it’s important to gather enough data to reach your desired level of statistical significance. Sloppy researchers might start running a test and pull the plug when they feel there’s a ‘clear winner’—long before they’ve gathered enough data to reach their desired level of statistical significance. There’s really no excuse for a type 1 error like this.

## Why are type 1 errors important?

Type 1 errors can have a huge impact on conversions. For example, if you A/B test two page versions and incorrectly conclude that version B is the winner, you could see a massive drop in conversions when you take that change live for all your visitors to see. As mentioned above, this *could* be the result of poor experimentation techniques, but it might also be the result of random chance. Type 1 errors can (and do) result from flawless experimentation.

When you make a change to a webpage based on A/B testing, it’s important to understand that you may be working with incorrect conclusions produced by type 1 errors.

Understanding type 1 errors allows you to:

- Choose the level of risk you’re willing to accept (e.g., increase your sample size to achieve a higher level of statistical significance)
- Do proper experimentation to reduce your risk of human-caused type 1 errors
- Recognize when a type 1 error may have caused a drop in conversions so you can fix the problem

It’s impossible to achieve 100% statistical significance (and it’s usually impractical to aim for 99% statistical significance, since it requires a disproportionately large sample size compared to 95%-97% statistical significance). The goal of CRO isn’t to get it right every time—it’s to make the right choices *most* of the time. And when you understand type 1 errors, you increase your odds of getting it right.

## How do you minimize type 1 errors?

The only way to minimize type 1 errors, assuming you’re A/B testing properly, is to raise your level of statistical significance. Of course, if you want a higher level of statistical significance, you’ll need a larger sample size.
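To get a feel for how much more traffic a stricter significance level costs, the sketch below compares required sample sizes relative to a 95% baseline. It rests on a simplifying assumption: under the normal approximation, with the effect size and power held fixed, the required sample size is proportional to the square of the sum of the two z-scores. The function name and the 80% power default are hypothetical.

```python
from statistics import NormalDist


def relative_sample_size(confidence, baseline_confidence=0.95, power=0.80):
    """Sample size at `confidence` as a multiple of the baseline's."""
    norm = NormalDist()
    z_beta = norm.inv_cdf(power)

    def factor(level):
        # Two-sided test: alpha = 1 - level is split across both tails.
        z_alpha = norm.inv_cdf(1 - (1 - level) / 2)
        return (z_alpha + z_beta) ** 2

    return factor(confidence) / factor(baseline_confidence)


for level in (0.90, 0.95, 0.97, 0.99):
    print(f"{level:.0%} significance: "
          f"{relative_sample_size(level):.2f}x the 95% sample size")
```

The multiplier grows as the significance level approaches 100%, so each extra point of confidence costs more traffic than the last.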

It isn’t a challenge to study large sample sizes if you’ve got massive amounts of traffic, but if your website doesn’t generate that level of traffic, you’ll need to be more selective about what you decide to study—especially if you’re going for higher statistical significance.

Here’s how to narrow down the focus of your experiments.

### 6 ways to find the most important elements to test

In order to test what matters most, you need to determine what really matters to your target audience. Here are six ways to figure out what’s worth testing.

**Read reviews and speak with your Customer Support department**: figure out what people think of your brand and products. Talk to Sales, Customer Support, and Product Design to get a sense of what people really want from you and your products.

**Figure out why visitors leave without buying:** __traditional analytics__ tools (e.g., Google Analytics) can show where people leave the site. Combining this data with Hotjar’s __Conversion Funnels Tool__ will give you a strong sense of which pages are worth focusing on.

**Discover the page elements that people engage with**: __heatmaps__ show where the majority of users click, scroll, and hover their mouse pointers (or tap their fingers on mobile devices and tablets). Heatmaps will help you find trends in how visitors interact with key pages on your website, which in turn will help you decide which elements to keep (since they work) and which ones are being ignored and need further examination.

**Gather feedback from customers**: on-page surveys, __polls__, and feedback widgets give your customers a way to quickly send feedback about their experience your way. This will alert you to issues you never knew existed and will help you prioritize what needs fixing for the experience to improve.

**Look at session recordings**: see how individual (anonymized) users behave on your site. Notice where they struggle and how they go back and forth when they can’t find what they need.

**Pro tip**: pay particular attention to what they do just *before* they leave your site.

**Explore usability testing**: usability testing can help you understand how people see and experience your website. Capture spoken feedback about issues they encounter, and discover what could improve their experience.

**Pro tip**: do you want to improve *everyone’s* experience? That may be tempting, but you’ll get a whole lot further by focusing on your ideal customers. To learn more about identifying your ideal customers, check out our blog post about __creating simple user personas__.

## Find the perfect elements to A/B test

Use Hotjar to pinpoint the right elements to test—those that matter most to your target market.