Understanding Google Analytics Data Sampling

Published March 18, 2015
Have you ever compared different reports from the same Google Analytics account and noticed the numbers don’t quite match up? Has it ever made you question your sanity or made you wonder if you’re working a little too hard?
We’ve all been there.
Thankfully, you’re (probably) not losing your mind; you’re seeing data sampling.
Data sampling is an analysis technique that uses a smaller subset of your data to identify larger patterns and trends. Google Analytics uses data sampling to speed up its queries and calculations when your website has a large volume of analytics data in storage. This most commonly affects high-traffic sites, where Google has to store and process a very large number of “hits.”
So, if your site sees a high volume of Sessions, the variances you’re seeing likely reflect results drawn from a smaller sample of your actual data. In other words, data sampling.
Of course, conflicting numbers in reports can generate concern, especially when you want to ensure you’re always presenting the most accurate data available to your client or boss.
Let’s take a closer look at what data sampling is, how to identify when it has occurred, and how to address it.

Blog Image Google Analytics Data Sampling

 

How Does Google Analytics Sample Data?

Think about it – if your site receives thousands of Sessions a day, that’s a large volume of data for Google to process. To be able to efficiently serve marketers the reports they need and keep Analytics free, Google uses a random sample of your full data set to estimate the metrics for a high traffic site. This allows Google to quickly generate reports on the spot, including those requiring extra customization and processing power.

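Note that this sampling kicks in when more than 500,000 Sessions occurred in an Analytics property over the timeframe being viewed and the report needs extra processing, such as a secondary dimension or a segment. Standard reports without that kind of customization are served unsampled.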

For example, let’s take a look at a Mobile Overview report for a site that received more than a million Sessions within the past two months. We see a basic breakdown of mobile, tablet and desktop Sessions, in a report showing unsampled data for all metrics.

 

Google Analytics Mobile Overview - Unsampled

 

Now, say we want to see where users from these devices came from. We’ll apply a secondary dimension to add Default Channel Grouping to the report. This will further break down the data to show which channels, such as Social, Organic Search or Paid Search, contributed to Sessions by device category. The screenshot below shows the same report with this dimension added.

 

Google Analytics Mobile Overview - Sampled

 

A yellow bar alerts us that this report is based on about 16% of total Sessions for this time period. This means that Google Analytics is calculating the metrics in this report from a randomly selected subset of the Sessions and using the results to estimate the values for all the Sessions.
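To make the idea concrete, here is a minimal sketch of estimating a total from a random subset. It assumes plain simple random sampling, which may not match what Google Analytics does internally, and the data and the function name `estimate_from_sample` are hypothetical.

```python
import random

def estimate_from_sample(sessions, sample_rate, seed=0):
    """Estimate how many sessions match, using only a random subset.

    `sessions` is a list of 0/1 flags (1 = the session matches, e.g. mobile).
    This is an illustration of simple random sampling, not GA's algorithm.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample_size = round(len(sessions) * sample_rate)
    sample = rng.sample(sessions, sample_size)
    matches_in_sample = sum(sample)
    # Scale the sample count back up to the full data set.
    return round(matches_in_sample * len(sessions) / sample_size)

# Hypothetical site: 100,000 sessions, 30,000 of them from mobile devices.
sessions = [1] * 30_000 + [0] * 70_000

estimate = estimate_from_sample(sessions, sample_rate=0.16)
full_count = estimate_from_sample(sessions, sample_rate=1.0)
print(f"Estimated from a 16% sample: {estimate}")
print(f"Counted from all sessions:   {full_count}")
```

The estimate from the 16% sample should land close to the true count but rarely on it exactly, which is the kind of small mismatch the yellow bar is warning about.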

Above the yellow bar, you’ll see a symbol that looks like a grid of boxes. Selecting this will allow you to adjust the accuracy of sampling. “Faster processing” means that Analytics will use fewer Sessions to calculate metrics, resulting in less accurate numbers. “Higher precision” will use more Sessions to calculate metrics, while possibly increasing the time necessary to create the report.

 

Google Analytics - Controlling Sampling

 

We’ll move the slider all the way to the right for the highest precision possible. Once we apply the change, we see the report data change once again.

 

Google Analytics - Sampling with High Precision

 

Now the yellow bar at the top tells us the report is based on close to 500,000 Sessions, or 29% of total Sessions. If we compare the first example (data sampled at a normal level) with the second (data sampled at the highest precision), we see several discrepancies in numbers. Every number in the first report differs to at least some extent from the second report. We can note a few specific differences:

  • Total Sessions increases by a single Session in the report with less precision.
  • The number of New Users is greater in the report with higher precision.
  • Mobile Sessions from Social are greater in the report with higher precision.
  • Desktop Sessions from Social are greater in the report with lower precision.
  • Overall Bounce Rate varies by a tenth of a percent.
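Differences of this size are roughly what sampling theory would predict. The sketch below uses the standard margin-of-error formula for a proportion to compare the two precision settings. The total of 1,700,000 Sessions and the 50% bounce rate are hypothetical, and the formula assumes simple random sampling and ignores the finite-population correction, so treat the results as a rough guide only.

```python
import math

def margin_of_error(rate, sample_size, z=1.96):
    """Approximate 95% margin of error for a rate estimated from a sample."""
    return z * math.sqrt(rate * (1 - rate) / sample_size)

total_sessions = 1_700_000  # hypothetical total for the date range
bounce_rate = 0.50          # hypothetical overall bounce rate

default_sample = round(total_sessions * 0.16)  # report based on 16% of Sessions
high_sample = round(total_sessions * 0.29)     # report based on 29% of Sessions

default_moe = margin_of_error(bounce_rate, default_sample)
high_moe = margin_of_error(bounce_rate, high_sample)

print(f"Default precision: +/- {default_moe * 100:.2f} percentage points")
print(f"Higher precision:  +/- {high_moe * 100:.2f} percentage points")
```

Under these assumptions both margins come out well under half a percentage point, and the higher-precision setting gives the narrower one. Metrics based on small slices of traffic, such as Social Sessions on one device type, rest on far fewer sampled Sessions and will swing more.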

What Does Analytics Data Sampling Mean to You?

We see that the numbers can vary when sampling kicks in, but what do these differences mean to you as an analytics professional? You should be aware of the potential for data sampling to impact your analytics reports in a number of ways.

Understand Numbers Won’t Always Match Up

First, simply be aware that when looking at large volumes of data, numbers may not match up 100% throughout the account. While analytics provide invaluable data about website performance, take into account the potential for variance. Numbers for the same metrics can vary depending on what reports you’re viewing, what segments or secondary dimensions you’ve applied, and what precision level of sampling you’re using.

For most purposes, the variations resulting from sampling are not material. They usually do not have any impact on the insights to be gained from the analytics. For example, suppose your reports show that conversion rates are lower for mobile users than desktop users. Sampling variance may show mobile at 25% lower than desktop at one point and 24.3% lower at another. But the basic result, that conversion rates are significantly lower on mobile, holds regardless of the sampling.

Less Concern for Low Traffic Sites

If your site doesn’t receive a high level of traffic (say, fewer than 1,000 Sessions per month), you likely won’t have to worry about the effects of data sampling, as a report needs to be analyzing more than 500,000 Sessions for sampling to kick in. However, keep in mind that even a site with what may not seem like a heavy volume of Sessions can still encounter sampling when the date range covers a long period of time, such as several years of data.
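As a rough rule of thumb, you can work out how long a date range has to be before it crosses the threshold. This sketch assumes traffic is constant from month to month, which real sites rarely manage, and the function name `months_until_sampling` is made up for illustration.

```python
SAMPLING_THRESHOLD = 500_000  # Sessions in the selected date range, per this article

def months_until_sampling(sessions_per_month, threshold=SAMPLING_THRESHOLD):
    """Smallest whole number of months whose total Sessions exceed the threshold.

    Assumes the same number of Sessions every month.
    """
    return threshold // sessions_per_month + 1

# A site with 20,000 Sessions a month crosses the line in a little over two years.
print(months_until_sampling(20_000))  # 26

# At 1,000 Sessions a month, the date range would need to span decades.
print(months_until_sampling(1_000))   # 501
```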

Shorter Periods of Time, More Accurate Data

If a report indicates that data is being sampled, and you really want to avoid sampling, you can break that report into shorter periods of time that fall under the 500,000 Session threshold. For example, say that we want to view how many Sessions from mobile devices resulted in newsletter signups in the month of January. As shown earlier, we set up a report with a secondary dimension to view this data, only to receive another message that sampling is occurring.

 

Google Analytics - Mobile Traffic in January

 

In the screenshot above, we see a total of 801 newsletter signups coming from Mobile Sessions that arrived via Social channels. However, we know from the sampling warning that this number may not be entirely accurate. To get around that, we can split the time period up and look at the first half and the last half of the month separately. When viewed separately, these periods of time do not meet the 500,000 Sessions threshold.

First, January 1-15 shows 582 newsletter signups for this subset.

 

Google Analytics Mobile Traffic - First Half January

 

Next, January 16-31 shows 203 newsletter signups for this subset.

 

Google Analytics Mobile Traffic - Second Half January

 

Adding the number of signups from these date ranges together, 582 + 203 = 785. Our final number is 16 lower than the initial estimate of 801 from the sampled data, a difference of about 2%.
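If you do this often, the split-and-add routine is easy to script. The sketch below only handles the date arithmetic and the final sum. The helper `split_range` is hypothetical, the year 2015 is an assumption, and the signup counts are simply the figures read off the two reports above.

```python
from datetime import date, timedelta

def split_range(start, end, parts):
    """Split an inclusive date range into `parts` contiguous inclusive chunks."""
    total_days = (end - start).days + 1
    if parts < 1 or parts > total_days:
        raise ValueError("parts must be between 1 and the number of days")
    base, extra = divmod(total_days, parts)
    ranges = []
    cursor = start
    for i in range(parts):
        # Any leftover days go to the last chunks, so January splits 15 + 16.
        length = base + (1 if i >= parts - extra else 0)
        chunk_end = cursor + timedelta(days=length - 1)
        ranges.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return ranges

halves = split_range(date(2015, 1, 1), date(2015, 1, 31), 2)
for chunk_start, chunk_end in halves:
    print(chunk_start, "to", chunk_end)

# Signups read from the two unsampled half-month reports.
unsampled_counts = [582, 203]
unsampled_total = sum(unsampled_counts)

sampled_estimate = 801  # from the sampled full-month report
overestimate = sampled_estimate - unsampled_total
print(f"Unsampled total: {unsampled_total}, sampled estimate was {overestimate} higher")
```

If a chunk still triggers the sampling warning, the same function can split the month into more parts until each one falls under the threshold.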

Consider Google Analytics Premium

If your data is frequently limited by sampling due to high traffic volumes, you can upgrade to Google Analytics Premium, which sets a much higher threshold before data begins to be sampled (25 million Sessions as opposed to 500,000). However, the $150,000 annual cost makes this a viable option primarily for larger enterprises.

Conclusion

While data sampling falls under the more technical aspects of analytics, understanding it on a basic level will help guide your interpretation of Google Analytics data. When preparing reports for clients or your boss, note when data has been sampled, remember that sampled numbers are estimates drawn from a smaller subset of data, and compile data from shorter date ranges when you need more accurate figures.
