Deciding How to Represent Website Data – Part 2: Segmenting Aggregate Data

Published February 27, 2015
Web analytics guru Avinash Kaushik famously wrote that all data in aggregate is "crap". Time on site. All visits. Total revenue. None of it tells a good story. As Avinash says, the only way to get insight from data is to segment it into parts.
For example, it is mildly useful to know a website receives 15,000 Sessions (visits) per month. That aggregate metric gives you a rough idea of scale, but not much else. If you knew that over 50% of those Sessions came from Organic Search, you’d start to get some insight into what makes the website tick. Clearly, content marketing and SEO are an important part of the story for such a website.
Breaking down Sessions by Acquisition Channel (e.g., Organic Search, Referral, Social) is an example of how we can start segmenting aggregate data – Avinash’s recommended path to enlightenment.
But once you are enlightened, how do you communicate those insights to others? Using the right data visualization helps.
In Part 1 of our series on representing website data, we looked at different ways to visualize time-series web analytics data. In this, the second part of a three-part series, we look at aggregate data for fixed periods of time – how to segment it and how to best represent the segmented data to communicate your insights.



Segmenting Data by Dimensions

Google Analytics reports are made up of dimensions and metrics. Metrics are the quantitative measurements; dimensions are attributes of the data being measured that describe characteristics of your users. Consider, for example, the metric Sessions (website visits). One dimension is the City that the Session comes from. When I’m sitting in New York and I visit your website, Google Analytics will assign the value “New York” to the City attribute of that Session.

We already mentioned another example above – the Acquisition Channel attribute is assigned a value that indicates the method by which the visitor arrived at your website. A dimension divides your data into mutually exclusive parts. That is, for any given Session the attribute can have only one value. A Session cannot have a City attribute of both New York and San Francisco.

Because dimension values are mutually exclusive, we can use a pie chart to represent the segmentation of data by dimension. The parts don’t overlap, so they always add up to 100%.
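To make the "parts add up to 100%" point concrete, here is a minimal Python sketch of segmenting Sessions by a dimension. The Session records and their counts are made up for illustration, and `segment_shares` is a hypothetical helper of our own, not part of Google Analytics or Megalytic.

```python
from collections import Counter

# Hypothetical data: 10 made-up Sessions, each with exactly one Channel value.
hypothetical_counts = {"Organic Search": 6, "Social": 2, "Paid Search": 1, "Referral": 1}
sessions = [
    {"channel": channel}
    for channel, n in hypothetical_counts.items()
    for _ in range(n)
]

def segment_shares(records, dimension):
    """Return {dimension value: percent of all records}, largest segment first."""
    counts = Counter(record[dimension] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100.0 * n / total for value, n in counts.most_common()}

shares = segment_shares(sessions, "channel")
for channel, pct in shares.items():
    print(f"{channel:<15} {pct:5.1f}%")

# Each Session lands in exactly one segment, so the shares sum to 100%.
print(f"Total: {sum(shares.values()):.1f}%")
```

Because every record contributes to exactly one segment, the percentages are safe to draw as slices of a single pie.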

Despite the pie chart’s bad rap, there is one thing a pie chart can do better than any other chart. It can help your audience visualize the relationship between the parts and the whole.


Traffic by Channel Pie Chart


For example, the above pie chart helps us immediately see Organic Search contributes more than half of the Sessions to this site. We can also see no other Channel is nearly as important.

The knock against pie charts is that it is difficult to tell the relative size differences between the slices. For example, without the percentage labeling, would it be immediately obvious that Social is 25% larger than Paid Search, or almost double the size of Referral? Probably not.

If illustrating the relative sizes of the dimension values is your primary goal – rather than presenting the size relative to the whole – then a bar chart may tell the clearer story.


Traffic by Channel Bar Chart


Above we see the same data represented using a bar chart. Here, it is clearer that Social is larger than Paid Search and almost double the size of Referral. You can also see – just as in the pie – that Organic Search is by far the largest. What you cannot see in this bar chart is that Organic Search is more than 50% of the total. To see that, you’d have to mentally rearrange and stack the smaller bars on top of each other to “see” they don’t add up to the blue bar. That mental feat is well beyond me (and probably your intended audience), which is why I find the pie chart useful. It’s the only way to clearly visualize that Organic Search has more than 50% of the total.
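The part-to-part comparison a bar chart makes easy can also be written out as simple arithmetic. The sketch below uses hypothetical Session counts, chosen only so that they mirror the relationships described above (they are not the figures behind the charts), and `percent_larger` is an illustrative helper of our own.

```python
def percent_larger(a, b):
    """How much larger a is than b, expressed as a percentage of b."""
    return 100.0 * (a - b) / b

# Hypothetical Session counts, not the real numbers behind the charts.
channel_sessions = {"Social": 1500, "Paid Search": 1200, "Referral": 800}

social_vs_paid = percent_larger(
    channel_sessions["Social"], channel_sessions["Paid Search"]
)
social_vs_referral = channel_sessions["Social"] / channel_sessions["Referral"]

print(f"Social is {social_vs_paid:.0f}% larger than Paid Search")
print(f"Social is {social_vs_referral:.2f}x the size of Referral")
```

Note that these ratios compare segments to each other and say nothing about the whole, which is exactly the trade-off between the bar chart and the pie chart.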

Changing Chart Types in Megalytic

In case you hadn’t guessed, the example charts shown here are produced from Google Analytics data by the Megalytic reporting tool. If you’d like to follow along with your own data, you can create a 14-day trial account (no credit card required).

Megalytic makes it easy to change between chart styles using the chart type selector in the widget editor.

Geographic Dimensions

Geographic Region is another frequently used dimension. Besides showing where your website visitors are located, geographic data can yield important marketing insights. For example, if you notice your website receives a lot of traffic from Germany, you may decide to provide content written in German – particularly if you notice German traffic converts at a lower rate than traffic from English-speaking countries.

Google Analytics provides a variety of geographic dimensions, including Continent, Country, Region and City (among others). A unique property of geographic data is that it can be represented on a map.


Web Traffic by Country in Europe - Map


Here, we are using a map to illustrate the countries, and the size of the circle to indicate the relative amount of traffic coming from each location. Using this type of visualization, the reader can quickly see the United Kingdom is the largest source of traffic – significantly larger than all the other European countries. We use a legend on the right-hand side to indicate the country names and the exact Session counts. This is a useful way to clarify the data, as not everyone knows the location of every country. For example, without the legend, it might be difficult to tell there are more visitors from the Netherlands (801) than Belgium (368).
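If you build a map like this yourself, you need a rule for turning Session counts into circle sizes. The source does not say how Megalytic scales its circles, so the sketch below is only one common convention and an assumption on our part: make the circle's area, rather than its radius, proportional to Sessions. The `circle_radius` helper and the 40-unit maximum radius are hypothetical; the two Session counts are the ones quoted above.

```python
import math

def circle_radius(sessions, max_sessions, max_radius=40.0):
    """Radius chosen so that circle AREA is proportional to sessions.

    max_radius is a hypothetical size (e.g., pixels) for the largest circle.
    """
    if max_sessions <= 0:
        return 0.0
    return max_radius * math.sqrt(sessions / max_sessions)

# Session counts quoted in the text for two countries.
country_sessions = {"Netherlands": 801, "Belgium": 368}
largest = max(country_sessions.values())

radii = {
    country: circle_radius(n, largest)
    for country, n in country_sessions.items()
}
for country, radius in radii.items():
    print(f"{country:<12} sessions={country_sessions[country]:>4} radius={radius:.1f}")
```

Scaling the radius directly would exaggerate differences, because area grows with the square of the radius. Taking the square root keeps the visual weight of each circle in line with its share of traffic.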

When to use Tables

Data visualization is great, but sometimes it just makes sense to use a table to show data – even geographic data. This is particularly true when you want the reader to process multiple metrics at the same time.

As mentioned above, geographic data can provide useful marketing insights relating to language. As we can see from the map visualization, Germany is the second largest source of Sessions. However, as can be seen in the table below, Germany has a conversion rate (2.37%) that is far below the site average (3.96%).

The hypothesis backed up by this data is that because this website is English-only, engagement from non-English-speaking countries will be low. To make this point, I want to show the reader several metrics across the top European countries: Sessions, Avg Session Duration, Completions and Conversion Rate.


Web Traffic by Country in Europe - Table


In this case, a table is a good choice because it is a straightforward way to show four metrics per Country – one in each column. When using a table to make a point like this, it’s a good idea to include a text description of the insight the data shows. In this case, we might write something like this to go with the table:

The data clearly show that this website could benefit from local language content. It seems that there is a particularly significant opportunity in Germany, where we received over 1,000 visits. As shown by the low Avg Session Duration (average length of visit) of 2:12, the engagement is much lower than in the English-speaking United Kingdom. This lack of engagement translates into a lower Conversion Rate: only 2.37% in Germany vs 5.53% in the United Kingdom and 5.04% in Ireland. In fact, conversion rates are also well above average in Belgium (5.98%) and Sweden (4.52%), where English is widely spoken.
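The comparison behind that write-up can be checked with a few lines of Python. This is a minimal sketch using only the conversion rates and the site average quoted in this post; `compare_to_average` is an illustrative helper of our own, not a Google Analytics or Megalytic function.

```python
SITE_AVERAGE = 3.96  # site-wide Conversion Rate in percent, as quoted in the text

# Conversion Rates (percent) quoted in the text for each country.
conversion_rates = {
    "United Kingdom": 5.53,
    "Germany": 2.37,
    "Ireland": 5.04,
    "Belgium": 5.98,
    "Sweden": 4.52,
}

def compare_to_average(rates, average):
    """Return (country, rate, difference from average) rows, lowest rate first."""
    rows = [
        (country, rate, round(rate - average, 2))
        for country, rate in rates.items()
    ]
    return sorted(rows, key=lambda row: row[1])

rows = compare_to_average(conversion_rates, SITE_AVERAGE)
below = [country for country, rate, diff in rows if diff < 0]

for country, rate, diff in rows:
    flag = "below average" if diff < 0 else "at or above average"
    print(f"{country:<15} {rate:5.2f}%  {diff:+.2f} pts  {flag}")
```

Sorting by rate puts the weakest country at the top of the list, which is usually the row you want the reader to notice first.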


When looking at non-time series data about your website, the key to insight is segmenting along dimensions. Depending on what point you want to communicate, the dimension values (e.g., Channels, Countries) can be represented as slices of a pie, bars in a chart, circles on a map, or rows in a table. Rather than getting caught up in dogma (e.g., “pie charts are bad”), let the insights you want to communicate guide your decisions about how to present the data.

Next in our data visualization series we’ll look at how to represent data that compares results across time periods.

Miss the first post in this series? Catch up!

