6 Common mistakes with data visualization
Posted on 19 Nov 2013. Written by Sam Davies
One of the core concepts I discussed in my talk at AppsWorld last week was the importance of the visual appearance of a UI control, and how it is perceived by the human visual system of the users. I demonstrated how we can use the way the human visual system works to our advantage when designing a control. It is this aspect of the talk which people have the most questions about – principally because it’s a concept which is new to many developers.
Data visualization is a very simple concept – the display of some kind of raw numerical information (data) in a graphical format (visualization). This process is so important because of the way the human visual system and brain work together. A list of numbers (maybe in a table) is difficult to understand – in order to comprehend numbers we have to invoke high-level brain functions, and this makes pattern spotting incredibly difficult. Our brains are, however, really good at making sense of shapes – meaning we are able to spot patterns and relationships when data is displayed visually.
Having said this, it’s really important that the process of data visualization is carefully considered – when a visualization is done badly, it’s very easy to fool the same mechanisms which make the brain good at reading it.
1. Pie charts with percentages that don’t add up
Pie charts are designed to show the proportional breakdown of some conceptual ‘whole’ into different categories. The categories should be exhaustive, and mutually exclusive of each other. Therefore the sum of the percentages each segment represents should always total 100%.
It’s quite common to get this wrong for various reasons, the simplest probably being a rounding error, or miscalculation, as demonstrated below:
This chart also suffers from the segment sizes not matching the specified values – probably just a production mistake.
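The rounding problem is easy to reproduce and to avoid. The following is a minimal Python sketch, not taken from any particular charting library: it compares naive per-segment rounding with the largest-remainder method, which floors every share and then hands the leftover percentage points to the shares with the biggest fractional parts. The function name and the sample counts are made up for illustration.

```python
from math import floor


def rounded_percentages(counts):
    """Round each category's share to a whole percent so the labels total 100.

    Largest-remainder method: floor every share, then give the leftover
    points to the shares with the largest fractional parts.
    """
    total = sum(counts)
    if total <= 0 or any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative with a positive total")
    exact = [100 * c / total for c in counts]
    result = [floor(x) for x in exact]
    leftover = 100 - sum(result)
    # Indices ordered by descending fractional part.
    order = sorted(range(len(counts)),
                   key=lambda i: exact[i] - result[i],
                   reverse=True)
    for i in order[:leftover]:
        result[i] += 1
    return result


counts = [1, 1, 1]  # hypothetical data: three equal categories
naive = [round(100 * c / sum(counts)) for c in counts]
print("naive labels:", naive, "total", sum(naive))
fixed = rounded_percentages(counts)
print("largest remainder:", fixed, "total", sum(fixed))
```

With three equal categories the naive labels total 99, whereas the adjusted labels total 100. Which segment receives the extra point is a design choice; the important thing is that the printed labels agree with the whole.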
The second reason that percentages don’t add up to 100% is when non-mutually exclusive categories are plotted on the same chart. This is demonstrated in the following chart:
The chart is measuring information about businesses and, because of the way the categories are designed, the same business can be counted in all 3 of the categories (i.e. a business which survives 10 years must have already survived both 5 and 2 years). In this instance, changing the categories to mutually exclusive age bands (for example “under 2 years”, “2 to 5 years”, “5 to 10 years” and “over 10 years”) would help. Even then, unless the chart only contains failed companies it is questionable how useful it is, since a new company, which might go on to last for 100 years, will currently be in the “<2 years” category, which is probably misleading.
Given that it’s not possible to plot non-mutually exclusive categories on the same pie chart, there are lots of different attempts to adapt a pie chart to allow it. This can sometimes work, but more often than not it will not add anything to the raw data itself. For example, in the following chart there are 2 arcs which represent 40%, but one of them looks significantly larger than the other:
The purpose behind this data is to make people realize that there is a reasonable amount of unhappiness out in the general population – the relative values of the different categories aren’t really of much interest. In this instance the values themselves are more powerful than attempting to visualize them.
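As a rough illustration of mutually exclusive age bands, the Python sketch below assigns each business to exactly one band, so the resulting shares must total 100%. The band edges reuse the 2-, 5- and 10-year thresholds from the chart's categories; the labels, the choice of which edge is inclusive, and the list of ages are assumptions made for this example.

```python
from collections import Counter

# Hypothetical band edges built from the 2-, 5- and 10-year thresholds.
BANDS = [
    (0, 2, "under 2 years"),
    (2, 5, "2 to 5 years"),
    (5, 10, "5 to 10 years"),
    (10, float("inf"), "10 years or more"),
]


def age_band(age_years):
    """Return the single band containing age_years (lower edge inclusive)."""
    for low, high, label in BANDS:
        if low <= age_years < high:
            return label
    raise ValueError("age must be non-negative")


ages = [1, 3, 3, 7, 12, 25]  # made-up business ages in years
band_counts = Counter(age_band(a) for a in ages)
shares = {label: 100 * n / len(ages) for label, n in band_counts.items()}
for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```

Because every age falls into one and only one band, the counts add up to the number of businesses and the shares describe a genuine whole.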
2. Pie charts representing negative values
This is very much a corollary of the last point about pie charts – they’re meant to represent the breakdown of a concept of ‘wholeness’, and therefore it’s not possible to represent negative values. A negative proportion doesn’t make any sense – a proportion of a whole always lies between 0 and 1.
The root cause of this (along with a whole host of data visualization problems) is Microsoft Excel, which will happily create a pie chart with negative values. It does this by looking solely at the magnitude of the values to be plotted – and ignoring the sign.
The following is a chart which demonstrates this problem quite well:
The data here would be much better suited for a categorical column chart, enabling relative comparison of the different categories – including the important information about whether it is negative or positive.
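One way to guard against this in your own code is to validate the data before it ever reaches a pie chart. The sketch below is a hypothetical helper, not part of any charting product mentioned here: it converts values to fractions of the whole and raises an error on negative input rather than silently plotting magnitudes.

```python
def pie_fractions(values):
    """Convert raw values to fractions of the whole, refusing invalid input."""
    if any(v < 0 for v in values):
        raise ValueError("negative values cannot be shown as pie segments")
    total = sum(values)
    if total == 0:
        raise ValueError("values sum to zero; there is no whole to divide")
    return [v / total for v in values]


print(pie_fractions([30, 50, 20]))

try:
    pie_fractions([30, -50, 20])  # made-up data with one negative entry
except ValueError as err:
    print("rejected:", err)
```

Failing loudly at this point is a prompt to pick a chart type, such as a column chart, that can show the sign of each value.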
3. Pie charts with too much information
Pie charts are good at demonstrating 2 scenarios:
- One category is significantly larger than all the others.
- The distribution between categories is roughly equal.
It’s a common misconception that pie charts are suitable for displaying any categorical data. However, if there are lots of segments which have significant variance then you have a couple of problems:
- Pie charts are meaningless if the reader can’t determine which category each of the sections represents. If there are lots of segments then there is no room to label each segment, and therefore the reader is forced to rely on looking up what the color coding means in the key. There is a limit to the number of colors which are easily differentiable when they aren’t adjacent. In the below example, which sector represents ‘Russian’ and which ‘Korean’, or ‘Japanese’?
- It’s not possible to compare the relative sizes of the different sections. For example in the chart below, which of the 3 orange segments is the smallest? Our brains are very good at following lines – i.e. determining which of 2 columns is taller on a column chart, but really not very good at estimating area – i.e. choosing which of 2 pie chart segments is larger.
The data displayed in the above chart would have been much better suited to a column chart – this would allow easy labeling along the axis, and comparison between the different categories.
4. Incomprehensible units
Charts need to have associated units to enable a deeper understanding of the data they represent. Although patterns and comparative information can be inferred without units, it’s not possible to establish whether these are truly significant, or to determine the meaning behind the data.
The following chart has a sensible horizontal axis, and a labeled vertical axis, but it’s not clear what the units are. What does it mean to suggest that a value of 20 Obamas is equivalent to a value of 20 for money?
A description of what the axis units are and what they represent would be really helpful for this chart.
This chart also suffers from point 6 below, in that there is a lot of unnecessary art applied, and it is rendered with a pseudo-3D effect, which detracts from the chart and makes it more difficult to read.
5. Inconsistent scaling
One of the major reasons that data visualization is so powerful is the fact it leverages the inherent ability of the human brain to perform visual comparison very quickly. In order to do this it’s very important that the scaling of the axes is coherent, both with themselves and with associated charts. Discontinuities in axes can sometimes be necessary, but it’s important to remember that introducing a discontinuity means that the power of comparison has been lost.
The following example has discontinuities in the vertical axes of both of the charts. This means that on the left chart we can compare the first three categories with each other, then the next 2 categories with each other, and the final category cannot be compared. The discontinuities aren’t just breaks in the axis; they represent a rescaling of the axis from that point forward (e.g. the difference between 22 and 116 looks larger than that between 550 and 2868):
Having 2 charts next to each other with ‘vs.’ in the middle implies that they should be compared. But this isn’t really possible, since both the scales and discontinuities are different. The implication is they share a similar shape, but with enough discontinuities and axis ‘rescalings’ it would be possible to get any monotonically increasing sequence of 6 numbers to match this shape.
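To put numbers on the rescaling, the gap between 22 and 116 is 94, while the gap between 550 and 2868 is 2318, roughly 25 times larger. The Python sketch below shows how those four values would sit on a single continuous linear axis starting at zero; the 100-unit plot height and the function name are arbitrary choices for illustration.

```python
def linear_height(value, axis_max, plot_height=100.0):
    """Height of a column on a continuous linear axis that starts at zero."""
    return plot_height * value / axis_max


values = [22, 116, 550, 2868]  # the four values quoted in the text
heights = [linear_height(v, axis_max=max(values)) for v in values]

gap_small = heights[1] - heights[0]  # step from 22 to 116
gap_large = heights[3] - heights[2]  # step from 550 to 2868
print(f"22 to 116:   {gap_small:.1f} units")
print(f"550 to 2868: {gap_large:.1f} units")
print(f"ratio: {gap_large / gap_small:.1f}x")
```

On a consistent axis the second step dwarfs the first, which is the opposite of the impression the rescaled chart gives.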
6. Unnecessary embellishments and artistic flair
There is a tendency when creating charts and graphs to think that they look boring, and would benefit from some funky graphics. More often than not this detracts from the data, and will make it incredibly difficult to understand. Sometimes it can actually make the data appear incorrect.
The following example shows a simple column chart, where the columns are represented by scaled pictures of people instead of a standard rectangle:
The fact that images are used for the columns here leads the reader to believe that they mean something. This is not the case – it’s just a column chart. The first column is a different color to the rest of them, but again it’s not really clear what this represents. The height of the 2007-08 column is actually determined by the top of the head, rather than the paint brush.
Infographics can be particularly bad at attempting to visualize data in a way which looks exciting and interesting, but makes little sense. The following is some artwork which represents a comparison of different countries:
Point 5 looked at inconsistent scaling, and this graphic demonstrates that rather well – the shopping bags are not even remotely scaled to the numbers which they represent. Even if they were, it’s very difficult for us to estimate relative size of areas – never mind projected volumes. It’s also not clear whether the relative position of each of the bags is significant. A simple column chart would have made this data a lot more comprehensible.
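If pictures really must stand in for columns, the scaling at least needs to be deliberate. The sketch below is a hypothetical helper showing the arithmetic: to make an icon's area proportional to a value, its width and height should each grow with the square root of the ratio, and for an apparent volume each edge should grow with the cube root. The values used are made up.

```python
def icon_scale(value, reference, dimensions=2):
    """Linear scale factor that makes an icon's area (dimensions=2) or
    apparent volume (dimensions=3) proportional to value / reference."""
    if value < 0 or reference <= 0:
        raise ValueError("value must be non-negative and reference positive")
    return (value / reference) ** (1 / dimensions)


# Hypothetical example: one figure is 4 times the reference figure.
print(icon_scale(4, 1, dimensions=1))  # plain column: scale the height only
print(icon_scale(4, 1, dimensions=2))  # flat icon: scale width and height
print(icon_scale(4, 1, dimensions=3))  # 3D-looking bag: scale every edge
```

Even with correct scaling the reader still has to judge areas or volumes, so a plain column chart remains the easier option to read.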
With the onset of mobile devices, data is not only becoming a lot more accessible, but also far more fleeting – users might check their sales breakdown multiple times a day, but only very briefly. Therefore data visualization is becoming ever more important, and more demand is being placed on developers to produce apps within which data visualization plays a significant role. This is precisely why we made ShinobiControls – to help developers create stunning data visualizations. That doesn’t mean that it’s not possible to abuse them – it’s important to bear in mind the points above. You can download a trial of both our iOS and Android charting products – why not see what kinds of cool mobile visualizations you can come up with yourself?
Maybe you have some data visualization tips of your own – the list I’ve presented here is by no means exhaustive. Let me know any of your pet hates – either in the comments below, or on Twitter, where I’m @iwantmyrealname.
Examples of data visualization courtesy of wtfviz.net.