The shinobicharts Advanced Charting Kit is designed to complement the performance of our native iOS charts with a broader feature set, extending their built-in functionality. In this series, we’ll examine the features on offer and the benefits they can bring to your apps.
The Advanced Charting Kit is included in the trial download and also available for download in the customer portal.
In part 1, we focus on sampling and how it can help present large and often unwieldy data sets. Hopefully, it will become apparent that although shinobicharts can effortlessly cope with many hundreds of thousands of data points, less is often more: the aim is to display a useful representation to the user.
The data set
With the explosion in health gadgets, it seems appropriate to look at some of the data in this area. Many of us, knowingly or not, now record vast amounts of data. We’ll see that this volume of data can often work against us, making it difficult to get the most value out of these newfound metrics.
Our data comes from a recent run of mine, and we’re going to start by focusing our analysis on the heart rate data. As you’ll see in the sample code, the input is an XML activity file. There’s a very basic bit of parsing to extract the data we need; I won’t go over it here, as it’s only intended to get us going quickly.
If we run this tagged version of the app, we can see that the heart rate data is plotted against distance, and the 1-second recording rate results in around 33,000 data points on the chart.
Presenting raw data
Below is a screenshot of the basic chart displaying all of the data points. While it is technically a very accurate representation of the day, it is perhaps not the most insightful chart for a user wanting to analyse their run.
There are far too many data points to display clearly on the screen, and the naturally noisy data makes it hard to see any trends.
The Advanced Charting Kit offers a remedy for this overwhelming amount of data. Applying a sampling algorithm to a series reduces the number of data points rendered from the original set. The choice of algorithm determines which points are dropped and which are kept.
Nth Point: A simple algorithm that keeps every nth point.
Ramer-Douglas-Peucker (Wikipedia): A more complex algorithm that starts from a coarse estimate of the data and then refines it wherever the estimate strays from the data by more than a tolerance (epsilon).
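To make the nth-point strategy concrete, here is a minimal Swift sketch of the idea. The helper `nthPointSample` is hypothetical and written purely for illustration; it is not the source of `ACKNthPointSampler`, and whether the library also keeps the final point is an assumption this sketch does not make. The heart rate values are made up.

```swift
// Minimal nth-point sampler: keeps indices 0, nth, 2*nth, ...
// Hypothetical helper for illustration, not the ACKNthPointSampler source.
func nthPointSample<T>(_ points: [T], nth: Int) -> [T] {
    precondition(nth > 0, "nth must be positive")
    // stride(from:to:by:) stops before points.count, so every index is valid.
    return stride(from: 0, to: points.count, by: nth).map { points[$0] }
}

// Made-up heart rate values, one per second.
let heartRates = [120, 122, 125, 131, 138, 140, 139, 141, 145, 150]
print(nthPointSample(heartRates, nth: 3)) // [120, 131, 139, 150]

// Roughly the size of the run in the text: ~33,000 points at nth = 500.
print(nthPointSample(Array(0..<33_000), nth: 500).count) // 66
```

The selection depends only on position, never on the values themselves, which is what makes it cheap and predictable, and also why it can step over short-lived features.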
Which sampling algorithm to use?
This isn’t easy to answer, and it depends heavily on the data set you’re working with. For that reason, let’s experiment and work through the decision process for this particular data set, as an example of how it might be approached.
Below is our original data set with each of the algorithms applied. Luckily, it’s very easy to configure a sampler on a series: the single line of code is shown below each image.
lineSeries.dataSampler = ACKNthPointSampler(nthPoint: 500)
lineSeries.dataSampler = ACKRamerDouglasPeuckerSampler(epsilon: 30)
With a bit of trial and error, we arrived at the parameters for the representations above; the user guide for the Advanced Charting Kit has much more specific information. The RDP algorithm clearly struggles with the high-frequency noise in the heart rate data. This kind of algorithm works very well on data with more predictable trends, rather than high-frequency changes.
The nth-point algorithm clearly has an easier time of it, but its simplistic selection can cause it to miss important features in the data. To reassure ourselves, let’s think about the nature of our data. In general, heart rate changes progressively rather than “jumping” from one extreme to another, and the high-frequency noise we see is limited in range. Therefore, we’re less likely to miss any key information by using nth-point selection. In this particular case, it’s the algorithm of choice.
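Why does RDP find noise so hard? The following is a minimal Swift sketch of the textbook recursive Ramer-Douglas-Peucker algorithm, offered as an illustration rather than as the Advanced Charting Kit’s implementation. The `Point` type, the helper names and the toy series (sample index on x rather than distance, steady 140 bpm with alternating ±3 bpm jitter) are all hypothetical. On this toy series, any epsilon below the jitter keeps every wiggle, while a larger one flattens the whole run to its two endpoints.

```swift
// Hypothetical point type for this sketch.
struct Point: Equatable {
    let x: Double
    let y: Double
}

// Perpendicular distance from p to the straight line through a and b.
func perpendicularDistance(_ p: Point, _ a: Point, _ b: Point) -> Double {
    let dx = b.x - a.x
    let dy = b.y - a.y
    let length = (dx * dx + dy * dy).squareRoot()
    guard length > 0 else {
        // a and b coincide, so fall back to the straight distance to a.
        let ex = p.x - a.x
        let ey = p.y - a.y
        return (ex * ex + ey * ey).squareRoot()
    }
    return abs(dy * p.x - dx * p.y + b.x * a.y - b.y * a.x) / length
}

// Recursive Ramer-Douglas-Peucker simplification (illustrative sketch).
func ramerDouglasPeucker(_ points: [Point], epsilon: Double) -> [Point] {
    precondition(epsilon >= 0, "epsilon must be non-negative")
    guard points.count > 2 else { return points }
    let first = points[0]
    let last = points[points.count - 1]
    // Find the interior point furthest from the first-to-last chord.
    var maxDistance = 0.0
    var splitIndex = 0
    for i in 1..<(points.count - 1) {
        let d = perpendicularDistance(points[i], first, last)
        if d > maxDistance {
            maxDistance = d
            splitIndex = i
        }
    }
    // Everything lies within tolerance: the chord alone is a good enough estimate.
    guard maxDistance > epsilon else { return [first, last] }
    // Otherwise keep the furthest point and refine each half recursively.
    let left = ramerDouglasPeucker(Array(points[0...splitIndex]), epsilon: epsilon)
    let right = ramerDouglasPeucker(Array(points[splitIndex...]), epsilon: epsilon)
    // The split point ends `left` and starts `right`, so drop one copy.
    return Array(left.dropLast()) + right
}

// Made-up noisy heart rate: steady 140 bpm with alternating +/-3 bpm jitter.
let noisy = (0..<20).map { i in
    Point(x: Double(i), y: i % 2 == 0 ? 143.0 : 137.0)
}
print(ramerDouglasPeucker(noisy, epsilon: 1).count)  // 20: every wiggle exceeds epsilon
print(ramerDouglasPeucker(noisy, epsilon: 10).count) // 2: the jitter vanishes entirely
```

By contrast, a smooth trend with no jitter collapses to very few points even at a small epsilon, which is the kind of data RDP suits best.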
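The reasoning above can be checked with a toy example. This Swift sketch reuses a hypothetical `nthPointSample` helper (a plain stride-based selection, not the library’s code) on two made-up series: a 3-sample spike is narrower than the sampling interval and disappears, while a sustained change at least `nth` samples long always contributes a kept point, because any run of `nth` consecutive indices contains exactly one multiple of `nth`.

```swift
// Hypothetical nth-point helper: keeps indices 0, nth, 2*nth, ...
func nthPointSample<T>(_ points: [T], nth: Int) -> [T] {
    precondition(nth > 0, "nth must be positive")
    return stride(from: 0, to: points.count, by: nth).map { points[$0] }
}

// Made-up series: flat 140 bpm with a brief 3-sample spike to 175 bpm.
var spike = Array(repeating: 140, count: 1_000)
for i in 251...253 { spike[i] = 175 }

// Made-up series: flat 140 bpm with a sustained 200-sample rise to 170 bpm.
var plateau = Array(repeating: 140, count: 1_000)
for i in 250..<450 { plateau[i] = 170 }

// Kept indices are 0, 100, 200, ..., 900.
print(nthPointSample(spike, nth: 100).max()!)   // 140: the spike falls between samples
print(nthPointSample(plateau, nth: 100).max()!) // 170: indices 300 and 400 land on it
```

Because heart rate tends to change over many samples rather than in isolated jumps, its important features behave more like the plateau than the spike.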
As a user looking to extract value from the extensive data recorded during my run, I can now identify some key artifacts for further analysis. Why couldn’t I keep my heart rate up mid-run? Was I tired? Had I eaten enough? This level of analysis was our aim for this particular chart. It’s important to note that one data set can have many potential uses, and different processing may best suit each of them.
Sampling is a powerful tool that is simple to implement (just one line of code!). It can bring insight, rather than overwhelming amounts of data, to our apps and ultimately empower users in their decision-making. As the process of selecting an algorithm shows, we must take care to understand how and what we are sampling so that we don’t trample over significant artifacts in our data. We are removing data from the set, so it’s important to have a clear idea of what you want to achieve with sampling; otherwise it’s easy to include too much or too little information.
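One way to check that a sampler isn’t trampling over significant artifacts is to measure how far the sampled line strays from the raw data. The following Swift sketch is a hypothetical helper, not part of the Advanced Charting Kit: it assumes the chart draws straight lines between kept points, and it only covers raw points between the first and last kept indices. It reports the worst absolute gap between each raw value and the line drawn through the kept points.

```swift
// Hypothetical fidelity check: worst absolute error between the raw data and
// straight lines drawn through the kept points. Assumes strictly increasing
// kept indices; raw points outside the first-to-last kept range are ignored.
func maxInterpolationError(raw: [Double], keptIndices: [Int]) -> Double {
    var worst = 0.0
    // Walk each pair of consecutive kept indices (a, b).
    for (a, b) in zip(keptIndices, keptIndices.dropFirst()) {
        precondition(a < b, "kept indices must be strictly increasing")
        for i in a...b {
            // Linear interpolation between the kept points at a and b.
            let t = Double(i - a) / Double(b - a)
            let interpolated = raw[a] + t * (raw[b] - raw[a])
            worst = max(worst, abs(raw[i] - interpolated))
        }
    }
    return worst
}

// Made-up jittery heart rate: alternating 143 and 137 bpm.
let raw: [Double] = (0..<10).map { $0 % 2 == 0 ? 143.0 : 137.0 }
let everyOther = Array(stride(from: 0, to: raw.count, by: 2))
print(maxInterpolationError(raw: raw, keptIndices: everyOther))       // 6.0
print(maxInterpolationError(raw: raw, keptIndices: Array(raw.indices))) // 0.0
```

A single worst-case number is a blunt measure; whether a 6 bpm discrepancy matters depends on what the chart is meant to show, which is exactly why the purpose of the sampling needs to be clear first.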
Hopefully, you can see how this might help improve the in-app experience for your users. Here is a link to the sampled code. Next, we’ll look at further identifying the trends in the data using Smoothing.