V for Victory, or Delta Force
When the data breaks convention, the way you communicate it also needs to break convention
The non-farm payroll numbers in the US came out last Friday, as they do on the first Friday of every month. The numbers were a positive surprise - much against expectations, the number of people employed in non-farm jobs in the US actually went up in May. This followed a steep decline in jobs in April.
CNBC used this chart to describe the non-farm payroll data.
And in their Monday morning briefing, Bloomberg used this chart.
These graphs (which are rather similar) seem to suggest that all the job losses of April were recovered in May.
However, that is not what happened. While there was indeed a recovery in May, it paled in comparison with April's downturn. Why, then, do these graphs mislead? And why would not one but two highly reputed business publications make essentially the same error?
Much like GDP growth or inflation, the non-farm payroll data is normally plotted in terms of differences. Rather than charting the total number of jobs, it has become a convention among macroeconomic and financial analysts to chart the difference in data from the previous month (the reason for this will be apparent soon).
In normal times, this is perfectly fine, since it allows analysts to study the acceleration or deceleration in job creation in the US. And it is likely that both CNBC and Bloomberg have systems set up to automatically chart the difference in non-farm payrolls as a line graph.
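If you're curious what that convention looks like in code, here is a minimal sketch in Python (the payroll levels below are made up purely for illustration; they are not the actual BLS figures):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly payroll levels, in thousands of jobs
# (illustrative numbers, not the actual BLS data)
payrolls = pd.Series(
    [151000, 151200, 151450, 151700],
    index=pd.period_range("2019-10", periods=4, freq="M"),
)

# The convention: chart the month-on-month difference, not the level
changes = payrolls.diff().dropna()

# ... and, as both publications appear to do, draw it as a line
changes.plot(kind="line", marker="o", title="Change in non-farm payrolls")
plt.axhline(0, color="grey", linewidth=0.5)
plt.show()
```

In normal times this produces a perfectly sensible chart, which is presumably why nobody thought to question it.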
However, we are not in normal times, and thus the use of a line graph to chart differences has created a massively misleading graph. Strictly speaking, since this is a time series, a line graph is a “natural” method of visualising the data. The extraordinary data points, however, make the use of a line graph highly unsuited to showing this data.
The problem is visual - the line suggests that the quantity went down steeply and then went back up equally steeply. In terms of actual data, however, the fall was far steeper than the subsequent rise, and that point is lost in the line graph. If only the last two points in this graph were not connected, the information conveyed would be much more honest.
This leads us to two ways in which the graph could be improved, both of which involve breaking away from convention.
The first method involves disregarding the rules of data visualisation and using a bar graph. Bar graphs are not recommended for time series, and are not recommended when the dataset contains both positive and negative data points (the information in a bar graph lies in the lengths of the bars, which is why, as a rule, they need to start from zero).
However, if we have to show the last two points without connecting them, while still drawing attention to them, a bar graph makes eminent sense (scatter plots don't do a good job of drawing attention to individual data points, unless you use labelling).
For example, tradingeconomics.com uses a bar graph to show the non-farm payroll changes. The labelling of individual points here helps convey the message rather clearly - the drop in April was steep, and while the rise in May might have been good compared to “normal times”, it does nothing to make up for April’s losses.
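Here is a sketch of that bar-graph approach in Python. The monthly changes below are the approximate headline figures (roughly -20.5 million in April and +2.5 million in May); treat them as illustrative and check BLS or FRED for the exact numbers:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Approximate month-on-month changes, in millions of jobs
# (illustrative values only)
changes = pd.Series({"Feb": 0.3, "Mar": -0.7, "Apr": -20.5, "May": 2.5})

# Colour the negative months differently so the drop stands out
colors = ["tab:red" if v < 0 else "tab:blue" for v in changes]
ax = changes.plot(kind="bar", color=colors)

# Label each bar so the extreme points speak for themselves
for i, v in enumerate(changes):
    ax.annotate(f"{v:+.1f}m", (i, v),
                ha="center", va="bottom" if v > 0 else "top")

ax.axhline(0, color="black", linewidth=0.8)
ax.set_ylabel("Change in non-farm payrolls (millions)")
plt.show()
```

With unconnected, labelled bars, the eye compares lengths rather than slopes, and April's drop visibly dwarfs May's recovery.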
The other alternative is to break convention in terms of how this data is normally shown - rather than showing the non-farm payrolls in terms of month-on-month differences, show the data in the form of total jobs. This is what the Federal Reserve Bank of St Louis has done through its FRED platform (FRED is an excellent source of economic data, by the way).
Notice again that this cumulative graph works only because we are in extraordinary circumstances. If you look at the portion of the graph until February this year, you see a mostly straight upward-sloping line, and there is not much information in there (which is why this data is normally shown in terms of differences). With the steep drop in April and the recovery in May, however, this cumulative graph does an excellent job of conveying the information.
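If you want to reproduce something like the FRED chart yourself, a minimal sketch (assuming the pandas_datareader package is installed; PAYEMS is, to my knowledge, the FRED series ID for total non-farm employment):

```python
import pandas_datareader.data as web
import matplotlib.pyplot as plt

# PAYEMS is the FRED series for "All Employees, Total Nonfarm",
# measured in thousands of jobs
payems = web.DataReader("PAYEMS", "fred", start="2015-01-01")

# Plot the level itself rather than month-on-month differences
payems.plot(legend=False, title="US total non-farm payrolls (thousands)")
plt.show()
```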
Of late I’ve been doing a lot of work on automated visualisations.
For example, I’ve been producing a daily update on the pandemic statistics for India since mid-April. This is an automated script that sends out a bunch of tweets every morning.
While the information continues to be useful, and the automation is useful (to me), it also means that the graphs each day aren't customised to the information they're showing. I've been making some limited rule-based changes to the headings, but when the nature of the data can change, you need far greater intelligence than that.
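To give a flavour of what I mean by rule-based changes to headings, here is a hypothetical sketch (not my actual script; the threshold and wording are made up):

```python
def headline(today: int, yesterday: int) -> str:
    """Pick a chart heading based on how today's count compares
    with yesterday's. The 20% threshold is purely illustrative."""
    change = today - yesterday
    if yesterday and abs(change) / yesterday > 0.2:
        direction = "jump" if change > 0 else "drop"
        return f"Sharp {direction}: {today} new cases today"
    return f"{today} new cases today ({change:+d} vs yesterday)"

# e.g. headline(9000, 7000) -> "Sharp jump: 9000 new cases today"
```

Rules like this handle the easy cases, but they can't tell you that your entire chart type has become misleading - which is exactly what happened with the payrolls graphs.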
On another note, I’m in the very early stages of developing a “product” (quotes since I’m not yet sure if it will be a product or service) to make intelligent dashboards.
The idea there is that dashboards need to reflect the underlying data, and that includes the dashboard format, titles, emphasis, commentary, etc. Dashboards also need to evolve based on how they’re being used.
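As one hypothetical example of what such intelligence might look like (a crude sketch, not the product itself), the dashboard could pick the chart type based on the shape of the underlying data, much like in the payrolls example above:

```python
import pandas as pd

def chart_type(series: pd.Series) -> str:
    """Crude illustrative rule: if the difference series contains an
    extreme outlier, a labelled bar graph beats a line graph."""
    changes = series.diff().dropna()
    z = (changes - changes.mean()).abs() / changes.std()
    return "bar" if (z > 3).any() else "line"
```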
What the graphs I’ve described in this edition tell me is that the level of intelligence required in intelligent dashboards is far greater than what I’d started off with.
In any case, if you or your company is interested in being a guinea pig for this intelligent dashboarding "product", do drop me a line and we can chat.