Representing brilliant and mysterious distributions
It is not easy to describe visually the results of Monte Carlo Simulations
I’ve long been a fan of Matt Levine’s newsletter on Bloomberg. Way back in early 2016, less than a year after I started reading it, he wrote a line that could very well have described my business development strategy:
A good trick is, find an industry where the words “Monte Carlo model” make you sound brilliant and mysterious, then go to town.
I immediately wrote up a blogpost saying that he was describing my business idea. And wonder of wonders, he actually linked to my blog in the next day’s edition! I’ve saved a screenshot of that. FT Alphaville followed in linking to my blog. I don’t think my blog has seen so many hits on any day before or after this.
Anyway, today we will look at how to represent a Monte Carlo simulation. One side effect of these simulations making you appear “brilliant and mysterious” is that their results are not easy to represent. I remember doing Monte Carlo simulations for stock paths in a previous life, and the representation would be a page full of wiggly lines going upwards and to the right, without much information being conveyed.
I googled for “Monte Carlo simulation stock paths” and this is the first result that came up in my image search. Clearly, not much information in there.
In that sense, communicating Monte Carlo simulations is an art. And there is no “one size fits all” here. The way you represent the simulation is highly dependent on the simulation itself.
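For the stock path case, one option (among many) is to stop plotting every path and instead report a few percentiles at chosen points in time. Here is a minimal sketch of that idea. The model (geometric Brownian motion) and every parameter in it (`mu`, `sigma`, the start price, the number of paths) are my assumptions for illustration, not taken from any real analysis.

```python
import math
import random
import statistics

def simulate_paths(n_paths=1000, n_steps=252, start=100.0,
                   mu=0.08, sigma=0.2, seed=42):
    """Simulate price paths under geometric Brownian motion.

    All parameters are illustrative assumptions, not calibrated values.
    """
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    paths = []
    for _ in range(n_paths):
        price = start
        path = [price]
        for _ in range(n_steps):
            price *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(price)
        paths.append(path)
    return paths

def percentile_bands(paths, step):
    """Return the 5th, 50th and 95th percentile of prices at one time step."""
    prices = [path[step] for path in paths]
    # n=20 gives 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(prices, n=20)
    return cuts[0], cuts[9], cuts[18]

paths = simulate_paths()
for step in (63, 126, 252):
    low, mid, high = percentile_bands(paths, step)
    print(f"day {step}: 5%={low:.1f}  median={mid:.1f}  95%={high:.1f}")
```

Three numbers per time point say more about the spread of outcomes than a thousand overlapping lines, though whether this is the right summary depends, as I said, on the simulation.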
In a recent chart, The Economist does a good job of this. In case you forgot amid the covid-19 crisis and the Chinese intrusions into Ladakh and Black Lives Matter, the US is having a Presidential election later this year. Donald Trump is going to take on Joe Biden (and whoever else stands as an independent). And The Economist has decided to get into the business of election forecasting.
In collaboration with Andrew Gelman (read his blog; it is excellent) and Merlin Heidemanns of Columbia University, The Economist has built a model to forecast how this November’s presidential election will go. It is clearly not a simple model:
Elastic-net regularisation is a method of reducing the complexity of a model. In general, equations that are simpler—or more “parsimonious”, in statisticians’ lingo—tend to do a better job of predicting unseen data than convoluted ones do. “Regularisation” makes models less complicated, either by shrinking the impact of the variables used as predictors, or by removing weak ones entirely.
Next, in order to determine how much of this “shrinkage” to use, we deploy “leave-one-out cross-validation”. This technique involves chopping up a dataset into lots of pieces, training models on some chunks, and testing their performance on others. In this case, each chunk is one election year.
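To make the “leave-one-out” idea concrete, here is a toy sketch. To be clear about the assumptions: this is NOT The Economist’s model. I have swapped elastic net for plain ridge shrinkage on a single predictor so that it fits in a few lines, and the data is synthetic (generated from a random seed), with each observation standing in for one “election year”.

```python
import random

def fit_ridge(xs, ys, lam):
    """One-predictor ridge regression; returns (intercept, slope).

    lam is the shrinkage strength: lam = 0 is ordinary least squares,
    larger lam pulls the slope towards zero.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return y_mean - slope * x_mean, slope

def loocv_error(xs, ys, lam):
    """Mean squared error when each observation is held out in turn."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_ridge(train_x, train_y, lam)
        total += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total / len(xs)

# Synthetic data: 18 made-up "election years", one noisy predictor.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(18)]
ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
errors = {lam: loocv_error(xs, ys, lam) for lam in candidates}
best_lam = min(errors, key=errors.get)
print("shrinkage chosen by leave-one-out:", best_lam)
```

The amount of shrinkage is picked by whichever value predicts the held-out year best on average, which is the gist of what the quoted passage describes.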
In any case, The Economist, despite employing two American professors, seems to be falling into the trap of Europeans analysing elections - focussing more on the vote share than on the “seat share”. If you remember, in 2016 Hillary Clinton got more votes than Donald Trump, but Trump won because of America’s electoral college system (something like our parliamentary constituencies, but at a state level).
I’ve repeatedly written about how the biggest source of error in Indian election forecasts comes from converting vote shares to seats (read that piece to know what I thought were good visualisations, in 2014).
In any case, given that the model forecasts “popular votes” and not who wins the electoral college, The Economist and team are forced to do a simulation to convert “votes to seats” (that is my reading; their write-up of the process is rather long and not so easy to read).
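A “votes to seats” simulation can be sketched roughly as follows. Everything here is hypothetical: the ten states, their seat counts and partisan leans, and the noise levels are numbers I made up, and the logic (national share plus state lean plus noise, winner takes all) is a crude stand-in, not a description of what The Economist actually does.

```python
import random

# Hypothetical states: (seats, lean relative to the national vote share).
STATES = [(55, 0.12), (38, -0.06), (29, -0.02), (29, 0.10), (20, 0.00),
          (18, -0.05), (16, -0.01), (10, 0.01), (6, -0.15), (3, 0.20)]
TOTAL_SEATS = sum(seats for seats, _ in STATES)

def simulate_once(rng, national_mean=0.52, national_sd=0.02, state_sd=0.03):
    """Draw one national vote share and convert it to seats won.

    Simplification: the national share is drawn directly rather than
    built up as a weighted average of state results.
    """
    national = rng.gauss(national_mean, national_sd)
    seats_won = 0
    for seats, lean in STATES:
        state_share = national + lean + rng.gauss(0.0, state_sd)
        if state_share > 0.5:  # winner takes all the state's seats
            seats_won += seats
    return national, seats_won

rng = random.Random(2020)
simulations = [simulate_once(rng) for _ in range(5000)]

majority = TOTAL_SEATS // 2 + 1
win_share = sum(1 for _, seats in simulations if seats >= majority) / len(simulations)
print(f"share of simulations with a seat majority: {win_share:.2f}")
```

Each run gives one (vote share, seats) pair, and the collection of such pairs is the raw material for any chart of the simulation.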
Finally, after this Hanuman’s tail of a preamble, we come to this week’s visualisation, sent in by regular reader Adi.
It’s a simple scatter plot, with the Democrats’ share of the vote on the X axis and the number of electoral college votes they will receive (or “seats”) on the Y axis. There are several things to like about this graph.
It is clearly mentioned that each data point on this scatter plot represents one simulation. Then, the “midpoints that matter” (a majority of votes and a majority of “seats”) are clearly marked out. (I’m Indian, so I’ll use “seats” though I’m describing America’s politics here.) The scenarios in each quadrant of the 2x2 thus formed are clearly labelled.
Each data point is not too dark, so overlapping data points create a dark pattern indicating density. And only things that matter in the simulation (the “input” and the “output”, if you can call them that) have been plotted.
Insights are also very clear, though I would have preferred the total share of each quadrant to be explicitly stated. For example, the top half of this graph represents the case of Democrats winning the election - and the dark blue points show how likely that is (rather likely it seems, though again I would have preferred explicit labelling).
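The explicit labelling I am asking for is cheap to compute once you have the simulations. A minimal sketch, assuming each simulation is a (vote share, seats) pair: the five pairs below are made up purely to show the arithmetic, the 270 threshold is the majority of the 538 electoral college votes, and a 269-269 tie is lumped in with “loses” for simplicity.

```python
def quadrant_shares(simulations, vote_line=0.5, seat_line=270):
    """Fraction of simulations falling in each quadrant of the 2x2."""
    counts = {
        ("more votes", "wins"): 0,
        ("more votes", "loses"): 0,
        ("fewer votes", "wins"): 0,
        ("fewer votes", "loses"): 0,
    }
    for vote_share, seats in simulations:
        votes = "more votes" if vote_share > vote_line else "fewer votes"
        result = "wins" if seats >= seat_line else "loses"
        counts[(votes, result)] += 1
    n = len(simulations)
    return {quadrant: count / n for quadrant, count in counts.items()}

# Made-up (vote share, seats) pairs, only to illustrate the calculation.
example = [(0.53, 320), (0.51, 262), (0.49, 240), (0.52, 290), (0.505, 268)]
shares = quadrant_shares(example)
for quadrant, share in shares.items():
    print(quadrant, f"{share:.0%}")
# ('more votes', 'wins') 40%
# ('more votes', 'loses') 40%
# ('fewer votes', 'wins') 0%
# ('fewer votes', 'loses') 20%
```

Four percentages printed in the corners of the chart would remove the need to eyeball the density of dots.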
The lower half of the graph, representing Republican victory, is also interesting: it seems more likely that the Republicans win the Presidency while the Democrats get more votes than that the Republicans win more than half the votes. This also suggests that models that try to forecast the election using Trump’s approval rating are likely to be flawed - the Republicans are likely to optimise for states that they can win.
Elsewhere
I realise that over the last 3-4 weeks I’ve been mostly describing “positive examples” in this newsletter. So here is a “negative example” (of how NOT to represent data) that at least two of my readers independently sent in. The main reason I decided to not make this the main feature of my newsletter is that it comes from Visual Capitalist, which is notorious for atrocious graphics (and which has been featured on this newsletter once).
When a visualisation is as bad as this, there is absolutely no fun in writing about it.
And to end this edition on a humorous note, check out this pie chart by Times Now:
See the numbers and the sizes of the pies. Unmitigated. And that is not even accounting for the fact that they’ve used a 3-D pie chart.