👻 GISigh: Brendan Hoover's Blog about data.
Welcome to my Blog
Here is my last prediction for the 2020 Presidential Election. I have Biden winning convincingly, but will note that Nevada, Arizona, Florida, Georgia, and North Carolina are all within polling error. Trump must win all of those states except Nevada to win the presidency. I don’t think it would be particularly odd for a Republican to win those states. If the networks can call Florida for Biden tonight, it’s a wrap. However, close calls in the three East Coast toss-up states (Florida, Georgia, and North Carolina) mean the race is likely tight. On the flip side, North Carolina may also be the state that tips things in Biden’s favor, since he leads the polls there by more than 3%. Since I’ve been using 538’s data, I also stole their color scheme, hehe.
There was some movement overnight. In the last several days, high-quality polls came in for Pennsylvania and moved the state into 'leans Biden' in my model. Those polls show Biden with a 6-7% edge, which is well outside normal polling error for a Trump victory in the state. Polls also moved in Biden's favor in Georgia and Florida. It's worth noting we are in herding season. Read about herding here.
The election polls remain pretty consistent. I've updated the legend and more clearly labeled the "toss up" states. Biden needs to find 17 electoral votes in the toss-up states. He may be able to find all of them in Pennsylvania (20 electoral votes), where he currently has a 5.3% polling lead.
Not much changed today, so let’s look at what happens if the polls are off by 5% in Trump’s favor, which is what happened in 2016. In this scenario, Biden still comes out ahead, but by a much closer margin. One can imagine a lot of court battles in this scenario. It’s worth noting that I forgot to include Washington, D.C. yesterday, so Biden’s total goes up by 3 there. I’ve added those votes here. I’ve not included any Bayes here, and I color the map by 20% intervals.
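For anyone who wants to play with the "polls are off by 5%" idea, here is a minimal sketch of how such a scenario could be tallied. It is not the script behind the map. It assumes the error is a uniform shift applied to Biden's lead in every state, and the margins below are made-up placeholders rather than real polling averages. Only the electoral vote counts are the actual 2020 allocations.

```python
# Electoral votes are the 2020 allocations for six states.
ELECTORAL_VOTES = {"PA": 20, "FL": 29, "GA": 16, "NC": 15, "AZ": 11, "NV": 6}

# HYPOTHETICAL Biden leads in percentage points (placeholders, not real polls).
hypothetical_margins = {"PA": 6.0, "FL": 2.0, "GA": 1.0, "NC": 3.0, "AZ": 2.5, "NV": 5.5}


def tally(margins, electoral_votes, shift=0.0):
    """Sum electoral votes per candidate after moving every margin toward Trump.

    margins: state -> Biden lead in points (negative means a Trump lead).
    shift:   points subtracted from every Biden lead (assumed uniform error).
    A lead of exactly zero after the shift is counted for Trump.
    """
    totals = {"Biden": 0, "Trump": 0}
    for state, lead in margins.items():
        winner = "Biden" if lead - shift > 0 else "Trump"
        totals[winner] += electoral_votes[state]
    return totals


as_polled = tally(hypothetical_margins, ELECTORAL_VOTES)
shifted = tally(hypothetical_margins, ELECTORAL_VOTES, shift=5.0)
```

With these placeholder margins, only the states where the lead exceeds 5 points stay in Biden's column after the shift.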
Well, it doesn't look like much changed today, but here's a bit about how I made the above map.
First, I import the modules I'll need. Then I pull the data from 538 and put it in a pandas dataframe. The nice thing about pulling the data from 538 is that I don't have to download it each day. The script takes care of that and it's easy peasy.
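A rough sketch of the loading step, using only the standard library so it runs without any installs. This is not the original script. The column names (`state`, `fte_grade`, `end_date`, `answer`, `pct`), the date format, and the blank `state` field for national polls are all assumptions about the file layout, and the sample rows are invented for illustration.

```python
import csv
import io
from datetime import datetime


def load_polls(stream):
    """Read state-level poll rows from an open CSV text stream.

    Assumed columns (not confirmed by the post): state, fte_grade,
    end_date (m/d/yy), answer (candidate surname), pct.
    """
    rows = []
    for record in csv.DictReader(stream):
        if not record["state"]:  # assumed: national polls leave state blank
            continue
        rows.append({
            "state": record["state"],
            "grade": record["fte_grade"],
            "date": datetime.strptime(record["end_date"], "%m/%d/%y").date(),
            "candidate": record["answer"],
            "pct": float(record["pct"]),
        })
    return rows


# Invented sample rows standing in for the downloaded file.
sample = io.StringIO(
    "state,fte_grade,end_date,answer,pct\n"
    "Florida,A+,11/1/20,Biden,49.0\n"
    "Florida,A+,11/1/20,Trump,47.0\n"
    ",B,11/1/20,Biden,52.0\n"
)
polls = load_polls(sample)
```

With pandas, `pandas.read_csv` accepts a URL or a file path directly, which is what makes the daily refresh so painless.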
Some election polls are better than others based on various factors like sampling design. I don't have any idea which ones are better, so I relied on 538's rankings. These are conveniently already in the dataset. So, I created a weighting scheme based on the letter grade, which can be seen in the code snippet under #Rank Weighting. It's pretty straightforward: a poll with an A+ rating is weighted more heavily than a poll with a C rating. I also applied a 30-day weighted decay function. This basically means that the most recent polls have the most impact, but polls from as far back as 30 days still carry some weight. This lets us emphasize the most recent polls without tossing out the older ones.
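To make the weighting idea concrete, here is a minimal sketch. The grade-to-weight numbers are hypothetical (the post only says an A+ outweighs a C), and reading "30-day decay" as an exponential decay with a 30-day half-life is an assumption, not necessarily what the original script does.

```python
from datetime import date

# HYPOTHETICAL grade-to-weight table; only the ordering comes from the post.
GRADE_WEIGHTS = {"A+": 1.0, "A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3}
HALF_LIFE_DAYS = 30  # assumed: a poll's weight halves every 30 days


def poll_weight(grade, poll_date, today):
    """Combine pollster quality and recency into a single weight."""
    age_days = (today - poll_date).days
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)
    # Unknown grades fall back to the lowest weight (also an assumption).
    return GRADE_WEIGHTS.get(grade, 0.3) * recency


def weighted_mean(polls, today):
    """polls: list of (grade, poll_date, pct) for one candidate in one state."""
    weights = [poll_weight(grade, day, today) for grade, day, _ in polls]
    weighted_sum = sum(w * pct for w, (_, _, pct) in zip(weights, polls))
    return weighted_sum / sum(weights)


today = date(2020, 11, 3)
polls = [("A+", date(2020, 11, 2), 50.0), ("C", date(2020, 10, 4), 46.0)]
average = weighted_mean(polls, today)
```

The fresh A+ poll dominates the month-old C poll, so the average lands much closer to 50 than to 46.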
Much like the master bedroom in MTV’s Cribs, the ARIMA model is where the magic happens. I iterate through each state and apply the weighting from above. I separate Biden and Trump from each other in each state, which I think they’d both appreciate, and do some smoothing using a 7-day moving average. I then run the ARIMA model using the auto_arima function, which makes life easy because I don’t have to make any choices. The function runs a number of scenarios and chooses the one with the lowest AIC. Again, auto_arima is run on each state and each candidate independently. Interestingly, this is how 538 runs its model (state by state), whereas The Economist uses national polling. I’m a little skeptical of using national polls, but they make a nice case for it here.
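Here is a hedged sketch of the smoothing and forecasting steps for one candidate in one state. The helper names are hypothetical and the trailing (rather than centered) 7-day window is an assumption. `auto_arima` comes from the pmdarima package, which is imported lazily inside the function so the smoothing helper still runs if pmdarima is not installed.

```python
def moving_average(values, window=7):
    """Trailing mean over up to `window` points (shorter at the start)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def forecast_share(daily_pct, days_ahead):
    """Fit a non-seasonal ARIMA chosen by AIC and return the last forecast value.

    daily_pct:  one candidate's daily weighted poll average in one state.
    days_ahead: number of days between the last data point and election day.
    """
    from pmdarima import auto_arima  # third-party; needed only for forecasting

    model = auto_arima(
        moving_average(daily_pct),
        seasonal=False,
        information_criterion="aic",
        suppress_warnings=True,
        error_action="ignore",
    )
    return float(list(model.predict(n_periods=days_ahead))[-1])


smoothed = moving_average([50.0, 52.0, 51.0])
```

Running `forecast_share` once per state and per candidate mirrors the state-by-state loop described above.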
There are a number of other models I could have run, but ARIMA is a pretty solid first go for Time Series.
Now that I’ve run the ARIMA models and made some predictions on the percentage of votes for each state for Biden and Trump, I use those as my likelihoods in a simple Bayesian model. This is about as simple a Bayes model as there is. I take those state likelihoods and pull in some more data from 538 to use as my priors. Specifically, I use Vote Share of Incumbent and Vote Share of Challenger as my priors for each state. Determining these values on my own would take a lot of heavy lifting, so I cheat or “leverage” (as the kids say) some of 538’s heavy lifting. With those data it’s easy to compute the probability of each candidate winning a state. You’ll notice that a state like Illinois is probably woefully underestimated. I chose the vote shares as a conservative estimate. Still, in the end, my outcome is very close to that of 538, which isn’t bad for a short exercise.
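One way to read that Bayes step is as a two-candidate update: multiply each candidate's prior by their likelihood, then renormalize so the two posteriors sum to 1. The sketch below assumes exactly that form, and the numbers are placeholders for a single state, not real 538 values or model output.

```python
def posterior(priors, likelihoods):
    """Two-candidate Bayes update: prior times likelihood, renormalized."""
    unnormalized = {name: priors[name] * likelihoods[name] for name in priors}
    evidence = sum(unnormalized.values())
    return {name: value / evidence for name, value in unnormalized.items()}


# PLACEHOLDER inputs for one state (fractions, not percentages).
priors = {"Biden": 0.52, "Trump": 0.48}       # projected vote shares
likelihoods = {"Biden": 0.50, "Trump": 0.46}  # ARIMA-forecast poll shares
result = posterior(priors, likelihoods)
```

Because vote shares rarely stray far from 50/50, posteriors computed this way stay fairly close to even, which fits the remark that safe states like Illinois come out underestimated.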
Everything up to this point was data preparation for the map. If you’ve read this far (gods bless you) you’ve probably noticed that I was using geopandas, which works much like a normal pandas dataframe, but with the added bonus that each row carries its geometry. With a little wrangling and the Bokeh package, it’s easy to make a pretty decent map.
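The coloring logic can be sketched without geopandas or Bokeh. This assumes the map colors each state by Biden's win probability in five equal 20% bins. The category labels and the state names `X`, `Y`, `Z` are hypothetical, and the actual 538-style colors are left out because the post does not list them.

```python
from bisect import bisect_right

BIN_EDGES = [0.2, 0.4, 0.6, 0.8]  # boundaries between five equal 20% bins
# HYPOTHETICAL labels, ordered from lowest to highest Biden win probability.
BIN_LABELS = ["safe Trump", "leans Trump", "toss up", "leans Biden", "safe Biden"]


def map_category(biden_win_probability):
    """Return the legend category for a probability between 0 and 1."""
    return BIN_LABELS[bisect_right(BIN_EDGES, biden_win_probability)]


# Placeholder probabilities for three made-up states.
probabilities = {"X": 0.95, "Y": 0.5, "Z": 0.1}
categories = {state: map_category(p) for state, p in probabilities.items()}
```

In a geopandas workflow the category would simply be one more column next to the geometry, and the plotting library colors each state by it.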
And since this is all automated, I can run it every day until the election and see if there have been any big changes.
I used an ARIMA model and some simple Bayesian Statistics to predict the November 3rd election. The model uses every poll available at 538 here.
Since I don’t know anything about polls and which ones are good, I used 538’s poll ratings to derive a weighted mean for each day. I also weighted the average using a 30-day decay, which smarter people than me seem to think is a good decay function for elections. For missing dates, I used a simple interpolation and computed a 7-day rolling mean. I fed this data into the ARIMA model. By using the auto_arima function available in the pmdarima python package, I was able to automate the modeling and can run it every day until the election.
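For the missing dates, a minimal sketch of "simple interpolation" is shown here as linear interpolation between the nearest observed days. Whether the original uses linear interpolation is an assumption, and the function name and sample values are hypothetical.

```python
from datetime import date, timedelta


def fill_missing_days(observed):
    """Linearly interpolate a non-empty {date: value} mapping onto every day."""
    days = sorted(observed)
    filled = {}
    for start, end in zip(days, days[1:]):
        gap = (end - start).days
        for step in range(gap):
            fraction = step / gap
            filled[start + timedelta(days=step)] = (
                observed[start] + fraction * (observed[end] - observed[start])
            )
    filled[days[-1]] = observed[days[-1]]  # keep the final observation as is
    return filled


# Placeholder weighted averages with two days missing in between.
observed = {date(2020, 10, 1): 48.0, date(2020, 10, 4): 51.0}
daily = fill_missing_days(observed)
```

The gap-free daily series is what the 7-day rolling mean and then the ARIMA model would be run on.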
To get Bayesian priors for each candidate winning the state, I again turned to 538 here.
I used the Vote share of the incumbent (Trump) and Vote share of the challenger (Biden) for each state as a Bayesian prior. I chose these as priors because they are conservative estimates: they probably underestimate the probability of Biden winning states like California and Illinois and of Trump winning states like North Dakota and Montana. However, these priors fit well for the toss-up states, so they were a good choice for a short modeling exercise. I used the final percentage from the ARIMA models as my likelihood for each candidate and calculated the Bayesian posteriors, which are displayed in the above map.
My model is slightly more bullish on Biden than the 538 model, but not by much.