ITP449 — Analysis of S&P 500

Andrew Seltzer
Mar 5, 2021

This project analyzes the S&P 500 ($SPY) in several different ways. In the stock market, Q4 earnings tend to be the highest of the year for a considerable number of companies because of heavy spending during the holiday season. Since the S&P 500 is composed of 500 large, publicly traded US companies, suppose one were to buy the S&P 500 every year before the holiday season, on November 1st, and then sell during Q4 earnings reports in mid-February. Could one expect to consistently generate a profit? How much profit or loss could one expect to generate on average by executing this strategy? And how does that strategy compare with simply buying and holding the S&P 500?

This project will also use several analytical and forecasting tools to examine the relationship between market volatility and trading volume, to see whether commonly used forecasting models can accurately predict the price of the S&P 500, and to determine which of those forecasts is the most accurate.

Here is a visualization comparing the pre-holiday and post-holiday prices for each year.

After running some mathematical analysis, we found that in 5 of the 28 periods the market was lower in February (post-holiday) than in November (pre-holiday). As shown in the output, had you bought one share of $SPY in the pre-holiday season and sold it in the post-holiday season every year, you would have made almost $208, compared with $345 had you simply bought and held. So although buying pre-holiday and selling post-holiday each year would still have generated a profit, simply buying and holding the S&P 500 would have been more profitable.
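
The seasonal strategy and the buy-and-hold comparison come down to simple arithmetic over paired prices. Below is a minimal sketch, assuming one share is bought at each November 1st close and sold at the matching mid-February close, and assuming buy-and-hold means buying at the first pre-holiday close and selling at the last post-holiday close. The function names and prices are hypothetical placeholders, not the project's code or data.

```python
# Minimal sketch (hypothetical helper names, made-up prices): compare buying one
# share every Nov 1 and selling it in mid-February against buy-and-hold.


def seasonal_profit(pre_prices, post_prices):
    """Total profit from buying one share at each pre-holiday price and
    selling it at the matching post-holiday price."""
    if len(pre_prices) != len(post_prices):
        raise ValueError("need one post-holiday price per pre-holiday price")
    return sum(post - pre for pre, post in zip(pre_prices, post_prices))


def buy_and_hold_profit(pre_prices, post_prices):
    """Profit from buying once at the first pre-holiday price and selling at
    the last post-holiday price (an assumed definition of buy-and-hold)."""
    return post_prices[-1] - pre_prices[0]


def count_losing_periods(pre_prices, post_prices):
    """Number of periods where the post-holiday price ended below the pre-holiday price."""
    return sum(1 for pre, post in zip(pre_prices, post_prices) if post < pre)


if __name__ == "__main__":
    pre = [100.0, 110.0, 105.0]   # made-up Nov 1 closes
    post = [104.0, 108.0, 112.0]  # made-up mid-February closes
    print(seasonal_profit(pre, post))       # 4 - 2 + 7 = 9.0
    print(buy_and_hold_profit(pre, post))   # 112 - 100 = 12.0
    print(count_losing_periods(pre, post))  # 1
```

If the project's definitions match these assumptions, running the same functions on its 28 real price pairs should correspond to the roughly $208 and $345 figures and the count of 5 losing periods.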

Here is a grouped bar chart of the same yearly price comparisons, which makes one particular trend easier to spot: pre-holiday prices tend to be higher than post-holiday prices at market peaks just before a crash (as one would expect), as well as at the lowest point of each crash. The peaks are 2000, 2008, and 2016; the lows are 2003 and 2009. This suggests that in times of market stability, one could expect to generate a profit by buying $SPY before the holidays and selling it during Q4 earnings season, but during a market crash or a period of extreme volatility, the strategy does not hold up.

Here is a graph of the point gain or loss in the S&P 500 between each pre-holiday and post-holiday date. The mean difference is +7.42 points. Averaging only the periods in which the market fell (the points below zero) gives -13.632.
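
The two summary numbers here are the mean of all point changes and the mean of only the negative ones. A minimal sketch, assuming a change is the post-holiday price minus the matching pre-holiday price; the helper names and sample values are hypothetical. One consistency check follows from the definition of the mean: the mean change times the number of periods equals the total seasonal profit, and 7.42 × 28 ≈ 207.8, in line with the almost $208 reported earlier.

```python
from statistics import mean


def point_changes(pre_prices, post_prices):
    """Point change (post-holiday minus pre-holiday) for each period."""
    return [post - pre for pre, post in zip(pre_prices, post_prices)]


def summarize_point_changes(changes):
    """Return (mean of all changes, mean of the negative changes or None if there are none)."""
    losses = [c for c in changes if c < 0]
    return mean(changes), (mean(losses) if losses else None)


if __name__ == "__main__":
    # Made-up pre-holiday and post-holiday closes.
    changes = point_changes([100.0, 110.0, 105.0, 120.0], [104.0, 108.0, 112.0, 114.0])
    print(changes)                           # [4.0, -2.0, 7.0, -6.0]
    print(summarize_point_changes(changes))  # (0.75, -4.0)
    # Mean times count recovers the total profit of the seasonal strategy.
    print(mean(changes) * len(changes))      # 3.0
```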

Here is a graph of the percent increase or decrease in the S&P 500 between each pre-holiday and post-holiday date. The mean percent change is +5.09%; for the periods below zero, the average is -10.058%.
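
Percent changes scale each period's move by its starting price, so a year at a low price level and a year at a high price level are weighted comparably. A minimal sketch, assuming percent change = 100 × (post - pre) / pre; the names and values are hypothetical. Note that this averages simple percent changes arithmetically rather than compounding them.

```python
from statistics import mean


def percent_changes(pre_prices, post_prices):
    """Percent change from each pre-holiday price to the matching post-holiday price."""
    return [100.0 * (post - pre) / pre for pre, post in zip(pre_prices, post_prices)]


def summarize_percent_changes(pcts):
    """Return (mean percent change, mean of the negative percent changes or None)."""
    declines = [p for p in pcts if p < 0]
    return mean(pcts), (mean(declines) if declines else None)


if __name__ == "__main__":
    # Made-up pre-holiday and post-holiday closes.
    pcts = percent_changes([100.0, 200.0, 50.0], [110.0, 180.0, 56.0])
    print(pcts)                             # [10.0, -10.0, 12.0]
    print(summarize_percent_changes(pcts))  # (4.0, -10.0)
```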

Now that we’ve compared prices at the pre-holiday and post-holiday dates, we turn to other analyses that use different analytical tools and predictive models.

Here we see the autocorrelation of the S&P 500 closing price and of trading volume. In the stock market, lower volume indicates that a stock is less liquid, which tends to result in greater volatility, and this is reflected in the two graphs above. In the closing-price graph, autocorrelation hovers just below 0 from lag 2000 to lag 4000 (a lag being the offset, in observations, between the two points compared), which we interpret as a sign of high volatility. Over the same range of lags, the autocorrelation of volume is negative. The lack of correlation in the first graph therefore lines up with the lower volume traded over that range.
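
An autocorrelation at lag k measures how strongly each observation co-moves with the one k steps later. Below is a minimal sketch of the standard sample autocorrelation estimator, using the full-series mean and variance. This is the textbook formula; whether it matches the exact routine behind the project's plots is an assumption, and the function name and test series are hypothetical.

```python
def sample_autocorrelation(series, lag):
    """Sample autocorrelation r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2,
    where m is the mean of the whole series."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Pair each observation with the one `lag` steps later.
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    for k in range(3):
        print(k, sample_autocorrelation(xs, k))  # 1.0, 0.4, -0.1
```

Evaluating this for every lag from 1 up to the series length traces out a curve like the ones in the graphs; values near zero mean that observations that far apart carry little linear information about each other.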

Forecasted vs. Predicted Closing Price of S&P 500 Over Time

Here we applied Facebook’s Prophet model to the closing prices of the S&P 500 from November 1st, 1993 until late February 2021. The Prophet model does a fairly good job of making in-sample predictions during periods of relative market stability and low volatility. However, once strong volatility hits, the model has great difficulty capturing those moves. This is most visible in recent years, beginning in 2019, when large swings produce large residuals relative to the Prophet model. The mean difference between the actual $SPY price and the Prophet prediction was -0.000102, with a standard deviation of 8.97. We will use these figures to compare the Prophet model with the ARIMA model later in this project.
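
The mean and standard deviation quoted here summarize the in-sample residuals. Prophet's `predict` method returns a DataFrame whose `yhat` column holds the fitted values; given those and the actual closes, the summary is a few lines of arithmetic. The sketch below assumes a residual is actual minus predicted (the sign is not stated above) and uses the sample standard deviation; the helper name and numbers are hypothetical, and Prophet itself is not called.

```python
from statistics import mean, stdev


def residual_summary(actual, predicted):
    """Mean and sample standard deviation of residuals (actual minus predicted)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    return mean(residuals), stdev(residuals)


if __name__ == "__main__":
    actual = [10.0, 12.0, 14.0]     # made-up closes (y)
    predicted = [11.0, 11.0, 14.0]  # made-up in-sample predictions (yhat)
    print(residual_summary(actual, predicted))  # (0.0, 1.0)
```

A mean residual near zero, like the -0.000102 reported, says the fit is unbiased on average; the standard deviation is what measures how far individual predictions miss, which is why it is the number carried into the ARIMA comparison.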

Here we take the predicted values (yhat) from the Prophet model for the pre-holiday and post-holiday S&P 500 prices and compare them with the actual (y) values. The in-sample predictions were fairly accurate, particularly for the pre-holiday prices, with an MAE of 3.724% across both the pre- and post-holiday prices. However, in the past several years, with greater market volatility, the Prophet model once again struggles to accurately predict the later part of the dataset.
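
The MAE above is reported as a percentage. One common way to express absolute error in percent is the mean absolute percentage error (MAPE), which divides each absolute error by the actual value; whether the 3.724% figure was computed this way is an assumption. The sketch shows both the price-unit MAE and MAPE; the function names and prices are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|, in the same units as the prices."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    actual = [100.0, 200.0]     # made-up pre/post-holiday closes (y)
    predicted = [110.0, 190.0]  # made-up Prophet predictions (yhat)
    print(mean_absolute_error(actual, predicted))             # 10.0
    print(mean_absolute_percentage_error(actual, predicted))  # about 7.5
```

The percentage form is useful here because the dataset spans a wide range of price levels, so a fixed dollar error means very different things early and late in the sample.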

Here we make an out-of-sample prediction with the Prophet model. Once again, the model follows the trend and smooths over volatility. This is evident from the drop-off where the in-sample data ends and the out-of-sample prediction begins. Because the model tracks the trend rather than volatility, it could be useful for long-term traders but less valuable for short-term traders.
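
The smooth out-of-sample path is what any trend-based forecast produces. Prophet itself fits a piecewise-linear (or logistic) trend plus seasonal terms; the sketch below swaps in a much simpler stand-in, an ordinary least-squares line fitted to the in-sample values and extended forward, purely to show why such a forecast ignores past swings. It is not Prophet, and the function names and data are hypothetical.

```python
def fit_linear_trend(values):
    """Ordinary least-squares fit of values[t] = intercept + slope * t, for t = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = s_tv / s_tt
    intercept = v_mean - slope * t_mean
    return slope, intercept


def extrapolate_trend(values, horizon):
    """Forecast the next `horizon` steps by extending the fitted line."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + h) for h in range(horizon)]


if __name__ == "__main__":
    choppy = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]  # made-up prices with up-and-down swings
    print(extrapolate_trend(choppy, 3))      # roughly [5.2, 5.83, 6.46]: a straight line
```

Each forecast step adds the same slope, so the zigzag in the input disappears from the output; that is the behavior described above, in its simplest form.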

Here we fit an ARIMA(5,1,0) model to all the closing prices from the beginning of November 1993 until late February 2021. As we’d expect, the residuals are larger in times of market volatility. However, the autoregressive order of 5 (the model uses the five most recent price changes) helps the model adjust to this volatility slightly, rather than simply following the overall trend. The standard deviation of the residuals is 1.868; if the residuals are roughly normally distributed, about 99.7% of the actual closing values fall within ±5.604 (three standard deviations) of the ARIMA model’s prediction, so this model fits the overall dataset relatively well. By comparison, the standard deviation of the Prophet model’s residuals was 8.97, which puts the corresponding band at ±26.91, roughly 4.8 times wider than the ARIMA model’s. The ARIMA model therefore did a much better job of making accurate predictions than the Prophet model.
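
An ARIMA(5,1,0) model differences the price series once (d = 1) and regresses each price change on the previous five changes (p = 5), with no moving-average terms (q = 0). Assuming no constant term, the one-step prediction is the last price plus a weighted sum of the last five changes. The sketch below implements that prediction and the three-standard-deviation arithmetic used above. The coefficients `phi` are hypothetical placeholders; in practice they would be estimated from the data, for example with statsmodels' `ARIMA(series, order=(5, 1, 0)).fit()`, though which library the project used is an assumption.

```python
def arima_p10_one_step(history, phi):
    """One-step ARIMA(p,1,0) prediction without a constant:
    y_hat[t] = y[t-1] + sum_{i=1..p} phi[i-1] * (y[t-i] - y[t-i-1]),
    where p = len(phi) and `history` ends at y[t-1]."""
    p = len(phi)
    if len(history) < p + 1:
        raise ValueError("need at least p + 1 past observations")
    # The p most recent first differences; diffs[-i] is y[t-i] - y[t-i-1].
    diffs = [history[j] - history[j - 1] for j in range(len(history) - p, len(history))]
    predicted_change = sum(phi[i - 1] * diffs[-i] for i in range(1, p + 1))
    return history[-1] + predicted_change


def three_sigma_band(residual_std):
    """Half-width of the +/- 3 standard deviation band (about 99.7% coverage
    only if the residuals are roughly normal)."""
    return 3.0 * residual_std


if __name__ == "__main__":
    prices = [10.0, 11.0, 13.0, 12.0, 14.0, 15.0]  # made-up closes
    phi = [0.5, 0.0, 0.0, 0.0, 0.0]                 # hypothetical AR coefficients
    print(arima_p10_one_step(prices, phi))  # 15 + 0.5 * (15 - 14) = 15.5

    arima_band = three_sigma_band(1.868)    # residual std reported for ARIMA
    prophet_band = three_sigma_band(8.97)   # residual std reported for Prophet
    print(round(arima_band, 3), round(prophet_band, 2))  # 5.604 26.91
    print(round(prophet_band / arima_band, 1))           # 4.8
```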

Based on all of our analysis and calculations, it is fair to conclude that in a bull market, even a mild one, one could expect post-holiday S&P 500 prices to be higher than pre-holiday prices. However, at a market peak or market low, the data suggest the opposite: the pre-holiday price would be higher than the post-holiday price. Overall, as shown earlier in the report, it would still be more profitable to buy a share of SPY and simply hold it than to buy and sell it every year.

In terms of best-fitting models, Facebook’s Prophet model can clearly predict the overall trend of the closing price, but it struggles considerably with volatility. The ARIMA model also struggles somewhat with volatility; however, its much lower residual standard deviation indicates that it predicts S&P 500 prices accurately overall and handles volatility much better than the Prophet model. All in all, market volatility over the past 2.5 years substantially reduces these models’ accuracy, but if the market returns to relatively stable conditions, we could expect the ARIMA model to give a more accurate forecast of the S&P 500 price.
