Brian Hamm
4/27/2012
Applied Econometrics

Statement of the Problem

Some movies clearly do much better than others. Harry Potter and the Deathly Hallows Part 2 made $169 million during its opening weekend, while many movies do not make more than $5 million. Movies are released year round and span different genres. My goal is to create a model that more accurately predicts how well a movie will do. Such a model could guide movie creators toward the types of movies, and the release dates, that maximize profits. Some movie genres tend to make more money during opening weekend than others, and movies released at certain times of the year tend to make more than others.

Hypothesis: The genre of a movie, its Motion Picture Association of America (MPAA) rating, and the season during which it was released affect its opening weekend sales.

Literature Review

Much has been done to predict how successful movies will be. Gilad Mishne and Natalie Glance tried to show how movie sales were reflected by blogger sentiment in their aptly titled paper, “Predicting Movie Sales from Blogger Sentiment” (Mishne and Glance). For each movie they studied, they collected all relevant weblog posts appearing in the “Blogpulse index” (Mishne and Glance). Using the number of positive, negative, and neutral posts, the total number of blog posts, and the length (word count) of each post, they tried to create a model to predict sales (Mishne and Glance). They found a good correlation between references to movies in weblog posts and the movies’ financial success (Mishne and Glance). They also found that the number of positive references before release correlates with a movie’s success better than the total number of blog posts does (Mishne and Glance).
The correlation between blogger sentiment and movie sales was not strong enough to build a predictive model on blogger sentiment alone, but sentiment could be used in other models alongside factors such as movie genre and time of release (Mishne and Glance).

For a more in-depth model, I look at the paper by Jeffrey S. Simonoff and Ilana R. Sparrow titled “Predicting movie grosses: Winners and losers, blockbusters and sleepers” (Simonoff and Sparrow). Their model considered genre, MPAA rating (G, PG, PG-13, R, etc.), the movie’s country of origin, “star power”, production budget, whether the movie was a sequel, whether it was released on a holiday weekend, the gross revenue for its first weekend of release, critic rating, and whether it was nominated for an Academy Award (Simonoff and Sparrow). They reported several results. Action, Children’s, Horror, and Science Fiction films generally had noticeably higher revenues (Simonoff and Sparrow). Dramas and Comedies made less (Simonoff and Sparrow). As MPAA ratings became more mature, less money was usually made (Simonoff and Sparrow). Movies made in English-speaking countries make more money (Simonoff and Sparrow). Movies with “star power” from the best actors make more money (Simonoff and Sparrow). Sequels generally perform better than non-sequels. Holiday releases make more money (Simonoff and Sparrow). Their best prerelease prediction model included genre, MPAA rating, number of best actors, number of top dollar actors, and whether or not the movie was a summer release (Simonoff and Sparrow). Simonoff and Sparrow concluded that the box office performance of movies can be forecast with some accuracy from easily available information (Simonoff and Sparrow).

Formulation of a Model

Drawing on the work of Simonoff, Sparrow, Mishne, and Glance, I created an initial model that includes all available variables from my data.
Opening Sales = B0 + B1Horror + B2SciFi + B3Action - B4Drama - B5Comedy - B6Other - B7Fall + B8Christmas - B9Winter - B10Spring + B11Summer + B12G + B13PG + B14PG13 - B15RatedR

The signs show the direction I expect each effect to take. I don’t expect to see huge differences in parameter estimates: because I am only using the top 50 movies, every movie in the sample did at least fairly well in opening sales. However, since dramas and comedies both generally make less money, they are negative. I have “Other” negative because it would include less popular genres. Fall, Winter, and Spring are all negative because they have been described as less popular times of the year. RatedR is negative based on Simonoff and Sparrow’s findings. I have a separate Christmas variable because the holidays and the summer are usually the biggest times for movies.

Data Sources and Descriptions

I will use the 50 top-grossing movies of 2011 that were released in at least 1000 theaters. These will be gathered from http://boxofficemojo.com/yearly/chart/?yr=2011

| Variable | Definition | Source |
|---|---|---|
| Opening | Opening weekend earnings for the movie | Box Office Mojo |
| Theaters | Number of theaters the movie opened in | Box Office Mojo |
| Date | The date the movie was released in theaters | Box Office Mojo |
| Action | Is the movie an action film? (0 or 1) | IMDB (Internet Movie Database) |
| Drama | Is the movie a drama? (0 or 1) | IMDB (Internet Movie Database) |
| Comedy | Is the movie a comedy? (0 or 1) | IMDB (Internet Movie Database) |
| Horror | Is the movie a horror film? (0 or 1) | IMDB (Internet Movie Database) |
| SciFi | Is the movie a science fiction film? (0 or 1) | IMDB (Internet Movie Database) |
| Other | If the movie does not fall into a previously listed category, it will be considered other. | IMDB (Internet Movie Database) |
| Fall | Was the movie released in September, October, or November? (0 or 1) | |
| Christmas | Was the movie released between December 1st and January 10th? (0 or 1) | |
| Winter | Was the movie released between January 10th and February 29th? (0 or 1) | |
| Spring | Was the movie released in March, April, or May? (0 or 1) | |
| Summer | Was the movie released in June, July, or August? (0 or 1) | |
| G | Did the movie have an MPAA Rating of G? (0 or 1) | Box Office Mojo |
| PG | Did the movie have an MPAA Rating of PG? (0 or 1) | Box Office Mojo |
| PG13 | Did the movie have an MPAA Rating of PG13? (0 or 1) | Box Office Mojo |
| RatedR | Did the movie have an MPAA Rating of R? (0 or 1) | Box Office Mojo |

My Model

From my data I chose a final model based on the number of entries for each variable and the explanatory power of each parameter. The variables “Horror” and “Other” were dropped because they had extremely few entries. Fall, Winter, and Spring didn’t seem to have much effect on the average movie. The MPAA ratings of G, PG, and RatedR and the genres Action and Drama all had similarly little effect. Movies described only by these variables are treated as the base case, and the opening sales for each movie are predicted by adding the parameter estimates for Comedy, SciFi, Summer, Christmas, and PG13 that apply to it.

Opening = B0 - B1Comedy - B2SciFi + B3Summer - B4Christmas + B5PG13

Results

Dependent Variable: Opening

| Regressor | Parameter Estimate | Standard Error (Adjusted for Heteroscedasticity) | T-value (Adjusted for Heteroscedasticity) |
|---|---|---|---|
| Intercept | 41,134,549 | 6,700,308 | 6.14 |
| Comedy | -15,495,359 | 7,976,007 | -1.94 |
| SciFi | -20,725,893 | 16,172,398 | -1.28 |
| Summer | 10,901,576 | 10,709,168 | 1.02 |
| Christmas | -16,168,864 | 7,354,390 | -2.20 |
| PG13 | 19,712,716 | 10,518,660 | 1.87 |

Summary Statistics

| Statistic | Value |
|---|---|
| RMSE | 29,005,440 |
| R-Square | 0.248 |
| Adj R-Square | 0.163 |

There were some surprising results. My model suggests that being a science fiction movie lowers opening sales by an average of about $20.7 million. Although this estimate was not statistically significant, its negative sign suggests that science fiction may in fact reduce sales. Another surprising result was that movies released around Christmas time did much worse. The model suggests that, on average, opening sales were $16 million lower for movies released around Christmas. This result was statistically significant at the 95% level.
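The significance statements in this section can be checked directly against the results table. The following is a minimal sketch in Python, not part of the original analysis. It assumes the sample is the 50 movies described earlier and that six parameters are estimated, which leaves 44 degrees of freedom; the critical values 1.680 and 2.015 are approximate two-sided t values for that case. The function names are hypothetical and chosen only for illustration.

```python
# Estimates and heteroscedasticity-adjusted standard errors,
# copied from the results table: name -> (estimate, standard error).
RESULTS = {
    "Intercept": (41_134_549, 6_700_308),
    "Comedy": (-15_495_359, 7_976_007),
    "SciFi": (-20_725_893, 16_172_398),
    "Summer": (10_901_576, 10_709_168),
    "Christmas": (-16_168_864, 7_354_390),
    "PG13": (19_712_716, 10_518_660),
}

# Assumption: 50 movies and 6 estimated parameters give 44 degrees of freedom.
# These are approximate two-sided critical values of the t distribution.
T_CRIT_90 = 1.680
T_CRIT_95 = 2.015


def t_value(name):
    """t-value = parameter estimate divided by its standard error."""
    estimate, std_error = RESULTS[name]
    return estimate / std_error


def significance(name):
    """Highest of the two confidence levels at which the estimate is significant."""
    t = abs(t_value(name))
    if t >= T_CRIT_95:
        return "95%"
    if t >= T_CRIT_90:
        return "90%"
    return "not significant"


def predict_opening(**dummies):
    """Predicted opening for a movie given 0/1 dummies (omitted dummies are 0)."""
    total = RESULTS["Intercept"][0]
    for name, value in dummies.items():
        total += RESULTS[name][0] * value
    return total


if __name__ == "__main__":
    for name in RESULTS:
        print(f"{name:<10} t = {t_value(name):6.2f}  {significance(name)}")
    # A PG13 summer release that is neither a comedy nor science fiction.
    print(predict_opening(Summer=1, PG13=1))
```

For example, predict_opening(Summer=1, PG13=1) adds the Summer and PG13 estimates to the intercept, giving a predicted opening of $71,748,841 for a PG13 summer release that is neither a comedy nor science fiction.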
Less surprisingly, movies that were released during the summer or had a PG13 rating tended to do better in opening sales, while comedies did worse than many other genres. However, the Comedy and PG13 results were statistically significant only at the 90% level, and the Summer result was not statistically significant at all. Unfortunately, this model does not hold much predictive value, since it accounts for only 24.8% of the variance in opening sales.

Conclusion

While my intercept and the parameters Comedy, Christmas, and PG13 were all statistically significant at the 90% level, this model does not hold much predictive value with such a low R-square value. My findings also differed from those of Simonoff and Sparrow. According to them, and in line with what I had previously believed, movies released around Christmas time do noticeably better. However, my model suggests that they do noticeably worse. Science fiction movies also do worse in my model, whereas Simonoff and Sparrow found otherwise. These differences may be explained by my relatively small dataset or by a change in movie preferences over a decade. This model is rather ineffective for predicting movie sales, but something can still be learned from the success, or lack thereof, of comedy films, movies rated PG13, and movies released during the Christmas season.

Limitations

One of the most serious limitations was the size of my dataset. Many of the variables were compiled by hand; with more time, a larger dataset could have been gathered. Due to a lack of entries, some variables were excluded from this model, which could introduce omitted variable bias. The small sample most likely does not accurately reflect the whole population: the 50 movies used do not come close to encompassing all movies released in at least 1000 theaters during 2011. Beyond this, many more factors can contribute to predicting a movie’s opening sales.
“Star power”, suggested by Simonoff and Sparrow, blogger sentiment, suggested by Mishne and Glance, and the general advertising and hype a movie receives before opening are not covered in this analysis.

References

"2011 DOMESTIC GROSSES." Box Office Mojo. IMDb, 2012. Web. 26 Apr 2012. http://boxofficemojo.com/yearly/chart/?yr=2011&p=.htm
The Internet Movie Database. 2012. Web. 26 Apr 2012. http://www.imdb.com/
Mishne, Gilad, and Natalie Glance. "Predicting Movie Sales from Blogger Sentiment." N.p., 2006. Web. 26 Apr 2012.
Simonoff, Jeffrey, and Ilana Sparrow. "Predicting movie grosses: Winners and losers, blockbusters and sleepers." N.p., 1999. Web. 26 Apr 2012.