Econometrics_Project

Brian Hamm
4/27/2012
Applied Econometrics
Statement of the Problem
Some movies perform far better at the box office than others. The movie Harry Potter
and the Deathly Hallows Part 2 made $169 million during its opening weekend, while many
movies don’t make more than $5 million. Movies are released year round and in many
different genres. My goal is to create a model that more accurately predicts how well a
movie will do at its opening. Such a model could guide movie creators toward the types of
movies to produce, and the times of year to release them, for maximum profits. Some genres
tend to make more money during opening weekend than others, and movies released at
certain times of the year tend to make more than movies released at other times.
Hypothesis: The genre of a movie, its Motion Picture Association of America Rating, and the
season during which it was released affect its opening weekend sales.
Literature Review
Much has been done to predict how successful movies will be. Gilad Mishne and Natalie
Glance tried to show how movie sales were reflected by blogger sentiment in their aptly
titled paper, “Predicting Movie Sales from Blogger Sentiment” (Mishne and Glance). For each
movie they used, they collected all relevant weblog posts appearing in the “Blogpulse index”
(Mishne and Glance). Using the number of positive, negative, and neutral references, the
total number of blog posts, and the length (word count) of each post, they tried to create a
model to predict sales (Mishne and Glance).
Their results showed a good correlation between references to movies in
weblog posts and the movies’ financial success (Mishne and Glance). They also found that the
number of positive references before release correlates better with a movie’s success than
the total number of blog posts before release (Mishne and Glance). The correlation between
blogger sentiment and movie sales was not strong enough to build a predictive model on blogger
sentiment alone, but sentiment could be used in other models alongside factors such as movie
genre and time of release (Mishne and Glance).
For a more in-depth model, I turn to the paper by Jeffrey S. Simonoff and Ilana
R. Sparrow titled “Predicting movie grosses: Winners and losers, blockbusters and
sleepers” (Simonoff and Sparrow). Their model takes into account genre, the MPAA
rating of the film (G, PG, PG-13, R, etc.), the country of origin, “star power”,
production budget, whether the film is a sequel, whether it was released on a holiday weekend,
the gross revenue for the film’s first weekend of release, critic rating, and whether it was
nominated for an Academy Award (Simonoff and Sparrow).
They reported several findings. Action, Children’s, Horror, and Science Fiction films generally
had noticeably higher revenues (Simonoff and Sparrow). Dramas and Comedies made less
(Simonoff and Sparrow). Films with more mature Motion Picture Association of America (MPAA)
ratings usually made less money (Simonoff and Sparrow). Movies made in English-speaking
countries make more money (Simonoff and Sparrow). Movies with “star power” from
the best actors make more money (Simonoff and Sparrow). Sequels generally perform better
than non-sequels (Simonoff and Sparrow). Holiday releases make more money (Simonoff and Sparrow).
Their best prerelease prediction model included the variables genre, MPAA rating,
number of best actors, number of top dollar actors, and whether or not the movie was a summer
release (Simonoff and Sparrow).
Jeffrey S. Simonoff and Ilana R. Sparrow concluded that the box office performance of
movies can be forecast with some accuracy from easily available information (Simonoff and
Sparrow).
Formulation of a Model
Drawing on the work of Simonoff, Sparrow, Mishne, and Glance, I created an original model
which includes all possible variables from my data.
Opening Sales = B0 + B1Horror + B2SciFi + B3Action - B4Drama - B5Comedy - B6Other - B7Fall + B8Christmas - B9Winter - B10Spring + B11Summer + B12G + B13PG + B14PG13 - B15RatedR
I don’t expect to see huge differences among the parameter estimates: because I am only using
the top 50 movies, every movie in the sample did at least fairly well in opening sales. However,
since dramas and comedies both generally make less money, I expect their coefficients to be
negative. I have “Other” negative because it would include less popular genres. Fall, Winter,
and Spring are all negative because they are regarded as less popular times of the year for
movies. RatedR is negative based on Simonoff and Sparrow’s findings. I have a separate
Christmas variable because the holidays and the summer are usually the biggest times for movies.
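One point about the formulation above is worth making concrete. The season dummies, as defined in the data description, cover the whole year, so each movie has exactly one of them set to 1. Their sum then equals the intercept column, and a full set of season dummies cannot be estimated together with B0; one category has to serve as the base. The sketch below only illustrates that arithmetic; the helper name `season_row` is hypothetical and not part of the original analysis.

```python
# Illustrative check: a full set of mutually exclusive season dummies
# sums to 1 for every movie, which duplicates the intercept column.
SEASONS = ["Fall", "Christmas", "Winter", "Spring", "Summer"]


def season_row(season):
    """Return the five 0/1 season dummies for a movie released in `season`."""
    return [1 if s == season else 0 for s in SEASONS]


# One hypothetical movie per season is enough to show the pattern.
rows = [season_row(s) for s in SEASONS]
intercept_column = [1] * len(rows)
dummy_sums = [sum(r) for r in rows]

# True means the dummies are perfectly collinear with the intercept.
is_collinear = dummy_sums == intercept_column
print(is_collinear)

# Dropping one season (here Fall, chosen arbitrarily) makes it the base
# category and removes the exact linear dependence.
reduced_sums = [sum(r[1:]) for r in rows]
print(reduced_sums != intercept_column)
```

The same reasoning applies to any group of dummies that is exhaustive and mutually exclusive in the sample.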
Data Sources and Descriptions
I will use the top 50 grossing movies of 2011 that were released in at least 1000 theaters. These
will be gathered from http://boxofficemojo.com/yearly/chart/?yr=2011
| Variable | Definition | Source |
|---|---|---|
| Opening | Opening weekend earnings for the movie | Box Office Mojo |
| Theaters | Number of theaters the movie opened in | Box Office Mojo |
| Date | The date the movie was released in theaters. | Box Office Mojo |
| Action | Is the movie an action film? (0 or 1) | IMDB (Internet Movie Database) |
| Drama | Is the movie a drama? (0 or 1) | IMDB (Internet Movie Database) |
| Comedy | Is the movie a comedy? (0 or 1) | IMDB (Internet Movie Database) |
| Horror | Is the movie a horror film? (0 or 1) | IMDB (Internet Movie Database) |
| SciFi | Is the movie a science fiction film? (0 or 1) | IMDB (Internet Movie Database) |
| Other | If the movie does not fall into a previously listed category, it will be considered other. | IMDB (Internet Movie Database) |
| Fall | Was the movie released in September, October, or November? (0 or 1) | |
| Christmas | Was the movie released between December 1st and January 10th? (0 or 1) | |
| Winter | Was the movie released between January 10th and February 29th? (0 or 1) | |
| Spring | Was the movie released in March, April, or May? (0 or 1) | |
| Summer | Was the movie released in June, July, or August? (0 or 1) | |
| G | Did the movie have an MPAA Rating of G? (0 or 1) | Box Office Mojo |
| PG | Did the movie have an MPAA Rating of PG? (0 or 1) | Box Office Mojo |
| PG13 | Did the movie have an MPAA Rating of PG13? (0 or 1) | Box Office Mojo |
| RatedR | Did the movie have an MPAA Rating of R? (0 or 1) | Box Office Mojo |
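The season variables can be derived mechanically from the release date. The following is a minimal sketch of that coding, not the procedure actually used for this project (the variables were compiled by hand). The function name `season_dummies` is hypothetical, and because January 10th appears in both the Christmas and Winter definitions, the sketch assumes it belongs to Christmas.

```python
from datetime import date

SEASONS = ("Fall", "Christmas", "Winter", "Spring", "Summer")


def season_dummies(release: date) -> dict:
    """Map a release date to the five 0/1 season variables.

    Assumption: January 10th is counted as Christmas, since the
    definitions list it under both Christmas and Winter.
    """
    month, day = release.month, release.day
    if month in (9, 10, 11):
        season = "Fall"
    elif month == 12 or (month == 1 and day <= 10):
        season = "Christmas"
    elif month in (1, 2):
        season = "Winter"
    elif month in (3, 4, 5):
        season = "Spring"
    else:  # June, July, August
        season = "Summer"
    return {name: int(name == season) for name in SEASONS}


# Example: a mid-July 2011 release is coded as a Summer movie.
example = season_dummies(date(2011, 7, 15))
print(example)
```

Exactly one of the five variables is 1 for any date, which matches the way the definitions partition the year.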
My Model
From my data I created a reduced model, choosing variables based on the number of entries
for each and the strength of each parameter. The variables “Horror” and “Other” were dropped
because of extremely few entries. Fall, Winter, and Spring didn’t seem to have much effect on
the average movie. The MPAA ratings of G, PG, and RatedR and the genres Action and Drama all
had similarly little effect. Movies in these categories are therefore treated as the base, and
the opening sales for each movie are predicted by adding the parameter estimates of Comedy,
SciFi, Summer, Christmas, and PG13 that apply to it.
Opening = B0 - B1Comedy - B2SciFi + B3Summer - B4Christmas + B5PG13
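A model of this form is estimated by ordinary least squares, and the Results section reports standard errors adjusted for heteroscedasticity. The sketch below shows one common version of that calculation (White's HC0 "sandwich" estimator) in plain Python. It is an assumption that HC0 is the variant behind the reported numbers, the function names are hypothetical, and the four data points are made up purely to exercise the code.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_robust(X, y):
    """Return (coefficients, HC0 robust standard errors) for y = X b + e."""
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    beta = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    # Middle of the sandwich: X' diag(e^2) X
    weighted = [[e * e * x for x in row] for row, e in zip(X, resid)]
    meat = matmul(Xt, weighted)
    cov = matmul(matmul(xtx_inv, meat), xtx_inv)
    se = [max(cov[i][i], 0.0) ** 0.5 for i in range(len(beta))]
    return beta, se


# Made-up data: intercept plus one 0/1 dummy, openings in $ millions.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [10.0, 20.0, 30.0, 50.0]
beta, se = ols_robust(X, y)
print(beta, se)
```

With a single dummy, the intercept is the mean of the base group and the dummy coefficient is the difference in group means, which is how the estimates in this kind of model are read.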
Results
Dependent Variable: Opening

| Regressor | Parameter Estimate | Standard Error Adjusted for Heteroscedasticity | T-value Adjusted for Heteroscedasticity |
|---|---|---|---|
| Intercept | 41,134,549 | 6,700,308 | 6.14 |
| Comedy | -15,495,359 | 7,976,007 | -1.94 |
| SciFi | -20,725,893 | 16,172,398 | -1.28 |
| Summer | 10,901,576 | 10,709,168 | 1.02 |
| Christmas | -16,168,864 | 7,354,390 | -2.20 |
| PG13 | 19,712,716 | 10,518,660 | 1.87 |
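Each t-value in the table is the parameter estimate divided by its adjusted standard error. The sketch below recomputes them and classifies each against two-sided critical values. The critical values are approximate figures for a t distribution with 44 degrees of freedom, which assumes all 50 movies and the 6 estimated parameters enter the regression; the names used are illustrative.

```python
# (estimate, heteroscedasticity-adjusted standard error) from the table
RESULTS = {
    "Intercept": (41_134_549, 6_700_308),
    "Comedy": (-15_495_359, 7_976_007),
    "SciFi": (-20_725_893, 16_172_398),
    "Summer": (10_901_576, 10_709_168),
    "Christmas": (-16_168_864, 7_354_390),
    "PG13": (19_712_716, 10_518_660),
}

# Approximate two-sided critical values, t distribution with 44 df
# (assumption: n = 50 observations, 6 parameters).
CRIT_90 = 1.680
CRIT_95 = 2.015


def t_value(estimate, std_error):
    return estimate / std_error


def significance(t):
    if abs(t) >= CRIT_95:
        return "95% level"
    if abs(t) >= CRIT_90:
        return "90% level"
    return "not significant"


for name, (est, se) in RESULTS.items():
    t = t_value(est, se)
    print(f"{name:<10} t = {t:6.2f}  {significance(t)}")
```

Under these assumptions the intercept and Christmas clear the 95% threshold, Comedy and PG13 clear only the 90% threshold, and SciFi and Summer clear neither.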
Summary Statistics

| Statistic | Value |
|---|---|
| RMSE | 29,005,440 |
| R-Square | 0.248 |
| Adj R-Square | 0.163 |
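The adjusted R-square follows from the R-square once the sample size and number of regressors are known. As a quick consistency check, the sketch below applies the standard adjustment formula, assuming n = 50 movies and k = 5 regressors (Comedy, SciFi, Summer, Christmas, PG13) in addition to the intercept.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R-square for n observations and k regressors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Assumed inputs: R-square from the table, 50 movies, 5 regressors.
adj = adjusted_r_squared(0.248, n=50, k=5)
print(round(adj, 3))  # 0.163, matching the reported Adj R-Square
```

The gap between 0.248 and 0.163 reflects the penalty for estimating five slope parameters on only 50 observations.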
There were some surprising results. My model suggests that science fiction movies
actually have lower opening sales, by an average of about $21 million. Although this estimate
is not statistically significant, its sign suggests that science fiction may in fact reduce sales.
Another surprising result was that movies released around Christmas time did much worse.
The model suggests that on average, opening sales were $16 million less for movies released
around Christmas. This result was statistically significant at the 95% level.
Less surprisingly, movies that were released during the summer or had a PG13 rating tended
to do better in opening sales, while comedies did worse than many
other genres. However, the Comedy and PG13 results were only statistically significant at the 90%
level, and the Summer result was not statistically significant at all.
Unfortunately, this model does not hold much predictive value: it accounts for only
24.8% of the variance in opening sales.
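To see how the fitted model is read, a predicted opening is the intercept plus the estimate for every indicator that applies to the movie. The sketch below uses the parameter estimates from the results table; the function name `predict_opening` and the two example movies are hypothetical.

```python
# Parameter estimates from the results table (dollars).
COEF = {
    "Intercept": 41_134_549,
    "Comedy": -15_495_359,
    "SciFi": -20_725_893,
    "Summer": 10_901_576,
    "Christmas": -16_168_864,
    "PG13": 19_712_716,
}


def predict_opening(**flags):
    """Predicted opening weekend sales for a movie.

    Pass the indicators that apply, e.g. predict_opening(Summer=1, PG13=1).
    Indicators not passed are treated as 0 (the base case).
    """
    total = COEF["Intercept"]
    for name, value in flags.items():
        if name == "Intercept" or name not in COEF:
            raise KeyError(f"unknown indicator: {name}")
        total += COEF[name] * int(bool(value))
    return total


# Hypothetical movies, for illustration only.
summer_pg13 = predict_opening(Summer=1, PG13=1)
christmas_comedy = predict_opening(Comedy=1, Christmas=1)
print(f"{summer_pg13:,}")       # 71,748,841
print(f"{christmas_comedy:,}")  # 9,470,326
```

Given the RMSE of roughly $29 million, either prediction carries a wide margin of error, which is consistent with the low R-square.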
Conclusion
Although the intercept and the Comedy, Christmas, and PG13 parameters were all
statistically significant at the 90% level or better, this model does not hold much predictive
value with such a low R-square value. My findings did have some discrepancies with the findings of
Simonoff and Sparrow. According to them, and in line with what I had previously believed, movies
released around the holidays do noticeably better. However, my model suggests that they do
noticeably worse. Also, science fiction movies do worse in my model, but Simonoff and Sparrow
found otherwise. This may be explained by my relatively small dataset, or by a change in movie
preferences over a decade. This model is rather ineffective for predicting movie sales, but
something can still be learned from it about the relative success of comedy films, movies rated
PG13, and movies released during the Christmas season.
Limitations
One of the most serious limitations was the size of my dataset. Many of the variables
were compiled by hand; with more time, a larger dataset could have been
gathered. Due to a lack of entries, some variables were excluded from this model, and there could
be omitted variable bias from the excluded variables. The small sample most likely does not
accurately reflect the whole population. The 50 movies that were used do not come close to
encompassing all movies released during 2011 in at least 1000 theaters.
Beyond sample size, many other factors can contribute to predicting
opening sales of a movie. “Star power”, suggested by Simonoff and Sparrow, blogger
sentiment, suggested by Mishne and Glance, and the general advertising and hype a movie receives
before opening are not covered in this analysis.
References
"2011 DOMESTIC GROSSES." Box Office Mojo. IMDb, 2012. Web. 26 Apr 2012.
http://boxofficemojo.com/yearly/chart/?yr=2011&p=.htm
The Internet Movie Database. 2012. Web. 26 Apr 2012. http://www.imdb.com/
Mishne, Gilad, and Natalie Glance. "Predicting Movie Sales from Blogger Sentiment." N.p.,
2006. Web. 26 Apr 2012.
Simonoff, Jeffrey, and Ilana Sparrow. "Predicting movie grosses: Winners and losers,
blockbusters and sleepers." N.p., 1999. Web. 26 Apr 2012.