Lal Bahadur Shastri Institute of Management, Delhi
LBSIM Working Paper Series
LBSIM/WP/2020/06

Predicting Success of Bollywood Movies
Shivani Bali
August 2020

LBSIM Working Papers indicate research in progress by the author(s) and are brought out to elicit ideas, comments and insights and to encourage debate. The views expressed in LBSIM Working Papers are those of the author(s) and do not necessarily represent the views of LBSIM or its Board of Governors.

WP/August2020/06
LBSIM Working Paper Research Cell

Abstract

India’s entertainment economy is growing rapidly, and with half a billion people under the age of twenty-five, the world’s Media and Entertainment (M&E) companies are taking note. With favourable demographics and rising disposable incomes, the propensity to spend on leisure and entertainment is growing faster than the economy itself. The Indian Hindi movie industry, popularly known as Bollywood, has reached staggering proportions in terms of volume of business (220 billion), employment, movies produced (more than 100 a year) and reach (more than 100 countries worldwide). The Indian film industry is the most prolific in the world, with more than 1,000 films produced every year in more than 20 languages. Almost 90 percent of revenues are derived from local films, leaving English and other foreign films with a 10 percent market share. With 3.3 billion tickets sold annually, India also has the highest number of theatre admissions. With so much at stake and returns so uncertain, it is of commercial interest to develop a model that can predict the success of a movie. Movies have been described as experience goods with a very short shelf life, which makes the demand for a movie difficult to forecast.
A number of parameters may influence the success of a movie, among them the time of its release, marketing, lead actors, director, producer, writer and music director. The present study aims to develop a model based on multiple linear regression that may help predict the success of a movie in advance, thereby reducing some of this uncertainty.

Keywords: Predictive modeling, regression, success

Introduction

The Indian movie industry produces more movies per year than any other country’s movie industry. However, very few movies achieve commercial success. Given the low success rate, models and mechanisms that reliably predict the ranking and/or box office collections of a movie can help de-risk the business significantly and increase average returns. Stakeholders such as actors, financiers and directors can use these predictors to make more informed decisions. The study is restricted to Bollywood movies because of time and financial constraints. Some of the questions that can be answered using prediction models are given below.

1. What are the major factors that determine the success or ranking of a Bollywood movie?
2. Is the budget of an Indian movie a key determinant of rank or success?
3. Do social media ratings drive the success of a movie?

Further, a DVD rental agency or a distribution house could use these predictions to determine which titles to stock or promote, respectively. Data were taken from reliable sources on the Internet, namely the Internet Movie Database (IMDb), Bollywoodhungama.com ratings, critics’ ratings and social media (Twitter, Facebook etc.), and data mining and prediction techniques such as multiple linear regression were used to devise a model that can identify a Bollywood movie’s key success factors.

1.
Literature review

In Hollywood, researchers have conducted several studies on the box office or financial performance of movies, primarily adopting psychological or economic approaches (Litman & Ahn, 1998). The psychological approach focuses on the individual moviegoer’s decision to choose a movie over other entertainment options and to select particular movies, whereas the economic approach is based mainly on economic and industrial factors determined by the supply side (Litman & Ahn, 1998). Methodologically, psychological studies primarily used survey methods, whereas the economic approach used secondary data. Chang and Ki (2005) observed that little research had been done on the inherent experience good property of movies, even though this property is closely related to a movie audience’s decision-making process. Jack Valenti, president and CEO of the Motion Picture Association of America, once remarked that “…No one can tell you how a movie is going to do in the marketplace …not until the film opens in darkened theatre and sparks fly up between the screen and the audience” (Valenti, 1978). Movies, then, are something to be experienced, and most trade journals and magazines of the motion picture industry support Valenti’s claim.

The experience good property has two aspects. First, movies are an experience good because individuals choose and use movies solely for the experience and enjoyment (Hirschman & Holbrook, 1982; Holbrook & Hirschman, 1982). This means that for moviegoers the consumption experience is an end in itself (Reddy et al., 1998).
Second, movies are an experience good because individuals do not know what the value of a movie will be to them until they experience it (Shapiro & Varian, 1999).

Based on work by Litman (1983) and Sochay (1994), creative sphere, scheduling, release pattern and marketing effort were used as categories of independent variables. However, an explicit criterion was not suggested for this categorization.

Historically, neither the creators nor the distributors of “cultural products” have used analytics (data, statistics, predictive modeling) to determine the likely success of their offerings. It is easy to see why most people regard the prediction of taste as an art (Thomas et al., 2009). But the balance between art and science is shifting. Today companies have unprecedented access to data and sophisticated technology that allow even the best-known experts to weigh factors and consider evidence that was unobtainable just a few years ago. As a result, the entertainment industry is looking for prominent features with which to predict consumer taste.

Gaikar et al. (2015) used Twitter data to predict the success of a movie. Bhave et al. (2015) discussed various factors that can affect the success of a movie. We live in the era of the internet, in which people place growing trust in sentiments expressed on social media. Mundra et al. (2019) performed sentiment analysis on tweets to predict the success of a movie.

Since not much work has been carried out in India in this area, an attempt has been made to develop a model that can identify the success factors of a movie. Our study differs from earlier work in that no reported study has predicted Bollywood movie success using the variables we have taken; moreover, the study is longitudinal, based on two years, namely 2014 and 2016. The research attempts to review and empirically test predictors that have been adopted by field practitioners for predicting movie success in Bollywood.

2.
Research Methodology

The objectives of the study were to answer the following research questions.

1. Determine the key variables that predict the success of a Bollywood movie.
2. Determine whether budget is an important factor for the success of a Bollywood movie.

Secondary data for two years (2014 and 2016) were collected from reliable sources: IMDb ratings, Bollywood Hungama ratings, social media (Twitter ratings) and critics’ ratings. The major parameters taken in the study were:

Dependent Variables

1. Total revenue (in crores): This was the most frequently used variable in the literature (e.g., Litman & Ahn, 1998). It is the collection over the entire run of the movie in theatres.
2. First week box office collection: While the first-week box office is considered to be highly correlated with the total domestic box office collection, this variable is adopted in this study to check whether some independent variables affect the two dependent variables to different degrees (Thomas et al., 2005).

Independent Variables

1. Release date: Release date is considered important in India, as many box office hits have been movies released on holidays or festive occasions. In fact, major production houses form a pact on releases for Diwali, Eid and Christmas so that the release dates do not clash. The variable is therefore included to see how it affects the overall success of a movie. The rationale is that a release in a high-attendance period (e.g., Christmas/Eid/Diwali) attracts a bigger audience, which leads to higher box office performance. In reviewing the high-attendance periods, only a few periods, such as summer, were empirically supported by previous literature (Litman, 1998; Sochay, 1994; Wyatt, 1991).
2. Budget of movie (in crores): Production costs have been considered an important predictor because big budgets manifest as lavish sets, locales, costumes, special effects etc.
Most previous studies (Basuroy et al., 2003; Litman, 1982; Litman & Ahn, 1998; Wyatt, 1991; Lee & Chang, 2009) have also supported the importance of budget. The production budget data were taken from IMDb. Accurate information on production budgets is very difficult to obtain because it is considered confidential. Some caution is therefore required in using the production budget data, because they are usually based on press releases by the studios or estimates by industry insiders.
3. Genre: Based on previous research, the movies are categorized into seven genres: action/adventure, children/family, comedy, drama, horror, mystery/suspense and sci-fi/fantasy. This listing is complete and mutually exclusive. To code genre, this study consulted IMDb and Bollywood Hungama, which coded genre for all the sample movies. The comedy genre was significant in several studies, and the sci-fi/fantasy and horror genres were supported in other literature; however, the studies were not unanimous.
4. Lead actor/actress: The effects of actors, or star power, have received considerable attention in the literature (Basuroy, Chatterjee & Ravid, 2003; De Silva, 1998; Holbrook, 1999; Prag & Casavant, 1994; Wyatt, 1991). Here actors are placed in four categories, the first three depending on the number of movies the actor has done:
a. Debut (<5 movies)
b. Star (6–50 movies)
c. Super Star (more than 50 movies)
d. Female Actor
According to brand theories, the use of superstars is strongly recommended because the use of stars in movies is actually ingredient branding, which involves creating brand equity for materials, components and parts that are necessarily within other branded products (Keller, 1998; Levin et al., 1997). Empirical studies of the actor effect, however, report conflicting results.
Although some researchers (Litman & Kohl, 1989; Sochay, 1994; Wallace et al., 1993; Wyatt, 1991) found significant contributions from star power, several other studies (De Silva, 1998; De Vany & Walls, 1999; Litman, 1983; Litman & Ahn, 1998; Prag & Casavant, 1994) did not find any significant effect.
5. Number of movies made by the director: Just as superstars are ingredient branding according to brand theories, so are directors. A director becomes a brand depending on the total number of movies he or she has directed, and the more successes, the bigger the brand. Studies in Hollywood, however, report that the effect of directors was not significant (Litman, 1982; Litman & Ahn, 1998; Litman & Kohl, 1989; Sochay, 1994; Wyatt, 1991).
6. Sequel: A sequel uses an established brand name to introduce a new product and is conceptualized as a brand extension (Keller, 1998). The use of an established parent brand allows brand extensions to gain customer attention and thus reduces marketing costs. In this study we recorded whether or not a movie is a sequel.
7. Production house: The production house is an important variable in the Indian film industry and has been taken as one of the variables. Production houses such as Yash Raj Films, Dharma Productions, UTV Motion Pictures and Balaji Films are considered some of the most successful. These houses can also bear the cost of lavish sets and costumes and expensive digital manipulations, as they have budgets for all pre-production and post-production activities.
8. Critics’ ratings: As with superstars and production houses, the effect of critics’ ratings has been widely tested by previous research (De Silva, 1998; Litman & Ahn, 1998; Sochay, 1994; Prag & Casavant, 1994). These are ratings given by well-known movie critics.
Critics assist individuals in making a movie choice, understanding the content of the movie, developing an initial opinion of the film and communicating movie information to others (Austin, 1983).
9. IMDb ratings: IMDb is the world’s largest and best-known movie database website with rating scales (1–10 scale).
10. Bollywood Hungama ratings: Ratings given to the movies on the Bollywood Hungama website (1–10 scale).
11. Social media: Average of tweets for the movie on a scale of 1–10.

Data on all the above factors were collected for 140 movies from reliable secondary sources.

3. Result and Analysis

The collected data were raw; a few values were missing, which left the dataset incomplete, and there were some outliers. To make the data ready for analysis, data cleaning and data transformation were performed. Dummy variables were created for all the categorical variables in the data set, as listed below:

A. Release date: A release date dummy was formed, where 1 meant the movie was released on a festival or holiday and 0 meant no occasion.
B. Number of movies the actor has done: Four categories were made (Debut, Actor, Superstar and Female Actor), so three dummy variables were created, namely Debut, Actor and Superstar.
C. Genre: There were seven genre categories, so six dummy variables were created, namely DummyAction, DummyFamily, DummyComedy, DummyDrama, DummyHorror and DummyMystery.
D. Production house: This variable was divided into five categories, four for the famed production houses and a last category for all others. The four dummy variables were DummyYashraj, DummyDharma, DummyUtv and DummyBalaji.

The sampled movies earned an average revenue of ₹ 63.53 crores. The mean first week box office collection was ₹ 28.92 crores, showing that first-week performance accounted for approximately 46% of the total box office revenue.
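The dummy coding described in items A to D can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record field names (`festival_release`, `genre`, `actor_type`, `production_house`), the column name `DummyReleaseDate` and the example movie are hypothetical and do not come from the study's data set. The omitted reference levels (sci-fi, Female Actor, Others) are inferred from the dummy names the paper lists.

```python
from typing import Any, Dict, List

# Levels of each categorical predictor. The last level in each list is the
# reference category that receives no dummy column; this matches the dummy
# names listed in the paper (no dummy for sci-fi, Female Actor or Others).
GENRES = ["Action", "Family", "Comedy", "Drama", "Horror", "Mystery", "SciFi"]
ACTOR_TYPES = ["Debut", "Actor", "Superstar", "FemaleActor"]
HOUSES = ["Yashraj", "Dharma", "Utv", "Balaji", "Others"]


def dummy_code(value: str, levels: List[str], prefix: str) -> Dict[str, int]:
    """Return k-1 indicator (0/1) columns for a variable with k levels."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    # Every level except the last (reference) one gets its own column.
    return {f"{prefix}{level}": int(value == level) for level in levels[:-1]}


def encode_movie(movie: Dict[str, Any]) -> Dict[str, int]:
    """Encode the four categorical predictors of one movie record."""
    # Field names below are hypothetical, chosen only for this sketch.
    row = {"DummyReleaseDate": int(bool(movie["festival_release"]))}
    row.update(dummy_code(movie["genre"], GENRES, "Dummy"))
    row.update(dummy_code(movie["actor_type"], ACTOR_TYPES, ""))
    row.update(dummy_code(movie["production_house"], HOUSES, "Dummy"))
    return row


# Hypothetical record for illustration; not taken from the study's data.
example = {
    "festival_release": True,
    "genre": "Drama",
    "actor_type": "Superstar",
    "production_house": "Dharma",
}
encoded = encode_movie(example)
print(encoded)
```

Leaving out one level per variable avoids perfect collinearity with the regression intercept, which is why a variable with k categories yields k - 1 dummies.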
The average budget of the sampled movies was ₹ 33.60 crores, approximately 53% of the average box office revenue. By genre, the sample comprised action (n = 36, 26%), family (n = 12, 9%), comedy (n = 30, 21%), drama (n = 45, 32%), horror (n = 8, 6%), mystery (n = 8, 6%) and sci-fi (n = 1, 1%). Regarding the information-source parameters, the average critics’ rating was 4.70, the average IMDb rating was 5.76 and the average Bollywood Hungama rating was 7.02 (all on a 1–10 scale), the last being considerably higher than the IMDb rating. Since we are in the era of social media, tweets played a critical role in predicting the success of a movie; the average tweet rating was 6.09 (1–10 scale). Regarding release time, 12 of the sampled movies (9%) were released during the festival season or holidays. Of these 12 movies, 8 earned total revenue of more than ₹ 200 crores. The mean values of all quantitative variables are presented in Table 1.

Table 1. Descriptive Analysis

Variables                            Mean
First Week Box-Office Collection     28.9179
Total Revenue (in Crores)            63.53
Budget (in Crores)                   33.6
Actor No. of Movies                  35.97
Tweet Ratings                        6.0921
Director No. of Movies               6.18
Critics Rating                       4.6939
IMDb Rating                          5.7598
Bollywood Hungama Rating             7.02

Regression Analysis: The results of the regression analysis, with total box office collection and first week box office collection as dependent variables, are shown in Table 2. Both models explained a significantly high proportion of variance, with R² values of 0.703 and 0.673 respectively.
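The paper reports R², adjusted R² and F for each model in Table 2 but does not state how many predictors each fitted model contained. As a rough sketch under that caveat, the following shows how adjusted R² and the overall F statistic follow from R², the sample size n and a predictor count k. The values n = 140 and R² = 0.703 are taken from the paper; the candidate values of k are assumptions, so the printed numbers need not reproduce Table 2 exactly.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k predictors and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    # Penalise R-squared for the number of predictors used.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F statistic of the regression, computed from R-squared."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    # Explained variance per predictor over unexplained variance per
    # residual degree of freedom.
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# n and R2 as reported in the paper; k is NOT stated in the source, so two
# assumed values are tried to show how sensitive the statistics are to it.
n_movies, r2_reported = 140, 0.703
for k_assumed in (4, 5):
    adj = adjusted_r2(r2_reported, n_movies, k_assumed)
    f_val = f_statistic(r2_reported, n_movies, k_assumed)
    print(f"k={k_assumed}: adjusted R2={adj:.3f}, F={f_val:.2f}")
```

Because the published R² is rounded to three decimals, small differences from the tabulated F values are to be expected even when k is chosen correctly.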
Table 2. Summary of the outcome of the regression analysis

Independent Variable First week Total Dummy Release date Drama Dharma Budget Bollywood Hungama Tweets 30.027** 78.955** 13.698* 2.446** 1.638** 4.258* 3.499** N R R2 Adjusted R2 140 0.838 0.703 0.69 140 0.821 0.673 0.661 13.481* 0.843** F Probability > F 78.689 53.593 0.000 0.000
*p<.05, **p<.01

The first model, with total box office as the dependent variable, shows that release date, drama, budget, Bollywood Hungama rating and tweets were the significant predictors. Release date contributed significantly to total box office: movies released during festivals and holidays did well. One interesting finding is that the genre ‘drama’ was positively related to total box office; no effect was found for the other genres. Budget contributes significantly to the success of a movie. Budget includes expenses on advertising and promotion; hence, the higher the budget, the higher the chances of success. The Bollywood Hungama and tweet ratings were both found to be significant. The study also highlights that social media approval, in the form of tweet ratings, has emerged as an important factor in predicting the success of a movie.

The second model, predicting first week box office collection, found fewer significant variables: release date, Dharma Productions, budget and tweets. For total box office collection, the Bollywood Hungama and tweet ratings were both significant, but for first week box office collection only the tweet rating was significant. This shows that the Bollywood Hungama rating is a predictor rather than an influencer of box office success. Similarly, Dharma Productions is an influencer and drama is a predictor. For first week collection Dharma Productions was significant irrespective of the genre of the movie, whereas for total collection drama was significant irrespective of the production house.

4.
Discussion

Based on the results of the regression analysis, we are in a position to answer the following research questions:

RQ1: Determine the key variables that predict the success of a Bollywood movie.

Two models were developed to measure the success of a movie in terms of:
a) Total box office collection.
b) First week box office collection.

The variables found significant in predicting success are given below:
a) Total box office collection (Release date, Drama, Budget, Bollywood Hungama, Tweets)
b) First week box office collection (Release date, Dharma Productions, Budget, Tweets)

RQ2: Determine whether budget is an important factor for the success of a movie.

The regression analysis showed that budget is significant in predicting box office collection, both for the first week and in total.

5. Practical Implication

This study carries implications for both movie makers and the audience. First, the study revealed that the release date of a movie is highly significant in predicting both first week and total box office collection. People are more willing to make time to go to the theatre during holidays or the festival season. Secondly, an interesting finding of the study is that tweets play a significant role in predicting box office success. The decision to watch a movie appears to be guided by Twitter ratings; audiences seem to regard Twitter ratings as a more reliable indicator of a movie’s performance than any other ratings source. First week box office success is explained by Twitter ratings, whereas total box office success is explained by Twitter ratings as well as Bollywood Hungama ratings. As expected, budget has a positive impact on the success of a movie. For this study, budget includes both movie-making expenses and promotion expenses.
The study also revealed that, among the various production houses, Dharma Productions has a positive impact on the audience. It is one of the significant predictors of first week box office success: during the first week people are willing to watch a movie under the Dharma Productions banner irrespective of star cast or genre. Last but not least, among all the genres ‘drama’ is the choice of the audience; it has a positive impact on total box office collection.

6. Conclusion & Future Scope

This study proposes models for measuring the success of Bollywood movies. Two models were proposed, one to predict first week box office collection and one to predict total box office collection. Regression analysis was performed on the data set. The study revealed that release date, Dharma Productions, budget and tweets significantly affected first week box office success. For total box office collection, the significant variables were release date, drama, budget, Bollywood Hungama ratings and tweet ratings. The key contribution of this research is to give film makers direction regarding the significant drivers of a movie’s success. In future work, data for more recent years can be added, and advanced machine learning techniques can be applied to the data set to improve predictive performance.

References

1. Austin, B. A. (1983). A longitudinal test of the taste culture and elitist hypotheses. Journal of Popular Film and Television, 11, 157–167.
2. Basuroy, S., Chatterjee, S., & Ravid, S. A. (2003). How critical are critical reviews? The box office effects of film critics, star power, and budget. Journal of Marketing, 67(4), 103–117.
3. Bhave, A., Kulkarni, H., Biramane, V., & Kosamkar, P. (2015).
Role of different factors in predicting movie success. 2015 International Conference on Pervasive Computing (ICPC), IEEE. DOI: 10.1109/PERVASIVE.2015.7087152
4. Chang, B. H., & Ki, E. J. (2005). Devising a practical model for predicting theatrical movie success: Focusing on the experience good property. Journal of Media Economics, 18(4), 247–269.
5. De Silva, I. (1998). Consumer selection of motion pictures. In B. R. Litman (Ed.), The motion picture mega-industry (pp. 144–171). Needham Heights, MA: Allyn & Bacon.
6. Gaikar, D. D., Marakarkandy, B., & Dasgupta, C. (2015). Using Twitter data to predict the performance of Bollywood movies. Industrial Management & Data Systems, 115(9), 1604–1621.
7. Hirschman, E. C., & Holbrook, M. B. (1982). Hedonic consumption: Emerging concepts, methods and propositions. Journal of Marketing, 46, 92–101.
8. Holbrook, M. B., & Hirschman, E. C. (1982). The experiential aspects of consumption: Consumer fantasies, feelings and fun. Journal of Consumer Research, 9, 132–140.
9. http://www.hollywoodreporter.com/news/indian-entertainment-biz-revenues-reach-270337 (accessed on 8 July 2016 at 4.53 p.m.)
10. http://www.pwc.in/en_IN/in/assets/pdfs/PwC-Indian-Entertainment-and-Media-Outlook-2009.pdf
11. Keller, K. L. (1998). Strategic brand management. Upper Saddle River, NJ: Prentice Hall.
12. Lee, K. J., & Chang, W. (2009). Bayesian belief network for box office performance: A case study of Korean movies. Expert Systems with Applications, 36(1), 280–291.
13. Levin, A. M., Levin, I. P., & Heath, C. E. (1997). Movie stars and authors as brand names: Measuring brand equity in experiential products. Advances in Consumer Research, 24, 175–181.
14. Litman, B. R. (1982). Decision-making in the film industry: The influence of the TV market. Journal of Communication, 32, 33–52.
15. Litman, B. R. (1983). Predicting success of theatrical movies: An empirical study. Journal of Popular Culture, 16, 159–175.
16. Litman, B. R. (Ed.). (1998).
Motion picture mega-industry. Needham Heights, MA: Allyn & Bacon.
17. Litman, B. R., & Kohl, L. S. (1989). Predicting financial success of motion pictures: The ’80s experience. Journal of Media Economics, 2, 35–50.
18. Litman, B. R., & Ahn, H. (1998). Predicting financial success of motion pictures: The early ’90s experience. In B. R. Litman (Ed.), Motion picture mega-industry (pp. 172–197). Needham Heights, MA: Allyn & Bacon.
19. Mundra, S., Dhingra, A., Kapur, A., & Joshi, D. (2019). Prediction of a movie’s success using data mining techniques. In S. Satapathy & A. Joshi (Eds.), Information and Communication Technology for Intelligent Systems. Smart Innovation, Systems and Technologies, vol. 106. Singapore: Springer.
20. Prag, J., & Casavant, J. (1994). An empirical study of the determinants of revenues and marketing expenditures in the motion picture industry. Journal of Cultural Economics, 18, 217–235.
21. Shapiro, C., & Varian, H. R. (1999). Information rules. Boston: Harvard Business School Press.
22. Sochay, S. (1994). Predicting performance of motion pictures. Journal of Media Economics, 7, 1–20.
23. De Vany, A., & Walls, W. D. (1999). Uncertainty in the movie industry: Does star power reduce the terror of the box office? Journal of Cultural Economics, 23, 285–318.
24. Wyatt, J. (1991). High concept, product differentiation, and the contemporary U.S. film industry. Texas Film and Media Study.

LAL BAHADUR SHASTRI INSTITUTE OF MANAGEMENT, DELHI
PLOT NO. 11/7, SECTOR-11, DWARKA, NEW DELHI-110075
Ph.: 011-25307700, www.lbsim.ac.in