Prediction of Sporting Events through Social Media across the Multiple Cultures of Australia

--- blinded ---

Abstract: Social media platforms such as Twitter provide a near real-time forum for expressing personal information through Tweets. These Tweets often capture the emotion of the Tweeter at a given time and place, either on a personal level or in reaction to an event, organisation or other individual. In this paper we show how such sentiment can be used to identify and ultimately predict events. Specifically, we focus on predicting events in sport, a topic of particular interest and relevance to Australians, and to Melbourne in particular as a passionate sports city. The novelty of this work is that it addresses multicultural factors in the use of Twitter for predicting sporting events, and uses them as the basis for a better understanding of the cities and communities of Australia. To explore this, we focus on event prediction in the 2014 FIFA World Cup and the 2015 Cricket World Cup. We show how events are detected and, importantly, how event detection varies across cultures. We describe the Cloud-based architecture used to collect and analyse such large, diverse data sets, and the algorithms used to identify changes in sentiment, together with their accuracy. We illustrate how actual events can indeed be predicted from social media, and the cultural differences in how sentiment is expressed through social media.

Keywords: Twitter, sentiment analysis, sports prediction.

I. INTRODUCTION

II. RELATED WORK

III. CASE STUDY AND ARCHITECTURAL IMPACT

IV. CONCLUSIONS

ACKNOWLEDGMENTS

The authors would like to thank the National eResearch Collaboration Tools and Resources (NeCTAR) project for the infrastructure underlying this work.

REFERENCES

[1] About.twitter.com. Company | About, 2015. URL https://about.twitter.com/company.
[2] A. Schulz, A. Hadjakos, H. Paulheim, J.
Nachtwey, and M. Mühlhäuser. A multi-indicator approach for geolocalization of tweets. In Proceedings of the Eighth International Conference on Weblogs and Social Media (ICWSM), pages 573-582, Menlo Park, California, USA, 2013. AAAI Press. ISBN 978-1-57735-610-3.
[3] S. Rosenthal, A. Ritter, P. Nakov, and V. Stoyanov. Sentiment analysis in Twitter. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 73-80, Dublin, Ireland, August 2014. Association for Computational Linguistics and Dublin City University. URL http://www.aclweb.org/anthology/S14-2009.
[4] Support.rc.nectar.org.au. Nectar support, 2015. URL https://support.rc.nectar.org.au/.
[5] Couchdb.apache.org. Apache CouchDB, 2015. URL http://couchdb.apache.org/.
[6] Couchbase.com. Couchbase and Apache CouchDB | Couchbase, 2015. URL http://www.couchbase.com/couchbase-vs-couchdb.
[7] Wiki.apache.org. Technical overview - CouchDB wiki, 2015. URL https://wiki.apache.org/couchdb/Technical%20Overview.
[8] Dev.twitter.com. OAuth | Twitter developers, 2015. URL https://dev.twitter.com/oauth.
[9] Dev.twitter.com. REST APIs | Twitter developers, 2015. URL https://dev.twitter.com/rest/public.
[10] Dev.twitter.com. The streaming APIs | Twitter developers, 2015. URL https://dev.twitter.com/streaming/overview.
[11] Tweepy.readthedocs.org. Tweepy documentation - Tweepy 3.2.0 documentation, 2015. URL http://tweepy.readthedocs.org/en/v3.2.0/.
[12] Code.google.com. Resources - otterapi - Topsy's Otter API - Google project hosting, 2009. URL https://code.google.com/p/otterapi/wiki/Resources.
[13] Lightcouch.org. CouchDB Java API - LightCouch documentation, 2015. URL http://www.lightcouch.org/docs.html.
[14] Torstein Hønsi. Highcharts documentation, 2015. URL http://www.highcharts.com/docs.
[15] Google Developers. Heatmaps, 2015. URL https://developers.google.com/maps/documentation/javascript/examples/layer-heatmap.
[16] http://www.lctmaster.org/. Introduction to sentiment analysis. 2015.
URL http://www.lctmaster.org/files/MullenSentimentCourseSlides.pdf.
[17] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135, January 2008.
[18] R. Feldman. Techniques and applications for sentiment analysis. Communications of the ACM, 56(4):82-89, April 2013.
[19] Z. Zabokrtsky. Feature engineering in machine learning. URL http://ufal.mff.cuni.cz/~zabokrtsky/courses/npfl104/html/feature_engineering.pdf.
[20] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau. Sentiment analysis of Twitter data. In Proceedings of the Workshop on Languages in Social Media, LSM '11, pages 30-38, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics. ISBN 978-1-932432-96-1. URL http://dl.acm.org/citation.cfm?id=2021109.2021114.
[21] S. Asur and B. A. Huberman. Predicting the future with social media. In Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01, WI-IAT '10, pages 492-499, Washington, DC, USA, 2010. IEEE Computer Society. ISBN 978-0-7695-4191-4. doi: 10.1109/WI-IAT.2010.63. URL http://dx.doi.org/10.1109/WI-IAT.2010.63.
[22] A.G. Jivani. A comparative study of stemming algorithms. Int. J. Comp. Tech. Appl., 2(6):1930-1938.
[23] A. Pak and P. Paroubek. Twitter as a corpus for sentiment analysis and opinion mining. Proceedings of the 5th International Workshop on Semantic Evaluation, ACL 2010, pages 436-439, July 2010.
[24] J. Fürnkranz, D. Gamberger, and N. Lavrač. Foundations of rule learning. Springer, XVIII, 334 p., ISBN 978-3-540-75196-0, 2012.
[25] A. Hertzmann. Machine learning and data mining lecture notes. February 2012. URL http://www.dgp.toronto.edu/~hertzman/411notes.pdf.
[26] L. A. Smith, T.J. Monk, R.S. Mitchell, and G. Holme. Geometric comparison of classifications and rule sets. Workshop on Knowledge Discovery in Databases, pages 395-406, 1994.
[27] C.D. Manning, P. Raghavan, and H. Schütze.
Introduction to information retrieval. Cambridge University Press, ISBN 0521865719, 2008.
[28] I. Rish. An empirical study of the naive Bayes classifier. RC 22230 (W0111-014), November 2001.
[29] T. M. Mitchell. Lecture slides for textbook Machine Learning. McGraw Hill, 1997. URL http://www.cs.cmu.edu/afs/cs/project/theo20/www/mlbook/ch3.pdf.
[30] R. Hwa, B. Maeireizo, and D. Litman. Co-training for predicting emotions with spoken dialogue data. Association for Computational Linguistics, Stroudsburg, Article No. 28, 2004.
[31] J. Weston. Support vector machine and statistical learning theory. NEC Labs.
[32] D. Koller and S. Tong. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2:45-66, January 2002.
[33] A. Rajaraman and J.D. Ullman. Introduction to sentiment analysis. Data Mining: Mining of Massive Datasets, ISBN 9781139058452, pages 1-17, 2011.
[34] Official Cricket World Cup Website. History. 2015. URL http://www.icc-cricket.com/cricket-world-cup/about/279/history.
[35] Asian Cricket Council. Afghanistan. 2015. URL http://www.asiancricket.org/index.php/members/afghanistan.
[36] Jon Healy and Dean Bilton. Cricket World Cup: New Zealand v South Africa semi-final in Auckland as it happened. ABC News, 2015. URL http://www.abc.net.au/news/2015-03-24/cricket-world-cup-semi-final3a-new-zealand-v-south-africa-live/6343458.
[37] New Zealand v South Africa - 6 defining moments. ICC Cricket, March 2015. URL http://www.icc-cricket.com/cricket-world-cup/news/2015/features-and-specials/87358/new-zealand-v-south-africa-6-defining-moments.
[38] Australia puts its no-one ranking on the line as ICC Cricket World Cup 2015 starts on Saturday. ICC Cricket, Feb 2015. URL http://www.icc-cricket.com/cricket-world-cup/news/2015/media-releases/85426/~.
[39] Twitter. FAQs about retweets (RT). 2014. URL https://support.twitter.com/articles/77606-faqs-about-retweets-rt.
[40] Valerio Basile and Malvina Nissim. Sentiment analysis on Italian tweets.
[41] D. Maynard and M. A. Greenwood. Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, May 26-31, 2014, pages 4238-4243, 2014. URL http://www.lrec-conf.org/proceedings/lrec2014/summaries/67.html.
[42] A. Kumar and T. Sebastian. Sentiment analysis: A perspective on its past.