Stock Market Prediction Using Sentiment Detection
C. LEE FANZILLI
ADVISORS: PROF. DVORAK AND PROF. WEBB

Hypothesis
Can we use the sentiment of tweets mentioning a publicly traded stock to predict that stock's future returns? Can we predict contemporaneous returns? Or do returns predict Twitter sentiment instead?

Background
Many different approaches to predicting market prices have been tried. Technical analysis forecasts the direction of prices by studying past market data, primarily price and volume. Using this method, Larsen reported gains of 300% on the initial investment (June 2010).

Challenges
The Efficient Market Hypothesis states that prices already reflect all available information, which implies that public signals such as tweets should not reliably predict future returns. In addition, a collected dataset requires a fair amount of organizing and cleaning. Not all of the data is useful, and the useful data may not be sufficient to support a claim.

Data
For this experiment, we collected daily stock price information for AMD, Google, and Apple from Yahoo Finance. We retrieved a list of the top 101 tech tweeters from a Business Insider article and used Twitter's API to build a corpus of their tweets.

Google Daily Price Information
Date        Open     High     Low      Close    Volume     Adj Close
2/24/2015   530      536.79   528.25   536.09   1002300    536.09
2/23/2015   536.05   536.44   529.41   531.91   1448900    531.91
2/20/2015   543.13   543.75   535.8    538.95   1440400    538.95
2/19/2015   538.04   543.11   538.01   542.87   986400     542.87
2/18/2015   541.4    545.49   537.51   539.7    1447600    539.7

Organization
We loaded our Twitter data into CouchDB, an Apache database. Next, we extracted each tweet's posting date and text and separated the tweets by which of our stocks they mentioned. We then wrote a script to score each tweet's overall sentiment.

Sentiment Detection
Sentiment detection is a form of textual analysis that classifies text by the opinion it expresses (positive, negative, or neutral). The University of Pittsburgh provides the MPQA corpus.
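The organization and scoring steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's script: the short POSITIVE/NEGATIVE word sets are hypothetical stand-ins for the MPQA lexicon, the STOCK_KEYWORDS mapping and all helper names are hypothetical, and the definitions of "balanced" (sentiment words that cancel out) and "true neutral" (no sentiment words at all) are assumed. A tweet scores +1 per positive word and -1 per negative word, and scores are averaged per stock and day, which is also an assumption about how daily sentiment was aggregated.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical stand-ins for the MPQA subjectivity lexicon; a real run would
# load MPQA's positive and negative word lists instead.
POSITIVE = {"love", "winning", "smoother", "great", "gain"}
NEGATIVE = {"protest", "weak", "loss", "lawsuit"}

# Hypothetical keywords used to decide which stock(s) a tweet mentions.
STOCK_KEYWORDS = {
    "AAPL": {"apple", "aapl", "iphone", "ipad"},
    "AMD": {"amd"},
    "GOOG": {"google", "goog"},
}

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def stocks_mentioned(tokens):
    """Return the sorted tickers whose keywords appear in the tokens."""
    present = set(tokens)
    return sorted(t for t, words in STOCK_KEYWORDS.items() if present & words)


def polarity_counts(tokens):
    """Count positive and negative lexicon words in a tweet."""
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return pos, neg


def score_tweet(tokens):
    """Score = (# positive words) - (# negative words)."""
    pos, neg = polarity_counts(tokens)
    return pos - neg


def classify(tokens):
    """Label a tweet with the poster's categories (definitions assumed)."""
    pos, neg = polarity_counts(tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    # Assumed: "balanced" = sentiment words that cancel out,
    # "true neutral" = no sentiment words at all.
    return "balanced" if pos > 0 else "true neutral"


def daily_sentiment(tweets):
    """Mean tweet score per (ticker, day); tweets naming no stock are skipped."""
    scores = defaultdict(list)
    for posted, text in tweets:
        tokens = tokenize(text)
        for ticker in stocks_mentioned(tokens):
            scores[(ticker, posted)].append(score_tweet(tokens))
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


if __name__ == "__main__":
    sample = [  # hypothetical (date posted, text) pairs
        (date(2015, 2, 23), "Love the new Google calendar sync"),
        (date(2015, 2, 23), "Weak quarter sparks protest at Google"),
        (date(2015, 2, 24), "AMD posts another loss"),
        (date(2015, 2, 24), "Apple analyst releases analysis"),
    ]
    for (ticker, day), mean in sorted(daily_sentiment(sample).items()):
        print(ticker, day, mean)
```

Averaging per day yields one sentiment value per stock and day, which could then be aligned with returns computed from the adjusted closing prices before fitting a regression.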
For a given dataset, we calculated the total number of positive, negative, and neutral tweets, with the neutral tweets split into balanced and true neutral tweets.

Apple Stats (Mean: 1.1941)
              # Pos    # Neg    # Neutral   # Balanced   # True Neutral   Total
Count         926      2249     2560        247          2313             5735
Percentage    16.16%   39.22%   44.64%      4.31%        40.33%

Example of Scoring
Score   Tweet
 1      Love @sunrise update - smoother calendar sync with GOOG apps and iPad app!
 1      @Simonkhalaf @BenedictEvans no doubt, apps are winning but I still have sense that GOOG can change trajectory
-2      Another protest against techies at 24th st-- google continues to be a rallying symbol for protestors http://t.co/NebJQ4pwrZ
-1      The Beer Game -or- Why Apple Can't Build iPads in the US by @marksweep http://t.co/u2cl4Xne
 0      apple analyst releases analysis based on another apple analysts analysis

Results
Regression t-values and adjusted R² by stock and return timing:
                      Return(t-1)   Return(t)   Return(t+1)
Intercept (AAPL)      2.04          1.80        2.48
Intercept (AMD)       0.497         0.79        0.99
Intercept (GOOG)      0.15          -0.37       0.62
Sentiment (AAPL)      -0.52         0.15        -0.22
Sentiment (AMD)       0.12          -1.02       -0.51
Sentiment (GOOG)      -0.09         1.28        2.61**
Adjusted R² (AAPL)    -0.000336     -0.01565    -0.00291
Adjusted R² (AMD)     -0.000448     0.000789    0.001875
Adjusted R² (GOOG)    -0.000436     -0.01091    0.01745

We ran linear regression models in RStudio. Our results indicate little to no correlation between sentiment and future returns, although the results vary by stock. For Google, sentiment was significant in the future-returns model (t = 2.61). Some of these values may be explained by insufficient data.

[Figures: Apple, Google, and AMD graphs, each with three panels (Returns Predicting Sentiment; Returns; Future Returns) plotting returns against tweet sentiment.]

Future Work
In the future, we would look at market indices in addition to individual stocks.
We would also use a broader range of Twitter data, not just tweets from tech tweeters. In addition to simple returns, we would compute the Cumulative Abnormal Return (CAR). More Twitter data would need to be collected: many papers on similar experiments use millions of tweets rather than thousands. Instead of a linear regression model, we would consider Support Vector Machines and other machine learning tools.

Works Cited
B. Wüthrich, D. Permunetilleke, S. Leung, V. Cho, J. Zhang, W. Lam, "Daily Prediction of Major Stock Indices from Textual WWW Data", The Hong Kong University of Science and Technology.
J. Bollen, H. Mao, X. J. Zeng, "Twitter Mood Predicts the Stock Market", School of Informatics and Computing, Indiana University-Bloomington, October 2010.
J. I. Larsen, "Predicting Stock Prices Using Technical Analysis and Machine Learning", Master's thesis in Computer Science, Norwegian University of Science and Technology, June 2010.