Stock Market Prediction Using
Sentiment Detection
C. LEE FANZILLI
ADVISORS: PROF. DVORAK AND PROF. WEBB
Hypothesis
Can we use the sentiment of tweets mentioning a stock on the NYSE to predict future returns of that stock?

Can we predict contemporaneous returns?

Do returns predict Twitter sentiment instead?
Background

People have tried many different ways
of predicting prices in the market.

Technical Analysis is a methodology
for forecasting the direction of prices
through the study of past market data,
primarily price and volume.

Jan Larsen's paper (June 2010) reported 300%
gains on initial investment with this
method.
Challenges

The Efficient Market Hypothesis states that market prices
already reflect all available information, which would leave
little room for prediction.

Once we have a dataset, there is a fair amount of
organizing and cleaning up to be done.

Not all data is useful, and the data that is useful
may not be sufficient to support a claim.
Data



For this experiment we collected daily stock price information on AMD, Google, and Apple from Yahoo Finance.

We retrieved a list of the top 101 tech tweeters from a Business Insider article to extract our tweets from.

Using Twitter’s API we created a corpus of tweets.

Google Daily Price Information
Date        Open     High     Low      Close    Volume    Adj Close
2/24/2015   530      536.79   528.25   536.09   1002300   536.09
2/23/2015   536.05   536.44   529.41   531.91   1448900   531.91
2/20/2015   543.13   543.75   535.8    538.95   1440400   538.95
2/19/2015   538.04   543.11   538.01   542.87   986400    542.87
2/18/2015   541.4    545.49   537.51   539.7    1447600   539.7
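As an illustration of how returns could be derived from this price data, here is a minimal Python sketch. It assumes simple daily returns computed from the adjusted close; the slides do not say which return definition was actually used, and the function name `simple_returns` is only illustrative.

```python
# Adjusted closing prices for Google from this slide, ordered oldest first.
adj_close = [
    ("2/18/2015", 539.70),
    ("2/19/2015", 542.87),
    ("2/20/2015", 538.95),
    ("2/23/2015", 531.91),
    ("2/24/2015", 536.09),
]


def simple_returns(prices):
    """Return (date, return) pairs, where return = P_t / P_(t-1) - 1.

    Assumption: simple (not log) returns on the adjusted close.
    """
    out = []
    for (_, previous), (date, current) in zip(prices, prices[1:]):
        out.append((date, current / previous - 1.0))
    return out


daily_returns = simple_returns(adj_close)
for date, r in daily_returns:
    print(f"{date}: {r:+.4%}")
```

Five prices yield four returns, because the first day has no previous close to compare against.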
Organization

We uploaded our Twitter data to CouchDB,
an Apache database.

Next, we pulled the posting date and text from
each tweet, then separated the tweets based on which
of our stocks was mentioned.

Then we wrote a script to score each tweet’s
overall sentiment.
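The separation step could look roughly like the Python sketch below. The keyword lists, the record shape (dicts with `date` and `text` keys), and the function names are assumptions made for illustration; the slides do not describe the actual matching rules or the CouchDB document layout.

```python
import re

# Hypothetical keyword lists; the real matching rules are not given in the slides.
STOCK_KEYWORDS = {
    "AAPL": ["aapl", "apple"],
    "AMD": ["amd"],
    "GOOG": ["goog", "google"],
}


def stocks_mentioned(text):
    """Return the tickers whose keywords appear as whole words in the tweet."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [ticker for ticker, keywords in STOCK_KEYWORDS.items()
            if words & set(keywords)]


def group_by_stock(tweets):
    """Group (date, text) pairs by ticker; a tweet may land in several groups."""
    groups = {ticker: [] for ticker in STOCK_KEYWORDS}
    for tweet in tweets:
        for ticker in stocks_mentioned(tweet["text"]):
            groups[ticker].append((tweet["date"], tweet["text"]))
    return groups
```

Matching on whole words avoids false hits such as "pineapple" being counted as a mention of Apple.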
Sentiment Detection

Sentiment detection is a form of textual analysis.

The University of Pittsburgh provides the MPQA corpus.

For a given dataset, we were able to calculate the total number of positive, negative, neutral, and true neutral tweets.

Apple Stats
Mean: 1.1941
              # Pos    # Neg    # Neutral   # Balanced   # True Neutral   Total
              926      2249     2560        247          2414             5735
Percentages   16.16%   39.22%   44.64%      4.31%        40.33%
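A lexicon-based scorer of the kind described here might be sketched as follows in Python. The tiny `LEXICON` is made up for illustration and is not the MPQA resource. The distinction drawn between "balanced" (sentiment words that cancel out) and "true neutral" (no sentiment words at all) is an assumption about how those categories were defined.

```python
import re
from collections import Counter

# Toy polarity lexicon, for illustration only (not the MPQA data).
LEXICON = {"love": 1, "great": 1, "winning": 1,
           "bad": -1, "protest": -1, "fail": -1}


def score_tweet(text):
    """Return (score, hits): summed word polarities and the number of lexicon matches."""
    words = re.findall(r"[a-z']+", text.lower())
    polarities = [LEXICON[w] for w in words if w in LEXICON]
    return sum(polarities), len(polarities)


def classify(text):
    """Label a tweet; 'balanced' vs 'true neutral' is an assumed distinction."""
    score, hits = score_tweet(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "balanced" if hits else "true neutral"


# Made-up tweets, one per category.
tweets = [
    "I love this great phone",
    "Another bad quarter",
    "Love the screen but the battery is bad",
    "Analyst releases analysis",
]
counts = Counter(classify(t) for t in tweets)
print(counts)
```

Under this reading, the neutral count is the sum of the balanced and true neutral counts.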
Example of Scoring
Love @sunrise update - smoother calendar sync with GOOG apps and iPad app!
Score: 1

@Simonkhalaf @BenedictEvans no doubt, apps are winning but I still have sense that GOOG can change trajectory
Score: 1

Another protest against techies at 24th st-- google continues to be a rallying symbol for protestors http://t.co/NebJQ4pwrZ
Score: -2

The Beer Game -or- Why Apple Can't Build iPads in the US by @marksweep http://t.co/u2cl4Xne
Score: -1

apple analyst releases analysis based on another apple analysts analysis
Score: 0
Results
                    Return(t-1)        Return(t)          Return(t+1)
Intercept (AAPL)    t-value = 2.04     t-value = 1.80     t-value = 2.48
Intercept (AMD)     t-value = 0.497    t-value = 0.79     t-value = 0.99
Intercept (GOOG)    t-value = 0.15     t-value = -0.37    t-value = 0.62
Sentiment (AAPL)    t-value = -0.52    t-value = 0.15     t-value = -0.22
Sentiment (AMD)     t-value = 0.12     t-value = -1.02    t-value = -0.51
Sentiment (GOOG)    t-value = -0.09    t-value = 1.28     t-value = 2.61**
R² (AAPL)           -0.000336          -0.01565           -0.00291
R² (AMD)            -0.000448          0.000789           0.001875
R² (GOOG)           -0.000436          -0.01091           0.01745

We ran linear regression models
in RStudio.

Our results indicate that there is
little to no correlation between
sentiment and future returns.

Results vary by stock, however. Our
analysis of Google showed that
sentiment was significant (t-value = 2.61).

Some values may be explained
by insufficient data.
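The regressions were run in RStudio; as a rough Python illustration of the same idea, the sketch below fits a one-variable least-squares model and reports t-values and R². The function name `ols_simple` and the sample series are made up, and pairing sentiment on day t with the return on day t+1 is an assumption about how the future-return regression was set up.

```python
import math


def ols_simple(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - mean_y) ** 2 for yi in y)

    sigma2 = sse / (n - 2)  # residual variance, two estimated parameters
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))

    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return {
        "intercept": intercept,
        "slope": slope,
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
        "r2": r2,
        "adj_r2": adj_r2,
    }


# Made-up daily series, for illustration only.
sentiment = [1.0, -0.5, 0.3, 0.8, -1.2, 0.1, 0.6]
returns = [0.004, -0.002, 0.001, 0.006, -0.007, 0.002, 0.003]

# Pair sentiment on day t with the return on day t+1.
fit = ols_simple(sentiment[:-1], returns[1:])
print(fit)
```

Adjusted R² can fall below zero when a model explains almost none of the variance, which is one possible reading of the negative R² values reported here; the slides do not state whether the reported figures are adjusted.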
Apple Graphs
[Figures: Returns Predicting Sentiment. Axis labels: Returns, Future Returns, Tweet Sentiment.]
Google Graphs
[Figures: Returns Predict Sentiment. Axis labels: Returns, Future Returns, Tweet Sentiment.]
AMD Graphs
[Figures: Returns Predicting Sentiment. Axis labels: Returns, Future Returns, Tweet Sentiment.]
Future Work

In the future we would look at indices in addition to
individual stocks.

We would also use a broader range of Twitter data, not just tech
tweets.

In addition to calculating returns, we would include the
cumulative abnormal return.

More Twitter data would have to be collected; many
papers describing similar experiments use millions of tweets,
not thousands.

Instead of using a linear regression model, we would
consider using Support Vector Machines and other
machine learning tools.
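For the cumulative abnormal return mentioned above, one common simplification is the market-adjusted model, in which the abnormal return on each day is the stock's return minus a benchmark index return, summed over the window. The Python sketch below assumes that model; the choice of benchmark and the sample numbers are hypothetical.

```python
def cumulative_abnormal_return(stock_returns, market_returns):
    """Sum of (stock return - market return) over the window.

    Assumption: market-adjusted model, i.e. expected return = market return.
    """
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must cover the same days")
    return sum(s - m for s, m in zip(stock_returns, market_returns))


# Made-up three-day window.
car = cumulative_abnormal_return([0.01, -0.02, 0.03], [0.005, -0.01, 0.01])
print(f"CAR over the window: {car:+.4f}")
```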
Works Cited

B. Wüthrich, D. Permunetilleke, S. Leung, V. Cho, J. Zhang, W. Lam, "Daily
Prediction of Major Stock Indices from textual WWW Data", The Hong Kong
University of Science and Technology

J. Bollen, H. Mao, X. J. Zeng, "Twitter mood predicts the stock market", School of
Informatics and Computing, Indiana University-Bloomington, October 2010

J. I. Larsen, "Predicting Stock Prices Using Technical Analysis and Machine
Learning", Masters in Computer Science, Norwegian University of Science and
Technology, June 2010