SOPS: Stock Prediction using Web Sentiment
Presented by Vivek Sehgal, Charles Song
Department of Computer Science, University of Maryland
ICDMW 2007
2009-05-29
Summarized by Jaeseok Myung
In this talk..
• Introducing some papers about sentiment analysis in finance
  – [1] Event and Sentiment Detection in Financial Markets (ISWC 08)
    – Simple Architecture
  – [2] SOPS: Stock Prediction using Web Sentiment (ICDMW 07)
    – Entire Process
  – [3] Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web (Management Science 07)
    – An Idea that can improve prediction performance
• We will focus on SOPS, but brief introductions to the others will also be presented
Center for E-Business Technology
Copyright © 2009 by CEBT
Sentiment Analysis in Financial Markets
• Sentiment analysis is one of my favorite research topics
  – I have done some research using product reviews
• In my opinion, finance is a more suitable domain than products
  – Product sales statistics are not publicly available
    – Stock values are always open to the public
  – Financial markets are closely related to investors’ sentiment
    – “The economy is psychology”
    – Behavioral finance
    – Lots of evidence
• Interesting and worthwhile
Research Problems from [1][2][3]
• How can information from various, heterogeneous sources be integrated?
  – Different formats
• How can the opinions in the documents be extracted?
  – Statistical and NLP approaches
• How can the important opinions be filtered?
  – Reliable sources (news, blogs), trusted authors, promising algorithms
• How can the users’ trading decisions be supported?
  – Finding the relationships between investors’ sentiment and stock values
An Architecture from [1]
– Monitor a huge number of relevant sources
– Extract metadata and make a single representation
– Decide whether the information has to be analyzed or not
SOPS: System Overview
– Collect data from a message board
– Remove HTML tags and extract features
– Use several classifiers
– Identify reliable users in order to filter noise
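The second step (removing HTML tags before feature extraction) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the SOPS implementation; the helper names `strip_html` and `tokenize` and the sample post are hypothetical.

```python
from html.parser import HTMLParser
import re


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(raw_html):
    """Return the visible text of raw_html with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical message board post, for illustration only.
post = "<div><b>Strong</b> earnings, I am <i>buying</i> more!</div>"
tokens = tokenize(strip_html(post))
```

The resulting token list is the kind of input that the feature representation step works on.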
SOPS: Data Collection
• 260,000 messages for 52 popular stocks on Yahoo! Finance
  – The messages cover a 6-month period
• A message board exists for each stock traded on major stock exchanges such as NYSE and NASDAQ
  – Users must sign up before they can post messages
  – Every message posted is associated with its author
SOPS: Feature Representation
• After the relevant information has been extracted
  – Each message is converted to a vector of words and author names
• The value of each entry in the vector is then calculated using the TF-IDF formula
  – M : set of all messages
  – m : a message
  – w : a term
• Example vector, with entries labeled “good”, “stop”, “asdf”, date, % of change in stock price
  ( 3.2, 1.6, 1.09, 3.37. 90, 0.5, …)
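A minimal sketch of the TF-IDF weighting, using the slide's symbols (M, m, w). It assumes the textbook form, term frequency of w in m times log(|M| / number of messages containing w); the exact variant used in SOPS is not shown here, so the function `tfidf_vectors` and the toy corpus are illustrative only.

```python
import math
from collections import Counter


def tfidf_vectors(messages):
    """Compute one {term: weight} dict per tokenized message.

    ASSUMPTION: textbook tf(w, m) * log(|M| / df(w)); the variant used
    in SOPS may differ.
    """
    n_messages = len(messages)
    # df[w] = number of messages that contain term w at least once
    df = Counter(term for tokens in messages for term in set(tokens))
    vectors = []
    for tokens in messages:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_messages / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy corpus; author names are treated as extra terms.
M = [
    ["good", "earnings", "author:alice"],
    ["stop", "buying", "author:bob"],
    ["good", "good", "news", "author:alice"],
]
vectors = tfidf_vectors(M)
```

Terms that appear in every message get weight 0 under this form, which is why common words contribute little to the vector.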
SOPS: Sentiment Prediction
• What: a message (undisclosed) → Classifier → Strong Buy, Buy, Hold, Sell, or Strong Sell
• How: a message (disclosed) → Classifier (Training), with the labels Strong Buy, Buy, Hold, Sell, Strong Sell
SOPS: Sentiment Prediction
• The sentiment for a message m at time instant i is modeled using the following quantities:
  – m : a message
  – Mi : set of all messages
  – SVi : stock value
  – Classes: Strong Buy, Buy, Hold, Sell, Strong Sell
• Classifier
  1. Naïve Bayes
  2. Decision Trees
  3. Bagging
[Figure: classifier output with example scores 0.2, 0.3, 0.1, 0.4 over the classes Strong Buy, Buy, Hold, Sell, Strong Sell]
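As an illustration of the first classifier in the list, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing, trained on messages whose sentiment is disclosed and applied to an undisclosed one. The class `NaiveBayesSentiment` and the training messages are hypothetical; the features and settings used in SOPS are not reproduced.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.label_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(messages, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def log_scores(self, tokens):
        """Return {label: log P(label) + sum of log P(term | label)}."""
        n = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            total_terms = sum(self.term_counts[label].values())
            score = math.log(count / n)
            for term in tokens:
                if term not in self.vocab:
                    continue  # ignore terms never seen in training
                num = self.term_counts[label][term] + 1
                score += math.log(num / (total_terms + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical disclosed messages (tokens, label), for illustration only.
train = [
    (["great", "earnings", "buy", "now"], "Strong Buy"),
    (["good", "quarter", "buy"], "Buy"),
    (["wait", "and", "see"], "Hold"),
    (["weak", "sales", "sell"], "Sell"),
    (["terrible", "fraud", "sell", "everything"], "Strong Sell"),
]
clf = NaiveBayesSentiment().fit([t for t, _ in train], [l for _, l in train])
prediction = clf.predict(["terrible", "fraud"])
```

Working in log space avoids numeric underflow when a message contains many terms.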
TrustValue Calculation
• Some authors are more knowledgeable than others about the stock market
  – Trusted authors’ posts should carry more weight => TrustValue
• TrustValue
  – Not only cares about the direction in which the stock price went, but also cares about the magnitude
  – Takes into account the fact that a single author cannot be an expert on all stocks => an author can be assigned different trust values for different stocks
• Terms used in the TrustValue formula
  – PredictionScore : the author’s prediction performance, that is, how closely the author’s predictions follow the stock market
  – NumberOfPrediction : the total number of predictions made by the author
  – ExactPrediction : the number of exact predictions
  – ClosePrediction : the number of “good enough” predictions
  – ActivityConstant : a constant used to penalize low activity or predictions by the author
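A rough sketch of how the terms above could fit together, keeping one trust value per (author, stock) pair. The combination used here is an assumption made for illustration and is not taken from the SOPS paper: PredictionScore is taken as ExactPrediction plus a weighted ClosePrediction, divided by NumberOfPrediction plus ActivityConstant. The `close_weight` parameter, the default constants, and the history data are all hypothetical.

```python
from collections import defaultdict


def trust_value(exact, close, total, activity_constant=5.0, close_weight=0.5):
    """Illustrative trust score for one (author, stock) pair.

    ASSUMPTION: this way of combining the quantities is a guess made for
    illustration; it is not the formula from the SOPS paper.
    """
    prediction_score = exact + close_weight * close
    # Adding the constant to the denominator penalizes authors with few
    # predictions: one lucky hit does not yield a high trust value.
    return prediction_score / (total + activity_constant)


# Hypothetical prediction history: (author, stock, outcome), where outcome
# is "exact", "close", or "miss".
history = [
    ("alice", "AAPL", "exact"), ("alice", "AAPL", "close"),
    ("alice", "AAPL", "exact"), ("alice", "MSFT", "miss"),
    ("bob", "AAPL", "exact"),
]

tallies = defaultdict(lambda: {"exact": 0, "close": 0, "miss": 0})
for author, stock, outcome in history:
    tallies[(author, stock)][outcome] += 1

# One trust value per (author, stock), so the same author can be trusted
# on one stock and not on another.
trust = {
    key: trust_value(t["exact"], t["close"], sum(t.values()))
    for key, t in tallies.items()
}
```

In this toy data, bob's single exact prediction earns less trust than alice's longer record on the same stock, which is the effect the ActivityConstant is meant to have.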
SOPS: Stock Prediction
[Figure: Classifier → Go up / Go down]
SOPS: Evaluation Metrics
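The conclusion of this talk reports precision and recall, so a minimal sketch of how those two metrics are computed for one class may help here. The function `precision_recall` and the label lists are hypothetical; the figures reported for SOPS are not reproduced.

```python
def precision_recall(actual, predicted, positive):
    """Precision and recall for one class, treated as the positive label."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels, for illustration only.
actual = ["Buy", "Buy", "Sell", "Hold", "Buy", "Sell"]
predicted = ["Buy", "Sell", "Sell", "Buy", "Buy", "Sell"]
buy_precision, buy_recall = precision_recall(actual, predicted, "Buy")
sell_precision, sell_recall = precision_recall(actual, predicted, "Sell")
```

Precision answers how many of the predicted "Buy" labels were right; recall answers how many of the true "Buy" messages were found.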
SOPS: Experiments
Conclusion
• SOPS can predict Web sentiment with high precision and recall
• SOPS introduced TrustValue, which takes into account the trustworthiness of an author
• In my opinion, some points are unclear
  – Presentation
    – About summarization
  – Users
  – Time period
Furthermore
• We also have paper [3]