SOPS: Stock Prediction using Web Sentiment
Presented by Vivek Sehgal, Charles Song
Department of Computer Science, University of Maryland
ICDMW 2007
2009-05-29, Summarized by Jaeseok Myung

In this talk..
  Introducing some papers about sentiment analysis in finance
    [1] Event and Sentiment Detection in Financial Markets (ISWC 08)
      – Simple architecture
    [2] SOPS: Stock Prediction using Web Sentiment (ICDMW 07)
      – Entire process
    [3] Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web (Management Science 07)
      – An idea that can improve prediction performance
  We will focus on SOPS, but the other papers will also be introduced briefly

Center for E-Business Technology
Copyright 2009 by CEBT

Sentiment Analysis in Financial Markets
  Sentiment analysis is one of my favorite research topics
  I have done some research using product reviews
  In my opinion, finance is a more suitable domain than product reviews
    Product sales statistics are not publicly available
      – Stock values are always public
    Financial markets are closely related to investors’ sentiment
      – ‘The economy is psychology’
      – Behavioral finance
      – Lots of evidence
  Interesting & worthwhile

Research Problem from [1][2][3]
  How can information from various, heterogeneous sources be integrated?
    – Different formats
  How can the opinions in the documents be extracted?
    – Statistical, NLP ways
  How can the important opinions be filtered?
    – Reliable source (news, blog), trusted author, promising algorithm
  How can the users’ trading decisions be supported?
    – Finding out the relationships between investors’ sentiment and stock values

An Architecture from [1]
  Monitor a huge number of relevant sources
  Extract metadata and make a single representation
  Decide whether the information has to be analyzed or not

SOPS: System Overview
  Collect data from a message board
  Remove HTML tags and extract features
  Use several classifiers
  Identify reliable users in order to filter noise

SOPS: Data Collection
  260,000 messages for 52 popular stocks on Yahoo! Finance
  The messages were collected over a 6-month time period
  A message board exists for each stock traded on major stock exchanges such as NYSE and NASDAQ
  Users must sign up before they can post messages
  Every message posted is associated with its author

SOPS: Data Collection (cont.)

SOPS: Feature Representation
  After the relevant information has been extracted
    – Each message is converted to a vector of words and author names
    – The value of each entry in the vector is then calculated using the TFIDF formula
        M : set of all messages
        m : a message
        w : a term
  Example vector
    “good”  “stop”  “asdf”  date  % of change in stock price
    ( 3.2,  1.6,  1.09,  3.37.90,  0.5, … )

SOPS: Sentiment Prediction
  What: a message (undisclosed) -> Classifier -> Strong Buy | Buy | Hold | Sell | Strong Sell
  How: a message (disclosed), labeled Strong Buy | Buy | Hold | Sell | Strong Sell -> Classifier (Training)

SOPS: Sentiment Prediction (cont.)
  The sentiment for a message m at time instant i is modeled as follows:
    [formula image]
    m : a message
    Mi : set of all messages
    SVi : stock value
    Output classes: Strong Buy, Buy, Hold, Sell, Strong Sell
  Classifiers
    1. Naïve Bayes
    2. Decision Trees
    3. Bagging
  [figure: classifier output over the five classes, with example values 0.2, 0.3, 0.1, 0.4]

TrustValue Calculation
  Some authors are more knowledgeable than others about the stock market
  A trusted author’s posts should carry more weight => TrustValue
    PredictionScore : the author’s prediction performance, that is, how closely the author’s predictions follow the stock market
    NumberOfPrediction : the total number of predictions made by the author
    ExactPrediction : the number of exact predictions
    ClosePrediction : the number of “good enough” predictions
    ActivityConstant : a constant used to penalize low activity (few predictions) by the author
  Cares not only about the direction in which the stock price went, but also about the magnitude
  Takes into account the fact that a single author cannot be an expert on all stocks
    => an author can be assigned different trust values for different stocks

SOPS: Stock Prediction
  Classifier -> Go up | Go down

SOPS: Evaluation Metrics

SOPS: Experiments

Conclusion
  SOPS can predict Web sentiment with high precision and recall
  SOPS introduced TrustValue, which takes into account the trustworthiness of an author
  In my opinion, some points are unclear
    Presentation
      – About summarization
    Users
    Time period

Furthermore
  We have the paper [3]
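
The Feature Representation slide says that each vector entry is computed with a TFIDF formula over the message set M, a message m, and a term w, but the formula itself is not preserved in these notes. The following is a minimal sketch that assumes the textbook variant tf(w, m) * log(|M| / df(w)); the paper's exact weighting may differ. The function name `tfidf_vectors` and the toy messages are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(messages):
    """Return one {term: weight} dict per message.

    Assumes the textbook variant tf(w, m) * log(|M| / df(w));
    the weighting used in the SOPS paper may differ.
    """
    tokenized = [m.lower().split() for m in messages]
    # df(w): number of messages in M that contain term w
    df = Counter(w for tokens in tokenized for w in set(tokens))
    n = len(tokenized)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw count of w in message m
        vectors.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vectors

# Toy messages, made up for illustration (not from the Yahoo! data)
msgs = ["good good earnings", "stop buying", "good time to stop"]
vecs = tfidf_vectors(msgs)
```

Since the slides describe the vector as containing both words and author names, an author name could be appended to each message as one extra token before calling the function; that choice is also an assumption.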
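
The Sentiment Prediction slides describe training a classifier on messages whose sentiment was disclosed and then predicting one of five classes for undisclosed messages, with Naïve Bayes listed as the first classifier. The sketch below is a small multinomial Naïve Bayes with add-one smoothing written from scratch. It is an illustration under assumptions, not the authors' implementation: whitespace tokenization, the function names, and the training examples are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_messages):
    """labeled_messages: list of (text, label) pairs with disclosed sentiment."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_messages:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return (best_label, {label: probability}) for an undisclosed message."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)  # class prior
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothing
                score += math.log(
                    (word_counts[label][w] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Normalize the log scores into probabilities over the classes
    top = max(log_scores.values())
    exp = {l: math.exp(s - top) for l, s in log_scores.items()}
    z = sum(exp.values())
    probs = {l: v / z for l, v in exp.items()}
    return max(probs, key=probs.get), probs

# Hypothetical training messages with disclosed sentiment
train = [
    ("great earnings buy now", "Strong Buy"),
    ("solid quarter buy", "Buy"),
    ("wait and see", "Hold"),
    ("weak outlook sell", "Sell"),
    ("terrible fraud sell everything", "Strong Sell"),
]
model = train_naive_bayes(train)
label, probs = predict(model, "great earnings")
```

The returned `probs` dictionary plays the role of the per-class values shown in the slide's figure: one probability per sentiment class, summing to 1.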
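
The TrustValue slide lists the quantities involved (PredictionScore, NumberOfPrediction, ExactPrediction, ClosePrediction, ActivityConstant), but the formula that combines them is not preserved in these notes. The sketch below shows one plausible combination, stated purely as an assumption: average prediction quality multiplied by a hit rate whose denominator is inflated by the activity constant, so that authors with few predictions are penalized. The default constant of 10.0 and the sample authors are hypothetical.

```python
def trust_value(prediction_score, number_of_predictions,
                exact_predictions, close_predictions,
                activity_constant=10.0):
    """One plausible TrustValue combination (an assumption, not the
    formula from the slides, which is not preserved)."""
    if number_of_predictions == 0:
        return 0.0  # no track record, so no trust
    # Average quality of the author's predictions
    accuracy = prediction_score / number_of_predictions
    # Share of exact or "good enough" predictions; the constant in the
    # denominator penalizes authors with few predictions
    hit_rate = (exact_predictions + close_predictions) / (
        activity_constant + number_of_predictions)
    return accuracy * hit_rate

# Trust is kept per (author, stock) pair, since an author can have
# different trust values for different stocks
trust = {
    ("alice", "AAPL"): trust_value(40.0, 50, 20, 15),
    ("alice", "XOM"): trust_value(1.0, 2, 1, 0),
}
```

Keying the dictionary on the (author, stock) pair reflects the slide's point that a single author cannot be an expert on all stocks.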