
Presentation Slides Round 2

TextToStocks
Presentation 2
Team 48
Shrey Joshi, Adrian Mah, Suryansh Pratap Singh,
Ryan Van Duren, Muhammad Raza
Advisor: Yuqian Zhang
The Problem
“Meme stocks” are equities that have gained a cult-like following online and
through various social media platforms. These equities are often hyped by
online communities, which can greatly influence their price. This raises
the question: do internet discussions about equities correlate with the future prices
of those equities, and if so, can this correlation be used to improve the accuracy of
equity price predictions?
Scope of Work Recap
To reiterate from last semester, our goal is to draw conclusions
about the future price of an equity based on prior internet
discussion regarding the equity, or to establish that there is no relationship
between the two if that is the case.
We’ll be using machine learning with Natural Language Processing
(NLP) on messages scraped from certain internet forums to make such
predictions on equity prices using a Long Short-Term Memory (LSTM) machine
learning model.
Task Allocation
● Work is common amongst all team members, but work ownership is allocated as follows:
○ Adrian: ML Research and Implementation
○ Shrey: Sorting and categorizing scraped messages
○ Muhammad: Webscraping needed pages for data
○ Ryan: Prediction algorithm based on Natural Language Processing, data collection
○ Suryansh: Machine training dataset
General Progress and Implementation
● Selected environment and several packages
○ Environment:
■ Anaconda3 with Python
■ Windows
○ Packages:
■ Deep Learning : TensorFlow
■ Data Acquisition : pandas_datareader and Yahoo Finance
■ Plotting : matplotlib
■ Web Scraping : Beautiful Soup
■ NLP : spaCy
Neural Network : Objective
● Built a NN that takes in basic stock data and outputs a prediction
○ “Control” prediction
○ Data:
■ Indexed by date, separated by equity
■ High, Low, Open, Close, Volume, Adj Close
● Train the model to maximize the accuracy of the prediction
○ Cannot account for world events
■ Accuracy will never be perfect
● Build a second NN that incorporates sentiment
○ Compare accuracy of two networks
○ Ideally demonstrate improved accuracy with sentiment parameter
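As a minimal sketch of how date-indexed rows could be turned into training samples for the control network: the helper below slices the six listed columns into fixed-length windows and pairs each window with the following day's Close. The function name, the `lookback` parameter, the choice of next-day Close as the target, and the toy numbers are all assumptions for illustration, not the team's actual pipeline.

```python
# Hypothetical helper: builds (window, next-day close) pairs from daily rows.
FEATURES = ["High", "Low", "Open", "Close", "Volume", "Adj Close"]

def make_windows(rows, lookback):
    """rows: list of dicts keyed by FEATURES, ordered oldest to newest."""
    samples = []
    for end in range(lookback, len(rows)):
        # The window covers the `lookback` days immediately before day `end`.
        window = [[row[name] for name in FEATURES] for row in rows[end - lookback:end]]
        target = rows[end]["Close"]  # assumed prediction target
        samples.append((window, target))
    return samples

# Toy data for one equity (made-up values, five consecutive days).
rows = [
    {"High": 10.0 + d, "Low": 9.0 + d, "Open": 9.5 + d, "Close": 9.8 + d,
     "Volume": 1000 + 10 * d, "Adj Close": 9.8 + d}
    for d in range(5)
]
samples = make_windows(rows, lookback=3)
print(len(samples), "samples, each", len(samples[0][0]), "days x", len(samples[0][0][0]), "features")
```

Keeping one list of rows per equity matches the "separated by equity" layout, so windows never mix prices from two different stocks.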
Neural Network : Progress
● We selected and built a Long Short-Term Memory (LSTM) Neural Network
○ Recurrent NN that is capable of learning long-term dependencies
○ Uses each prior state and current state for calculations
○ Common for equity-based deep learning projects
○ Our simple LSTM model can predict stock prices when given cherry-picked example data
● Input Parameters
○ High, Low, Open, Close, Volume, Adj Close
○ Eventually : Sentiment
■ We must calculate sentiment through natural language processing
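To illustrate what "uses each prior state and current state" means, here is a sketch of a single LSTM time step in plain Python, following the standard LSTM equations (forget, input, and output gates plus a candidate cell value). It is not the team's TensorFlow model; the parameter layout, sizes, and constant weights are assumptions chosen only to keep the example small.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(weights, bias, vector):
    """Return weights @ vector + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. params maps gate name -> (weights, bias) acting on [h_prev, x]."""
    z = h_prev + x  # concatenate previous hidden state with the current input
    f = [sigmoid(a) for a in affine(*params["forget"], z)]        # how much old memory to keep
    i = [sigmoid(a) for a in affine(*params["input"], z)]         # how much new info to write
    o = [sigmoid(a) for a in affine(*params["output"], z)]        # how much memory to expose
    g = [math.tanh(a) for a in affine(*params["candidate"], z)]   # proposed new memory
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

# Hypothetical sizes: 2 hidden units, 1 input feature, constant untrained weights.
hidden, inputs = 2, 1
params = {name: ([[0.1] * (hidden + inputs) for _ in range(hidden)], [0.0] * hidden)
          for name in ("forget", "input", "output", "candidate")}
h, c = [0.0] * hidden, [0.0] * hidden
for value in [0.5, -0.2, 0.1]:  # a short made-up input sequence
    h, c = lstm_step([value], h, c, params)
print(h, c)
```

The cell state `c` is what carries information across many steps, which is why the architecture can learn long-term dependencies.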
Neural Network : Issues
● Cannot account for real world events
○ Accuracy will never be perfect
■ Especially with political instability…
● Model must be trained
○ Takes time
● Hardware configuration
○ Configure training environment such that the most efficient hardware is used regardless of which device the application is run on
■ GPU if available
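One possible way to handle the hardware point, sketched under assumptions: ask TensorFlow which GPUs it can see via `tf.config.list_physical_devices("GPU")` and fall back to the CPU otherwise. The function name `pick_device` and the fallback when TensorFlow is not installed are illustrative choices, not the project's configuration.

```python
def pick_device():
    """Return a TensorFlow device string, preferring the first visible GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        # Assumption for this sketch: without TensorFlow there is nothing to configure.
        return "/CPU:0"
    gpus = tf.config.list_physical_devices("GPU")
    return "/GPU:0" if gpus else "/CPU:0"

device = pick_device()
print("Training would run on", device)
```

Model construction and training could then be wrapped in `with tf.device(device):` so the same script runs unchanged on machines with and without a GPU.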
Neural Network : Remaining Objectives
● Continue to train model with stock data
○ Improve accuracy
○ Train with a variety of stocks
● Build duplicate network and add sentiment parameter
○ Sentiment analysis allows natural language processing and ML to interpret and classify emotions within text information
● Tweak NN architecture to better accommodate sentiments
● Train NN with sentiment data
○ From r/wallstreetbets
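A sketch of how the sentiment parameter might be attached to the existing price rows before training the duplicate network. The `Date` and `Sentiment` field names, the neutral default of 0.0 for days without discussion, and the toy values are assumptions.

```python
def add_sentiment(price_rows, daily_sentiment, default=0.0):
    """Return copies of price_rows with a 'Sentiment' field looked up by date."""
    merged = []
    for row in price_rows:
        enriched = dict(row)  # copy so the control network's data stays untouched
        enriched["Sentiment"] = daily_sentiment.get(row["Date"], default)
        merged.append(enriched)
    return merged

# Made-up rows for one equity and a made-up sentiment series.
price_rows = [
    {"Date": "2022-01-03", "Close": 10.0},
    {"Date": "2022-01-04", "Close": 11.0},
]
sentiment_by_date = {"2022-01-03": 0.4}
with_sentiment = add_sentiment(price_rows, sentiment_by_date)
print(with_sentiment)
```

Because the original rows are copied rather than modified, the control network and the sentiment network can be trained from the same source data.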
Natural Language Processing : Objective
● Analyze relevant online discussion
○ Determine which equity conversations refer to and which conversations are relevant
○ Produce a daily “sentiment” value for each measured equity that accurately reflects how real people feel about an equity
○ To be used as an input parameter for NN
● State of the Art: Reddit Sentiment Analysis
○ Has been done before
○ Limited to r/wallstreetbets
■ Easy due to ticker system
■ We must expand to other subreddits and other platforms in order to get a more complete picture
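The daily “sentiment” value could, for example, be a per-day, per-equity average of message scores. The sketch below assumes each message has already been scored in the range -1 to 1 and tagged with a date and ticker; the plain average and the sample values are illustrative assumptions.

```python
from collections import defaultdict

def daily_sentiment(scored_messages):
    """scored_messages: iterable of (date, ticker, score) tuples, score in [-1, 1]."""
    totals = defaultdict(lambda: [0.0, 0])  # (date, ticker) -> [sum of scores, count]
    for date, ticker, score in scored_messages:
        entry = totals[(date, ticker)]
        entry[0] += score
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up scored messages.
messages = [
    ("2022-01-03", "GME", 0.8),
    ("2022-01-03", "GME", -0.2),
    ("2022-01-03", "TSLA", 0.5),
    ("2022-01-04", "GME", -1.0),
]
values = daily_sentiment(messages)
print(values)
```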
Natural Language Processing : Progress
● Identified the online resources we are going to use for sentiment analysis
○ r/wallstreetbets
● Using Beautiful Soup to collect the data from the source, which will be used for the natural language processing
● Created a basic algorithm using spaCy to identify positive and negative sentiment in a sentence
○ It has no sarcasm detection yet.
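For illustration only, a word-list scorer of the kind a basic positive/negative classifier might start from. This is a stand-in written with the standard library, not the team's spaCy algorithm; the word lists and the scoring formula are assumptions.

```python
import re

# Hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"incredible", "excited", "great", "buy", "moon"}
NEGATIVE = {"terrible", "selling", "crash", "overpriced", "bad"}

def sentence_sentiment(sentence):
    """Return a score in [-1, 1]: +1 all positive words, -1 all negative, 0 neutral or mixed evenly."""
    words = re.findall(r"[a-z']+", sentence.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos == neg:
        return 0.0
    return (pos - neg) / (pos + neg)

print(sentence_sentiment("This stock is great, I am excited"))
print(sentence_sentiment("Selling before the crash"))
print(sentence_sentiment("Great product, terrible price"))
```

A scorer like this only counts words, so it has no way to tell a sincere sentence from a sarcastic one that uses the same vocabulary.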
Natural Language Processing : Issues
● Expanding outside of r/wallstreetbets
○ No easily defined tickers
■ Difficult to accurately identify what company is being discussed
○ Solution : Use larger context to assume relevant ticker if key words are missing
■ Ex: someone responding to this tweet with “will be selling my shares!”
● We can infer the response is relevant to a Musk company
● Expanding outside of Reddit
○ Which Tweets are important?
■ Can’t analyze them all
○ Which other platforms are relevant?
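The "use larger context" idea can be sketched as a two-step lookup: first search the message itself for an explicit ticker, then fall back to the ticker of the post or thread it replies to. The `$TICKER` pattern, the function name, and the `parent_ticker` argument are assumptions about how messages might be tagged.

```python
import re

# Assumed convention: tickers written as a dollar sign plus 1-5 capital letters, e.g. $GME.
CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")

def find_ticker(message, parent_ticker=None):
    """Return the ticker named in the message, else the one inherited from its context."""
    match = CASHTAG.search(message)
    if match:
        return match.group(1)
    return parent_ticker  # may be None when the context is unknown too

print(find_ticker("Loading up on $GME today"))
print(find_ticker("will be selling my shares!", parent_ticker="TSLA"))
print(find_ticker("nice weather"))
```

A reply with no ticker of its own inherits the ticker of the post it answers, which is the inference described in the example above.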
Natural Language Processing : Issues
● Improving sarcasm detection
○ Ex:
■ WOW… Tesla’s new car is incredible! The people are so excited
■ This is Tesla’s “incredible” new car people are so excited for? WOW…
○ Sentences with similar words can have opposite meanings
● What comments / discussions are relevant
○ Discussions may not be relevant if they…
■ Are made by bots
● More common than you may think
■ Are made by biased individuals
● Shareholders, employees, fans, etc
■ Are not popular or are particularly unpopular
● Comments / tweets with no likes / responses
● Comments / tweets with negative responses
○ May indicate the opposite sentiment
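One way the relevance criteria might feed into the sentiment value, sketched under assumptions: drop comments from known bot accounts and comments with no likes, then weight the rest by popularity. The field names, the `log1p` weighting, and the sample data are illustrative; the sketch does not model negative responses flipping a comment's sentiment.

```python
import math

def weighted_sentiment(comments, bot_authors=frozenset()):
    """comments: iterable of dicts with 'author', 'score' (sentiment), and 'likes'."""
    weighted_sum = 0.0
    total_weight = 0.0
    for comment in comments:
        if comment["author"] in bot_authors or comment["likes"] <= 0:
            continue  # skip known bots and comments nobody engaged with
        weight = math.log1p(comment["likes"])  # damp the effect of viral outliers
        weighted_sum += weight * comment["score"]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0

# Made-up comments.
comments = [
    {"author": "alice", "score": 1.0, "likes": 9},
    {"author": "bob", "score": 0.0, "likes": 9},
    {"author": "spam_bot", "score": -1.0, "likes": 500},
    {"author": "carol", "score": -1.0, "likes": 0},
]
result = weighted_sentiment(comments, bot_authors={"spam_bot"})
print(result)
```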
Remaining Objectives
● Scrape the web for relevant discussions
○ Identify which equity the discussion is relevant to
● Calculate a sentiment value accurate enough that its inclusion improves the accuracy of the LSTM network
○ Must account for sarcasm and popularity
● Required Steps:
○ Improve the accuracy of sentiment value
■ Sarcasm detection
■ Bot identification
○ Train NN with and without sentiment included
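Comparing the network trained with sentiment against the one trained without it requires a shared error metric on the same held-out prices. Root-mean-square error is one common choice; its use here is an assumption, and the numbers below are invented placeholders, not project results.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length price sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Placeholder values only, to show the comparison mechanics.
actual = [10.0, 11.0, 12.0]
control_pred = [10.5, 10.5, 12.5]    # hypothetical output of the network without sentiment
sentiment_pred = [10.2, 10.8, 12.2]  # hypothetical output of the network with sentiment

control_error = rmse(control_pred, actual)
sentiment_error = rmse(sentiment_pred, actual)
print("control:", control_error, "with sentiment:", sentiment_error)
```

A lower error for the sentiment network on unseen data would support the hypothesis; a similar or higher error would suggest no usable relationship.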