Stocks: volatility vs. price

Stocks 'R Us:
Analyzing the Volatility of Stocks in the S&P 500
Felix Chu, Jeremy Brudvik,
Philip Kuo, Alex Loddengaard
Introduction
Many companies offer shares of their stock on a public, open market, where each
shareholder owns a small part of the company. Owning stock typically gives the
holder votes roughly proportional to the number of shares owned, allowing a
small say in policy-making, as well as exposure to gains and losses as the stock
price fluctuates. The buying and selling of a stock also affects its price.
Generally, good news causes stock prices to rise, and bad news causes them to
fall. Moreover, if a stock is in high demand, its share price increases; if it is in
low demand, its share price decreases. The price can sometimes be taken as a
measure of the public's confidence in the company, or a reflection of its opinion
of the company's recent actions. A company's worth can be roughly estimated as
the price of a single share multiplied by the total number of shares.
In order to control the price of its stock, a company may choose to split it.
Usually splits are 2:1, where the share price is halved and shareholders end up
with twice the number of shares they had originally; the total number of shares
doubles as well. Other common split ratios include 3:1 and 3:2. It is theorized
that stock splits, by lowering the price of an individual share, make the stock
more attractive to larger portions of the population, such as day traders. For
this project, we hope to determine whether this is the case by comparing a
stock's price with its volatility, which we use as a measure of its attractiveness
to traders. We define volatility to be the difference between the high and low
price for the day, divided by the opening price. This expresses each day's price
range as a fraction of that day's opening price. For example, if a stock's price
fluctuates greatly during a single day, it has high volatility. On the other hand,
if the stock's price stays stable over the day, it has low volatility.
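The volatility definition above can be written as a one-line function. This is a minimal sketch; the function name and the sample prices are illustrative assumptions, not taken from our dataset.

```python
def daily_volatility(open_price, high, low):
    """Return (high - low) / open, the day's range as a fraction of the open."""
    if open_price <= 0:
        raise ValueError("opening price must be positive")
    return (high - low) / open_price


# Hypothetical stock that opens at $50.00 and trades between $49.00 and $51.50.
print(daily_volatility(50.0, 51.5, 49.0))  # 0.05, i.e. 5%
```

Multiplying the result by 100 gives the percentage form used in the figures.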
We investigated the stocks in the S&P 500 for this analysis. The S&P 500 is a stock
market index consisting of 500 "large" companies, with "large" being defined as
having a very high market value, which is defined as the product of the number
of shares available to the public and share price. 3M, Adobe, Amazon, Apple,
Coca-Cola, eBay, FedEx, Ford, Google, Heinz, Mattel, McDonald's, Safeway,
Starbucks, Time Warner, and Disney are just a few of the companies found in this
index. S&P stands for Standard & Poor's, which developed and maintains the
index. S&P is a division of McGraw-Hill, which, besides printing textbooks,
provides financial and business services (1). All companies in the index are
publicly traded on the New York Stock Exchange or NASDAQ. Nearly all of the
500 are US companies. The companies to be included are chosen by a committee,
which judges which are indicative of the performance of various industries in
the United States. Companies are added and removed on an "as needed" basis.
The committee also excludes stocks without enough liquidity. <TODO: cite
more in this paragraph>
Methods
Data Source
We have selected a single dataset to analyze stock volatility (2). The dataset
contains daily snapshots of the S&P 500 stock index for a year starting on April
26, 2007. Each entry includes the stock symbol, date, stock ticker, opening price,
high price, low price, close price, and trade volume. Since stock trading does not
occur on national holidays or weekends, we have 253 days' worth of trading
data (see figure 1). Given that the S&P 500 is made up of 500 companies, we
would expect to have 253 x 500 = 126,500 data entries. However, the dataset has
a few missing entries for unknown reasons and includes only 126,437 entries. The set of
stocks in the S&P 500 also changed over the year, but the size of the set always
remained at or near 500.
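As a quick check on the entry counts quoted above, the arithmetic can be sketched as follows. The constant names are illustrative, and the expected total assumes exactly 500 stocks on every trading day, which is only approximately true because the index membership changed during the year.

```python
TRADING_DAYS = 253
INDEX_SIZE = 500          # assumed constant; the real index size was "at or near" 500
ACTUAL_ENTRIES = 126_437

expected_entries = TRADING_DAYS * INDEX_SIZE
missing_entries = expected_entries - ACTUAL_ENTRIES
missing_share = missing_entries / expected_entries

print(expected_entries)        # 126500
print(missing_entries)         # 63
print(f"{missing_share:.3%}")  # 0.050%
```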
Figure 1: A histogram showing that most stocks were traded
when the market was open.
The S&P 500 dataset was heavily weighted towards stocks with low prices and
also with minimal percent changes. The heavy weighting caused our regression
analysis to be greatly skewed (see figure 2), so we decided to prune the larger
dataset into a sub-dataset. We created a Python script that does the following:
1. Split all entries into groups of $5 opening-price chunks ($0-4, $5-9, $10-14,
etc.)
2. Compute the absolute percentage change for each entry by computing the
difference of high price and low price for the day, divided by the opening
price
3. Keep only the data entry with the largest percent change in each $5
chunk
This gives us a data set with a single entry per $5 slice, where that single entry is
the stock that had the highest volatility within that opening price range.
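The pruning steps above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not our original script: the dictionary keys ("symbol", "open", "high", "low"), the function name, and the sample rows are all hypothetical.

```python
def prune_by_price_chunk(entries, chunk_width=5.0):
    """Keep, for each opening-price chunk, the entry with the largest
    (high - low) / open."""
    best = {}  # chunk index -> (percent change, entry)
    for entry in entries:
        open_price = entry["open"]
        if open_price <= 0:
            continue  # the percent change is undefined without a positive open
        change = (entry["high"] - entry["low"]) / open_price
        chunk = int(open_price // chunk_width)  # 0 for $0-4.99, 1 for $5-9.99, ...
        if chunk not in best or change > best[chunk][0]:
            best[chunk] = (change, entry)
    # Return the surviving entries ordered by price chunk.
    return [best[chunk][1] for chunk in sorted(best)]


# Hypothetical rows, not taken from the real dataset.
rows = [
    {"symbol": "AAA", "open": 3.00, "high": 3.30, "low": 2.90},
    {"symbol": "BBB", "open": 4.50, "high": 4.60, "low": 4.40},
    {"symbol": "CCC", "open": 7.00, "high": 7.10, "low": 6.90},
    {"symbol": "DDD", "open": 12.00, "high": 12.60, "low": 11.70},
    {"symbol": "EEE", "open": 9.99, "high": 10.50, "low": 9.50},
]
pruned = prune_by_price_chunk(rows)
print([entry["symbol"] for entry in pruned])  # ['AAA', 'EEE', 'DDD']
```

A single pass with a dictionary keyed by chunk index avoids sorting or grouping the full dataset, which matters little at this size but keeps the logic easy to follow.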
The graph of opening price to percentage change (see figure 2) clearly shows that
there is an inverse-polynomial curve at the top of the data, disregarding outliers.
Essentially, by keeping the largest percentage change for each $5 chunk, we
capture the upper curve of Figure 2 and keep the dense cluster of low-price,
low-change entries from dominating the regression. Figure 3 shows how the curve
becomes more apparent with the pruned dataset.
Figure 2: Opening price vs. Daily percentage change with full dataset -- no pruning.
Figure 3: Opening price vs. Daily percentage change with pruned dataset.
Incidentally, we did investigate the effects of after-hours trading (AHT) between
the closing price of a stock on one day and the opening price of the stock on the
following day. By definition, AHT is the buying and selling of stocks outside of
the specified regular trading hours. Both the New York Stock Exchange and the
NASDAQ operate from 9:30 a.m. to 4:00 p.m. Eastern Time (3). We discovered
that the price change from one day to the next due to AHT was approximately
normally distributed around zero, with a mean of -0.007% and a standard
deviation of 1.7% (see figure 4). While AHT may have had interesting impacts on
our measure of volatility, we decided not to include it in our research for the
following reasons. First, the demographics of AHT traders differ from those of
non-AHT traders, because AHT trades are dominated by large financial
institutions. Second, we could not find an AHT dataset with the same level of
detail as our non-AHT dataset.
Figure 4: AHT percentage change histogram.
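The close-to-next-open change behind the AHT histogram can be sketched as follows. Our preprocessing was done in Java; this Python version is only an illustration of the calculation, and the function name, the (open, close) tuple layout, and the sample prices are assumptions. Whether the reported standard deviation was a sample or population figure is not recorded here; the sketch uses the sample form.

```python
from statistics import mean, stdev


def overnight_changes(days):
    """Percent change from each day's close to the next day's open.

    `days` is a chronologically ordered list of (open, close) pairs
    for a single stock.
    """
    changes = []
    for (_, prev_close), (next_open, _) in zip(days, days[1:]):
        changes.append(100.0 * (next_open - prev_close) / prev_close)
    return changes


# Hypothetical prices for one stock over four consecutive trading days.
days = [(10.00, 10.20), (10.25, 10.10), (10.05, 10.30), (10.30, 10.40)]
changes = overnight_changes(days)
print([round(c, 3) for c in changes])
print(round(mean(changes), 3), round(stdev(changes), 3))
```

Pooling these changes across all stocks and days gives the distribution summarized by a mean and standard deviation.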
Data Analysis
Statistical Methods
In order to fully understand the relationship between daily change and opening
price, we created two regression models to fit our dataset: one non-linear and
one linear with logarithmic data relationships.
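As an illustration of the linear model with logarithmic relationships, the sketch below fits a straight line to log(change) against log(price) by ordinary least squares. The analysis itself was done in R; this standard-library Python version is a stand-in, and the exact model form (logs on both axes), the function name, and the synthetic data are assumptions.

```python
import math


def fit_log_log(prices, changes):
    """Least-squares fit of log(change) = intercept + slope * log(price)."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(c) for c in changes]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Synthetic data generated from change = 2 / price, so the fit should
# recover a slope near -1 and an intercept near log(2).
prices = [5.0, 10.0, 20.0, 40.0, 80.0]
changes = [2.0 / p for p in prices]
intercept, slope = fit_log_log(prices, changes)
print(round(slope, 6), round(intercept, 6))
```

A slope near -1 on log-log axes corresponds to a percentage change that falls off inversely with price, which is one way an inverse relationship becomes linear after the transformation.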
Tests and Criteria
Our preliminary tests and criteria were informal: we compared each model
directly against the dataset to judge whether it was accurate. We also looked at
the distribution of the residuals to check that each model was appropriate. For
the non-linear model, we looked at the residual sum of squares; for the linear
model, we looked at the R-squared value, the p-value of significance, and the
standard error.
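Two of the criteria named above, the residual sum of squares and the R-squared value, have simple definitions that can be sketched directly. The function names and sample values here are illustrative assumptions; in practice R reports these figures, along with the p-value and standard error, as part of its model summaries.

```python
def residual_sum_of_squares(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Fraction of the variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    total = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual_sum_of_squares(observed, predicted) / total


# Hypothetical observations and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(residual_sum_of_squares(observed, predicted), 4))  # 0.1
print(round(r_squared(observed, predicted), 4))                # 0.98
```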
Tools
Our data-pruning tool is a custom Python script. A custom Java program was
used for the AHT histogram data preprocessing. All statistical analysis was done
in R with standard packages.
Results
Assaf, we thought we would include our linear and non-linear regressions as
well as histograms and box-plots for the residuals. Tell us what you think .
Discussion and Conclusions
References
1.) http://www2.standardandpoors.com
2.) http://biz.swcp.com/stocks/
3.) http://www.investopedia.com/ask/answers/04/061004.asp
4.) <add a reference to the site we read about the nlm function?>
Appendix