Statarb assignment 4

Statarb assignment 4 – Aloke Mukherjee
Q1. Volume profile
[Figure: intraday volume profile - chicago merc (cme) vs. yankee candle (ycc) - july 11 - aug 11 2006. x-axis: half-hour bucket (bucket 1 = 9.30-10.00); y-axis: percent of daily (9.30 - 4.00) volume; dashed lines mark +1 and -1 std. dev.]
By inspection, the two mean volume profiles lie within a single standard deviation of
each other in every half-hour bucket, which corresponds to a t-statistic of less than one
in each bucket. This is not enough to reject the hypothesis that the curves are the same;
in other words, the two curves are “statistically indistinguishable”.
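The per-bucket comparison can be sketched as follows. This is a minimal illustration on synthetic data, not the calculation behind the figure: `bcme` and `bycc` are hypothetical stand-ins for the output of bucketize (13 buckets down rows, one column per day), and the statistic used here, the difference in mean profiles divided by the pooled day-to-day standard deviation, is one assumed reading of "within a single standard deviation".

```matlab
% t-statistic per bucket: difference in mean profiles over the pooled
% day-to-day standard deviation (an assumed definition, not from the source).
tstat = @(b1, b2) (mean(b1, 2) - mean(b2, 2)) ./ ...
                  sqrt(0.5 * (var(b1, 0, 2) + var(b2, 0, 2)));

% Synthetic stand-ins for bucketize output: 13 buckets x 24 days.
% The U-shaped profile and the noise level are made up for illustration.
nb = 13; nd = 24;
shape = [0.14 0.09 0.08 0.07 0.06 0.06 0.05 0.05 0.06 0.07 0.08 0.09 0.10]';
bcme = max(shape * ones(1, nd) + 0.02 * randn(nb, nd), 0);
bycc = max(shape * ones(1, nd) + 0.02 * randn(nb, nd), 0);
bcme = bcme ./ (ones(nb, 1) * sum(bcme));   % renormalize each day to sum to 1
bycc = bycc ./ (ones(nb, 1) * sum(bycc));

t = tstat(bcme, bycc);
disp([(1:nb)' t]);                          % bucket number, t-statistic
fprintf('buckets with |t| >= 1: %d\n', sum(abs(t) >= 1));
```

With real data, `bcme` and `bycc` would come from calling bucketize on each ticker's trades.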
Q2. Serial correlation on VWAP series
a) For computation of vwap series see vwapseries.m below
b) Serial correlation of changes in log prices
[Figure: CME autocorrelation w/95% intervals - july 11 - aug 11 2006. x-axis: lag (0 to 20); y-axis: serial correlation.]
Serial correlation out to 20 lags, computed on the VWAP price series from 9.45 am – 3.45
pm. This correlogram is not inconsistent with a random walk with i.i.d. increments,
although the 1-lag autocorrelation is outside the 95% interval. A correlogram for the
second half of the price series does not have any autocorrelations outside the 95%
interval, indicating that the 1-lag correlation may not be a stationary feature of the
series.
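The full-sample versus second-half check can be sketched on synthetic data. This assumes i.i.d. returns as a hypothetical stand-in for diff(log(vwap)); the series length and volatility are made up. The band is the usual approximation of +/- 2/sqrt(n) under the i.i.d. null, and the estimator is the standard sample autocorrelation rather than the covariance-matrix version in cgram.m.

```matlab
% Sample autocorrelation at lag k (standard estimator).
acf = @(x, k) sum((x(1:end-k) - mean(x)) .* (x(1+k:end) - mean(x))) / ...
              sum((x - mean(x)).^2);

% Hypothetical stand-in for five-minute log returns: i.i.d. noise.
n = 1700;
r = 0.002 * randn(n, 1);
half = r(floor(n/2)+1:n);          % second half of the sample

lmax = 20;
cfull = arrayfun(@(k) acf(r, k), 1:lmax);
chalf = arrayfun(@(k) acf(half, k), 1:lmax);

% Approximate 95% band under the i.i.d. null.
bandfull = 2 / sqrt(length(r));
bandhalf = 2 / sqrt(length(half));

fprintf('full sample: %d of %d lags outside +/-%.3f\n', ...
        sum(abs(cfull) > bandfull), lmax, bandfull);
fprintf('second half: %d of %d lags outside +/-%.3f\n', ...
        sum(abs(chalf) > bandhalf), lmax, bandhalf);
```

Even for truly i.i.d. data, about one lag in twenty is expected to fall outside a 95% band by chance, so a single exceedance among 20 lags is weak evidence on its own.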
c) Daily volatility estimate
Source of estimate                                      CME
Scaled 5 minute returns (9.30 – 4.00)                   1.88%
5 minute returns (9.45 – 3.45)                          1.80%
Daily log returns from Yahoo (CRSP ends in Dec 2005)    0.85%
The five minute returns give a higher daily volatility estimate, roughly twice the one
obtained from daily log returns.
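The scaling from five-minute to daily volatility can be sketched as below. It uses the square-root-of-time rule with 78 five-minute intervals per 9.30 - 4.00 session, the same sqrt(78) factor that appears in the script at the end of this document. The rule assumes returns are uncorrelated across intervals. The input value 0.0021 is the rounded standard deviation printed on the histogram in part d), so the result is only approximate.

```matlab
sigma5 = 0.0021;              % five-minute return std (rounded, from the histogram)
nfull  = 6.5 * 60 / 5;        % 78 five-minute intervals in 9.30 - 4.00
sigmadaily = sigma5 * sqrt(nfull);
fprintf('daily vol estimate: %.2f%%\n', 100 * sigmadaily);
```

Because the input is rounded to two significant digits, the output should be expected to land near, not exactly on, the figures in the table.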
d) Histogram of returns vs. Gaussian
[Figure: CME five minute return dist vs. gaussian - july 11 - aug 11 2006. x-axis: five minute returns (-0.015 to 0.015); y-axis: scaled histogram/pdf. Normalized histogram (std = 0.0021, mean = -2.3e-5) overlaid with two normal distributions, std = 0.0011 and std = 0.0021.]
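It is difficult to fit a Gaussian distribution to the normalized histogram of five minute
returns. The histogram displays significant excess kurtosis: it is more peaked at the
center and heavier in the tails than a normal distribution with the same standard
deviation.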
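Excess kurtosis can be measured directly from the sample moments. The sketch below uses only base functions and hypothetical data: a two-regime mixture of normals (mostly quiet intervals, occasionally volatile ones) serves as a fat-tailed stand-in for the five-minute returns. The regime probabilities and volatilities are made up.

```matlab
% Sample excess kurtosis from central moments (zero for a normal distribution).
exkurt = @(x) mean((x - mean(x)).^4) / mean((x - mean(x)).^2)^2 - 3;

% Hypothetical fat-tailed returns: 90% quiet intervals, 10% volatile ones.
n = 20000;
calm = rand(n, 1) < 0.9;
r = randn(n, 1) .* (0.001 * calm + 0.005 * (1 - calm));

% Gaussian sample with the same standard deviation, for comparison.
g = randn(n, 1) * std(r);

fprintf('excess kurtosis, mixture : %.2f\n', exkurt(r));
fprintf('excess kurtosis, gaussian: %.2f\n', exkurt(g));
```

The mixture should show a large positive value while the Gaussian sample stays near zero, which is the pattern described for the histogram above.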
Q3. Autocorrelation of volatility
[Figure: daily volatility estimated from 5 minute vwap returns - july 11 - aug 11 2006. x-axis: days since july 11 2006; y-axis: daily volatility; series: cme, ycc.]
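The serial (lag 1) correlation of the CME vol series is slightly negative: -0.00706462.
For YCC it is slightly positive: 0.00252462.
Both values are effectively zero. The lack of a strong positive autocorrelation may be a
result of the one-month sampling period, which provides only about two dozen daily
volatility observations.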
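How much a one-month sample can blur a genuine autocorrelation can be gauged with a small Monte Carlo. This is a sketch under stated assumptions: an AR(1) process with a hypothetical coefficient `phi = 0.3` stands in for a persistent series, and 24 observations stand in for the daily vol series. Neither number is estimated from the data.

```matlab
% Monte Carlo: lag-1 correlation estimated from a short AR(1) series.
phi = 0.3; ndays = 24; ntrials = 2000;   % hypothetical settings
rho = zeros(ntrials, 1);
for trial = 1:ntrials
    e = randn(ndays + 50, 1);
    x = filter(1, [1 -phi], e);          % x(k) = phi*x(k-1) + e(k)
    x = x(51:end);                       % drop burn-in
    cc = corrcoef(x(1:end-1), x(2:end));
    rho(trial) = cc(1, 2);
end
fprintf('mean estimate %.2f, std %.2f, share of estimates <= 0: %.2f\n', ...
        mean(rho), std(rho), mean(rho <= 0));
```

The spread of the estimates is the useful output: with so few observations, a lag-1 estimate near zero does not rule out moderate positive autocorrelation.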
Q4. Serial correlation in trades
[Figure: serial correlation of CME log price change per trade - july 11 - aug 11 2006. x-axis: lag (0 to 20); y-axis: serial correlation.]
The 1-lag negative serial correlation may be a result of "bid-ask bounce". The bid-ask
spread causes incoming buys and sells to be recorded at different prices: buys at the ask
and sells at the bid. With random arrival of buys and sells this will cause negative
serial correlation of increments. If the buy-sell indicator is i.i.d., only the 1-lag
serial correlation will be negative; higher lags will be zero.
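The bounce effect can be reproduced in a few lines. This is a minimal simulation under a simplifying assumption that the mid price is constant, so that all price changes come from trades alternating between bid and ask. The mid price and spread are hypothetical values.

```matlab
% Bid-ask bounce with a constant mid price (simplifying assumption).
n = 100000;
mid = 480;                      % hypothetical mid price
spread = 0.10;                  % hypothetical bid-ask spread
q = sign(rand(n, 1) - 0.5);     % i.i.d. direction: +1 = buy at ask, -1 = sell at bid
p = mid + q * spread / 2;       % recorded trade prices
dp = diff(log(p));              % log price change per trade

% Sample autocorrelation at lag k.
acf = @(x, k) sum((x(1:end-k) - mean(x)) .* (x(1+k:end) - mean(x))) / ...
              sum((x - mean(x)).^2);
fprintf('lag 1: %.3f, lag 2: %.3f, lag 3: %.3f\n', ...
        acf(dp, 1), acf(dp, 2), acf(dp, 3));
```

In this limiting case the lag-1 value is close to -0.5 and higher lags are close to zero. When the mid price itself moves randomly, its variance adds to the denominator and the lag-1 value is pulled toward zero.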
Code
function [ticker, longdate, price, volume] = readtaqtrades(filename, l)
% function [ticker, longdate, price, volume] = readtaqtrades(filename, l)
%
% Read TAQ trades CSV (ticker,date,time,price,volume) into MATLAB
% variables. Example:
%
% SYMBOL,DATE,TIME,PRICE,SIZE
% CME,11JUL2006,8:19:50,480.74,100
% etc
%
% input:
% filename - CSV file to read
% l - number of lines to read - if omitted will read all lines
%
% output:
% ticker - symbol of the stock
% longdate - array of trade dates, trades down rows,
%            columns are: year month day hour minute second
% price - trade prices
% volume - number of shares in each trade
%
% 2006 aloke mukherjee

[ticker,date,time,price,volume] = ...
    textread(filename,'%s%s%s%f%u','delimiter',',','headerlines',1);

if (nargin < 2)
    l = length(price);
end

di = 2000;
disp(sprintf('total lines to read %g', l));
for i = 1:di:l
    endidx = min(i+di-1,l);
    disp(sprintf('processing lines %g - %g', i, endidx));
    catdate = cat(2,datestr(datenum(date(i:endidx)),'dd-mmm-yyyy '),char(time(i:endidx)));
    longdate(i:endidx,:) = datevec(catdate);
end

function b = bucketize(ld, v)
% function b = bucketize(ld, v)
%
% Calculates fraction of daily volume traded in each half-hour bucket
% from 9.30 - 16.00 in each day.
%
% inputs:
% ld - one row per trade, columns are: year month day hour minute second
% v - volume per trade corresponding to row in ld
%
% output:
% b - bucketized data, 13 buckets down rows, days excluding weekends and
%     holidays along columns
%
% To plot the intraday volume profile and std. deviations
% plot(mean(b')); hold on;
% plot(mean(b')+std(b'),'--');
% plot(mean(b')-std(b'),'--');
%
% 2006 aloke mukherjee

% convert y,m,d into simple indices
daynums = datenum(ld(:,1), ld(:,2), ld(:,3));
daynums = daynums - min(daynums) + 1;
udaynums = unique(daynums);

% convert times into buckets
% bucket nums: 9.30-10.00 = 19, 15.30-16.00 = 31
buckets = ld(:,4)*2 + floor(ld(:,5)/30);
bucknums = 19:31;
offset = min(bucknums) - 1;

% tabulate
b = zeros(13, size(udaynums,1));
for bucket = bucknums
    for day = udaynums'
        b(bucket-offset, day) = sum(v((daynums == day) & (buckets == bucket)));
        % can use to check if there are any missing trades
        % v((daynums == day) & (buckets == bucket)) = 0;
    end
end

% take out weekends and holidays
b(:,sum(b) == 0) = [];

% convert to % of total volume traded in the interval
b = b./(ones(length(bucknums),1)*sum(b));

function [c, err] = cgram(x, lmax);
% function [c, err] = cgram(x, lmax);
%
% Computes lagged correlations and standard error and plots
% a correlogram.
%
% inputs:
% x - series to examine (column vector)
% lmax - maximum lag for which to compute autocorrelation
%
% outputs:
% c - autocorrelations for each lag from one to lmax
% err - expected standard error
%
% 2006 aloke mukherjee

n = size(x,1);
for l = 0:lmax
    m(:,l+1)=x(l+1:n-lmax+l);
end
cc=cov(m);
c=cc(1,2:lmax+1)/cc(1,1);
err=sqrt(1/(n-lmax));
plot(1:lmax, c); hold on;
plot(1:lmax, 2*err*ones(1,lmax), '--');
plot(1:lmax, -2*err*ones(1,lmax), '--');

function [w, u] = vwapseries(ld, v, p)
% function [w, u] = vwapseries(ld, v, p)
%
% Calculate volume weighted average price series on five-minute intervals.
%
% inputs:
% ld - one row per trade, columns are: year month day hour minute second
% v - volume per trade corresponding to row in ld
% p - price per trade corresponding to row in ld
%
% output:
% w - volume weighted price series
% u - unweighted price series
%
% example usage: graphing serial correlation
% [t,l,p,v]=readtaqtrades('cme-trades.csv'); % takes a while
% [wcmesub,ucmesub]=vwapseries(l,v,p);
% cgram(diff(log(wcmesub)),20);
%
% 2006 aloke mukherjee

% convert y,m,d into simple indices
daynums = datenum(ld(:,1), ld(:,2), ld(:,3));
daynums = daynums - min(daynums) + 1;
udaynums = unique(daynums);

% convert times into buckets - 5 minute intervals
% only tabulate trades from 9.30 - 16.00
bucksize = 5; % in minutes
converttobuck = @(hr,min) hr*60/bucksize + floor(min/bucksize);
buckets = converttobuck(ld(:,4), ld(:,5));
minbuck = converttobuck(9, 30);
maxbuck = converttobuck(15, 55);
bucknums = minbuck:maxbuck;
offset = min(bucknums) - 1;

% tabulate
w = zeros(length(bucknums), length(udaynums));
u = zeros(size(w));
for bucket = bucknums
    for day = udaynums'
        px = p((daynums == day) & (buckets == bucket));
        vx = v((daynums == day) & (buckets == bucket));
        if (size(px,1))
            w(bucket-offset, day) = sum(px .* vx)/sum(vx);
            u(bucket-offset, day) = mean(px);
        end
    end
end

% convert to vector
% w = w(:);
% u = u(:);
%
% take out zeroes (holidays, weekends and some very rare five-minute
% periods when nothing traded)
% zs = (w == 0);
% w(zs) = [];
% u(zs) = [];

%%%%
% computing the autocorrelation of the vol series
% [t,l,p,v]=readtaqtrades('cme-trades.csv');
% [t2,l2,p2,v2]=readtaqtrades('ycc-trades.csv');
% calculate vwap series
[wcme,ucme]=vwapseries(l,v,p);
[wycc,uycc]=vwapseries(l2,v2,p2);
% take out weekends
wcme(:,sum(wcme)==0)=[];
wycc(:,sum(wycc)==0)=[];
% fill in zero prices as previous five minute price
wcme(find(wcme==0)) = wcme(find(wcme==0)-1);
wycc(find(wycc==0))=wycc(find(wycc==0)-1);
% take logs, differences, std dev.
vcme = std(diff(log(wcme)))*sqrt(78);
vycc = std(diff(log(wycc)))*sqrt(78);
% get serial correlations
scme = corrcoef(vcme(1:end-1),vcme(2:end));
sycc = corrcoef(vycc(1:end-1),vycc(2:end));
disp(sprintf('cme vol correlation: %g', scme(1,2)));
disp(sprintf('ycc vol correlation: %g', sycc(1,2)));