Statarb assignment 4 – Aloke Mukherjee

Q1. Volume profile

[Figure: intraday volume profile, Chicago Merc (CME) vs. Yankee Candle (YCC), July 11 – Aug 11 2006. x-axis: half-hour bucket (bucket 1 = 9.30-10.00); y-axis: percent of daily (9.30 - 4.00) volume; dashed lines at +1 and -1 std. dev.]

By inspection, the two volume profiles lie within a single standard deviation of each other in every half-hour bucket, corresponding to t-statistics below one. This is not enough to reject the hypothesis that the curves are the same; in other words, the two curves are "statistically indistinguishable".

Q2. Serial correlation on VWAP series

a) For the computation of the VWAP series, see vwapseries.m below.

b) Serial correlation of changes in log prices

[Figure: CME autocorrelation with 95% intervals, July 11 – Aug 11 2006. x-axis: lag (0-20); y-axis: serial correlation.]

Serial correlation out to 20 lags, computed on the VWAP price series from 9.45 am to 3.45 pm. This correlogram is broadly consistent with a random walk (i.i.d. increments), although the lag-1 autocorrelation falls outside the 95% interval. A correlogram for the second half of the price series has no autocorrelations outside the 95% interval, which suggests that the lag-1 effect may not be a stationary feature of the series.

c) Daily volatility estimate

  Method                                                 CME
  Scaled 5 minute returns (9.30 – 4.00)                  1.88%
  5 minute returns (9.45 – 3.45)                         1.80%
  Daily log returns from Yahoo (CRSP ends in Dec 2005)   0.85%

The five-minute returns give a higher daily volatility estimate, more than twice the estimate from daily log returns.

d) Histogram of returns vs. Gaussian

[Figure: CME five-minute return distribution vs. Gaussian, July 11 – Aug 11 2006. Normalized histogram of five-minute returns (std = 0.0021, mean = -2.3e-5) with normal densities of std = 0.0011 and std = 0.0021 overlaid. x-axis: five-minute returns; y-axis: scaled histogram/pdf.]

It is difficult to fit a Gaussian distribution to the normalized histogram of five-minute returns: the distribution displays significant kurtosis.

Q3.
Autocorrelation of volatility

[Figure: daily volatility estimated from 5-minute VWAP returns, CME vs. YCC, July 11 – Aug 11 2006. x-axis: days since July 11 2006; y-axis: daily volatility.]

The serial (lag-1) correlation of the CME volatility series is slightly negative: -0.00706462. For YCC it is slightly positive: 0.00252462. With only about one month of daily values (roughly two dozen), both estimates lie far inside an approximate ±2/√n band of about ±0.4, so neither is distinguishable from zero. The lack of a strong positive autocorrelation may be a result of the one-month sampling period.

Q4. Serial correlation in trades

[Figure: serial correlation of CME log price change per trade, July 11 – Aug 11 2006. x-axis: lag (0-20); y-axis: serial correlation.]

The negative lag-1 serial correlation may be a result of "bid-ask bounce". Because of the bid-ask spread, incoming buys and sells are recorded at different prices (buys near the ask, sells near the bid). With random arrival of buys and sells, recorded prices bounce between the two, which causes negative serial correlation of price increments. If the buy-sell indicator is i.i.d., only the lag-1 serial correlation will be negative.

Code

function [ticker, longdate, price, volume] = readtaqtrades(filename, l)
% function [ticker, longdate, price, volume] = readtaqtrades(filename, l)
%
% Read TAQ trades CSV (ticker,date,time,price,volume) into MATLAB variables.
%
% Example:
%   SYMBOL,DATE,TIME,PRICE,SIZE
%   CME,11JUL2006,8:19:50,480.74,100
%   etc.
%
% input:
%   filename - CSV file to read
%   l        - number of lines to read; if omitted, all lines are read
%
% output:
%   ticker   - symbol of the stock
%   longdate - array of trade dates, one trade per row; columns are:
%              year month day hour minute second
%   price    - trade prices
%   volume   - number of shares in each trade
%
% 2006 aloke mukherjee

[ticker,date,time,price,volume] = textread(filename,'%s%s%s%f%u','delimiter',',','headerlines',1);
if (nargin < 2)
    l = length(price);
end
% keep only the first l trades so all outputs have the same length
price = price(1:l);
volume = volume(1:l);
longdate = zeros(l, 6);   % preallocate
di = 2000;                % process dates in chunks of di lines
disp(sprintf('total lines to read %g', l));
for i = 1:di:l
    endidx = min(i+di-1, l);
    disp(sprintf('processing lines %g - %g', i, endidx));
    % combine date and time strings, then split into year month day hour minute second
    catdate = cat(2, datestr(datenum(date(i:endidx)),'dd-mmm-yyyy '), char(time(i:endidx)));
    longdate(i:endidx,:) = datevec(catdate);
end

function b = bucketize(ld, v)
% function b = bucketize(ld, v)
%
% Calculates the fraction of daily volume traded in each half-hour bucket
% from 9.30 - 16.00 in each day.
%
% inputs:
%   ld - one row per trade, columns are: year month day hour minute second
%   v  - volume per trade corresponding to row in ld
%
% output:
%   b  - bucketized data, 13 buckets down rows, days (excluding weekends
%        and holidays) along columns
%
% To plot the intraday volume profile and +/-1 std.
% deviations:
%   plot(mean(b')); hold on;
%   plot(mean(b')+std(b'),'--');
%   plot(mean(b')-std(b'),'--');
%
% 2006 aloke mukherjee

% convert y,m,d into simple indices
daynums = datenum(ld(:,1), ld(:,2), ld(:,3));
daynums = daynums - min(daynums) + 1;
udaynums = unique(daynums);

% convert times into buckets
% bucket nums: 9.30-10.00 = 19, 15.30-16.00 = 31
buckets = ld(:,4)*2 + floor(ld(:,5)/30);
bucknums = 19:31;
offset = min(bucknums) - 1;

% tabulate: one column per calendar day, so days without trades stay all zero
b = zeros(length(bucknums), max(udaynums));
for bucket = bucknums
    for day = udaynums'
        b(bucket-offset, day) = sum(v((daynums == day) & (buckets == bucket)));
        % can use to check if there are any missing trades
        % v((daynums == day) & (buckets == bucket)) = 0;
    end
end

% take out weekends and holidays
b(:,sum(b) == 0) = [];

% convert to fraction of total volume traded in the interval
b = b./(ones(length(bucknums),1)*sum(b));

function [c, err] = cgram(x, lmax)
% function [c, err] = cgram(x, lmax)
%
% Computes lagged correlations and standard error and plots a correlogram.
%
% inputs:
%   x    - series to examine (column vector)
%   lmax - maximum lag for which to compute autocorrelation
%
% outputs:
%   c    - autocorrelations for each lag from one to lmax
%   err  - expected standard error
%
% 2006 aloke mukherjee

n = size(x,1);
% column l+1 of m holds x shifted by l; all columns have length n-lmax
m = zeros(n-lmax, lmax+1);
for l = 0:lmax
    m(:,l+1) = x(l+1:n-lmax+l);
end
cc = cov(m);
c = cc(1,2:lmax+1)/cc(1,1);
err = sqrt(1/(n-lmax));
% correlogram with approximate 95% band (+/- 2 standard errors)
plot(1:lmax, c); hold on;
plot(1:lmax, 2*err*ones(1,lmax), '--');
plot(1:lmax, -2*err*ones(1,lmax), '--');
hold off;

function [w, u] = vwapseries(ld, v, p)
% function [w, u] = vwapseries(ld, v, p)
%
% Calculate volume weighted average price series on five-minute intervals.
%
% inputs:
%   ld - one row per trade, columns are: year month day hour minute second
%   v  - volume per trade corresponding to row in ld
%   p  - price per trade corresponding to row in ld
%
% output:
%   w  - volume weighted price series
%   u  - unweighted price series
%
% w and u are returned as matrices: 5-minute buckets (9.30 - 16.00) down rows,
% calendar days along columns, with 0 where nothing traded (weekends, holidays
% and some very rare five-minute periods).
%
% example usage: graphing serial correlation
%   [t,l,p,v] = readtaqtrades('cme-trades.csv');   % takes a while
%   [wcmesub,ucmesub] = vwapseries(l,v,p);
%   x = wcmesub(:);       % convert to a single column vector
%   x(x == 0) = [];       % take out zeroes (periods with no trades)
%   cgram(diff(log(x)),20);
%
% 2006 aloke mukherjee

% convert y,m,d into simple indices
daynums = datenum(ld(:,1), ld(:,2), ld(:,3));
daynums = daynums - min(daynums) + 1;
udaynums = unique(daynums);

% convert times into buckets - 5 minute intervals
% only tabulate trades from 9.30 - 16.00
bucksize = 5; % in minutes
converttobuck = @(hr,mn) hr*60/bucksize + floor(mn/bucksize);
buckets = converttobuck(ld(:,4), ld(:,5));
minbuck = converttobuck(9, 30);
maxbuck = converttobuck(15, 55);
bucknums = minbuck:maxbuck;
offset = min(bucknums) - 1;

% tabulate: one column per calendar day
w = zeros(length(bucknums), max(udaynums));
u = zeros(size(w));
for bucket = bucknums
    for day = udaynums'
        idx = (daynums == day) & (buckets == bucket);
        px = p(idx);
        vx = v(idx);
        if ~isempty(px)
            w(bucket-offset, day) = sum(px .* vx)/sum(vx);
            u(bucket-offset, day) = mean(px);
        end
    end
end

%%%%
% computing the autocorrelation of the vol series
%
% [t,l,p,v] = readtaqtrades('cme-trades.csv');
% [t2,l2,p2,v2] = readtaqtrades('ycc-trades.csv');

% calculate vwap series
[wcme,ucme] = vwapseries(l,v,p);
[wycc,uycc] = vwapseries(l2,v2,p2);

% take out weekends
wcme(:,sum(wcme)==0) = [];
wycc(:,sum(wycc)==0) = [];

% fill in zero prices as previous five minute price
% (looping forward also fills runs of consecutive zeros)
for k = find(wcme == 0)'
    if k > 1, wcme(k) = wcme(k-1); end
end
for k = find(wycc == 0)'
    if k > 1, wycc(k) = wycc(k-1); end
end

% take logs, differences, std dev. per day (column);
% scale by sqrt(78), the number of 5-minute intervals in 9.30 - 16.00
vcme = std(diff(log(wcme)))*sqrt(78);
vycc = std(diff(log(wycc)))*sqrt(78);

% get serial (lag-1) correlations of the daily vol series
scme = corrcoef(vcme(1:end-1), vcme(2:end));
sycc = corrcoef(vycc(1:end-1), vycc(2:end));
disp(sprintf('cme vol correlation: %g', scme(1,2)));
disp(sprintf('ycc vol correlation: %g', sycc(1,2)));
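The per-bucket comparison in Q1 can also be stated as a formal two-sample test. The sketch below swaps in Welch's two-sample t-statistic, treating each trading day as an independent sample; it is an illustration, not necessarily the calculation behind the Q1 figure. The matrices `bcme` and `bycc` are hypothetical stand-ins for the 13 × days output of bucketize.m, filled here with random fractions only so the snippet runs.

```matlab
% Hypothetical stand-ins for bucketize.m output: 13 half-hour buckets down
% rows, trading days along columns, each column summing to 1.
nb = 13;
bcme = rand(nb, 23);  bcme = bcme ./ repmat(sum(bcme), nb, 1);
bycc = rand(nb, 21);  bycc = bycc ./ repmat(sum(bycc), nb, 1);

% per-bucket sample means and variances across days (dimension 2)
m1 = mean(bcme, 2);      m2 = mean(bycc, 2);
v1 = var(bcme, 0, 2);    v2 = var(bycc, 0, 2);
n1 = size(bcme, 2);      n2 = size(bycc, 2);

% Welch's t-statistic for each bucket: difference in means divided by
% the standard error of that difference
t = (m1 - m2) ./ sqrt(v1/n1 + v2/n2);

% buckets where |t| > 2 would suggest the profiles differ there
flagged = find(abs(t) > 2);
disp([(1:nb)' t]);
```

Note that the denominator is a standard error, which shrinks like 1/√n as the number of days grows, so |t| here is not the same quantity as the distance between the ±1 std. dev. bands in the figure.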
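The scaled estimates in Q2c rely on square-root-of-time scaling: if 5-minute log returns are i.i.d. with standard deviation σ, the sum over the 78 five-minute intervals between 9.30 and 16.00 (6.5 hours × 12) has standard deviation σ√78, which is the sqrt(78) factor used in the volatility script. The sketch below checks that scaling on simulated i.i.d. Gaussian returns; the per-interval σ of 0.0021 is borrowed from the Q2d histogram, and the number of simulated days is an arbitrary assumption.

```matlab
nint = 78;          % 5-minute intervals in 9.30 - 16.00 (6.5 h * 12)
ndays = 5000;       % number of simulated days (arbitrary)
sigma5 = 0.0021;    % per-interval std. dev., taken from the Q2d histogram

% simulated i.i.d. Gaussian 5-minute log returns: intervals down rows, days across
r = sigma5 * randn(nint, ndays);

% estimate 1: scale the pooled 5-minute std. dev. by sqrt(78)
vol_scaled = std(r(:)) * sqrt(nint);

% estimate 2: std. dev. of daily returns, i.e. the sum over each day's intervals
vol_daily = std(sum(r));

fprintf('scaled 5-minute: %.4f   daily: %.4f   theory: %.4f\n', ...
        vol_scaled, vol_daily, sigma5 * sqrt(nint));
```

The two estimates agree here only because the simulated returns are independent; any serial correlation in real intraday returns makes the variance of the daily sum differ from 78 times the 5-minute variance.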
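The kurtosis observation in Q2d can be quantified directly. MATLAB's `kurtosis` function belongs to the Statistics Toolbox, so the sketch below computes the sample kurtosis from central moments instead (a Gaussian gives 3; excess kurtosis is the value minus 3). The heavy-tailed series is an assumption for illustration only: a mixture of two zero-mean normals (90% with std. dev. 0.001, 10% with 0.005), not the actual CME returns.

```matlab
% sample kurtosis from central moments; equals 3 for a Gaussian
kurt = @(x) mean((x - mean(x)).^4) / mean((x - mean(x)).^2)^2;

n = 100000;
calm = rand(n, 1) < 0.9;                          % 90% "calm" draws
r = randn(n, 1) .* (0.001*calm + 0.005*~calm);    % hypothetical heavy-tailed returns

k_mix = kurt(r);                        % heavy-tailed mixture
k_gauss = kurt(0.0021 * randn(n, 1));   % Gaussian benchmark
fprintf('kurtosis: mixture %.2f, gaussian %.2f (excess = kurtosis - 3)\n', ...
        k_mix, k_gauss);
```

Applied to the actual five-minute VWAP returns, the same `kurt` handle gives a single number to quote alongside the histogram.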
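The bid-ask bounce argument in Q4 can be checked by simulation. The sketch below follows the standard Roll model: the observed trade price is an efficient price (a random walk) plus the half-spread s times an i.i.d. ±1 buy/sell indicator. For price changes Δp_t = e_t + s(q_t - q_{t-1}), the variance is σ² + 2s² and the lag-1 autocovariance is -s², so the lag-1 autocorrelation is -s²/(σ² + 2s²) while all higher lags are zero. The spread and volatility values are hypothetical.

```matlab
n = 100000;          % number of simulated trades
sigma = 0.01;        % per-trade std. dev. of the efficient price (hypothetical)
s = 0.025;           % half of the bid-ask spread (hypothetical)

q = 2*(rand(n, 1) < 0.5) - 1;        % i.i.d. buy (+1) / sell (-1) indicator
m = cumsum(sigma * randn(n, 1));     % efficient price: random walk
p = m + s * q;                       % buys print at the ask, sells at the bid

dp = diff(p);                        % price change per trade
dpc = dp - mean(dp);
acf = @(k) sum(dpc(1:end-k) .* dpc(1+k:end)) / sum(dpc.^2);

rho = arrayfun(acf, 1:5);            % sample autocorrelations, lags 1..5
rho1_theory = -s^2 / (sigma^2 + 2*s^2);
fprintf('lag-1: %.3f (theory %.3f), lags 2-5: %s\n', ...
        rho(1), rho1_theory, mat2str(rho(2:5), 3));
```

Setting sigma = 0 (no efficient-price movement) gives a lag-1 autocorrelation of -0.5, the most negative value the pure bounce can produce.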