Data analysis in MATLAB
Christian Ruff

Why use MATLAB to analyse data?
• One single programme can be used for:
  – importing single-subject data from any format
  – re-arranging for multi-subject analyses
  – statistical tests
  – plotting results
• Errors are less likely
• One single script for analysis and documentation
• This can even be used by your experimental COGENT script (online analysis)
• Ultimately, MATLAB is **much** more flexible than SPSS or EXCEL, especially for graphs
• Nuisances:
  – some details of SPSS procedures are not available (but can be found on the web)
  – use is not as intuitive as SPSS buttons, but there are help <functionname> and doc <functionname>

Outline
• How to:
  (1) Import single-subject data from any format (and export it as well)
  (2) Inspect single-subject data for distribution / outliers etc.
  (3) Re-arrange data for multi-subject analyses
  (4) Perform statistical tests
  all as steps in one single script

(1) Importing data: Reading in files
• MATLAB can read in many different types of files, using different functions
• These can be listed with help fileformats
• Examples are:
  – xlsread: EXCEL data
  – dlmread: tab-delimited text (or any other form of delimited text, e.g., whitespace)
  – csvread: comma-separated numbers
  – textread: any mixture of text and numbers
  – importdata: any formatted data as a full file (looks for the most appropriate function to use)
  – fopen/fread: any formatted data by line, but needs extensive user specification of the format
• help <functionname> and doc <functionname> give instructions and examples
• MATLAB can also be used to save data in the corresponding formats (e.g., dlmwrite, csvwrite, fopen/fwrite/fprintf)

Outline
• How to:
  (1) Import single-subject data from any format
  (2) Inspect single-subject data for distribution / outliers etc.
  (3) Re-arrange data for multi-subject analyses
  (4) Perform statistical tests
  all as steps in one single script

(2) Inspecting data: Descriptive statistics
• Descriptive statistics: mean, median, min, max, prctile, range, var, std, skewness, kurtosis, cdfplot
  - many of these also work for data with missing values, by prefixing "nan" (e.g., nanmean)
• Visualisation of distribution:
  - Histogram: hist, also available with superimposed normal distribution: histfit
  - Test for normal distribution:
    - visually with normplot
    - statistically with lillietest (when testing for normality), kstest (when testing against any specified distribution) or kstest2 (when testing whether two samples come from the same distribution)
  - Scatterplot of two variables: scatter, also available for several variables: plotmatrix
  - Lineplot of data against one dimension (e.g., time): plot, or two dimensions: plot3
  - Visual check for outliers: boxplot (or check for impact of outliers with trimmean)

Outline
• How to:
  (1) Import single-subject data from any format
  (2) Inspect single-subject data for distribution / outliers etc.
  (3) Re-arrange data for multi-subject analyses
  (4) Perform statistical tests
  all as steps in one single script

(3) Transforming data for multi-subject analyses
Matrices are by far the most convenient data format for statistical analyses:
– Most descriptive-statistics commands work along dimensions of matrices, e.g., mean(matrix,1) averages over rows (one value per column), mean(matrix,2) averages over columns (one value per row), etc.
– Matrices can easily be indexed with logicals, e.g., rows = (matrix(:,2)==1); data = matrix(rows,:);
– Condition indices can easily be created as matrices, e.g., data(:,[2:3]) = fullfact([2 12]);
– Matrices can be easily transformed with
  • sort and sortrows to sort data
  • flipud, fliplr, flipdim, rot90 to flip dimensions
  • reshape to change dimensions
  • squeeze to remove dimensions
  • shiftdim, circshift to shift dimensions

Outline
• How to:
  (1) Import single-subject data from any format
  (2) Inspect single-subject data for distribution / outliers etc.
  (3) Re-arrange data for multi-subject analyses
  (4) Perform statistical tests
  all as steps in one single script

(4) Statistics: mean comparison
• The MATLAB statistics toolbox contains functions for many (non-)parametric tests (help stats)
• These ask for data in different input formats (see help <functionname> and doc <functionname>)
• They give out all relevant statistics as variables, and/or as tables (if displayopt = 'on')
• Comparing several independent measures: anova1, anova2, anovan, manova1, kruskalwallis
• Comparing several dependent (or mixed) measures: rmaov1, rmaov2, bwaov2, rmaov31, rmaov32, rmaov33, friedman (all repeated-measures ANOVAs from http://www.mathworks.com/matlabcentral/fileexchange)
• Post-hoc contrasts: multcompare, grpstats
• Comparing two independent measures: ttest2, ranksum
• Comparing two dependent variables: ttest, signtest, signrank

(4) Statistics: association / dimension reduction
• Bivariate associations:
  – correlation: corrcoef
  – linear regression: regress or robustfit (weighted to minimise impact of outliers)
  – nonlinear regression (e.g. logistic regression): nlinfit
• Multivariate associations:
  – canoncorr, manova1, mdscale, classify, cluster
• Dimension reduction:
  – princomp, factoran
• Bootstrapping is available: bootstrp

(4) Statistics: many other useful things
• The statistics toolbox contains functions for many statistical distributions (beta, binomial, exponential, gamma, poisson, weibull…):
  – Fits
  – Cumulative and probability density functions and their inverses
  – Random number generation from many distributions
• Efficient design of factorial experiments (e.g. fullfact; randn)
• Advanced statistical methods are either implemented (e.g., hidden Markov models, decision trees) or can be found on the web:
  – http://www.statsci.org/matlab
  – http://www.mathworks.com/matlabcentral/fileexchange
• If you want to know more, look at the excellent MATLAB documentation at:
  – http://www.mathworks.com/access/helpdesk/help/techdoc/
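
Example: the four steps in one single script
The following is a minimal sketch of how the four outline steps could be chained in one script. It is an illustration under stated assumptions, not a prescribed analysis: the file names (demo_subjXX.csv), the two-column layout [condition, reaction time] and the simulated values are all hypothetical, and the demo files are written to tempdir first only so that the script is self-contained. It uses csvwrite, csvread and ttest from the lists above; ttest requires the statistics toolbox.

```matlab
% --- Hypothetical demo data: one CSV file per subject, columns = [condition, RT]
nSubjects = 3;
nTrials   = 20;                              % trials per subject (even number)
rng(1);                                      % fixed seed so the demo is repeatable
files = cell(nSubjects, 1);
for s = 1:nSubjects
    cond = repmat([1; 2], nTrials/2, 1);     % alternating condition index 1/2
    rt   = 400 + 50*(cond == 2) + 30*randn(nTrials, 1);   % simulated RTs
    files{s} = fullfile(tempdir, sprintf('demo_subj%02d.csv', s));
    csvwrite(files{s}, [cond rt]);           % export, as in step (1)
end

condMeans = zeros(nSubjects, 2);             % subjects x conditions
for s = 1:nSubjects
    % (1) Import single-subject data
    raw = csvread(files{s});
    rt  = raw(:, 2);

    % (2) Inspect single-subject data (descriptive statistics only here)
    fprintf('Subject %d: mean = %.1f, median = %.1f, std = %.1f\n', ...
            s, mean(rt), median(rt), std(rt));

    % (3) Re-arrange for the multi-subject analysis using logical indexing
    for c = 1:2
        rows = (raw(:, 1) == c);             % trials belonging to condition c
        condMeans(s, c) = mean(rt(rows));
    end

    delete(files{s});                        % remove the demo file again
end

% (4) Statistical test: paired t-test on the two dependent condition means
[h, p, ci, stats] = ttest(condMeans(:, 1), condMeans(:, 2));
fprintf('t(%d) = %.2f, p = %.3f\n', stats.df, stats.tstat, p);
```

With real data, the first loop would be dropped and files would list the existing single-subject files; plotting commands such as hist or boxplot could be added in step (2).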