Univariate MATLAB plots boxplot Box plot cdfplot Empirical cumulative distribution function plot dfittool Interactive distribution fitting disttool Interactive density and distribution plots ecdfhist Empirical cumulative distribution function histogram histfit Histogram with a distribution fit normplotNormal probability plot normspecNormal density plot between specifications probplot Probability plots qqplot Quantile-quantile plot randtool Interactive random number generation wblplot Weibull probability plot Some MATLAB bivarite/ multivariate plots boxplot gline gname gplotmatrix gscatter hist3 lsline refcurve refline scatterhist Box plot Interactively add line to plot Add case names to plot Matrix of scatter plots by group Scatter plot by group Bivariate histogram Add least-squares line to scatter plot Add reference curve to plot Add reference line to plot Scatter plot with marginal histograms Statistics Toolbox in MATLAB Some examples help rng rng('default') x = [normrnd(4,1,1,100) normrnd(6,0.5,1,200)]; histfit(x) help histfit 70 60 50 40 30 20 10 0 1 2 3 4 5 6 7 8 9 histfit Histogram with superimposed fitted normal density. histfit(DATA,NBINS) plots a histogram of the values in the vector DATA, along with a normal density function with parameters estimated from the data. NBINS is the number of bars in the histogram. With one input argument, NBINS is set to the square root of the number of elements in DATA. histfit(DATA,NBINS,DIST) plots a histogram with a density from the DIST distribution. DIST can take the following values: 'beta' Beta 'birnbaumsaunders' Birnbaum-Saunders 'exponential' Exponential 'extreme value' or 'ev' Extreme value 'gamma' Gamma 'generalized extreme value' 'gev' Generalized extreme value 'generalized pareto' or 'gp' Generalized Pareto (threshold 0) 'inverse gaussian' Inverse Gaussian 'logistic' Logistic 'loglogistic' Log logistic 'lognormal' Lognormal 'negative binomial' or 'nbin' Negative binomial 'nakagami' N akagami 'normal' Normal 'poisson' Poisson 'rayleigh' Rayleigh 'rician' Rician 'tlocationscale' t location-scale 'weibull' or 'wbl' Weibull H = histfit(...) returns a vector of handles to the plotted lines. H(1) is a handle to the histogram, H(2) is a handle to the density curve. Reference page in Help browser doc histfit probplot('normal',x) Probability plot for Normal distribution 0.995 0.99 0.95 0.9 Probability 0.75 0.5 0.25 0.1 0.05 0.01 0.005 1 2 3 5 4 6 7 8 Data >> p = 0:0.25:1; y = quantile(x,p); z = [p;y] z= 0 0.2500 0.5000 0.7500 1.0000 1.0557 4.7375 5.6872 6.1526 7.5784 >> y = [mean(x) median(x)] y = 5.3438 5.6872 >> y = [skewness(x) kurtosis(x)] y = -1.0417 3.5895 >> Z = zscore(x); >> find(abs(Z)>3) 3 35 quantile initially assigns the sorted values in X to the (0.5/n), (1.5/n), ..., ([n – 0.5]/n) quantiles. For example: For a data vector of six elements such as {6, 3, 2, 10, 8, 1}, the sorted elements {1, 2, 3, 6, 8, 10} respectively correspond to the (0.5/6), (1.5/6), (2.5/6), (3.5/6), (4.5/6), and (5.5/6) quantiles. For a data vector of five elements such as {2, 10, 5, 9, 13}, the sorted elements {2, 5, 9, 10, 13} respectively correspond to the 0.1, 0.3, 0.5, 0.7, and 0.9 quantiles. The following figure illustrates this approach for data vector X = {2, 10, 5, 9, 13}. The first observation corresponds to the cumulative probability 1/5 = 0.2, the second observation corresponds to the cumulative probability 2/5 = 0.4, and so on. The step function in this figure shows these cumulative probabilities. quantile instead places the observations in midpoints, such that the first corresponds to 0.5/5 = 0.1, the second corresponds to 1.5/5 = 0.3, and so on, and then connects these midpoints. The red lines in the following figure connect the midpoints. By switching the axes, as the next figure, you can see the values of the variable X that correspond to the p quantiles. quantile finds any quantiles between the data values using linear interpolation. Linear interpolation uses linear polynomials to approximate a function f(x) and construct new data points within the range of a known set of data points. Algebraically, given the data points (x1, y1) and (x2, y2), where y1 = f(x1) and y2 = f(x2), linear interpolation finds y = f(x) for a given x between x1 and x2 as follows: Similarly, if the 1.5/n quantile is y1.5/n and the 2.5/n quantile is y2.5/n, then linear interpolation finds the 2.3/n quantile y2.3/n as load gas prices = [price1 price2]; normplot(prices) Both scatters approximately follow straight lines through thefirst and third quartiles of the samples, indicating approximate normal distributions A hypothesis test is used to quantify the test of normality. Since each sample is relatively small, a Lilliefors test is recommended. lillietest(price1) ans = 0 lillietest(price2) ans = 0 Null hypothesis is accepted sample_means = mean(prices) sample_means = 115.1500 118.5000 h,pvalue,ci] = ztest(price1/100,1.15,0.04) h=0 pvalue = 0.8668 ci = 1.1340 1.1690 The logical output h = 0 indicates a failure to reject the null hypothesis atthe default significance level of 5%. This is a consequence of thehigh probability under the null hypothesis, indicated by the p value,of observing a value as extreme or more extreme of the z-statisticcomputed from the sample. The 95% confidence interval on the mean[1.1340 1.1690] includes the hypothesized population mean of $1.15. Does the later sample offer stronger evidence for rejecting a null hypothesis of a statewide average price of $1.15 in February? The shift shown in the probability plot and the difference in the computed sample means suggest this. The shift might indicate a significant fluctuation in the market, raising questions about the validity of using the historical standard deviation. If a known standard deviation cannot be assumed, a t-test is more appropriate. [h,pvalue,ci] = ttest(price2/100,1.15) h = 1; pvalue = 4.9517e-004; ci = 1.1675 1.2025 The logical output h = 1 indicates a rejection of the null hypothesis at the default significance level of 5%. In this case, the 95% confidence interval on the mean does not include the hypothesized population mean of $1.15. You might want to investigate the shift in prices a little more closely. The function ttest2 tests if two independent samples come from normal distributions with equal but unknown standard deviations and the same mean, against the alternative that the means are unequal. [h,sig,ci] = ttest2(price1,price2) h = 1; sig = 0.0083; ci = -5.7845 -0.9155 The null hypothesis is rejected at the default 5% significance level, and the confidence interval on the difference of means does not include the hypothesized value of 0. A notched box plot is another way to visualize the shift. boxplot(prices,1) set(gca,'XTick',[1 2]) set(gca,'XtickLabel',{'January','February'}) xlabel('Month') ylabel('Prices ($0.01)') The plot displays the distribution of the samples around their medians. The heights of the notches in each box are computed so that the side-by-side boxes have nonoverlapping notches when their medians are different at a default 5% significance level. The computation is based on an assumption of normality in the data, but the comparison is reasonably robust for other distributions. The side-by-side plots provide a kind of visual hypothesis test, comparing medians rather than means. The plot above appears to barely reject the null hypothesis of equal medians. The nonparametric Wilcoxon rank sum test, implemented by thefunction ranksum, can be usedto quantify the test of equal medians. It tests if two independentsamples come from identical continuous (not necessarily normal) distributionswith equal medians, against the alternative that they do not haveequal medians. [p,h] = ranksum(price1,price2) p= 0.0095 h= 1 The test rejects the null hypothesis of equal medians at thedefault 5% significance level.