Matlab presentation

Francis, UNL, 16 April 2015
Computational Statistics in MATLAB
By: Francis Ayimiah – Nterful
Department of Statistics - UNL
Outline
Part I
Introduction
• What is MATLAB?
• The MATLAB System
• MATLAB Online Help
MATLAB Development Environment
• Starting and Quitting MATLAB
• MATLAB Desktop and Desktop Tools
Manipulating Arrays and Matrices in MATLAB
MATLAB Graphics
Programming with MATLAB
Part II
Sampling from Random Variables
• Sampling from the standard distributions
• Sampling from non-standard distributions
  - Inverse Transform Sampling
Part I
1. Introduction
In this presentation, we discuss the basics of using MATLAB,
along with some of its basic statistical features,
including inverse transform sampling.
1.1 What Is MATLAB?
• MATLAB is a high-performance language for technical computing.
• It is an integrated computing environment for numeric computation, visualization, and programming. Typical uses include
- Math and computation
- Algorithm development
- Data acquisition
- Modeling, simulation, and visualization
- Scientific and engineering graphics
- Application development, including graphical user interface building
• MATLAB is an interactive system whose basic data element is a matrix. Its programming features are similar to those of other computer languages; examples are functions, IF statements, and FOR loops.
• MATLAB provides GUI tools so the user can develop applications.
• The name MATLAB stands for matrix (MAT) laboratory (LAB), because it was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects.
• At the time of this presentation, the latest version of MATLAB was released on February 12, 2015.
• MATLAB features a family of add-on, application-specific solutions called toolboxes. Toolboxes allow you to learn and apply specialized technology. They are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems.
• Areas in which toolboxes are available include:
- Signal processing
- Optimization
- Communication
- Control systems
- System identification
- Neural networks
- Statistics, which includes:
  Probability distributions
  Descriptive statistics
  Hypothesis tests
  Cluster analysis
  Linear models
  Nonlinear models
  Multivariate statistics
  Statistical plots
  Design of experiments
  Statistical process control
- Image processing, and many others.
• Numerous third parties also offer MATLAB toolboxes. These can be found through the MathWorks webpage: http://www.mathworks.com
1.2 The MATLAB System
• The MATLAB system consists of five main parts:
1. Development Environment – The set of components that help you use MATLAB functions and files, see your results, and interact with data from other sources. It includes the MATLAB desktop and Command Window, a command history, an editor and debugger, and browsers for viewing help, the workspace, files, and the search path.
The MATLAB development environment is available for the following operating systems:
- Microsoft Windows
- Macintosh
- UNIX / Linux
2. The MATLAB Mathematical Function Library – MATLAB contains a vast collection of computational algorithms, ranging from simple functions like sine, cosine, and sum to sophisticated functions for matrix computation and manipulation, statistical analysis, signal processing, and many others.
3. The MATLAB Language – At the heart of MATLAB is a high-level programming language with the matrix as its basic entity. All the usual programming features, such as control flow statements, functions, data structures, input/output, and object-oriented features, are available.
4. MATLAB Graphics – MATLAB has extensive
tools for displaying vectors and matrices as graphs,
as well as annotating and printing these graphs.
Handle Graphics is available through the MATLAB
Language for MATLAB programmers to build custom
user interfaces and applications.
5. The MATLAB Application Program Interface (API) – This includes a library of functions that allows you to write programs that interact with MATLAB. With this library, your C and Fortran programs, for example, can access MATLAB functions, access MATLAB data files, and call MATLAB as a computational engine.
1.3 MATLAB Online Help
• To view the online documentation, select MATLAB Help from the Help menu in MATLAB. (Further detail in appendix 1.)
2. MATLAB Development Environment
As discussed above, the MATLAB development
environment is the set of tools and facilities that help
you use MATLAB functions and files.
2.1 Starting MATLAB
• On Windows platforms, double-click the MATLAB shortcut icon on your Windows desktop to start MATLAB.
• On UNIX platforms, type matlab at the operating system prompt.
• After MATLAB starts, the MATLAB desktop opens (see Figure 1).
Figure 1: MATLAB Desktop
2.2 Quitting MATLAB
• To end your MATLAB session, select Exit MATLAB from the File menu in the desktop, or simply type quit in the Command Window and press Enter.
• To execute specified functions each time MATLAB quits, such as saving the workspace, you can create a finish.m script; MATLAB runs it automatically on exit.
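As a minimal sketch, a finish.m file placed on the MATLAB search path might look like the following. The file name last_session.mat and the use of the temporary folder are illustrative assumptions, not part of the original slides.

```matlab
% finish.m (sketch): MATLAB runs this script when it quits
% Assumption: the file name and the tempdir location are illustrative only.
sessionFile = fullfile(tempdir, 'last_session.mat');
save(sessionFile);                          % save all workspace variables
disp(['Workspace saved to ' sessionFile]);
```

In a later session, load(sessionFile) would restore the saved variables.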
2.3 MATLAB Desktop
• The MATLAB desktop contains a number of tools. The tools include
- Command Window: Used to enter variables and run functions and M-files. You can also run external programs from the MATLAB Command Window.
- Command History: Statements you enter in the Command Window are logged in the Command History, so you can view previously run statements and copy and execute selected statements.
- Help Browser: Use the Help Browser to search and view documentation and demos for all MathWorks products. You can also type doc or help in the Command Window to view documentation (e.g. doc plot).
- Workspace Browser: The workspace consists of the set of variables (named arrays) built up during a MATLAB session and stored in memory. You add variables to the workspace by using functions, running M-files, and loading saved workspaces.
- Editor/Debugger: Use the Editor/Debugger to create and debug M-files, which are programs you write to run MATLAB functions. The Editor/Debugger provides a graphical user interface for basic text editing, as well as for M-file debugging.
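The workspace can also be inspected and managed from the Command Window. The following is a small sketch; the variable names and the temporary MAT-file are assumptions made for illustration.

```matlab
% Sketch: inspecting, saving, and restoring workspace variables
% Assumption: variable names and the temporary file are illustrative only.
A = magic(3);                  % create a 3-by-3 matrix in the workspace
v = 1:5;                       % create a row vector
whos                           % list variables with their size and class
matFile = [tempname '.mat'];   % name for a temporary MAT-file
save(matFile, 'A', 'v');       % save the selected variables
clear A v                      % remove them from the workspace
load(matFile);                 % load them back from the file
disp(A);                       % A is available again
delete(matFile);               % clean up the temporary file
```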
3. Manipulating Arrays and Matrices
3.1 Array/Matrix operations
• Like other computer languages, MATLAB provides high-level operators and functions for creating and manipulating arrays.
• Arithmetic operations on arrays are done element-by-element. This means that addition and subtraction are the same for arrays and matrices, but that multiplicative operations are different. The list of array operators includes
+     Addition
-     Subtraction
.*    Element-by-element multiplication
./    Element-by-element division
.\    Element-by-element left division
.^    Element-by-element power
.'    Unconjugated array transpose
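To make the difference between array and matrix operations concrete, here is a short sketch on two small matrices. The matrices A and B are chosen only for illustration.

```matlab
% Sketch: element-by-element versus matrix operations
A = [1 2; 3 4];
B = [5 6; 7 8];
S  = A + B;     % addition is the same for arrays and matrices
P1 = A .* B;    % element-by-element product: [5 12; 21 32]
P2 = A * B;     % matrix product:             [19 22; 43 50]
Q1 = A ./ B;    % element-by-element division:      A(i,j)/B(i,j)
Q2 = A .\ B;    % element-by-element left division: B(i,j)/A(i,j)
E  = A .^ 2;    % element-by-element power:   [1 4; 9 16]
T  = A.';       % unconjugated transpose:     [1 3; 2 4]
```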
Examples
• Addition and Subtraction
>> a = 1:5      % vector containing the integers 1 to 5
>> b = 3:7      % vector containing the integers 3 to 7
>> a+b
ans =
     4     6     8    10    12
• Squaring a vector
>> t = [ 1 2 3 4 5 ];    % vector of values to square
>> m = t.^2              % square all the values
m =
     1     4     9    16    25
• Reshape an array
>> a = 1:6
a =
     1     2     3     4     5     6
>> reshape(a,2,3)        % reshape vector a to a 2 x 3 matrix
ans =
     1     3     5
     2     4     6
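Note that reshape fills the result column by column, which is why the first column above is 1, 2. As a small sketch (the variable names M, R, and b are illustrative), a row-by-row layout can be obtained by reshaping to the transposed size and then transposing:

```matlab
% Sketch: reshape fills the result column by column
a = 1:6;
M = reshape(a, 2, 3);      % column-wise fill: [1 3 5; 2 4 6]
R = reshape(a, 3, 2).';    % row-wise layout:  [1 2 3; 4 5 6]
b = M(:).';                % back to the row vector 1 2 3 4 5 6
```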
3.2 Building Tables
• Array operations are useful for building tables. Similar to other familiar languages, MATLAB uses column-oriented analysis for multivariate statistical data. Each column in a data set represents a variable and each row an observation.
• The (i,j)th element is the ith observation of the jth variable.
Example: Consider a data set, D, with three variables: v1, v2, and v3 (making up the columns). For 5 observations, the array is given as follows
>> D = [72 134 3.2; 81 201 3.5; 69 156 7.1; 82 148 2.4; 75 170 1.2]
D =
    72    134    3.2
    81    201    3.5
    69    156    7.1
    82    148    2.4
    75    170    1.2
• To obtain the mean and standard deviation of each column, we have
>> mu = mean(D), sigma = std(D)
mu =
    75.8    161.8    3.48
sigma =
    5.6303    25.499    2.2107
• So mean(v1) = 75.8, mean(v2) = 161.8, and mean(v3) = 3.48. Similarly, the standard deviations are std(v1) = 5.6303, std(v2) = 25.499, and std(v3) = 2.2107.
• You can also find the correlations among the variables, along with many other statistics.
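As a sketch of the column-wise summaries for the data set D, the correlations can be computed with corrcoef and the covariances with cov. These two calls are additions for illustration and are not shown in the original slides.

```matlab
% Sketch: column-wise summaries and correlations for the data set D
D = [72 134 3.2; 81 201 3.5; 69 156 7.1; 82 148 2.4; 75 170 1.2];
mu    = mean(D);       % column means
sigma = std(D);        % column standard deviations (normalized by n-1)
R     = corrcoef(D);   % 3-by-3 correlation matrix of v1, v2, v3
C     = cov(D);        % 3-by-3 covariance matrix
disp(R);
```

The diagonal of R is 1, and the square roots of the diagonal of C reproduce sigma.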
4. MATLAB Graphics
4.1 2-D Plots
• The basic 2-D plotting function in MATLAB is plot. (Further details in appendix 2.)
Example 1: using the plotmatrix command
>> x = randn(50,3);                  % normally distributed random values
>> y = x*[-1 2 1; 2 0 1; 1 -2 3]';
>> figure(1);
>> plotmatrix(y)                     % creates a matrix of subaxes; like plotmatrix(y,y), but with histograms on the diagonal
>> title('Matrix plot')
Example 2: Multiple plots in one figure
% subplot(nrows,ncols,plot_number) places multiple plots in one figure
figure(2);
x = 0:.1:2*pi;                % x vector from 0 to 2*pi, dx = 0.1
subplot(2,2,1);               % plot sine function
plot(x,sin(x)); title('sin(x)')
subplot(2,2,2);               % plot cosine function
plot(x,cos(x)); title('cos(x)')
subplot(2,2,3)                % plot negative exponential function
plot(x,exp(-x)); title('exp(-x)')
% Put all of the above curves together to form the 4th plot
subplot(2,2,4);
plot(x,sin(x),'k-', x,cos(x),'b+', x,exp(-x),'ro');
legend('sin(x)','cos(x)','exp(-x)')
4.2 3-D Plots
Example
t = 0:pi/10:2*pi;                % a vector of t values from 0 to 2*pi in increments of pi/10
[X,Y,Z] = cylinder(4*cos(t));    % returns the x-, y-, and z-coordinates of the cylinder
figure(3);
subplot(2,2,1); mesh(X);
subplot(2,2,2); mesh(Y);
subplot(2,2,3); mesh(Z);
subplot(2,2,4); mesh(X,Y,Z);
% mesh produces wireframe surfaces that color only the lines connecting the defining points
5. Programming With MATLAB
MATLAB provides extensive programming features, with just a few mentioned here. These include
• Flow control constructs such as if, for, while, continue, and break.
• Scripts and functions, which are called M-files. While scripts do not accept input arguments or return output arguments, functions can accept input arguments and return output arguments.
• Demonstration programs
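The flow control constructs listed above can be sketched in a short script. The task (summing odd numbers up to a limit) and the anonymous function square are illustrative assumptions; a function stored in its own M-file would instead begin with a line such as function y = square(x).

```matlab
% Sketch: flow control with for, if, continue, break, and while
% Assumption: the task and all names below are illustrative only.
total = 0;
for k = 1:100
    if mod(k, 2) == 0
        continue            % skip even numbers
    end
    if total + k > 50
        break               % stop before the sum exceeds 50
    end
    total = total + k;      % accumulates 1 + 3 + 5 + ...
end

n = 0;
while 2^n < 1000            % find the smallest n with 2^n >= 1000
    n = n + 1;
end

square = @(x) x.^2;         % anonymous function: one input, one output
disp([total, n, square(4)]);
```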
Part II
6. Sampling from Random Variables
6.1 Sampling from the standard distributions
• The MATLAB Statistics Toolbox supports about 20 probability distributions. For each distribution, there are 5 associated functions. These are
- Probability density function (pdf)
- Cumulative distribution function (cdf)
- Inverse of the cumulative distribution function
- Random number generator
- Mean and variance as a function of the parameters
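As a sketch, the five function types can be seen side by side for the exponential distribution. This assumes the Statistics Toolbox is installed; the parameter value and the evaluation point are chosen only for illustration.

```matlab
% Sketch: the five function types for the exponential distribution
% Requires the Statistics Toolbox; lambda and x are illustrative values.
lambda = 2;                    % mean parameter
x = 1.5;
f = exppdf(x, lambda);         % probability density at x
F = expcdf(x, lambda);         % cumulative distribution at x
q = expinv(F, lambda);         % inverse cdf: recovers x
r = exprnd(lambda, 5, 1);      % five random draws
[m, v] = expstat(lambda);      % mean and variance as functions of lambda
```

Other distributions follow the same naming pattern, for example normpdf, normcdf, norminv, normrnd, and normstat.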
• Table 6.1 lists some of the standard distributions supported by MATLAB and how to sample random values from them.
• The MATLAB documentation lists many more distributions that can be simulated with MATLAB.
• Using online resources, it is often easy to find support for a number of other common distributions.
Table 6.1: Examples of MATLAB functions for evaluating the probability density, evaluating the cumulative distribution, and drawing random numbers
Distribution            PDF         CDF         Random Number Generation
Normal                  normpdf     normcdf     normrnd
Uniform (continuous)    unifpdf     unifcdf     unifrnd
Beta                    betapdf     betacdf     betarnd
Exponential             exppdf      expcdf      exprnd
Uniform (discrete)      unidpdf     unidcdf     unidrnd
Binomial                binopdf     binocdf     binornd
Multinomial             mnpdf       -           mnrnd
Poisson                 poisspdf    poisscdf    poissrnd
• The Statistics Toolbox also has functions for computing parameter estimates and confidence intervals for these distributions from data.
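For example, normfit returns estimates and 95% confidence intervals for the parameters of a normal distribution. The sketch below uses a simulated sample; the sample size and parameter values are assumptions for illustration.

```matlab
% Sketch: parameter estimates and 95% confidence intervals from data
% Requires the Statistics Toolbox; the sample is simulated for illustration.
y = normrnd(100, 15, 500, 1);                     % simulated sample
[muHat, sigmaHat, muCI, sigmaCI] = normfit(y);    % estimates and 95% intervals
fprintf('mu:    %.2f [%.2f, %.2f]\n', muHat, muCI(1), muCI(2));
fprintf('sigma: %.2f [%.2f, %.2f]\n', sigmaHat, sigmaCI(1), sigmaCI(2));
```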
• As an illustration of some of these functions, we can use the MATLAB code below (Code 6.2) to visualize the Normal(µ, σ) distribution where µ = 100 and σ = 15.
- Let us assume that this distribution represents the observed variability of IQ coefficients in some population.
- The code shows how to display the probability density and the cumulative distribution.
- It also shows how to draw random values from this distribution and how to visualize the distribution of these random samples using the hist function.
Code 6.2: MATLAB code for visualizing the normal distribution
%% Explore the Normal distribution N(mu, sigma)
mu = 100;       % the mean
sigma = 15;     % the standard deviation
xmin = 70;      % minimum x value for pdf and cdf plot
xmax = 130;     % maximum x value for pdf and cdf plot
n = 100;        % number of points on pdf and cdf plot
k = 10000;      % number of random draws for histogram
% create a set of values ranging from xmin to xmax
x = linspace( xmin , xmax , n );
p = normpdf( x , mu , sigma );    % calculate the pdf
c = normcdf( x , mu , sigma );    % calculate the cdf
figure( 4 ); clf;                 % create a new figure and clear the contents
subplot( 1,3,1 );
plot( x , p , 'k' );
xlabel( 'x' ); ylabel( 'pdf' );
title( 'Probability Density Function' );
subplot( 1,3,2 );
plot( x , c , 'k' );
xlabel( 'x' ); ylabel( 'cdf' );
title( 'Cumulative Density Function' );
% draw k random numbers from a N(mu, sigma) distribution
y = normrnd( mu , sigma , k , 1 );
subplot( 1,3,3 );
hist( y , 20 );
xlabel( 'x' ); ylabel( 'frequency' );
title( 'Histogram of random values' );
• The code produces the output shown below (Figure 6.3).
Figure 6.3: Illustration of the Normal(µ, σ) distribution where µ = 100 and σ = 15
6.2 Sampling from non-standard distributions
• Here, we want to sample from a distribution that is not one of the standard distributions supported by MATLAB.
• Let $F(x)$, $x \in \mathbb{R}$, denote any cumulative distribution function (cdf), continuous or not. (Properties of F are in appendix 3.)
• Our objective is to generate (simulate) random variables X distributed as F; that is, we want to simulate a random variable X such that $P(X \le x) = F(x)$, $x \in \mathbb{R}$.
• Define the generalized inverse of F, $F^{-1} : [0, 1] \to \mathbb{R}$, via
$F^{-1}(y) = \min\{x : F(x) \ge y\}$, $y \in [0, 1]$.
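For a discrete distribution, the generalized inverse can be evaluated directly from the cumulative sums. The following sketch uses made-up support points and probabilities; the names xs, ps, and Finv are assumptions for illustration.

```matlab
% Sketch: generalized inverse min{x : F(x) >= y} for a discrete cdf
% Assumption: the support points and probabilities are made up.
xs = [0 1 2 3];                     % support points, in increasing order
ps = [0.1 0.4 0.3 0.2];             % probabilities P(X = xs(i))
F  = cumsum(ps);                    % cdf evaluated at the support points
Finv = @(y) xs(find(F >= y, 1));    % smallest support point with F(x) >= y
a = Finv(0.05);                     % 0, because F(0) = 0.1 >= 0.05
b = Finv(0.30);                     % 1, because F(0) = 0.1 < 0.30 <= F(1)
c = Finv(0.65);                     % 2, because F(1) < 0.65 <= F(2)
```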
6.2.1 Inverse transform sampling
Theorem: (Inverse Transform Method)
• Let $F(x)$, $x \in \mathbb{R}$, be the cumulative distribution function (cdf) of our target variable X (continuous or not). Let $F^{-1}(y)$, $y \in [0, 1]$, be the inverse of this function, assuming that we can actually calculate this inverse. Define $X = F^{-1}(U)$, where $U$ has the continuous uniform distribution over the interval (0, 1). Then X is distributed as F, that is, $P(X \le x) = F(x)$, $x \in \mathbb{R}$.
The proof of the theorem is in appendix 4.
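For orientation, the core of the argument can be sketched in one line. It relies on a standard property of the generalized inverse, stated here without proof: $F^{-1}(u) \le x$ holds exactly when $u \le F(x)$.

```latex
P(X \le x) = P\bigl(F^{-1}(U) \le x\bigr) = P\bigl(U \le F(x)\bigr) = F(x),
\qquad x \in \mathbb{R}.
```

The last equality holds because U is uniform on (0, 1), so $P(U \le t) = t$ for any $t \in [0, 1]$.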
• Therefore, in order to generate a random variable $X \sim F$, we can generate U according to U(0, 1) and then make the transformation $X = F^{-1}(U)$.
• The algorithm is
1. Draw U ∼ Uniform(0, 1)
2. Set $X = F^{-1}(U)$
Example 1: Exponential, $f(x) = \frac{1}{\lambda} e^{-x/\lambda}$, $x \ge 0$
• Suppose we want to sample random numbers from the exponential distribution. When $\lambda > 0$, the cumulative distribution function is $F(x \mid \lambda) = 1 - \exp(-x/\lambda)$. Using some simple algebra, one can find the inverse of this function, which is $F^{-1}(u \mid \lambda) = -\lambda \log(1 - u)$.
• This leads to the following procedure for sampling random numbers from an Exponential(λ) distribution:
1. Draw $U \sim \mathrm{Uniform}(0, 1)$
2. Set $X = -\lambda \log(1 - U)$
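The algebra behind the inverse can be spelled out as follows, for $u \in (0, 1)$; this is the standard derivation, written out step by step.

```latex
\begin{align*}
u &= F(x \mid \lambda) = 1 - e^{-x/\lambda} \\
e^{-x/\lambda} &= 1 - u \\
-\frac{x}{\lambda} &= \log(1 - u) \\
x &= -\lambda \log(1 - u) = F^{-1}(u \mid \lambda)
\end{align*}
```

Because $1 - U$ is also Uniform(0, 1) when $U$ is, setting $X = -\lambda \log(U)$ gives the same distribution.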
Code 6.4: MATLAB code for inverse transform sampling from Exponential(λ = 2)
seed = 12; rand('state',seed);
r = 1000;                      % let's take r samples
u = unifrnd(0,1,r,1);          % Uniform(0,1); can also use rand
figure(5);
hist(u)                        % this generates a fairly uniform histogram
Figure 6.5: Uniform distribution.
% let lambda = 2
x = -log(1-u)*2;               % inverse cdf
% Alternative approach for the inverse cdf
x2 = icdf('Exponential',u,2);
cd = cdf('Exponential',x2,2);
% Plotting histograms
figure(6); clf
subplot( 1,3,1 );
hist(x);
xlabel( 'x' ); ylabel( 'frequency' );
title('Histogram of random values EDF');
% Actual distribution
Z = exprnd(2,r,1);
subplot( 1,3,2 );
hist( Z , 20 );
xlabel( 'x' ); ylabel( 'frequency' );
title('Histogram of random values CDF');
% Plot CDF and EDF on the same graph
subplot( 1,3,3 );
tt = sort(x);                  % sort the random sample
mm = (1:r)/r;                  % empirical cdf value at each sorted sample
g = linspace(0,20);
w = expcdf(g,2);
plot(g,w,'k',tt,mm,'r');
legend({'CDF', 'EDF'})
xlabel( 'x' );
ylabel( 'Probability' );
title( 'CDF vs. EDF of random values' );
Figure 6.6: Distribution graphs for the exponential(2)
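Example 2: Binomial random variables with n = 9, p = 0.60
• $P(X = j) = \binom{n}{j} p^{j} q^{n-j} = r_j$, $j = 0, 1, 2, \ldots, n$, where $q = 1 - p$.
• The probabilities satisfy the recursion $r_j = r_{j-1} \, \frac{(n-j+1)\,p}{j\,(1-p)}$, $j = 1, \ldots, n$, with $r_0 = q^n$. X is then generated by inverse transform sampling with a linear search through the cumulative sums. In MATLAB, where indices start at 1, r(j+1) stores $r_j$, and we have the following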
Code 6.7: Generating Binomial RVs
n = 9; p = 0.60; q = 1-p;
r = zeros(1,n+1); r(1) = q^n;     % r(1) = P(X = 0)
js = 1:n+1;
for j = 1:n, r(j+1) = r(j)*p*(n-j+1)/(j*q); end   % r(j+1) = P(X = j)
F = cumsum(r);                    % cumulative sum of r
K = 10000; X = zeros(1,K);
for k = 1:K, X(k) = min(js(F>=rand))-1; end        % linear search through F
figure(7);
subplot(1,2,1),
hist(X), xlabel('x'), ylabel('Frequency')
title('Histogram of x')
subplot(1,2,2),
plot(sort(X),(1:K)/K), xlabel('x'), ylabel('Probability')
title('EDF of x')
% Alternative: generate X as the number of successes in n Bernoulli(p) trials
for k = 1:K, X(k) = sum(rand(1,n)<p); end
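As a sanity check on the recursion, the probabilities can be compared with the toolbox function binopdf. This check is an addition for illustration and assumes the Statistics Toolbox is available; the name maxErr is hypothetical.

```matlab
% Sketch: check the recursive binomial probabilities against binopdf
% Requires the Statistics Toolbox for binopdf.
n = 9; p = 0.60; q = 1 - p;
r = zeros(1, n+1);
r(1) = q^n;                                   % P(X = 0)
for j = 1:n
    r(j+1) = r(j) * p * (n-j+1) / (j*q);      % P(X = j) from P(X = j-1)
end
maxErr = max(abs(r - binopdf(0:n, n, p)));    % largest absolute difference
disp(maxErr);                                 % expected to be near machine precision
disp(sum(r));                                 % probabilities sum to 1, up to rounding
```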
Exercise 1 (Part I)
1. Adapt the MATLAB program in Code 6.2 to illustrate the Beta(α, β) distribution where α = 2 and β = 3. Similarly, show the Exponential(λ) distribution where λ = 2.
2. Adapt the MATLAB program above to illustrate the Binomial(N, θ) distribution where N = 10 and θ = 0.7.
3. Write a demonstration program to sample 10 values from a Bernoulli(θ) distribution with θ = 0.3. Recall that there are only two possible outcomes, 0 and 1. With probability θ, the outcome is 1, and with probability 1 − θ, the outcome is 0. In other words, p(X = 1) = θ and p(X = 0) = 1 − θ. In MATLAB, you can simulate the Bernoulli distribution using the binomial distribution with N = 1. However, for the purpose of this exercise, please write code that randomly samples Bernoulli distributed values without using the built-in binomial distribution. Using a seed of 21, verify that the first four observations are 1, 1, 0, and 1, respectively. Your MATLAB code could use the following line for the seeding:
>> seed=21; rand('state',seed); randn('state',seed);
Exercise 2 (Part II)
1. In this exercise, we want to generate the random variable x = (x1, x2) from Beta(1, β) by the inverse transform method.
a) Write down the algorithm for generating x.
b) Use MATLAB to construct two sets of N = 1000 observations each from Beta(1, 4). Set seed = 131. Plot histograms of both sets on the same graph and print out the first 5 observations from each set. Observe that the first observations are x1 = 0.0251 and x2 = 0.0348.
c) Find the mean of each set generated.
7. References
1. MathWorks, www.mathworks.com
2. Wendy L. Martinez and Angel R. Martinez, Computational Statistics Handbook with MATLAB.