PowerPoint to accompany Introduction to MATLAB 7 for Engineers William J. Palm III Chapter 7 Probability and Statistics Copyright © 2005. The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Agenda Histograms – distribution of data Probability Expected Value and Variance Random number generation Normal Distribution Curve and its meaning Intro to Monte Carlo Methods An example of a histogram: test scores for 20 students. Figure 7.1–1 number in each bin Bins 7-2 "Absolute Frequency" Histogram Plot -- shows number of outcomes in ea. bin % Thread breaking strength data for 20 tests. y = [92,94,93,96,93,94,95,96,91,93,95,95,95,92,93,94, 91,94,92,93]; % six possible outcomes = 91,92,93,94,95,96. x = [91:96]; hist(y,x),axis([90 97 0 6]) ylabel('Frequency') xlabel('Thread Strength (N)') title('Absolute Frequency Histogram for 20 Tests') This creates the next figure. 7-3 Histograms for 20 tests of thread strength. Figure 7.1–2 7-4 Absolute frequency histogram for 100 thread tests. Figure 7.1–3. This was created by the program on page 421. 7-5 Variations of hist() hist(y) plots data vector y in histogram with 10 bins hist(y,n) plots data vector y in histogram with n bins hist(y,x) plots data vector y in histogram with bins determined by x vector. hist( ) can also just sort your data into bins For further processing tests = 100; y = [91*ones(1,13),92*ones(1,15),93*ones(1,22), 94*ones(1,19),95*ones(1,17),96*ones(1,14)]; x = [91:96]; sort data into bins [z,x] = hist(y,x); bar(x,z/tests) ylabel('Relative Frequency') xlabel('Thread Strength (N)') title('Relative Frequency Histogram for 100 Tests') This also creates the previous figure. 7-8 Histogram functions Table 7.1–1 7-9 Command Description bar(x,y) Creates a bar chart of y versus x. hist(y) Aggregates the data in the vector y into 10 bins evenly spaced between the minimum and maximum values in y. hist(y,n) Aggregates the data in the vector y into n bins evenly spaced between the minimum and maximum values in y. hist(y,x) Aggregates the data in the vector y into bins whose center locations are specified by the vector x. The bin widths are the distances between the centers. [z,x] = hist(y) Same as hist(y) but returns two vectors z and x that contain the frequency count and the bin locations. [z,x] = hist(y,n) Same as hist(y,n) but returns two vectors z and x that contain the frequency count and the bin locations. [z,x] = hist(y,x) Same as hist(y,x) but returns two vectors z and x that contain the frequency count and the bin locations. The returned vector x is the same as the user-supplied vector x. Normal Distribution Curve The basic shape of the normal distribution curve. Figure 7.2–3 7-16 More? See pages 429 – 431. The effect on the normal distribution curve of increasing σ. For this case μ = 10, and the three curves correspond to σ = 1, σ = 2, and σ = 3. Figure 7.2–4 7-17 Probability interpretation of the μ ± σ limits. Figure 7.2–5 7-18 Probability interpretation of the μ ± 2σ limits. Figure 7.2–5 (continued) 7-19 More? See pages 431-432. Error Function The probability that the random variable x is no less than a and no greater than b is written as P(a x b). It can be computed as follows: P(a x b) = erf b - m s 2 2 1 a-m - erf s 2 (7.2-5) in matlab the function erf() can be used for the above calculation More? See pages 434 -435. Random Number Generation Two basic PDFs are used throughout technical computing world: Normal -- gaussian "bell curve" with mean and standard deviation as discussed Uniform – flat distribution, all values equally likely to occur Sums and Differences of Random Variables It can be proved that the mean of the sum (or difference) of two independent normally 2distributed random variables 2 equals the sum (or difference) of their means, but the variance is always the sum of the two variances. That is, if x and y are normally distributed with means mx and my, and variances s x and s y, and if u = x + y and u = x - y, then mu = mx + my (7.2–6) mu = mx - my (7.2–7) s 7-21 2 u= 2 2 2 su=sx+sy (7.2–8) Random number functions Table 7.3–1 Command Description rand Generates a single uniformly distributed random number between 0 and 1. rand(n) Generates an n n matrix containing uniformly distributed random numbers between 0 and 1. rand(m,n) Generates an m n matrix containing uniformly distributed random numbers between 0 and 1. s = rand(’state’) Returns a 35-element vector s containing the current state of the uniformly distributed generator. rand(’state’,s) Sets the state of the uniformly distributed generator to s. rand(’state’,0) Resets the uniformly distributed generator to its initial state. rand(’state’,j) Resets the uniformly distributed generator to state j, for integer j. rand(’state’,sum(100*clock)) Resets the uniformly distributed generator to a different state each time it is executed. 7-22 Table 7.3–1 (continued) randn Generates a single normally distributed random number having a mean of 0 and a standard deviation of 1. randn(n) Generates an n n matrix containing normally distributed random numbers having a mean of 0 and a standard deviation of 1. randn(m,n) Generates an m n matrix containing normally distributed random numbers having a mean of 0 and a standard deviation of 1. Like rand(’state’) but for the normally distributed generator. Like rand(’state’,s) but for the normally distributed generator. Like rand(’state’,0) but for the normally distributed generator. Like rand(’state’,j) but for the normally distributed generator. Like rand(’state’,sum(100*clock)) but for the normally distributed generator. Generates a random permutation of the integers from 1 to n. s = randn(’state’) randn(’state’,s) randn(’state’,0) randn(’state’,j) randn(’state’,sum(100*clock)) randperm(n) 7-23 Repeating Randomness (useful for simulation) The following session shows how to obtain the same sequence every time rand is called. >>rand('state',0) >>rand ans = 0.9501 >>rand ans = 0.2311 >>rand('state',0) >>rand ans = 0.9501 >>rand 7-24 ans = 0.2311 You need not start with the initial state in order to generate the same sequence. To show this, continue the above session as follows. >>s = rand('state'); >>rand('state',s) >>rand ans = 0.6068 >>rand('state',s) >>rand ans = 0.6068 7-25 Generating a Uniform Distribution The general formula for generating a uniformly distributed random number y in the interval [a, b] is y = (b - a) x + a (7.3–1) where x is a random number uniformly distributed in the interval [0, 1]. For example, to generate a vector y containing 1000 uniformly distributed random numbers in the interval [2, 10], you type y = 8*rand(1,1000) + 2. 7-26 Generating a Normal Distribution If x is a random number with a mean of 0 and a standard deviation of 1, use the following equation to generate a new random number y having a standard deviation of s and a mean of m. y=sx+m (7.3–2) For example, to generate a vector y containing 2000 random numbers normally distributed with a mean of 5 and a standard deviation of 3, you type y = 3*randn(1,2000) + 5. 7-28 Statistical analysis and manufacturing tolerances: Example 7.3-2. Dimensions of a triangular cut. Figure 7.3–2 7-30 Scaled histogram of the angle q. Figure 7.3–3 7-31 More? See pages 443 – 444. Grading on the (Bell) curve What is it? Here’s the way I understand the “bell curve”: make the mean a C, then the mean plus/minus a half standard deviation would be the C-/C/C+ scores, one more standard deviation out would give the B’s and D’s, and the tails would give the A’s and F’s. This could be tweaked in any number of ways—change the mean, fatten or slim the distribution. I don’t know if this is used by any professors anymore (in small classes, at least). Pros: grades end up with a very predictable distribution Cons: ruthless, students competing against classmates Use when: for standardized tests in which only a certain number of students can pass, for large classes or multiple sections when there must be a fixed distribution