ENGR-25_Lec-18_Statistics

Engr/Math/Physics 25
Chp7
Statistics-1
Bruce Mayer, PE
Licensed Electrical & Mechanical Engineer
BMayer@ChabotCollege.edu
Engineering/Math/Physics 25: Computational Methods
1
Bruce Mayer, PE
BMayer@ChabotCollege.edu • ENGR-25_Lec-19_Statistics-1.ppt
Learning Goals
▪ Use MATLAB to solve Problems in
• Statistics
• Probability
▪ Use Monte Carlo (random) Methods to Simulate Random processes
▪ Properly Apply Interpolation or Extrapolation to Estimate values between or outside of known data points

Histogram
▪ Histograms are COLUMN Plots that show the Distribution of Data
• Height Represents Data Frequency
▪ Some General Characteristics
• Used to represent continuous grouped, or BINNED, data
– BIN ≡ SubRange within the Data
• Usually Does not have any gaps between bars
• Areas represent %-of-Total Data

HistoGram ≡ Frequency Chart
▪ A HistoGram shows how OFTEN some event Occurs
• Histograms are often constructed using Frequency Tables

Histograms In MATLAB
▪ MATLAB has 6 Forms of the Histogram Cmd
▪ The Simplest
hist(y)
• Generates a Histogram with 10 bins
▪ Example: Max Temp at Oakland AirPort in Jul-Aug08
TmaxOAK = [70, 75, 63, 64, 65, 66, ...
65, 65, 67, 78, 75, 73, 79, ...
71, 72, 67, 69, 69, 70, 74, ...
71, 72, 71, 74, 77, 77, 86, ...
90, 90, 70, 71, 66, 66, 72, ...
68, 73, 72, 82, 91, 82, 76, ...
75, 72, 72, 69, 70, 68, 65, ...
67, 65, 63, 64, 72, 70, 68, ...
71, 77, 65, 63, 69, 69, 67];
▪ The Plot Statement
hist(TmaxOAK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland Airport - Jul-Aug08')

hist Result for Oakland
[Figure: 10-bin histogram titled “Oakland Airport - Jul-Aug08”; x-axis: Max. Temp (°F), 60 to 95; y-axis: No. Days, 0 to 15]
▪ It was COLD in Summer 08
▪ Bin Width = (91-63)/10 = 2.8 °F

Histograms In MATLAB
▪ Next Example: Max Temp at Stockton AirPort in Jul-Aug08
hist(y)
• Generates a Histogram with 10 bins
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, ...
100, 99, 103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, ...
99, 97, 94, 102, 103, 107, 98, 86, 89, 95, 91, 84, 93, ...
98, 104, 105, 107, 103, 91, 90, 96, 93, 86, 92, 93, 95, ...
95, 86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94];
▪ The Plot Statement
hist(TmaxSTK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton Airport - Jul-Aug08')

hist Result for Stockton
[Figure: 10-bin histogram titled “Stockton Airport - Jul-Aug08”; x-axis: Max. Temp (°F), 80 to 110; y-axis: No. Days, 0 to 16]
▪ It was HOT in Summer 08
▪ Bin Width = (107-81)/10 = 2.6 °F

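The bin-width arithmetic above can be checked numerically. The following is a minimal sketch, assuming the Stockton data from the earlier slide; the variable names Nbins, BinWidth, n and ctr are illustrative choices, not part of the original lecture. Calling hist with output arguments returns the counts and bin centers without drawing a plot.

```matlab
% Stockton max temps (°F), Jul-Aug08, copied from the slide above
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, ...
    100, 99, 103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, ...
    99, 97, 94, 102, 103, 107, 98, 86, 89, 95, 91, 84, 93, ...
    98, 104, 105, 107, 103, 91, 90, 96, 93, 86, 92, 93, 95, ...
    95, 86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94];

Nbins = 10;                                       % default number of bins for hist(y)
BinWidth = (max(TmaxSTK) - min(TmaxSTK))/Nbins;   % (107-81)/10
[n, ctr] = hist(TmaxSTK);                         % counts and bin centers, no plot

fprintf('Bin Width = %.1f degF\n', BinWidth)
disp([ctr' n'])                                   % one row per bin: center, count
```

The first bin center sits half a bin width above the minimum, so the centers run from 82.3 °F upward in steps of 2.6 °F.
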
hist Command Refinements
▪ Adjust The number and width of the bins using
hist(y,N)
hist(y,x)
• Where
– N ≡ an integer specifying the NUMBER of Bins
– x ≡ A vector that Specs CENTERs of the Bins
▪ Consider Summer 08 Max-Temp Data from Oakland and Stockton
▪ Make 2 Histograms
• 17 bins
• 60 °F → 110 °F by 2.5’s

hist Plots ⇒ 17 Bins
>> hist(TmaxSTK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')
>> hist(TmaxOAK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
[Figure: two 17-bin histograms, “Stockton, CA - Jul-Aug08” (x-axis 80 to 110) and “Oakland, CA - Jul-Aug08” (x-axis 60 to 95); x-axis: Max. Temp (°F); y-axis: No. Days, 0 to 10]

hist Plots ⇒ Same Scale
>> x = [60:2.5:110];
>> hist(TmaxSTK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')
>> x = [60:2.5:110];
>> hist(TmaxOAK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
[Figure: two histograms on the same scale, “Stockton, CA - Jul-Aug08” and “Oakland, CA - Jul-Aug08”; x-axis: Max. Temp (°F), 60 to 110; y-axis: No. Days, 0 to 16]

hist Numerical Output
▪ hist can also provide numerical Data about the Histogram
n = hist(y)
• Gives the number of values in each of the (default) 10 Bins
▪ For the Stockton data
>> k = hist(TmaxSTK)
k =
     2     5     1    10    16     7     9     2     7     3
▪ We can also spec the number and/or Width of Bins
>> k13 = hist(TmaxSTK,13)
k13 =
     2     2     4     4     6    10    10     7     5     2     6     2     2
>> k2_5s = hist(TmaxOAK,x)

hist Numerical Output
▪ Bin-Count and Bin-Locations (Frequency Table) for the Oakland Data
>> [u, v] = hist(TmaxOAK,x)
u =
     0     3    11     7    15     9     6     4     1     2     1     0     3     0     0     0     0     0     0     0     0
v =
   60.0000   62.5000   65.0000   67.5000   70.0000   72.5000   75.0000
   77.5000   80.0000   82.5000   85.0000   87.5000   90.0000   92.5000
   95.0000   97.5000  100.0000  102.5000  105.0000  107.5000  110.0000

Histogram Commands - 1
Command      Description
bar(x,y)     Creates a bar chart of y versus x.
hist(y)      Aggregates the data in the vector y into 10 bins evenly spaced between the minimum and maximum values in y.
hist(y,n)    Aggregates the data in the vector y into n bins evenly spaced between the minimum and maximum values in y.
hist(y,x)    Aggregates the data in the vector y into bins whose center locations are specified by the vector x. The bin widths are the distances between the centers.

Histogram Commands - 2
Command              Description
[z,x] = hist(y)      Same as hist(y) but returns two vectors z and x that contain the frequency count and the 10 bin locations.
[z,x] = hist(y,n)    Same as hist(y,n) but returns two vectors z and x that contain the frequency count and the n bin locations.
[z,x] = hist(y,x)    Same as hist(y,x) but returns two vectors z and x that contain the frequency count and the bin locations. The returned vector x is the same as the user-supplied vector x.

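The two-output forms in the tables above can be used to build a frequency table directly. This is a minimal sketch, assuming the Stockton data and the 60:2.5:110 center vector from the earlier slides; the names z17, c17, zx, cx and FreqTable are illustrative only.

```matlab
% Stockton max temps (°F), Jul-Aug08, copied from the earlier slide
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, ...
    100, 99, 103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, ...
    99, 97, 94, 102, 103, 107, 98, 86, 89, 95, 91, 84, 93, ...
    98, 104, 105, 107, 103, 91, 90, 96, 93, 86, 92, 93, 95, ...
    95, 86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94];

x = 60:2.5:110;                    % bin CENTERS, 60 °F to 110 °F by 2.5's
[z17, c17] = hist(TmaxSTK, 17);    % 17 evenly spaced bins between min and max
[zx, cx]   = hist(TmaxSTK, x);     % bins centered on the entries of x

% Frequency table: one row per bin -> [center, count]
FreqTable = [cx(:) zx(:)];
disp(FreqTable(zx > 0, :))         % show only the occupied bins
```

With the n form the centers are computed from the data range (spacing 26/17 °F here), while with the x form the returned centers simply echo the supplied vector.
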
Data Statistics Tool - 1
▪ Make LinePlot of Temp Data for Stockton, CA
▪ Use the Tools Menu to find the Data Statistics Tool

Data Statistics Tool - 2
▪ Use the Tool to Add Plot Lines for
• The Mean
• ±StdDev

Data Statistics Tool - 3
▪ Quite a Nice Tool, Actually
▪ The Result
▪ The Avg Max Temp Was 96.97 °F

Probability
▪ Probability ≡ The LIKELIHOOD that a Specified OutCome Will be Realized
• The “Odds” Run from 0% to 100%
▪ Class Question: What are the Odds of winning the California MEGA-MILLIONS Lottery?

175 711 536 ... EXACTLY???!!!
▪ To Win the MegaMillions Lottery
• Pick five numbers from 1 to 56
• Pick a MEGA number from 1 to 46
▪ The Odds for the 1st ping-pong Ball = 5 out of 56
▪ The Odds for the 2nd ping-pong Ball = 4 out of 55, and so On
▪ The Odds for the MEGA are 1 out of 46

175 711 536 ... Calculated
▪ Calc the OverAll Odds as the PRODUCT of each of the Individual OutComes
$$ \text{Odds} = \left(\frac{5}{56}\cdot\frac{4}{55}\cdot\frac{3}{54}\cdot\frac{2}{53}\cdot\frac{1}{52}\right)\cdot\frac{1}{46} = \frac{5!\,51!}{56!}\cdot\frac{1}{46} = \frac{120}{21{,}085{,}384{,}320} = \frac{1}{175{,}711{,}536} $$
• This is Technically a COMBINATION

175 711 536 ... is a DEAL!
▪ The ORDER in Which the Ping-Pong Balls are Drawn Does NOT affect the Winning Odds
▪ If we Had to Match the Pull-Order:
$$ \text{Odds} = \left(\frac{1}{56}\cdot\frac{1}{55}\cdot\frac{1}{54}\cdot\frac{1}{53}\cdot\frac{1}{52}\right)\cdot\frac{1}{46} = \frac{51!}{56!}\cdot\frac{1}{46} = \frac{1}{21{,}085{,}384{,}320} $$
⇒ 120X the Current Odds
• This is a PERMUTATION

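The combination and permutation counts above can be reproduced in a few lines. This is a minimal sketch using the built-in nchoosek and prod functions; the variable names Nballs, Npick, Nmega, Ncomb and Nperm are illustrative, not from the lecture.

```matlab
Nballs = 56;   % numbers available for the five ping-pong balls
Npick  = 5;    % balls drawn
Nmega  = 46;   % numbers available for the MEGA ball

% ORDER does NOT matter (COMBINATION): 56!/(5!*51!) possible five-number sets
Ncomb = nchoosek(Nballs, Npick)*Nmega;          % 175,711,536

% ORDER matters (PERMUTATION): 56*55*54*53*52 possible ordered draws
Nperm = prod(Nballs-Npick+1 : Nballs)*Nmega;    % 21,085,384,320

fprintf('Combination odds: 1 in %d\n', Ncomb)
fprintf('Permutation odds: 1 in %d\n', Nperm)
fprintf('Ratio = %d (= 5!)\n', Nperm/Ncomb)
```
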
Normal Distribution - 1
▪ Consider Data on the Height of a sample group of 20 year old Men
▪ We can Plot this Frequency Data using bar
>> y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
>> xbins = [64:0.5:75];
>> bar(xbins, y_abs), ylabel('No.'), xlabel('Height (Inches)'), title('Height of 20 Yr-Old Men')
Ht (in)   No.
64        1
64.5      0
65        0
65.5      0
66        2
66.5      4
67        5
67.5      4
68        8
68.5      11
69        12
69.5      10
70        9
70.5      8
71        7
71.5      5
72        4
72.5      4
73        3
73.5      1
74        1
74.5      0
75        1

Normal Distribution - 2
▪ We can also SCALE the Bar/Hist such that the AREA UNDER the CURVE equals 1.00, exactly
▪ The Game Plan for Scaling
• Calc the Height of Each Bar To Get the Total Area = [Bin Width] x [Σ(individual counts)]
• The individual Bar Area = [Bin Width] x [individual count]
• %-Area any one bar → [Bar Areas]/[Total Area]
Ht (in)   No.   Area (BW*No.)   No./TotArea
64        1     0.5             0.0200
64.5      0     0               0.0000
65        0     0               0.0000
65.5      0     0               0.0000
66        2     1               0.0400
66.5      4     2               0.0800
67        5     2.5             0.1000
67.5      4     2               0.0800
68        8     4               0.1600
68.5      11    5.5             0.2200
69        12    6               0.2400
69.5      10    5               0.2000
70        9     4.5             0.1800
70.5      8     4               0.1600
71        7     3.5             0.1400
71.5      5     2.5             0.1000
72        4     2               0.0800
72.5      4     2               0.0800
73        3     1.5             0.0600
73.5      1     0.5             0.0200
74        1     0.5             0.0200
74.5      0     0               0.0000
75        1     0.5             0.0200
Σ               50.0

Normal Distribution - 3
▪ We can Use bar to Plot the Scaled-Area Hist.
>> y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
>> xbins = [64:0.5:75];
>> TotalArea = sum(0.5*y_abs)
>> y_scale = 100*y_abs/TotalArea;
>> bar(xbins, y_scale), ylabel('Fraction (%/inch)'), xlabel('Height (inches)'), title('Height of 20 Yr-Old Men')

Normal Distribution - 4
▪ This is a Good Time for a UNITS Check
• Remember, our GOAL → the Area Under the Curve = 1
▪ Recall From the Plot the UNITS for the y-axis → %/inch (?)
▪ The Units come from these MATLAB Statements
TotalArea = sum(0.5*y_abs)
• The 0.5 is the Bin Width in INCHES
▪ So TotalArea is in inches•No.
▪ Now y_scale
y_scale = 100*y_abs/TotalArea;
• Cont. on Next Slide

Normal Distribution - 5
▪ The Units Analysis for y_scale
y_scale = 100*y_abs/TotalArea;
$$ y\_scale = \frac{100\% \times \text{No.}}{\text{inches} \times \text{No.}} \Rightarrow \frac{\%}{\text{inch}} $$
▪ Recall From MTH1 that for y = f(x) displayed in BAR Form the Area Under the Curve
$$ A_{crv} = \sum \text{Individual Areas} = \sum \left[\text{Hgt} \times \text{BinWidth}\right] = \sum_{x = x_{lo}}^{x_{hi}} y(x)\,\Delta x $$

Normal Distribution - 6
▪ In this Case
• y(x) → y_scale in %/inch
• Δx → Bin Width = 0.5 in inches
▪ Then The Units Analysis for Our “integration”
$$ A_{crv} = \sum_{x = x_{lo}}^{x_{hi}} y(x)\,\Delta x = \sum y\left[\frac{\%}{\text{inch}}\right] \times 0.5\,[\text{inch}] $$
▪ Check the integration
Ht (in)   No.   Area (BW*No.)   No./TotArea   BW*(No./TotArea)
64        1     0.5             0.0200        1.00%
64.5      0     0               0.0000        0.00%
65        0     0               0.0000        0.00%
65.5      0     0               0.0000        0.00%
66        2     1               0.0400        2.00%
66.5      4     2               0.0800        4.00%
67        5     2.5             0.1000        5.00%
67.5      4     2               0.0800        4.00%
68        8     4               0.1600        8.00%
68.5      11    5.5             0.2200        11.00%
69        12    6               0.2400        12.00%
69.5      10    5               0.2000        10.00%
70        9     4.5             0.1800        9.00%
70.5      8     4               0.1600        8.00%
71        7     3.5             0.1400        7.00%   ← Example
71.5      5     2.5             0.1000        5.00%
72        4     2               0.0800        4.00%
72.5      4     2               0.0800        4.00%
73        3     1.5             0.0600        3.00%
73.5      1     0.5             0.0200        1.00%
74        1     0.5             0.0200        1.00%
74.5      0     0               0.0000        0.00%
75        1     0.5             0.0200        1.00%
Σ               50.0                          100.00%

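The “integration” check can also be done at the command line instead of by table. This is a minimal sketch, assuming the height data from the Normal Distribution slides; the names BinWidth, Acrv and A71 are illustrative additions.

```matlab
y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];   % counts per bin
BinWidth = 0.5;                         % inches
xbins = 64:BinWidth:75;                 % bin centers, inches

TotalArea = sum(BinWidth*y_abs);        % No.*inch
y_scale = 100*y_abs/TotalArea;          % %/inch

Acrv = sum(y_scale*BinWidth);           % sum of [Hgt x BinWidth], in %
A71  = y_scale(xbins == 71)*BinWidth;   % area of the 71-inch bar, in %

fprintf('TotalArea = %.1f No.*inch\n', TotalArea)
fprintf('Area under scaled bars = %.2f %%\n', Acrv)
fprintf('71-inch bar = %.2f %% of the total\n', A71)
```
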
Normal Distribution - 7
▪ Example ⇒ 71”
▪ The 71” Bar Area = Hgt•Width:
$$ A_{71,scl} = 14\,\frac{\%}{\text{inch}} \times 0.5\ \text{inches} = 7\%\ \text{(of the total area)} $$
▪ Alternatively from the Absolute values
$$ A_{71,abs} = 7\ \text{No.} \times 0.5\ \text{inches} = 3.5\ \text{No.}\cdot\text{inch} $$
• The Total Abs Area = 50 No.•inch ⇒
$$ \frac{A_{71,abs}}{A_{all,abs}} = \frac{3.5\ \text{No.}\cdot\text{in}}{50\ \text{No.}\cdot\text{in}} = 7\% $$

Probability Distribution Fcn (PDF)
▪ Because the Area Under the Scaled Plot is 1.00, exactly, The FRACTIONAL Area under any bar, or set-of-bars gives the probability that any randomly Selected 20 yr-old man will be that height
▪ e.g., from the Plot we Find
• 67.5 in → 8 %/in
• 68 in → 16 %/in
• 68.5 in → 22 %/in
▪ Summing → 46 %/in
▪ Multiply by the Uniform BinWidth of 0.5 in → 23% of 20 yr-old men are 67.25–68.75 inches tall

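The set-of-bars probability above can be computed with a logical mask over the bin centers. This is a minimal sketch, assuming the same height data; the names inRange and P are illustrative.

```matlab
y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];   % counts per bin
BinWidth = 0.5;                                  % inches
xbins = 64:BinWidth:75;                          % bin centers, inches
y_scale = 100*y_abs/sum(BinWidth*y_abs);         % %/inch

% Select the bars centered at 67.5, 68 and 68.5 in (covering 67.25 to 68.75 in)
inRange = (xbins >= 67.25) & (xbins <= 68.75);
P = sum(y_scale(inRange))*BinWidth;              % percent of the sample

fprintf('Sum of bar heights = %.0f %%/in\n', sum(y_scale(inRange)))
fprintf('P(67.25 in <= Ht <= 68.75 in) = %.0f %%\n', P)
```
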
Random Variable
▪ A random variable x takes on a defined set of values with different probabilities; e.g.,
• If you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with equal probability of one-sixth.
• If you poll people about their voting preferences, the percentage of the sample that responds “Yes on Proposition 101” is also a random variable
– the %-age will be slightly different every time you poll.
▪ Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over (“frequentist” view)

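The “repeat the experiment over and over” idea can be tried with a small Monte Carlo simulation of the die. This is a minimal sketch, not from the lecture; the number of rolls is an arbitrary choice and the printed frequencies change slightly from run to run.

```matlab
Nrolls = 60000;                      % number of simulated rolls (arbitrary choice)
rolls = randi(6, 1, Nrolls);         % uniform random integers 1..6

counts = hist(rolls, 1:6);           % how often each face came up
RelFreq = counts/Nrolls;             % relative frequency estimates p(x)

disp([(1:6)' RelFreq'])              % columns: face, relative frequency
fprintf('Largest deviation from 1/6 = %.4f\n', max(abs(RelFreq - 1/6)))
```

Increasing Nrolls should pull every relative frequency closer to 1/6, which is the frequentist reading of the one-sixth probability.
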
Random variables can be Discrete or Continuous
▪ Discrete random variables have a countable number of outcomes
• Examples: Dead/Alive, Red/Black, Heads/Tails, dice, counts, etc.
▪ Continuous random variables have an infinite continuum of possible values.
• Examples: blood pressure, weight, Air Temperature, the speed of a car, the real numbers from 1 to 6.

Probability Distribution Functions
▪ A Probability Distribution Function (PDF) maps the possible values of x against their respective probabilities of occurrence, p(x)
▪ p(x) is a number from 0 to 1.0, or alternatively, from 0% to 100%.
▪ The area under a probability distribution function curve is always 1 (or 100%).

Discrete Example: Roll The Die
x     p(x)
1     p(x=1) = 1/6
2     p(x=2) = 1/6
3     p(x=3) = 1/6
4     p(x=4) = 1/6
5     p(x=5) = 1/6
6     p(x=6) = 1/6
[Figure: bar plot of p(x) versus x, each bar of height 1/6 at x = 1, 2, 3, 4, 5, 6]
$$ \sum_{\text{all } x} p(x) = 1 $$

Continuous Case
▪ The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1.
▪ The Probabilities associated with continuous functions are just areas under a Region of the curve (→ Definite Integrals)
▪ Probabilities are given for a range of values, rather than a particular value
• e.g., the probability of getting a math SAT score between 700 and 800 is 2%.

Continuous Case PDF Example
▪ Recall the negative exponential function (in probability, this is called an “exponential distribution”):
$$ f(x) = e^{-x} $$
▪ This Function Integrates to 1 from zero to infinity, as required for all PDF’s
$$ \int_{0}^{\infty} e^{-x}\,dx = \left[-e^{-x}\right]_{0}^{\infty} = 0 + 1 = 1 $$

Continuous Case PDF Example
▪ The probability that x is any exact value (e.g.: 1.9976) is 0
• we can ONLY assign Probabilities to possible RANGES of x
[Figure: p(x) = e^{-x} with a single vertical line at one x value; NO Area Under a LINE]
▪ For example, the probability of x falling within 1 to 2:
[Figure: p(x) = e^{-x} with the area between x = 1 and x = 2 shaded]
$$ p(1 \le x \le 2) = \int_{1}^{2} e^{-x}\,dx = \left[-e^{-x}\right]_{1}^{2} = -e^{-2} + e^{-1} = -0.135 + 0.368 = 0.23\ (23\%) $$

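Both definite integrals for the exponential PDF can be checked numerically. This is a minimal sketch, assuming the built-in integral function (available in recent MATLAB releases); the names f, Atot, P12 and P12exact are illustrative.

```matlab
f = @(x) exp(-x);                  % exponential PDF, defined for x >= 0

Atot = integral(f, 0, Inf);        % total area, should be 1
P12  = integral(f, 1, 2);          % probability that x falls within 1 to 2
P12exact = exp(-1) - exp(-2);      % closed-form result from the slide

fprintf('Total area      = %.4f\n', Atot)
fprintf('p(1 <= x <= 2)  = %.4f (numerical)\n', P12)
fprintf('p(1 <= x <= 2)  = %.4f (exact)\n', P12exact)
```
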
Gaussian Curve
▪ The Man-Height HistoGram had some Limited, and thus DISCRETE, Data
▪ If we were to Measure 10,000 (or more) young men we would obtain a HistoGram like this
[Figure: finely binned height histogram approaching a smooth bell-shaped curve]
▪ As We increase the number and fineness of the measurements The PDF approaches a CONTINUOUS Curve

Gaussian Distribution
▪ A Distribution that Describes Many Physical Processes is called the GAUSSIAN or NORMAL Distribution
▪ Gaussian (Normal) distribution
• Gaussian → famous “bell-shaped curve”
– Describes IQ scores, how fast horses can run, the no. of Bees in a hive, wear profile on old stone stairs...
• All these are cases where:
– deviation from mean is equally probable in either direction
– Variable is continuous (or large enough integer to look continuous)

Normal Distribution
▪ Real-valued PDF: f(x) → −∞ < x < +∞
▪ 2 independent fitting parameters: µ, σ (central location and width)
▪ Properties:
• Symmetrical about Mode at µ
• Median = Mean = Mode
• Inflection points at µ ± σ
▪ Area (probability of observing event) within:
• ± 1σ = 0.683
• ± 2σ = 0.955
▪ For larger σ, bell shaped curve becomes wider and lower (since area = 1 for any σ)

Normal Distribution
▪ Mathematically
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2} $$
• Where
– σ² = Variance
– µ = Mean
▪ The Area Under the Curve
$$ \int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}\,dx = 1 $$

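The unit-area property, and the “wider and lower” behavior for larger σ, can be checked numerically. This is a minimal sketch; the values mu = 69 and sigma = 2 are hypothetical, chosen only for illustration, and the integration is cut off at µ ± 10σ, where the neglected tail area is negligible.

```matlab
mu = 69;  sigma = 2;      % HYPOTHETICAL mean and std dev, for illustration only

% Normal PDF written as an anonymous function of x, mean m and std dev s
f = @(x, m, s) exp(-(x - m).^2/(2*s^2))/(s*sqrt(2*pi));

% Area under the curve for sigma and for 2*sigma (limits at +/- 10 std devs)
A1 = integral(@(x) f(x, mu, sigma),   mu - 10*sigma, mu + 10*sigma);
A2 = integral(@(x) f(x, mu, 2*sigma), mu - 20*sigma, mu + 20*sigma);

% Peak height at x = mu: doubling sigma halves the peak
peak1 = f(mu, mu, sigma);
peak2 = f(mu, mu, 2*sigma);

fprintf('Area (sigma)   = %.6f, peak = %.4f\n', A1, peak1)
fprintf('Area (2*sigma) = %.6f, peak = %.4f\n', A2, peak2)
```
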
68-95-99.7 Rule for Normal Dist
[Figure: bell curve with the intervals ±σ, ±2σ and ±3σ about the mean marked]
▪ 68% of the data lies within ±σ
▪ 95% of the data lies within ±2σ
▪ 99.7% of the data lies within ±3σ

68-95-99.7 Rule in Math terms…
▪ Using Definite-Integral Calculus
$$ \int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx = 0.68\ (68\%) $$
$$ \int_{\mu-2\sigma}^{\mu+2\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx = 0.95\ (95\%) $$
$$ \int_{\mu-3\sigma}^{\mu+3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx = 0.997\ (99.7\%) $$

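The three definite integrals can be evaluated numerically as a check on the rule. This is a minimal sketch using the built-in integral function; mu = 0 and sigma = 1 are an assumed choice, and any other µ and σ should give the same three fractions.

```matlab
mu = 0;  sigma = 1;     % assumed standard-normal parameters for the check

% Normal PDF in the same form as the integrands above
f = @(x) exp(-0.5*((x - mu)/sigma).^2)/(sigma*sqrt(2*pi));

P = zeros(1, 3);        % fractions within 1, 2 and 3 sigma
for k = 1:3
    P(k) = integral(f, mu - k*sigma, mu + k*sigma);
    fprintf('Within +/-%d sigma: %.4f\n', k, P(k))
end
```
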
How Good is the Rule for Real?
▪ Check some example data:
▪ The mean, µ, of the weight of a large group of women Cross Country Runners = 127.8 lbs
▪ The standard deviation (σ) for this Group = 15.5 lbs

68% of 120 = .68 x 120 = ~ 82 runners
In fact, 79 runners fall within 1σ (15.5 lbs) of the mean
[Figure: histogram of runner weights; x-axis: POUNDS, 80 to 160; y-axis: Percent, 0 to 25; markers at 112.3, 127.8 and 143.3 lbs]

95% of 120 = .95 x 120 = ~ 114 runners
In fact, 115 runners fall within 2σ of the mean
[Figure: histogram of runner weights; x-axis: POUNDS, 80 to 160; y-axis: Percent, 0 to 25; markers at 96.8, 127.8 and 158.8 lbs]

99.7% of 120 = .997 x 120 = 119.6 runners
In fact, all 120 runners fall within 3σ of the mean
[Figure: histogram of runner weights; x-axis: POUNDS, 80 to 160; y-axis: Percent, 0 to 25; markers at 81.3, 127.8 and 174.3 lbs]

Estimating µ & σ (1)
▪ The Location & Width Parameters, µ & σ, are Calculated from the ENTIRE POPULATION
• Mean, µ
$$ \mu = \sum_{k=1}^{N} x_k \Big/ N $$
• Variance, σ²
$$ \sigma^2 = \sum_{k=1}^{N} (x_k - \mu)^2 \Big/ N $$
• Standard Deviation, σ
$$ \sigma = \sqrt{\sigma^2} $$
▪ For LARGE Populations it is usually impractical to measure all the x_k
▪ In this case we take a Finite SAMPLE to ESTIMATE µ & σ

Estimating µ & σ (2)
▪ Say we want to characterize Miles/Yr driven by Every Licensed Driver in the USA
▪ We assume that this is Normally Distributed, so we take a Sample of N = 1013 Drivers
▪ We Take the Mean of the SAMPLE
$$ \bar{x} = \sum_{k=1}^{N} x_k \Big/ N $$
▪ Use the SAMPLE-Mean to Estimate the POPULATION-Mean
$$ \mu \approx \bar{x} = \sum_{k=1}^{N} x_k \Big/ N $$

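The sampling idea can be tried as a Monte Carlo experiment. This is a minimal sketch under loudly stated assumptions: the population mean and standard deviation below are HYPOTHETICAL numbers invented for illustration (the lecture gives no values), and only the sample size N = 1013 comes from the slide. Results vary from run to run.

```matlab
muPop  = 12000;    % HYPOTHETICAL population mean, miles/yr (not from the lecture)
sigPop = 4000;     % HYPOTHETICAL population std dev, miles/yr (not from the lecture)
N = 1013;          % sample size from the slide

% Draw N random "drivers" from the assumed normal population
x = muPop + sigPop*randn(1, N);

xbar = sum(x)/N;   % SAMPLE mean, used as the estimate of the population mean

fprintf('Population mean = %.0f miles/yr\n', muPop)
fprintf('Sample mean     = %.0f miles/yr\n', xbar)
fprintf('Difference      = %.0f miles/yr\n', xbar - muPop)
```

Rerunning with a larger N should, on average, shrink the difference between the sample mean and the assumed population mean.
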
Estimating µ & σ (3)
▪ Now Calc the SAMPLE Variance & StdDev
$$ S^2 = \frac{\sum_{k=1}^{N} (x_k - \bar{x})^2}{N - 1} $$
• Number decreased from N to (N – 1) To Account for case where N = 1
– In this case x̄ = x_1, and the S² result is meaningless
▪ Estimate σ ≈ S
• standard deviation: positive square root of the variance
– small std dev: observations are clustered tightly around a central value
– large std dev: observations are scattered widely about the mean

Sample Mean and StdDev
For a series of N observations, the most probable estimate of the mean µ is the average x̄ of the observations. We refer to this as the sample mean x̄ to distinguish it from the population mean µ.
$$ \bar{x} = \frac{1}{N} \sum x_i \quad \text{(Sample Mean)} $$
Calculate the Population Variance, σ², from:
$$ \sigma^2 = \frac{1}{N} \sum (x_i - \mu)^2 = \frac{1}{N} \sum x_i^2 - \mu^2 $$
But we cannot know the true population mean µ so the practical estimate for the sample variance and standard deviation would be:
$$ s^2 = \frac{1}{N - 1} \sum (x_i - \bar{x})^2 \quad \text{(Sample Variance)} $$

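The sample formulas above map directly onto MATLAB statements. This is a minimal sketch, assuming the Stockton temperature data from the histogram slides as the sample; it also compares the hand-coded results with the built-in mean, var and std functions, which divide by N − 1 by default.

```matlab
% Stockton max temps (°F), Jul-Aug08, used here as the SAMPLE
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, ...
    100, 99, 103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, ...
    99, 97, 94, 102, 103, 107, 98, 86, 89, 95, 91, 84, 93, ...
    98, 104, 105, 107, 103, 91, 90, 96, 93, 86, 92, 93, 95, ...
    95, 86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94];

N = length(TmaxSTK);
xbar = sum(TmaxSTK)/N;                     % sample mean
S2 = sum((TmaxSTK - xbar).^2)/(N - 1);     % sample variance, N-1 in the denominator
S  = sqrt(S2);                             % sample standard deviation

% Compare against the built-in functions
fprintf('xbar = %.4f, mean() = %.4f\n', xbar, mean(TmaxSTK))
fprintf('S    = %.4f, std()  = %.4f\n', S, std(TmaxSTK))
```

Calling std(TmaxSTK, 1) instead divides by N, which gives the slightly smaller population-style value.
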
Error Function (erf) & Probability
▪ Gauss’s Defining Eqn
$$ \mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-y^2}\,dy $$
▪ This looks a lot Like the normal dist
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2} $$
▪ Consider the Gaussian integral
$$ I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}\,dx $$
▪ Or
$$ I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)^2}\,dx $$
▪ Now Let
$$ y = \frac{x-\mu}{\sigma\sqrt{2}} \;\Rightarrow\; dy = \frac{1}{\sigma\sqrt{2}}\,dx \quad \text{Or} \quad dx = \sigma\sqrt{2}\,dy $$

Error Function (erf) & Probability
▪ Subbing for x & dx
$$ I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)^2}\,dx \;\Rightarrow\; I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-y^2}\,\sigma\sqrt{2}\,dy = \frac{1}{\sqrt{\pi}} \int e^{-y^2}\,dy $$
▪ As
$$ \mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-y^2}\,dy $$
▪ ReArranging
$$ I_G = \frac{1}{2} \left[ \frac{2}{\sqrt{\pi}} \int e^{-y^2}\,dy \right] $$

Error Function (erf) & Probability
▪ Now the Limits
▪ This Fcn is Symmetrical about y = 0
▪ Plotting
$$ f(y) = e^{-y^2} $$
[Figure: plot of f(y) = exp(-y^2); x-axis: y, -3 to 3; y-axis: 0 to 1]
▪ Recall
$$ \mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-y^2}\,dy $$
▪ And the erf properties
• erf(0) = 0
• erf(∞) = 1

Error Function (erf) & Probability
▪ By Symmetry about y = 0 for e^{-y²}
$$ \frac{2}{\sqrt{\pi}} \int_{-\infty}^{0} e^{-y^2}\,dy = \frac{2}{\sqrt{\pi}} \int_{0}^{\infty} e^{-y^2}\,dy = 1 $$
▪ Thus
$$ \frac{2}{\sqrt{\pi}} \int_{-\infty}^{B} e^{-y^2}\,dy = \frac{2}{\sqrt{\pi}} \int_{-\infty}^{0} e^{-y^2}\,dy + \frac{2}{\sqrt{\pi}} \int_{0}^{B} e^{-y^2}\,dy $$
▪ So Finally integrating −∞ to B
$$ \frac{2}{\sqrt{\pi}} \int_{-\infty}^{B} e^{-y^2}\,dy = 1 + \mathrm{erf}(B) $$

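The final result, 1 + erf(B), can be spot-checked against a direct numerical integration. This is a minimal sketch; B = 0.7 is an arbitrary value chosen for illustration, and the names g, lhs and rhs are not from the lecture.

```matlab
B = 0.7;                                % arbitrary upper limit, for illustration only

g = @(y) (2/sqrt(pi))*exp(-y.^2);       % scaled integrand from the slides

lhs = integral(g, -Inf, B);             % numerical integration from -infinity to B
rhs = 1 + erf(B);                       % closed form using the built-in erf

fprintf('Numerical integral = %.8f\n', lhs)
fprintf('1 + erf(B)         = %.8f\n', rhs)
```
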
Error Function (erf) & Probability
▪ Note That for a Continuous PDF
• Probability that x is Less or Equal to b
$$ P(x \le b) = \int_{-\infty}^{b} f(x)\,dx $$
• Probability that x is between a & b
$$ P(a \le x \le b) = \int_{a}^{b} f(x)\,dx $$
▪ The probability for the Normal Dist
$$ P(x \le b) = \int_{-\infty}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}\,dx $$
$$ P(a \le x \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}\,dx $$
▪ But
$$ I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}\,dx = \frac{1}{2}\,\mathrm{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right) $$

Error Function (erf) & Probability
▪ If We Scale this Properly we can Cast these Eqns into the ½erf Form
$$ P(x \le b) = \frac{1}{2} \left[ 1 + \mathrm{erf}\!\left(\frac{b-\mu}{\sigma\sqrt{2}}\right) \right] $$
$$ P(a \le x \le b) = \frac{1}{2} \left[ \mathrm{erf}\!\left(\frac{b-\mu}{\sigma\sqrt{2}}\right) - \mathrm{erf}\!\left(\frac{a-\mu}{\sigma\sqrt{2}}\right) \right] $$
▪ MATLAB has the erf built-in, so if we have the sample Mean & StdDev We can Calc Probabilities for Normally Distributed Quantities

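The two ½erf formulas translate into one-line anonymous functions. This is a minimal sketch, assuming the runner-weight mean and standard deviation quoted earlier in the lecture (127.8 lbs and 15.5 lbs); the function names Pless and Pbetw are illustrative, not MATLAB built-ins.

```matlab
mu = 127.8;  sigma = 15.5;      % runner weights, lbs (values from the earlier slides)

% P(x <= b) and P(a <= x <= b) for a normally distributed quantity
Pless = @(b)    0.5*(1 + erf((b - mu)/(sigma*sqrt(2))));
Pbetw = @(a, b) 0.5*(erf((b - mu)/(sigma*sqrt(2))) - erf((a - mu)/(sigma*sqrt(2))));

P1 = Pbetw(mu - sigma, mu + sigma);       % within 1 sigma: 112.3 to 143.3 lbs
P2 = Pbetw(mu - 2*sigma, mu + 2*sigma);   % within 2 sigma: 96.8 to 158.8 lbs
Phalf = Pless(mu);                        % half of the runners weigh less than the mean

fprintf('P(within 1 sigma) = %.4f -> %.1f of 120 runners\n', P1, 120*P1)
fprintf('P(within 2 sigma) = %.4f -> %.1f of 120 runners\n', P2, 120*P2)
fprintf('P(x <= mu)        = %.4f\n', Phalf)
```
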
All Done for Today
Gaussian? Or Normal?
▪ Recall De Moivre’s Theorem
$$ z = R\cos\theta + jR\sin\theta \;\Rightarrow\; z^k = R^k\left(\cos k\theta + j\sin k\theta\right) $$
▪ Normal distribution was introduced by French mathematician A. De Moivre in 1733.
• Used to approximate probabilities of coin tossing
• Called it the exponential bell-shaped curve
▪ In 1809, K.F. Gauss, a German mathematician, applied it to predict astronomical entities… it became known as the Gaussian distribution.
▪ In the late 1800s, most believed that the majority of physical data would follow the distribution ⇒ called the normal distribution

Engr/Math/Physics 25
Appendix

Basic Fitting Demo File
% Bruce Mayer, PE
% ENGR25 * 11Apr10
% file = Demo_Basic_Fitting_Stockton_Temps_1004.m
%
% Stockton, CA daily max temps (°F) for Jul-Aug08
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, 100, 99, ...
103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, 99, 97, 94, ...
102, 103, 107, 98, 86, 89, 95, 91, 84, 93, 98, 104, 105, ...
107, 103, 91, 90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, ...
97, 96, 97, 101, 92, 89, 92, 93, 94]
Ntot = length(TmaxSTK)     % number of days of data
nday = [1:Ntot];           % day index for the x-axis
% line plot with black diamond markers
plot(nday, TmaxSTK, '-dk'), xlabel('No. Days after 30Jun08'), ylabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')

Carl Friedrich Gauss
Normal Dist Data
[Table: Ht (in), No., Area (BW*No.), No./TotArea and BW*(No./TotArea) for the 23 height bins from 64 in to 75 in; column totals 50.0 and 100.00%; same values as the “Check the integration” table]

SPICE Circuit