Engineering statistics EQT 271

Khatijahhusna Abd Rani
School Of Electrical System Engineering (PPKSE)
Semester II 2014/2015
Slides were prepared by Miss Syafawati (with modifications)
 Chapter 1: Basic Statistics
 Chapter 2: Statistical Inference
 Chapter 3: Analysis of Variance
 Chapter 4: Introductory Linear Regression
 Chapter 5: Nonparametric Statistics
 Statistics in Engineering
 Collecting Engineering Data
 Data summary & Presentation
 Probability Distribution
 Discrete Probability Distribution
 Continuous Probability Distribution
 Sampling Distributions of the Mean and Proportion
Statistics is the science of conducting studies to collect,
organize, summarize, analyze, and draw conclusions from data.
• To determine the satisfaction of UniMAP students with
university facilities
• To investigate the effects of mobile base station
exposure on the body temperature of children in
Perlis
• Gather information from data and make conclusions and recommendations.
• A collection of numerical information is called statistics.
 Population:
Consists of all subjects (human or otherwise) that are
being studied.
Example 1: Undergraduate students in UniMAP
Example 2: Children in Perlis
 Sample:
A group of subjects selected from a population
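[Figure: a sample shown as a subset of the population]
• Variable:
Characteristics or attributes that can assume different
values.
A variable is either qualitative or quantitative; a quantitative variable is either discrete or continuous.
Example 2: Body temperature, gender, age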
 Observation:
Value of variable for an element
 Data set:
A collection of observations on one or more variables.
Examples of measurement scales
Nominal: zip code, gender, eye color (blue, brown, green), nationality
Ordinal: grade (A, B, C, D), rating scale (poor, good, excellent)
Interval: temperature, IQ
Ratio: height, weight, age, time, salary
 Direct observation
The simplest method of obtaining data.
Advantage: relatively inexpensive
Disadvantage: difficult to produce useful information
since it does not consider all aspects regarding the
issues.
 Experiments
A more expensive method, but a better way to produce data.
Data produced this way are called experimental data.
 Surveys
The most familiar method of data collection.
Its usefulness depends on the response rate.
 Personal Interview
Advantages: higher expected response rate and fewer incorrect responses.
Disadvantages: high cost and the need for highly trained interviewers.
Grouped Data
Data that have been organized into groups (into a frequency
distribution). Used when the range of the data is large.

Weight    Frequency
60-62     5
63-65     18
66-68     42
69-71     27
72-74     8
Total     100

Ungrouped Data
Data that have not been organized into groups. Also called raw
data.

• Data presentation depends on the type (nature) of the data: whether the data
are qualitative (such as gender and ethnic group) or
quantitative (such as income and CGPA).
Data Presentation of Qualitative Data
• Tabular presentation of qualitative data is usually in the form
of a frequency table, that is, a table showing the number of
times each observation occurs in the data.
*Qualitative: the characteristic being studied is nonnumeric.
Examples: gender, religious affiliation, or eye color.
 The most popular charts for qualitative data are:
1. bar chart/column chart;
2. pie chart; and
3. line chart.
Table 1: Frequency table

Observations   Frequency
Malay          33
Chinese        9
Indian         6
Others         2

• Bar chart: used to display the frequency distribution in graphical
form. Frequencies are shown on the Y-axis and the ethnic groups are
shown on the X-axis.
• Pie chart: used to display the frequency distribution. It displays the
proportion of each observation.
[Figure: pie chart of the frequencies in Table 1 (Malay 33, Chinese 9, Indian 6, Others 2)]
 Line chart: used to display the trend of observations. It is a very
popular display for the data which represent time.
Month   Value
Jan     10
Feb     7
Mar     5
Apr     10
May     39
Jun     7
Jul     260
Aug     316
Sep     142
Oct     11
Nov     4
Dec     9
Data Presentation Of Quantitative Data
• Tabular presentation of quantitative data is usually in the form of a
frequency distribution, that is, a table showing the frequency of the
observations that fall inside specific classes (intervals).
*Quantitative: the variable studied is numerical. Examples: balances in accounts, ages of students, the life of an
automobile battery (such as 42 months).
• Frequency distribution: a grouping of data into mutually exclusive
classes showing the number of observations in each class.
• There are a few graphs available for the graphical
presentation of quantitative data.
The most popular graphs are:
1. histogram;
2. frequency polygon; and
3. ogive.
Table 1: Frequency Distribution

Weight    Frequency
60-62     5
63-65     18
66-68     42
69-71     27
72-74     8

• Histogram: looks like the bar chart except that the horizontal
axis represents data that are quantitative in nature. There is no gap
between the bars.
• Frequency polygon: looks like the line chart except that the horizontal
axis represents the class marks of the data, which are quantitative in nature.
• Ogive: a line graph in which the horizontal axis represents the upper limits of the
class intervals and the vertical axis represents the cumulative frequencies.
Constructing Frequency Distribution
• When summarizing large quantities of raw data, it is often useful to distribute the
data into classes. Table 1.1 shows the classes for the students' weights.

Table 1.1: Weight of 100 male students in XYZ university

Weight    Frequency
60-62     5
63-65     18
66-68     42
69-71     27
72-74     8
Total     100

• A frequency distribution for quantitative data lists all the classes and the number of
values that belong to each class.
• Data presented in the form of a frequency distribution are called grouped data.
• For quantitative data, a class is an interval that includes all the values that fall between two
numbers, the lower and upper class limits.
• The classes are listed in the first column of the frequency distribution table.
*Classes always represent a variable and are non-overlapping; each value belongs to one
and only one class.
• The numbers listed in the second column are called frequencies, which give the
number of values that belong to each class. Frequencies are denoted by f.
Table 1.2: Weight of 100 male students in XYZ university

Weight    Frequency
60-62     5
63-65     18
66-68     42
69-71     27
72-74     8
Total     100

Callouts on the table: the first column lists the variable (weight) by class interval, and the second column is the frequency column. The third class (interval class) is 66-68, and the frequency of the third class is 42. The lower limit of the fifth class is 72.
 The class boundary is given by the midpoint of the upper
limit of one class and the lower limit of the next class. (no gap)
 The difference between the two boundaries of a class gives the class
width; also called class size.
Formula:
- Class midpoint or mark
Class midpoint or mark = (Lower limit + Upper limit)/2
- Finding the number of classes
Number of classes, c = 1 + 3.3 log n
- Finding the class width for an interval class
Class width, i = (Largest value - Smallest value)/Number of classes
* Any convenient number that is equal to or less than the smallest value in the data
set can be used as the lower limit of the first class.
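The formulas for the number of classes and the class width can be tried out in a few lines of Python. This is only a sketch: the data set is hypothetical (not from the slides), the function names are made up for illustration, log is taken as base 10, and rounding both the number of classes and the width up to the next integer is an assumption, since conventions differ.

```python
import math

def sturges_classes(n):
    """Suggested number of classes from c = 1 + 3.3 log10(n)."""
    return 1 + 3.3 * math.log10(n)

def frequency_table(values, lower_limit, width, num_classes):
    """Count integer data into classes [lower, lower + width - 1]."""
    table = []
    for k in range(num_classes):
        lower = lower_limit + k * width
        upper = lower + width - 1
        freq = sum(1 for v in values if lower <= v <= upper)
        table.append((lower, upper, freq))
    return table

# Hypothetical integer data, not taken from the slides.
data = [12, 15, 11, 19, 22, 14, 17, 25, 13, 18, 21, 16]

# Rounding up is an assumption; some texts round to the nearest integer.
num_classes = math.ceil(sturges_classes(len(data)))
width = math.ceil((max(data) - min(data)) / num_classes)

# The smallest value is used as the lower limit of the first class.
table = frequency_table(data, min(data), width, num_classes)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

Choosing the smallest value as the first lower limit is one convenient option; any number at or below it would also satisfy the rule stated above.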
Example 1.9:
From Table 1.1: Class Boundary

Weight (Class Interval)   Class Boundary   Frequency
60-62                     59.5-62.5        5
63-65                     62.5-65.5        18
66-68                     65.5-68.5        42
69-71                     68.5-71.5        27
72-74                     71.5-74.5        8
Total                                      100
Cumulative Frequency Distributions
• A cumulative frequency distribution gives the total number of values that fall
below the upper boundary of each class.
• In a cumulative frequency distribution table, each class has the same lower limit
but a different upper limit.

Table 1.3: Class Limit, Class Boundaries, Class Width, Cumulative Frequency

Weight (Class Interval)   Number of Students, f   Class Boundaries   Cumulative Frequency
60-62                     5                       59.5-62.5          5
63-65                     18                      62.5-65.5          5 + 18 = 23
66-68                     42                      65.5-68.5          23 + 42 = 65
69-71                     27                      68.5-71.5          65 + 27 = 92
72-74                     8                       71.5-74.5          92 + 8 = 100
Total                     100
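The class boundaries, class widths, and cumulative frequencies for the weight data can be reproduced with a short Python sketch. The variable names are illustrative only; the class limits and frequencies are the ones in Table 1.1.

```python
from itertools import accumulate

# Class limits and frequencies from Table 1.1.
class_limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
frequencies = [5, 18, 42, 27, 8]

# Gap between the upper limit of one class and the lower limit of the next.
gap = class_limits[1][0] - class_limits[0][1]

# Each boundary is the midpoint of that gap, so the classes touch with no gap.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in class_limits]

# Class width is the difference between the two boundaries of a class.
widths = [upper - lower for lower, upper in boundaries]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

for (lo, hi), f, (bl, bu), F in zip(class_limits, frequencies, boundaries, cumulative):
    print(f"{lo}-{hi}  f={f:2d}  boundaries={bl}-{bu}  cumulative={F}")
```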
How to construct a histogram?
• Prepare the frequency distribution table:
1. Find the minimum and maximum values.
2. Decide the number of classes to include in the frequency
distribution table.
- Usually 5-20 classes. With too few classes, the pattern may not be visible.
OR
- Sturges' rule: number of classes = 1 + 3.3 log n
3. Determine the class width, i = (max - min)/number of classes.
4. Determine the class limits.
5. Find the class boundaries and class midpoints.
6. Count the frequency for each class.
• Draw the histogram.
Exercise 1.1 :
The data below represent the waiting time (in
minutes) taken by 30 customers at one local bank.
25 31 20 30 22 32 37 28
29 23 35 25 29 35 29 27
23 32 31 32 24 35 21 35
35 22 33 24 39 43
 Construct a frequency distribution and cumulative
frequency distribution table.
 Construct a histogram.
Data Summary
Summary statistics are used to summarize a set of observations.
Two basic summary statistics are measures of central tendency and measures of
dispersion.
Measures of Central Tendency
 Mean
 Median
 Mode
Measures of Dispersion
 Range
 Variance
 Standard deviation
Measures of Position
 Z scores
 Percentiles
 Quartiles
 Outliers
Measures of Central Tendency
 Mean
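Mean of a sample is the sum of the sample data divided by the
total number of observations in the sample.
Mean for ungrouped data is given by:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}, \quad \text{for } i = 1, 2, \ldots, n \quad \text{or} \quad \bar{x} = \frac{\sum x}{n} \]
Mean for grouped data is given by:
\[ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \quad \text{or} \quad \bar{x} = \frac{\sum fx}{\sum f} \]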
Example 1.11 (Ungrouped data):
Mean for the set of data 3,5,2,6,5,9,5,2,8,6
Solution :
\[ \bar{x} = \frac{3+5+2+6+5+9+5+2+8+6}{10} = 5.1 \]
Example 1.12 (Grouped Data):
Use the frequency distribution of the weights of 100 male
students in XYZ university to find the mean.

Weight    Frequency
60-62     5
63-65     18
66-68     42
69-71     27
72-74     8

Solution :

Weight (Class Interval)   Frequency, f   Class Mark, x   fx
60-62                     5              61              305
63-65                     18             64              1152
66-68                     42             67              2814
69-71                     27             70              1890
72-74                     8              73              584
Total                     100                            6745

\[ \bar{x} = \frac{\sum fx}{\sum f} = ? \]
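Both mean formulas can be checked with a small Python sketch. The function names are hypothetical; the data are those of Example 1.11 and the class marks and frequencies of Example 1.12.

```python
def mean_ungrouped(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def mean_grouped(class_marks, frequencies):
    """Sum of f*x divided by the sum of f, using class marks as x."""
    total_fx = sum(f * x for f, x in zip(frequencies, class_marks))
    return total_fx / sum(frequencies)

# Example 1.11 (ungrouped data).
ungrouped = mean_ungrouped([3, 5, 2, 6, 5, 9, 5, 2, 8, 6])

# Example 1.12 (grouped data): class marks and frequencies of the weight table.
grouped = mean_grouped([61, 64, 67, 70, 73], [5, 18, 42, 27, 8])

print(ungrouped)  # 51 / 10
print(grouped)    # 6745 / 100
```

The grouped mean is an approximation of the mean of the raw weights, because every value in a class is represented by its class mark.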
 Median of ungrouped data: The median depends on the
number of observations in the data, n. If n is odd, then the
median is the ((n+1)/2)th observation of the ordered observations.
But if n is even, then the median is the arithmetic mean of the
(n/2)th observation and the (n/2 + 1)th observation.
• Median of grouped data:
\[ \tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right) \]
where
L = the lower class boundary of the median class
c = the size of the median class interval
\(F_{j-1}\) = the sum of frequencies of all classes lower than the median class
\(f_j\) = the frequency of the median class
Example 1.13 (Ungrouped data):
n is odd
The median for data 4,6,3,1,2,5,7 (n=7)
Rearrange the data : 1,2,3,4,5,6,7
The median is in the (7+1)/2 = 4th place.
Median = 4
n is even
The median for data 4,6,3,2,5,7 (n=6)
Rearrange the data : 2,3,4,5,6,7
Median = (4+5)/2 = 4.5
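The odd and even cases can be written as one small Python function. This is a sketch with a hypothetical function name; it sorts the data and then picks the middle observation, or averages the two middle observations when n is even.

```python
def median_ungrouped(values):
    """Median of raw data: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # 0-based index mid is the ((n + 1) / 2)th ordered observation.
        return ordered[mid]
    # Even n: average the (n / 2)th and (n / 2 + 1)th ordered observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median_ungrouped([4, 6, 3, 1, 2, 5, 7])
even_case = median_ungrouped([4, 6, 3, 2, 5, 7])
print(odd_case, even_case)
```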
Example 1.14 (Grouped Data):
The sample median for the frequency distribution in
Example 1.12
Solution:

Weight (Class Interval)   Frequency, f   Class Mark, x   fx     Cumulative Frequency, F   Class Boundary
60-62                     5              61              305    5                         59.5-62.5
63-65                     18             64              1152   23                        62.5-65.5
66-68                     42             67              2814   65                        65.5-68.5
69-71                     27             70              1890   92                        68.5-71.5
72-74                     8              73              584    100                       71.5-74.5

\[ \tilde{x} = L + c\left(\frac{\frac{\sum f}{2} - F_{j-1}}{f_j}\right) = ? \]
Mode
Mode of ungrouped data: The value with the highest
frequency in a data set.
*It is important to note that there can be more than one
mode and if no number occurs more than once in the set,
then there is no mode for that set of numbers
 Mode for grouped data
When data have been grouped in classes and a frequency curve is drawn
to fit the data, the mode is the value of x corresponding to the maximum
point on the curve, that is
\[ \hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \]
L = the lower class boundary of the modal class
c = the size of the modal class interval
\(\Delta_1\) = the difference between the modal class frequency and the frequency of the class before it
\(\Delta_2\) = the difference between the modal class frequency and the frequency of the class after it
*The class which has the highest frequency is called the modal class.
Example 1.15 (Ungrouped data)
Find the mode for the sets of data 3, 5, 2, 6, 5, 9, 5, 2, 8, 6
Mode = number occurring most frequently = 5
Example 1.16 Find the mode of the sample data below
Solution:

Weight (Class Interval)   Frequency, f   Class Mark, x   fx     Cumulative Frequency, F   Class Boundary
60-62                     5              61              305    5                         59.5-62.5
63-65                     18             64              1152   23                        62.5-65.5
66-68 (modal class)       42             67              2814   65                        65.5-68.5
69-71                     27             70              1890   92                        68.5-71.5
72-74                     8              73              584    100                       71.5-74.5
Total                     100                            6745

\[ \hat{x} = L + c\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = ? \]
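The grouped-data formulas for the median and the mode can be evaluated on the weight table with the following Python sketch. The function names are hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption made only so the mode function also works when the modal class is the first or last class.

```python
from itertools import accumulate

# Lower class boundaries, frequencies, and class size from the weight table.
lower_boundaries = [59.5, 62.5, 65.5, 68.5, 71.5]
frequencies = [5, 18, 42, 27, 8]
class_size = 3.0

def grouped_median(lower_boundaries, frequencies, class_size):
    """L + c * ((sum(f)/2 - F_before) / f_median)."""
    half = sum(frequencies) / 2
    cumulative = list(accumulate(frequencies))
    # Median class: first class whose cumulative frequency reaches sum(f)/2.
    j = next(i for i, F in enumerate(cumulative) if F >= half)
    f_before = cumulative[j - 1] if j > 0 else 0
    return lower_boundaries[j] + class_size * (half - f_before) / frequencies[j]

def grouped_mode(lower_boundaries, frequencies, class_size):
    """L + c * (d1 / (d1 + d2)) for the class with the highest frequency."""
    j = frequencies.index(max(frequencies))
    before = frequencies[j - 1] if j > 0 else 0                      # assumption at the edges
    after = frequencies[j + 1] if j < len(frequencies) - 1 else 0    # assumption at the edges
    d1 = frequencies[j] - before
    d2 = frequencies[j] - after
    return lower_boundaries[j] + class_size * d1 / (d1 + d2)

median = grouped_median(lower_boundaries, frequencies, class_size)
mode = grouped_mode(lower_boundaries, frequencies, class_size)
print(round(median, 4), round(mode, 4))
```

For this table both the median class and the modal class are 66-68, so L = 65.5 in both formulas.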
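Measures of Dispersion
• Range = Largest value - Smallest value
• Variance: measures the variability (differences) existing in a set
of data.
The variance for the ungrouped data:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{(for sample)} \]
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \quad \text{(for population)} \]
The variance for the grouped data:
\[ S^2 = \frac{\sum fx^2 - n\bar{x}^2}{n - 1} \quad \text{or} \quad S^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} \quad \text{(for sample)} \]
\[ \sigma^2 = \frac{\sum fx^2 - N\mu^2}{N} \quad \text{or} \quad \sigma^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N} \quad \text{(for population)} \]
• The positive square root of the variance is the standard
deviation:
\[ S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \quad \text{(ungrouped)} \qquad S = \sqrt{\frac{\sum fx^2 - n\bar{x}^2}{n - 1}} \quad \text{(grouped)} \]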
 A large variance means that the individual scores (data) of
the sample deviate a lot from the mean.
 A small variance indicates the scores (data) deviate little
from the mean.
Example 1.17 (Ungrouped data)
Find the variance and standard deviation of the sample
data : 3, 5, 2, 6, 5, 9, 5, 2, 8, 6
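One way to work through Example 1.17 is to apply the sample variance definition directly, as in this Python sketch (variable names are illustrative only).

```python
import math

data = [3, 5, 2, 6, 5, 9, 5, 2, 8, 6]
n = len(data)
mean = sum(data) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
squared_deviations = sum((x - mean) ** 2 for x in data)
sample_variance = squared_deviations / (n - 1)

# The standard deviation is the positive square root of the variance.
sample_sd = math.sqrt(sample_variance)

print(round(sample_variance, 4), round(sample_sd, 4))
```

Here the mean is 5.1 and the sum of squared deviations is 48.9, so the sample variance is 48.9/9, about 5.43, and the standard deviation is about 2.33.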
Example 1.18 (Grouped data)
Find the variance and standard deviation of the sample
data below:

Weight (Class Interval)   Frequency, f   Class Mark, x   fx     Cumulative Frequency, F   Class Boundary   x^2    fx^2
60-62                     5              61              305    5                         59.5-62.5        3721   18605
63-65                     18             64              1152   23                        62.5-65.5        4096   73728
66-68                     42             67              2814   65                        65.5-68.5        4489   188538
69-71                     27             70              1890   92                        68.5-71.5        4900   132300
72-74                     8              73              584    100                       71.5-74.5        5329   42632
Total                     100                            6745                                                     455803

\[ S^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = ? \]
\[ S = \sqrt{\frac{\sum fx^2 - n\bar{x}^2}{n - 1}} = ? \]
Exercise 1.2
The defects from machine A for a sample of products
were organized into the following:

Defects (Class Interval)   Number of products, f (frequency)
2-6                        1
7-11                       4
12-16                      10
17-21                      3
22-26                      2

Find the mean, median, mode, variance, and
standard deviation.
Exercise 1.3
The following data give the sample number of iPads sold by a
mail order company on each of 30 days. (Hint: use 5
classes.)
8 25 11 15 29 22 10 5 17 21
22 13 26 16 18 12 9 26 20 16
23 14 19 23 20 16 27 9 21 14
a) Construct a frequency distribution table.
b) Find the mean, variance and standard deviation, mode and
median.
c) Construct a histogram.
Measures of Position
To describe the relative position of a certain data value
within the entire set of data.
z scores
Percentiles
Quartiles
Outliers
Z Score
 A standard score or z score tells how many standard
deviations a data value is above or below the mean for a
specific distribution of values.
 If a z score is 0, then the data value is the same as the mean.
 The formula is:
\[ z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \]
For samples, \( z = \frac{x - \bar{x}}{s} \); for populations, \( z = \frac{x - \mu}{\sigma} \).
• Note that if the z score is positive, the score is above the
mean. If the z score is 0, the score is the same as the mean.
And if the z score is negative, the score is below the mean.
• A student scored 65 on a calculus test that has a mean of 50 and a standard
deviation of 10; she scored 30 on a history test with a mean of 25 and a
standard deviation of 5. Compare her relative positions on the two tests.
Solution:
First, find the z scores.
For calculus the z score is
\[ z = \frac{x - \bar{x}}{s} = \frac{65 - 50}{10} = 1.5 \]
The calculus score of 65 was actually 1.5 standard deviations
above the mean of 50.
For history the z score is
\[ z = \frac{x - \bar{x}}{s} = \frac{30 - 25}{5} = 1.0 \]
The history score of 30 was actually 1.0 standard deviations
above the mean of 25.
Since the z score for calculus is larger, her relative position in the calculus
class is higher than her relative position in the history class.
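The comparison of the two test scores can be sketched in Python as follows. The function name is hypothetical; the numbers are those of the calculus and history example.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

calculus = z_score(65, 50, 10)
history = z_score(30, 25, 5)

# The larger z score marks the higher relative position.
higher = "calculus" if calculus > history else "history"
print(calculus, history, higher)
```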
Exercise 1.4
• Find the z score for each test, and state which is higher.

Test          x     x̄     s
Mathematics   38    40    5
Statistics    94    100   10
Quartiles
• Divide the distribution into four equal groups, separated by Q₁,
Q₂, Q₃.
• Note that Q₁ is the same as the 25th percentile.
• Q₂ is the same as the 50th percentile, or the median.
• Q₃ corresponds to the 75th percentile.
The positions of the quartiles in the ordered data are:
\[ Q_1 = \tfrac{1}{4}(n + 1)\text{th} \qquad Q_2 = \text{median} = \tfrac{1}{2}(n + 1)\text{th} \qquad Q_3 = \tfrac{3}{4}(n + 1)\text{th} \]
Odd number of observations
Positions are integers
Example: 5, 8, 4, 4, 6, 3, 8 (n=7)
1. Put them in order: 3, 4, 4, 5, 6, 8, 8
2. Calculate the quartiles
Q1 = (1/4)(7 + 1)th = 2nd observation = 4
Q2 = median = (1/2)(7 + 1)th = 4th observation = 5
Q3 = (3/4)(7 + 1)th = 6th observation = 8
Even number of observations:
Positions are not integers
Example: 5, 12, 10, 4, 6, 3, 8, 14 (n=8)
1. Put them in order: 3, 4, 5, 6, 8, 10, 12, 14
2. Calculate the quartiles
Q1 = (1/4)(8 + 1)th = 2.25th
Q2 = median = (1/2)(8 + 1)th = 4.5th
Q3 = (3/4)(8 + 1)th = 6.75th
Q1 = 4 + 0.25(5 - 4) = 4.25
Q2 = median = (6 + 8)/2 = 7
Q3 = 10 + 0.75(12 - 10) = 11.5
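The (n + 1)-based quartile positions, with linear interpolation when the position is not an integer, can be sketched in Python as below. The function name is hypothetical, and clamping to the smallest or largest value when a position falls outside 1..n is an assumption that does not arise in these examples.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at the 1-based position k(n + 1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4
    whole = int(position)          # integer part of the position
    fraction = position - whole    # fractional part, used for interpolation
    if whole < 1:                  # assumption: clamp below the first value
        return ordered[0]
    if whole >= n:                 # assumption: clamp above the last value
        return ordered[-1]
    lower = ordered[whole - 1]
    upper = ordered[whole]
    return lower + fraction * (upper - lower)

odd_data = [5, 8, 4, 4, 6, 3, 8]
even_data = [5, 12, 10, 4, 6, 3, 8, 14]

odd_quartiles = [quartile(odd_data, k) for k in (1, 2, 3)]
even_quartiles = [quartile(even_data, k) for k in (1, 2, 3)]
print(odd_quartiles, even_quartiles)
```

Statistical software may use other quartile conventions, so results from a library routine can differ slightly from the values obtained with this rule.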
Exercise 1.5
The following data represent the number of inches of
rain in Chicago during the month of April for 10
randomly selected years.
2.47 3.97 3.94 4.11 5.22
1.14 4.02 3.41 1.85 0.97
Determine the quartiles.
Outliers
 Extreme observations
 Can occur because of the error in measurement of a
variable, during data entry or errors in sampling.
Checking for outliers by using Quartiles
Step 1: Rank the data in increasing order.
Step 2: Determine the first quartile, the median, and the third quartile
of the data.
Step 3: Compute the interquartile range (IQR).
IQR = Q3 - Q1
Step 4: Determine the fences. Fences serve as cutoff
points for determining outliers.
Lower Fence = Q1 - 1.5(IQR)
Upper Fence = Q3 + 1.5(IQR)
Step 5: If a data value is less than the lower fence or
greater than the upper fence, it is considered an outlier.
Determine whether there are outliers in the data set.
2.47 3.97 3.94 4.11 5.22
1.14 4.02 3.41 1.85 0.97
Solution:
0.97, 1.14, 1.85, 2.47, 3.41, 3.94, 3.97, 4.02, 4.11, 5.22
Q1 = 1.6725, Q3 = 4.0425
IQR = Q3 - Q1 = 4.0425 - 1.6725 = 2.37
Lower fence = Q1 - 1.5(IQR)
= 1.6725 - 1.5(2.37)
= -1.8825
Upper fence = Q3 + 1.5(IQR)
= 4.0425 + 1.5(2.37)
= 7.5975
• Since no data value is less than -1.8825 or greater
than 7.5975, there are no outliers in the data.
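The fence calculation for the rainfall data can be reproduced with the Python sketch below. The helper name is hypothetical and uses the k(n + 1)/4 position rule with linear interpolation; it assumes the position falls strictly inside the data, which holds for this data set.

```python
def quartile(ordered, k):
    """k-th quartile of already sorted data at the 1-based position k(n + 1)/4."""
    position = k * (len(ordered) + 1) / 4
    whole = int(position)
    fraction = position - whole
    # Assumes 1 <= whole < n, which holds for the rainfall data below.
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

rain = [2.47, 3.97, 3.94, 4.11, 5.22, 1.14, 4.02, 3.41, 1.85, 0.97]
ordered = sorted(rain)                       # Step 1: rank the data

q1 = quartile(ordered, 1)                    # Step 2: quartiles
q3 = quartile(ordered, 3)
iqr = q3 - q1                                # Step 3: interquartile range

lower_fence = q1 - 1.5 * iqr                 # Step 4: fences
upper_fence = q3 + 1.5 * iqr

# Step 5: values outside the fences are flagged as outliers.
outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
print(round(q1, 4), round(q3, 4), round(lower_fence, 4), round(upper_fence, 4), outliers)
```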
The Five Number Summary; Boxplots
Compute the five-number summary
MINIMUM Q1 M Q3 MAXIMUM
Example 1.24
2.47 3.97 3.94 4.11 5.22
1.14 4.02 3.41 1.85 0.97
Compute the five-number summary.
Solution:
0.97, 1.14, 1.85, 2.47, 3.41, 3.94, 3.97, 4.02, 4.11, 5.22
Minimum = 0.97,
Q1 = 1.6725,
Q2 = Median = 3.675,
Q3 = 4.0425,
Maximum = 5.22
BOXPLOT
 The five-number summary can be used to create a simple
graph called a boxplot.
 From the boxplot, you can quickly detect any skewness in
the shape of the distribution and see whether there are any
outliers in the data set.
Interpreting Boxplot
Boxplots
Step 1: Determine the lower and upper fences:
Lower Fence = Q1 - 1.5(IQR)
Upper Fence = Q3 + 1.5(IQR)
Step 2: Draw vertical lines at Q1, M and Q3.
Step 3: Label the lower and upper fences.
Step 4: Draw a line from Q1 to the smallest data value
that is larger than the lower fence. Draw a
line from Q3 to the largest data value that is
smaller than the upper fence.
Step 5: Any data values less than the lower fence or
greater than the upper fence are outliers and
are marked with (*).
2.47 3.97 3.94 4.11 5.22
1.14 4.02 3.41 1.85 0.97
Sketch the boxplot and interpret the shape of the
boxplot.
Boxplots
Step 1: Rank the data in increasing order.
Step 2: Determine the quartiles and median.
Step 3: Draw vertical lines at Q1, M and Q3.
Step 4: Draw a line from Q1 to the smallest data value.
Draw a line from Q3 to the largest data value.
Step 5: Any data value less than the lower fence or
greater than the upper fence are outliers and
mark (*).
