Applied Statistics I Liang Zhang June 9, 2008

Applied Statistics I
Liang Zhang
Department of Mathematics, University of Utah
June 9, 2008
Introduction
What is statistics?
“Utah Democrats are more sure. Thirty-six percent said Obama will take
the oath of office, 24 percent didn’t know, and 22 percent said it will be
Clinton.”
from Deseret News, “It’s Utah’s turn:
Local voters favor Mitt and Obama, poll shows”
“GE Spiral lamps:
long life – from 8,000 to 12,000 hours”
from http://www.geconsumerproducts.com/
“Dow Jones Industrial Average on Jun. 5th”
from http://www.finance.yahoo.com/
From the Latin word “status”, meaning “state”.
The discipline of statistics provides methods for organizing and
summarizing data and for drawing conclusions based on information
contained in the data.
Our Focus: Drawing Conclusions or Making Statistical Inferences
Basic Concepts
Population: total collection of objects we are interested in
Sample: a subset of the population
Census: information for all objects in the population
Examples:
Number of students in this classroom who drove here today
Population: all the students in the classroom;
Sample: all the boys; Census: possible
GE manufactured 100,000,000 lamps. What is their life range?
Population: the 100,000,000 lamps; Sample: 1,000 randomly
selected lamps; Census: impossible
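The lamp example can be sketched in Python. This is only an illustration under loud assumptions: the lifetimes are simulated (uniform between 8,000 and 12,000 hours, echoing the advertised range), and the population is shrunk to 100,000 lamps so the sketch runs quickly. None of these numbers are real GE measurements.

```python
import random

random.seed(2008)  # fixed seed so the sketch is repeatable

# ASSUMPTION: simulated lifetimes in hours, not real GE data.
# The population is shrunk from 100,000,000 to 100,000 lamps to keep it fast.
population = [random.uniform(8_000, 12_000) for _ in range(100_000)]

# Sample: 1,000 lamps selected at random, without replacement.
sample = random.sample(population, 1_000)

# Life range estimated from the sample alone (no census needed).
sample_range = (min(sample), max(sample))
print(f"sample of {len(sample)} lamps, life range about "
      f"{sample_range[0]:.0f} to {sample_range[1]:.0f} hours")
```

The point of the sketch is that only the 1,000 sampled lamps are inspected, yet they say something about the whole population.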
Variable: a characteristic of the population that may differ from
individual to individual
We usually use lowercase letters to denote variables.
Examples: x = whether a student drove to school today (yes or no)
y = maximum number of hours a lamp can last
Univariate Data: observations on a single variable
Bivariate Data: observations on each of two variables
Multivariate Data: observations made on more than one variable
Examples:
Bivariate: the collection of data about whether students drove to school today
and the gender of students
Multivariate: the collection of data about whether students drove to school today,
the gender of students and the distance from their home to campus
Conceptual/Hypothetical Population: a population that does not
physically exist
Examples: all possible values of tomorrow’s highest temperature; all
possible pH values of some unknown liquid; etc.
Enumerative vs. Analytic Studies
Enumerative Studies: a sampling frame (a listing of the objects to be
sampled) is available to an investigator or else can be constructed
Examples: life of the GE lamps; the gender of students in this
classroom
Analytic Studies: a sampling frame is NOT available
Examples: tomorrow’s highest temperature; the 2009 NBA champion
Descriptive Statistics & Inferential Statistics
Recall: The discipline of statistics provides methods for organizing and
summarizing data and for drawing conclusions based on information
contained in the data.
Descriptive Statistics: the discipline of organizing and summarizing data
Inferential Statistics: the discipline of drawing conclusions about a
population from a sample
Example (Example 1.2, p. 5): The article “Effects of Aggregates and Microfillers on
the Flexural Properties of Concrete” reported on a study of strength properties of
high performance concrete obtained by using superplasticizers and certain binders. The
accompanying data on flexural strength (in MPa) appeared in the article cited:
 5.9  7.2  7.3  6.3  8.1  6.8  7.0  7.6  6.8
 6.5  7.0  6.3  7.9  9.0  8.2  8.7  7.8  9.7
 7.4  7.7  9.7  7.8  7.7 11.6 11.3 11.8 10.7
We are interested in the average value of flexural strength for all beams that could be
made in this way.
The stem-and-leaf plot:
 5 | 9
 6 | 33588
 7 | 00234677889
 8 | 127
 9 | 077
10 | 7
11 | 368
The histogram graph: [figure not reproduced]
Moreover, we can make statistical inferences from this data set.
It can be shown that, with a high degree of confidence, the population mean
strength is between 7.48 MPa and 8.80 MPa; this is called a confidence interval or
interval estimate.
Furthermore, with a high degree of confidence, the strength of a single such beam
will exceed 7.35 MPa; this number 7.35 is called a lower prediction bound.
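As a rough check on the interval quoted above, here is a minimal Python sketch of a two-sided t interval for the mean of the 27 flexural strengths. The slides only say "a high degree of confidence"; the 95% level and the t-table critical value 2.056 (26 degrees of freedom) are assumptions made here, not something the slides state. The lower prediction bound is not reproduced.

```python
import math
import statistics

# Flexural strength data (MPa) from Example 1.2
strengths = [
    5.9, 7.2, 7.3, 6.3, 8.1, 6.8, 7.0, 7.6, 6.8,
    6.5, 7.0, 6.3, 7.9, 9.0, 8.2, 8.7, 7.8, 9.7,
    7.4, 7.7, 9.7, 7.8, 7.7, 11.6, 11.3, 11.8, 10.7,
]

n = len(strengths)
mean = statistics.mean(strengths)
s = statistics.stdev(strengths)  # sample standard deviation (divides by n - 1)

# ASSUMPTION: 95% confidence; 2.056 is the t-table value for 26 degrees
# of freedom. The slides do not state the confidence level.
T_CRIT = 2.056
half_width = T_CRIT * s / math.sqrt(n)
ci = (mean - half_width, mean + half_width)
print(f"mean = {mean:.2f} MPa, interval = ({ci[0]:.2f}, {ci[1]:.2f}) MPa")
```

Under these assumptions the interval rounds to 7.48 MPa and 8.80 MPa, which agrees with the numbers on the slide.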
Probability & Statistics
Probability: we know the population and ask questions
about a sample
A probability question: We have a fair coin and toss it many times.
What is the chance of getting three consecutive heads?
Statistics: we know the sample and ask questions about the
population
A statistics question: We have a coin and toss it in 6 rounds of three
tosses each. The results are THT, THH, HTT, HTH, TTH and HTT. Is this
coin a fair coin?
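The two directions of reasoning can be contrasted in a short Python sketch. The probability half assumes a single run of exactly three tosses of a fair coin (a simplification of "many times"); the statistics half only summarizes the six observed results and does not settle whether the coin is fair.

```python
from fractions import Fraction
from itertools import product

# Probability: the coin is known to be fair, so we reason about outcomes.
# ASSUMPTION: one run of exactly three tosses.
outcomes = list(product("HT", repeat=3))  # 8 equally likely outcomes
p_three_heads = Fraction(
    sum(o == ("H", "H", "H") for o in outcomes), len(outcomes)
)

# Statistics: only the observed tosses are known.
observed = ["THT", "THH", "HTT", "HTH", "TTH", "HTT"]
heads = sum(run.count("H") for run in observed)
tosses = sum(len(run) for run in observed)
head_fraction = Fraction(heads, tosses)

print(f"P(three heads in three fair tosses) = {p_three_heads}")
print(f"observed fraction of heads = {head_fraction} ({heads} of {tosses})")
```

Deciding whether 8 heads in 18 tosses is compatible with a fair coin is an inference question that later material addresses.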
Pictorial and Tabular Methods
Stem-and-Leaf Displays
1. Select one or more leading digits for the stem values. The trailing
digits become the leaves.
2. List possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding
stem value.
4. Indicate the units for stems and leaves someplace in the display.
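The four steps above can be sketched in Python. The function name `stem_and_leaf` is hypothetical, and the sketch assumes data recorded to one decimal place, so that the stem is the integer part and the leaf is the tenths digit.

```python
from collections import defaultdict

# Flexural strength data (MPa) from Example 1.2
strengths = [
    5.9, 7.2, 7.3, 6.3, 8.1, 6.8, 7.0, 7.6, 6.8,
    6.5, 7.0, 6.3, 7.9, 9.0, 8.2, 8.7, 7.8, 9.7,
    7.4, 7.7, 9.7, 7.8, 7.7, 11.6, 11.3, 11.8, 10.7,
]

def stem_and_leaf(data):
    """Return display lines; ASSUMES one-decimal data (leaf = tenths digit)."""
    rows = defaultdict(list)
    for x in data:
        tenths = round(x * 10)            # 11.6 -> 116, avoids float truncation
        stem, leaf = divmod(tenths, 10)   # step 1: split stem and leaf
        rows[stem].append(leaf)           # step 3: record the leaf by its stem
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # step 2: list every stem
        leaves = "".join(str(d) for d in sorted(rows.get(stem, [])))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

print("The decimal point is at the |")   # step 4: state the units
for line in stem_and_leaf(strengths):
    print(line)
```

Sorting the leaves is a presentation choice rather than a requirement of the method.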
Flexural strength data (in MPa):
 5.9  7.2  7.3  6.3  8.1  6.8  7.0  7.6  6.8
 6.5  7.0  6.3  7.9  9.0  8.2  8.7  7.8  9.7
 7.4  7.7  9.7  7.8  7.7 11.6 11.3 11.8 10.7
The decimal point is at the |
 5 | 9
 6 | 33588
 7 | 00234677889
 8 | 127
 9 | 077
10 | 7
11 | 368
A stem-and-leaf display conveys information about:
• identification of a typical value
• presence of any gaps in the data
• extent of symmetry in the distribution of values
• number and location of peaks
• presence of any outlying values
Remark:
1. Each observation in the data set must consist of at least two digits.
e.g. the stem-and-leaf display is not suitable for the data set
1, 2, 1, 4, 1, 5, 2, 6, 1, 3, 2, 3
2. Ordering the leaves from smallest to largest is not necessary.
Unordered leaves:
The decimal point is at the |
 5 | 9
 6 | 38853
 7 | 23060984787
 8 | 127
 9 | 077
10 | 7
11 | 638
Ordered leaves:
The decimal point is at the |
 5 | 9
 6 | 33588
 7 | 00234677889
 8 | 127
 9 | 077
10 | 7
11 | 368
Dotplots:
e.g. The dotplot for the previous example: [figure not reproduced]
In a dotplot, each observation is represented by a dot above the
corresponding location on a horizontal measurement scale. When a
value occurs more than once, there is a dot for each occurrence, and
these dots are stacked vertically.
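A text-only dotplot of the flexural strength data can be sketched in Python as follows. The helper name `dotplot_rows` and the choice of one text column per 0.1 MPa are assumptions made for illustration; a plotting library would normally be used instead.

```python
from collections import Counter

# Flexural strength data (MPa) from Example 1.2
strengths = [
    5.9, 7.2, 7.3, 6.3, 8.1, 6.8, 7.0, 7.6, 6.8,
    6.5, 7.0, 6.3, 7.9, 9.0, 8.2, 8.7, 7.8, 9.7,
    7.4, 7.7, 9.7, 7.8, 7.7, 11.6, 11.3, 11.8, 10.7,
]

def dotplot_rows(data, step=0.1):
    """Return text rows (top row first) with one 'o' per observation."""
    lo = min(data)
    # Column of each observation on a horizontal scale with the given step.
    counts = Counter(round((x - lo) / step) for x in data)
    width = max(counts) + 1
    tallest = max(counts.values())
    return [
        "".join("o" if counts.get(col, 0) >= level else " "
                for col in range(width))
        for level in range(tallest, 0, -1)   # repeated values stack upward
    ]

rows = dotplot_rows(strengths)
for row in rows:
    print(row)
print(f"scale: {min(strengths)} to {max(strengths)} MPa, one column per 0.1 MPa")
```

Each repeated value, such as 6.8, contributes a second dot stacked directly above the first.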
Histograms
e.g. The histogram for the previous example: [figure not reproduced]
Discrete & Continuous Variables:
A numerical variable is discrete if its set of possible values is either
finite or can be listed in an infinite sequence.
e.g. x = number of students in this classroom who drove to school
today
Discrete variables usually arise from counting.
A numerical variable is continuous if its possible values consist of an
entire interval on the number line.
e.g. y = maximum number of hours a GE lamp can last
Continuous variables usually arise from measuring.
Frequency: the frequency of any particular data value is the number
of times that value occurs in the data set.
Relative Frequency: the relative frequency of a value is the fraction
or proportion of times the value occurs:
\[ \text{relative frequency} = \frac{\text{number of times the value occurs}}{\text{number of observations in the data set}} \]
e.g.
frequency of the value 6.8: 2
relative frequency of the value 6.8: 2/27 = 0.074
Frequency Distribution: a tabulation of the frequencies and/or
relative frequencies.
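The definitions above can be sketched in a few lines of Python. The data set here is hypothetical (it is not the 27-observation sample used on the slides); only the counting logic follows the definitions.

```python
from collections import Counter

# Hypothetical sample, chosen only to illustrate the definitions.
data = [6.8, 7.2, 6.8, 5.9, 7.2, 7.2, 8.1, 6.1]

# Frequency: how many times each value occurs in the data set.
frequency = Counter(data)

# Relative frequency: frequency divided by the number of observations.
n = len(data)
relative_frequency = {value: count / n for value, count in frequency.items()}

# Frequency distribution: a tabulation of both quantities.
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>3}  {relative_frequency[value]:.3f}")
```

The relative frequencies always sum to 1, which is a quick check on any hand-built table.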
Constructing a Histogram for a Data Set:
1. Divide the data set into a suitable number of class intervals, or classes;
2. Determine the frequency and relative frequency for each class;
3. Mark the class boundaries on a horizontal measurement axis;
4. Above each class interval, draw a rectangle whose height is the
corresponding relative frequency (or frequency).
classes          frequency   relative frequency
 5.00 -  5.99        1             0.037
 6.00 -  6.99        5             0.185
 7.00 -  7.99       11             0.407
 8.00 -  8.99        3             0.111
 9.00 -  9.99        3             0.111
10.00 - 10.99        1             0.037
11.00 - 11.99        3             0.111
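As a check on the table, the relative-frequency column can be recomputed from the frequency column. This is a minimal sketch; the variable names are illustrative and the class labels are simply copied from the table.

```python
# Class labels and frequencies copied from the table.
classes = ["5.00-5.99", "6.00-6.99", "7.00-7.99", "8.00-8.99",
           "9.00-9.99", "10.00-10.99", "11.00-11.99"]
frequencies = [1, 5, 11, 3, 3, 1, 3]

# Total number of observations in the data set.
n = sum(frequencies)

# Relative frequency of each class, rounded to three decimals as in the table.
relative = [round(f / n, 3) for f in frequencies]

for label, f, r in zip(classes, frequencies, relative):
    print(f"{label:<13} {f:>3}  {r:.3f}")
```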
Remark:
1. For discrete data, we usually don’t have to determine the class
intervals.
2. There are no hard-and-fast rules for the choice of class intervals. A
reasonable rule of thumb is
\[ \text{number of classes} \approx \sqrt{\text{number of observations}} \]
3. Equal-width classes may not be a sensible choice if a data set
“stretches out” to one side or the other.
Use a few wider intervals near extreme observations and narrower
intervals in the region of high concentration. With unequal class widths, set
\[ \text{rectangle height} = \frac{\text{relative frequency of the class}}{\text{class width}} \]
so that the area of each rectangle equals the relative frequency of its class.
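The rule of thumb and the height formula for unequal class widths can be sketched as follows. The unequal-width classes below are hypothetical and serve only to show the arithmetic; the sample size 27 is the one from the frequency table.

```python
import math

# Rule of thumb: number of classes is roughly sqrt(number of observations).
n = 27
suggested_classes = round(math.sqrt(n))

# Hypothetical unequal-width classes: (lower boundary, upper boundary, frequency).
classes = [(0, 2, 9), (2, 4, 12), (4, 8, 4), (8, 16, 2)]
total = sum(freq for _, _, freq in classes)

# Height = relative frequency / class width.
heights = [(freq / total) / (hi - lo) for lo, hi, freq in classes]

# With this height, the area of each rectangle is the relative frequency,
# so the areas add up to 1.
areas = [h * (hi - lo) for h, (lo, hi, _) in zip(heights, classes)]

print(suggested_classes, [round(h, 4) for h in heights], round(sum(areas), 6))
```

Because the rule is only a rule of thumb, the suggested count is a starting point, not a requirement.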
Shapes of Histograms:
Measure of Location
Notation: We use $n$ to denote the sample size, i.e. the number of
observations in a single sample.
e.g. if the sample of students’ heights is {180cm, 175cm, 191cm,
184cm, 178cm, 188cm}, then $n = 6$.
Furthermore, we use $x_1, x_2, \ldots, x_n$ to denote the sample data.
e.g. in the above example,
$x_1 = 180, x_2 = 175, x_3 = 191, x_4 = 184, x_5 = 178, x_6 = 188$.
Sample Mean:
The sample mean $\bar{x}$ of observations $x_1, x_2, \ldots, x_n$ is defined as
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Remark:
1. For simplicity, we can informally write $\bar{x} = \frac{\sum x_i}{n}$, where the
summation is over all sample observations.
2. When reporting $\bar{x}$, we use decimal accuracy of one digit more than
the accuracy of the $x_i$’s.
3. The average of all values in the population is called the population
mean and is denoted by the Greek letter $\mu$. In statistics, $\mu$ is
usually unavailable, and we want to get some information about the
population mean $\mu$ from the sample mean $\bar{x}$.
Example:
In the previous example, the sample is {180, 175, 191, 184, 178, 188} and
the sample size is 6; then the sample mean is calculated as
\[ \bar{x} = \frac{180 + 175 + 191 + 184 + 178 + 188}{6} \approx 182.7 \]
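The same calculation as a short Python sketch. The variable names are illustrative; the result is rounded to one decimal place, one digit more than the whole-centimetre data.

```python
from statistics import mean

# Students' heights in cm, as in the example.
heights_cm = [180, 175, 191, 184, 178, 188]

# Sample size n and sample mean: sum of the observations divided by n.
n = len(heights_cm)
x_bar = sum(heights_cm) / n

# Report one more decimal digit than the data carry.
print(n, round(x_bar, 1))

# The standard-library helper gives the same value.
x_bar_stdlib = mean(heights_cm)
```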
Pros and Cons
Pros: the sample mean tells us the location (center) of the sample.
Cons: the sample mean can be significantly affected by outliers, since
every observation enters the sum with equal weight.
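Sample Median
The sample median is obtained by first ordering the n observations
from smallest to largest (with any repeated values included so that
every sample observation appears in the ordered list). Then,
\[ \tilde{x} = \begin{cases} \text{the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ ordered value}, & \text{if } n \text{ is odd} \\ \text{the average of the } \left(\frac{n}{2}\right)^{\text{th}} \text{ and } \left(\frac{n}{2}+1\right)^{\text{th}} \text{ ordered values}, & \text{if } n \text{ is even} \end{cases} \]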
e.g. in the previous example, the sample is
$x_1 = 180, x_2 = 175, x_3 = 191, x_4 = 184, x_5 = 178, x_6 = 188$. Then the
ordered observations are
$x_{1:6} = 175, x_{2:6} = 178, x_{3:6} = 180, x_{4:6} = 184, x_{5:6} = 188, x_{6:6} = 191$.
And the sample median is the average of $x_{3:6}$ and $x_{4:6}$, which is 182, since
the sample size is even.
If we have one more observation $x_7 = 189$, then the ordered observations are
$x_{1:7} = 175, x_{2:7} = 178, x_{3:7} = 180, x_{4:7} = 184, x_{5:7} = 188, x_{6:7} = 189, x_{7:7} = 191$
and the sample median is $x_{4:7} = 184$, since the sample size is now odd.
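The odd/even rule can be written as a small function. This is a sketch with an illustrative function name (`sample_median`); the standard library's `statistics.median` follows the same rule and is used as a cross-check.

```python
from statistics import median

def sample_median(values):
    """Median by the odd/even rule on the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th ordered value (index mid, counting from 0).
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th ordered values.
    return (ordered[mid - 1] + ordered[mid]) / 2

six = [180, 175, 191, 184, 178, 188]
seven = six + [189]

print(sample_median(six), sample_median(seven))
```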
Remark:
1. In contrast to the sample mean, the sample median is very insensitive to
outliers. In fact, its value is determined by at most two of the ordered
sample values (the middle one or two).
2. Similar to the sample mean and the population mean, we can define the
population median. However, in general, the sample median does not
equal the population median. In statistics, we want to use the sample
median to infer the population median.
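A small experiment illustrates the different sensitivity of the two measures. The outlier below is hypothetical (it imagines 191 being mistyped as 291); the rest of the sample is the height data from the earlier example.

```python
from statistics import mean, median

heights_cm = [180, 175, 191, 184, 178, 188]

# Hypothetical recording error: 191 entered as 291.
with_outlier = [180, 175, 291, 184, 178, 188]

mean_before, median_before = mean(heights_cm), median(heights_cm)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean shifts by 100 / 6 cm, while the median does not move at all.
print(round(mean_before, 1), round(mean_after, 1))
print(median_before, median_after)
```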
Other Measures of Location:
Quartiles: a quartile is any of the three values which divide the
ordered data set into four equal parts, so that each part represents
one quarter (1/4) of the sample.
e.g. If our sample data about the students’ heights is 180, 175, 191,
184, 178, 188, 189, 183, 197, 186, 172, 169, 181, 177, 170, 172, then
the ordered data would be
169 170 172 172 | 175 177 178 180 | 181 183 184 186 | 188 189 191 197.
And a summary of this sample data is given by:
Min.    1st Qu.   Median   Mean    3rd Qu.   Max.
169.0   173.5     180.5    180.8   187.0     197.0
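The quartiles in the summary agree with taking the median of the lower half and of the upper half of the ordered data. The sketch below assumes that convention (the slides do not name one), and the function name `quartiles_by_halves` is illustrative. Other quartile conventions interpolate differently and can give slightly different values.

```python
from statistics import mean, median

def quartiles_by_halves(values):
    """Quartiles as medians of the lower and upper halves (assumed convention).

    For odd n the middle observation is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

heights_cm = [180, 175, 191, 184, 178, 188, 189, 183,
              197, 186, 172, 169, 181, 177, 170, 172]

q1, q2, q3 = quartiles_by_halves(heights_cm)
print(min(heights_cm), q1, q2, round(mean(heights_cm), 1), q3, max(heights_cm))
```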
Percentiles: A percentile is the data value below which a certain
percent of observations fall.
e.g. the 20th percentile is the value below which 20 percent of the
observations may be found. In our previous example, the sample size
is 16, 20% of which is 3.2. So the 20th percentile is 171.
Trimmed Mean: a p% trimmed mean is obtained by eliminating the
smallest p% of the data values and the largest p% of the data values and
averaging the remaining data values. It is a compromise between the sample
mean and the sample median.
e.g. in our previous example, the sample data is 180, 175, 191, 184,
178, 188, 189, 183, 197, 186, 172, 169, 181, 177, 170, 172. If we
eliminate the largest and the smallest observation, we remove
1/16 = 6.25% of the data from each end, so the result is a 6.25% trimmed mean:
\[ \bar{x}_{tr(6.25\%)} = 180.4 \]
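The trimmed mean in this example can be reproduced with a short function. This is a sketch: the function name `trimmed_mean` and its parameter `k` (the number of observations dropped from each end) are illustrative choices, not notation from the slides.

```python
def trimmed_mean(values, k):
    """Drop the k smallest and k largest values, then average the rest."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

heights_cm = [180, 175, 191, 184, 178, 188, 189, 183,
              197, 186, 172, 169, 181, 177, 170, 172]

# Dropping one observation from each end of 16 trims 1/16 = 6.25% per side.
k = 1
trim_percent = 100 * k / len(heights_cm)
x_tr = trimmed_mean(heights_cm, k)

print(trim_percent, round(x_tr, 1))
```

With k = 0 the function returns the ordinary sample mean, and as k grows the result moves toward the sample median, which is the sense in which the trimmed mean is a compromise between the two.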
Categorical Data:
In some cases, we can assign numerical values to categorical data and
then calculate the sample mean. With a 0/1 coding, the sample mean
is the sample proportion of the category coded as 1.
e.g. if we toss a coin 10 times and get the result T, H, T, T, H,
T, H, H, H, T, we can assign 0 to T and 1 to H. Then, the sample
mean would be (1 + 1 + 1 + 1 + 1)/10 = 0.5, which is exactly the
proportion of heads in the sample data.
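The coin example can be checked directly. In this minimal sketch the 0/1 coding follows the text (T is 0, H is 1) and the variable names are illustrative.

```python
# The ten tosses from the example.
tosses = ["T", "H", "T", "T", "H", "T", "H", "H", "H", "T"]

# Code heads as 1 and tails as 0.
coded = [1 if toss == "H" else 0 for toss in tosses]

# Sample mean of the coded data.
sample_mean = sum(coded) / len(coded)

# Sample proportion of heads, computed by counting.
proportion_heads = tosses.count("H") / len(tosses)

print(sample_mean, proportion_heads)
```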