Applied Statistics I
Liang Zhang
Department of Mathematics, University of Utah
June 9, 2008

Introduction

What is statistics?

“Utah Democrats are more sure. Thirty-six percent said Obama will take the oath of office, 24 percent didn’t know, and 22 percent said it will be Clinton.” from Deseret News: It’s Utah’s turn: Local voters favor Mitt and Obama, poll shows

“GE Spiral lamps: long life – from 8,000 to 12,000 hours” from http://www.geconsumerproducts.com/

“Dow Jones Industrial Average on Jun. 5th” from http://www.finance.yahoo.com/

Latin word “status” meaning “state”

The discipline of statistics provides methods for organizing and summarizing data and for drawing conclusions based on information contained in the data.
Our Focus: Drawing Conclusions or Making Statistical Inferences

Basic Concepts

Population: the total collection of objects we are interested in
Sample: a subset of the population
Census: information for all objects in the population

Examples:
Number of students in this classroom who drove here today
Population: all the students in the classroom; Sample: all the boys; Census: possible
GE manufactured 100,000,000 lamps. What is the range of their lifetimes?
Population: 100,000,000 lamps; Sample: 1,000 randomly selected lamps; Census: impossible
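The lamp example relies on selecting 1,000 lamps at random from 100,000,000. A minimal sketch of such a simple random sample follows. It assumes, hypothetically, that every lamp carries a serial number from 0 to 99,999,999; the function name `draw_sample` and the seed are illustrative only.

```python
import random

POPULATION_SIZE = 100_000_000  # lamps manufactured (from the example)
SAMPLE_SIZE = 1_000            # lamps selected for testing (from the example)


def draw_sample(population_size, sample_size, seed=None):
    """Return distinct serial numbers chosen by simple random sampling.

    Assumption: lamps are labelled 0 .. population_size - 1.
    """
    rng = random.Random(seed)
    # range() is lazy, so the 100,000,000 labels are never stored in memory.
    return rng.sample(range(population_size), sample_size)


sample_ids = draw_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=2008)
print(len(sample_ids), "lamps selected for testing")
```

Every lamp has the same chance of being chosen, and no lamp is chosen twice, which is what distinguishes this sample from a convenience sample such as "all the boys" in the classroom example.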
Variable: a characteristic of the population that may differ from individual to individual; lowercase letters are usually used to denote variables
Examples:
x = whether a student drove to school today (yes or no)
y = maximum number of hours a lamp can last

Univariate Data: observations on a single variable
Bivariate Data: observations on each of two variables
Multivariate Data: observations made on more than one variable
Examples:
The collection of data about whether students drove to school today and the gender of students
The collection of data about whether students drove to school today, the gender of students and the distance from their home to campus

Conceptual/Hypothetical Population: a population which does not physically exist
Examples: all possible values of tomorrow’s highest temperature; all possible pH values of some unknown liquid; etc.

Enumerative vs. Analytic Studies
Enumerative Studies: a sampling frame (a listing of the objects to be sampled) is available to an investigator or else can be constructed
Examples: life of the GE lamps; the gender of students in this classroom
Analytic Studies: no such sampling frame is available
Examples: tomorrow’s highest temperature; the 2009 NBA champion
Descriptive Statistics & Inferential Statistics
Recall: The discipline of statistics provides methods for organizing and summarizing data and for drawing conclusions based on information contained in the data.
Descriptive Statistics: the discipline of organizing and summarizing data
Inferential Statistics: the discipline of drawing conclusions about a population from a sample

Example (Example 1.2, p. 5): The article “Effects of Aggregates and Microfillers on the Flexural Properties of Concrete” reported on a study of strength properties of high performance concrete obtained by using superplasticizers and certain binders. The accompanying data on flexural strength (in MPa) appeared in the article cited:

5.9  7.2  7.3  6.3  8.1  6.8  7.0  7.6  6.8
6.5  7.0  6.3  7.9  9.0  8.2  8.7  7.8  9.7
7.4  7.7  9.7  7.8  7.7  11.6 11.3 11.8 10.7

We are interested in the average value of flexural strength for all beams that could be made in this way.

The stem-and-leaf plot:

 5 | 9
 6 | 33588
 7 | 00234677889
 8 | 127
 9 | 077
10 | 7
11 | 368

The histogram graph: [figure not reproduced]
We can also make statistical inferences from this flexural strength data set.
It can be shown that, with a high degree of confidence, the population mean strength is between 7.48 MPa and 8.80 MPa; this is called a confidence interval or interval estimate.
Furthermore, with a high degree of confidence, the strength of a single such beam will exceed 7.35 MPa; this number 7.35 is called a lower prediction bound.

Probability & Statistics

Probability: we know the information about the population and ask questions about a sample
A probability question: We have a fair coin and toss it many times. What is the chance of getting three consecutive heads?
Statistics: we know the information about a sample and ask questions about the population
A statistics question: We have a coin and toss it three times in each of 6 trials. The results are THT, THH, HTT, HTH, TTH and HTT. Is this coin a fair coin?
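Returning to the flexural strength example: the slides quote the interval 7.48 MPa to 8.80 MPa without showing where it comes from. The sketch below is one way to arrive at those numbers. It assumes a two-sided 95% t-interval for the mean, which the slides do not state (they only say "a high degree of confidence"); the critical value 2.056 for 26 degrees of freedom is hard-coded as part of that assumption.

```python
import math
import statistics

# Flexural strength data (MPa) from Example 1.2
strength = [
    5.9, 7.2, 7.3, 6.3, 8.1, 6.8, 7.0, 7.6, 6.8,
    6.5, 7.0, 6.3, 7.9, 9.0, 8.2, 8.7, 7.8, 9.7,
    7.4, 7.7, 9.7, 7.8, 7.7, 11.6, 11.3, 11.8, 10.7,
]

# Assumption: 95% two-sided t critical value with n - 1 = 26 degrees of freedom
T_CRIT = 2.056

n = len(strength)
mean = statistics.mean(strength)   # sample mean
s = statistics.stdev(strength)     # sample standard deviation (divisor n - 1)
half_width = T_CRIT * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width

print(f"n = {n}, mean = {mean:.2f}, s = {s:.2f}")
print(f"interval: ({lower:.2f}, {upper:.2f})")
# n = 27, mean = 8.14, s = 1.66
# interval: (7.48, 8.80)
```

Under these assumptions the result agrees with the interval quoted above. This is a statistics question in the sense of this slide: the sample is known and the conclusion concerns the population mean.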
Pictorial and Tabular Methods

Stem-and-Leaf Displays
1. Select one or more leading digits for the stem values. The trailing digits become the leaves.
2. List possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding stem value.
4. Indicate the units for stems and leaves someplace in the display.
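The four steps above can be sketched in Python, applied to the flexural strength data of Example 1.2. The function name `stem_and_leaf` is illustrative, and the sketch assumes positive observations recorded to one decimal place, so that the integer part is the stem and the first decimal is the leaf.

```python
from collections import defaultdict

# Flexural strength data (MPa) from Example 1.2
strength = [
    5.9, 7.2, 7.3, 6.3, 8.1, 6.8, 7.0, 7.6, 6.8,
    6.5, 7.0, 6.3, 7.9, 9.0, 8.2, 8.7, 7.8, 9.7,
    7.4, 7.7, 9.7, 7.8, 7.7, 11.6, 11.3, 11.8, 10.7,
]


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display (stem = units, leaf = tenths)."""
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)           # 7.3 -> 73; rounding guards against float error
        stem, leaf = divmod(tenths, 10)  # step 1: 73 -> stem 7, leaf 3
        leaves[stem].append(leaf)        # step 3: record the leaf beside its stem
    lines = ["The decimal point is at the |"]  # step 4: state the units
    for stem in range(min(leaves), max(leaves) + 1):  # step 2: list every possible stem
        digits = "".join(str(d) for d in sorted(leaves[stem]))
        lines.append(f"{stem:>2} | {digits}")
    return lines


print("\n".join(stem_and_leaf(strength)))
```

Sorting the leaves is a convenience rather than a requirement; dropping `sorted` gives the leaves in the order the observations were recorded.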
Example (Example 1.2, p. 5), continued. The flexural strength data (in MPa):

5.9  7.2  7.3  6.3  8.1  6.8  7.0  7.6  6.8
6.5  7.0  6.3  7.9  9.0  8.2  8.7  7.8  9.7
7.4  7.7  9.7  7.8  7.7  11.6 11.3 11.8 10.7

The decimal point is at the |

 5 | 9
 6 | 33588
 7 | 00234677889
 8 | 127
 9 | 077
10 | 7
11 | 368

The display gives information about:
• identification of a typical value
• presence of any gaps in the data
• extent of symmetry in the distribution of values
• number and location of peaks
• presence of any outlying values

Remark:
1. Each observation in the data set must consist of at least two digits.
e.g. the stem-and-leaf display is not suitable for the data set 1,2,1,4,1,5,2,6,1,3,2,3
2. Ordering the leaves from smallest to largest is not necessary.

Leaves in the order recorded:

The decimal point is at the |

 5 | 9
 6 | 38853
 7 | 23060984787
 8 | 127
 9 | 077
10 | 7
11 | 638

Leaves ordered from smallest to largest:

The decimal point is at the |

 5 | 9
 6 | 33588
 7 | 00234677889
 8 | 127
 9 | 077
10 | 7
11 | 368

Dotplots:
The dotplot for the previous example: [figure not reproduced]
In a dotplot, each observation is represented by a dot above the corresponding location on a horizontal measurement scale. When a value occurs more than once, there is a dot for each occurrence, and these dots are stacked vertically.

Histograms
e.g. The histogram for the previous example: [figure not reproduced]

Discrete & Continuous Variables:
A numerical variable is discrete if its set of possible values is either finite or can be listed in an infinite sequence.
e.g. x = number of students in this classroom who drove to school today
Usually arising from counting
A numerical variable is continuous if its possible values consist of an entire interval on the number line.
e.g. y = maximum number of hours a GE lamp can last
Usually arising from measuring

Frequency: the frequency of any particular data value is the number of times that value occurs in the data set.
Relative Frequency: the relative frequency of a value is the fraction of proportion of times the value occurs relative frequency = number of times the value occur number of observations in the data set e.g. frequency of value 6.8: relative frequency of the value 6.8: Liang Zhang (UofU) Applied Statistics I 2 2 27 = 0.074 June 9, 2008 19 / 36 Pictorial and Tabular Methods Frequency: the frequency of any particular data value is the number of times that value occurs in the data set. Relative Frequency: the relative frequency of a value is the fraction of proportion of times the value occurs relative frequency = number of times the value occur number of observations in the data set e.g. frequency of value 6.8: 2 2 relative frequency of the value 6.8: 27 = 0.074 Frequency Distribution: a tabulation of the frequencies and/or relative frequencies. Liang Zhang (UofU) Applied Statistics I June 9, 2008 19 / 36 Pictorial and Tabular Methods Constructing a Histogram for a Data Set: Liang Zhang (UofU) Applied Statistics I June 9, 2008 20 / 36 Pictorial and Tabular Methods Constructing a Histogram for a Data Set: 1. Divide the data set into a suitable number of class interval or classes; Liang Zhang (UofU) Applied Statistics I June 9, 2008 20 / 36 Pictorial and Tabular Methods Constructing a Histogram for a Data Set: 1. Divide the data set into a suitable number of class interval or classes; 2. Determine the frequency and relative frequency for each class; Liang Zhang (UofU) Applied Statistics I June 9, 2008 20 / 36 Pictorial and Tabular Methods Constructing a Histogram for a Data Set: 1. Divide the data set into a suitable number of class interval or classes; 2. Determine the frequency and relative frequency for each class; 3. Mark the class boundaries on a horizontal measurement axis; Liang Zhang (UofU) Applied Statistics I June 9, 2008 20 / 36 Pictorial and Tabular Methods Constructing a Histogram for a Data Set: 1. 
Divide the data set into a suitable number of class intervals or classes;
2. Determine the frequency and relative frequency for each class;
3. Mark the class boundaries on a horizontal measurement axis;
4. Above each class interval, draw a rectangle whose height is the corresponding relative frequency (or frequency).

classes          frequency   relative frequency
5.00 - 5.99      1           0.037
6.00 - 6.99      5           0.185
7.00 - 7.99      11          0.407
8.00 - 8.99      3           0.111
9.00 - 9.99      3           0.111
10.00 - 10.99    1           0.037
11.00 - 11.99    3           0.111

Remark:
1. For discrete data, we usually don’t have to determine the class intervals.
2. There are no hard-and-fast rules for the choice of class intervals.
A reasonable rule of thumb is $\text{number of classes} = \sqrt{\text{number of observations}}$.
3. Equal-width classes may not be a sensible choice if a data set “stretches out” to one side or the other. In that case, use a few wider intervals near extreme observations and narrower intervals in the region of high concentration. When the class widths are unequal, draw each rectangle with
$$\text{rectangle height} = \frac{\text{relative frequency of the class}}{\text{class width}}$$
Shapes of Histograms: [figure not reproduced]

Measure of Location
Notation: We use $n$ to denote the sample size, i.e. the number of observations in a single sample. e.g. if the sample of students’ heights is {180cm, 175cm, 191cm, 184cm, 178cm, 188cm}, then $n = 6$. Furthermore, we use $x_1, x_2, \ldots$
, $x_n$ to denote the sample data. e.g. in the above example, $x_1 = 180$, $x_2 = 175$, $x_3 = 191$, $x_4 = 184$, $x_5 = 178$, $x_6 = 188$.
Sample Mean: The sample mean $\bar{x}$ of observations $x_1, x_2, \ldots, x_n$ is defined as
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Remark:
1.
For simplicity, we can informally write $\bar{x} = \frac{\sum x_i}{n}$, where the summation is over all sample observations.
2. When reporting $\bar{x}$, we use decimal accuracy of one digit more than the accuracy of the $x_i$’s.
3. The average of all values in the population is defined as the population mean and is denoted by the Greek letter $\mu$. In statistics, $\mu$ is usually unavailable and we want to get some information about the population mean $\mu$ from the sample mean $\bar{x}$.
Example: In the previous example, the sample is {180, 175, 191, 184, 178, 188} and the sample size is 6; then the sample mean is calculated as
$$\bar{x} = \frac{180 + 175 + 191 + 184 + 178 + 188}{6} = 182.7$$
Pros and Cons
Pros: the sample mean tells us the location (center) of the sample.
Cons: the sample mean can be significantly affected by outliers.
Sample Median
The sample median is obtained by first ordering the $n$ observations from smallest to largest (with any repeated values included so that every sample observation appears in the ordered list).
Then,
$$\tilde{x} = \begin{cases} \left(\frac{n+1}{2}\right)^{\text{th}} \text{ ordered value}, & \text{if } n \text{ is odd} \\ \text{average of the } \left(\frac{n}{2}\right)^{\text{th}} \text{ and } \left(\frac{n}{2}+1\right)^{\text{th}} \text{ ordered values}, & \text{if } n \text{ is even} \end{cases}$$
e.g. in the previous example, the sample is $x_1 = 180$, $x_2 = 175$, $x_3 = 191$, $x_4 = 184$, $x_5 = 178$, $x_6 = 188$. Then the ordered observations are $x_{1:6} = 175$, $x_{2:6} = 178$, $x_{3:6} = 180$, $x_{4:6} = 184$, $x_{5:6} = 188$, $x_{6:6} = 191$. The sample median is the average of $x_{3:6}$ and $x_{4:6}$, which is 182, since the sample size is even. If we have one more observation $x_7 = 189$, then the ordered observations are $x_{1:7} = 175$, $x_{2:7} = 178$, $x_{3:7} = 180$, $x_{4:7} = 184$, $x_{5:7} = 188$, $x_{6:7} = 189$, $x_{7:7} = 191$ and the sample median is $x_{4:7} = 184$, since the sample size is now odd.
Remark:
1.
Contrary to the sample mean, the sample median is very insensitive to outliers. In fact, the sample median is affected by at most two values in the sample.
2. Similar to the sample mean and the population mean, we can define the population median. However, in general, the sample median DOES NOT equal the population median. In statistics, we want to use the sample median to infer the population median.
Other Measures of Location:
Quartiles: a quartile is any of the three values which divide the ordered data set into four equal parts, so that each part represents one quarter ($\frac{1}{4}$) of the sample. e.g. if our sample data about the students’ heights is 180, 175, 191, 184, 178, 188, 189, 183, 197, 186, 172, 169, 181, 177, 170, 172, then the ordered data would be 169 170 172 172 | 175 177 178 180 | 181 183 184 186 | 188 189 191 197. A summary of this sample data is given by:
Min.    1st Qu.   Median   Mean    3rd Qu.   Max.
169.0   173.5     180.5    180.8   187.0     197.0
Percentiles: a percentile is the data value below which a certain percent of observations fall. e.g. the 20th percentile is the value below which 20 percent of the observations may be found. In our previous example, the sample size is 16, 20% of which is 3.2. So the 20th percentile is 171.
Trimmed Mean: a p% trimmed mean is obtained by eliminating the smallest p% of the data values and the largest p% of the data values and averaging the remaining data values. It is a compromise between the sample mean and the sample median.
e.g. in our previous example, the sample data is 180, 175, 191, 184, 178, 188, 189, 183, 197, 186, 172, 169, 181, 177, 170, 172. If we eliminate the largest and the smallest observation, then it is a $\frac{1}{16} = 6.25\%$ trimmed mean, and the 6.25% trimmed mean is $\bar{x}_{tr(6.25\%)} = 180.4$.
Categorical Data: In some cases, we can assign values to categorical data. Then we can calculate the sample mean. In that situation, the sample mean would be the sample proportion.
e.g. if we toss a coin 10 times and get the result T, H, T, T, H, T, H, H, H, T, we can assign 0 to T and 1 to H. Then the sample mean would be (1 + 1 + 1 + 1 + 1)/10 = 0.5, which is exactly the proportion of heads in the sample data.
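The coin-toss proportion and the measures of location for the 16 student heights can be checked with a short Python sketch. This is an illustration only, not part of the original slides. The helper names `quartiles_by_halves` and `trimmed_mean` are hypothetical. The quartile convention used here (median of the lower half and median of the upper half) is an assumption, chosen because it reproduces the 173.5 and 187 shown in the summary; other conventions can give slightly different quartiles.

```python
from statistics import mean, median


def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention (one of several in use); needs at least 2 values.
    For odd n the middle observation is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest observations."""
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


# Student heights (cm) from the quartile and trimmed-mean examples.
heights = [180, 175, 191, 184, 178, 188, 189, 183,
           197, 186, 172, 169, 181, 177, 170, 172]

# Coin tosses from the categorical-data example, coded T -> 0, H -> 1.
tosses = ["T", "H", "T", "T", "H", "T", "H", "H", "H", "T"]
coded = [1 if t == "H" else 0 for t in tosses]

sample_mean = mean(heights)            # 180.75, reported as 180.8
sample_median = median(heights)        # 180.5
q1, q3 = quartiles_by_halves(heights)  # 173.5 and 187.0
trimmed = trimmed_mean(heights, 1)     # drops 169 and 197; about 180.4
proportion_heads = mean(coded)         # 0.5

print(sample_mean, sample_median, q1, q3, round(trimmed, 1), proportion_heads)
```

Dropping one observation from each end of 16 values corresponds to the 6.25% trimmed mean above, so `trimmed_mean(heights, 1)` takes a count `k` rather than a percentage to avoid rounding questions when p% of n is not a whole number.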