Ch1.1 Population and Samples

I. What is Statistics?

Statistics is the science of collecting and analyzing (numerical) data (taken from The Oxford American Dictionary).

Usually it involves collecting partial information (a sample) from a population and using it to make generalizations (inferences) about the population.

Ex 1. Sue wants to know the mean height of undergraduate students at NC State University. Since she doesn’t have the resources to measure every student, she chooses to measure 100 randomly selected students at the University.

Ex 2. A GE engineer wants to know the average lifetime of the company’s 13-W energy-saving light bulbs produced by a new procedure. A random sample of light bulbs is needed. Suppose data on the lifetimes of 30 such light bulbs were collected.

II. Some statistical terms:

Data: a collection of facts or observations
Variable: a characteristic of the objects (or individuals) in the population
Univariate data: data with only one variable
Bivariate data: data with exactly two variables
Multivariate data: data with more than two variables
Population: the collection of objects (or individuals) about which we would like to make inferences
Sample: a subset of the population of interest

Ex 1. In Sue’s study,
The data is: 100 students’ heights
The variable of interest is: (students’) height
The data set is: a set containing 100 students’ heights
The population of interest is: NCSU undergraduate students
The sample is: the 100 selected NCSU students

Ex 2. In the GE study,
The data is: 30 lifetimes
The variable of interest is: the lifetime of GE’s light bulbs
The data set is: a set of 30 lifetimes
The population of interest is: GE’s 13-W energy-saving light bulbs
The sample is: the 30 light bulbs

III. Branches of Statistics

1. Producing data: sampling design, experimental design
Collect data to answer specific questions by sampling or experimentation.

2. Describing data: descriptive statistics
Deals with the presentation of the data: summarizing the data with numerical and graphical methods.

3. Making inference: inferential statistics
Uses information from a sample to draw conclusions about a population. One key aspect of inferential statistics is that there is some amount of uncertainty associated with using sample data to draw conclusions about a population.

Ex 1. (Sue’s example)
1. Sue can follow a random sampling scheme to select the 100 students. Such a scheme is designed to make the selected students representative of NCSU students; it does not guarantee this for any single sample.
2. Sue can use methods from descriptive statistics to summarize the information on the 100 students (i.e., her sample), for example by reporting the average height of the 100 students.
3. Sue can use techniques from inferential statistics to draw conclusions about the overall population of undergraduate students at NCSU based on the information obtained from her sample. Suppose that the average height of the 100 students was 65 inches. Sue may estimate that, based on her sample, the average height of all undergraduate students at NCSU is also 65 inches, with a possible error of 1.1 inches (that is, 65 ± 1.1 inches).

Ex 2. The GE engineer can do the same thing as Sue.

In this class, we’ll concentrate on descriptive statistics and inferential statistics.

Big picture of the class: (also see syllabus)
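
Returning to Sue’s example (Ex 1), the three branches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the course material: the population heights are simulated (not real NCSU data), the names population, draw_sample, and margin_of_error are hypothetical, and the margin of error uses the common large-sample 95% formula 1.96 * s / sqrt(n), which may differ from how the 1.1 in the notes was obtained.

```python
import math
import random
import statistics


def draw_sample(population, n, seed=0):
    """Simple random sample of n items, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def margin_of_error(sample, z=1.96):
    """Approximate 95% margin of error for the sample mean.

    Assumption: large-sample formula z * s / sqrt(n).
    """
    s = statistics.stdev(sample)  # sample standard deviation
    return z * s / math.sqrt(len(sample))


# Synthetic population (assumption): heights in inches of 20,000
# hypothetical students, simulated from a normal distribution.
rng = random.Random(42)
population = [rng.gauss(66.0, 4.0) for _ in range(20_000)]

sample = draw_sample(population, 100)   # 1. producing data
sample_mean = statistics.mean(sample)   # 2. descriptive statistics
moe = margin_of_error(sample)           # 3. inferential statistics

print(f"sample mean = {sample_mean:.1f} in, margin of error = {moe:.1f} in")
```

The printed line has the same form as Sue’s statement, "mean ± possible error". Changing the seed passed to draw_sample gives a different sample and a slightly different estimate, which is the uncertainty that inferential statistics tries to quantify.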