What is Statistics (Section 1.1, Page 4)

Definition: Statistics (Section 1.1, Page 4)
Statistics: The science of collecting, describing, and interpreting data.
Why Study Statistics? Statistics helps us make better decisions as businesses, governments, and individuals.

Definitions (Section 1.1, Page 4)
Population: A collection, or set, of individuals, objects, or events whose properties are to be analyzed.
Sample: A subset of the population. We want knowledge about an entire population, but studying every member is most often prohibitively expensive, so we select a representative sample from the population and study the individual items in the sample.
Descriptive Statistics: The collection, presentation, and description of the sample data.
Inferential Statistics: The technique of interpreting the values resulting from the descriptive techniques, making decisions, and drawing conclusions about the population.

Definitions (Section 1.1, Page 4)
Parameter: A numerical value summarizing all the data of a population. For example, the mean high school grade point average of all Shoreline students is 3.20. We often use Greek letters to identify parameters: μ = 3.20.
Statistic: A numerical value summarizing the sample data. For example, the mean grade point average of a sample of Shoreline students is 3.18. We would use the symbol x̄ = 3.18.
The statistic corresponds to the parameter. We usually don't know the value of the parameter, so we take a sample and estimate it with the corresponding statistic.
Sampling Variation: While the parameter of a population is considered a fixed number, the corresponding statistic will vary from sample to sample. Also, different populations give rise to more or less sampling variability. Considering the variable age, samples of 60 students from a community college would show less variability than samples of 60 residents of a Seattle neighborhood, because student ages are typically closer together than the ages of neighborhood residents.
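The idea of sampling variation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: both populations are synthetic (ages drawn uniformly from made-up ranges), and the names `college`, `neighborhood`, and `sample_means` are hypothetical. It draws many samples of 60 from each population and compares how much the sample means move around.

```python
import random
import statistics

def sample_means(population, n, repeats, rng):
    """Draw `repeats` simple random samples of size n; return each sample's mean."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Synthetic populations (assumed for illustration, not real data): ages in years.
college = [rng.randint(18, 30) for _ in range(5000)]
neighborhood = [rng.randint(0, 90) for _ in range(5000)]

# Parameters: fixed numbers computed from the entire population.
mu_college = statistics.mean(college)
mu_neighborhood = statistics.mean(neighborhood)

# Statistics: one sample mean per sample of 60; they differ from sample to sample.
college_means = sample_means(college, 60, 200, rng)
neighborhood_means = sample_means(neighborhood, 60, 200, rng)

# The standard deviation of the sample means measures sampling variability.
spread_college = statistics.stdev(college_means)
spread_neighborhood = statistics.stdev(neighborhood_means)

print(f"college:      mu = {mu_college:.2f}, spread of sample means = {spread_college:.2f}")
print(f"neighborhood: mu = {mu_neighborhood:.2f}, spread of sample means = {spread_neighborhood:.2f}")
```

Each population mean stays fixed while the 200 sample means differ from one another, and under these assumptions the spread is noticeably smaller for the population whose ages are closer together.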
Problems (Objective 1.1, Page 18)

Variables (Section 1.1, Page 8)
Variable: A characteristic of interest about each element of a population.
Data: The set of values collected for the variable from each of the elements that belong to the sample.
Numerical or Quantitative Variable: A variable that quantifies an element of the population. The high school grade point average of a student is a numerical variable. Numerical variables are numbers for which math operations make sense. The average grade point of a sample makes sense.
Continuous Numerical Variable: The variable can take on an uncountable number of values between two points on the number line. An example is the weight of people.
Discrete Numerical Variable: The variable can take on a countable number of values between two points on the number line. An example is the price of statistics textbooks.

Variables (2) (Section 1.1, Page 8)
Categorical or Qualitative Variable: A variable that describes or categorizes an element of a population. The gender of a person would be a categorical variable. The categories are male and female.
Nominal Categorical Variable: A categorical variable that describes or names an element of a population, with no natural ordering among the categories. The label may even be a number. An example is a telephone area code. It is a number, but not a numerical variable used in math operations. The average area code does not make sense.
Ordinal Categorical Variable: A categorical variable that incorporates an ordered position or ranking. An example would be a survey response that ranks “very satisfied” ahead of “satisfied” ahead of “somewhat satisfied.” Limited math operations may be done with ordinal variables.

Problems (Objective 1.1, Page 18)
Identify each of the following examples of variables as categorical or numerical. If categorical, indicate the categories. If numerical, indicate discrete or continuous.
Problems (Objective 1.1, Page 19)

Data Collection (Section 1.2, Page 11)

Data Collection Process (Section 1.2, Page 12)

Observational Studies and Experiments (Section 1.3, Page 12)
Observational Study: Researchers collect data without modifying the environment or controlling the process being observed. Surveys and polls are observational studies. Observational studies cannot establish causality.
Example: For a randomly selected high school, researchers record each student's grade point average and whether the student has music training, to see if there is a relationship between the two variables.
Experiments: Researchers collect data in a controlled environment. The investigator controls or modifies the environment and observes the effect on the variable under study. Experiments can establish causality.
Example: Randomly divide a sample of people with migraine headaches into a control group and a treatment group. Give the treatment group an experimental medication and the control group a placebo, and then measure and compare the reduction in frequency and severity of headaches for both groups.

Sampling Frame (Section 1.3, Page 13)
Sampling Frame: A list, or set, of the elements belonging to the population from which the sample will be drawn. Ideally, the sampling frame is equal to the population.
Example: For a 1936 presidential election poll, Literary Digest sent out 10 million “straw ballots” prior to the election and got back 2.4 million.

                      Straw Ballots   Actual Results
  Franklin Roosevelt       43%             62%
  Alf Landon               57%             37%

The sampling frame used was telephone records. What could have gone so wrong to misjudge the final result?

Sample Designs (Section 1.3, Page 13)
Convenience Samples, Volunteer Samples
Judgmental samples (chosen for some specific reason), voluntary samples (respondents select themselves), and convenience samples (chosen because convenient) are usually not acceptable methods for formal statistical procedures.
Probability Samples: The elements are drawn on the basis of probability, that is, randomly. Each element of the population has a certain probability of being selected.

Single-Stage Sampling Methods (Section 1.3, Page 13)
Single-Stage Sampling: A sample design in which the elements of the sampling frame are treated equally and there is no subdividing or partitioning of the frame.
Simple Random Sample: A sample selected in such a way that every element of the population has an equal probability of being selected and all samples of size n have an equal probability of being selected.
Example: Select a simple random sample of 6 students from a class of 30.
1. Number the students from 1 to 30 on the roster.
2. Get 6 distinct (non-repeating) random numbers between 1 and 30.
3. The six students who match the six random numbers are the sample.

Single-Stage Sampling Methods (Section 1.1, Page 8)
Systematic Sample: A sample in which every k-th item of the sampling frame is selected, starting from an item chosen at random from the first k elements.
Example: Select a systematic sample of six students from a class of 30.
1. k = 30/6 = 5
2. Select a random number between 1 and 5. Say 3 is selected.
3. The sample will include the 3rd, 8th, 13th, 18th, 23rd, and 28th students on the roster.

Multistage Sampling Designs (Section 1.3, Page 15)
Multistage Sampling: A sample design in which the elements of the sampling frame are subdivided and the sample is chosen in more than one stage.
Stratified Random Sampling: A sample is selected by stratifying the population, or sampling frame, and then selecting a number of items from each of the strata by means of a simple random sampling technique. The strata are usually subgroups of the sampling frame that are internally homogeneous but different from one another.
Example: Select a sample of six students from a class of 30 so that the sample contains an equal number of males and females.
1. List the males and females separately.
2. Take a simple random sample of 3 students from each group.
3. The six students selected are the sample.

Multistage Sampling Designs (Section 1.3, Page 16)
Cluster Sample: A sample obtained by stratifying the population, or sampling frame, and then selecting some or all of the items from some, but not all, of the strata. The strata are usually easily identified subgroups of the sampling frame that are similar to each other. This is often the most economical way to sample a large population.
Example: Take a sample of 300 Catholics in the Seattle area.
1. Get a list of the Catholic parishes in the Seattle area.
2. Take a random sample of 3 parishes.
3. In each parish, select a simple random sample of 100 parishioners.

Problems (Section 1.3, Page 20)

Problems (Problems, Page 20)
a. What kind of study was this: experiment or observational study?
b. What sampling method was used?
c. Can these results be used for statistical inference? Why or why not?

Probability vs. Statistics (Section 1.4, Page 16)
Probability: If a chip is drawn at random from a bag whose contents are known (20 green chips out of 60), the probability that it will be green is 20/60 = 1/3.
Statistics: A sample of 10 chips is drawn from the bag. There were 3 green chips. We are 95% sure that the true proportion of green chips is between 0.25 and 0.35.

Problems (Problems, Page 21)

Karl Pearson, Father of Modern Statistics
“In the 20th century, the role of mathematics has become increasingly decisive, and studies of these new statistical tools and practices are gradually being written, episode by episode, discipline by discipline. In the end, a picture will emerge of a powerful body of mathematics, allied to schemes for data gathering and designing experiments, that has become one of the most important sources of scientific expertise and guarantors of objectivity in the modern world.
It is the narrow gate through which must pass new pharmaceuticals, manufacturing processes, official measures of all descriptions, and empirical findings of psychologists, economists, biologists and many others. In that sense, its import goes far beyond the history of a mathematical discipline. Statistics has functioned as no narrow specialty, but as a vital if often invisible element of the cultural history of government, business, and the professions, as well as science.”
“Karl Pearson: The Scientific Life in a Statistical Age” by Theodore Porter, 2004, page 4.
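The four probability sampling designs from Section 1.3 (simple random, systematic, stratified, and cluster) can be sketched in a few lines of code. This is a minimal illustration, not a reference implementation: the function names, the assumption that students 1 to 15 are male and 16 to 30 are female, and the ten parishes of 500 members each are all hypothetical and chosen only to mirror the slide examples.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every subset of size n is equally likely; no element is picked twice."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Random start among the first k elements, then every k-th element after it."""
    k = len(frame) // n
    start = rng.randrange(k)  # 0-based position of the first pick
    return frame[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of a fixed size from every stratum."""
    picked = []
    for members in strata.values():
        picked.extend(rng.sample(members, n_per_stratum))
    return picked

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly choose some clusters, then sample only inside the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    picked = []
    for name in chosen:
        picked.extend(rng.sample(clusters[name], n_per_cluster))
    return picked

rng = random.Random(2024)  # fixed seed so the run is repeatable

roster = list(range(1, 31))  # students numbered 1..30 on the roster
# Hypothetical split, for illustration only.
strata = {"male": roster[:15], "female": roster[15:]}
# Hypothetical sampling frame: 10 parishes with 500 parishioners each.
parishes = {f"parish_{i}": [f"p{i}_{j}" for j in range(500)] for i in range(1, 11)}

srs = simple_random_sample(roster, 6, rng)
systematic = systematic_sample(roster, 6, rng)
stratified = stratified_sample(strata, 3, rng)
cluster = cluster_sample(parishes, 3, 100, rng)

print("simple random:", sorted(srs))
print("systematic:   ", systematic)
print("stratified:   ", stratified)
print("cluster size: ", len(cluster))
```

With a roster of 30 and n = 6, the systematic sample always steps by k = 5, so a start at the 3rd student yields the 3rd, 8th, 13th, 18th, 23rd, and 28th students, as in the slide example.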