CHAPTER 1 STATISTICS

Statistics is a way of reasoning, along with a collection of tools and methods, designed to help us understand the world.

READ THE BOOK
  Think, Show, Tell
  For Example
  Step-by-Step
  What can go wrong?
  What have we learned?

CHAPTER 2 DATA

Data: information together with its context. Data values can be:
  Numerical
  Names
  Labels

Five W’s (and How)
  Who, What, When, Where, Why, How

WHO
  Respondents: individuals who answer a survey.
  Subjects or participants: people on whom we experiment (experimental units).
  Records or cases: rows in a database or data table.
  In every case, these are the individuals about whom, or about which, we have the data.

WHAT
  Variables: characteristics recorded about each individual. These are usually columns in a data table, and they should have a name that identifies what has been measured.
  Types of variables:
    Categorical (or qualitative)
    Quantitative (numerical values with measurement units)
    Ordinal

...more W’s
  Where and When? Country? Year?
  How? How were the data collected?
  Why? Reason for the study.

Exercise: Investments
According to an article in Fortune (Dec. 28, 1992), 401(k) plans permit employees to shift part of their before-tax salaries into investments such as mutual funds. Employers typically match 50% of the employees’ contribution up to about 6% of salary. One company, concerned with what it believed was a low employee participation rate in its 401(k) plan, sampled 30 other companies with similar plans and asked for their 401(k) participation rates.

Identify the W’s
  Who?    30 companies
  What?   Participation rates. Quantitative (units: percent)
  When?   Sometime after 1992
  Where?  USA
  Why?    The company was concerned with its participation rate compared with other companies
  How?    Companies were sampled using an unspecified method

Exercise: Flowers
In a study appearing in the journal Science, a research team reports that plants in southern England are flowering earlier in the spring.
Records of the first flowering dates for 385 species over a period of 47 years indicate that flowering has advanced an average of 15 days per decade, an indication of climate warming according to the authors.

Identify the W’s
  Who?    385 species of flowers, recorded over 47 years
  What?   First flowering date. Quantitative (units: days)
  When?   Not specified
  Where?  Southern England
  Why?    Researchers associate this behavior with climate warming
  How?    Observation (method not specified)

CHAPTER 3 DISPLAYING AND DESCRIBING CATEGORICAL DATA

Make a picture
  First, make piles: organize the counts by categories in a frequency table (counts) or a relative frequency table (percentages).
  Both types of tables describe the distribution of the categorical variable because they name the possible categories and tell how frequently each occurs.

The Area Principle
  The area occupied by a part of the graph should correspond to the magnitude of the value it represents.

Bar Charts
  A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.
  [Figure: bar chart of frequency (axis from 0 to 1000) by class: First, Second, Third, Crew]

Pie Charts
  Pie charts show relative proportions (percentages instead of counts). A pie chart shows the whole group of cases as a circle; each piece has a size proportional to the fraction of the whole in its category.
  [Figure: pie chart of class: First 15%, Second 13%, Third 32%, Crew 40%]

Contingency Tables
  A contingency table displays two categorical variables at once, here Class and Survival.

                           Class
  Survival    First   Second   Third   Crew   Total
  Alive         202      118     178    212     710
  Dead          123      167     528    673    1491
  Total         325      285     706    885    2201

Marginal and Conditional Distributions
  Marginal distribution: the distribution of either variable alone (found at the margin of the table).
  Conditional distribution: the distribution of one variable for only those individuals satisfying some condition on another variable.
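As a small sketch of the two definitions above, the marginal and conditional distributions can be computed directly from the class-by-survival counts. The counts are copied from the contingency table in this chapter; the variable names (counts, marginal_class, conditional_survival, and so on) are chosen only for this illustration.

```python
# Counts copied from the class-by-survival contingency table in this chapter.
counts = {
    "First":  {"Alive": 202, "Dead": 123},
    "Second": {"Alive": 118, "Dead": 167},
    "Third":  {"Alive": 178, "Dead": 528},
    "Crew":   {"Alive": 212, "Dead": 673},
}

# Total number of individuals in the table.
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of Class: each class total as a share of everyone.
marginal_class = {
    cls: sum(row.values()) / grand_total for cls, row in counts.items()
}

# Marginal distribution of Survival: each outcome total as a share of everyone.
marginal_survival = {
    outcome: sum(row[outcome] for row in counts.values()) / grand_total
    for outcome in ("Alive", "Dead")
}

# Conditional distribution of Survival given Class:
# within each class, the share of that class in each outcome.
conditional_survival = {
    cls: {outcome: n / sum(row.values()) for outcome, n in row.items()}
    for cls, row in counts.items()
}

for cls in counts:
    print(
        f"{cls:<6} share of all: {marginal_class[cls]:6.1%}   "
        f"alive within class: {conditional_survival[cls]['Alive']:6.1%}"
    )
```

The marginal distribution of Class, rounded to whole percentages, is what a pie chart of class would display, while each conditional distribution sums to 100% within its own class.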
Note: If the distribution of one variable is the same for all categories of another, we say that the variables are independent.

Exercises
  Step-by-Step: page 31

What can go wrong
  Check the charts on page 34
  Simpson’s Paradox (page 35)
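The note on independence in this chapter can be checked against the class-by-survival table. The sketch below compares the conditional survival rate in each class and also computes the counts one would expect in the Alive row if survival were independent of class. It is an informal comparison, not a formal test, and the names survival_rate, rates, spread, and expected_alive are introduced only for this illustration.

```python
# Counts copied from the class-by-survival contingency table in this chapter.
counts = {
    "First":  {"Alive": 202, "Dead": 123},
    "Second": {"Alive": 118, "Dead": 167},
    "Third":  {"Alive": 178, "Dead": 528},
    "Crew":   {"Alive": 212, "Dead": 673},
}


def survival_rate(row):
    """Conditional proportion alive within one class."""
    return row["Alive"] / (row["Alive"] + row["Dead"])


# Conditional distribution of survival (alive share) for each class.
rates = {cls: survival_rate(row) for cls, row in counts.items()}

# If the variables were independent, these rates would all be (nearly) equal,
# so the gap between the largest and smallest rate would be close to zero.
spread = max(rates.values()) - min(rates.values())

grand_total = sum(row["Alive"] + row["Dead"] for row in counts.values())
alive_total = sum(row["Alive"] for row in counts.values())

# Alive counts expected under independence: class total times overall alive share.
expected_alive = {
    cls: (row["Alive"] + row["Dead"]) * alive_total / grand_total
    for cls, row in counts.items()
}

for cls in counts:
    print(
        f"{cls:<6} alive rate {rates[cls]:6.1%}   "
        f"observed {counts[cls]['Alive']:4d}   "
        f"expected if independent {expected_alive[cls]:6.1f}"
    )
print(f"Spread between highest and lowest alive rate: {spread:.1%}")
```

With these counts the alive rate ranges from about 24% for the crew to about 62% for first class, so the conditional distributions differ and class and survival do not look independent.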