1.2 KEY TERMS
• Statistics: The collection, analysis, interpretation and presentation of data
• Population: A collection of persons, things or objects under study (ex. All students at FBHS)
• Sample: A portion or subset of the larger population to collect data from and study (ex. This class as a subset of students at FBHS)
• Statistic: A number that represents a property of the sample (ex. If this class is a sample of the school, the average GPA of this class)
• Parameter: A number that is a property of the population (ex. The average GPA of all FBHS students)
• Variable: (X or Y) A characteristic of interest for each person or thing in a population
• Numerical variables: Take on values with number measures (ex. Weight in lbs., time in min.)
• Categorical variables: Place the person or thing in a category (ex. If X is favorite color, values could be purple, blue, etc.)
• Data: The values of the variable for each individual

IDENTIFY THE POPULATION, SAMPLE, PARAMETER, STATISTIC, VARIABLE AND DATA:
To determine the average time FBHS students take to get ready in the morning, we ask each student in one English class how long they take.
Population = all FBHS students
Sample = students in the English class
Parameter = average time for all FBHS students
Statistic = average time for the English class
Variable = X = time it takes one student
Data = ex. 20 min., 1 hour, 35 min.

IDENTIFY THE POPULATION, SAMPLE, PARAMETER, STATISTIC, VARIABLE AND DATA:
A phone survey is conducted to determine what proportion of eligible voters have already decided for whom they will vote for president.
Population = all eligible voters
Sample = voters called
Parameter = proportion of all eligible voters who have already decided
Statistic = proportion of called voters who have already decided
Variable = Y = whether one voter has already decided
Data = decided, undecided

PRACTICE: IDENTIFY THE POPULATION, SAMPLE, PARAMETER, STATISTIC, VARIABLE AND DATA:
To determine how long, on average, U.S.
high school students sleep each night, all students from 10 randomly selected schools are asked.
Population = U.S. high school students
Sample = students in the 10 chosen schools
Parameter = average time sleeping of U.S. high school students
Statistic = average time sleeping of students in the 10 chosen schools
Variable = X = time one student sleeps
Data = hours, ex. 7 hours, 5 hours, 9 hours

1.3 KEY TERMS
• Qualitative data: Categorize or describe a population (ex. Hair color, favorite music, blood type)
• Quantitative data: Always numbers (ex. Height, number of people with a certain trait)
• Quantitative discrete: Data that can take on only certain numerical values (ex. Number of days per week someone exercises, number of magazines a person reads)
• Quantitative continuous: Data that result from measuring (ex. Time spent in line, weight of fruit)

WORK COLLABORATIVELY TO DETERMINE THE CORRECT DATA TYPE (QUANTITATIVE OR QUALITATIVE). INDICATE WHETHER QUANTITATIVE DATA ARE CONTINUOUS OR DISCRETE.
HINT: DATA THAT ARE DISCRETE OFTEN START WITH THE WORDS "THE NUMBER OF."
a. the number of pairs of shoes you own
b. the type of car you drive
c. where you go on vacation
d. the distance it is from your home to the nearest grocery store
e. the number of classes you take per school year
f. the tuition for your classes
g. the type of calculator you use
h. movie ratings
i. political party preferences
j. weights of sumo wrestlers
k. amount of money (in dollars) won playing poker

FIGURE 1.3 What type of data does this pie chart display?
FIGURE 1.4 What type of data does the graph display?
FIGURE 1.5 Pie chart to display qualitative data.
FIGURE 1.6 This bar graph displays the same data as the previous chart.
FIGURE 1.7 What do you notice about this bar graph?
FIGURE 1.8 What do you notice about this bar graph?
FIGURE 1.9 Bar graph with Other/Unknown Category
FIGURE 1.10 Pareto Chart With Bars Sorted by Size
FIGURE 1.11 Organization matters!

1.2 KEY TERMS (CONT.)
• Random sampling: Each member of a population has an equal chance of being selected for the sample. The goal is that the sample has the same characteristics as the population
• Simple random sample: Choose n objects at random (ex. Put numbers in a hat, use a random number generator)
• Stratified sample: Divide the population into groups (strata) and choose an equal proportion from each stratum
• Cluster sample: Divide the population into groups (clusters) and randomly select some of the clusters
• Systematic sample: Randomly select a starting point and take every nth piece of data from a listing of the population
• Convenience sample: Using results that are readily available
• Sampling bias: When a sample is collected in such a way that some members of the population are not as likely to be chosen as others

FIGURE 1.12

VARIATION HAPPENS!
• Samples WILL differ from one another, even when they are well chosen. We try to choose representative samples to get as close as we can to the parameter, but any statistic WILL have variability
• There are some critical errors to look for:
• Problems with samples: A non-representative sample
• Self-selected samples: Responses only from people who choose to respond
• Sample size issues: Samples that are too small may be unreliable; larger samples are generally better
• Undue influence: Collecting data or asking questions in a way that influences the response
• Non-response or refusal of subject to participate: The collected responses may no longer be representative of the population
• Causality: A relationship between two variables does not mean that one causes the other to occur
• Self-funded or self-interest studies: A study performed by a person or organization in order to support their claim
• Misleading use of data: Improperly displayed graphs, incomplete data, or lack of context
• Confounding: When the effects of multiple factors on a response cannot be separated

FIGURE 1.14 As the graphs show, Acme consistently outperforms the Other Guys!
Examining this statement, as a statistician, what do you notice?
FIGURE 1.15 What critical errors do you notice with this graph?

1.4 KEY TERMS
• Frequency: The number of times a value of the data occurs
• Relative frequency: The ratio (fraction or proportion) of the number of times a value of the data occurs to the total number of outcomes
• Cumulative relative frequency: The sum of the relative frequencies of all values up to and including the current value

CREATE A FREQUENCY TABLE FOR HOW LONG EVERYONE SLEPT LAST NIGHT
Hours   Tally   Frequency   Relative Frequency   Cumulative Relative Frequency
3
4
5
6
7
8
9
• What percent of people slept 8 hours?
• What percent of people slept up to 7 hours?
• What percent of people slept at least 6 hours?

FIGURE 1.13 Which graph represents cumulative relative frequency?
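
The three definitions in 1.4 can be checked with a short Python sketch. It is only an illustration: the function name frequency_table and the list hours_slept are hypothetical, and the 20 responses are invented rather than collected from any class. Frequencies are counted first, then each count is divided by the total to get the relative frequency, and the running count divided by the total gives the cumulative relative frequency.

```python
from collections import Counter


def frequency_table(values):
    """Return one row per distinct value:
    (value, frequency, relative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    running_count = 0  # number of observations at or below the current value
    for value in sorted(counts):
        freq = counts[value]
        running_count += freq
        rows.append((value, freq, freq / total, running_count / total))
    return rows


# Hypothetical responses (hours slept last night), invented for illustration.
hours_slept = [3, 5, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 6, 5, 7, 4]

rows = frequency_table(hours_slept)
print("Hours  Freq  RelFreq  CumRelFreq")
for value, freq, rel, cum in rows:
    print(f"{value:>5}  {freq:>4}  {rel:>7.2f}  {cum:>10.2f}")

total = len(hours_slept)
# Exactly 8 hours: the relative frequency of the value 8.
exactly_8 = sum(1 for h in hours_slept if h == 8) / total
# Up to 7 hours: the cumulative relative frequency at the value 7.
up_to_7 = sum(1 for h in hours_slept if h <= 7) / total
# At least 6 hours: 1 minus the cumulative relative frequency at the value 5.
at_least_6 = sum(1 for h in hours_slept if h >= 6) / total

print(f"Slept 8 hours: {exactly_8:.0%}")
print(f"Slept up to 7 hours: {up_to_7:.0%}")
print(f"Slept at least 6 hours: {at_least_6:.0%}")
```

With these made-up responses the three questions come out to 30%, 60% and 80%. The last row of any such table should show a cumulative relative frequency of 1.00, which is a quick check that no response was dropped.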