STA 2023 Elementary Statistics
Lecture Notes
Chapter 1 - Introduction to Statistics
Professor Achenbach

Introduction

Statistics - the science of collecting, organizing, analyzing, and interpreting data.

Chapter 1: Collecting Data
Chapter 2: Organizing/Analyzing Data
Chapters 3-8: Interpreting Data

Types of Data Sets

Population - data set consisting of all outcomes, measurements, or responses of interest
Sample - data set which is a subset of the population data set

Examples:

If we are interested in measuring the salaries of American high-school teachers, the population data set would be a list of the salaries of every high-school teacher in America. A sample data set could be obtained by selecting 100 high-school teachers from across the country and listing their salaries.

A polling organization wants to know whether Americans favor increased defense spending. The population data set would consist of the responses of every American. A common way of choosing a sample data set would be to randomly call 1000 Americans and gather their responses to the question of whether they favor increased defense spending.

A biologist wants to measure the weights of female Alaskan grizzly bears. What would be the population data set? A possible sample set?

Types of Measurements

Parameter - a numerical measurement made using the population data set
Statistic - a numerical measurement made using a sample data set

Examples:

Using the teacher salary data sets, we could calculate the average salary for the high-school teachers. The average calculated from the population data set would be the parameter. The average calculated from the sample of 100 teachers would be a statistic.

Using the opinion poll data on defense spending, we could calculate the percentage of Americans who favor increased defense spending. The actual percentage of all Americans who favor increased defense spending would be the parameter. The percentage of the 1000 Americans in our sample who favor increased spending would be a statistic.

Notice that unless the population is very small, it is usually impractical or impossible to gather the population data set, and so it is usually impossible to calculate the parameter we are interested in. The main idea of the science of statistics is that we can get around this difficulty by selecting a sample, calculating the sample statistic, and using the sample statistic to estimate the parameter. Unfortunately, statistical estimates can never be 100% certain. (But they can be 90%, 95%, or 99% certain.)

Types of Data

Qualitative Data - non-numerical characteristics or labels
Examples: Eye Color, First Name, Favorite Movie, Political Party

Quantitative Data - numerical measurements or quantities
Examples: Height, Weight, Income, Resting Pulse Rate, Blood Alcohol Level

Levels of Measurement

Nominal Data - Can be qualitative only. Data values serve as labels, but the labels have no meaningful order.
Examples: Blood Type, College Major, Breed of Dog, Shape of Bacteria in a Petri Dish

Ordinal Data - Can be qualitative or quantitative. Data values serve as labels, and the labels have a natural meaningful order. Differences between values, however, are meaningless.
Examples: Statistics Grade, NCAA Basketball Rankings, Terror Threat Level

Interval Data - Are always quantitative. Data values are numerical, so they have a natural meaningful order, and differences between data values are meaningful. The ratio of two data values, however, is meaningless. This occurs when zero is an arbitrary point on the scale rather than actually indicating "nothing".
Examples: Temperature (in degrees Fahrenheit or Celsius), Year of Birth

Ratio Data - Are always quantitative. Data values are numerical, have order, and both differences and ratios between values are meaningful. A measurement of zero indicates absence of the quantity being measured.
Examples: Weight, Height, Volume, Number of Children

Methods of Data Collection

Census - Collect measurements from the entire population. Used when the population is small.
Examples:
Determine average grade on a Statistics exam
Measure salaries of all 50 state governors

Sampling - Choose a sample from your population and collect measurements from the sample. Used when the population is large. (Most Common)
Examples:
Opinion Polls
Determine average income in the U.S.

Simulation - Program a computer with a mathematical or physical model to simulate population data. Used when it is impossible to collect sample data.
Examples:
Temperature at the core of the Sun
Monte Carlo Simulations

Experiment - Collect a sample and split the sample into two groups: the Case Group receives treatment; the Control Group does not. Used to measure the effect of treatment by comparing the characteristics of the case and control groups.
Example:
A sample of 200 cancer patients is selected. An experimental drug is given to 100 patients and the remaining 100 patients receive a placebo. The survival rates of the two groups are then compared.

Additional Terms:
Placebo, Placebo Effect
Single Blind Experiment
Double Blind Experiment

Methods of Sampling

Random Sampling - The sample is chosen as a result of chance occurrences.
Examples:
Telephone polling of randomly chosen telephone numbers
Drawing names out of a hat

Systematic Sampling - The population is placed on a list, a random starting point is chosen, and then every k-th member is selected.
Examples:
Choosing a sample of registered voters by choosing every 25th voter from the county registration roll
Testing every 300th product from the assembly line

Stratified Sampling - The population is divided into groups (strata), usually with meaningful differences, and a sample is chosen from each group.
Examples:
Choosing 200 men and 200 women for a sample
Stratify the population by income level and then choose a sample of low, middle, and high income individuals

Cluster Sampling - The population is divided into groups in a more or less random way, and then a sample is chosen by randomly selecting entire groups.
Example:
Randomly choose 10 polling stations in a city and exit poll all voters at those stations

Convenience Sampling - Choose individuals for a sample because they are easy to include.
Examples:
Internet Polls
Mail-In Customer Survey
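
The sampling methods above, together with the parameter/statistic distinction from earlier in the chapter, can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration and is not part of the notes: the population is a synthetic list of teacher salaries, the region and school labels are hypothetical, and the function names are invented for this example. Convenience sampling is left out because it involves no chance mechanism.

```python
import random
import statistics


def simple_random_sample(population, n, rng):
    """Random sampling: every member has the same chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Systematic sampling: random starting point, then every k-th member."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified sampling: draw a random sample from each stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Cluster sampling: randomly pick whole groups and keep all their members."""
    clusters = {}
    for member in population:
        clusters.setdefault(cluster_of(member), []).append(member)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for c in chosen for member in clusters[c]]


def mean_salary(teachers):
    return statistics.mean([t["salary"] for t in teachers])


# Synthetic population (an assumption): 5,000 teachers with made-up salaries.
rng = random.Random(2023)
REGIONS = ["Northeast", "South", "Midwest", "West"]  # hypothetical strata
population = [
    {
        "salary": round(rng.gauss(60_000, 10_000)),
        "region": rng.choice(REGIONS),
        "school": rng.randrange(200),  # hypothetical cluster label
    }
    for _ in range(5_000)
]

# The parameter is computed from the whole population data set.
print("Parameter (population mean):", round(mean_salary(population)))

# Each statistic is computed from one sample data set.
samples = {
    "random": simple_random_sample(population, 100, rng),
    "systematic": systematic_sample(population, 50, rng),
    "stratified": stratified_sample(population, lambda t: t["region"], 25, rng),
    "cluster": cluster_sample(population, lambda t: t["school"], 4, rng),
}
for name, sample in samples.items():
    print(f"Statistic ({name}, n={len(sample)}):", round(mean_salary(sample)))
```

Each printed statistic should land near the parameter without matching it exactly, which is the sense in which a sample statistic estimates a parameter. Changing the seed passed to random.Random changes the samples, and therefore the statistics, while the parameter for a given population stays fixed.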