5/19/22
Sampling & Data
Introductory Statistics
Ellen Smyth

Data
Data is information we gather through experiments and surveys (sampling).
1. Experiment on a coronavirus vaccine
• Getting the virus or not
• Symptoms with the vaccine vs. a placebo
• Severity of the virus
2. Survey (observational study) on the effectiveness of a TV ad
• Data: percentage of people who went to Starbucks since the ad aired

Statistics
Statistics involves:
• Designing studies
• Analyzing the resulting data
• Translating data into knowledge and understanding

Sample Statistics & Population Parameters
• Parameter: a numerical summary of a population
• Statistic: a numerical summary of a sample

Population and Samples
• Population: all subjects of interest
• Sample: the subset of the population for whom we have data
• We often want answers about a large group but cannot measure everyone, so a subset is chosen.
• We use statistical techniques to draw conclusions about the population from the sample.

Randomness
Each member of the entire population has an equal chance of being chosen.

Observational Study
We merely observe values of the response and explanatory variables without doing anything to control the subjects (for example, a survey, a census, or simply tracking their data anonymously).

Simple Random Sampling (SRS)
Every possible sample of a given size n has an equal chance of being selected.
• To get a truly simple random sample, we must either:
1. Put names in a hat, or
2. Assign each subject a number and use a random number generator to choose subjects.

Cluster Random Sample
1. Divide the population into a large number of clusters, such as city blocks.
2. Select a simple random sample of clusters.
3. Use all subjects in the selected clusters as the sample.
Advantages
• Works when a sampling frame (a list of all subjects) is unavailable
• Lower cost
Disadvantage
• Needs a larger sample size for the same reliability

Stratified Random Sample
1. Divide the population into groups called strata.
2. Select an SRS from each stratum.
3. Combine the samples from all strata into the total sample.

Systematic Random Sample
Process:
• Use a random number generator to choose the 1st subject.
• Choose every kth person after that, where k = population size / sample size.

Convenience Samples: Poor Ways to Sample
Convenience sample: a sample that is easy to get
• Unlikely to represent the population
• Often has severe biases
• Results apply only to the observed subjects

Elements of an Experiment
• Experimental unit: a single subject or individual to be measured
• Explanatory variable: the variable that explains by coming first and possibly causing change in the other variable
• Response variable: the affected or responding variable
• Treatments: the set of values of the explanatory variable used in the experiment (each value is one treatment)

Experiment
An experiment imposes certain conditions to control the subjects and then observes the outcomes.

Control Group
The group in a randomized experiment that receives an inactive treatment (placebo) but is otherwise managed exactly like the other groups.

Subjects
Subjects are the entities that we measure in a study. Subjects could be:
1. Individuals
2. Schools
3. Rats
4. Counties
5. Widgets

Variable
A variable is any characteristic that is recorded for the subjects in a study.
Examples: marital status, height, weight, IQ
A variable can be classified as either:
• Categorical, or
• Quantitative (discrete or continuous)

Sampling Error
Sampling error is the natural variation that results from selecting a sample to represent a larger population.

Categorical Variable
A variable is categorical if each observation belongs to one of a set of categories.
Examples:
1. Gender (Male or Female)
2. Religion (Catholic, Jewish, …)
3. Type of residence (Apt, Condo, …)
4. Belief in life after death (Yes or No)

Quantitative Variable
A variable is called quantitative if its observations take numerical values that represent different magnitudes of the variable.
Examples:
1. Age
2. Number of siblings
3. Annual income

Discrete Quantitative Variable
A quantitative variable is discrete if its possible values form a set of separate numbers, such as 0, 1, 2, 3, …
Examples:
1. Number of pets in a household
2. Number of children in a family
3. Number of foreign languages spoken by an individual

Continuous Quantitative Variable
A quantitative variable is continuous if its possible values form an interval. Continuous variables are typically measurements.
Examples:
1. Height
2. Weight
3. Age
4. Blood pressure

Quantitative vs. Categorical
• For quantitative variables, the key features are the center and the spread (variability).
• For categorical variables, a key feature is the percentage of observations in each category.

Proportion & Percentage (Rel. Freq.)
Proportions and percentages are also called relative frequencies.

Count all but leading zeros.

Outlier
An outlier falls far from the rest of the data.

When rounding, look to the next digit:
• Round up if it is 5 or more.
• Round down if it is 4 or less.

As you work on:
• Discussions
• Homework
• Projects
• Quizzes
Tools:
• Formula card
• Calculator
• Lecture notes
• Textbook & Knewton instruction
Message me.

pollev.com/smyth

• The sample
• The population
• A very specific example of a parameter
• Exactly how you would acquire a simple random sample, including details of how students would be selected, assuming you had unlimited resources
• A very specific example of a statistic
• Whether the previous simple random sample would give us a truly representative sample of APSU students
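The simple random sampling method from the slides (assign each subject a number, then let a random number generator choose) can be sketched in Python as below. This is a minimal illustration under assumptions, not the course's prescribed tool: the function name `simple_random_sample` and the student roster are hypothetical. `random.Random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return a simple random sample of size n from population.

    Each subject is assigned a number (its list index); a random number
    generator then picks n distinct numbers, so every possible sample of
    size n is equally likely.
    """
    if not 0 <= n <= len(population):
        raise ValueError("n must be between 0 and the population size")
    rng = random.Random(seed)  # the seed only makes the example repeatable
    chosen = rng.sample(range(len(population)), n)  # n distinct subject numbers
    return [population[i] for i in chosen]

# Hypothetical roster of 30 students, for illustration only.
roster = [f"student_{i:02d}" for i in range(1, 31)]
print(simple_random_sample(roster, 5, seed=2022))
```

Drawing subject numbers rather than names mirrors the "assign each subject a number" step; drawing names from a hat would be the physical equivalent.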
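The systematic random sample described in the slides (random first subject, then every kth person, with k = population size / sample size) might look like the following sketch. It assumes that k is rounded down when the population size is not a multiple of the sample size, and that the random start is drawn from the first k subjects; `systematic_sample` and the numbered subjects are hypothetical.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Random first subject, then every k-th subject after it.

    k = population size // sample size (rounded down when the sizes do
    not divide evenly, which is an assumption of this sketch).
    """
    N = len(population)
    if not 0 < sample_size <= N:
        raise ValueError("sample_size must be between 1 and the population size")
    k = N // sample_size                       # sampling interval
    start = random.Random(seed).randrange(k)   # random first subject among the first k
    # start + (sample_size - 1) * k <= N - 1, so every index is valid.
    return [population[start + i * k] for i in range(sample_size)]

# Hypothetical population of 500 numbered subjects; a sample of 25 gives k = 20.
subjects = list(range(1, 501))
print(systematic_sample(subjects, 25, seed=7))
```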
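The three stratified sampling steps (divide into strata, take an SRS from each stratum, combine) can be sketched as below. The sketch assumes the same number of subjects is drawn from every stratum; real studies often use proportional allocation instead. The class-standing strata and the name `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Select an SRS of n_per_stratum from each stratum, then combine them."""
    rng = random.Random(seed)
    total_sample = []
    for name, members in strata.items():
        if n_per_stratum > len(members):
            raise ValueError(f"stratum {name!r} has fewer than {n_per_stratum} members")
        total_sample.extend(rng.sample(members, n_per_stratum))  # SRS within the stratum
    return total_sample  # the combined samples form the total sample

# Hypothetical strata (class standing), for illustration only.
strata = {
    "freshman": [f"fr_{i}" for i in range(120)],
    "sophomore": [f"so_{i}" for i in range(90)],
    "junior": [f"jr_{i}" for i in range(80)],
    "senior": [f"sr_{i}" for i in range(70)],
}
print(stratified_sample(strata, 3, seed=1))
```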
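A cluster random sample differs from a stratified one: the SRS is taken of whole clusters, and then every subject in each chosen cluster is kept. A minimal sketch, assuming clusters are given as a dictionary from a cluster label to its members (the city blocks, residents, and `cluster_sample` are hypothetical):

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """SRS of clusters, then every subject in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster labels
    sample = [subject for label in chosen for subject in clusters[label]]
    return chosen, sample

# Hypothetical city blocks and their residents, for illustration only.
blocks = {
    "block_A": ["Ana", "Ben"],
    "block_B": ["Cal", "Dee", "Eve"],
    "block_C": ["Fay"],
    "block_D": ["Gus", "Hal"],
    "block_E": ["Ivy", "Jo", "Kim"],
}
chosen, residents = cluster_sample(blocks, 2, seed=3)
print(chosen, residents)
```

Only the list of clusters is needed up front, not a list of every resident, which matches the "sampling frame unavailable" advantage; because residents of the same block tend to be alike, a larger sample is usually needed for the same reliability.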
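The difference between a parameter (a summary of the population), a statistic (the same summary of a sample), and sampling error can be made concrete with a small simulation. The population of ages below is randomly generated and entirely hypothetical; in a real study the parameter is usually unknown, which is why we estimate it with a statistic.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up ages, generated only for illustration.
rng = random.Random(42)
population = [rng.randint(18, 60) for _ in range(1000)]

parameter = statistics.mean(population)  # numerical summary of the population
sample = rng.sample(population, 50)      # simple random sample, n = 50
statistic = statistics.mean(sample)      # numerical summary of the sample

# The gap between the two comes from the natural variation of sampling.
sampling_error = statistic - parameter
print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}, "
      f"sampling error = {sampling_error:.2f}")
```

Rerunning with a different seed gives a different sample, a different statistic, and a different sampling error, while the parameter stays fixed.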
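For a categorical variable, the key feature is the proportion or percentage of observations in each category (the relative frequencies). A short sketch using `collections.Counter`; the yes/no responses are hypothetical.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return {category: (count, proportion, percentage)} for categorical data."""
    counts = Counter(observations)
    total = len(observations)
    return {cat: (c, c / total, 100 * c / total) for cat, c in counts.items()}

# Hypothetical responses to a yes/no question, for illustration only.
responses = ["Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"]
for category, (count, prop, pct) in relative_frequencies(responses).items():
    print(f"{category}: count={count}, proportion={prop:.3f}, percentage={pct:.1f}%")
# Yes: count=5, proportion=0.625, percentage=62.5%
# No: count=3, proportion=0.375, percentage=37.5%
```

The proportions always add up to 1 (and the percentages to 100%), since every observation falls in exactly one category.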
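The rounding rule in the slides ("look to the next digit: 5 or more rounds up, 4 or less rounds down") is what Python's `decimal` module calls `ROUND_HALF_UP`. Python's built-in `round()` does not follow this rule: it rounds halves to the nearest even digit and works on binary floats, so it can disagree with a hand calculation. The helper name `round_half_up` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places):
    """Look at the next digit: 5 or more rounds up, 4 or less rounds down."""
    quantum = Decimal(1).scaleb(-places)  # e.g. places=2 -> Decimal('0.01')
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up("2.675", 2))  # 2.68
print(round_half_up("2.674", 2))  # 2.67
print(round(2.675, 2))            # 2.67 (built-in round differs here)
```

Passing values as strings (or converting through `str`) keeps the digits exactly as written, so the "next digit" is the one you see.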
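The slides define an outlier only as a value that "falls far from the rest of the data." One common way to make "far" precise, which is an assumption here and not a rule stated in the slides, is the 1.5 × IQR rule: flag values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. A sketch using `statistics.quantiles` (Python 3.8+); the quiz scores and `iqr_outliers` are hypothetical.

```python
import statistics

def iqr_outliers(data):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical quiz scores; one value sits far from the rest.
scores = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 100]
print(iqr_outliers(scores))  # [100]
```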