- A few more notes about Z
- SPSS and the normal curve
- Chapter 6: Samples vs. Populations
- Convenience/accidental sampling: why online polls suck

Last day, we looked at the relationship between standard scores (z-scores) and raw scores.

For example, suppose the average alcohol consumption across all towns has a mean of μ = 8/week and a standard deviation of σ = 2/week, and people in Burnaby drink an average of 7.2/week. Burnaby's z-score would be Z = (7.2 - 8) / 2 = -0.4, so Burnaby drinks more than 34.46% of towns as a whole, or less than 65.54% of towns as a whole.

Z scores and SPSS.

Start with the data set “Dragons” from the web page. It has a bunch of variables for 300 adult bearded dragons (artificially made, sorry). We’ll be using this dataset for some future exercises, so it has more than we need at the moment.

- Go to Analyze → Descriptive Stats → Frequencies, and choose “Weight” and “Length”.
- Go to Statistics, and choose Mean, Median, and Standard Deviation.
- Go to Charts, select “Histogram”, and check the box “Include normal curve”.

The height of each bar in the histogram is the number of bearded dragons in each equally spaced category. For length, the bars are about the same height as the normal curve, so length is approximately normal. The weight of bearded dragons is right-skewed, so weight is non-normal. Likewise, for weight the mean is greater than the median.

Basil has a length of 24 cm. Given that μ = 27.83 cm and σ = 5.06 cm, we get the z-score:

Z = (X - μ) / σ = (24 - 27.83) / 5.06 = -0.76

By the table, he’s bigger than 22.36% of the dragons.

We can verify this by getting the 22.36th percentile, under Analyze → Descriptive Stats → Frequencies and in Statistics again. Click Percentile(s), put in 22.36 and click ‘Add’. For this data set, 22.36% of the values are below about 24 cm, which is close to Basil’s length of 24 cm. We only have a sample of dragons, so it’s not going to be dead on. For perfect precision, we would need the entire population of bearded dragons.
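The two z-score calculations above can also be checked outside of SPSS. What follows is only a rough sketch in Python, not part of the SPSS exercise: the helper names `z_score` and `percent_below` are made up for illustration, and the normal table lookup is replaced by the standard library's `NormalDist`. The z-scores are rounded to two decimals because that is how a printed z table is read.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standard score: how many standard deviations x sits from the mean."""
    return (x - mu) / sigma


def percent_below(z):
    """Percent of a normal distribution that falls below a given z-score."""
    return 100 * NormalDist().cdf(z)


# Burnaby example: mean 8/week, standard deviation 2/week, Burnaby at 7.2/week.
burnaby_z = round(z_score(7.2, 8, 2), 2)
burnaby_pct = percent_below(burnaby_z)

# Basil example: length 24 cm, mean 27.83 cm, standard deviation 5.06 cm.
basil_z = round(z_score(24, 27.83, 5.06), 2)
basil_pct = percent_below(basil_z)

print(f"Burnaby: z = {burnaby_z}, above {burnaby_pct:.2f}% of towns")
print(f"Basil:   z = {basil_z}, bigger than {basil_pct:.2f}% of dragons")
```

The percentages should agree with the table values in the notes (34.46% and 22.36%) to two decimals, since the table is just a printed version of the same normal curve.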
Beginning of Chapter 6: Samples and Populations

Usually we’re interested in the features of an entire population, but often it’s impossible to get information about every single member of that population. Instead we take a sample, which is a small portion of the population of interest. We hope the sample represents the population fairly.

Example: Blood test. If you’re going for a blood test, you’re interested in knowing the state of all the blood. Rather than take ALL the blood out of you to test, the clinic will take a SAMPLE of your blood as a representative.

Example: Phone polls. In an opinion poll, we’re interested in the opinion of all the people in an area (the parameter). What we get are the opinions of the people that we call and ask (the statistic).

The parameter (of the population) is what we want. A statistic (of a sample) is what we get.

The symbols we use reflect this relationship: Statistics, the values pertaining to Samples, have ordinary looking symbols like x̄ for the mean, or s for the standard deviation. Parameters, the values related to Populations, have fancy Greek symbols like μ for the mean and σ for the standard deviation.

Mnemonic (memory trick): Statistic goes with Sample (both start with S), and Parameter goes with Population (both start with P).

Application: Label each value as a statistic or a parameter.

- Of the 1046 people polled, 719 knew where the circuit breaker was in their home. (Statistic: the 1046 polled are a SAMPLE)
- Of all the people in Vancouver, 70% of them know where the circuit breaker is in their home. (Parameter: all of Vancouver is the POPULATION)
- A car was tested and found to consume 7.8 L per 100 km on the highway.
- Canada consumes 24.2 barrels of oil per year per capita.
- Alice won the election with 55% of the votes. But the week before, the polls showed her at 42%.

In all of these sample examples, we’re making one really big assumption: the sample is representative of the population. This lets us take the sample and generalize it to the whole population. e.g.
The car we tested consumed 7.8 L/100 km, so we assume that most cars of the same model and year will have similar mileage.

To make this assumption of representation, our sample has to be chosen randomly. Random, for our purposes, means every member of the population has an equal chance to be in the sample. (Important!)

A simple random sample, or SRS, is a sample in which every member has an equal chance of being in the sample AND this is independent of other members. In other words, an SRS is a random sample with no other structure or plan to it. (Also important)

Example: Raffle tickets. From a large drum of names, pick a few. This is: SRS.

Example: Opinion polls. Opinion polls are done by choosing phone numbers at random and calling them. This is: SRS. It’s a simple random sample because choosing one phone number isn’t going to affect choosing another one.

Example: Class opinion. I try to get an opinion from the class by asking the front row. This is: Not random!!

Why is not random bad in this case? People in the front of the class tend to be more engaged in the material and less likely to slumber, so engaged people are overrepresented. Also, the people in the front have self-selected to be there.

That’s a common problem with polls. Polls on webpages and social media are self-selected. This means people are choosing for themselves to respond, rather than being randomly chosen. This is called convenience sampling, or accidental sampling. It’s easy, but it has a lot of problems. People that don’t know about the poll or decide not to be polled have zero chance of being in the sample.

This is also why I made a to-do about the representative assumption in the class survey in the first week.
Like the first row sample, it’s probably over-representing the engaged students, but making it random and compulsory seemed like overkill.

(For interest) Convenience/accidental sampling can also be easy to manipulate. A specific group within the population can make a dedicated effort to throw the results in one direction artificially.

- Stratified Samples
- Systematic Samples
- Samples can vary
- If time: Landlines and the Canadian election
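To see the front-row problem from the class opinion example in numbers, here is a small sketch in Python. Everything in it is invented for illustration: the classroom of 10 rows by 30 seats, the "engagement" score that drops by one point per row toward the back, and the seed. It only shows the idea that a simple random sample tends to land near the population value while the front-row convenience sample does not.

```python
import random
from statistics import mean

# Hypothetical classroom: 10 rows of 30 seats, row 0 is the front.
# The engagement score (10 in the front row, 1 in the back row) is made up.
ROWS, SEATS = 10, 30
population = [(row, seat, 10 - row) for row in range(ROWS) for seat in range(SEATS)]


def engagement(students):
    """Average engagement score of a group of (row, seat, score) students."""
    return mean(score for _, _, score in students)


rng = random.Random(2024)  # fixed seed so the run is repeatable

# Simple random sample: every student has the same chance of being picked,
# and picking one student does not depend on who else was picked.
srs = rng.sample(population, 30)

# Convenience sample: just ask the front row.
front_row = [student for student in population if student[0] == 0]

population_mean = engagement(population)  # the parameter (what we want)
srs_mean = engagement(srs)                # a statistic from a random sample
front_row_mean = engagement(front_row)    # a statistic from a convenience sample

print(f"Population (parameter):    {population_mean:.2f}")
print(f"Simple random sample:      {srs_mean:.2f}")
print(f"Front row (convenience):   {front_row_mean:.2f}")
```

Both samples have 30 students, so the difference is not about sample size. The front row misses because students in rows 1 to 9 had zero chance of being asked.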