A few more notes about Z
SPSS and the normal curve
Chapter 6: Samples vs. Populations
Convenience/accidental sampling: why online polls suck
Last day, we looked at the relationship between standard
scores (z-scores) and raw scores.
For example, suppose the average weekly alcohol consumption across all towns
has mean μ = 8/week and standard deviation σ = 2/week. If people in Burnaby
drink an average of 7.2/week, their z-score would
be…
Z = (7.2 - 8) / 2 = -0.4, and they would drink more than…
34.46% of towns as a whole, or less than 65.54% of towns as a
whole.
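For anyone who wants to check the table lookup without a z-table, here is a minimal Python sketch of the Burnaby example. The function names `z_score` and `normal_cdf` are made up for this illustration; the only library call is `math.erf` from the standard library, and the calculation assumes consumption across towns is normally distributed.

```python
import math

def z_score(x, mu, sigma):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def normal_cdf(z):
    """Proportion of a standard normal distribution that falls below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Burnaby example from the notes: mu = 8/week, sigma = 2/week, X = 7.2/week
z = z_score(7.2, mu=8.0, sigma=2.0)
below = normal_cdf(z)

print(f"z = {z:.2f}")                      # z = -0.40
print(f"towns below Burnaby: {below:.2%}")      # 34.46%
print(f"towns above Burnaby: {1 - below:.2%}")  # 65.54%
```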
Z scores and SPSS.
Start with the “Dragons” data set from the web page.
It contains a bunch of variables for 300 adult bearded dragons
(artificially generated, sorry).
We’ll be using this data set for some future exercises, so it has
more than we need at the moment.
Go to Analyze → Descriptive Statistics → Frequencies, and choose
“Weight” and “Length”.
Go to Statistics, and choose Mean, Median, and Standard
Deviation.
Go to Charts, select “Histogram”, and check the box “Include
normal curve”.
The number of bearded dragons in each equally spaced
category is the height of each bar in the histogram. The bars
are about the same height as the normal curve, so length is
approximately normal.
The weight of bearded dragons is right-skewed, so weight is
non-normal. As is typical for right-skewed data, the mean is greater than the median.
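To see why a right skew pulls the mean above the median, here is a small Python sketch. The ten weights are invented for illustration and are not taken from the Dragons data set; the point is only that a few heavy values drag the mean upward while the median stays put.

```python
import statistics

# Hypothetical weights in grams (made up, NOT from the Dragons data set).
# Most values cluster at the low end, with a long tail to the right.
weights = [300, 320, 330, 340, 350, 360, 380, 420, 500, 700]

mean_w = statistics.mean(weights)      # pulled upward by 500 and 700
median_w = statistics.median(weights)  # middle of the sorted values

print(f"mean   = {mean_w:.1f}")    # mean   = 400.0
print(f"median = {median_w:.1f}")  # median = 355.0
print("mean > median:", mean_w > median_w)  # mean > median: True
```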
Basil has a length of 24 cm. Given that μ = 27.83 cm and σ = 5.06
cm, we get the z-score.
Z = (X - μ) / σ = (24 - 27.83) / 5.06 = -0.76
By the table, he’s longer than 22.36% of the dragons.
We can verify by getting the 22.36th percentile, under Analyze
→ Descriptive Statistics → Frequencies, and in Statistics again.
… Then click Percentile(s), put in 22.36 and click ‘Add’.
For this data set, the value with 22.36% of the dragons below it is about 24 cm,
which is close to Basil’s length of 24 cm.
We only have a sample of dragons, so it’s not going to be dead
on. For perfect precision, we would need the entire population
of bearded dragons.
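The gap between the table answer and what a sample gives can be sketched in Python. This is a simulation, not the Dragons data: it assumes lengths follow a normal curve with the μ and σ above, draws 300 artificial lengths, and compares the share below 24 cm with the theoretical value. The seed is arbitrary. Note that the theoretical value uses the unrounded z, so it differs slightly from the table's 22.36%.

```python
from statistics import NormalDist

# Assumed model: lengths are normal with the mean and SD from the notes.
length_model = NormalDist(mu=27.83, sigma=5.06)

# Theoretical proportion of dragons shorter than Basil (24 cm).
theoretical = length_model.cdf(24)

# One artificial sample of 300 dragons (arbitrary seed for repeatability).
sample = length_model.samples(300, seed=1)
empirical = sum(x < 24 for x in sample) / len(sample)

print(f"theoretical share below 24 cm: {theoretical:.2%}")
print(f"share below 24 cm in sample:   {empirical:.2%}")
```

Changing the seed gives a different sample and a slightly different share each time, which is the sense in which a sample is "not going to be dead on".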
Beginning of Chapter 6: Samples and Populations
Usually we’re interested in the features of an entire
population, but often it’s impossible to get information about
every single member of that population.
Instead we take a sample, which is a small portion of the
population of interest. We hope the sample represents the
population fairly.
Example: Blood test.
If you’re going for a blood test, you’re interested in knowing
the state of all the blood.
Rather than take ALL the blood out of you to test, the clinic will
take a SAMPLE of your blood as a representative.
Example: Phone polls
In an opinion poll, we’re interested in the opinion of all the
people in an area. (The parameter)
What we get are the opinions of the people that we call and
ask. (The statistic)
The parameter (of the population) is what we want.
A statistic (of a sample) is what we get.
The symbols we use reflect this relationship:
Statistics, the values pertaining to Samples, have ordinary-looking
symbols like
x̄ for the mean, or s for the standard deviation.
Parameters, the values related to Populations, have fancy Greek
symbols like
μ for the mean and σ for the standard deviation.
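As a rough illustration of the two sets of symbols, the Python sketch below builds a made-up population of 10,000 towns (hypothetical values, generated to resemble the μ = 8, σ = 2 example) and then takes a sample of 50. The variable names `mu`, `sigma`, `x_bar`, and `s` are chosen to mirror the symbols. In the standard library, `statistics.pstdev` is the population standard deviation and `statistics.stdev` is the sample version.

```python
import random
import statistics

rng = random.Random(42)  # arbitrary seed so the run is repeatable

# Hypothetical population: weekly consumption for 10,000 made-up towns.
population = [rng.gauss(8, 2) for _ in range(10_000)]

# Parameters: computed from the whole population (what we want).
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # divides by N

# Statistics: computed from a sample of 50 towns (what we get).
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)           # divides by n - 1

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```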
Mnemonic (memory trick): Statistic and Sample both start with S; Parameter and Population both start with P.
Application: Label each of the bolded values as a statistic or a
parameter.
Of the 1046 people polled, 719 knew where the circuit
breaker was in their home. (Statistic: the 1046 polled are a
SAMPLE.)
Of all the people in Vancouver, 70% know where the
circuit breaker is in their home. (Parameter: all of Vancouver
is the POPULATION.)
A car was tested and found to consume 7.8 L per 100 km on
the highway.
Canada consumes 24.2 barrels of oil per year per capita.
Alice won the election with 55% of the votes.
But the week before, the polls showed her at 42%.
In all of these sample examples, we’re making one really big
assumption:
The sample is representative of the population.
This lets us take the sample and generalize it to the whole
population.
e.g. The car we tested consumed 7.8 L/100 km, so we assume
that most cars of the same model and year will have similar
mileage.
To make this assumption of representation, our
sample has to be chosen randomly.
Random for our purposes means every member
of the population has an equal chance to be in
the sample.
(Important!)
A simple random sample, or SRS, is a sample in
which every member has an equal chance of
being in the sample AND this is independent of
other members.
In other words, an SRS is a random sample with
no other structure / plan to it.
(also important)
Example: Raffle tickets
From a large drum of names, pick a few.
This is:
SRS.
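The raffle drum can be sketched in Python with `random.sample`, which draws without replacement. The ticket names, drum size, and seed below are all invented for illustration. The loop repeats the draw many times to check the "equal chance" part of the SRS definition: with 3 winners out of 20 tickets, each ticket should win in roughly 3/20 = 15% of raffles.

```python
import random
from collections import Counter

rng = random.Random(7)  # arbitrary seed so the run is repeatable

# Hypothetical drum of 20 tickets.
names = [f"ticket_{i}" for i in range(20)]

# One raffle: pick 3 distinct tickets, each equally likely.
winners = rng.sample(names, 3)
print("winners:", winners)

# Repeat the raffle many times and count how often each ticket wins.
draws = 20_000
counts = Counter()
for _ in range(draws):
    counts.update(rng.sample(names, 3))

shares = {name: counts[name] / draws for name in names}
print(f"lowest share:  {min(shares.values()):.3f}")
print(f"highest share: {max(shares.values()):.3f}")
```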
Example: Opinion Polls.
Opinion polls are done by choosing phone numbers at random
and calling them. This is:
A simple random sample (SRS), because choosing one phone
number isn’t going to affect choosing another one.
Example: Class opinion.
I try to get an opinion from the class by asking the front row.
This is:
Not Random!!
Why is a non-random sample bad in this case?
People in the front of the class tend to be more engaged in the
material and less likely to slumber. Engaged people are overrepresented.
Also, the people in the front have selected themselves to
be there. That’s a common problem with polls.
Polls on webpages and social media are self-selected. This
means people are choosing for themselves whether to respond, rather
than being randomly chosen.
This is called convenience sampling, or accidental sampling.
It’s easy but it has a lot of problems.
People that don’t know about the poll or decide not to be
polled have zero chance of being in the sample.
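Here is a rough Python simulation of how self-selection can distort a poll. Every number in it is an assumption made up for the sketch: 30% of people are "engaged", engaged people say yes 80% of the time and everyone else 40%, and engaged people are ten times as likely to answer an online poll. It compares the self-selected result with a simple random sample of 1,000.

```python
import random

rng = random.Random(3)  # arbitrary seed so the run is repeatable

# Hypothetical population of 100,000 people: (engaged?, says yes?)
population = []
for _ in range(100_000):
    engaged = rng.random() < 0.30
    says_yes = rng.random() < (0.80 if engaged else 0.40)
    population.append((engaged, says_yes))

def share_yes(people):
    """Proportion of the given people who say yes."""
    return sum(yes for _, yes in people) / len(people)

true_share = share_yes(population)  # the parameter

# Self-selected poll: engaged people respond far more often (assumed rates).
respondents = [p for p in population
               if rng.random() < (0.50 if p[0] else 0.05)]
self_selected = share_yes(respondents)

# Simple random sample of 1,000 people for comparison.
srs = share_yes(rng.sample(population, 1_000))

print(f"true share saying yes: {true_share:.1%}")
print(f"self-selected poll:    {self_selected:.1%}")
print(f"simple random sample:  {srs:.1%}")
```

Under these assumptions the self-selected poll has far more respondents than the random sample, yet it lands much further from the true share, because the extra respondents are mostly from one group.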
This is also why I made a to-do about the representative
assumption in the class survey in the first week.
Like the first-row sample, it’s probably overrepresenting the
engaged students, but making it random and compulsory
seemed like overkill.
(for interest) Convenience/Accidental sampling can also be
easy to manipulate.
A specific group within the population can make a dedicated
effort to throw the results in one direction artificially.
Stratified Samples
Systematic Samples
Samples can vary
If time: Landlines and the Canadian election