Practicing the Concepts #1 * Basic Concepts and Terminology

Example 1.2 Can you determine what is representative?
Consider the following exercise. The goal of this simple exercise is to determine
the average number of squares per bunch in the following picture.
Instead of counting the number of squares in all 100 bunches, identify 10
representative bunches and record their identification number in the first row of
the table below. For each bunch, count and record the number of squares and
place the results in the second row.
Bunch ID
# of Squares
Questions:
• What is the average number of squares per bunch for your representative
sample?
• What is the average number of squares from one or more of your
neighbors?
• How do the results from your representative sample compare to your
neighbors?
Obtain the average number of squares per bunch from several individuals in the
class and record their values in the following table.
Individual
Average
How well did we do?
• On the following number line, sketch each of the averages recorded
above.
____________________________________________________
• The average number of squares per bunch (for all 100 bunches) is _____
Discussion…
What about random sampling?
The goal of random sampling is to ensure that a representative sample is taken.
There are various random sampling methods with the simplest being simple
random sampling.
Definition: Simple random sampling – a sampling method in which each
observation in the population has an equal chance of being selected.
Taking a simple random sample traditionally meant putting a piece of paper for
each "observation" in a hat and randomly selecting observations. Even though this
may sound exciting, statisticians use computers to select simple random
samples.
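As a rough illustration of what the computer does, the sketch below draws a simple random sample of 10 bunch IDs using Python's standard library. The ID range 1 to 100 and the seed value are assumptions made for this example only.

```python
import random

# Assumption: the population is 100 bunches labelled 1..100.
population_ids = list(range(1, 101))

# A fixed seed makes the draw repeatable; drop it for a fresh sample each run.
rng = random.Random(2024)

# random.sample draws without replacement, so every ID has the same
# chance of being selected and no ID can appear twice.
sample_ids = sorted(rng.sample(population_ids, k=10))

print(sample_ids)
```

This is the computer version of the hat: each ID is one slip of paper, and 10 slips are pulled without putting any back.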
Obtaining a simple random sample using JMP
Open the Random_Rectangles.JMP data file. Select Tables > Subset.
In the subset window, select 10 in the Random – sample size: box. Specify that
you want All Columns from the original table. Finally, give the resulting table of
randomly selected observations a name in the Output table name: box.
The following randomly selected subset is returned.
Example 1.3 Summarizing the random sample results. In the following table, list
the IDs and counts for the randomly selected observations given above.
Bunch ID
# of Squares
How well does simple random sampling do?
Consider the following 10 random samples I’ve selected.
Plot the averages from these 10 random samples on the same number line on
which you plotted the class results earlier.
How do the results from the 10 random samples compare to the results from the
10 representative samples selected in Example 1.2? Discuss the similarities and
differences.
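One way to explore this question away from the classroom is a small simulation. The sketch below is only an illustration: the population of 100 square counts is made up (it is not the picture used in class), and the seed is arbitrary. It draws 10 simple random samples of size 10 and shows how far each sample average lands from the population average.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: square counts for 100 bunches (made-up values,
# not the counts from the class picture).
population = [rng.randint(1, 20) for _ in range(100)]
population_mean = statistics.mean(population)

# Draw 10 simple random samples of 10 bunches each and average each one.
sample_means = [
    statistics.mean(rng.sample(population, k=10)) for _ in range(10)
]

print("population mean:", population_mean)
for m in sample_means:
    # Position of each sample average relative to the population average.
    print(f"sample mean = {m:.1f}, difference = {m - population_mean:+.2f}")
```

The printed differences play the role of the points on the number line: some fall above the population average and some below.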
Section 1.4: Sampling Errors
There are two types of errors to consider when sampling.
Sampling: Errors that naturally occur in a random sampling process
The behavior of these errors is well understood when good sampling
techniques are used
Summary
> errors caused by the act of sampling
> have the potential to be bigger in smaller samples than in larger samples
> it is possible to determine to what degree they will affect the outcome
> unavoidable (this is the price of ensuring a representative sample)
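The claim that sampling errors tend to be bigger in smaller samples can be checked with a short simulation. The sketch below assumes a made-up population of 100 square counts and two illustrative sample sizes (5 and 40); the helper name spread_of_sample_means is hypothetical. It measures how much the sample average varies from one random sample to the next.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 100 square counts (illustrative values only).
population = [rng.randint(1, 20) for _ in range(100)]

def spread_of_sample_means(n, repeats=2000):
    """Standard deviation of the sample mean over many random samples of size n."""
    means = [sum(rng.sample(population, k=n)) / n for _ in range(repeats)]
    return statistics.pstdev(means)

spread_small = spread_of_sample_means(5)
spread_large = spread_of_sample_means(40)

print(f"n = 5:  spread of sample means = {spread_small:.2f}")
print(f"n = 40: spread of sample means = {spread_large:.2f}")
```

The spread for the larger sample size should come out noticeably smaller, which is what makes it possible to say in advance how much sampling error to expect.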
Nonsampling: Errors due to things other than the sampling process
These errors are more difficult to control and should be of concern whenever
measurements are taken.
Some Examples:
> Nonresponse
> Voluntary Response
> Hidden Biases / Lurking Variables
> Survey design effects / question effect
Summary
> are more problematic than sampling errors
> are always present
> may be impossible to correct after data is collected
> nearly impossible to determine the degree to which they adversely affect
the analysis
> minimized by using good survey / data collection methodologies
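To see why a nonsampling error such as nonresponse is harder to deal with than sampling error, consider the sketch below. Everything in it is assumed for illustration: the population of 10,000 scores on a 1 to 10 scale, the response rates (90% for scores above 5, 30% otherwise), and the function names. It suggests that taking a bigger sample does not remove the bias.

```python
import random

rng = random.Random(11)

# Hypothetical population of 10,000 scores on a 1-10 scale.
population = [rng.randint(1, 10) for _ in range(10_000)]
true_mean = sum(population) / len(population)

def responds(score):
    # Assumed response behaviour: people with scores above 5 answer
    # 90% of the time, everyone else only 30% of the time.
    return rng.random() < (0.9 if score > 5 else 0.3)

def survey_mean(n):
    """Mean score among those who respond, out of n randomly contacted people."""
    contacted = rng.sample(population, k=n)
    answers = [s for s in contacted if responds(s)]
    return sum(answers) / len(answers)

small = survey_mean(100)
large = survey_mean(5000)

print(f"true mean:             {true_mean:.2f}")
print(f"survey mean, n = 100:  {small:.2f}")
print(f"survey mean, n = 5000: {large:.2f}")
```

Under these assumptions the survey average stays well above the true average even when 5,000 people are contacted, because the people who answer differ from those who do not.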
Section 1.5: Random Variables / Distributions
Definition: Observation – the collection of measurements from a particular object
Definition: Variable – any measurable characteristic of an observation
The term variable is often used more loosely to refer to the set of values of a
measurable characteristic across all observations.
Example 1.4 Consider the following data from the El Paso Lead Study. Of
interest here are the children in the study with Location = Close.
Questions:
• Give an example of two different observations.
• Give an example of three variables.
The concepts of a random variable and a probability distribution are important to
your understanding of inferential statistics.
Definition: Random Variable – a variable or measurement that is
obtained through some random process
Definition: Distribution – a table or graph of all possible values of a random
variable. A distribution lists the possible values of the random variable and also
gives the frequency of occurrence of each value.
Comments
• All random variables have a distribution
• Certain types of random variables occur so frequently that we name
their distribution. For example, the bell-shaped distribution is thought
to occur so frequently that we've labeled it the normal distribution.
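The definition of a distribution as a table of values and frequencies can be made concrete with a few lines of code. The sketch below assumes a made-up population of 100 square counts between 1 and 6; the random variable is the count for one bunch selected at random.

```python
import random
from collections import Counter

rng = random.Random(3)

# Hypothetical population: square counts for 100 bunches (made-up values).
population = [rng.randint(1, 6) for _ in range(100)]

# Random variable X: the count for one bunch selected at random.
# Because each bunch is equally likely, the distribution of X is the
# relative frequency of each value in the population.
counts = Counter(population)
distribution = {value: counts[value] / len(population) for value in sorted(counts)}

print("value  relative frequency")
for value, freq in distribution.items():
    print(f"{value:>5}  {freq:.2f}")
```

The printed table is the distribution: one row per possible value, with relative frequencies that add up to 1.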
Example 1.5: Consider the following 22 observations from the El Paso Lead
Study for which Location = Close. Let these 22 observations represent the
population. That is, we only care about making decisions about these 22
individuals.
Take a simple random sample of 5 individuals from this population. Place their
values in the table below.
ID
Sex Age Colic Clum Irr Loc
Years Test
Year1
IQ
Lead1 Lead2
Close Type
Year2
1
2
3
4
5
Main Ideas:
• EVERYTHING in the population is unknown and fixed
• EVERYTHING in the sample is known and random
• EVERYTHING in the population has a corresponding component in the sample
Two final definitions
Definition: Parameter – summary characteristic of a distribution
Definition: Statistic – summary characteristic of a sample
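The difference between a parameter and a statistic can be sketched in code. The example below uses a made-up population of 22 measurements (these are not the El Paso Lead Study values) and takes the mean as the summary characteristic; the variable names are chosen for illustration.

```python
import random
import statistics

rng = random.Random(5)

# Hypothetical population of 22 measurements (made-up values, not the
# El Paso Lead Study data).
population = [rng.randint(70, 130) for _ in range(22)]

# Parameter: a summary of the whole population. It is a fixed number,
# though in practice it is usually unknown.
parameter_mean = statistics.mean(population)

# Statistic: the same summary computed from a sample. It is known once the
# sample is taken, but it changes from one random sample to the next.
sample_a = rng.sample(population, k=5)
sample_b = rng.sample(population, k=5)
statistic_a = statistics.mean(sample_a)
statistic_b = statistics.mean(sample_b)

print("parameter (population mean):", round(parameter_mean, 1))
print("statistic from sample A:", statistic_a)
print("statistic from sample B:", statistic_b)
```

The parameter stays the same however many samples are drawn, while each new sample of 5 gives its own value of the statistic.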