Example 1.2: Can you determine what is representative?

Consider the following exercise. The goal of this simple exercise is to determine the average number of squares per bunch in the following picture. Instead of counting the number of squares in all 100 bunches, identify 10 representative bunches and record their identification numbers in the first row of the table below. For each bunch, count the number of squares and record the result in the second row.

Bunch ID:       ____  ____  ____  ____  ____  ____  ____  ____  ____  ____
# of Squares:   ____  ____  ____  ____  ____  ____  ____  ____  ____  ____

Questions:
• What is the average number of squares per bunch for your representative sample?
• What is the average number of squares per bunch obtained by one or more of your neighbors?
• How do the results from your representative sample compare to your neighbors' results?

Obtain the average number of squares per bunch from several individuals in the class and record their values in the following table.

Individual      Average
__________      __________

How well did we do?
• On the following number line, sketch each of the averages recorded above.
____________________________________________________
• The average number of squares per bunch (for all 100 bunches) is _____

Discussion…

What about random sampling?

The goal of random sampling is to obtain a representative sample. There are various random sampling methods, the simplest being simple random sampling.

Definition: Simple random sampling – a sampling method in which each observation in the population has an equal chance of being selected.

Taking a simple random sample traditionally meant putting a piece of paper for each "observation" in a hat and randomly selecting observations. Even though this may sound exciting, statisticians use computers to select simple random samples.

Obtaining a simple random sample using JMP

Open the Random_Rectangles.JMP data file. Select Tables > Subset. In the subset window, select 10 in the Random – sample size: box. Specify that you want All Columns from the original table.
Finally, give the resulting table of randomly selected observations a name in the Output table name: box.

The following randomly selected subset is returned.

Example 1.3: Summarizing the random sample results.

In the following table, list the IDs and counts for the randomly selected observations given above.

Bunch ID:       ____  ____  ____  ____  ____  ____  ____  ____  ____  ____
# of Squares:   ____  ____  ____  ____  ____  ____  ____  ____  ____  ____

How well does simple random sampling do?

Consider the following 10 random samples I've selected. Plot the averages from these 10 random samples on the same number line on which you plotted the class results earlier.

How do the results from the 10 random samples compare to the results from the 10 representative samples selected in Example 1.2? Discuss the similarities and differences.

Section 1.4: Sampling Errors

There are two types of errors that arise when sampling.

Sampling: Errors that naturally occur in a random sampling process. The behavior of these errors is well understood when good sampling techniques are used.

Summary
> errors caused by the act of sampling
> have the potential to be bigger in smaller samples than in larger samples
> it is possible to determine to what degree they will affect the outcome
> unavoidable (this is the price of ensuring a representative sample)

Nonsampling: Errors due to things other than the sampling process. These errors are more difficult to control and should be of concern whenever measurements are taken.
Some Examples:
> Nonresponse
> Voluntary response
> Hidden biases / lurking variables
> Survey design effects / question effects

Summary
> are more problematic than sampling errors
> are always present
> may be impossible to correct after the data are collected
> nearly impossible to determine the degree to which they adversely affect the analysis
> minimized by using good survey / data collection methodologies

Section 1.5: Random Variables / Distributions

Definition: Observation – the collection of measurements from a particular object

Definition: Variable – any measurable characteristic of an observation

The term variable is often used more loosely to represent the set of values of a measurable characteristic across all observations.

Example 1.4: Consider the following data from the El Paso Lead Study. Of interest here are the children in the study with Location = Close.

Questions:
• Give an example of two different observations.
• Give an example of three variables.

The concepts of a random variable and a probability distribution are important to your understanding of inferential statistics.

Definition: Random Variable – a variable or measurement whose value is obtained through some random process

Definition: Distribution – a table or graph of all possible values of a random variable. A distribution lists the possible values of the random variable and gives the frequency of occurrence of each value.

Comments
• All random variables have a distribution.
• Certain types of random variables occur so frequently that we name their distributions. For example, the bell-shaped distribution is thought to occur so frequently that we've labeled it the normal distribution.

Example 1.5: Consider the following 22 observations from the El Paso Lead Study whose Location = Close. Let these 22 observations represent the population. That is, we only care about making decisions about these 22 individuals.

Take a simple random sample of 5 individuals from this population.
Place their values in the table below.

ID Sex Age Colic Clum Irr Loc Years Test Year1 IQ Lead1 Lead2 Close Type Year2
1
2
3
4
5

Main Ideas:
• EVERYTHING in the population is unknown and fixed
• EVERYTHING in the sample is known and random
• EVERYTHING in the population has a corresponding component in the sample

Two final definitions

Definition: Parameter – summary characteristic of a distribution

Definition: Statistic – summary characteristic of a sample
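The distinction between a parameter and a statistic can also be seen by letting a computer take the simple random sample. The following is a minimal Python sketch, not the JMP procedure used in these notes. The 22 population values are made up for illustration (they are NOT the El Paso Lead Study data), and the helper name simple_random_sample and the seed are assumptions chosen only for this example.

```python
import random
import statistics

# Hypothetical population of 22 measurements (made-up values for illustration;
# these are NOT the El Paso Lead Study data).
population = [12, 15, 9, 20, 18, 7, 14, 16, 11, 13, 19,
              8, 17, 10, 21, 6, 15, 12, 14, 18, 9, 16]


def simple_random_sample(values, n, rng):
    """Draw n observations without replacement; each is equally likely to be chosen."""
    return rng.sample(values, n)


rng = random.Random(2024)  # fixed seed (an arbitrary choice) so the sketch is repeatable

# Parameter: a summary of the whole population (fixed; unknown in practice).
parameter = statistics.mean(population)

# Statistic: the same summary computed from one sample (known, but random).
sample = simple_random_sample(population, 5, rng)
statistic = statistics.mean(sample)

# Repeating the sampling shows how the statistic changes from sample to sample.
sample_means = [statistics.mean(simple_random_sample(population, 5, rng))
                for _ in range(10)]

print("population mean (parameter):", parameter)
print("one sample of 5:", sample)
print("sample mean (statistic):", statistic)
print("ten more sample means:", sample_means)
```

Running the sketch prints one fixed population mean alongside sample means that differ from draw to draw, which mirrors the main ideas above: the parameter is fixed, while the statistic is random. Here the population mean can be computed only because the whole (made-up) population is listed; in practice it would be unknown.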