What is Statistics?

Statistics is the gathering, organizing, analyzing, and presenting of numerical information. The data gathered by statistical studies are used to guide decisions, explain events, predict future courses of action, or provide the basis for a solution to a problem.

Population vs. Sample

Once you have decided on the topic you wish to study, the first major step of your study is gathering the data. Your first decision is whom you are going to gather the data from.

Population: all individuals who belong to the group being studied.
Sample: a selection of individuals taken from a population (the people who are actually asked or polled).

Identify the population for each of the following questions.
a) Whom do you plan to vote for in the next Ontario election?
   All Canadian citizens of voting age who live in Ontario
b) Do women prefer to wear ordinary glasses or contact lenses?
   Women who require corrective eyewear

Determine if the following is a sample or a population.
a) A representative from each hockey team is asked to complete a survey on game times: Sample
b) Canada census survey: Population
c) One in every 10 bottles of pop is tested for defects in a factory: Sample

Types of Data and Sampling

Once you have determined the population for your study, the next step is obtaining a sample that best represents that population. Sample selection is one of the key factors that determine whether your survey is valid and will produce legitimate conclusions.

Types of Data

Raw Data
Data that has been collected but not yet analyzed.

Discrete Data
There is a limit to the categories that the data can be placed in.
Ex: The soft drink size at the movie theatre. There are only 4 categories, and it is not possible to go in between them.
Continuous Data
The data can take on any value within a range, including decimal values to any number of decimal places.

Examples of discrete data:
• Population numbers
• Counts of physical objects where fractions don't make sense (people)

Examples of continuous data:
• Time (you can win a race in 3 seconds, 3.4 seconds, 3.148 seconds, etc.)
• Length
• Mass

4 Types of Data

Nominal Data (discrete)
Data that can be sorted into categories, but those categories cannot be ranked or quantified.
Ex: A survey asks what type of food you prefer: Chinese, Italian, American, or Indian.

Ordinal Data (discrete)
Data is organized into rankings.
Ex: Rank your top five favourite movies. Matrix = 1, Batman Begins = 2, etc.
The actual numbers assigned don't matter as long as they preserve the ranking you intend.
Ex: Matrix = 100, Batman Begins = 300

Interval Data (discrete)
Data is categorized into numerical groupings in which the distance between the groupings is the same. The initial or zero point is arbitrary.
Ex: The interval 2006-2007 is the same as the interval 2005-2006.
Ex: IQ intervals

Ratio Data (continuous)
In these notes, all continuous data is treated as ratio data. Ratio data has a true zero point, so ratios between values are meaningful (a time of 20 seconds is twice as long as a time of 10 seconds).
Ex: Your time in the 100 m dash

Sampling

The method used to collect sample data from a population is very important and can mean the difference between a credible conclusion and a biased one.

Simple Random Sampling
Gives all the elements of the population an equal chance of being part of the sample. The selection must be as impartial as possible and must not favour one element over another.

Systematic Sample
The sample is selected from the population systematically, through a constant counting process.
Ex: Picking every 100th person from a phone book.
To determine whether you should choose every 5th or every 100th item, find the ratio of the population size to the sample size. If you want a tenth of the population, select every 10th item.

Ex: A telephone company is planning a marketing survey of its 760 000 customers. For budget reasons, the company wants a sample size of about 250.
a) Determine the interval that should be used for a systematic sample.

   interval = population size / sample size
            = 760 000 / 250
            = 3040

   Therefore the company should select every 3040th customer for its survey.

Stratified Sample
Takes into account that a population is made up of many demographic groups that tend to react differently.
Ex: If a population of turtles has more females than males, a sample that is purposely weighted with proportionally more females than males is a stratified sample.
To determine how many subjects to select from each subgroup, find the fraction of the population that the subgroup makes up and multiply it by the number desired in the sample:

   # selected from subgroup = (# in subgroup / # in population) x (# in sample)

Ex: Before booking bands for the school dances, the students' council at Statsville H.S. wants to survey the music preferences of the student body. The following table shows the enrolment at the high school.

   Grade    # Students
   9        255
   10       232
   11       209
   12       184
   Total    880

a) Design a stratified sample for a survey of 25% of the student body.

   25% of the student body is 880 x 0.25 = 220 students.

   For Grade 9: (255 / 880) x 220 = 63.75, so 64 Grade 9's should be selected.

   Complete this step for each grade and you should get:
   • 64 Grade 9's
   • 58 Grade 10's
   • 52 Grade 11's
   • 46 Grade 12's

   To check, these should add up to 220: 64 + 58 + 52 + 46 = 220.

Cluster Sample
Takes advantage of groups whose characteristics are similar to those of other groups in the population, so that a few whole groups can stand in for the rest.
Ex: Randomly selecting whole classes, assuming students were placed in the classes randomly.

Multi-Stage Sample
Uses compound randomization (several levels of random selection).
Ex: A study of passenger safety in cars randomly picks a car manufacturer (stage 1), then randomly picks a vehicle type such as van, compact, or truck (stage 2), then randomly picks a model of car in that class (stage 3).

Ex: Suppose that your population consisted of all Ontario households. How would you create a multi-stage sample?
You could first randomly select from the different towns and cities in Ontario, then randomly select a sample of blocks or subdivisions within the selected cities, and finally randomly select individual homes on those blocks.

Voluntary-Response Sample
Depends on the initiative of the sample itself.
Ex: Internet and mail polls
Elements selected for the sample may or may not respond, which creates a potential bias.

Convenience Sample
Samples elements that are nearby or that are accessible at little or no cost.
Ex: Telephone or internet

Homework
Pg 117 #4, 6, 8, 9, 11
Pg 123 #1-6
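
Checking the worked examples with code

The two calculations in these notes (the systematic sampling interval and the stratified sample sizes) can be checked with a short Python sketch. This is only an illustration: the function names systematic_interval and stratified_allocation are made up for this example and do not come from any textbook or library, and rounding each result to the nearest whole number is an assumption that matches the worked answers above.

```python
def systematic_interval(population_size: int, sample_size: int) -> int:
    """Sampling interval: population size / sample size, rounded to a whole number."""
    return round(population_size / sample_size)


def stratified_allocation(strata: dict[str, int], sample_size: int) -> dict[str, int]:
    """Number to select from each subgroup:
    (# in subgroup / # in population) x (# in sample), rounded.

    Note: because each value is rounded separately, the totals are not
    guaranteed to add up to sample_size for every data set, so always check.
    """
    population = sum(strata.values())
    return {
        name: round(count / population * sample_size)
        for name, count in strata.items()
    }


# Telephone company example: 760 000 customers, sample of about 250.
interval = systematic_interval(760_000, 250)
print(interval)  # 3040

# Statsville H.S. example: 25% of 880 students is 220.
enrolment = {"Grade 9": 255, "Grade 10": 232, "Grade 11": 209, "Grade 12": 184}
allocation = stratified_allocation(enrolment, 220)
print(allocation)
# {'Grade 9': 64, 'Grade 10': 58, 'Grade 11': 52, 'Grade 12': 46}
print(sum(allocation.values()))  # 220
```

If the rounded subgroup sizes do not add up to the desired sample size for some other data set, adjust one subgroup by hand, just as you would on paper.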