Math 123- Statistics
Chapter 1 Notes
Name_______________________________

1.1 An Overview of Statistics

Def- Data is information from observations, measurements, or responses.
Ex- The collection of data resulted in the discovery that 15% of the people in Ms. King’s classroom have black hair.

Def- Statistics is the science of collecting, organizing, and interpreting data in order to make decisions.

Types of Data Sets:
1. Populations
2. Samples

Def- A population is the collection of all outcomes, responses, measurements, or counts that are of interest.
Def- A sample is a subset, or part, of a population.

Ex- Out of 30 students surveyed in elementary school, 25 said that they liked peanut butter and jelly sandwiches. Identify the population and sample in the scenario.

Venn Diagram

Ex- Identify the population and the sample.
a)
b)
c) A survey of 898 U.S. adult VCR owners found that 16% had VCR clocks that were blinking “12:00.”
d) A study of 860 U.S. ATM’s found that the average surcharge for withdrawals from a competing bank was $1.15.

Def- A statistic is a numerical description of a sample characteristic.
Def- A parameter is a numerical description of a population characteristic.

Ex- Determine whether the numerical value describes a parameter or a statistic.
a) The 2001 team payroll of the Baltimore Orioles was $74,279,540.
b) In a survey of 300 U.S. adults, 22% own a cell phone.
c) 19% of a sample of Indiana 9th graders surveyed smoke daily.
d) In a survey of all freshmen at the University of Arizona, 89 students were majoring in astronomy.

Branches of Statistics:
1. Descriptive Statistics
2. Inferential Statistics

Def- Descriptive statistics is the branch of statistics that involves the organization, summarization, and display of data.
Def- Inferential statistics is the branch of statistics that involves using a sample to draw conclusions about a population.

Ex- Which part of the survey represents the descriptive branch of statistics?
Make an inference based on the results of the survey.
a) A survey conducted among 1000 men and women found that 25% of men and 32% of women exercised at least four days per week.
b) A survey conducted among the residents of Arroyo Grande found that people owned an average of two dogs.

1.2 Data Classification

Types of Data:
1. Qualitative Data
2. Quantitative Data

Def- Qualitative data consists of attributes, labels, or non-numerical entries.
Def- Quantitative data consists of numerical measurements or counts (on which you can do mathematical calculations).

Ex- Determine whether the data are qualitative or quantitative.
a) The monthly salaries of the employees at Spencer’s Market.
b) The social security numbers of employees at an accounting firm.
c) The age of everyone in my family.
d) The zip codes of 200 people surveyed in California.
e)
Store          Cost of apples
Albertson’s    $1.52 per pound
Ralph’s        $1.63 per pound
Vons           $1.84 per pound
Spencer’s      $1.37 per pound

Levels of Measurement: (Determines which calculations on the data are meaningful.)
1. nominal (least meaningful)
2. ordinal
3. interval
4. ratio (most meaningful)

Def- Data at the nominal level of measurement are qualitative only. Data at this level are categorical using names, labels, or qualities. No mathematical computation can be made at this level.

Def- Data at the ordinal level of measurement are qualitative or quantitative. Data at this level can be arranged in order, but differences between data entries are not mathematically meaningful.

Def- Data at the interval level of measurement are quantitative. The data can be ordered and you can calculate meaningful differences between data entries. At the interval level of measurement, a zero entry simply represents a position on a scale.

Def- Data at the ratio level of measurement are similar to data at the interval level with the added property that a zero entry really means 0. A ratio of two data values can be formed so one data value can be expressed as a multiple of another.
(Inherent zero is present. Zero implies “none.”)

Hint: To determine the difference between interval and ratio, see if the statement “twice as much” has meaning for the data. If the statement doesn’t make sense, then your data is at the interval level of measurement. If it does make sense, then your data is at the ratio level of measurement.

Ex- Identify the level of measurement for each set of data described below.
a) The daily high temperature for Atascadero for a week in June was 93, 91, 86, 94, 103, 104, 103.
b) The four names of my animals: Bella, Matrix, Tiny Tim, and Stupido.
c) The EPA size classes for cars are: subcompact, compact, midsize, and full size.
d) The heights in inches of the 2001 – 2002 Chicago Bulls team members: 83, 79, 85, 78, 84, 77, 83, 75, 81, 82, 73, 77, 77, 79, 79, 84, 80, 81, 83.

Note: There is a chart on p. 12 that gives a nice outline of the levels of measurement.

Nominal     Ordinal     Interval     Ratio

Inherent Zero:
0 socks = no socks
0 miles = no miles
0 inches = no inches
$0 = no money

No inherent zero:
0 o’clock ≠ no time
0 degrees ≠ no temperature
Year 0000 ≠ no year

1.3 Experimental Design

Designing a Statistical Study
1. Identify the variables of interest (the focus of the study) and the population of the study.
2. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population.
3. Collect the data.
4. Describe the data using descriptive techniques.
5. Interpret the data and make decisions about the population using inferential statistics.
6. Identify any possible errors.

Methods of Data Collection
1. Do an observational study- In an observational study, a researcher observes and measures characteristics of interest of part of a population but does not change existing conditions.
2. Perform an experiment- In performing an experiment, a treatment is applied to part of a population and responses are observed.
A second part of the population may be used as a control group, to which no treatment is applied; the control group may be given a placebo instead. A placebo is a harmless, unmedicated treatment.
3. Use a simulation- A simulation is the use of a mathematical or physical model to reproduce the conditions of a situation or process. Collecting the data often involves the use of computers. Simulations allow you to study situations that would be dangerous or impractical to create in real life.
4. Use a survey- A survey is an investigation of one or more characteristics of a population. Most often, surveys are carried out on people by asking them questions.

Ex- Determine which method of data collection you would use to collect data for each study. Explain your reasoning.
a) A study of the effects of introducing the wild boar to Catalina Island.
b) A study of the effects of a plant hormone on trees.
c) A study to determine the average number of spots on the green spotted frogs at the Santa Barbara Zoo.
d) A study to determine the average water consumption of people living in Nipomo.

Def- A confounding variable occurs when an experimenter cannot tell the difference between the effects of different factors on a variable.

Ex- A study claimed to have found a gene that determines whether a person will live a long time. One sample consisted of people at least 100 years old, who were tested with one genome method. Another sample consisted of people younger than 100, who were tested with a different method. The study doesn’t work because the results are confounded: a difference between the samples could be due to the genome method rather than to the gene itself.

Ex- A shop owner is tired of teenagers loitering outside his shop, so he puts up a sign that says “No loitering.” The shop owner notices a tremendous decrease in loitering teenagers. School just started this week.
We cannot determine whether the decrease in loitering is due to the sign that the shop owner posted or to students going back to school and being too busy to loiter. This is confounding.

Def- The placebo effect occurs when a subject reacts favorably to a placebo when in fact the subject has been given no medical treatment at all.

Def- Blinding is a technique where subjects do not know whether they are receiving a treatment or a placebo.

Def- In a double-blind experiment, neither the experimenter nor the subjects know if the subjects are receiving a treatment or a placebo. The experimenter is informed after all data have been collected.
Ex- Doctors give some patients a new pill and others a placebo. Lab technicians set up the tray of pills for the doctors so that neither the doctors nor the patients know which pills the patients are receiving.

Def- Randomization is a process of randomly assigning subjects to different treatment groups.

Def- In a completely randomized design, subjects are assigned to different treatment groups through random selection.
Ex- To achieve a completely randomized design, a group of 50 pigs is used. 25 of the pigs are randomly chosen to be placed in the control group (and will receive no treatment). The remaining 25 pigs are assigned to the treatment group, where they are given a special feed designed to build muscle. The groups are observed for three weeks and the results are compared.

Def- A block is a group of subjects that share common characteristics of importance.

Def- In a randomized block design, the subjects are divided into blocks in which they share common characteristics of importance; then, within each block, the subjects are randomly assigned to the treatment groups.

Def- In a matched-pairs design, subjects are paired up according to a similarity. One of the subjects in each pair receives the treatment while the other is in the control group.
Ex- Subjects could be paired because they have similar heights, incomes, skin tones, etc.

Def- Replication is the repetition of an experiment under the same or similar conditions.

Def- A census is a count or measure of an entire population. (This is often nearly impossible or very time consuming, so sampling techniques are used to gather data from a smaller portion of the population.)

Def- A random sample is one in which every member of the population has an equal chance of being selected.

Note: There are many sampling techniques- simple random, stratified, cluster, systematic, convenience. (A convenience sample is not a random sample.)

Def- A simple random sample is one in which every possible sample of the same size has the same chance of being selected.
Ex- A random sample is taken by numbering people from 1 to 400, then choosing one number at random. That person wins a prize.

Def- In a stratified sample, it is important to have members from each segment of the population. Members are divided into two or more groups called strata. Members of each stratum have similar characteristics such as age, gender, ethnicity, etc. A sample is selected from each stratum.

Def- In cluster sampling, the population falls into naturally occurring subgroups, each having similar characteristics. To select a cluster sample, divide the population into subgroups called clusters and select all of the members in one or more (but not all) of the clusters. Every single member of the chosen clusters is included in the sample. (Each cluster should be a good representation of the population.)

Def- A systematic sample is a sample in which each member of the population is assigned a number. The numbers are ordered sequentially, a starting point is randomly selected, and then every kth member of the population is chosen for the sample.
Ex- Choose every 7th person for the sample. Choose every 102nd person for the sample.

Def- A convenience sample consists of selecting only people who are available. This method should be avoided.
Ex- You want to determine how many hours students study per day, so you stand in front of the library and ask everyone who leaves the library how many hours they studied.

Ex- Name the sampling method used for each and discuss any potential sources of bias.
a) A list of patients discharged from all hospitals is obtained. Divide the patients into groups according to the length of their hospital stay: 2 days or less, 3 – 7 days, 8 – 14 days, more than 14 days. Draw a simple random sample from each group.
b) At the beginning of the year, instruct each hospital to survey every 500th patient that is discharged. (The number 500 was randomly selected as a starting point.)
c) Instruct each hospital to survey 10 discharged patients this week and send in the results.

Ex- Use a random number table (p. A7 Table 1) to choose a simple random sample with the given sample size.
a) A sample of 7 people from a total of 20 people.
057979 43984 21575 09908 70221 19791 051578 36432 01494 19888
b) A sample of 7 people from a total of 200 people.
057979 43984 21575 09908 70221 19791 051578 36432 01494 19888
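
Note: Reading a random number table can be sketched in a few lines of Python. This is only an illustration, under assumptions that are not stated in the notes: people are numbered 1 through N, the digits in the row are run together and read left to right in groups as wide as N (two digits for 20, three digits for 200), and groups that are out of range or already chosen are skipped. The function name sample_from_digit_table is made up for this sketch, and the row is copied exactly as printed above (two of its groups have six digits), so treat the result as a demonstration of the procedure rather than an answer key.

```python
def sample_from_digit_table(digits, population_size, sample_size):
    """Pick distinct labels 1..population_size by reading fixed-width digit groups.

    Assumed reading rule (not from the notes): ignore the spacing in the
    table, read left to right, skip out-of-range or repeated groups.
    """
    # Run all the digits together, dropping the spaces between groups.
    stream = "".join(ch for ch in digits if ch.isdigit())
    # Labels for 20 people need 2 digits; labels for 200 people need 3.
    width = len(str(population_size))
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        label = int(stream[start:start + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    # May hold fewer than sample_size labels if the digits run out.
    return chosen


# Row of digits exactly as printed in the exercise above.
ROW = "057979 43984 21575 09908 70221 19791 051578 36432 01494 19888"

sample_20 = sample_from_digit_table(ROW, 20, 7)
sample_200 = sample_from_digit_table(ROW, 200, 7)
print(sample_20)   # [5, 15, 9, 2, 19, 10, 1]
print(sample_200)  # [57, 157, 105, 14]
```

With 200 people, this one row yields only four usable labels (157 appears twice and is counted once), so in practice you would continue reading into the next row of the table until 7 people are chosen.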