Notes Ch 1 - wsutter.net

Math 123- Statistics
Chapter 1 Notes
Name_______________________________
1.1 An Overview of Statistics
Def- Data is information from observations, measurements, or responses.
Ex- The collection of data resulted in the discovery that 15% of the people in Ms. King’s classroom
have black hair.
Def- Statistics is the science of collecting, organizing, and interpreting data in order to make
decisions.
Types of Data Sets:
1. Populations
2. Samples
Def- A population is the collection of all outcomes, responses, measurements, or counts that are of
interest.
Def- A sample is a subset, or part, of a population.
Ex- Of 30 elementary school students surveyed, 25 said that they liked peanut butter and jelly
sandwiches. Identify the population and the sample in this scenario.
Venn Diagram
Ex- Identify the population and the sample.
a)
b)
c) A survey of 898 U.S. adult VCR owners found that 16% had VCR clocks that were blinking
“12:00.”
d) A study of 860 U.S. ATMs found that the average surcharge for withdrawals from a competing
bank was $1.15.
Def- A statistic is a numerical description of a sample characteristic.
Def- A parameter is a numerical description of a population characteristic.
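To see the difference in action, here is a small Python sketch. The population is hypothetical (1,000 people, exactly 150 of whom have black hair), so its parameter is known. A random sample of 100 people then gives a statistic that estimates it. The names `population` and `proportion` and the seed value are illustrative choices, not from the notes.

```python
import random

# Hypothetical population of 1,000 people: 1 = has black hair, 0 = does not.
population = [1] * 150 + [0] * 850

def proportion(values):
    """Return the fraction of entries equal to 1."""
    return sum(values) / len(values)

# Parameter: computed from every member of the population.
parameter = proportion(population)

# Statistic: computed from a random sample of 100 people only.
rng = random.Random(2024)  # fixed seed so the run is repeatable
sample = rng.sample(population, 100)
statistic = proportion(sample)

print(f"parameter = {parameter:.2f}")  # 0.15
print(f"statistic = {statistic:.2f}")  # varies with the sample drawn
```

Rerunning with a different seed changes the statistic but never the parameter, which is why a statistic describes only the sample.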
Ex- Determine whether the numerical value describes a parameter or a statistic.
a) The 2001 team payroll of the Baltimore Orioles was $74,279,540.
b) In a survey of 300 U.S. adults, 22% own a cell phone.
c) 19% of a sample of Indiana 9th graders surveyed smoke daily.
d) In a survey of all freshmen at the University of Arizona, 89 students were majoring in astronomy.
Branches of Statistics:
1. Descriptive Statistics
2. Inferential Statistics
Def- Descriptive statistics is the branch of statistics that involves the organization, summarization,
and display of data.
Def- Inferential statistics is the branch of statistics that involves using a sample to draw conclusions
about a population.
Ex- Which part of the survey represents the descriptive branch of statistics? Make an inference
based on the results of the survey.
a) A survey conducted among 1000 men and women found that 25% of men and 32% of women
exercised at least four days per week.
b) A survey conducted among the residents of Arroyo Grande found that people owned an average
of two dogs.
1.2 Data Classification
Types of Data:
1. Qualitative Data
2. Quantitative Data
Def- Qualitative data consist of attributes, labels, or non-numerical entries.
Def- Quantitative data consist of numerical measurements or counts (data on which you can perform
mathematical calculations).
Ex- Determine whether the data are qualitative or quantitative.
a) The monthly salaries of the employees at Spencer’s Market.
b) The Social Security numbers of employees at an accounting firm.
c) The age of everyone in my family.
d) The zip codes of 200 people surveyed in California.
e)
Store          Cost of apples
Albertson’s    $1.52 per pound
Ralph’s        $1.63 per pound
Vons           $1.84 per pound
Spencer’s      $1.37 per pound
Levels of Measurement (the level determines which calculations are meaningful for the data):
1. Nominal (least meaningful)
2. Ordinal
3. Interval
4. Ratio (most meaningful)
Def- Data at the nominal level of measurement are qualitative only. Data at this level are categorized
using names, labels, or qualities. No mathematical computations can be made at this level.
Def- Data at the ordinal level of measurement are qualitative or quantitative. Data at this level can
be arranged in order, but differences between data entries are not mathematically meaningful.
Def- Data at the interval level of measurement are quantitative. The data can be ordered, and you
can calculate meaningful differences between data entries. At the interval level of measurement, a zero
entry simply represents a position on a scale; it is not an inherent zero.
Def- Data at the ratio level of measurement are similar to data at the interval level, with the added
property that a zero entry is an inherent zero. A ratio of two data values can be formed, so one data
value can be expressed as a multiple of another. (An inherent zero is present: zero means “none.”)
Hint: To determine the difference between interval and ratio, see if the statement “twice as much”
has meaning for the data. If the statement doesn’t make sense, then your data is at the interval
level of measurement. If it does make sense, then your data is at the ratio level of measurement.
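The "twice as much" test can be checked numerically. In this sketch (the temperatures and heights are made-up values), a ratio of Fahrenheit temperatures changes when the same temperatures are converted to Celsius, because neither scale has an inherent zero. A ratio of heights stays the same whether it is measured in inches or centimeters.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def inches_to_cm(inches):
    return inches * 2.54

# Interval level: 104 degrees F is not "twice as hot" as 52 degrees F.
temp_ratio_f = 104 / 52                                                # 2.0
temp_ratio_c = fahrenheit_to_celsius(104) / fahrenheit_to_celsius(52)  # about 3.6

# Ratio level: 84 inches really is twice 42 inches, in any unit.
height_ratio_in = 84 / 42                              # 2.0
height_ratio_cm = inches_to_cm(84) / inches_to_cm(42)  # still 2.0

print(temp_ratio_f, round(temp_ratio_c, 2))
print(height_ratio_in, round(height_ratio_cm, 2))
```

If a ratio depends on the unit you happen to use, "twice as much" has no real meaning for that data, which marks it as interval rather than ratio.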
Ex- Identify the level of measurement for each set of data described below.
a) The daily high temperature for Atascadero for a week in June was 93, 91, 86, 94, 103, 104, 103.
b) The names of my four animals: Bella, Matrix, Tiny Tim, and Stupido.
c) The EPA size classes for cars are: subcompact, compact, midsize, and full size.
d) The heights in inches of the 2001 – 2002 Chicago Bulls team members: 83, 79, 85, 78, 84, 77,
83, 75, 81, 82, 73, 77, 77, 79, 79, 84, 80, 81, 83.
Note: The chart on p. 12 gives a good outline of the levels of measurement.
[Chart columns: Nominal, Ordinal, Interval, Ratio]
Inherent Zero:
0 socks = no socks
0 miles = no miles
0 inches = no inches
$0 = no money
0 o’clock ≠ no time
0 degrees ≠ no temperature
Year 0000 ≠ no year
1.3 Experimental Design
Designing a Statistical Study
1. Identify the variables of interest (the focus of the study) and the population of the study.
2. Develop a detailed plan for collecting data. If you use a sample, make sure the sample is
representative of the population.
3. Collect the data.
4. Describe the data using descriptive techniques.
5. Interpret the data and make decisions about the population using inferential statistics.
6. Identify any possible errors.
Methods of Data Collection
1. Do an observational study- In an observational study, a researcher observes and measures
characteristics of interest of part of a population but does not change existing conditions.
2. Perform an experiment- In performing an experiment, a treatment is applied to part of a
population and responses are observed. Another part of the population may be used as a control
group, in which no treatment is applied (the subjects may be given a placebo instead). A placebo is a
harmless, unmedicated treatment.
3. Use a simulation- A simulation is the use of a mathematical or physical model to reproduce the
conditions of a situation or process. Simulations often involve the use of computers. They allow
you to study situations that would be dangerous or impractical to create in real life.
4. Use a survey- A survey is an investigation of one or more characteristics of a population. Most
often, surveys are carried out on people by asking them questions.
Ex- Determine which method of data collection you would use to collect data for each study.
Explain your reasoning.
a) A study of the effects of introducing the wild boar to Catalina Island.
b) A study of the effects of a plant hormone on trees.
c) A study to determine the average number of spots on the green spotted frogs at the Santa
Barbara Zoo.
d) A study to determine the average water consumption of people living in Nipomo.
Def- A confounding variable occurs when an experimenter cannot tell the difference between the
effects of different factors on a variable.
Ex- Researchers claimed to have found a gene that determines whether or not a person lives a long
time. One sample consisted of people at least 100 years old, who were tested with one genome method.
Another sample consisted of people younger than 100, who were tested with a different method.
This study doesn’t work because the results are confounded: we cannot tell whether differences
between the groups are caused by the gene itself or by the different testing methods.
Ex- A shop owner is tired of teenagers loitering outside his shop, so he puts up a sign that says “No
loitering.” The shop owner notices a tremendous decrease in loitering teenagers. School just
started this week.
We cannot determine whether the decrease in loitering occurred because of the sign that the shop owner
posted or because students went back to school and are now too busy to loiter. This is
confounding.
Def- The placebo effect occurs when a subject reacts favorably to a placebo when in fact the subject
has been given no medical treatment at all.
Def- Blinding is a technique where subjects do not know whether they are receiving a treatment or a
placebo.
Def- In a double-blind experiment, neither the experimenter nor the subjects know if the subjects are
receiving a treatment or a placebo. The experimenter is informed after all data have been collected.
Ex- Doctors give some patients a new pill and others a placebo. Lab technicians set up the tray of
pills for the doctors so that neither the doctors nor the patients know which pills the patients are receiving.
Def- Randomization is a process of randomly assigning subjects to different treatment groups.
Def- A completely randomized design is when subjects are assigned to different treatment groups
through random selection.
Ex- To achieve a completely randomized design, a group of 50 pigs is used. 25 of the pigs
are randomly chosen to be placed in the control group (and will receive no treatment). The
remaining 25 pigs are assigned to the treatment group, where they are given a special feed designed
to build muscle. The groups are observed for three weeks and the results are compared.
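The random split in the pig example can be sketched in Python with `random.sample`, which chooses 25 of the 50 pigs without replacement so that every pig is equally likely to land in the control group. The pig labels and the seed are hypothetical.

```python
import random

# Hypothetical labels for the 50 pigs in the example.
pigs = [f"pig{n:02d}" for n in range(1, 51)]

rng = random.Random(11)  # fixed seed so the split is repeatable

# Randomly choose 25 pigs for the control group (no treatment).
control = rng.sample(pigs, 25)

# The remaining 25 pigs form the treatment group (special feed).
chosen = set(control)
treatment = [pig for pig in pigs if pig not in chosen]

print("control:", sorted(control))
print("treatment:", treatment)
```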
Def- A block is a group of subjects that share common characteristics of importance.
Def- In a randomized block design, the subjects are divided into blocks in which they share common
characteristics of importance, then within each block, the subjects are randomly assigned to the
treatment groups.
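A minimal sketch of a randomized block design, assuming each subject's block is already known (here a hypothetical age group). Subjects are grouped by block, shuffled within their block, and then dealt out to the treatments in turn, so each block is split as evenly as possible. The function name and the data are illustrative.

```python
import random
from collections import defaultdict

def randomized_block_assignment(block_of, treatments, seed=None):
    """Randomly assign subjects to treatments separately within each block.

    block_of maps each subject to its block; returns subject -> treatment.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject, block in block_of.items():
        blocks[block].append(subject)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomize order within this block only
        for i, subject in enumerate(members):
            # Deal the shuffled subjects out to the treatments in turn.
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects blocked by age group.
subjects = {
    "Ana": "under 40", "Ben": "under 40", "Cal": "under 40", "Dee": "under 40",
    "Eve": "40 and over", "Fay": "40 and over", "Gus": "40 and over", "Hal": "40 and over",
}
groups = randomized_block_assignment(subjects, ["treatment", "placebo"], seed=5)
print(groups)
```

Because randomization happens inside each block, both age groups end up with two treated subjects and two placebo subjects.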
Def- In a matched-pairs design, subjects are paired up according to a similarity. One of the subjects
in each pair receives the treatment while the other is in the control group.
Ex- Subjects could be paired because they have similar heights, incomes, skin tone, etc.
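One way to build matched pairs, sketched below under the assumption that height is the matching characteristic, is to sort subjects by height, pair neighbors, and flip a virtual coin to decide which member of each pair gets the treatment. The subjects and heights are hypothetical.

```python
import random

def matched_pairs(heights, seed=None):
    """Pair subjects with similar heights; return (treatment, control) tuples.

    If the number of subjects is odd, the tallest subject is left unpaired.
    """
    rng = random.Random(seed)
    ordered = sorted(heights, key=heights.get)  # shortest to tallest
    pairs = []
    for i in range(0, len(ordered) - 1, 2):
        first, second = ordered[i], ordered[i + 1]
        if rng.random() < 0.5:  # coin flip: who receives the treatment
            first, second = second, first
        pairs.append((first, second))
    return pairs

# Hypothetical subjects and heights in inches.
heights = {"A": 60, "B": 72, "C": 61, "D": 71, "E": 66, "F": 67}
for treated, control in matched_pairs(heights, seed=8):
    print(f"{treated} gets the treatment, {control} is the control")
```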
Def- Replication is the repetition of an experiment under the same or similar conditions.
Def- A census is a count or measure of an entire population. (A census is often impractical and very
time-consuming, so sampling techniques are used to gather a smaller portion of data.)
Def- A random sample is one in which every member of the population has an equal chance of
being selected. Note: There are many sampling techniques: simple random, stratified, cluster,
systematic, and convenience. (A convenience sample is not a random sample.)
Def- A simple random sample is one in which every possible sample of the same size has the same
chance of being selected.
Ex- A simple random sample of one person is taken by numbering people from 1 to 400 and then
choosing one number at random. That person wins a prize.
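In Python, `random.sample` draws without replacement in a way that makes every subset of a given size equally likely, so it matches the definition of a simple random sample. The sketch below reuses the 1 to 400 numbering from the prize example; the sample of 10 is a hypothetical extension.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n members chosen so that every size-n subset is equally likely."""
    return random.Random(seed).sample(population, n)

people = list(range(1, 401))  # people numbered 1 to 400

# Sample of size 1: the prize winner.
winner = simple_random_sample(people, 1)[0]
print("winner:", winner)

# A hypothetical sample of 10 people from the same list.
print("sample of 10:", sorted(simple_random_sample(people, 10, seed=3)))
```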
Def- In a stratified sample, it is important to have members from each segment of the population.
Members are divided into two or more groups called strata. Members of each stratum have similar
characteristics such as age, gender, ethnicity, etc. A sample is selected from each stratum.
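A stratified sample can be sketched as a simple random sample taken separately within each stratum. This version draws the same number from every stratum; allocating the sample in proportion to stratum size is another common choice. The strata (class levels) and member IDs are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: students grouped by class level.
strata = {
    "freshmen": [f"F{i}" for i in range(1, 41)],
    "sophomores": [f"S{i}" for i in range(1, 31)],
    "juniors": [f"J{i}" for i in range(1, 26)],
    "seniors": [f"R{i}" for i in range(1, 21)],
}
sample = stratified_sample(strata, 5, seed=4)
for name, chosen in sample.items():
    print(name, chosen)
```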
Def- In cluster sampling, the population falls into naturally occurring subgroups, each having similar
characteristics. To select a cluster sample, divide the population into subgroups called clusters and
select all of the members in one or more (but not all) of the clusters. Every single member of the
chosen clusters is included in the sample. (Ideally, each cluster is a good representation of the
whole population.)
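The key difference from stratified sampling is that whole clusters are chosen at random and every member of a chosen cluster is kept. A minimal sketch, assuming hypothetical clusters (classrooms) of students:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters; return their names and all their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # cluster names
    return chosen, [member for name in chosen for member in clusters[name]]

# Hypothetical clusters: classrooms and their students.
clusters = {
    "Room 101": ["Ava", "Eli", "Mia"],
    "Room 102": ["Noah", "Zoe", "Liam"],
    "Room 103": ["Ivy", "Owen", "Ella"],
    "Room 104": ["Jack", "Lily", "Sam"],
}
rooms, members = cluster_sample(clusters, 2, seed=6)
print("chosen clusters:", rooms)
print("sample:", members)
```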
Def- A systematic sample is a sample in which each member of the population is assigned a
number. The numbers are ordered sequentially, a starting point is randomly selected and then
every kth member of the population forms the sample.
Ex- Choose every 7th person for the sample. Choose every 102nd person for the sample.
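A systematic sample can be sketched with Python slicing: choose a random starting position among the first k members, then take every kth member after it. The population of 700 numbered people is hypothetical; k = 7 matches the first example.

```python
import random

def systematic_sample(population, k, seed=None):
    """Random start among the first k members, then every kth member."""
    start = random.Random(seed).randrange(k)  # index 0 to k - 1
    return population[start::k]

people = list(range(1, 701))  # hypothetical people numbered 1 to 700
sample = systematic_sample(people, 7, seed=9)
print(len(sample), sample[:5])
```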
Def- A convenience sample consists of selecting only people who are available. This method
should be avoided because it often leads to biased results.
Ex- You want to determine how many hours students study per day, so you stand in front of the
library and ask everyone who leaves the library how many hours they studied.
Ex- Name the sampling method used for each and discuss any potential sources of bias.
a) A list of patients discharged from all hospitals is obtained. Divide the patients into groups
according to the length of their hospital stay: 2 days or less, 3 – 7 days, 8 – 14 days, more than 14
days. Draw a simple random sample from each group.
b) At the beginning of the year, instruct each hospital to survey every 500th patient who is
discharged. (The number 500 was randomly selected as a starting point.)
c) Instruct each hospital to survey 10 discharged patients this week and send in the results.
Ex- Use a random number table (p. A7 Table 1) to choose a simple random sample with the given
sample size.
a) A sample of 7 people from a total of 20 people.
057979 43984 21575 09908 70221 19791 051578 36432 01494 19888
b) A sample of 7 people from a total of 200 people.
057979 43984 21575 09908 70221 19791 051578 36432 01494 19888
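The by-hand procedure can be sketched in Python: number the population 1 to N, read the table in groups of as many digits as N has (2 digits for 20, 3 for 200), and keep each group that falls between 1 and N, skipping repeats, until the sample is full. This sketch assumes the digits are read left to right as one continuous string with the spaces ignored, and it uses the digits exactly as they appear in these notes. The function name `sample_from_table` is hypothetical.

```python
def sample_from_table(table_digits, population_size, sample_size):
    """Select distinct labels 1..population_size from a random number table."""
    digits = "".join(ch for ch in table_digits if ch.isdigit())
    width = len(str(population_size))  # 20 -> 2 digits, 200 -> 3 digits
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip labels outside 1..N (such as 00 or 21) and labels already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen  # may be short if the digits run out

row = "057979 43984 21575 09908 70221 19791 051578 36432 01494 19888"

print(sample_from_table(row, 20, 7))   # [5, 15, 9, 2, 19, 10, 1]
print(sample_from_table(row, 200, 7))  # [57, 157, 105, 14]
```

For b), this row yields only four usable three-digit labels, so you would continue reading the next row of Table 1 to finish the sample of 7.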