Chapter 3 Statistics
Deterministic vs Stochastic
Deterministic systems are ones in which the same input produces the same output each time. For
example, the same inputs for the length and width of a rectangle produce the same area. Investments
in fixed rate accounts produce the same yield.
Stochastic systems are ones in which the same input produces different outputs each time. For
example, medicines have different effects on different people. Disciplinary strategies have different
effects on different children.
Example:
The economic system is very complex. To make a good stock and flow model for the economy requires
knowing a lot of details. One such detail is the effect of changes in the minimum wage.
If the Federal government raises the minimum wage, will employment go down because employers
can’t afford to pay the extra cost, will employment go up because more people will have more money to
spend, leading to more jobs, or will something else happen?
Circle your choice:
Down
Up
Other
Percent Confidence Your Choice is Correct _____________
How do we find evidence to prove who is correct? Can we trust our intuition? Is there a
formula we can use? Is the answer going to be the same in all circumstances? Is there one perfect
answer?
Evidence-Based decision making: Because of the complexity of stochastic systems, it would be
beneficial if leaders made use of evidence, instead of ideology, when making decisions. Below is a graph
of the changes in employment following increases in minimum wage. What evidence does this graph
provide?
Because of the variation in stochastic systems, it is better to think about the distribution of possible
outcomes when taking an action than to expect one particular outcome every time. Therefore, in this
example, the arguments on both sides of the issue should be that sometimes employment increases and
sometimes it goes down and sometimes it stays approximately the same. That is the distribution of
possible outcomes.
The challenge that is typically faced when trying to understand a stochastic system is
that the system is complex, with many interactions, and that we can almost never get complete
information. Therefore we must function with only partial evidence. The field of mathematics that is
designed to help us use partial information to answer big questions is Statistics.
1. Ask specific questions
2. Design research
3. Randomly select from the population
4. Make graphs and produce summaries of the data
5. Explore the probabilities of different possible outcomes.
Before looking at the details of statistical analyses, we need to understand the concept of probability
because the evidence we do get is subject to chance.
Probability
(Formula 3.1)
What values can it take?
Examples: Coins, dice
Probability Distributions – start with a possibility distribution then determine or estimate the probability
of each possibility.
Example: What is the distribution of outcomes at the end of the quarter for a randomly selected
student?
Possibility Distribution    Probability
A. Pass (2.0+)              53/67 = 0.791
B. [1,2)                    6/67 = 0.090
C. Fail (0)                 6/67 = 0.090
D. Withdraw                 2/67 = 0.030
The probabilities are based on prior classes, since that is the best available evidence.
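As a minimal sketch, the step from a possibility distribution to a probability distribution can be written in a few lines of Python. The counts are the ones given above; the variable names are illustrative only.

```python
from fractions import Fraction

# Counts of outcomes from prior classes (taken from the distribution above).
counts = {
    "A. Pass (2.0+)": 53,
    "B. [1,2)": 6,
    "C. Fail (0)": 6,
    "D. Withdraw": 2,
}

total = sum(counts.values())  # number of students in the prior classes

# Each probability is the count for that possibility divided by the total.
distribution = {outcome: Fraction(n, total) for outcome, n in counts.items()}

for outcome, prob in distribution.items():
    print(f"{outcome:<16} {prob.numerator}/{prob.denominator} = {float(prob):.3f}")

# The probabilities of all possibilities must add up to exactly 1.
assert sum(distribution.values()) == 1
```

Using exact fractions avoids rounding error when checking that the probabilities sum to 1; the decimals are only for display.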
Make your own probabilities for yourself. You will base it on how well you are currently doing, your
math abilities, effort, attendance, normal grades, and motivation.
Complete Activity 3.1 Possibility Distributions and Probability Distribution on page 50 and 51.
How does thinking about probability distributions affect your thinking?
Key terms:
Population – all times minimum wage is raised
Census – data collected from all times minimum wage is raised
Sample – data collected from some of the times minimum wage was raised
Example: If we wanted to know the effect of state funding for music programs in K-12, what is the
population?
Population: All students who could learn music
Census: Getting data from every person in the population.
Sample: Students from some schools or states with programs.
If we want to know the effect of the Mediterranean diet on weight, what is the population?
Population: All people using the Mediterranean diet to lose weight
Census: Getting data from every person in the population.
Sample: finding the results from some people using the diet
For each group of students, think of one question that relates to your project, then define the
population, census, and sample. Record it in your notes for later use.
Population:
Census:
Sample:
You will now have the opportunity to take a sample of data. This data will be used to practice statistical
techniques that you will learn during the chapter.
Experiential 3a or 3b Page 125/127
In order for statistics to provide useful understanding, it is necessary to ask the right questions.
Asking the right questions
Is state funding of school music programs a good use of money?
This is a general question that is sufficiently vague so that it can be argued both ways. To get the details,
we need more specific questions.
Are students in school music programs more likely to attend college than students not in music
programs?
Are students in school music programs less likely to get into trouble than students not in music
programs?
Do schools with music programs have a higher graduation rate than schools without music
programs?
Is the GPA of music students different from that of athletes?
Is the Mediterranean Diet the best Diet?
Do people lose more weight on it in the first month?
Do people keep weight off longer?
Are Cholesterol levels better?
Are blood pressures better?
Is eating that diet sustainable, i.e., does it become part of a lifelong way to eat?
Different specific questions yield different interpretations of the general questions.
Ask specific questions for your group’s problem.
Data is the evidence.
Answers to specific questions are data. They are the evidence that shows the variety of
responses to the questions. There are two types of data that can be collected: categorical and
quantitative.
Are students in school music programs more likely to attend college than students not in music
programs?
The data is: attend college or don’t attend college. This is categorical.
Is the GPA of music students different from that of athletes?
The data is: GPA. This is quantitative.
What data would be collected to answer the following question? Is the data categorical or
quantitative?
1. Should the United States begin a transition from gas powered cars to electric cars?
2. Has the average number of passenger miles using mass transit increased faster than the
population?
What kind of data will you get for your group’s question?
Observational Studies and Experiments
Research is often done because we want to either understand the characteristics of a
population or see the effect of an action on a population. In the former case, we conduct an
observational study while in the latter case we conduct an experiment.
Observational study – researcher collects data without influencing what is being observed.
Compare data from schools with music programs and schools without.
Compare weight changes from people who used the Mediterranean diet and those who used other
diets.
Experiment – to test the impact of something, the researcher needs a treatment group and a control group to compare.
Randomly allocate funding to different schools then compare the students with music to those without.
Randomly assign people to different diets then compare the effect.
Would an observational study or experiment be appropriate for your question?
We will now begin a multiday analysis of the electricity production system because of its impact on the
climate system.
IV Generation Nuclear - Bill Gates
Electrical Power - Energy Justice Map
Theoretical I, Page 117
Suppose the goal for the United States is to replace all the coal power plants with something that does
not produce carbon emissions.
What do we need to know?
Would we conduct an experiment or study?
Sampling
The results from studies and experiments are worthless if sampling is
inadequate. Samples should be selected randomly from the population.
Simple random Sample
Row Number
1    83984  22116  01657  83717  24799  00515  37723  23445  02705  26127
2    78425  65082  07792  43850  22134  76033  87273  13972  58089  12538
3    96268  62423  63347  09111  12079  58082  88984  76565  62765  35923
4    58037  43470  88497  98909  79230  36845  30325  82655  48666  55431
5    52354  04992  47754  31246  36779  27029  88187  19275  89632  21684
6    65936  11549  15979  92704  42288  07121  54938  08990  00190  81402
7    01849  40765  97487  56378  80291  40351  95246  58004  56115  53197
8    94368  20871  13867  61232  87091  67621  27560  81197  63987  01118
9    24504  75557  58840  99065  49850  55957  14117  62890  24961  54550
10   13283  33042  69362  92759  81354  76328  76438  29699  86996  65089
Sampling with replacement
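A software random number generator can play the same role as the table of random digits. The sketch below assumes a hypothetical population of 50 numbered units (for example, states). It draws one simple random sample without replacement and one with replacement. The seed value is arbitrary and only makes the run repeatable.

```python
import random

# Hypothetical population: 50 units numbered 1 through 50.
population = list(range(1, 51))

rng = random.Random(2016)  # fixed, arbitrary seed so the run is repeatable

# Simple random sample WITHOUT replacement: no unit can be chosen twice.
without_replacement = rng.sample(population, k=10)

# Simple random sample WITH replacement: a unit may be chosen more than once.
with_replacement = rng.choices(population, k=10)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```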
Stratified sampling – e.g., for music programs, we could sample from various
socioeconomic areas.
For Mediterranean Diets, we could sample from various backgrounds (past
exercise and eating habits, gender, age).
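One way to picture stratified sampling is to take a simple random sample inside each group (stratum). The school labels, the three strata, and the sample size per stratum below are all made up for illustration. This is a sketch of the idea, not a prescribed procedure.

```python
import random

# Hypothetical sampling frame: schools grouped by the socioeconomic level of their area.
schools_by_stratum = {
    "low income": [f"L{i}" for i in range(1, 41)],
    "middle income": [f"M{i}" for i in range(1, 61)],
    "high income": [f"H{i}" for i in range(1, 21)],
}

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of size per_stratum from every stratum."""
    return {name: rng.sample(units, k=per_stratum) for name, units in strata.items()}

rng = random.Random(3)  # arbitrary seed for repeatability
sample = stratified_sample(schools_by_stratum, per_stratum=5, rng=rng)

for name, chosen in sample.items():
    print(f"{name}: {chosen}")
```

Sampling within each stratum guarantees that every group is represented, which a single simple random sample cannot promise.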
How would you sample for your group’s question?
Randomly Select States on Page 129 for Theoretical I
Graphs and Statistics Overview
For data to be useful as evidence in decision making, it is necessary to organize and summarize
it. This is accomplished using graphs and statistics. The graphs and statistics used for categorical data
are different than those used for quantitative data. Statistics are numerical summaries of the sample
data. Common statistics include the proportion, mean, and median. Below is a brief overview of the graphs and
statistics.
Categorical Data
For categorical data, the two most commonly used graphs are bar graphs and pie charts. The
two most commonly used statistics are counts and proportions.
A bar graph is used when there are separate categories and a count or other measurement is
recorded for each category. The bar graph below shows the number of calories burned with one hour of
each of the activities.
A pie chart shows the proportion of each category out of a whole. According to the Pew
Research Center, in 2013, 56% of all American adults had a smartphone.
Statistics for categorical data can be shown as either counts or proportions, although
proportions are more common. For example, if a survey of students in a class showed that 23 out of 35
had a good, home cooked meal last night, this result could be reported as a count (23) or as a
proportion, which is identified by the variable $\hat{p}$, where $\hat{p} = \frac{x}{n} = \frac{23}{35} \approx 0.657$.
Here $x$ is the count and $n$ is the sample size.
Make a pie chart based on classroom data
Quantitative Data
When there is only one quantitative variable, the graphs that are normally used are histograms
and box plots. If there are two quantitative variables, scatter plots are used.
A histogram is shown below of the 2013 attendance at randomly selected National Parks, which
includes national monuments and other historical sites (https://irma.nps.gov/Stats/Reports/Park). The x
axis shows the range of values for each bar. The y axis, which is labeled No of obs (number of
observations), shows how many parks had attendance within each range. Thus in this sample, there
were 14 parks which had between 0 and 500,000 visitors. There was one park with between 3 and 3.5
million visitors (Yellowstone).
Make a histogram for the QAW scores
Spring 2016
3.2, 2.89, 1.37, 3.19, 1.78, 2.9, 2.1, 3.46, 1.64, 2.65,
3.22, 3.21, 1.85, 3.07, 0.78, 2.75, 3.03, 2.73, 1.89, 2.98
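Before drawing the histogram, it can help to count how many scores fall in each range. The sketch below uses the Spring 2016 QAW scores listed above. The bin width of 0.5 is an assumption, and other widths are equally valid. It prints a rough text histogram.

```python
import math

# Spring 2016 QAW scores from the list above.
qaw_scores = [3.2, 2.89, 1.37, 3.19, 1.78, 2.9, 2.1, 3.46, 1.64, 2.65,
              3.22, 3.21, 1.85, 3.07, 0.78, 2.75, 3.03, 2.73, 1.89, 2.98]

bin_width = 0.5  # assumed width of each bar

# Count the scores in each bin; a bin is named by its left edge.
counts = {}
for score in qaw_scores:
    left_edge = math.floor(score / bin_width) * bin_width
    counts[left_edge] = counts.get(left_edge, 0) + 1

# One row per bin: the range of values, then one star per observation.
for left_edge in sorted(counts):
    print(f"[{left_edge:.1f}, {left_edge + bin_width:.1f})  {'*' * counts[left_edge]}")
```

Each row corresponds to one bar: the bracketed range is what the x axis would show and the number of stars is the number of observations on the y axis.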
The statistics that are used for quantitative data include the mean, median, and standard
deviation. The mean and median are two ways to represent the center of the data. The mean shows
the balance point of a histogram. It can be influenced by one or several extreme values. The mean of
the sample is found with the formula $\bar{x} = \frac{\sum x}{n}$. This formula indicates that all the data values should be
added and then the total is divided by the number of data values. The median has an equal number of
values above and below it and is not affected by extreme values. The standard deviation is one way to
express the variation that exists in the data set. It can be interpreted as the approximate average
distance each point is from the mean. The formula for the standard deviation of a sample
is $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$. Larger standard deviation values indicate more variation.
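The two formulas can be checked directly on the Spring 2016 QAW scores. This is a minimal sketch that applies the formulas step by step and then compares the results with Python's standard `statistics` module.

```python
import math
import statistics

# Spring 2016 QAW scores from earlier in the chapter.
qaw_scores = [3.2, 2.89, 1.37, 3.19, 1.78, 2.9, 2.1, 3.46, 1.64, 2.65,
              3.22, 3.21, 1.85, 3.07, 0.78, 2.75, 3.03, 2.73, 1.89, 2.98]

n = len(qaw_scores)

# Mean: add all the data values, then divide by the number of values.
mean = sum(qaw_scores) / n

# Sample standard deviation: square each distance from the mean, add them,
# divide by n - 1, then take the square root.
s = math.sqrt(sum((x - mean) ** 2 for x in qaw_scores) / (n - 1))

# Median: the middle of the sorted data (average of the two middle values when n is even).
median = statistics.median(qaw_scores)

print(f"n = {n}, mean = {mean:.4f}, median = {median:.2f}, s = {s:.4f}")

# Cross-check the hand-written formulas against the library versions.
assert math.isclose(mean, statistics.mean(qaw_scores))
assert math.isclose(s, statistics.stdev(qaw_scores))
```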
Inference – parameters and statistics
                      Statistic (from a sample)    Parameter (from a population)
Proportion            p̂                            p
Mean                  x̄                            µ (mu)
Standard Deviation    s                            σ (lower case sigma)
Explain the reason for inference – want to know the parameter to make the best decision but can only
know the statistic. Statistics vary. Consider doing the marble sampling from stats.
There are two types of inference, hypothesis tests and confidence intervals. Hypothesis tests are done
when a researcher has a theory about the parameter and wants to test if the theory is reasonable.
Confidence intervals are done when there is no theory and only an estimate of the parameter is desired.
Probability
Because random sampling produces only one of the many statistics that make up a sampling
distribution, it is only by chance that a particular sample was selected.
Probability is the proportion of times an outcome will occur over the long run. The emphasis on
long term is very important.
We view probability as a fraction: $P(x) = \frac{\text{number of favorable outcomes}}{\text{number of possible outcomes}}$. Assume all
possible ways are equally likely, as is the case when random sampling is done with replacement.
Probability is always a numerical value between 0 and 1. This can be shown as 0 ≤ P(x) ≤ 1. The
probability is 0 if the event cannot occur. The probability is 1 if the event is a sure thing – it occurs every
time.
Random processes – coin flips, marbles, random selection
Sample Space
Complements
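As an illustration of the fraction definition, the sample space for rolling two dice can be listed in full and the favorable outcomes counted. The choice of event (a sum of 7) is only an example.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair of faces from two six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event: the two dice add up to 7.
favorable = [roll for roll in sample_space if sum(roll) == 7]

# Probability = number of favorable outcomes / number of possible outcomes.
p_seven = Fraction(len(favorable), len(sample_space))

# Complement: the probability that the event does NOT occur.
p_not_seven = 1 - p_seven

print(f"Possible outcomes: {len(sample_space)}")
print(f"P(sum is 7)     = {p_seven}")
print(f"P(sum is not 7) = {p_not_seven}")
```

The fraction is valid here because all the outcomes in the sample space are equally likely.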
Normal Curve
Fits neatly over binomial distribution and central limit theorem distribution.
Area corresponds to probability
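68-95-99.7 rule
[Figure: sampling distributions built from many sample proportions p̂ and sample means x̄]

                                       Sample proportions (p̂)                          Sample means (x̄)
Mean                                   $\mu_{\hat{p}} = p$                              $\mu_{\bar{x}} = \mu$
Standard Deviation (standard error)    $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$     $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$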
From the empirical rule, it is known that 95% of all statistics are within 2 standard deviations
(standard errors) of the mean. This means that 95% of all sample proportions are within 2 standard
deviations of the proportion of the population. Likewise, 95% of all sample means are within 2 standard
deviations of the mean of the population.
If we define an event as any outcome within 2 standard errors of the mean, then the probability
of that event is 0.95. The probability that an outcome is not within 2 standard errors is the complement.
This is found by subtracting 0.95 from 1. Thus the probability of the complement is 0.05.
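The 95% figure can be explored with a small simulation. The sketch below assumes a population proportion of 0.3 and samples of size 200. Both values are arbitrary choices for illustration. It draws many random samples and counts how often the sample proportion lands within 2 standard errors of the population proportion. Because the rule is approximate and the samples are random, the result should be close to 0.95 but not exactly equal to it.

```python
import math
import random

p = 0.3         # assumed population proportion
n = 200         # assumed sample size
trials = 5_000  # number of random samples to draw

# Standard error of the sample proportion.
standard_error = math.sqrt(p * (1 - p) / n)

rng = random.Random(42)  # arbitrary seed so the run is repeatable

within = 0
for _ in range(trials):
    # One random sample: count how many of the n units are "successes".
    successes = sum(rng.random() < p for _ in range(n))
    p_hat = successes / n
    if abs(p_hat - p) <= 2 * standard_error:
        within += 1

proportion_within = within / trials
print(f"Within 2 standard errors: {proportion_within:.3f}")
print(f"Complement:               {1 - proportion_within:.3f}")
```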
Confidence Intervals
Reason through the process of creating confidence intervals.
The estimated standard error for proportions is $s_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$. The estimated standard error for means
is $s_{\bar{x}} = \frac{s}{\sqrt{n}}$. Approximately 2 of these standard errors are added and subtracted from the point estimate
to produce the confidence intervals. Simplified versions of the two confidence interval formulas are
$\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ and $\bar{x} \pm 2\frac{s}{\sqrt{n}}$. The term after the plus or minus sign is the margin of error.
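A short sketch of the two simplified interval formulas follows. The function names are illustrative only. The proportion example reuses the earlier classroom survey in which 23 out of 35 students had a home cooked meal.

```python
import math

def proportion_interval(successes, n):
    """Simplified confidence interval for a proportion: p-hat plus or minus 2 standard errors."""
    p_hat = successes / n
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def mean_interval(values):
    """Simplified confidence interval for a mean: x-bar plus or minus 2 standard errors."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    margin = 2 * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Survey example from earlier in the chapter: 23 of 35 students.
low, high = proportion_interval(23, 35)
margin_of_error = (high - low) / 2
print(f"Proportion interval: ({low:.3f}, {high:.3f}), margin of error {margin_of_error:.3f}")

# Made-up measurements, used only to show the call for a mean.
low_m, high_m = mean_interval([2.0, 3.0, 4.0, 5.0, 6.0])
print(f"Mean interval: ({low_m:.3f}, {high_m:.3f})")
```

The interval is centered on the point estimate, so half of its width is the margin of error.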
Limits of Statistics
The branch of mathematics called statistics is a collection of wonderful mathematical tools that
help us understand our world. The more rigorous the application of statistics, the better results we can
obtain. But there are limits to statistics as well and these limits should be considered as you evaluate
the results of someone’s research. Following are some of the questions you should ask about any
research.
1. Was the sample size sufficiently large? There is considerable variation among small samples,
   so their results are more easily contradicted by further research.
2. Has the experiment or study been replicated and produced similar results?
3. What sorts of bias could have affected the results? A few of the many possible sources include:
   a. Poor survey questions, or questions in which none of the responses seem to be
      appropriate options.
   b. Participant awareness of being part of an experiment that influences outcomes.
   c. External validity – do the results of an experiment have legitimate broader implications?
   d. Were participants randomly selected? Are all the subjects college undergraduates?
4. Does the researcher, or reporter of the research, provide enough information to evaluate the
   conclusions? For instance, it can be frustrating to be told a result, but not given any information
   on standard deviations, sample size, or p-value.
5. Keep in mind that we live in a complex world full of interactive systems. Statistics help us
   understand a reductionist view of the world. As such, the insights we gain can then be
   incorporated into a systems view, enhancing our ability to model the world.