Basic Ideas of Statistics

advertisement
Basic Ideas of Statistics
Unit 1.1
Basic Ideas of Statistics
Corresponds to Chapter 1 in Triola
Overview of Section 1.1
•
•
•
•
Measurement
Robustness
Populations and Samples
Quantitative vs. Categorical Variables
Measurement
• Measurement is the fundamental activity that we do in
statistics. A statistic is a measurement of a particular
attribute that we want to know something about. Any time
that we want to collect information about something and
use statistics to draw conclusions, we MUST have some way
to measure it. Statistics is a way of quantifying and drawing
conclusions about the world that we live in. We transform
the world into measurable quantities and then we use the
practice of statistics to draw conclusions. In this section,
we will examine the best ways to conduct measurement
and how we can make our measurements as good as
possible to make our conclusions as good as possible.
Robustness
•
•
•
•
•
Fundamentally, there is no such thing as a fact in science. There are only
conclusions and hypotheses that have yet to be disproven. What we are always
attempting to do in science is to disprove what we think in order to give us new
insights.
However, we can’t operate from the position that we know nothing, so we must
have some conclusions that are thought to be very well tested and that we treat
as-if they were facts.
We say that a conclusion that has been well tested and has withstood repeated
attempts at disproof is robust.
We also say that a finding is robust if it can be applied to a lot of different areas
and under a lot of different circumstances. For example, the most robust finding in
educational statistics is that socioeconomic status (SES), or the amount of income
of a family or individual, factors significantly into educational achievement. This
holds true regardless of whether you are looking at Math, English, Science or
whatever.
Ideally, what we want to create from the practice of science are robust
conclusions. Conclusions that can be treated as facts.
An example of how robust findings
work.
• For over 100 years, the conclusion that nothing can travel faster than light
has been thought to be extremely robust. So robust that a giant portion of
quantum mechanics and all of relativity has been founded on this single
principle. However, a group of scientists at CERN, the giant particle
accelerator in Switzerland, have found a group of neutrinos that have
been measured to travel faster than the speed of light. If this is true, and
a lot more tests will have to be conducted to prove the robustness of this
claim, then much of physics might have to be seriously reconsidered. But
the scientists at CERN were skeptical of their findings. Einstein’s findings
of the absolute limit of light speed were so robust that any claim
disproving it would be met with skepticism. So, when they presented their
findings, they welcomed the opportunity for scrutiny. It turns out that
experimental error, even with very carefully conducted checks, was the
cause of the unusual findings and physics continues to believe that
nothing moves faster than light.
Samples and Populations
• Population- The group of people or objects about
which we want to know something
– A parameter- a numerical value that describes a
characteristic of the population
• Sample- A subset of the population, generally of a
given size, that we gather information about in
order to draw conclusions about the population
– Statistic- a numerical value that describes a
characteristic of a sample
Samples and Populations
• A statistic are a description of a sample
• A parameter is a description of a population
• We rarely will have the parameter for a
population. We always can gather statistics
from a sample.
Samples vs. Populations
• In statistical reasoning, we are primarily
drawing conclusions using statistics
• We want these conclusions to tell us
something about the population
• We use statistics in order to help
us draw conclusions about
parameters and the
characteristics of populations
Categorical data
• Categorical data consists of names or labels that
are not numbers representing counts or
measurements.
– Eg. Colors, Gender, Race, Level of Parental Education
• Discrete data- When the number of possible
values in the sample space can be counted.
– Eg. 14 women in the class, Number of children to a
particular couple, number of students who passed a
course.
• Nominal level of measurement- The data cannot
be ordered from highest to lowest.
Quantitative Data
• Quantitative data consists of numbers
representing counts or measurements
– Weight at the beginning of September, Weight at the
end of August, Body Mass Index, velocity, dollars
• Discrete Data- There can be an infinite number of
data points, but each value is separate and
countable
• Continuous Data- There are infinite numbers of
values that the data can take
– Eg. 1, ½, 3.05, 3,005,141, 6.744
Quantitative Data
• Ordinal level of measurement- We can arrange
the data in some order, but differences between
individual values are meaningless
– Rankings of colleges, grades
• Interval level of measurement- We can arrange
the data in some order, and differences between
values have meaning. However, there is no
natural zero.
– There is no real idea of nothing being present.
– Eg. Years in which something occurred or body
temperature, BMI, Weights in September
Quantitative Data
• The ratio level of measurement- The interval
level except there is such a thing as having
nothing present. There is a natural zero
starting point.
– Eg. Distances, prices
Quantitative vs. Categorical Data
• Categorical Data can only be broken up using the
nominal level of measurement because we are often
not able to order it meaningful ways. We do order
categorical data, when the names for the data are
orders. Like first place, second place, third place, etc.
• Quantitative Data is generally not categorized using the
nominal level of measurement, but can be ordered and
so can use either the ordinal, the interval, or ratio level
of measurement.
– It is your understanding of what makes something ordinal
or interval that determines if this set of data actually uses
the kind of measurement.
This is important!
• We are going to separate everything in this
class into these two categories: categorical
and quantitative variables.
• Proportions are the statistics that we derive
from categorical variables.
• Means, standard deviations, variances, and
medians are the kinds of statistics that we
derive from quantitative variables
Review of Section 1.1
•
•
•
•
Measurement
Robustness
Populations and Samples
Quantitative vs. Categorical Variables
Download