Stat Applications and Types of Data Notes

Stat 225 Lecture Notes Day 1
Day 1 (Chapter 1, 7.8) Stat Applications and Types of Data
Data: facts and figures collected, analyzed and summarized for presentation
and interpretation.
Data Set: all data collected in a particular study
Elements: individual entities of a data set
Variables: a characteristic of interest for the elements
Observations: The set of measurements collected for a particular element
Student     Gender   Class    Miles from Home   Credits   Ice Cream
Student 1   F        Fresh    65                18        Chocolate
Student 2   M        Junior   1,005             12        Vanilla
Student 3   M        Senior   32                16        Rocky Road
The data set is the entire spreadsheet full of data.
Each student is an element.
For a class with 40 people, there will be 40 elements in the data set.
The variables are gender, class, miles from home, credits, and ice cream.
The observations are (F, Fresh, 65, 18, Chocolate) for the first element.
For a class of 40 students, there will also be 40 sets of observations.
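As a rough sketch, the same vocabulary can be mapped onto a small Python structure. The values come from the three students above; the names data_set, variables, and first_observation are made up for this illustration.

```python
# Each dict is one element (a student); each key other than "student"
# is a variable. The values are the three rows from the notes.
data_set = [
    {"student": "Student 1", "gender": "F", "class": "Fresh",
     "miles_from_home": 65, "credits": 18, "ice_cream": "Chocolate"},
    {"student": "Student 2", "gender": "M", "class": "Junior",
     "miles_from_home": 1005, "credits": 12, "ice_cream": "Vanilla"},
    {"student": "Student 3", "gender": "M", "class": "Senior",
     "miles_from_home": 32, "credits": 16, "ice_cream": "Rocky Road"},
]

# Elements: the individual entities in the data set.
elements = [row["student"] for row in data_set]

# Variables: the characteristics recorded for every element.
variables = [name for name in data_set[0] if name != "student"]

# Observation: the set of measurements for one particular element.
first_observation = tuple(data_set[0][name] for name in variables)

print("number of elements:", len(elements))
print("variables:", variables)
print("observation for the first element:", first_observation)
```

With 40 students, data_set would simply hold 40 dicts, giving 40 elements and 40 observations.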
Types of Data
Qualitative: labels or names used to identify an attribute of each element
Nominal: order does NOT matter (gender, race, marital status)
Ordinal: order DOES matter (class [fresh, soph, jr, sr])
Quantitative: require numeric values that indicate how much or how many
Interval: differences between values have meaning, but ratios of quantities cannot be compared because there is no true zero
(temp (C/F), IQ score)
For example, if it is 4 degrees out today and it was 2 degrees out yesterday, it
does NOT mean it is twice as warm today as it was yesterday. If it did, then on
a day when it is 0 degrees and "twice as warm" as the day before, how cold was it the day before?
Ratio: ratios of quantities have meaning (height, weight, age)
For example, a person who weighs 200 pounds weighs twice as much as someone
who weighs 100 pounds.
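One way to see the difference is to change units and check whether the ratio survives. The sketch below assumes the 4 and 2 degrees are in Celsius (the notes do not give the unit); the helper names are made up for this illustration.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32

def c_to_k(celsius):
    """Convert Celsius to Kelvin (a ratio scale with a true zero)."""
    return celsius + 273.15

def lb_to_kg(pounds):
    """Convert pounds to kilograms."""
    return pounds * 0.45359237

# Interval scale: assumed Celsius readings from the example above.
today_c, yesterday_c = 4.0, 2.0
ratio_c = today_c / yesterday_c                   # 2.0 in Celsius
ratio_f = c_to_f(today_c) / c_to_f(yesterday_c)   # 39.2 / 35.6, about 1.10
ratio_k = c_to_k(today_c) / c_to_k(yesterday_c)   # about 1.007

# Ratio scale: the 200 lb vs 100 lb example.
ratio_lb = 200 / 100
ratio_kg = lb_to_kg(200) / lb_to_kg(100)

print("temperature ratio in C, F, K:",
      round(ratio_c, 3), round(ratio_f, 3), round(ratio_k, 3))
print("weight ratio in lb, kg:", round(ratio_lb, 3), round(ratio_kg, 3))
```

The temperature "ratio" changes with the unit, so "twice as warm" has no meaning. The weight ratio is 2 in both units, which is what makes weight a ratio variable.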
Qualitative                                     Quantitative
Nominal                 Ordinal                 Interval            Ratio
Gender                  Letter Grade            Level of Tide       Height
Marital Status          Class                   Longitude           Weight
Race                    Class Rank              Year-Dates          Temp Kelvin
Ethnicity               Likert Scale            Temperature (C/F)   Time to Run Mile
Hometown                Degree of Illness       IQ Score            Salary
Religious Affiliation   Level of Satisfaction   SAT Score           Age
Smoking Status          Birthday Month                              Length
Favorite Sport          Clothing Size                               GPA
Instrument Played       Prize Medal (G, S, B)                       Ounces in Glass
Compare Qualitative and Quantitative:
What type of variable is smoking status? qual
What type of variable is SAT score? quant
What type of variable is income? quant
What type of variable is level of satisfaction? qual
What type of variable is GPA? quant
What type of variable is clothing size (s, m, l, xl)? qual
What type of variable is time it takes to run a mile? quant
Cross-Sectional: data collected at the same or approximately the same point in
time
Time Series Data: data collected over several time periods.
Cross-Sectional:
Temperature on Jan 1st in each state
Profit of each store in Tippecanoe Mall earned on Black Friday
Resting heart rate of each person in the classroom
Number of car accidents on Friday afternoon in each county of IN
Time Series:
Temperature on the first of every month
Profit of a store for each day from Thanksgiving to Christmas
Heart rate before, during, and immediately after vigorous exercise
Number of car accidents each weekend of football season in West Lafayette
Sources of Data
Existing Sources:
Student Records including: student ID, GPA, number of credits being taken, courses
Surveys:
Teacher Evaluations: how well class time was used, best aspects of course,
how helpful was homework etc.
Experiments:
When testing a new drug to lower blood pressure, use three groups: one
taking a placebo, one taking the generic brand, and one taking the name brand.
The data collected will be each participant's change in blood pressure from the start of the study.
*The key thing in an experiment, as opposed to an observational study, is that
you manipulate and control the groups, such as by assigning a different
treatment (drug) to each one.*
Observational Studies:
Comparing the sales of different grocery stores in West Lafayette (you are simply
observing their sales records and are not applying a treatment to any group).
*The key thing in an observational study, as opposed to an experiment, is that
you simply observe what happens and do not apply a specific treatment
to anything.*
Types of Sampling:
Simple Random Sampling: A sample of size n is selected from a population in such a way that
every possible sample of size n, and therefore every element, has the same probability of being selected.
Sampling With Replacement: Elements are put back in the population after being
selected for the sample allowing for a chance of being selected more than once
for a single sample
Sampling Without Replacement: Elements are not replaced after being selected and
are therefore only chosen once to be in a sample.
**what is more common, with replacement or without replacement?**
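A minimal sketch of the two variants using Python's standard random module. The population of 40 student ID numbers, the sample size, and the seed are all assumptions made for this illustration.

```python
import random

# Fixed seed so the sketch is repeatable; the value itself is arbitrary.
rng = random.Random(225)

# Hypothetical population: ID numbers for a class of 40 students.
population = list(range(1, 41))
n = 5

# Without replacement: a selected element is not put back,
# so no ID can appear twice in the sample.
without_replacement = rng.sample(population, n)

# With replacement: each draw is made from the full population,
# so the same ID may be selected more than once.
with_replacement = rng.choices(population, k=n)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Here random.sample draws without replacement and random.choices draws with replacement; both give every element the same chance on each draw.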
Stratified Random Sampling: the population is first divided into strata (groups) and a
simple random sample is then taken from each stratum.
Cluster Sampling: the population is first divided into clusters and then a simple
random sample of the clusters is taken (where each element in the selected
cluster is surveyed).
Systematic Sampling: first, one of the first k elements is selected at random, and then every kth
element thereafter is selected.
Convenience sampling: elements are selected for the sample on the basis of
convenience.
Judgment sampling: elements are selected for the sample based on the judgment of
the person doing the study.
What are examples of each type of sample and when might they be used?
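The three probability-based designs above (stratified, cluster, systematic) can be sketched in a few lines each. Everything about the population below is hypothetical: 40 students, with class year used as the stratum and a made-up lab section used as the cluster. The function names are also made up for this illustration.

```python
import random

rng = random.Random(1)  # arbitrary fixed seed for repeatability

# Hypothetical population of 40 students: 10 per class year,
# and 8 lab sections of 5 students each.
YEARS = ["Fresh", "Soph", "Junior", "Senior"]
population = [
    {"id": i, "year": YEARS[i % 4], "section": i // 5}
    for i in range(40)
]

def stratified_sample(pop, key, per_stratum):
    """Divide pop into strata by key, then take an SRS from each stratum."""
    strata = {}
    for element in pop:
        strata.setdefault(element[key], []).append(element)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters):
    """Take an SRS of clusters, then keep every element in those clusters."""
    clusters = sorted({element[key] for element in pop})
    chosen = set(rng.sample(clusters, n_clusters))
    return [element for element in pop if element[key] in chosen]

def systematic_sample(pop, k):
    """Pick one of the first k elements at random, then every kth after it."""
    start = rng.randrange(k)
    return pop[start::k]

strat = stratified_sample(population, "year", per_stratum=2)
clus = cluster_sample(population, "section", n_clusters=2)
syst = systematic_sample(population, k=8)

print("stratified ids:", sorted(e["id"] for e in strat))
print("cluster ids:   ", [e["id"] for e in clus])
print("systematic ids:", [e["id"] for e in syst])
```

Note the contrast: stratified sampling takes a few elements from every group, while cluster sampling takes every element from a few groups.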
Population: the set of all elements of interest in a particular study
Sample: a subset of the population
**Who you are sampling**
Census: the process of conducting a survey to collect data for the entire
population
Sample Survey: the process of conducting a survey to collect data for a sample
**How you would sample**
Statistical Inference: using data from a sample to estimate a characteristic
of a population.
***Activity from Ellen – counting e’s***
1. You will have 2 minutes to do your inspection.
2. You may not alter the product you are inspecting. This means no writing on or folding the
paper.
Now, I need to make sure the instructions are clear. Does everyone understand what a letter
“e” looks like? We are counting both capital “E” and lowercase “e” as defects.
Elegant, extravagant elephants entertain every evening at seven. They serve escargot and
eggs benedict and endive. Eight elderly elegant elephants elevate themselves to the
expensive entrance with elevators exceeding expectations. Eating everything edible,
elephants expand exponentially. “Excellent!” the entertained elephants express after the
entertaining entrees were served. Everything was expedited by the energetic efforts
of the executive elephant empress. Everyone was entertained to excess and enjoyed the
edible endeavors immensely. The evening ended enchantedly with Echinacea herbal tea.
Why are you all getting different numbers? You said you understood the directions, and you
didn’t have any other questions for me. You had plenty of time. What happened?
The right answer is 126.
You just did what is called a “census.” You inspected every single letter of the entire paragraph
to figure out how many “e”s you had.
Now let’s try this with a sample. There are 7 lines in this paragraph. Each of you should pick
one line and count the number of “e”s in just your one line. (Give them about 30 seconds for
this.)
Now multiply your number by 7. What do you get this way? (Ask around. You should get
between 119 and 133 “e”s with the sampling method. Line 1 has 17, line 2 has 19, line 3 has 17,
line 4 has 18, line 5 has 19, line 6 has 19, and line 7 has 17.)
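The census and the one-line sample can both be checked with a short script. The line breaks in LINES are an assumption: they are placed so that the per-line counts agree with the ones listed above. The names LINES and count_e are made up for this illustration.

```python
# The paragraph from the activity, one string per line. The line breaks
# are an assumption chosen to reproduce the per-line counts in the notes.
LINES = [
    "Elegant, extravagant elephants entertain every evening at seven. They serve escargot and",
    "eggs benedict and endive. Eight elderly elegant elephants elevate themselves to the",
    "expensive entrance with elevators exceeding expectations. Eating everything edible,",
    'elephants expand exponentially. "Excellent!" the entertained elephants express after the',
    "entertaining entrees were served. Everything was expedited by the energetic efforts",
    "of the executive elephant empress. Everyone was entertained to excess and enjoyed the",
    "edible endeavors immensely. The evening ended enchantedly with Echinacea herbal tea.",
]

def count_e(text):
    """Count both capital 'E' and lowercase 'e'."""
    return text.lower().count("e")

# Census: inspect every line of the paragraph.
per_line = [count_e(line) for line in LINES]
census_total = sum(per_line)

# Sample: count one line and scale up by the number of lines.
estimates = [count * len(LINES) for count in per_line]

print("per-line counts:", per_line)
print("census total:", census_total)
print("estimates from a one-line sample:", estimates)
```

Every one-line estimate lands between 119 and 133, close to the census answer of 126, for about one seventh of the counting effort.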
Why sample rather than survey the whole population?
• Hard to reach EVERYONE in the population
• Too expensive to survey everyone
• Too much time/effort to survey everyone