Uploaded by afa can

STAT2054
Statistics for Engineers
Week 1: Introduction to Statistics
Objectives of Week 1
▫ Learn the basic vocabulary of statistics
▫ Distinguish between sample and population, parameters and statistics
▫ Categorize types of data
▫ Classify data as discrete or continuous
▫ Classify variables as qualitative or quantitative
▫ Identify cases and variables in a research study
▫ Identify explanatory and response variables in a research study
▫ Distinguish between observational and experimental studies
▫ Identify some sampling methods
▫ Understand vocabulary terms associated with statistical studies
▫ Identify experimental research design
Intro
Statistics is the science of collecting, describing, and analyzing data.
Identify Cases and Variables
▫ The subjects/objects that we obtain information about are called the cases or units in a dataset. Each row of the dataset corresponds to a different case.
▫ A variable is any characteristic that is recorded for each case. Each column of the dataset corresponds to a different variable.
▫ The information gathered about a specific variable is collectively called data (the singular form of data is datum).
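The row/column picture can be sketched in a few lines of Python. This is only an illustration: the variable names and values below are made up and do not come from any real study.

```python
# Hypothetical dataset: each dict is one case (row); each key other than
# "case_id" is one variable (column). All values are invented.
dataset = [
    {"case_id": "A", "hours_exercise": 5.0, "smokes": "No"},
    {"case_id": "B", "hours_exercise": 2.5, "smokes": "Yes"},
    {"case_id": "C", "hours_exercise": 7.0, "smokes": "No"},
]

# Number of cases = number of rows.
n_cases = len(dataset)

# Variables = the characteristics recorded for every case.
variables = [key for key in dataset[0] if key != "case_id"]

# The data for one variable: one datum per case.
exercise_data = [row["hours_exercise"] for row in dataset]

print(n_cases)
print(variables)
print(exercise_data)
```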
Data Types
Qualitative (Categorical): divides the cases into groups, placing each case into exactly one of two or more categories.
▫ Nominal: data consisting of labels or names.
▫ Ordinal: data that can be arranged in a meaningful order, but where calculations don't make sense.
Quantitative (Numerical): measures or records a numerical quantity for each case.
▫ Continuous Data: data that can take on any value in a given interval and are usually measurements.
▫ Discrete Data: data that can take on only particular values and are usually counts.
Data Types
Qualitative (Categorical)
▫ Nominal: Gender, Smoke, Award
▫ Ordinal: Birth
Quantitative (Numerical)
▫ Continuous Data: Exercise, TV, GPA
▫ Discrete Data: Pulse
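The classification on this slide can be written down as a small lookup table. The sketch below is one possible encoding, not a standard API: the `DataType` enum and the `is_quantitative` helper are names invented for illustration, and only the variable-to-type mapping is taken from the slide.

```python
from enum import Enum


class DataType(Enum):
    NOMINAL = "qualitative, nominal"
    ORDINAL = "qualitative, ordinal"
    CONTINUOUS = "quantitative, continuous"
    DISCRETE = "quantitative, discrete"


# Mapping copied from the slide; the enum itself is a hypothetical helper.
VARIABLE_TYPES = {
    "Gender": DataType.NOMINAL,
    "Smoke": DataType.NOMINAL,
    "Award": DataType.NOMINAL,
    "Birth": DataType.ORDINAL,
    "Exercise": DataType.CONTINUOUS,
    "TV": DataType.CONTINUOUS,
    "GPA": DataType.CONTINUOUS,
    "Pulse": DataType.DISCRETE,
}


def is_quantitative(name: str) -> bool:
    """Return True if the named variable is numerical (continuous or discrete)."""
    return VARIABLE_TYPES[name] in (DataType.CONTINUOUS, DataType.DISCRETE)


quantitative = sorted(v for v in VARIABLE_TYPES if is_quantitative(v))
qualitative = sorted(v for v in VARIABLE_TYPES if not is_quantitative(v))
print(quantitative)
print(qualitative)
```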
Statistical Inference
Statistical inference:
The process of using data from a sample to gain information about the population.
Example: A machine makes steel rods for use in optical storage
devices. The specification for the diameter of the rods is 0.45 ± 0.02
cm. During the last hour, the machine has made 1000 rods. The
quality engineer wants to know approximately how many of these
rods meet the specification.
He does not have time to measure all 1000 rods. So he draws a
random sample of 50 rods, measures them, and finds that 46 of them
(92%) meet the diameter specification.
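The arithmetic behind this inference can be sketched as follows. The counts (50 sampled, 46 conforming, 1000 produced) and the specification limits come from the example; the function and variable names are illustrative, and scaling the sample proportion up to the whole batch is only a point estimate, not an exact count.

```python
# Specification from the example: 0.45 cm plus or minus 0.02 cm.
SPEC_LOWER_CM = 0.43
SPEC_UPPER_CM = 0.47


def meets_spec(diameter_cm: float) -> bool:
    """Return True if a rod diameter lies within the specification limits."""
    return SPEC_LOWER_CM <= diameter_cm <= SPEC_UPPER_CM


population_size = 1000   # rods made in the last hour
sample_size = 50         # rods actually measured
sample_conforming = 46   # measured rods that met the specification

# Sample proportion (a statistic).
p_hat = sample_conforming / sample_size

# Point estimate of the number of conforming rods in the whole batch.
estimated_conforming = round(p_hat * population_size)

print(f"sample proportion: {p_hat:.2f}")
print(f"estimated conforming rods: {estimated_conforming}")
```

The estimate is only as trustworthy as the sample is representative, which is why the engineer draws the 50 rods at random.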
Samples & Populations
▫ Population: includes all individuals or objects of interest
▫ Parameter: a measure concerning a population (e.g., population mean)
▫ Sample: a representative subset of the population
▫ Statistic: a measure concerning a sample (e.g., sample mean)
Samples & Populations
Example: In a survey, 257 residential college students at Bellevue University were asked if they had eaten lunch in the student center. 72% of the students surveyed said yes. After analyzing the results, the university determines that approximately 70% of residential students have eaten lunch in the student center.
▫ Population: residential college students at Bellevue University
▫ Sample: the 257 residential college students who were surveyed
▫ Parameter: 70% (the university's estimate for all residential students)
▫ Statistic: 72% (the proportion of the sample who said yes)
Statistical Study
Steps of conducting a statistical study:
1. Design the study
   - State the question
   - Determine the population
   - Determine the variables
   - Determine the type of study: observational or experimental
2. Collect the data
3. Organize the data
4. Analyze the data to answer the question
State the question
▫ Describe and analyze a single variable
   ▫ What percentage of students smoke?
   ▫ What is the average number of hours a week spent exercising?
▫ Describe and analyze relationships between two or more variables (relationships might be between two categorical variables, two quantitative variables, or a quantitative and a categorical variable)
   ▫ Do males or females watch more television?
   ▫ Do students who exercise more tend to have lower pulse rates?
Determine the population & variables
A teacher wants to know if students who spend more time reading at home get higher homework and exam grades.
▫ Population: students
▫ Variables: amount of time spent reading at home, homework grades, and exam grades
A researcher wants to know if dogs who are fed only canned food have different body mass indexes (BMI) than dogs who are fed only hard food. They collect BMI data from 50 dogs who eat only canned food and 50 dogs who eat only hard food.
▫ Population: dogs
▫ Variables: type of food and BMI
Type of Study
Observational Study: A study in which the researcher collects data without performing any manipulations.
Experimental Study: A study in which the researcher manipulates the treatments (i.e., levels of the explanatory variable) received by subjects and collects data.
Example: A team of researchers wants to know if Advil or Tylenol is more effective.
▫ Researchers survey a sample of adults and ask if they use Advil or Tylenol. They ask them to rate the effectiveness of the one they use.
   Observational Study
▫ Researchers obtain a random sample of adults. They randomly assign half of the participants to take Advil and the other half to take Tylenol. They ask each participant to rate the effectiveness of the one that they were assigned to take.
   Experimental Study
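What separates the experimental version from the survey is the random assignment step. A minimal sketch of that step is shown below, assuming 20 participants with made-up IDs; the function name `randomly_assign` and the seed are illustrative choices, not part of the original example.

```python
import random


def randomly_assign(participants, treatments=("Advil", "Tylenol"), seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {treatments[0]: shuffled[:half], treatments[1]: shuffled[half:]}


# Hypothetical participant IDs P01..P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=1)

for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Because chance alone decides who takes which drug, the two groups should be similar on average in every other respect.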
Sampling
A sample should be selected from a population randomly (random sample), otherwise it may be
prone to bias. Our goal is to obtain a sample that is representative of the population.
Sampling bias occurs when the method of selecting a sample causes the sample to differ
from the population in some relevant way. If sampling bias exists, then we cannot trust
generalizations from the sample to the population.
Convenience sample: Individuals who are easily accessible are more likely to be included in
the sample.
Example: A construction engineer has just received a shipment of 1000 concrete blocks, each weighing
approximately 50 pounds. The blocks have been delivered in a large pile. The engineer wishes to
investigate the crushing strength of the blocks by measuring the strengths in a sample of 10 blocks. To
draw a simple random sample would require removing blocks from the center and bottom of the pile,
which might be quite difficult. For this reason, the engineer might construct a sample simply by taking 10
blocks off the top of the pile. A sample like this is called a sample of convenience.
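The difference between the two ways of picking blocks can be sketched in code. The numbering of blocks from 1 (top of the pile) to 1000 (bottom) is an assumption made here for illustration, as is the seed.

```python
import random

N_BLOCKS = 1000
SAMPLE_SIZE = 10

# Assumption: block 1 is at the top of the pile, block 1000 at the bottom.
block_ids = list(range(1, N_BLOCKS + 1))

# Sample of convenience: simply the 10 blocks on top.
convenience_sample = block_ids[:SAMPLE_SIZE]

# Simple random sample: every block has the same chance of being chosen,
# including those in the center and at the bottom of the pile.
rng = random.Random(2054)
simple_random_sample = rng.sample(block_ids, SAMPLE_SIZE)

print("convenience:", convenience_sample)
print("simple random:", sorted(simple_random_sample))
```

If blocks near the top differ systematically from the rest (for example, if they were cast last), only the random sample protects against that bias.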
Sampling
Do we need to eat an entire large pot of soup to know what the soup tastes like?
▫ When you taste a spoonful of soup and decide the spoonful you tasted isn't salty enough, that's exploratory analysis.
▫ If you generalize and conclude that your entire soup needs salt, that's an inference.
▫ For your inference to be valid, the spoonful you tasted (the sample) needs to be representative of the entire pot (the population).
   - If your spoonful comes only from the surface and the salt is collected at the bottom of the pot, what you tasted is probably not representative of the whole pot.
   - If you first stir the soup thoroughly before you taste, your spoonful will more likely be representative of the whole pot.
Sampling Methods
There are several techniques we can use to collect sample data.
▫ Simple Random Sampling (SRS)
▫ Stratified Sampling
▫ Cluster Sampling
▫ Multi-stage Sampling
Sampling Methods
▫ Simple Random Sample (SRS)
Each case in the population has an equal chance of being included and there is no implied connection between the cases in the sample.
Example: Randomly select one of the departments in the engineering faculty.
Sampling Methods
▫ Stratified Sampling
A divide-and-conquer sampling strategy. The population is divided into groups called strata. The strata are chosen so that similar cases are grouped together, then a second sampling method, usually simple random sampling, is employed within each stratum.
Example: Randomly select students from each department of the engineering faculty.
Strata (departments): CSE, ME, MSE, IE, ENVE, EE, KMM, BIOE
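Stratified sampling with the departments as strata might look like the sketch below. The department codes are taken from the slide; the rosters (30 students per department), the sample size of 3 per stratum, and the function name `stratified_sample` are all assumptions made for illustration.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size within each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }


departments = ["CSE", "ME", "MSE", "IE", "ENVE", "EE", "KMM", "BIOE"]

# Hypothetical rosters: 30 made-up student IDs per department.
strata = {d: [f"{d}-{i:02d}" for i in range(1, 31)] for d in departments}

sample = stratified_sample(strata, n_per_stratum=3, seed=0)
for department, students in sample.items():
    print(department, students)
```

Every department is guaranteed to appear in the sample, which a single simple random sample of all students would not guarantee.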
Sampling Methods
▫ Cluster Sampling
The population is divided into many groups, called clusters. Then a fixed number of clusters is sampled, and all observations from each of those clusters are included in the sample.
Sampling Methods
▫ Multi-stage Sampling
It is like cluster sampling, but rather than keeping all observations in each cluster, a random sample is collected within each selected cluster.
[Figure: the population divided into eight groups, CLUSTER 1 through CLUSTER 8]
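Cluster and multi-stage sampling differ only in what happens inside the selected clusters, which a side-by-side sketch makes visible. The eight cluster names follow the figure; the cluster contents (10 made-up cases each), the sample sizes, and the function names are assumptions for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every case in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [case for name in chosen for case in clusters[name]]


def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Select clusters at random, then draw a random sample within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [
        case
        for name in chosen
        for case in rng.sample(clusters[name], n_per_cluster)
    ]


# Hypothetical population: 8 clusters of 10 made-up cases each.
clusters = {
    f"CLUSTER {k}": [f"c{k}-case{i}" for i in range(1, 11)]
    for k in range(1, 9)
}

rng = random.Random(0)
cluster_cases = cluster_sample(clusters, 3, rng)
multistage_cases = multistage_sample(clusters, 3, 4, rng)

print(len(cluster_cases), "cases from 3 whole clusters")
print(len(multistage_cases), "cases from 3 sub-sampled clusters")
```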
Experimental Study
▫ Experiments are the only way to show a cause-and-effect relationship.
Explanatory Variable: Also known as the independent or predictor variable, it explains variations in the response variable.
Response Variable: Also known as the dependent or outcome variable, its value is predicted or its variation is explained by the explanatory variable.
One variable is used to predict or explain differences in another variable.
Explanatory & Response Variables
Example: A researcher believes that the origin of the beans used to make a cup of coffee affects hyperactivity. He wants to compare coffee from three different regions: Africa, South America, and Mexico.
▫ Explanatory variable: origin of coffee bean
▫ Response variable: hyperactivity level
Example: A group of middle school students wants to know if they can use height to predict age. They take a random sample of 50 people at their school, both students and teachers, and record each individual's height and age.
▫ Explanatory variable: height
▫ Response variable: age
Experimental Study
▫ Association: Two variables are associated if values of one variable tend to be related to the values of the other variable.
▫ Causation: Two variables are causally associated if changing the value of one variable influences the value of the other variable.
Example:
▫ Studies show that taking a practice exam increases your score on an exam. (causation)
▫ Families with many cars tend to also own many television sets. (association)
▫ Sales are the same even with different levels of spending on advertising. (no association)
▫ Goldfish who live in large ponds are usually larger than goldfish who live in small ponds. (association)
▫ Putting a goldfish into a larger pond will cause it to grow larger. (causation)
Experimental Study
▫ Confounding Variable (Lurking Variable or Confounding Factor): a third variable that is associated with both the explanatory variable and the response variable. A confounding variable can offer a plausible explanation for an association between two variables of interest.
Example: the number of vehicles (in millions) registered in the US and the average life expectancy (in years) of babies born in the US, recorded every four years from 1970 to 2014.
▫ Confounding variable: year
Experimental Study
▫ Treatment group: a group of subjects to which researchers apply a treatment.
▫ Control group: a group of subjects to which no treatment or a placebo is applied.
▫ Placebo: something that appears to the participants to be an active treatment, but does not actually contain the active treatment. (The placebo effect is a change in response that occurs merely because subjects believe they are being treated.)
▫ Single-blind experiment: subjects do not know if they are in the control group or the treatment group, but the people interacting with the subjects in the experiment know in which group each subject has been placed.
▫ Double-blind experiment: neither the subjects nor the people interacting with the subjects know to which group each subject belongs.
Design of Experiment
▫ Randomization: Researchers randomize patients into treatment groups and control groups.
▫ Controlling: Researchers assign treatments to cases, and they do their best to control any other differences in the groups (control for outside effects).
▫ Replication: The more cases researchers observe, the more accurately they can estimate the effect of the explanatory variable on the response (to see meaningful patterns).
▫ Blocking: Researchers sometimes know or suspect that variables, other than the treatment, influence the response. Under these circumstances, they may first group individuals based on this variable into blocks and then randomize cases within each block to the treatment groups.
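Blocking followed by randomization within each block can be sketched as below. The patients, the blocking variable (a "high"/"low" risk label), the seed, and the function name `blocked_assignment` are all hypothetical; the sketch only illustrates the two-step idea described above.

```python
import random
from collections import defaultdict


def blocked_assignment(cases, block_of, seed=None):
    """Group cases into blocks, then randomize within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for case in cases:
        blocks[block_of(case)].append(case)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)            # randomize order within the block
        half = len(members) // 2
        for case in members[:half]:
            assignment[case] = "treatment"
        for case in members[half:]:
            assignment[case] = "control"
    return assignment


# Hypothetical patients as (id, risk level); risk is the blocking variable.
patients = [(f"P{i}", "high" if i % 2 == 0 else "low") for i in range(1, 13)]
assignment = blocked_assignment(patients, block_of=lambda p: p[1], seed=3)

for patient, group in sorted(assignment.items()):
    print(patient, group)
```

Each risk level contributes equally to the treatment and control groups, so risk cannot explain a difference between them.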
Design of Experiment
Was the sample randomly selected?
   Yes: Possible to generalize from the sample to the population
   No: Cannot generalize from the sample to the population
Was the explanatory variable randomly assigned?
   Yes: Possible to make conclusions about causality
   No: Cannot make conclusions about causality
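The two questions are independent of each other, which a tiny helper makes explicit. The function name and the wording of the returned strings are illustrative only.

```python
def study_conclusions(randomly_selected: bool, randomly_assigned: bool) -> dict:
    """Map the two design questions to the conclusions a study can support."""
    return {
        "generalize": (
            "possible to generalize to the population"
            if randomly_selected
            else "cannot generalize to the population"
        ),
        "causality": (
            "possible to make conclusions about causality"
            if randomly_assigned
            else "cannot make conclusions about causality"
        ),
    }


# A random sample that is only observed, with no random assignment.
observational = study_conclusions(randomly_selected=True, randomly_assigned=False)
print(observational["generalize"])
print(observational["causality"])
```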
Exercise
A recent article claims that “Green Spaces Make Kids Smarter.” The study described in the article involved 2,623 schoolchildren in Barcelona. The researchers measured the amount of greenery around the children’s schools, and then measured the children’s working memories and attention spans. The children who had more vegetation around their schools did better on the memory and attention tests.
▫ What are the population and sample in this study?
   Population: kids. Sample: the 2,623 schoolchildren in Barcelona.
▫ What is the explanatory variable?
   Amount of greenery
▫ What is the response variable?
   The children’s test scores
▫ Does the headline imply causation?
   Yes
▫ Is the study an experiment or an observational study?
   Observational study
▫ Is it appropriate to conclude causation in this case?
   No
▫ Suggest a possible confounding variable.
   Socioeconomic status