The Department of Mathematics & Statistics

Stats 245.3
Introduction to Statistical Methods
Instructor: W.H. Laverty
Office: 235 McLean Hall
Phone: 966-6096
Lectures: MWF 11:30am - 12:20pm, Thorv 271
Lab: W 3:30 - 4:20, Physics 107
Evaluation:
Assignments, Labs, Term tests - 40%
(Term Test approximately every 2nd week)
Final Examination - 60%
Dates for midterm tests:
1. Wednesday, Sept 17 (in the lab, 3:30pm)
2. Wednesday, October 01 (in the lab, 3:30pm)
3. Wednesday, October 15 (in the lab, 3:30pm)
4. Wednesday, October 29 (in the lab, 3:30pm)
5. Wednesday, November 19 (in the lab, 3:30pm)
6. Wednesday, December 03 (in the lab, 3:30pm)
Each test and the Final Exam are Open Book
Students are allowed to take in Notes, texts, formula sheets,
calculators (No laptop computers.)
The tests and the Final Exam are multiple choice and
computer marked – Students need an HB pencil and to
identify their paper with their student number.
Computer Assignments - due dates and time
1. TBA
2. TBA
3. TBA
4. TBA
Computer Assignments
It is important to learn to use at least one of the powerful
statistical packages: SPSS, Minitab, S-plus, SAS, R
Statistical computations very quickly move outside the range
of what is feasible on simple computing devices (hand-held
calculators, computer spreadsheets)
These assignments are designed to give some initial experience
with these packages.
Computer Assignments submitted after the
due date and time will still be accepted
and given a mark; however, assignments
that are submitted late will not be
returned.
Text
• The lectures will be given in PowerPoint
• These will be posted on the Stats 245 website
• Tables that are required will be posted on the
Stats 245 website
• A text is not required
• I will post a list of books in the library that can be
consulted
Alternative Texts (Available in Library)
Title (Author(s))
1. Statistics: Informed Decisions Using Data (Sullivan)
2. Introductory Statistics (Mann)
3. Modern Elementary Statistics (Freund)
4. Elementary Statistics: A Brief Version (Bluman)
5. Elementary Statistics (Hoel)
6. Statistics: The Exploration and Analysis of Data (Devore and Peck)
7. Statistics: A First Course (Freund)
8. Statistics: A First Course (Saunders, Smit, Adatia & Larson)
9. Basic Statistical Concepts (Bartz)
10. An Introduction to Statistical Methods and Data Analysis (Ott)
11. Introductory Statistics (Wonnacott & Wonnacott)
To download lectures
1. Go to the stats 245 web site
a) Through PAWS or
b) by going to the website of the department of
Mathematics and Statistics -> people -> faculty
-> W.H. Laverty -> Stats 245-> Lectures.
2. Then
a) select the lecture
b) Right click and choose Save as
To print lectures
1. Open the lecture using MS Powerpoint
2. Select the menu item File -> Print
The Print dialogue box appears.
In the Print what box, select handouts
Set Slides per page to 6 or 3.
6 slides per page will result in the least amount
of paper being printed
3 slides per page leaves room for notes.
Course Outline
Introduction
• Populations, samples
• Variables
• Data Collection
Exploratory Statistics
• Organizing and displaying Data
• Numerical measures of Central Tendency
and Variability
• Describing Bivariate Data
Probability Theory
• Concepts of Probability
• Random variables and their distributions
• Binomial distribution, Normal distribution
Inferential Statistics
• Estimation, Hypotheses testing
• Comparing Samples
• Analyzing count data
• Regression and Correlation
• Non-parametric Statistics
End – Lecture 1
Introduction
The circular process of research:
1. Questions arise about a phenomenon
2. A decision is made to collect data
3. A decision is made as to how to collect the data
4. The data is collected
5. The data is summarized and analyzed
6. Conclusions are drawn from the analysis
(the conclusions raise new questions, returning to step 1)
What is Statistics?
It is the major mathematical tool of
scientific inference (research), with
an interest in drawing conclusions from
data.
The data is to some extent corrupted
by some component of random
variation (random noise).
Random variation (or random noise)
can be defined to be the variation in the
data that is not accounted for by factors
considered in the analysis.
Example
Suppose we are collecting data on
• Blood Pressure
• Height
• Weight
• Age
Suppose we are interested in how
• Blood Pressure
is influenced by the following factors
• Height
• Weight
• Age
Blood Pressure will not be perfectly
predictable from :
• Height
• Weight
• Age
There will be departures (random variation)
from a perfect prediction because of other
factors that could affect Blood Pressure
(diet, exercise, hereditary factors)
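The idea of random variation can be sketched with a small simulation. This is only an illustration: the prediction equation, its coefficients and the size of the noise are all invented for the sketch, and no real blood pressure data is used.

```python
import random
import statistics

random.seed(245)  # fixed seed so the sketch is reproducible

def systematic_bp(height_cm, weight_kg, age_yr):
    """Hypothetical 'perfect prediction' of systolic blood pressure.

    The coefficients are invented for illustration only.
    """
    return 70.0 + 0.10 * height_cm + 0.30 * weight_kg + 0.40 * age_yr

# Simulate 200 cases: the factors considered in the analysis ...
cases = []
for _ in range(200):
    height = random.uniform(150, 195)
    weight = random.uniform(50, 110)
    age = random.uniform(20, 80)
    # ... plus noise standing in for diet, exercise, heredity, etc.
    noise = random.gauss(0, 8)
    observed = systematic_bp(height, weight, age) + noise
    cases.append((height, weight, age, observed))

# Departures of the observed values from the perfect prediction.
departures = [bp - systematic_bp(h, w, a) for h, w, a, bp in cases]

print("mean departure:", round(statistics.mean(departures), 2))
print("std. dev. of departures:", round(statistics.stdev(departures), 2))
```

The departures average out close to zero, but individual cases can differ noticeably from their predicted value. That spread is the random variation.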
Another Example
In this example we are interested in the use of:
1. antidepressants,
2. mood stabilizing medication,
3. anxiety medication,
4. stimulants and
5. sleeping pills.
The data were collected for n = 16383 cases
In addition we are interested in how the use of
these medications is affected by:
1. Age
– 20-29, 30-39, 40-49, 50-59, 60-69, 70+
2. Gender
– Male, female
3. Education
– < Secondary,
– Secondary Grad.,
– some Post-Sec.,
– Post-Sec. Grad.
4. Income
– Low, Low Mid, Up Mid, High
5. Role
– parent, partner, worker
– parent, partner
– parent, worker
– partner, worker
– worker only
– parent only
– partner only
– no roles
Some questions of interest
1. How are the dependent variables
(antidepressant use, mood stabilizing
medication use, anxiety medication use,
stimulants use, sleeping pill use)
interrelated?
2. How are the dependent variables (drug
use) related to the independent variables
(age, gender, income, education and role)?
• Again the relationships will not be perfect,
because of the effects of other factors
(variables) that have not been considered in
the experiment
• If the data is recollected, the patterns
observed at the second collection will not be
exactly the same as those observed at the first
collection
The data appears in the following Excel file
Drug data
In Statistics
• Questions
– About some scientific, sociological, medical or
economic phenomena
• Data
– The purpose of the data is to find answers to the
questions
• Answers
– Because of the random variation in the data (the
noise), conclusions based on the data will be
subject to error.
The circular process of research:
In what part of this process does statistics play
a role?
1. Questions arise about a phenomenon
2. A decision is made to collect data
3. A decision is made as to how to collect the data (Experimental Design)
4. The data is collected
5. The data is summarized and analyzed (Statistics)
6. Conclusions are drawn from the analysis (Statistics)
Statistical Theory is interested in
1. The design of the data collection
procedures. (Experimental designs,
Survey designs). The experiment can be
totally lost if it is not designed correctly.
2. The techniques for analyzing the data.
In any statistical analysis it is
important to assess the
magnitude of the error made
by the conclusions of the
analysis.
Consider the following statement:
You can prove anything with Statistics.
In fact:
One is unable to “prove” anything with
Statistics.
At the end of any statistical
analysis there always is a
possibility of an error in any of the
decisions that it makes.
The success of a research project
does not depend on its
conclusions
The success of a research project
depends on the accuracy of its
conclusions
If one is testing the effectiveness
of a drug
There are two possible conclusions:
1. The drug is effective
2. The drug is not effective
The success of this project does
not depend on its conclusions
The success depends on the
accuracy of its conclusions
For this reason:
It is extremely important in any
study to assess the accuracy of its
conclusions
Some definitions
important to Statistics
A population:
this is the complete collection of subjects
(objects) that are of interest in the study.
There may be (and frequently are) more
than one population, in which case a major
objective is that of comparison.
A case (elementary sampling
unit):
This is an individual unit (subject) of the
population.
A variable:
a measurement or type of measurement
that is made on each individual case in the
population.
Types of variables
Some variables may be measured on a
numerical scale while others are
measured on a categorical scale.
The nature of the variables has a great
influence on which analysis will be used.
For Variables measured on a numerical scale
the measurements will be numbers.
Ex: Age, Weight, Systolic Blood Pressure
For Variables measured on a categorical scale
the measurements will be categories.
Ex: Sex, Religion, Heart Disease
Note
Sometimes variables can be measured on
both a numerical scale and a categorical
scale.
In fact, variables measured on a numerical
scale can always be converted to
measurements on a categorical scale.
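A minimal sketch of such a conversion, assuming age is recorded in years. The function name age_group and the cut points are chosen only for illustration.

```python
def age_group(age):
    """Convert a numerical age (in years) to a categorical age group.

    The cut points are an assumption made for this sketch.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 20:
        return "under 20"
    if age >= 70:
        return "70+"
    lower = (int(age) // 10) * 10
    return f"{lower}-{lower + 9}"

ages = [23, 35, 47, 52, 68, 71, 19]
groups = [age_group(a) for a in ages]
print(groups)
# ['20-29', '30-39', '40-49', '50-59', '60-69', '70+', 'under 20']
```

The conversion only goes one way: the exact age cannot be recovered from the age group.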
Example
The following variables were evaluated
for a study of individuals receiving head
injuries in Saskatchewan.
1. Cause of the injury (categorical)
• Motor vehicle accident
• Fall
• Violence
• other
2. Time of year (date) (numerical or
categorical)
• summer
• fall
• winter
• spring
3. Sex of injured individual (categorical)
• male
• female
4. Age (numerical or categorical)
• < 10
• 10-19
• 20-29
• 30-49
• 50-65
• 65+
5. Mortality (categorical)
• Died from injury
• alive
Types of variables
In addition some variables are labeled as
dependent variables and some variables
are labeled as independent variables.
This usually depends on the objectives of
the analysis.
Dependent variables are output or
response variables while the
independent variables are the input
variables or factors.
Usually one is interested in determining
equations that describe how the dependent
variables are affected by the independent
variables
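One common way to obtain such an equation, when there is a single numerical independent variable, is a least-squares straight line. The sketch below is an illustration only: the five data points are invented, and fit_line is a name chosen for this example.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented illustrative data: age (independent), systolic BP (dependent).
age = [25, 35, 45, 55, 65]
blood_pressure = [118, 121, 127, 130, 139]

intercept, slope = fit_line(age, blood_pressure)
print(f"Blood Pressure = {intercept:.2f} + {slope:.2f} * Age")
```

The fitted equation describes the overall trend. Individual cases will still depart from it because of random variation.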
Example
Suppose we are collecting data on
• Blood Pressure
• Height
• Weight
• Age
Suppose we are interested in how
• Blood Pressure
is influenced by the following factors
• Height
• Weight
• Age
Then
• Blood Pressure
is the dependent variable
and
• Height
• Weight
• Age
are the independent variables
Example – Head Injury study
Suppose we are interested in how
• Mortality
is influenced by the following factors
• Cause of head injury
• Time of year
• Sex
• Age
Then
• Mortality
is the dependent variable
and
• Cause of head injury
• Time of year
• Sex
• Age
are the independent variables
Dependent variable = response variable
Independent variable = predictor variable
Some variables may be measured on a numerical
scale while others are measured on a
categorical scale.
Some variables may be labeled as dependent
variables and others as independent
variables.
Dependent variables are output or
response variables while the independent
variables are the input variables or factors.
A sample:
Is a subset of the population
In statistics:
One draws conclusions about the
population based on data collected
from a sample
Reasons:
Cost
It is less costly to collect data from a
sample than from the entire population
Accuracy
Data from a sample sometimes leads
to more accurate conclusions than data
from the entire population
Costs saved from using a sample can
be directed to obtaining more accurate
observations on each case in the
sample
Types of Samples
Different types of samples are determined
by how the sample is selected.
Convenience Samples
In a convenience sample the subjects that
are most convenient to the researcher are
selected as objects in the sample.
This is not a very good procedure for
inferential Statistical Analysis but is
useful for exploratory preliminary work.
Quota samples
In quota samples subjects are chosen
conveniently until quotas are met for
different subgroups of the population.
This also is useful for exploratory
preliminary work.
Random Samples
Random samples of a given size are
selected in such a way that all possible samples
of that size have the same probability of
being selected.
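The definition can be checked on a very small population. In this sketch the population of five labelled cases is hypothetical. Python's random.sample draws without replacement, so every subset of the requested size is equally likely.

```python
import itertools
import random

population = ["A", "B", "C", "D", "E"]  # hypothetical cases
sample_size = 2

# Every possible sample of the given size (order ignored).
all_samples = list(itertools.combinations(population, sample_size))
print(len(all_samples), "possible samples")  # 10 for 5 choose 2

# Draw many random samples and count how often each one occurs.
rng = random.Random(245)  # fixed seed so the sketch is reproducible
counts = {s: 0 for s in all_samples}
n_draws = 10_000
for _ in range(n_draws):
    drawn = tuple(sorted(rng.sample(population, sample_size)))
    counts[drawn] += 1

for s, c in counts.items():
    print(s, c / n_draws)
```

Each of the 10 possible samples should turn up in roughly one tenth of the draws.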
Convenience Samples and Quota samples
are useful for preliminary studies. It is
however difficult to assess the accuracy
of estimates based on this type of
sampling scheme.
Sometimes however one has to be
satisfied with a convenience sample and
assume that it is equivalent to a random
sampling procedure
[Figure: a population of cases (×), a sample drawn from it, and the variables X, Y, Z measured on each case]
Some other definitions
A population statistic
(parameter):
Any quantity computed from the values
of variables for the entire population.
A sample statistic:
Any quantity computed from the values
of variables for the cases in the sample.
Since only cases from the sample are
observed
– only sample statistics are computed
– These are used to make inferences about
population statistics
– It is important to be able to assess the accuracy
of these inferences
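The difference between a parameter and a sample statistic can be sketched with a simulated population. The population values here are generated, not real data, and the sizes (10,000 cases, samples of 50) are arbitrary choices for the illustration.

```python
import random
import statistics

rng = random.Random(245)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 systolic blood pressures.
population = [rng.gauss(125, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)  # a parameter

# Three different random samples of 50 cases each.
sample_means = []
for _ in range(3):
    sample = rng.sample(population, 50)
    sample_means.append(statistics.mean(sample))  # a sample statistic

print("population mean:", round(population_mean, 2))
for m in sample_means:
    print("sample mean:", round(m, 2), "error:", round(m - population_mean, 2))
```

In practice only one sample is observed and the population mean is unknown, so the error cannot be computed directly. Assessing its likely size is the job of the inferential methods later in the course.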
Organizing Data
the next topic