Uploaded by Joshua Quaye

1st lecture- SM1

Statistical Methods: Math 153
By
Wilhemina Adoma Pels
Department of Statistics and Actuarial Science
KNUST
FIRST LECTURE
March 4, 2022
Weekly Content
Week 1:
1. Course introduction, provision of course outline and
recommended textbooks
2. Introduction to Statistics
3. Uses of Statistics
4. Basic terms in Statistics
5. Variable and Data
6. Measurement scales
7. Stages of statistical investigation
8. Data Collection (Primary and Secondary data)
Week 2:
1. Questionnaire Design
2. Quiz 1
3. Summarizing and describing data
Week 3, 4:
1. Using numerical summaries to characterize sample data
Week 5, 6:
1. Using graphical summaries to characterize sample data
Week 7: Midsem Exams
Week 8:
1. Introduction to Probability
2. Axioms, Sets, Sample space, Measure of probability of
events
3. Mutually exclusive events
4. Independent events
5. Conditional probability, Bayes’ theorem
Week 9: Counting techniques: combination and
permutations
Week 10: Random variables and some discrete probability
distributions
Week 11: Some Continuous Probability Distributions
Week 12: Revision
Week 1
What is Statistics?
Statistics is the science concerned with developing and studying
methods for collecting, organizing, analyzing, interpreting and
presenting empirical data.
Statistics is the science of learning from data.
Types of Statistics
1. Descriptive statistics
Summarizing and describing the data
Uses numerical and graphical summaries to characterize
sample data
2. Inferential statistics
Uses sample data to draw conclusions about a broader
group of individuals (a population) than just those who are
observed (a sample)
Types of Inferential Statistics
Inductive: Generalization for the population based on
knowledge of the sample.
Deductive: Generalization for the sample based on
knowledge of the population.
USE OF STATISTICS
1. To Present Facts in Definite Form
2. Comparisons
3. Policy Making
4. Forecasting
5. It Enlarges Knowledge
BASIC TERMS
Population and Sample
A population is the collection of all possible individual
units whose characteristics are to be studied
A sample is a subset of the population that is studied in
order to make inference about the population
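The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1,000 student heights is simulated (hypothetical data, not from the course), and the sample is a simple random sample of 50 units.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 1,000 students, simulated for illustration.
rng = random.Random(153)  # fixed seed so the sketch is reproducible
population = [round(rng.gauss(165, 8), 1) for _ in range(1000)]

# A sample is a subset of the population; here a simple random sample of 50 units.
sample = rng.sample(population, k=50)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # used to make inference about it

print(f"population size = {len(population)}, sample size = {len(sample)}")
print(f"population mean = {population_mean:.1f} cm")
print(f"sample mean     = {sample_mean:.1f} cm")
```

In a real study only the sample is observed; the population mean is printed here just to show that the sample mean tends to land close to it.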
What is a variable?
A variable is any attribute, characteristic, or measurable
property that can vary from one observation to another.
Example: height, hair color, Gender, Age of Patient, Weight
TYPES OF VARIABLES
1. Qualitative or Categorical Variables
Take on values that are names or labels
Allow for classification of individuals based on some
attribute or characteristic.
Examples: Religion, Regions in Ghana, Gender
2. Quantitative Variables
Numeric
Represent a measurable quantity of an individual.
Examples: Volume, height, prices, Number of students in
the class
Types of Quantitative Variables
Discrete Variable: A quantitative variable that has either a
finite number of possible values or a countable number of
possible values. The term countable means that the values
result from counting, such as 0, 1, 2, 3, and so on. Examples:
number of children in a family, number of heads in a series of
coin tosses
Continuous Variable: A quantitative variable that can take on
any value within some range or interval (i.e., within a specified
lower and upper limit). Examples: height, temperature, weight,
income (usually treated as continuous in practice)
Figure: Illustration of the relationship among qualitative,
quantitative, discrete, and continuous variables.
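As a rough illustration of this classification, the sketch below labels a list of observed values as qualitative, discrete or continuous. The function name `classify_variable` and the sample observations are hypothetical, and the rule is only a heuristic: it looks at how the values are stored, so numeric codes used as category labels would be misclassified as discrete.

```python
def classify_variable(values):
    """Rough heuristic: classify a list of observed values by variable type."""
    # Anything that is not a plain number is treated as a name or label.
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "qualitative"
    # Whole-number counts are treated as discrete.
    if all(isinstance(v, int) for v in values):
        return "quantitative (discrete)"
    # Remaining numeric values are treated as measurements on a continuum.
    return "quantitative (continuous)"


# Hypothetical observations for three variables.
observations = {
    "religion": ["Christian", "Muslim", "Traditional"],
    "children_in_family": [0, 2, 3, 1],
    "height_cm": [158.4, 171.0, 165.2],
}

for name, values in observations.items():
    print(name, "->", classify_variable(values))
```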
DATA VS VARIABLE
The list of observed values for a variable is data.
For example, gender is a variable; the observations male or
female are data.
Qualitative data are observations corresponding to a
qualitative variable.
Quantitative data are observations corresponding to a
quantitative variable.
Discrete data are observations corresponding to a discrete
variable
Continuous data are observations corresponding to a
continuous variable
Univariate vs Bivariate data
Univariate data arise when only one variable is involved in the study
Bivariate data arise when two variables are involved
Multivariate data arise when a study has more than two variables
MEASUREMENT
If a thing exists, it exists in some amount; and if
it exists in some amount, it can be measured
E. L. Thorndike (1914)
What is measurement?
Measurement is the application of mathematics to things or
events: numbers are assigned to them according to rules.
A system of measurement is a crucial component of research.
Simple example: How tall is Jane? More complex example:
How shy is Jane?
Scales of measurement
Nominal Scale
Data that represent categories or names or labels. There is
no implied order to the categories of nominal data.
Observations are classified into mutually exclusive
categories
Examples: Eye color (blue, brown, green), Biological sex (male
or female), Political affiliation (GUM, CPP, NPP, GFP, NDC),
Marital status (single, married, divorced, widowed)
Sometimes numbers are used to designate category membership. Here,
the numbers do not have numeric implications; they are simply
convenient labels.
Example: Country of Origin
Ghana= 1 Cameroon= 2 Nigeria= 3 Other= 4
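To see why such codes are only labels, consider the short sketch below. The coding follows the Country of Origin example; the list of responses is made up for illustration. Counting and finding the most common category (the mode) are meaningful for nominal data, while an arithmetic mean of the codes is not.

```python
from collections import Counter
import statistics

# Coding from the example above: the numbers are labels, not quantities.
country_codes = {1: "Ghana", 2: "Cameroon", 3: "Nigeria", 4: "Other"}

# Hypothetical coded responses from eight people.
responses = [1, 1, 3, 2, 1, 4, 3, 1]

# Meaningful for nominal data: frequencies and the modal category.
counts = Counter(country_codes[code] for code in responses)
modal_country = counts.most_common(1)[0][0]
print(counts)
print("Most common country:", modal_country)

# Not meaningful: the mean of the codes depends entirely on the arbitrary coding.
misleading_mean = statistics.mean(responses)
print("Mean of codes (no interpretation):", misleading_mean)
```

Here the mean of the codes happens to equal 2, the label for Cameroon, even though most respondents are from Ghana; recoding the countries would change the mean but not the counts.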
Ordinal Scale:
This scale has a logical ordering of the categories.
It designates an ordering (greater than, less than). It does
not assume that the intervals between numbers are equal.
Examples:
1. Finishing place in a race (first place, second place)
2. A psychiatrist may grade patients on an anxiety scale as
'not anxious', 'mildly anxious', 'moderately anxious', or
'severely anxious' and use the numbers 0, 1, 2, and 3 to label
the categories, with lower numbers indicating less anxiety.
3. Adverse effect [no AE = 0, mild AE = 1, severe AE = 2]
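The anxiety example can be sketched as follows, assuming a made-up list of five patients. Because the codes carry order but not equal spacing, comparisons and the median category are meaningful, whereas differences between codes are not.

```python
import statistics

# Ordered categories from the anxiety example; the position in the list is the code.
levels = ["not anxious", "mildly anxious", "moderately anxious", "severely anxious"]
code = {label: position for position, label in enumerate(levels)}

# Hypothetical gradings for five patients.
patients = [
    "mildly anxious",
    "severely anxious",
    "not anxious",
    "moderately anxious",
    "mildly anxious",
]

# Order comparisons are valid on an ordinal scale.
print(code["severely anxious"] > code["mildly anxious"])

# The median category is also valid; median_low always returns an observed code.
codes = sorted(code[p] for p in patients)
median_label = levels[statistics.median_low(codes)]
print("Median anxiety level:", median_label)
```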
Interval Scale:
An important point to make about interval scales is that
the zero point is simply another point on the scale; it does
not represent the starting point of the scale or the total
absence of the characteristic being measured.
Designates an equal-interval ordering. For example,
Temperature in Fahrenheit or Celsius is an interval scale
measurement. The difference in temperature between 20
degrees F and 25 degrees F is the same as the difference
between 76 degrees F and 81 degrees F.
Likert scale responses are strictly ordinal, but they are often
treated as interval scale measurements in practice.
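The temperature example can be checked numerically. The sketch below uses the standard Fahrenheit to Celsius conversion; the variable names are illustrative. It shows that equal intervals stay equal after a change of units, while the ratio of two readings does not, which is what the lack of a true zero implies.

```python
import math


def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


# Equal intervals: a 5 degree F gap is the same size anywhere on the scale.
diff_low = fahrenheit_to_celsius(25) - fahrenheit_to_celsius(20)
diff_high = fahrenheit_to_celsius(81) - fahrenheit_to_celsius(76)
print(math.isclose(diff_low, diff_high))  # the two gaps agree in Celsius too

# No true zero: the ratio of two readings depends on the unit chosen.
ratio_f = 40 / 20
ratio_c = fahrenheit_to_celsius(40) / fahrenheit_to_celsius(20)
print(ratio_f, round(ratio_c, 2))
```

So 40 degrees F is not "twice as hot" as 20 degrees F: the same two temperatures give a completely different ratio when expressed in Celsius.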
Ratio Scale: designates an equal-interval ordering with a true
zero point (i.e. the zero implies an absence of the thing being
measured).
Examples: Temperature in Kelvin (zero is the lowest possible
temperature; it cannot get colder) and measurements of heights of
students in this class (zero means complete lack of height).
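With a true zero, ratios become meaningful and do not depend on the unit. The following sketch, using illustrative values, contrasts a ratio of Celsius readings with the ratio of the same temperatures in Kelvin, and shows that a ratio of heights is the same in centimetres and in metres.

```python
import math


def celsius_to_kelvin(c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return c + 273.15


# On the Celsius (interval) scale 20 looks like "twice" 10, but in Kelvin it is not.
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)
print(ratio_celsius, round(ratio_kelvin, 3))

# Height is a ratio scale: the ratio is the same whatever unit is used.
heights_cm = (180, 90)
ratio_cm = heights_cm[0] / heights_cm[1]
ratio_m = (heights_cm[0] / 100) / (heights_cm[1] / 100)
print(math.isclose(ratio_cm, ratio_m))
```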
Summary of Measurement Scales
Measurement scales differ by order, equal intervals between
adjacent units and absolute zero point.
Nominal: None
Ordinal: Order
Interval: Order + Equal intervals
Ratio: Order + Equal intervals + True zero
Nominal or ordinal scaled data: use bar charts (simple,
multiple, compound, etc.) or pie charts
Interval or ratio scaled data: use histogram, polygon,
ogive, etc.
Use a scatter plot to assess association between quantitative
variables. Note: no inference is drawn at this point, the
object being to convey information
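The summary above can be written as a small lookup. This is a minimal sketch: the names `SCALE_PROPERTIES` and `suggested_charts` are hypothetical, and the chart lists simply restate the guidance on this slide.

```python
# Properties each measurement scale has, as listed in the summary.
SCALE_PROPERTIES = {
    "nominal": set(),
    "ordinal": {"order"},
    "interval": {"order", "equal intervals"},
    "ratio": {"order", "equal intervals", "true zero"},
}


def suggested_charts(scale):
    """Return the chart types suggested for data on the given scale."""
    if scale in ("nominal", "ordinal"):
        return ["bar chart", "pie chart"]
    if scale in ("interval", "ratio"):
        return ["histogram", "frequency polygon", "ogive"]
    raise ValueError(f"unknown measurement scale: {scale}")


for scale, properties in SCALE_PROPERTIES.items():
    print(scale, sorted(properties), suggested_charts(scale))
```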
STAGES OF STATISTICAL INVESTIGATIONS
If the investigation is to optimize the use of the available
resources, expertise and time, it is essential to carefully examine
all aspects of the design and application of statistical
investigations (experiments and surveys) at the planning level.
STEPS
1. Statement of problem and objectives: We must
identify the cause for concern and state explicitly what the
problem is, the characteristics to be measured, and the methods
of collecting, processing and publishing the data
2. Target population and the use of sample or entire
population: Define in clear unambiguous terms the population
of interest, define the sample units to make them distinct,
non-overlapping and recognizable and select an appropriate
sampling design
3. Design of Questionnaire or Schedule: Construction of the
questionnaire or schedule is extremely important since the
respondent and data collector must both interpret it
4. Method of data collection: You have to decide whether
data will be collected by personal interview, online, physical
observation or some other method. Cost is a major factor here.
Personnel must be thoroughly trained to correctly locate
sampling units and take measurements.
5. Required data: The data to be collected should be guided
by the objective of the investigation.
6. List of available resources: A wide variety of resources is
likely to be required for the operation of the investigation and
the analysis of the results. These include the following:
Physical resources: Sampling frame, maps etc
Human resources: Data collectors, data analysts
Financial resources
7. Conducting a pilot Survey: This must be carried out
before the main survey.
8. Collection, Editing, Storage and organization of data
9. Interpretation and Presentation of Results
DATA COLLECTION METHODS
Pros and Cons of Primary and Secondary Data
Where do data come from?
We have often seen our data all nice and collated in a database
form:
Results of product and process improvement experiments
Firms/Institutions (demographic data, student enrollment,
productivity data, etc)
Take a step back: if we were starting from scratch, how do we
collect or find data?
Secondary data
Primary data
Secondary Data
Secondary data is data someone else has collected
EXAMPLES OF SOURCES
Vital statistics: birth and death certificates
Hospital, clinic, school nurse records
Private and foundation databases
City and regional governments
Surveillance data from state government programs
Federal agency statistics - Census, NHIS, etc
Secondary Data - LIMITATIONS
Finding secondary data could sometimes be frustrating
When was it collected? For how long? It may be out of date
for what you want to analyze. It may not have been collected
long enough for detecting trends.
Is the data set complete? There may be missing
information on some observations. Unless such missing
information is identified and corrected for, the analysis may
be biased.
Are there confounding problems? Sample selection bias?
Source choice bias? In time series, did some observations
drop out over time?
Are the data consistent/reliable? Did variables drop out
over time? Did variables change in definition over time?
For example, number of years of education versus highest
degree obtained
Is the information exactly what you need? In some cases, you
may have to use proxy variables, that is, variables that
approximate something you really wanted to measure. Are
they reliable? Are they correlated with what you actually
want to measure?
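Some of the checks above (completeness and time coverage) can be run mechanically before a secondary data set is used. The sketch below assumes a tiny, made-up set of records in which `None` marks a missing value; the field names and the helper `missing_counts` are hypothetical.

```python
# Hypothetical secondary data: None marks a missing value.
records = [
    {"id": 1, "year": 2015, "years_of_education": 12},
    {"id": 2, "year": 2016, "years_of_education": None},
    {"id": 3, "year": 2016, "years_of_education": 16},
    {"id": 4, "year": None, "years_of_education": 9},
]


def missing_counts(rows):
    """Count missing (None) values for each field across all rows."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + (value is None)
    return counts


# Is the data set complete?
missing = missing_counts(records)
complete_rows = [r for r in records if all(v is not None for v in r.values())]
print("Missing values per field:", missing)
print("Complete rows:", len(complete_rows), "of", len(records))

# When was it collected, and for how long?
years = [r["year"] for r in records if r["year"] is not None]
print("Years covered:", min(years), "to", max(years))
```

Checks such as confounding, changes in variable definitions, or the quality of a proxy variable still need judgment and knowledge of how the data were collected.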
USES OF SECONDARY DATA
As an alternative to a survey
As a source of supplementary information
As a check on possible survey biases
As a means of improving survey estimates
Secondary Data - ADVANTAGES
No need to reinvent the wheel. If someone has already
found the data, take advantage of it.
It will save you money. Even if you have to pay for access,
often it is cheaper in terms of money than collecting your
own data.
It will save you time. Primary data collection is very time
consuming.
It may be very accurate. Especially when a government
agency has collected the data, considerable time and money
usually went into it, so it is probably highly accurate.
It has great exploratory value for exploring research
questions and formulating hypotheses to test.
PRIMARY DATA
Primary data is data you collect.
PRIMARY DATA - EXAMPLES
Surveys
Focus groups
Questionnaires
Personal interviews
Experiments and observational study
PRIMARY DATA - LIMITATIONS
Do you have the time and money for:
Designing your collection instrument?
Selecting your population or sample?
Pretesting/piloting the instrument to work out sources of
bias?
Administration of the instrument?
Entry/collation of data?
Uniqueness. May not be able to compare to other
populations
Researcher error (Sample bias, Other confounding factors)
DATA COLLECTION CHOICE
What you must ask yourself:
WILL THE DATA ANSWER MY RESEARCH
QUESTION?
To answer that, you must first decide what your research
question is. Then you need to decide what data/variables are
needed to scientifically answer the question
If the data exist in secondary form, then use them to the
extent you can, keeping in mind their limitations
But if it does not, and you are able to fund primary
collection, then it is the method of choice. For example,
Direct Observation/Experiments
Telephone
Postal or electronic mails
Documents and reports
Interviewing
Questionnaire design
A survey is only as good as the questions it asks
What should you ask?
The questions asked are a function of previous decisions
The questions asked are a function of future decisions (such
as statistical analysis)
Key Criteria
Questionnaire relevancy: No unnecessary information is
collected and only information needed to solve the problem
is obtained. Be specific about your data needs; tie each
question to an objective
Questionnaire accuracy: Information is both reliable
and valid
Phrasing Questions
Open ended response versus fixed alternative questions
Decision criteria: type of research; time; method of
delivery; budget; concerns regarding researcher bias
AVOID
Leading questions
Overly complex questions
Use of jargon
Loaded questions (can use a counter-biasing statement)
Ambiguity
Double-barreled questions
Making assumptions
DECISIONS
Ranking, sorting, rating or choice
How many categories or response positions
Balanced or unbalanced
Forced choice or non-forced choice
Types of questions
Types of fixed alternative questions
Single dichotomy or dichotomous-alternative questions
Example: Are you currently registered in a course at the
University of Science and Technology?
Yes
No
Respondent chooses one of two alternatives (yes/no;
male/female)
What scale would this data create?
Multi-choice alternative questions
Multi-choice alternative (Respondent chooses from several
alternatives)
1. Determinant choice
Choose only one from several possible responses
Example: Which College are you currently registered in
at the University?
Engineering
Science
Arts/Soc. Science
Health sciences
Planning and Architecture
2. Frequency determination
Asks for an answer about frequency of occurrence
Example: In a typical week, how often do you purchase
chocolate chip cookies?
Never
Once
Two or more times
3. Checklist
Provide multiple answers to a single question
Should be mutually exclusive and exhaustive
Example: What brands of chocolate chip cookies have you, to
the best of your memory, purchased in the past month (check
all that apply)?
Golden Tree
Cadbury’s
President's Choice Decadent
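Because a checklist question lets each respondent tick several options, its answers are tallied per option rather than per respondent. The sketch below uses the three brands from the example and four made-up respondents; the variable names are illustrative.

```python
from collections import Counter

# Options from the checklist example.
brands = ["Golden Tree", "Cadbury's", "President's Choice Decadent"]

# Hypothetical answers: each respondent may tick zero, one or several brands.
responses = [
    ["Golden Tree"],
    ["Golden Tree", "Cadbury's"],
    [],
    ["Cadbury's", "President's Choice Decadent", "Golden Tree"],
]

# Count how many respondents ticked each brand.
tally = Counter(brand for ticked in responses for brand in ticked)

# Shares are out of respondents, so they need not add up to 1.
share = {brand: tally[brand] / len(responses) for brand in brands}

for brand in brands:
    print(f"{brand}: {tally[brand]} respondents ({share[brand]:.0%})")
```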
Thank You.