Uploaded by Jan Tricia Dela Rosa

Stats Lectures

STATISTICAL ANALYSIS APPLICATIONS WITH SOFTWARE
LESSON 1 - NATURE OF STATISTICS
Lesson 1.1. Basic terminologies used in Statistics
Statistics and Accountancy → Statisticians and Accountants
Statisticians:
• work with quantitative and qualitative data of different types (e.g., drug efficacy, citizens' living conditions, consumer purchasing patterns, performance of an individual/company/institution)
• design data collection methods, such as surveys
• identify digital sources for data collection
• oversee data collection
• compile and analyze collected data using statistical methods and software
• report findings and prevent misinterpretation of results
• provide recommendations for using results
Accountants:
• work more with quantitative data (financial figures) than with qualitative data
• review financial documents to make sure clients manage funds appropriately
• prepare financial documents and report them as necessary
• ensure proper procedures are being used for collecting, storing, and reporting financial information
• make recommendations for improving financial performance
DATA COLLECTION:
- Primary Data – data gathered by the researcher
- Secondary Data – data from other sources, such as research conducted, industry financial statements, business periodicals, and government reports
- Census survey – complete enumeration in which every member of the population is included
- Sample survey – survey of a portion of the population
Statistics and statistic
• Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting and presenting empirical data.
• A statistic is the descriptor of a set of sample data.
• The descriptors of population data are referred to as population parameters.
Population parameters and sample statistics and their relevant symbols
Population and Sample
Variable
• a characteristic or measurement that can be determined for each member of a population.
• Variables may be numerical or categorical.
- Numerical variables take on values with equal units, such as weight in pounds and time in hours.
- Categorical variables place the person or thing into a category.
- We could do some math with numerical values, but it makes no sense to do math with categorical values.
Data
• Data are the actual values of the variable. They may be numbers or they may be words.
• Sources of Data:
- Primary data – data that come from an original source and are intended to answer specific research questions; may be taken by interview, mail-in questionnaire, survey, or experimentation.
- Secondary data – data that are taken from previously recorded information.
Descriptive and Inferential Statistics
• Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data.
! Not used to make conclusions beyond the data which have been analyzed
! Not used to reach conclusions regarding any hypotheses made.
• Inferential statistics is a branch of statistics that makes use of various analytical tools to draw inferences about the population data from sample data.
Two general types of statistic that are used to describe data:
• Measures of central tendency: these are ways of describing the central position of a frequency distribution for a group of data. The central position may be described using the mode, median, and mean.
• Measures of spread: these are ways of summarizing a group of data by describing how spread out the scores are. To describe this spread, a number of statistics are available. These include the range, quartiles, absolute deviation, variance and standard deviation.
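As a rough illustration of the two groups of descriptive measures, the sketch below uses Python's standard `statistics` module on the five values that appear in the summation examples of Lesson 1.4 (156, 205, 270, 309, 311). Treating those five values as a whole population (so variance divides by N) is an assumption made only for this illustration.

```python
import statistics

# Five observations borrowed from the summation examples in Lesson 1.4.
scores = [156, 205, 270, 309, 311]

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
modes = statistics.multimode(scores)  # every value occurs once, so all are returned

# Measures of spread (population versions: divide by N)
value_range = max(scores) - min(scores)
mean_abs_dev = sum(abs(x - mean) for x in scores) / len(scores)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

print("mean:", mean, "median:", median, "modes:", modes)
print("range:", value_range, "mean absolute deviation:", round(mean_abs_dev, 2))
print("variance:", round(variance, 2), "standard deviation:", round(std_dev, 2))
```

Because no value repeats, the mode is not informative for this data set; the mean and median are the more useful descriptions of its center.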
Two main types of inferential statistics: hypothesis test and regression analysis
• A hypothesis test is a type of inferential statistics that is used to test assumptions and draw conclusions about the population from the available sample data.
- Involves setting up a null hypothesis (H0) and an alternative hypothesis (Ha)
- Followed by conducting a statistical test of significance.
• A conclusion is drawn based on the value of the test statistic, the critical value and the confidence intervals.
• A hypothesis test can be left-tailed, right-tailed, or two-tailed.
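The steps of a hypothesis test can be sketched with a two-tailed one-sample z-test. The notes do not name a specific test, so the z-test is an assumption chosen for illustration. The hypothesized mean (41,800) and standard deviation (4000) are borrowed from the EAI managers example in these notes, while the sample mean of 42,500 and the sample size of 100 are hypothetical values.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, p-value, reject H0?) for H0: mu = mu0."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Test statistic: distance of the sample mean from mu0 in standard errors.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Critical value for a two-tailed test at significance level alpha.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of a result at least this extreme if H0 is true.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical


# H0: mu = 41,800 versus Ha: mu != 41,800 (sample values are hypothetical).
z, critical, p_value, reject = two_tailed_z_test(42_500, 41_800, 4_000, 100)
print(f"z = {z:.2f}, critical = {critical:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

With these numbers the test statistic falls inside the critical values, so H0 is not rejected at the 5% level.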
• A regression analysis is used to quantify how one variable will change with respect to another variable.
• There are many types of regressions available such as:
- Simple Linear regression
- Multiple Linear regression
- Nominal regression
- Logistic regression
- Ordinal regression
• The most commonly used regression in inferential statistics is linear regression, which checks the effect of a unit change of the independent variable on the dependent variable.
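To make the "effect of a unit change" concrete, here is a minimal least-squares sketch of simple linear regression. The fertilizer and growth figures are hypothetical values invented for illustration, and the function name is not from the notes.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: independent variable (fertilizer) and dependent variable (growth).
fertilizer_grams = [1, 2, 3, 4]
growth_cm = [3, 5, 7, 9]

slope, intercept = fit_simple_linear_regression(fertilizer_grams, growth_cm)
print(f"growth = {intercept:.1f} + {slope:.1f} * fertilizer")
```

The slope is the estimated change in the dependent variable for each one-unit increase in the independent variable.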
Lesson 1.2. Levels of Measurement
Dependent variable and independent variable
• An independent variable is used to test the effects on the dependent variable.
- changed or controlled
- “cause”
• A dependent variable is the variable being tested and measured in a scientific experiment.
- changes in response to the independent variable
- depends upon the values and/or changes of the independent variable
- the “effect” of the changes in the independent variable is seen
- Ex. 1. Drug/medicine (dosage) and its effect on patients’ blood pressure
- Ex. 2. Amount of fertilizer given to plants and its effect on plant growth
NOMINAL SCALE
- a scale of measurement that uses a label or
category to define an attribute of an
element. Nominal data may be recorded
with a nonnumeric description or with a
numeric code
ORDINAL SCALE
- a scale of measurement that has the
properties of a nominal scale and can be
used to RANK or ORDER the observations.
Ordinal data may be recorded with a
nonnumeric description or with a numeric
code
- Ex. Restaurant evaluation
- Variable: Customer Service
INTERVAL SCALE
- a scale of measurement that has the
properties of an ordinal scale and the
interval between observations is expressed
in terms of a fixed unit of measure. Interval
data are always numeric.
RATIO SCALE
- a scale of measurement that has the
properties of an interval scale and the ratio
of observations is meaningful. Ratio data
are always numeric.
- A requirement of a ratio scale is that a
ZERO value is inherently defined in the
scale. Specifically, it must indicate nothing
exists for the variable at the zero point.
- Examples: distance, height, weight, time, cost
- Cost of three cars A, B, and C: 0, 1.5 million, and 3.0 million, respectively. Car A is free; Car C is twice as expensive as Car B -> 3,000,000/1,500,000 = 2
Fish bowl – 1, 2, 3, 4
If 1 is drawn, then the number pointed to at the TRN will be a single-digit number. If the number pointed to is 7, then your samples are the 7th, 27th, 47th, 67th, ...
If 2 is drawn, then the number pointed to at the
TRN will be a two-digit number. If the number
pointed to is 2
Lesson 1.3. Sampling Methods
RANDOM (or PROBABILITY) SAMPLE
- sample obtained from a population where all members (of the sample) are chosen without particular preference
- ! all members of the population have equal chances of being selected
- Examples: getting a sample of 5 senior officials and 5 junior officials from a population of 90 officials (45 junior and 45 senior officials); getting a sample of 10 male and 10 female respondents from a population of 208 employees (104 male and 104 female employees)
- Simple Random Sampling - Create a list with labels and choose random samples by fish bowl or roulette, OR use a fish bowl, roulette, or online random picker without creating a list
- Systematic Sampling, Stratified Proportional Sampling, Cluster Sampling
- Multi-stage Sampling - Combination of different sampling techniques
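The officials example above (5 junior and 5 senior from 45 of each) can be sketched as a proportional stratified random sample. This is a minimal sketch under stated assumptions: the labels `J1`..`J45` and `S1`..`S45` and both function names are hypothetical, and `random.sample` stands in for the fish bowl or roulette. Each stratum receives the share `stratum size / population size * sample size`.

```python
import random


def proportional_allocation(stratum_sizes, sample_size):
    """Share of the sample for each stratum: N_s / N * n, rounded to whole units."""
    total = sum(stratum_sizes.values())
    return {name: round(size * sample_size / total)
            for name, size in stratum_sizes.items()}


def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample (without replacement) inside each stratum."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    sizes = {name: len(members) for name, members in strata.items()}
    shares = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(members, shares[name])
            for name, members in strata.items()}


# Hypothetical labels for the 90 officials in the example.
officials = {
    "junior": [f"J{i}" for i in range(1, 46)],
    "senior": [f"S{i}" for i in range(1, 46)],
}

sample = stratified_sample(officials, 10, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

Rounding works cleanly here because the shares are whole numbers; with uneven strata the rounded shares may not add up exactly to the requested sample size and would need adjusting.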
NON-RANDOM (or NON-PROBABILITY) SAMPLE
- sample obtained from a population where all members of the sample are picked on the basis of some preference
- Examples: getting a sample of 10 senior officials from a population of 90 junior and senior officials; getting a sample of 10 male respondents from a population of 208 male and female employees
- Quota Sampling, Purposive Sampling, Convenience Sampling
Kth value (sampling interval): $K = \frac{N}{n}$, where N is the population size and n is the sample size
Quota Sampling - a sampling method of gathering representative data from a group. Application of this method ensures that the sample group represents certain characteristics of the population chosen by the researcher.
! A sample statistic should be a good estimate of a population parameter
Simple Random Sample Without Replacement and Simple Random Sample With Replacement
- Sampling with replacement does not change the probability of the second, third, ..., nth pick
- The number of different simple random samples of size n that can be selected from a finite population of size N is
$\frac{N!}{n!(N-n)!}$
Example (systematic sampling): Consider a population size of 2000 (4 digits) and sample size = 100
$K = \frac{N}{n} = \frac{2000}{100} = 20$
ILLUSTRATIVE PROBLEM
Suppose that we would like to take a sample of 100 families in a barangay that is composed of 2000 families which is not homogenous. It consists of Low, Middle, and High-Income brackets.
Simple Random or Systematic method?
STRATIFIED RANDOM SAMPLING
INCOME BRACKET     NUMBER OF FAMILIES
High Income        400 families
Middle Income      600 families
Low Income         1000 families
TOTAL              2000 families
Percentage share/Proportion ($n_s$):
$n_s = \frac{N_s}{N} \ast n$
INCOME BRACKET     NUMBER OF FAMILIES     PERCENTAGE SHARE (PROPORTION)
High Income        400                    $\frac{400}{2000} \ast 100 = 0.2 \ast 100 = 20$
Middle Income      600                    $\frac{600}{2000} \ast 100 = 0.3 \ast 100 = 30$
Low Income         1000                   $\frac{1000}{2000} \ast 100 = 0.5 \ast 100 = 50$
Suppose that we would like to take a sample of 250 families
INCOME BRACKET     NUMBER OF FAMILIES     PERCENTAGE SHARE (PROPORTION)
High Income        400                    $\frac{400}{2000} \ast 250 = 0.2 \ast 250 = 50$
Middle Income      600                    $\frac{600}{2000} \ast 250 = 0.3 \ast 250 = 75$
Low Income         1000                   $\frac{1000}{2000} \ast 250 = 0.5 \ast 250 = 125$
ILLUSTRATIVE PROBLEM
Electronics Associates Inc. (EAI) is an international company that manufactures a diverse line of products in plants located throughout the United States, Canada, and Europe.
The firm’s director of personnel has been assigned the task of developing a profile of the company’s 2500 managers. The group includes department heads, plant superintendents, and division managers.
The characteristics that are to be identified include the mean annual salary for the managers and the proportion of managers having completed the company’s management training program.
Assume:
✓ that the annual salary and the management training participation information for all 2500 managers have been obtained from the firm’s personnel records
✓ that the population mean and the population standard deviation have been computed using the following computations:
Population mean: $\mu = \frac{\sum x_i}{N} = \frac{\sum x_i}{2500} = \$41{,}800$
Population standard deviation: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{\sum (x_i - \mu)^2}{2500}} = \$4000$
Assume that a review of the 2500 records shows that 1500 managers have completed the training program
Proportion of the population having completed the training program (p):
$p = 1500/2500 = 0.60$
Numerical characteristic of the population → parameters
EAI Managers: $\mu = \$41{,}800$, $\sigma = \$4000$, $p = 0.60$
Lesson 1.4. Summation Notations
The summation sign Σ
▪ Denotes the addition of a series of numbers
• Ex. The sum of n observations of variable x from x1 to xn, that is x1 + x2 + x3 + . . . + xn is denoted as
$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$
i is called the index of summation
1 and n are the lower and upper limits of summation
Difference between
$\left(\sum_{i=1}^{n} x_i\right)^2$ and $\sum_{i=1}^{n} (x_i)^2$
Adding a constant to each observation in the given data in the previous Example may be expressed as
$\sum_{i=1}^{n} (x_i + c)$
Example 3.
Given: 156, 205, 270, 309, 311 ; c = 4
$\sum_{i=1}^{n} (x_i + c) = (x_1 + c) + (x_2 + c) + (x_3 + c) + (x_4 + c) + (x_5 + c)$
$= (156 + 4) + (205 + 4) + (270 + 4) + (309 + 4) + (311 + 4)$
$= 1271$
$\sum_{i=1}^{n} (x_i + c) = \sum_{i=1}^{n} x_i + nc$
Example 4.
Given: 156, 205, 270, 309, 311 ; c = 4
$\sum_{i=1}^{n} (c x_i) = 4(156) + 4(205) + 4(270) + 4(309) + 4(311)$
$= 624 + 820 + 1080 + 1236 + 1244$
$= 5004$
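The summation rules of Lesson 1.4 can be verified numerically. This sketch uses the data and constant from Examples 3 and 4 (156, 205, 270, 309, 311 with c = 4); the variable names are illustrative only, and the identity for a constant multiplier is included as a standard property alongside the additive rule stated in the notes.

```python
x = [156, 205, 270, 309, 311]  # data from Examples 3 and 4
c = 4                          # the constant used in both examples
n = len(x)

total = sum(x)                              # sum of x_i
square_of_sum = total ** 2                  # (sum of x_i) squared
sum_of_squares = sum(xi ** 2 for xi in x)   # sum of (x_i squared)

sum_plus_constant = sum(xi + c for xi in x)  # Example 3
sum_times_constant = sum(c * xi for xi in x)  # Example 4

# Adding a constant to every observation adds n * c to the total.
assert sum_plus_constant == total + n * c
# Multiplying every observation by a constant multiplies the total by c.
assert sum_times_constant == c * total

print(total, square_of_sum, sum_of_squares)
print(sum_plus_constant, sum_times_constant)
```

The first printed line shows why the placement of the exponent matters: squaring the total gives a much larger number than adding the individual squares.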