Uploaded by Joshua Reinier P. Lorcha

Reviewer

Advanced Statistics
Statistics is a scientific body of knowledge that deals with the collection, organization and presentation, analysis, and interpretation of data.
The basic idea behind all statistical methods of data analysis is to make inferences about a population by studying a small sample chosen from it.
Population
The complete collection of measurements, outcomes, objects, or individuals under study.
Parameter
A number that describes a population characteristic.
Sample
A subset of a population, containing
the objects or outcomes that are
actually observed.
Statistic
A number that describes a sample characteristic.
Descriptive statistics
methods that focus on the collection,
presentation, and characterization of
a set of data in order to properly
describe the various features of that
set.
Inferential statistics
methods that make possible the estimation of a population characteristic, or the making of a decision concerning a population, based only on sample results.
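To make the population/sample and parameter/statistic distinction concrete, here is a minimal Python sketch. The population is simulated (hypothetical ages of 1,000 people) and the mean is used as the characteristic of interest; neither comes from the source.

```python
import random
import statistics

# Hypothetical population: simulated ages of 1,000 individuals (illustrative only).
random.seed(42)
population = [random.randint(18, 65) for _ in range(1000)]

# Parameter: a number that describes the whole population (the population mean).
mu = statistics.mean(population)

# Sample: a subset of the population that is actually observed.
sample = random.sample(population, k=50)

# Statistic: a number that describes the sample (the sample mean).
# Inferential statistics uses x_bar to estimate the unknown parameter mu.
x_bar = statistics.mean(sample)

print(f"parameter (population mean) = {mu:.2f}")
print(f"statistic (sample mean)     = {x_bar:.2f}")
```

Re-running with a different seed or sample gives a different statistic, while the parameter stays fixed; that sample-to-sample variation is what inferential methods account for.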
Examples of Descriptive Statistics
Out of the 2.3 M documented
overseas Filipino workers, about
46.3 % are male (Quickstat, NSO,
September, 2017).
The teacher-student ratio in the elementary public schools is 1:31 (Fact Sheet, DepEd, as cited in The Philippine Star, 2018).
Among the currently married
women with ages 15 – 49 and
belonging to the low income group,
around 52.7% are not using any
birth control method (2006 Family
Planning Survey, NSO).
Jeff Bezos is the richest man in the
world with a net worth of 145.3
billion dollars (Forbes, 2019).
Examples of Inferential Statistics
At the rate the river system in Metro Manila is being polluted, the water supply in the metropolis will be totally depleted by the year 2025 (Greenpeace, 2007).
Wearing a seatbelt increases the chances of survival in vehicular accidents.
A new milk formulation designed to improve the psychomotor development of infants was tested on randomly selected infants. Based on the results, it was concluded that the new milk formulation is effective in improving the psychomotor development of infants.
Variables
is a characteristic or attribute that can assume different values. It is also a characteristic of interest, one that can be expressed as a number, possessed by each item under study.
Dependent variable
the variable we wish to explain
Independent variable
the variable used to explain the
dependent variable
Confounding Variables
refer to all factors that the researcher has not accounted for but which may have influenced the results of the study.
Data
are the values that variables can assume.
Qualitative Data
refers to the attributes or characteristics of the samples. They represent differences in character or kind but not in amount. Such data are gathered as categorical responses.
Examples: gender (male or female), personality (introvert, ambivert, extrovert), socio-economic status (high, middle, low), educational attainment (elementary, high school, college graduate)
Quantitative Data
refers to the numerical information gathered about the sample. Such data can be ordered or ranked. They are the result of counting or measuring.
Examples: scores in an achievement test, number of votes received by an election candidate, monthly income of an employee, number of graduate school students in BulSU, length of time to solve a statistical problem, prices of houses, gross sales
Quantitative (Numerical)
Data that represent counts or measurements. They are numerical in nature and can be ordered or ranked.
Discrete Variables
Assume values that can be counted (a finite or countable set of values).
Ex: number of students in a class
Continuous variables
Can assume all values between any two specific values and are obtained by measuring.
Ex: weight, age, height, temperature, salary
Levels of Measurement
1. Nominal Data – can be classified into two or more distinct categories. They are either:
Real nominal – classified based on naturally occurring characteristics. (Examples: sex, nationality, race)
Artificial nominal – classified based on man-made characteristics following certain rules. (Examples: political affiliation, religious denomination, smoking habits of people, passing or failing a test)
2. Ordinal Data – data are grouped according to ranks or orders of categories. In this classification, one category is higher or lower than the other categories.
Examples: winners in a contest, birth order, faculty rank, income categories, boxing divisions
3. Interval Data – not only is the ordering of observations possible, but the arithmetic differences between them are also meaningful. Here, every value is an actual amount and there is an equal unit of measurement separating each score. However, the zero point is arbitrary in this scale and does not reflect the absence of the characteristic.
Examples: achievement scores, temperature, RBC count, time
4. Ratio Data – data wherein the equality of ratios or proportions has meaning. In this scale, the zero point is not arbitrary, since it indicates a total absence of the attribute being measured. The concepts of algebraic operations, absolute zero, and inequalities have meaning.
Examples: length, weight, area, volume, density, money, age, power
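Interval and ratio data support arithmetic, so the usual summary measures of center, spread, skewness, and peakedness can be computed from them. The sketch below uses only Python's standard `statistics` module (Python 3.8+ for `quantiles`); the data, ten daily oil-well outputs in barrels, are used purely for illustration. The moment-based skewness and excess-kurtosis formulas are one common convention; statistical packages often apply small-sample corrections, so their values may differ slightly.

```python
import statistics

def summarize(data):
    """Measures of center, spread, skewness and peakedness for numeric data."""
    n = len(data)
    mean = statistics.mean(data)
    # Central moments (population form), used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
    return {
        # center
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # spread
        "variance": statistics.variance(data),  # sample variance, divisor n - 1
        "std_dev": statistics.stdev(data),      # sample standard deviation
        "range": max(data) - min(data),         # max - min
        "iqr": q3 - q1,                         # 1st to 3rd quartile
        "mean_deviation": sum(abs(x - mean) for x in data) / n,
        # shape
        "skewness": m3 / m2 ** 1.5,             # > 0 means a longer right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3,    # 0 for a normal curve
    }

barrels = [21, 19, 20, 22, 24, 21, 19, 22, 22, 20]
for name, value in summarize(barrels).items():
    print(f"{name:>16}: {value:.3f}")
```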
Measures of center, spread, skew/symmetry, and peakedness
Reliable measures:
Center: mean
Spread: variance (standard deviation), range (max - min)
Skew/Symmetry: skewness
Peaked: kurtosis
Non-mean based measures:
Center: mode, median
Spread: interquartile range (1st to 3rd quartile), mean deviation
Summary of when to use the mean, median and mode
Hypothesis Testing
is the process of making an inference or generalization about a population by using data gathered from a sample of the population.
It is an area of statistical inference in which one evaluates a conjecture about some characteristic of the parent population based upon the information contained in the random sample.
Null hypothesis (Ho)
It is the hypothesis to be tested, which one hopes to reject. It states equality, or no significant difference, effect, or relationship between variables.
For the mean, the null hypothesis will be stated in one of these three possible forms: Ho: µ = some value, Ho: µ ≤ some value, Ho: µ ≥ some value
Alternative hypothesis (Ha)
It generally represents the idea which the researcher wants to prove.
For the mean, the alternative hypothesis will be stated in only one of three possible forms: Ha: µ ≠ some value, Ha: µ > some value, Ha: µ < some value
Types of Hypothesis Testing
1. Two-tailed test - It is a nondirectional test with the region of rejection lying on both tails of the normal curve. It is used when the alternative hypothesis uses words such as not equal to, significantly different, etc.
2. One-tailed test - It is a directional test with the region of rejection lying on either the left or the right tail of the normal curve.
Right directional test - The region of rejection is on the right tail. It is used when the alternative hypothesis uses comparatives such as greater than, higher than, better than, superior to, exceeds, etc.
Left directional test - The region of rejection is on the left tail. It is used when the alternative hypothesis uses comparatives such as less than, smaller than, inferior to, lower than, below, etc.
Level of Significance (α) and Level of Confidence
0.05 level - the researcher accepts a 5% risk of rejecting a null hypothesis that is true, which corresponds to a 95% level of confidence.
0.01 level - the researcher accepts only a 1% risk (a 99% level of confidence). An α of 0.01 (compared with 0.05) means the researcher is being relatively careful. He is only willing to risk being wrong 1 in 100 times in rejecting a null hypothesis that is true.
Critical Value
the dividing point between the region where the null hypothesis is rejected and the region where it is not rejected. This is also known as the tabular value.
Decision Rule
a statement of the specific conditions under which the null hypothesis is rejected and the conditions under which it is not rejected (accepted).
P-VALUE
is the probability of observing a sample value as extreme as, or more extreme than, the value observed, given that the null hypothesis is true.
A small p-value means the observed result would be unlikely if Ho were true, which counts as evidence against Ho.
This process compares the p-value with the significance level. If the p-value is smaller than the significance level, Ho is rejected. If it is larger than the significance level, Ho is not rejected.
Make a Decision
This step includes the computation of the test statistic, comparing it to the critical value, and making a decision to reject or not to reject the null hypothesis.
Errors in Decision Making
Type I (α error) – rejecting a true Ho
Type II (β error) – accepting (failing to reject) a false Ho
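The critical-value and p-value approaches can be sketched on the standard normal (z) curve with Python's `statistics.NormalDist`. This is a minimal sketch assuming a z test statistic; the t tests in this reviewer read their critical values from the t table instead, and the observed z of 2.10 below is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve (mean 0, standard deviation 1)

def critical_z(alpha, tail="two"):
    """Critical (tabular) z separating the rejection and non-rejection regions."""
    if tail == "two":
        return Z.inv_cdf(1 - alpha / 2)  # reject if |z| >= this value
    if tail == "right":
        return Z.inv_cdf(1 - alpha)      # reject if z >= this value
    if tail == "left":
        return Z.inv_cdf(alpha)          # reject if z <= this (negative) value
    raise ValueError("tail must be 'two', 'right' or 'left'")

def p_value(z, tail="two"):
    """Probability of a result at least as extreme as z, assuming Ho is true."""
    if tail == "two":
        return 2 * (1 - Z.cdf(abs(z)))
    if tail == "right":
        return 1 - Z.cdf(z)
    if tail == "left":
        return Z.cdf(z)
    raise ValueError("tail must be 'two', 'right' or 'left'")

def decide(p, alpha):
    """Reject Ho when the p-value is smaller than the significance level."""
    return "reject Ho" if p < alpha else "do not reject Ho"

z_obs = 2.10  # hypothetical computed test statistic
for alpha in (0.05, 0.01):
    p = p_value(z_obs)
    print(f"alpha={alpha}: critical z={critical_z(alpha):.3f}, p={p:.4f}, {decide(p, alpha)}")
```

With the same observed z, the decision can change with α: the result is significant at the 0.05 level but not at the stricter 0.01 level.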
Test Statistic for Testing the Significance of Difference Between Means
Assumptions of the One Sample T-test
The dependent variable should be measured at the interval or ratio level (i.e., continuous).
The data are independent (i.e., not correlated/related), which means that there is no relationship between the observations.
There should be no significant outliers.
Your dependent variable should be approximately normally distributed.
Example 1
Ten randomly selected oil wells in a large field produced 21, 19, 20, 22, 24, 21, 19, 22, 22, and 20 barrels of crude oil per day. Is this enough evidence to conclude that the oil wells are not producing an average of 22.5 barrels of crude oil per day? Test at the 0.01 level of significance.
The 5-step solution of the t-test
A. Ho: μ = 22.5 (the oil wells are producing 22.5 barrels of oil a day)
Ha: μ ≠ 22.5 (the oil wells are not producing 22.5 barrels of oil a day)
B. Let α = 0.01. The df = n – 1 = 9, μ = 22.5, and t tabular = 3.250 (critical t).
C. Decision Rule: Use a two-tailed test. Reject the null hypothesis if tcomp < -3.250 or tcomp > 3.250; otherwise, accept it.
D. Computed: x̄ = 21, sx = 1.56
E. Test statistic: $t = \dfrac{\bar{x} - \mu}{s_x/\sqrt{n}} = \dfrac{21 - 22.5}{1.56/\sqrt{10}} \approx -3.04$
F. Decision: Do not reject Ho/Accept Ho because -3.04 > -3.250.
G. Conclusion: The oil wells are producing 22.5 barrels of oil a day. The difference could have been brought about only by chance or by sampling error.
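The computation in Example 1 can be checked with a short Python sketch using only the standard library. The tabular value 3.250 is copied from the solution rather than computed, since the standard library has no t distribution.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """One-sample t statistic t = (x_bar - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

barrels = [21, 19, 20, 22, 24, 21, 19, 22, 22, 20]
t_comp, df = one_sample_t(barrels, mu0=22.5)
t_tab = 3.250  # two-tailed critical t for alpha = 0.01, df = 9 (t table)

# Two-tailed decision rule: reject Ho if t_comp < -t_tab or t_comp > t_tab.
reject = abs(t_comp) > t_tab
print(f"t = {t_comp:.3f}, df = {df}, reject Ho: {reject}")
```

Because it uses the unrounded standard deviation (about 1.5635), the sketch gives t ≈ -3.034 rather than the -3.04 obtained with sx rounded to 1.56; the decision is the same.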
Assumptions of the T test for Independent
Samples
1. Variables involved: one independent, categorical variable that has two levels/groups, and one continuous dependent variable.
2. The groups under study are unrelated groups, also called unpaired groups or independent groups. These are groups in which the cases (e.g., participants) in each group are different.
3. The independent t-test requires that the dependent variable is approximately normally distributed within each group.
4. The independent t-test assumes that the variances of the two groups being measured are equal in the population (homogeneity of variance).
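Under these assumptions, in particular equal population variances (assumption 4), the two sample variances can be pooled. The following is a minimal Python sketch of that pooled-variance t statistic with df = n1 + n2 - 2; the group scores are hypothetical and are not taken from any example in this reviewer.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Independent-samples t statistic assuming equal population variances."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical test scores of two unrelated groups.
group_a = [22, 25, 19, 24, 27, 21, 23]
group_b = [20, 18, 23, 19, 21, 17, 22]
t_comp, df = pooled_t(group_a, group_b)
print(f"t = {t_comp:.3f}, df = {df}")
```

The computed t would then be compared with the tabular t for the chosen α, tail, and df, exactly as in the one-sample case.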
Test of Differences of Two Means
Example
A teacher wanted to find out if the Team Based Instruction (TBI) method of teaching Statistics is more effective than the Individually Guided Instruction (IGI) method. Two classes of approximately equal intelligence were selected. From one class, she considered 15 students with whom she used the TBI method of teaching, and from the other class, she considered 14 students with whom she used the IGI method. After several sessions, a 30-item test was given. The scores are shown in the table below.
The 5-step solution for the t-test
Ho: The Team Based Instruction method of teaching Statistics is as effective as the Individually Guided Instruction method. (Ho: μTBI = μIGI)
Ha: The Team Based Instruction method of teaching Statistics is more effective than the Individually Guided Instruction method. (Ha: μTBI > μIGI)
α = 0.05; one-tailed; df = 27; ttab = 1.703
Criterion: Reject Ho if tcomp ≥ ttab.
Decision: Do not reject Ho since tcomp(1.69) < ttab(1.703).
Conclusion: The Team Based Instruction method of teaching Statistics is as effective as the Individually Guided Instruction method.
Two-Sample Tests of Hypothesis
DEPENDENT SAMPLES
Now we consider situations when samples are not independent. Rather, the samples are dependent or related. How do we tell the difference between dependent and independent samples?
There are two types of dependent samples. One is characterized by a measurement, followed by an intervention of some type, and then another measurement. This could be called a "before and after" study. The other is characterized by matching or pairing observations.
Why do we prefer dependent samples over independent samples?
By using dependent samples, we are able to reduce the variation in the sampling distribution.
T test for DEPENDENT SAMPLES (Paired T test)
The test statistic used is the paired t-test.
Example
The following are the weights in pounds of 15 students before and after 6 months of attending aerobics.
The 5-step solution is as follows:
Ho: Aerobics is not effective in reducing weight. (Ho: μB = μA)
Ha: Aerobics is effective in reducing weight. (Ha: μB > μA)
Let α = 0.05; one-tailed; df = 14; ttab = 1.761.
Criterion: Reject Ho if tcomp ≥ ttab.
Decision: Reject Ho, since tcomp(4.21) > ttab(1.761).
Conclusion: Based on the sample evidence, aerobics is effective in reducing weight.
It can be observed that the two quantities (weights before and after) are strongly and positively related. The paired samples design, in this case, provides a more powerful hypothesis test than would an independent samples test carried out on the same data.
The computed t-value we got from SPSS is equal to the one we got using the formula. Also, since the one-tailed p = 0.001/2 = 0.0005 is less than 0.05, we reject Ho in favor of Ha.
Conclusion: Aerobics is effective in reducing weight.
Assumptions of the Paired T-test (T-test for Dependent or Correlated Means)
1. The dependent variable should be measured on a continuous scale.
2. The independent variable should consist of two categorical "related groups" or "matched pairs".
3. There should be no significant outliers in the differences between the two related groups.
4. The distribution of the differences in the dependent variable between the two related groups should be approximately normally distributed.
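The paired t-test works on the differences d = before - after, which is why its assumptions concern the differences. Here is a minimal Python sketch; the before/after weights are hypothetical (the 15 students' weights are not reproduced in this reviewer), and the tabular value 2.353 is the one-tailed t-table entry for α = 0.05 and df = 3.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic on the differences d = before - after; df = n - 1."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

# Hypothetical weights (pounds) of four students before and after a program.
before = [150, 160, 170, 180]
after = [148, 157, 169, 176]
t_comp, df = paired_t(before, after)
t_tab = 2.353  # one-tailed critical t, alpha = 0.05, df = 3 (t table)

# Ha: mu_B > mu_A, so reject Ho if t_comp >= t_tab.
reject = t_comp >= t_tab
print(f"t = {t_comp:.3f}, df = {df}, reject Ho: {reject}")
```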
ANALYSIS OF VARIANCE
Assumptions of ANOVA
1. The populations follow the normal
distribution.
2. The populations have equal standard
deviations.
3. The populations are independent.
4. The data must be at least interval scale.
When these conditions are satisfied, the F distribution is used as the distribution of the test statistic.
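The F statistic in a one-way ANOVA compares the variation between group means with the variation within groups, F = MS between / MS within. A minimal Python sketch, assuming three hypothetical independent groups, follows; the computed F would then be compared with the critical F for (k - 1, N - k) degrees of freedom from an F table.

```python
import statistics

def one_way_anova(*groups):
    """One-way ANOVA: returns F = MS_between / MS_within and its two dfs."""
    k = len(groups)
    values = [x for g in groups for x in g]
    n_total = len(values)
    grand_mean = statistics.mean(values)
    group_means = [statistics.mean(g) for g in groups]
    # Variation of the group means around the grand mean (between groups).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean (within groups).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores of three independent groups.
method_a = [78, 82, 75, 80, 85]
method_b = [70, 72, 68, 75, 71]
method_c = [88, 84, 90, 86, 87]
f, df1, df2 = one_way_anova(method_a, method_b, method_c)
print(f"F({df1}, {df2}) = {f:.2f}")
```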
CORRELATION
Correlation and regression are two related statistical tools. We use correlation to determine if a relationship exists between two variables. On the other hand, we use regression to predict the value of one variable from our knowledge of the other variable.
The Scatter Diagram
One can usually and roughly estimate if a relationship exists between two variables by constructing a scatter diagram. This is done by plotting the point corresponding to each observation on a rectangular coordinate system.
Correlational Tests:
1. Pearson Product-Moment Correlation – It measures the degree of relationship between two variables measured on at least an interval scale.
2. Spearman's Rank Correlation Coefficient – It is a measure of the correlation between two ordinal variables.
3. Phi Coefficient – The phi coefficient determines the degree of relationship between two variables which are both nominal dichotomous, like sex (male/female) and marital status (married/unmarried).
4. Point Biserial – It measures the correlation between interval data and nominal dichotomous data.
Interpretation of the Correlation Coefficient
Once the value of r is found significant, the rule of thumb for assessing the degree of relationship between the two quantitative variables can be interpreted using the following criteria:
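The coefficients listed under Correlational Tests can be computed directly. This minimal Python sketch computes Pearson's r and obtains Spearman's rho as Pearson's r applied to the ranks (tied values receive the average of their ranks); without ties this agrees with the familiar formula 1 - 6Σd²/(n(n² - 1)). When a dichotomous variable is coded 0/1, the same Pearson formula gives the point-biserial coefficient, and with both variables coded 0/1 it gives the phi coefficient. The data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```

Spearman's rho equals 1 here because the relationship is perfectly monotonic, while Pearson's r is slightly below 1 because it is not perfectly linear.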
Regression Analysis
Regression analysis is used to:
1. Predict the value of a dependent
variable based on the value of at
least one independent variable
2. Explain the impact of changes in
an independent variable on the
dependent variable
Dependent variable: the variable we
wish to explain
Independent variable: the variable
used to explain the dependent
variable
EXAMPLE: Describing the Problem
A random sample of 14 students is selected from an elementary school, and each student is measured on a creativity score (Create) using a new testing instrument and on a task score (Task) using a standard instrument. The Task score is the mean time taken to perform several hand-eye coordination tasks.
Because the creativity test is much cheaper, it is of interest to know whether you can substitute it for the more expensive Task score. That is, can you create a regression equation that will effectively predict a Task score (the dependent variable) from the Create score (the independent variable)?
Assumptions of Linear Regression
1. The two variables should be
measured at the continuous level.
2. There needs to be a linear
relationship between the two
variables.
3. There should be no significant
outliers.
4. There should be independence of
observations.
5. The data need to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line.
6. The residuals (errors) of the
regression line are approximately
normally distributed.
The figure shows a scatterplot of these
two variables along with the regression
line and the confidence intervals for Y
given X. In the plot, we use the standard
practice of plotting the independent
variable (Create) on the x-axis and the
dependent variable (Task) on the y-axis.
By observing the scatterplot, it can be
seen that there is a positive correlation
between the two variables (in this case, r
= .74), and it appears that knowing Create
should help in predicting Task. It is also
clear that knowing Create does not in any
way perfectly predict Task.
Regression analysis results are shown in the Table. The "Create" line gives the results of the two-sided test of the null hypothesis that the theoretical slope of the regression line for predicting Task from Create is zero; the estimated slope is 0.083. In this case, p = 0.002 indicates that you should reject the null hypothesis and conclude that there is a statistically significant linear relationship between the two variables and, therefore, that Create should be useful in predicting Task.
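The slope and intercept of a simple linear regression line come from least squares. The Python sketch below fits a line to hypothetical data (the 14 students' Create and Task scores are not listed in this reviewer) and then applies the reported coefficients, intercept 1.599 and slope 0.083, to a Create score of 52.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx       # slope
    b0 = my - b1 * mx    # intercept: the line passes through (mean x, mean y)
    return b0, b1

def predict(b0, b1, x):
    """Predicted value of the dependent variable for a given x."""
    return b0 + b1 * x

# Fitting on hypothetical data.
b0_fit, b1_fit = fit_line([40, 45, 50, 55, 60], [4.9, 5.2, 5.8, 6.1, 6.4])
print(f"fitted line: y = {b0_fit:.3f} + {b1_fit:.3f}x")

# Prediction with the coefficients reported for the Create/Task example.
print(f"Predicted Task for Create = 52: {predict(1.599, 0.083, 52):.3f}")
```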
The sample regression equation is created from the "Unstandardized Coefficients" in the coefficients table. Thus, the regression equation for predicting Task from Create is $\text{Predicted Task} = 1.599 + 0.083 \times \text{Create}$, or, in words, you predict the Task value by multiplying Create by 0.083 and adding 1.599.
Previously, it was shown that the scatterplot, along with the regression line, provides reasonable estimates for Task for each value of Create. For a new student who has a Create score of 52, you would predict a Task score using the following equation: $\text{Predicted Task} = 1.599 + 0.083(52) = 5.915$, which is visually consistent with the regression line in Figure 4.5 at X = 52.
Multiple Regression
Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the values of two or more other variables.
The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable). The variables we are using to predict the value of the dependent variable are called the independent variables (or sometimes, the predictor, explanatory or regressor variables).
NONPARAMETRIC TESTS
The inferential techniques normally require the use of parametric tests. A parametric test has certain assumptions that need to be satisfied. Among these are:
1. The samples are randomly chosen from normal populations with equal variances.
2. The sample sizes are relatively large (greater than 30).
3. The samples are measured at least on the interval scale.
However, in many studies, one or more of these assumptions may not be met. Hence, there is a need to consider other types of tests, tests which are not very restrictive. These tests are called nonparametric tests. They may be used in the following conditions:
1. The samples come from populations which are of doubtful normality.
2. They are more applicable for a small sample size (n < 30), unless the nature of the population distribution from which the sample came is known.
3. They may be used for treating samples made up of observations from different populations.
4. They can be used for data that are only nominal or ordinal.
Overview of Nonparametric Methods
There is at least one nonparametric equivalent for each general type of parametric test. General types of tests fall into the following categories:
1. Tests of differences between
groups (independent samples)
2. Tests of differences between
variables (dependent samples)
3. Tests of relationships between
variables