Uploaded by Kim Delos Santos

Introduction to Statistics Lesson

LESSON 1: Introduction to Statistics
STATISTICAL METHODS
STATISTICS- refers to a set of mathematical
procedures for organizing, summarizing,
and interpreting information. (STATISTICS,
SCIENCE, AND OBSERVATIONS)
Descriptive statistics- are statistical
procedures used to summarize, organize,
and simplify data.
Non-probability EX. Mean, median, mode;
range, standard deviation, variance,
interquartile range.
POPULATION- is the set of all the
individuals of interest in a particular study.
SAMPLE- is a set of individuals selected
from a population, usually intended to
represent the population in a research
study.
Inferential statistics- consist of techniques
that allow us to study samples and then
make generalizations about the populations
from which they were selected.
Probability EX. T-test, Analysis of Variance
(ANOVA), correlation, regression.
VARIABLE- is a characteristic or condition
that changes or has different values for
different individuals (height, weight,
gender, or personality; temperature, time
of day, or the size of the room).
DATA (plural) - are measurements or
observations. A data set is a collection of
measurements or observations. A datum
(singular) is a single measurement or
observation and is commonly called a score
or raw score.
PARAMETER- is a value, usually a numerical
value that describes a population. A
parameter is usually derived from
measurements of the individuals in the
population.
STATISTIC- is a value, usually a numerical
value that describes a sample. A statistic is
usually derived from measurements of the
individuals in the sample.
SAMPLING ERROR- is the naturally
occurring discrepancy, or error, that exists
between a sample statistic and the
corresponding population parameter.
RESEARCH METHODS, AND STATISTICS
CORRELATIONAL METHOD- two different
variables are observed to determine
whether there is a relationship between
them. Non-experimental study- The
"independent variable" that is used to
create the different groups of scores is
often called the QUASI-INDEPENDENT
VARIABLE.
EXPERIMENTAL METHOD- one variable is
manipulated while another variable is
observed and measured.

To establish a cause-and-effect
relationship between the two
variables, an experiment attempts
to control all other variables to
prevent them from influencing the
results (Ex. T-test and ANOVA).
The independent variable (IV) - is the
variable that is manipulated by the
researcher.
The dependent variable (DV) - is the
variable that is observed to assess the
effect of the treatment.
Individuals in a control condition do not
receive the experimental treatment.
Instead, they either receive no treatment or
they receive a neutral, placebo treatment.
The purpose of a control condition is to
provide a baseline for comparison with the
experimental condition.
Individuals in the experimental condition do
receive the experimental treatment.
VARIABLES AND MEASUREMENT
CONSTRUCTS- are internal attributes or
characteristics that cannot be directly
observed but are useful for describing and
explaining behavior (EX. Intelligence,
Anxiety, Hunger)
OPERATIONAL DEFINITION- identifies a
measurement procedure (a set of
operations) for measuring an external
behavior and uses the resulting
measurements as a definition and a
measurement of a hypothetical construct.
Note that an operational definition has two
components:
First, it describes a set of operations for
measuring a construct.
Second, it defines the construct in terms of
the resulting measurements (EX.
Behavior...intelligent behavior).
SCALES/LEVEL OF MEASUREMENT
Nominal Scale- consists of a set of
categories that have different names.
Measurements on a nominal scale label and
categorize observations, but do not make
any quantitative distinctions between
observations (College Program: Psychology,
Biology, etc.)
Ordinal Scale- consists of a set of categories
that are organized in an ordered sequence.
Measurements on an ordinal scale rank
observations in terms of size or magnitude
(Ex. 1st, 2nd, 3rd, and so on)
Interval Scale- consists of ordered
categories that are all intervals of exactly
the same size. Equal differences between
numbers on a scale reflect equal differences
in magnitude. However, the zero point on
an interval scale is arbitrary and does not
indicate a zero amount of the variable being
measured. (EX. Fahrenheit and Celsius)
Ratio Scale- is an interval scale with the
additional feature of an absolute zero point.
With a ratio scale, ratios of numbers do
reflect ratios of magnitude (EX. physical
measures such as height and weight;
number of errors on a test)
DISCRETE VARIABLE- consists of separate,
indivisible categories. No values can exist
between two neighboring categories (EX.
the number of children in a family or the
number of students attending class).
CONTINUOUS VARIABLE- there are an
infinite number of possible values that fall
between any two observed values. A
continuous variable is divisible into an
infinite number of fractional parts (EX.
Weights).
Dependent (DV) and Independent (IV)
Variables
INDEPENDENT VARIABLES- are variables
that the researcher controls and
manipulates in accordance with the
purpose of the investigation.
DEPENDENT VARIABLES- are variables that
are measured to assess the effect of the
independent variables.
Example:
The researcher would like to determine the
predictive validity of the entrance
requirements for freshman students. The
(___) are the national achievement test,
entrance examination, and school grade.
The (____) is the performance in first year
college.
Univariable, bivariable, and multivariable
distribution
Univariable Distribution- there is only one
variable involved.
EX: Age of Grade 7 pupils.
Bivariable Distribution- in which data are
classified on the basis of two variables.
EX: An ice cream shop keeps track of how
much ice cream they sell versus the
temperature of the day.
Multivariable Distribution- in which data are
classified on the basis of three or more
variables.
EX: The teacher would like to keep track of
the enrollment in the College in terms of
program, year level and gender.
Steps in determining the sample size
Determine the population from which the
data the researcher needs can be gathered.
Determine the kind of sample to be drawn
or selected from the identified population
(Criteria: age, gender, working
experience, etc.).
Determine the desired sample size.
Slovin's sampling formula
n = N / (1 + N e^2), where n is the sample
size, N is the population size, and e is the
acceptable margin of error.
EX: A researcher may want to determine
the reading deficiencies of the students in
his school. However, he probably cannot
test all the students because the population
is large (5,000 students).
Let us estimate the sample size using a 5%
acceptable margin of error.
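The estimate for this example can be worked out with a short sketch of Slovin's formula, n = N / (1 + N e^2). The function name and the choice to round up to the next whole student are assumptions made for illustration; some references round to the nearest whole number instead.

```python
import math

def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula: n = N / (1 + N * e**2), rounded up (an assumption)."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# The example above: N = 5,000 students, e = 5% margin of error.
sample_size = slovin_sample_size(5000, 0.05)
print(sample_size)  # 5000 / 13.5 = 370.37..., rounded up to 371
```

Under these assumptions the researcher would test about 371 of the 5,000 students.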
Sampling method
Probability Sampling- this refers to a
sampling process where each unit in the
population has a known, nonzero probability
of being included in the sample (every unit
in the population has a chance of being
selected).
1. Simple Random Sampling- the
simplest method available...each
member in the population will have
an equal chance of being selected
(This makes it impossible to predict
who will be chosen).
EX: Fishbowl technique, lottery or
raffle type method, roulette wheel
method. When to use: if the
population is not widely spread
geographically.
2. Stratified Random Sampling- the
method where the samples are
randomly selected from the
different groups or sections (strata)
of the population used in the study
(EX: age, gender, economic status
and others). When to use: This is
preferred if precise estimates
are desired for stratified parts of the
population and if the sampling
problems differ in various strata or
groups of the population.
3. Systematic Random Sampling- Every
member of the population is
listed with a number, but instead of
randomly generating numbers,
individuals are chosen at regular
intervals.
When to use: this is advisable to use
if the ordering of the population is
essentially random and when
stratification with numerous data is
used.
4. Cluster Sampling- involves dividing
the population into subgroups, but
each subgroup should have similar
characteristics to the whole sample.
Instead of sampling individuals from
each subgroup, you randomly select
entire subgroups
(EX: neighborhoods, school districts
or regions). When to use: if the
population can be grouped into
clusters or where the individual
population samples are known to be
different with respect to the
characteristics under study.
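The four probability sampling methods above can be sketched with Python's standard `random` module. The population of 100 students, the field names (`year`, `section`), and the helper function names are all hypothetical and only serve to show how the selection step differs between methods.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 students, each tagged with a year level
# (used as the stratum) and a section (used as the cluster).
population = [
    {"id": i, "year": 1 + i % 4, "section": f"S{i // 10}"}
    for i in range(100)
]

def simple_random(units, n):
    """Every unit has an equal chance of being selected."""
    return rng.sample(units, n)

def stratified_random(units, key, n_per_stratum):
    """Draw a simple random sample separately within each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def systematic(units, n):
    """Pick a random start, then take every k-th unit from the list."""
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]

def cluster(units, key, n_clusters):
    """Randomly pick whole clusters and keep every unit inside them."""
    names = sorted({unit[key] for unit in units})
    picked = set(rng.sample(names, n_clusters))
    return [unit for unit in units if unit[key] in picked]

print(len(simple_random(population, 20)))
print(len(stratified_random(population, "year", 5)))
print([u["id"] for u in systematic(population, 20)][:4])
print(sorted({u["section"] for u in cluster(population, "section", 2)}))
```

Each call returns 20 students here, but the route differs: individuals drawn at random, individuals drawn within each year level, every fifth student on the list, or two whole sections.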
LESSON 2: Descriptive Statistics
Descriptive statistics is the term given to
the statistical treatments of data that help
describe, show or summarize data in a
meaningful way.
This form of statistics does not allow us to
make conclusions to prove/disprove any
hypotheses that we established in our
study.
They are simply a way to give a general
overview of our data.
Measures of Central Tendency:
1. Mean
2. Median
3. Mode
Measures of Variability:
1. Range
2. Standard Deviation
3. Variance
Central Tendency
- Measure used to compare the
quantity of two or more sets of
data.
- Making comparisons between
groups of individuals or between
sets of figures.
- "average" or "typical"
- Mean (Average), Median, Mode
Mean
- Is sensitive to the exact value of all
the scores (Affected by extremely
high or low values, called outliers)
- Under most circumstances, of the
measures used for central tendency,
the mean is least subject to
sampling variation
- "Balance Point" (between the
highest score and the lowest score)
Types of Mean
Regular Average (Mean)
- Defined as the sum of the scores
divided by the number of scores.
Weighted Average (Overall Mean)
- Defined as the mean of a certain
distribution which considers the
weight (Value) of each indicator (to
combine two sets of scores and then
find the overall mean for the
combined group).
Median
- Defined as the scale value below
which 50% of the scores fall.
- The centermost score if the number
of the scores is odd. If the number is
even, the median is taken as the
average of the two centermost
scores.
- Used to determine whether a data
value falls into the upper half or
lower half of the distribution.
Mode
- Most frequent score in the
distribution
- A distribution may have only one
mode (unimodal) or more than one
(bimodal/multimodal).
- Not used very much in studying
behavioral sciences because it is
very unstable.
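A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The scores are hypothetical, and `overall_mean` is an illustrative helper (not a library function) for the weighted average of two groups.

```python
import statistics

scores = [70, 75, 75, 80, 85, 90, 150]  # hypothetical scores; 150 is an outlier

mean = statistics.mean(scores)        # sum of the scores / number of scores
median = statistics.median(scores)    # centermost score (odd number of scores)
modes = statistics.multimode(scores)  # most frequent score(s), as a list

# With an even number of scores the median is the average
# of the two centermost scores.
even_median = statistics.median([70, 75, 80, 85])

def overall_mean(groups):
    """Weighted (overall) mean from (number of scores, group mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# Combining a group of 10 scores (mean 80) with a group of 30 (mean 90).
combined = overall_mean([(10, 80.0), (30, 90.0)])

print(round(mean, 2), median, modes, even_median, combined)
```

The outlier pulls the mean (about 89.29) well above the median (80), which illustrates why the mean is described as sensitive to extreme values. The overall mean (87.5) sits closer to the larger group's mean.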
CENTRAL TENDENCY AND THE SHAPE OF
THE DISTRIBUTION
SYMMETRICAL DISTRIBUTIONS

The right-hand side of the graph is a
mirror image of the left-hand side. If
a distribution is perfectly
symmetrical, the median is exactly
at the center because exactly half of
the area in the graph is on either
side of the center.
The mean also is exactly at the center of a
perfectly symmetrical distribution because
each score on the left side of the
distribution is balanced by a corresponding
score (the mirror image) on the right side.
The mean (the balance point) is located at
the center of the distribution (in a perfectly
symmetrical distribution, the mean and the
median are the same).

SKEWED DISTRIBUTION - in skewed
distributions, especially distributions for
continuous variables, there is a strong
tendency for the mean, median, and mode
to be located in predictably different
positions.
SKEWNESS- is a measurement of the
distortion of symmetrical distribution or
asymmetry in a data set. Skewness is
demonstrated on a bell curve when data
points are not distributed symmetrically to
the left and right sides of the median.
POSITIVE SKEWNESS- when its tail is more
pronounced on the right side than it is on
the left (This means that the most extreme
values are on the right side)
NEGATIVE SKEWNESS - when the tail is
more pronounced on the left rather than
the right side (The most extreme values are
found further to the left)
NORMAL DISTRIBUTION
- The mean, median, and mode are
equal and located at the center of
the distribution.
- A normal distribution curve is
UNIMODAL.
- The curve is symmetric about a
vertical line passing through the
center.
Variability
- Specifies the extent to which scores
are different from each other.
- Signifies score dispersion.
- "Difference and Degree"
- Statistics that represent variability
are the following: Range, Standard
Deviation and Variance.
Range
- Defined as the difference between
the highest and the lowest scores in
the distribution.
- Range = Highest Score - Lowest
Score
- Crude measure of dispersion
Standard Deviation
- Gives us a measure of dispersion
relative to the mean
- Sensitive to each score in the
distribution
- Stable with regard to sampling
fluctuation.
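The range and standard deviation can be computed as in the sketch below, which also squares the SD to obtain the variance. The scores are hypothetical, and the population formulas (`pstdev`, `pvariance`) are an assumption; for a sample, `statistics.stdev` and `statistics.variance` divide by n - 1 instead.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

score_range = max(scores) - min(scores)  # highest score - lowest score
sd = statistics.pstdev(scores)           # population standard deviation
variance = statistics.pvariance(scores)  # population variance

print(score_range, sd, variance)
print(abs(sd ** 2 - variance) < 1e-9)    # variance is the square of the SD
```

For these scores the range is 7, the SD is 2 and the variance is 4. The range uses only the two extreme scores, which is why it is called a crude measure, while the SD uses every score.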
Variance
- Square of SD
- Not used widely in descriptive
statistics but most of the time in
inferential statistics
- Used in ANOVA statistical
treatment
Standard Score (Z-Score)
- A Z score is a transformed score
that designates how many standard
deviation units the corresponding
raw score is above or below the
mean
- Above or below the mean
Remember:
If the Z score is positive, the score is above
the mean. If the Z score is 0, the score is the
same as the mean. And if the Z score is
negative, the score is below the mean.
Percentage, Percentile and Percentile Rank
Percentage - is a relative value indicating
hundredth parts of any quantity.
One percent (symbolized 1%) is a
hundredth part; thus, 100 percent
represents the entirety and 200 percent
specifies twice the given quantity
(Encyclopedia Britannica, 2021).
Percentile - is a value on the measurement
scale which a specified percentage of the
scores in the distribution fall below.
Percentile = XL + (i / fi) (cumfP - cumfL)
Percentile Rank - the percentage of scores
with values lower than the score in
question. Opposite of the percentile point.
Percentile Rank = [cumfL + (fi / i) (X - XL)] / N
x 100
where XL is the lower real limit of the
interval containing the score, i is the
interval width, fi is the frequency of that
interval, cumfL is the cumulative frequency
below XL, cumfP is the number of scores
below the percentile point, and N is the
total number of scores.
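A small sketch of the Z score and the percentile rank for ungrouped scores. The data are hypothetical, the SD is the population SD (an assumption), and `percentile_rank` applies the verbal definition (percentage of scores lower than the score in question) rather than the grouped-frequency formula, so results for grouped data may differ slightly.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores
mean = statistics.mean(scores)     # 5
sd = statistics.pstdev(scores)     # 2.0 (population SD, an assumption)

def z_score(x):
    """Number of SD units the raw score x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def percentile_rank(x):
    """Percentage of scores with values lower than x (ungrouped data)."""
    below = sum(1 for s in scores if s < x)
    return below / len(scores) * 100

print(z_score(9), z_score(5), z_score(2))
print(percentile_rank(7))
```

A raw score of 9 gives Z = 2.0 (above the mean), 5 gives 0.0 (at the mean), and 2 gives -1.5 (below the mean). Six of the eight scores fall below 7, so its percentile rank is 75.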
PSYCH STATS
Chapter 3: Writing Descriptive Stats results
& Research Questions, Inferential Stats
Overview (with Normality testing)
WRITING RESULTS OF
DESCRIPTIVE STATISTICS
Commonly Reported Descriptive Statistics:
Mean, Standard Deviation and Verbal
Interpretation
- These are descriptive stats that
represent and are sensitive to all
the values in the distribution.
Mean
- Is all about the average or the usual
score of your respondents on your
studied variable.
Standard deviation
- Refers to the usual dispersion of the
scores in reference to the mean.
Verbal Interpretation
- Is the category used to tag your
yielded average values. This is done
through the use of cut-off scores
or a transmutation table.
Sample Problem:
A certain team of experimental
psychologists wanted to identify which type
of competition (group/individual) catalyzed
better performance in accomplishing logic
quizzes.
They decided that the population
that would be asked to participate was 3rd
year philosophy majors studying at a
College located in Pangasinan. They
requested the administrator of the College
to lend them 20 students from each of the
four sections of the said program.
RESEARCH QUESTIONS, INFERENTIAL STATS
AND NORMALITY TESTING
Research Question
- This is a list of questions that your
study is required to answer through
the data that you will gather from
your sample.
- This is the compass that will guide
your data gathering as well as the
statistical treatment that you will
implement on your data.
How to write your own Research
questions?
1. Identify the general concept that
you want to work on. Then look for
a theory that explains its
relationship to other variables or
concepts. Then confirm this
relationship by reviewing literature
(look for blind spots).
2. Once you identify your topic, the
next thing that you will do is to
create a specific title for your
prospective research. This will be the
guide for your Statement of the
Problem (SOP).
Example:
- Rumination and its relationship to
Depression
3. Write your statement of the
Problem together with your
research questions.
- Descriptive Stats Question/s
- Inferential Stats Question/s
- Implication Question/s
Statement of the Problem:
To evaluate the relationship of
Rumination to Depression.
Research Questions:
1.) What is the level of the participants'
Rumination?
2.) What is the level of the participants'
Depression?
3.) Is Rumination related to the depression
experienced by the participants?
4.) What implications can be drawn from
the results of the study?
Hypothesis
- Is the educated guess that you will
be giving to the question that
requires inferential statistics to be
answered.
- These are the points that you will try
to prove or not.
KINDS OF HYPOTHESIS
Scientific Hypothesis
- Is a suggested explanation or solution
to a phenomenon
Statistical Hypothesis
- It is a guess or prediction made by a
researcher regarding the possible
outcome of the study.
- It is a claim or statement about an
unknown parameter.
There are two types of Hypotheses:
Null Hypothesis (H0:)
- There's no effect in the population.
Alternative Hypothesis (H1:)
- There's an effect in the population.
Non Directional: Hypothesis that doesn't
specify the direction of the effect of the
independent variable on the dependent
variable.
Two-tailed Test: The region of rejection
lies on both tails of the normal curve. It is
used when the alternative hypothesis uses
words such as not equal to, significantly
different, etc.
Directional: Hypothesis that specifies the
type of effect the independent variable has
on the dependent variable.
One-tailed test: The region of rejection
lies on either the left or the right tail of the
normal curve.
Right directional test: The region of
rejection is on the right tail (greater than,
higher than, better than, superior to,
exceeds, etc.)
Left directional test: The region of rejection
is on the left tail (less than, smaller than,
inferior to, lower than, below, etc.)
INFERENTIAL STATISTICS
- With inferential statistics, you are
trying to reach conclusions that
extend beyond the immediate data
alone.
- Infer (Conclude, Deduce & Assume)
- Measures whether the scores yielded
can be considered significant or just
brought about by chance.
- Strengthens the point we will be
making in the conclusions drawn
from our data.
Independent T-Test
- Identifying the effect of the IV on the
DV by comparing two groups.
Dependent T-Test
- Identifying the effect of the IV on the
DV by comparing 2 situations of a
certain group.
One-way ANOVA
- Identifying the effect of the IV on the
DV by comparing 3 or more
groups.
Two Way ANOVA
- Identifying the effect of the IV on the
DV through multiple comparisons.
Pearson Correlation - to know the
relationship between 2 interval /
ratio variables.
Spearman's Rho - to know the relationship
between 2 ordinal variables (uses the
differences between ranks).
Kendall's Tau - to know the relationship
between 2 ordinal variables
(uses concordant and discordant pairs).
Linear Regression - to know the ability of a
certain IV in predicting a DV
Multiple Regression - to know the ability of
a certain model in predicting a DV
Chi-Square - to know relationships between
2 or more categorical / nominal variables.
THE NORMAL CURVE
- A theoretical distribution of
population scores. It is a bell-shaped
curve that is described by a specific
equation.
DESCRIPTION OF A NORMAL CURVE
(FROST, 2020)
- Also known as the Gaussian
distribution and the bell curve; it is
named after the German
mathematician Carl Friedrich Gauss
- The normal distribution is a
probability function that describes
how the values of a variable are
distributed.
IMPORTANT CHARACTERISTICS OF A
NORMAL CURVE
1. Mean, median and mode are equal.
2. Both sides from the median are
symmetrical.
3. The tails are asymptotic, which
means that they approach but
never quite meet the horizontal
axis.
4. Kolmogorov-Smirnov and
Shapiro-Wilk tests can be used to test
your data against the normal curve. A
p-value of more than .05 indicates
that the data do not depart
significantly from normality.
WHY DOES IT HAVE TO BE A NORMAL CURVE?
- Variables measured in behavioral
science closely resemble the normal
curve.
- It is a requirement of many
inferential tests that the data
follow a normal distribution.
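The "specific equation" of the normal curve and its listed characteristics can be checked numerically. The sketch below writes out the normal density by hand and compares it with `statistics.NormalDist`; the mean of 100 and SD of 15 are hypothetical values, and `normal_pdf` is an illustrative helper name.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

curve = NormalDist(mu=100, sigma=15)  # hypothetical scale

# Symmetry: equal heights at equal distances on either side of the mean.
left, right = normal_pdf(85, 100, 15), normal_pdf(115, 100, 15)

# Mean = median: exactly half of the area lies below the mean.
area_below_mean = curve.cdf(100)

# Asymptotic tails: far from the mean the height shrinks toward zero
# but never reaches it.
far_tail = normal_pdf(175, 100, 15)

print(left == right, area_below_mean, far_tail > 0)
print(abs(normal_pdf(100, 100, 15) - curve.pdf(100)) < 1e-12)
```

The hand-written equation agrees with the library's density. The Kolmogorov-Smirnov and Shapiro-Wilk tests are not part of the standard library, so they are left out of this sketch.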
PSYCH STATS
Chapter 4: Correlational Analyses
Correlational Study
- A form of empirical study wherein
the researcher examines the
relationships between variables by
identifying the direction and the
significance of the tested
relationship.
DESCRIPTION OF CORRELATION
(PEARSON R)
PEARSON R
- An inferential statistic used to
identify if there is a significant linear
relationship between two
interval/ratio variables.
- It gives information about the
magnitude of the association, or
correlation, as well as the
direction of the relationship
(Statistics Solution, 2020).
Requirements:
A.) Variables to be correlated should have a
linear relationship.
B.) Both variables should be in interval/ratio
form.
C.) Normal Distribution
D.) Random Sampling
Scatter Plot: The Graph for Correlations
- The graph used in presenting
correlation results
- A graph of paired X and Y values.
- The closer the points lie to the
trend line, the stronger the correlation.
Ascending - correlation is positive
Descending - correlation is negative
CORRELATION COEFFICIENT
- The magnitude and the direction of
the relationship between two variables are
indicated by a correlation coefficient (r).
- Positive (+) or negative (-) and ranges
from -1.00 (perfect negative correlation) to
1.00 (perfect positive correlation).
- A correlation coefficient of 0.00
indicates no relationship between variables.
FORMULATING RESEARCH QUESTION
- Is there a significant relationship
between Variable A and Variable B?
- Is Variable A significantly related to
Variable B?
ALTERNATIVE HYPOTHESIS (H1)
- There is a positive/negative
relationship between Variable A and
Variable B.
- Variable A is positively/negatively
correlated/related to Variable B.
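The correlation coefficient r can be computed from paired scores as the sum of products of deviations divided by the square root of the product of the two sums of squares. The sketch below is a plain-Python illustration with hypothetical rumination and depression scores; it yields only the coefficient, not the significance test.

```python
import math

def pearson_r(x, y):
    """Pearson r = SP / sqrt(SSx * SSy) for paired interval/ratio scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least 2 scores")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # sum of products
    ss_x = sum((a - mean_x) ** 2 for a in x)                     # sum of squares, X
    ss_y = sum((b - mean_y) ** 2 for b in y)                     # sum of squares, Y
    return sp / math.sqrt(ss_x * ss_y)

rumination = [10, 12, 15, 18, 20]  # hypothetical scores
depression = [8, 11, 13, 17, 21]   # hypothetical scores

r = pearson_r(rumination, depression)
print(round(r, 3))  # close to +1: strong positive (ascending) relationship

print(round(pearson_r([1, 2, 3], [6, 4, 2]), 3))  # perfectly descending points
```

The sign gives the direction of the relationship and the absolute value gives its magnitude, matching the description of the coefficient above.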
RANK CORRELATION ANALYSES
- These are statistical analyses that
analyze the correlation of two variables
represented by data that are not
normally distributed, come from a
minimal sample size, and are in
ordinal form.
- Examples: Kendall's Tau and Spearman's
Rho.
- Both produce coefficients (+1 to -1)
similar to Pearson r. Due to that,
hypothesis making and results
reporting are relatively similar to
the manner of Pearson r.
Basic difference:
1.) Coefficient Derivation (Formula)
- The coefficient of Spearman's rho is
based on the deviation (absolute
difference of the ranks of the
participants on the two variables) of
ranks in the participants' data.
- Kendall's tau is based on the
number of concordant and
discordant pairs (concordance of
the second variable to the first
variable ranking).
2.) Yielded Coefficient
- Most of the time, the yielded
coefficient for Spearman's rho is
larger compared to Kendall's tau due
to their difference in formula.
3.) Accuracy in smaller sample size
- Kendall's Tau yields a more accurate
coefficient in studies which have a
small number of participants (12 or
less).
4.) Popularity
- Spearman's rho is quite popular
compared to Kendall's Tau since it
was introduced to the scientific
community earlier by a renowned
English psychologist/theorist named
Charles Spearman.
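The difference in derivation can be seen in a small sketch that computes both coefficients on the same hypothetical ranks. Two assumptions are made: there are no tied scores, and the common textbook formulas are used, namely rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), which works with squared rank differences, and tau-a = (concordant - discordant) / number of pairs.

```python
def ranks(values):
    """Rank scores from 1 (smallest) to n; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference in ranks."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def kendall_tau(x, y):
    """tau-a = (concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            direction = (x[i] - x[j]) * (y[i] - y[j])
            if direction > 0:
                concordant += 1  # both variables order the pair the same way
            elif direction < 0:
                discordant += 1  # the variables order the pair in opposite ways
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]  # hypothetical ranks on the first variable
y = [2, 1, 4, 3, 5]  # hypothetical ranks on the second variable

print(round(spearman_rho(x, y), 2), round(kendall_tau(x, y), 2))
```

For these ranks rho is 0.8 and tau is 0.6 (8 concordant and 2 discordant pairs out of 10), an instance of Spearman's rho coming out larger than Kendall's tau on the same data.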