Discrete variable- A variable with a basic unit of measurement that cannot be
subdivided (e.g., number of people).
Hypothesis- A statement about the relationship between variables that is
derived from a theory. Hypotheses are more specific than theories, and all terms
and concepts are fully defined.
Data- in social science research, information that is represented by numbers
Descriptive Statistics- The branch of statistics concerned with 1) summarizing
the distribution of a single variable and 2) measuring the relationship between two or
more variables.
Measures of association- statistics that summarize the strength and direction of
the relationship between variables.
Theory- generalized explanation of the relationship between two or more
variables.
Population- total collection of all cases in which the researcher is interested
Inferential statistics- statistics concerned with making generalizations from
samples to populations.
Independent variable- variable that is identified as a causal variable. The
independent variable is thought to cause the dependent variable.
Level of measurement- mathematical characteristics of a variable as determined
by the measurement process. A major criterion for selecting statistical
techniques.
Statistics- set of mathematical techniques for organizing and analyzing data.
Data reduction - summarizing many scores with a few statistics
Dependent variable- variable that is identified as an effect, result, or outcome of
something.
Variable- any trait that can change values from case to case.
Data value- value of variable associated with one element of population
Data set-collection of measurements or observations
Univariate- summarize one variable (e.g., GPA)
Bivariate- describe relationship between two variables
Multivariate- describe relationship among three or more variables
Raw Score-single measurement/observation
Continuous variable- variable with a unit of measurement that can be
subdivided. (rounding of the scores)
Research- process of gathering information systematically to answer questions
or test theories.
Sample- subset of a population in inferential statistics.
Σ (Sigma)- the sum of scores
Basic rule of precedence- find all squares and square roots first, then multiply and divide,
then sum (Sigma), then add and subtract
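A small sketch of why the order matters, using made-up scores. It assumes the usual convention that anything inside parentheses is done first, which the notes do not state explicitly.

```python
scores = [1, 2, 3]  # hypothetical scores

# Squares come before Sigma: square each score, then sum.
sum_of_squares = sum(x ** 2 for x in scores)   # 1 + 4 + 9

# Parentheses override the usual order: sum first, then square the total.
square_of_sum = sum(scores) ** 2               # (1 + 2 + 3) squared

print(sum_of_squares, square_of_sum)
```

The two results differ, so the order of operations has to be read carefully from the formula.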
Validity- whether a measure actually measures the concept it is intended to measure
Reliability- consistency of a measuring instrument (repeated measurements give the same result)
Nominal Level of Measurement:
Categories are not numbers
Gender, area code, provinces
Cannot be ranked, added, divided
Categories must be exhaustive (a category must exist for every score)
Homogeneous (cases in a category are comparable)
Mutually exclusive (no ambiguity about which category a case belongs to)
Discrete
Ordinal Level of Measurement:
Ranked from high to low
Cases can be classified as more or less
Limitation- a score only gives position with respect to other scores
Discrete
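Interval Level
Scores are numbers
Ordered categories with intervals that are exactly the same size
Distance between scores is equal, but there is no true zero point
Only interval-ratio level variables can be continuous
Ratio Scale- Equal differences on scale reflect equal differences in magnitude
Distance is equal, and there is a true zero point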
Type          Description                                   Example
Nominal       Classification of objects;                    Ethnic groups
              can only be discrete
Ordinal       Variable can be ranked;                       Job satisfaction
              can only be discrete
Ratio         Variables can be ranked, distance is equal,   Productivity, income, age, salary
              true zero; can be discrete or continuous
Interval      Variables can be ranked, distance is equal,   Temperature
              no true zero; can be discrete or continuous
Dichotomous   Variable comprises only two categories        Gender
Chapter 2
Percentage: % = (f / N) x 100 -frequency (f) over # of cases in all categories (N)
Proportion: p = f / N
-with a small # of cases (fewer than 20), report the actual frequencies
-always report the number of cases (N) along with proportions and percentages
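A minimal sketch of the percentage and proportion calculations. The function names and the counts are made up for illustration.

```python
def proportion(f, n):
    """Proportion: frequency f divided by the total number of cases n."""
    return f / n

def percentage(f, n):
    """Percentage: the proportion multiplied by 100."""
    return proportion(f, n) * 100

# Hypothetical data: 23 cases in one category out of 42 cases in total.
print(round(proportion(23, 42), 4))
print(round(percentage(23, 42), 2))
```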
Ratio compares parts to parts
Ratio = f1 / f2
23 females / 19 males
= 1.21 females for every male
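The ratio example from these notes can be checked with a short sketch. The function name is an assumption for illustration.

```python
def ratio(f1, f2):
    """Ratio: frequency of one category divided by the frequency of another."""
    return f1 / f2

# Example from the notes: 23 females and 19 males.
females_per_male = ratio(23, 19)
print(round(females_per_male, 2))
```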
Rates
# of actual occurrences divided by # of possible occurrences
usually multiplied by a power of 10 to eliminate decimal points
crude death rate (CDR) is multiplied by 1000
CDR = (# of deaths / total population) x 1000
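A sketch of a rate calculation, with the crude death rate as the example. The death count and population are invented, and the `per` parameter is an illustrative name for the power-of-10 multiplier.

```python
def rate(actual, possible, per=1000):
    """Rate: actual occurrences divided by possible occurrences, times a multiplier."""
    return actual / possible * per

# Hypothetical town: 50 deaths in a population of 10,000.
cdr = rate(50, 10_000, per=1000)
print(cdr)  # deaths per 1,000 population
```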
Percentage Change
Measures increase or decrease in a score between 2 different times
o ((f2 - f1) / f1) X 100
-2nd score (f2) minus 1st score (f1), divided by 1st score, X 100
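A short sketch of the percentage change calculation. The scores are hypothetical. A negative result indicates a decrease.

```python
def percent_change(f1, f2):
    """Percentage change from the first score f1 to the second score f2."""
    return (f2 - f1) / f1 * 100

print(percent_change(200, 250))   # increase
print(percent_change(250, 200))   # decrease
```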
Frequency distribution
organized table of the # of individuals in each category on the scale of
measurement
first step in any statistical analysis
can be a graph or table
set of categories that make up the original measurement scale
record of the number of individuals in each category
shows how many times something has occurred
categories (class intervals) must not overlap: 0-99/100-199/200-299
# of categories should generally be between 6 and 20
lower class limit- smallest # that can belong to a class interval (0 in 0-99)
upper class limit- largest # that can belong to a class interval (99 in 0-99)
Class midpoint- middle of a class interval
-add lower class limit to upper class limit, divide by 2
Stated class limits:
class intervals that organize variables into discrete, non-overlapping
intervals.
used when the variable is treated as discrete
Real class limits:
divide the distance between adjacent class intervals in half, add it to the upper
stated limit, and subtract it from the lower stated limit. ex: stated limits: 18-19, real limits: 17.5-19.5
used when the variable is treated as continuous
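Class midpoints and real limits can be sketched as below. The `gap` parameter is an assumption: it stands for the distance between the upper stated limit of one interval and the lower stated limit of the next (1 for whole-number scores).

```python
def midpoint(lower, upper):
    """Class midpoint: lower limit plus upper limit, divided by 2."""
    return (lower + upper) / 2

def real_limits(lower, upper, gap=1):
    """Real class limits: extend each stated limit by half the gap between intervals."""
    half = gap / 2
    return (lower - half, upper + half)

print(midpoint(0, 99))
print(real_limits(18, 19))   # the 18-19 example from the notes
```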
Cumulative frequency and Percentage:
give a glance at how many cases fall below a given score in the distribution.
useful when the researcher wants to make a point about how cases are spread across the range
of scores.
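A rough sketch of building a frequency distribution with cumulative frequencies and cumulative percentages. It assumes whole-number scores and equal-width class intervals. The scores and the function name are invented for illustration.

```python
from collections import Counter

def frequency_table(scores, width, start=0):
    """Return rows of (lower limit, upper limit, f, cumulative f, cumulative %)."""
    # Assign each score to a class interval by integer division.
    counts = Counter((s - start) // width for s in scores)
    n = len(scores)
    rows, cumulative = [], 0
    for k in range(min(counts), max(counts) + 1):
        f = counts.get(k, 0)
        cumulative += f
        lower = start + k * width
        rows.append((lower, lower + width - 1, f, cumulative, 100 * cumulative / n))
    return rows

scores = [12, 150, 99, 100, 210, 45, 199, 250, 130, 75]  # hypothetical scores
for row in frequency_table(scores, width=100):
    print(row)
```

The last column shows the percentage of cases at or below the upper limit of each interval.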
Histograms
each bar represents a range of values
use real limits rather than stated limits
bars touch each other to show the variable is continuous
display the distribution of data
meant for continuous variables, but commonly used for discrete interval-ratio level variables
frequency always on the vertical axis
tells you if the data are skewed right/left or bell-shaped
Chapter 3
Measure of Central Tendency:
Idea of the typical case in the distribution: mean, median, or mode
Mode
score that occurs most frequently
quick, easy indicator of central tendency
nominal-level variables
seldom reported alone
Mean
the "average"
add all values, divide by the number (N) of values
most commonly used measure
interval-ratio level, but also used with ordinal-level variables
(can be misleading in a highly skewed distribution)
symbol: X bar
Weighted/aggregate mean
o used when values occur more than once
o X bar = Σ(Xi x f) / Σf
o multiply values by frequencies, sum the products, divide by the total frequency
Value   Frequency   Value x frequency
97      4           97 x 4 = 388
94      11          94 x 11 = 1034
92      12          92 x 12 = 1104
91      21          91 x 21 = 1911
90      30          90 x 30 = 2700
89      12          89 x 12 = 1068
78      9           78 x 9 = 702
60      1           60 x 1 = 60
total   100         8967
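The table above can be turned into a weighted mean with a few lines. The values and frequencies are the ones from the notes. The function name is illustrative.

```python
# (value, frequency) pairs from the table in the notes.
pairs = [(97, 4), (94, 11), (92, 12), (91, 21), (90, 30), (89, 12), (78, 9), (60, 1)]

def weighted_mean(pairs):
    """Sum of value x frequency, divided by the total frequency."""
    total = sum(value * f for value, f in pairs)
    n = sum(f for _, f in pairs)
    return total / n

print(weighted_mean(pairs))  # 8967 / 100
```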
Characteristics of the mean:
A) All deviations of scores around the mean cancel out (sum to zero).
B) Uses all scores- a strength, but also a weakness: it is affected by every score,
including extreme ones (skew).
C) Least squares principle- the sum of squared deviations around the mean is smaller
than around any other point, so the mean is closer to all scores than the other
measures of central tendency.
The mean is pulled in the direction of the extremes.
Symmetric- mean and median have the same value.
Positive skew- skewed to the right; mean is higher in value than median.
Negative skew- skewed to the left; mean is lower in value than median.
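A quick sketch of using the mean-versus-median comparison as a rough indicator of skew. The income figures are invented. Equal values are only consistent with symmetry and do not prove it.

```python
from statistics import mean, median

def skew_direction(scores):
    """Compare mean and median: mean above median suggests positive skew, below suggests negative."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "none indicated"

incomes = [20, 25, 30, 35, 400]   # hypothetical incomes with one extreme high score
print(mean(incomes), median(incomes), skew_direction(incomes))
```

The single extreme score pulls the mean well above the median, while the median stays at the middle case.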
Median
represents the center of the distribution of
scores
when N is odd, value of median is
unambiguous (always a middle case)
when N is even, the median is the score halfway
between the two middle scores
ordinal or interval-ratio
measures position or location
good choice when there are extreme values
(outliers)
e.g., household income has extreme values
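The odd and even cases for the median can be sketched as follows, using made-up scores.

```python
def median(scores):
    """Middle score when N is odd; halfway between the two middle scores when N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 2]))       # N is odd
print(median([4, 1, 3, 2]))    # N is even
```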
Percentile
the median is the 50th percentile
identifies the point below which a specific percentage of cases fall
find the 37th percentile of 78 cases:
o 78 x .37 = 28.86 is the position of the case
Decile
divides distribution into tenths
Quartile
divides distribution into quarters
o Q1 (25%), Q2 (50%), Q3 (75%), Q4 (100%)
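A sketch of the case-position calculation used in the percentile example. It follows the simple "N times the proportion" rule from these notes. Other textbooks and software use slightly different conventions, so treat this as one assumption among several.

```python
def percentile_position(n_cases, percentile):
    """Position of the case marking a given percentile: N times the percentile proportion."""
    return n_cases * percentile / 100

print(percentile_position(78, 37))          # example from the notes
for q in (25, 50, 75):                      # quartile positions for the same 78 cases
    print(q, percentile_position(78, q))
```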
Measure of Dispersion
How much variety there is in the distribution
ex. how 'often' do graduates receive $40,000 per year?
Chapter 4
Measure of Dispersion
variety in a distribution
the taller the curve=less dispersion, the flatter the curve=more dispersion
amount of diversity, heterogeneity
R -range is the distance from the highest value to the lowest
o quick, easy indication of variability
o ordinal, interval-ratio
o limited= based on only 2 scores
o no information about variation between high and low scores
Q -interquartile range, only considers the middle 50% of cases in distribution (Q3 - Q1)
boxplot- diagram from L (lowest score) to H (highest score)
o L--------Q1--------Q2--------Q3--------H
o based on the five-number summary
o Q2 (median) is marked with a vertical line
o useful when two or more data sets are being compared
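A minimal sketch of the five-number summary, range, and interquartile range. It relies on `statistics.quantiles` from the Python standard library with its default method. Quartile conventions vary between textbooks, so the exact Q1 and Q3 values may differ slightly from hand calculations. The scores are invented.

```python
from statistics import quantiles

def five_number_summary(scores):
    """Return (L, Q1, Q2, Q3, H) for a list of scores."""
    q1, q2, q3 = quantiles(scores, n=4)   # three cut points dividing the data into quarters
    return min(scores), q1, q2, q3, max(scores)

scores = [1, 2, 3, 4, 5, 6, 7]            # hypothetical scores
low, q1, q2, q3, high = five_number_summary(scores)
score_range = high - low                  # R: highest minus lowest
iqr = q3 - q1                             # Q: spread of the middle 50% of cases
print(five_number_summary(scores), score_range, iqr)
```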
Good measure of dispersion:
o use all scores in distribution
o describe average deviation
o increase in value as the distribution becomes more diverse
Standard Deviation (S)
uses all scores in distribution
increases in value as the distribution of scores becomes more diverse
based on the distance between scores and the mean (deviations)
if scores are clustered around the mean, deviations would be small, and vice
versa.
value of S can increase with the inclusion of one or more outliers
units of standard deviation are the same as the units of the original data values
roughly the average distance of each score from the mean
interval-ratio, but often used with ordinal-level
the higher the SD=more dispersion, lower SD=less dispersion
0 value- no dispersion
N-1 is used when working with random samples rather than entire
populations.
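A sketch of the standard deviation calculation showing the N versus N-1 choice. The scores and the `sample` flag are illustrative assumptions.

```python
import math

def standard_deviation(scores, sample=False):
    """Square root of the summed squared deviations divided by N (or N-1 for a sample)."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    divisor = n - 1 if sample else n
    return math.sqrt(sum_of_squares / divisor)

scores = [2, 4, 4, 4, 5, 5, 7, 9]                 # hypothetical scores, mean = 5
print(standard_deviation(scores))                 # treating scores as a population
print(standard_deviation(scores, sample=True))    # treating scores as a random sample
print(standard_deviation([5, 5, 5, 5]))           # identical scores: no dispersion
```

Dividing by N-1 gives a slightly larger value than dividing by N, and the difference shrinks as N grows.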
Index of Qualitative Variation (IQV)
only measure of dispersion for nominal-level variables (but can be used with any
variable)
varies from 0.00 (no variation) to 1.00 (maximum variation)
ratio of the amount of variation observed in the distribution to the maximum
variation that could exist in that distribution.
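The notes do not give a formula for the IQV. The sketch below assumes the common textbook form IQV = k(N^2 - sum of f^2) / (N^2 (k - 1)), where k is the number of categories, N the number of cases, and f the category frequencies. The frequencies are invented.

```python
def iqv(frequencies):
    """Index of Qualitative Variation from a list of category frequencies (assumed formula)."""
    k = len(frequencies)
    n = sum(frequencies)
    sum_f_squared = sum(f ** 2 for f in frequencies)
    return k * (n ** 2 - sum_f_squared) / (n ** 2 * (k - 1))

print(iqv([10, 10, 10]))   # cases spread evenly across categories
print(iqv([30, 0, 0]))     # all cases in one category
print(iqv([20, 5, 5]))     # somewhere in between
```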
Variance
measure of variation equal to the square of the standard deviation (s^2)
used in inferential statistics
Coefficient of Variation
presents the standard deviation as a percentage of the mean value
allows you to compare the variability of different variables.
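A short sketch of the coefficient of variation, used here to compare two hypothetical variables measured in different units. The numbers and names are made up.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

# Hypothetical summaries: heights in cm and incomes in dollars.
cv_height = coefficient_of_variation(sd=10, mean=170)
cv_income = coefficient_of_variation(sd=20_000, mean=50_000)
print(round(cv_height, 1), round(cv_income, 1))
```

Although the raw standard deviations are in different units, the percentages can be compared directly.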
The range rule of thumb- principle that for many data sets, most (about 95%) of sample values lie within
two standard deviations of the mean.
Chapter 5
Normal distribution
great importance
in combination with the mean and SD, the normal distribution curve can be used to construct precise
descriptive statements about empirical distributions
theoretical model, frequency polygon or line chart that is 'unimodal' (single
mode/peak), perfectly smooth, and symmetrical= mean, median and mode are
the same value.
crucial point is distance along the abscissa (horizontal axis)
Empirical rule
68% of all values fall within 1 standard deviation of mean
95% of all values fall within 2 standard deviations of mean
99.7% of all values fall within 3 standard deviations of the mean
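The empirical rule percentages can be checked against the theoretical normal curve. This sketch uses `statistics.NormalDist` from the Python standard library. The helper name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1))
```

The results come out close to 68%, 95%, and 99.7%, which is where the rounded figures in the rule come from.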
Z scores:
used to find the percentage of area above, below or between scores in an empirical distribution
always have the same values for mean (0) and standard deviation (1)
convert the original units of measurement into Z scores to "standardize" the normal
curve to a distribution that has a mean of 0 and a standard deviation of 1.
how many standard deviation units a case is above or below the mean
a ruler from x to the mean
when a value is less than the mean, the Z score is negative
Ordinary values: Z score between -2 and 2
Unusual values: Z score less than -2 or greater than 2
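A minimal sketch of computing Z scores and flagging unusual values. The mean of 100 and standard deviation of 15 are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviation units a score lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def is_unusual(z):
    """Unusual values have a Z score below -2 or above 2."""
    return abs(z) > 2

for x in (85, 130, 145):   # hypothetical scores
    z = z_score(x, mean=100, sd=15)
    print(x, z, is_unusual(z))
```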
Normal curve table
Appendix A, detailed description of the area between Z score and the mean
Probability
method for measuring and quantifying the likelihood of obtaining a specific
sample from a specific population.
defined as a fraction or a proportion.
ratio comparing frequency of occurrence / total number of possible events
in frequency distributions probability can be defined by proportions of
distribution
in graphs, can be defined as a proportion of area under the curve
Unit normal table:
lists the proportions of area corresponding to each z-score location.
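The areas that a normal curve table lists can be approximated in code. This sketch uses `statistics.NormalDist` from the Python standard library in place of a printed table. The helper names are assumptions for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def area_mean_to_z(z):
    """Proportion of area between the mean and a given Z score."""
    return abs(standard_normal.cdf(z) - 0.5)

def area_beyond_z(z):
    """Proportion of area in the tail beyond a given Z score."""
    return 1 - standard_normal.cdf(abs(z))

z = 1.0
print(round(area_mean_to_z(z), 4), round(area_beyond_z(z), 4))
```

Because the curve is symmetrical, a negative Z score gives the same areas as the matching positive one, and the two areas for any Z score add up to 0.5.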