Chapter 9 - Montello and Sutton
An Introduction to Scientific Research Methods in Geography
Statistical data analysis
Statistical description
Statistical inference
Geospatial Analysis
Set of display and mathematical techniques
Logical and conceptual considerations
Allows us to:
Extract meaning from systematically collected measurements
Communicate that meaning to others
Geographers view data as statistical (complex and imperfect) rather than deterministic
Three reasons:
Imperfect sample of larger population
Measurement involves error
Phenomena are expressions of complex sets of many interacting variables
Goal: summarize potentially important properties of our data using summary indices
Statistics - summary indices to describe a sample
Parameters - summary indices to describe the population
Properties:
Central tendency
Variability / dispersion
Form / shape of distribution
Relationships
Central tendency - average or representative value
Three most common:
Mode - most frequent
Median - middle value
Mean (“average”)
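A minimal Python sketch of the three measures, using the standard-library statistics module. The rainfall values are hypothetical and chosen only for illustration:

```python
import statistics

# Hypothetical sample: daily rainfall totals (mm) at one station.
rainfall_mm = [0, 0, 2, 3, 3, 3, 5, 8, 21]

mode_value = statistics.mode(rainfall_mm)      # most frequent value
median_value = statistics.median(rainfall_mm)  # middle value of the sorted data
mean_value = statistics.mean(rainfall_mm)      # arithmetic average

print("mode:", mode_value)
print("median:", median_value)
print("mean:", mean_value)
```

Here the single large value (21) pulls the mean above the median and mode, which is one reason to report more than one measure.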
Variability - tells how much data points differ from the central tendency
Indicates how representative the central tendency is
Representativeness is greater when variability is low
Three common:
Range - distance between high and low
Variance - average of squared deviations from the mean
Standard deviation - square root of the variance
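The three measures can be sketched in a few lines of Python. The scores are hypothetical, and the sketch assumes the population form of the variance (divide by n); the sample form divides by n - 1 instead. It also shows why the deviations are squared: the raw deviations from the mean always sum to zero.

```python
import math

# Hypothetical data set of eight scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

# Raw deviations cancel out, so they cannot measure spread on their own.
sum_of_deviations = sum(deviations)

data_range = max(scores) - min(scores)                      # high minus low
variance = sum(d ** 2 for d in deviations) / len(scores)    # mean squared deviation
std_dev = math.sqrt(variance)                               # back in original units

print(sum_of_deviations, data_range, variance, std_dev)
```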
Form - shape of entire data set
Modality - number of local modes
Skewness - distribution is asymmetric, with one tail longer than the other
Positive - mostly low and medium scores, with a tail of high scores
Negative - mostly medium and high scores, with a tail of low scores
Symmetry - mirror around central tendency
Bimodal
Unimodal normal or “bell-shaped” curve
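One way to put a number on skewness is the moment coefficient (mean cubed deviation divided by the cubed standard deviation). This index is not named in the chapter outline; it is used here only as a common illustration, and the data sets are hypothetical:

```python
import math

def skewness(values):
    """Moment coefficient of skewness (population form)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    third_moment = sum((x - mean) ** 3 for x in values) / n
    return third_moment / math.sqrt(variance) ** 3

positive_skew = [1, 1, 2, 2, 3, 10]    # mostly low scores, long tail of high scores
negative_skew = [1, 8, 9, 9, 10, 10]   # mostly high scores, long tail of low scores
symmetric = [1, 2, 3, 4, 5]            # mirror image around the mean

for name, data in [("positive", positive_skew),
                   ("negative", negative_skew),
                   ("symmetric", symmetric)]:
    print(name, round(skewness(data), 3))
```

The sign of the result matches the direction of the longer tail, and a symmetric data set gives zero.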
Derived scores
Describe the value of individual scores relative to the rest of the data set
Three common:
Rank - 1, 2, 3, etc.
Percentile rank - percentage of the data that is less than the score in question
z-score - standard deviation units above or below the mean of the data set
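The three derived scores can be sketched as small Python functions. The function names and data are hypothetical; rank 1 is assumed to mean the highest score, and the z-score assumes the population standard deviation:

```python
import statistics

def rank(score, data):
    """Rank with 1 = highest; tied scores share the same rank."""
    return 1 + sum(1 for x in data if x > score)

def percentile_rank(score, data):
    """Percentage of the data that is less than the score."""
    return 100 * sum(1 for x in data if x < score) / len(data)

def z_score(score, data):
    """Standard deviation units above (+) or below (-) the mean."""
    return (score - statistics.mean(data)) / statistics.pstdev(data)

# Hypothetical data set with mean 5 and standard deviation 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

for s in (2, 7, 9):
    print(s, rank(s, scores), percentile_rank(s, scores), z_score(s, scores))
```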
Relationships - systematic (consistent) patterns of high or low values across pairs of variables
Linear relationship - two variables form a straight line when graphed
Positive (or direct) - high value A has high value B; low value A has low value B
Negative (or indirect) - high value A has low value B; low value A has high value B
Relationship strength - degree that patterns hold across all cases
Correlation coefficient - measure of relationship strength and direction; the square of the correlation gives the proportion of variance the two variables share
Regression analysis - expresses relationship as an equation that predicts the values of Y (criterion variable) as a function of X (predictor variable)
Monotonic relationship - consistently increases or consistently decreases, but not necessarily in a straight line
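As a sketch of how relationship strength and the regression equation are computed, the following Python code calculates the Pearson correlation and a least-squares line by hand. The data and variable names are hypothetical, and Pearson's r is assumed as the correlation measure:

```python
import math

# Hypothetical paired observations: X is the predictor, Y is the criterion.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of squares and cross-products of deviations from the means.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

r = s_xy / math.sqrt(s_xx * s_yy)       # correlation coefficient
r_squared = r ** 2                      # proportion of shared variance

slope = s_xy / s_xx                     # change in predicted Y per unit of X
intercept = mean_y - slope * mean_x     # predicted Y when X is 0

def predict(x_value):
    """Predicted Y from the fitted line Y = intercept + slope * X."""
    return intercept + slope * x_value

print(r, r_squared, slope, intercept, predict(6))
```

A positive slope and positive r both indicate a positive (direct) relationship in this example.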
Goal: Make informed guesses about likely patterns in the population, based on evidence from sample data
Assign probabilities to guesses
Sampling distribution - distribution of a sample statistic based on all possible samples of a given size, from a given population
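For a very small population, the sampling distribution of the mean can be listed in full. This Python sketch assumes a hypothetical population of four values and samples of size 2 drawn without replacement:

```python
import itertools
import statistics

population = [2, 4, 6, 8]   # hypothetical population
sample_size = 2

# Every possible sample of the given size, and the mean of each one.
all_samples = list(itertools.combinations(population, sample_size))
sample_means = [statistics.mean(sample) for sample in all_samples]

print(all_samples)
print(sample_means)
print(statistics.mean(sample_means), statistics.mean(population))
```

The sample means vary from sample to sample, but their average equals the population mean. That variation is what inference has to account for.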
Assumptions:
Distribution is normal and variances are equal
Data values are independent
Model is correctly specified (for example, the relationship is linear and the relevant predictor variables are included)
Two approaches:
Estimation
Point estimate - guess about specific parameter value
Confidence interval - range of values around the point estimate, stated with a confidence level (such as 95%)
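A minimal sketch of both kinds of estimate for a population mean. It assumes a normal sampling distribution and uses a z critical value; for a small sample like this one, a t critical value would normally be preferred. The function name and data are hypothetical:

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Return (point estimate, lower bound, upper bound) for the mean."""
    n = len(sample)
    point_estimate = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical value from the standard normal distribution (assumption).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate, point_estimate - margin, point_estimate + margin

# Hypothetical sample: commute distances in km.
commute_km = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4, 6.2, 5.0]

estimate, low_95, high_95 = mean_confidence_interval(commute_km, 0.95)
_, low_99, high_99 = mean_confidence_interval(commute_km, 0.99)

print(estimate, (low_95, high_95), (low_99, high_99))
```

Asking for more confidence (99% rather than 95%) widens the interval around the same point estimate.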
Hypothesis Testing
Null hypothesis (H0) is about the exact point of the parameter
Alternative hypothesis (HA) is that the exact point of the parameter is not the null
Four possible outcomes, based on:
Two possible truths (H0 is true; H0 is false) and
Two possible decisions (reject H0 and accept HA; fail to reject H0)
Two types of errors:
Type I - reject H0 when H0 is true
Type II - fail to reject H0 when H0 is false
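The four outcomes can be laid out as a small decision table. This Python sketch uses a hypothetical helper function that crosses the two possible truths with the two possible decisions:

```python
def classify_outcome(null_is_true, reject_null):
    """Label the outcome of a hypothesis test given the truth and the decision."""
    if null_is_true and reject_null:
        return "Type I error"
    if not null_is_true and not reject_null:
        return "Type II error"
    return "correct decision"

# Cross every possible truth with every possible decision.
for null_is_true in (True, False):
    for reject_null in (True, False):
        truth = "H0 true" if null_is_true else "H0 false"
        decision = "reject H0" if reject_null else "fail to reject H0"
        print(truth, "|", decision, "|", classify_outcome(null_is_true, reject_null))
```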
Geographic data are different:
They are spatially distributed
Have location, extent or size, shape, pattern, connectivity, etc.
They represent natural and human earth-surface features and processes
Spatiality is the focus of, or is central to, the analysis
Spatiality also influences the accuracy of inferential statistical analyses of nonspatial variables
Spatial autocorrelation exists when there are patterns of spatial dependence – places are “like” other places
Distance decay – near things are “more like” each other than things further away
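Spatial autocorrelation is often summarized with Moran's I. That index is not named in this outline; it is used here as one common choice. The sketch assumes cells along a one-dimensional transect, with binary weights that treat only adjacent cells as neighbors, and hypothetical values:

```python
def morans_i(values):
    """Moran's I for cells on a transect, where only adjacent cells are neighbors."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]

    cross_products = 0.0   # sum of w_ij * dev_i * dev_j over neighbor pairs
    weight_total = 0       # sum of all weights w_ij
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                cross_products += deviations[i] * deviations[j]
                weight_total += 1

    sum_squares = sum(d * d for d in deviations)
    return (n / weight_total) * (cross_products / sum_squares)

clustered = [1, 1, 1, 1, 5, 5, 5, 5]     # neighbors are alike
alternating = [1, 5, 1, 5, 1, 5, 1, 5]   # neighbors are unlike

print(morans_i(clustered), morans_i(alternating))
```

Positive values indicate that nearby places tend to be alike; negative values indicate that nearby places tend to differ.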
Which areal units to use?
Problems:
Using data from a continuous source but treating them with discrete spatial analysis techniques
Politicization of unit determination (like gerrymandering)
Modifiable Areal Unit Problem (MAUP) – effect that theoretically arbitrary areal geometries have on geographic analysis
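The MAUP can be illustrated with a toy example in which the same eight locations are grouped into areal units in different ways. The variables, values, and zoning schemes below are all hypothetical; the point is only that the measured correlation changes with the units chosen:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def zone_means(values, zones):
    """Average the values that fall inside each zone (a list of location indices)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]

# Hypothetical measurements at eight locations along a transect.
income = [1, 2, 3, 4, 5, 6, 7, 8]
health = [2, 1, 4, 3, 6, 5, 8, 7]

fine_zones = [[i] for i in range(8)]                # one unit per location
coarse_zones = [[0, 1], [2, 3], [4, 5], [6, 7]]     # larger units (scale effect)
shifted_zones = [[0], [1, 2], [3, 4], [5, 6], [7]]  # moved boundaries (zoning effect)

r_fine = pearson_r(zone_means(income, fine_zones), zone_means(health, fine_zones))
r_coarse = pearson_r(zone_means(income, coarse_zones), zone_means(health, coarse_zones))
r_shifted = pearson_r(zone_means(income, shifted_zones), zone_means(health, shifted_zones))

print(r_fine, r_coarse, r_shifted)
```

The underlying data never change, yet each choice of units yields a different correlation.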
Why is data analysis in geography usually conceptualized in statistical (probabilistic) terms?
What is meant by strength and form of statistical relationships?
What is the purpose of statistical inference? Why are statistical inferences necessarily and ultimately uncertain?
What are two types of correct decisions and two types of errors possible when hypothesis testing?
What is spatial autocorrelation, what forms can it take, and why is it so important to geographic data analysis?