Statistical Data Analysis

Chapter 9 - Montello and Sutton

An Introduction to Scientific Research Methods in Geography

Overview

 Statistical data analysis

 Statistical description

 Statistical inference

 Geospatial Analysis

Data Analysis

 Set of display and mathematical techniques

 Logical and conceptual considerations

 Allows us to:

 Extract meaning from systematically collected measurements

 Communicate that meaning to others

Geographers and Data

 Geographers view data as statistical (complex and imperfect) rather than deterministic

 Three reasons:

 Imperfect sample of larger population

 Measurement involves error

 Phenomena are expressions of complex sets of many interacting variables

Statistical Description

 Goal: summarize potentially important properties of our data using summary indices

 Parameters - summary indices to describe the population

 Properties:

 Central tendency

 Variability / dispersion

 Form / shape of distribution

 Relationships

Central Tendency

 Average or representative value

 Three most common:

 Mode - most frequent

 Median - middle value

 Mean (“average”)
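The three measures above can be sketched in a few lines of Python using the standard `statistics` module. The `distances` values are invented purely for illustration and do not come from the chapter.

```python
import statistics

# Hypothetical sample: commute distances in km (made-up values)
distances = [2, 3, 3, 5, 8, 13, 3, 5, 21]

mode = statistics.mode(distances)      # most frequent value
median = statistics.median(distances)  # middle value of the sorted data
mean = statistics.mean(distances)      # arithmetic average

print("mode:", mode, "median:", median, "mean:", mean)
```

In this made-up sample the mean is pulled above the median by the single large value, which is one reason to report more than one measure.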

Variability / Dispersion

 Tells how data points differ from the central tendency

 Indicates how representative the central tendency is

 Representativeness is greater when variability is low

 Three common:

 Range - distance between high and low

 Variance - average of squared deviations from the mean

 Standard deviation - square root of the variance
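A minimal sketch of the three dispersion measures follows. The `values` list is hypothetical, and the variance here divides by n (the population form); many texts divide by n - 1 when working with a sample, so treat the choice of denominator as an assumption.

```python
import math

# Hypothetical scores (made-up values)
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Range: distance between the highest and lowest value
data_range = max(values) - min(values)

# Variance: average of the squared deviations from the mean
# (dividing by n; a sample estimate would usually divide by n - 1)
variance = sum((x - mean) ** 2 for x in values) / n

# Standard deviation: square root of the variance, in the data's own units
std_dev = math.sqrt(variance)

print(data_range, variance, std_dev)
```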

Form / Distribution I

 Shape of entire data set

 Modality - number of local modes

 Skewness - distribution is asymmetric (uneven)

 Positive - mostly low and medium scores, with a tail of high scores

 Negative - mostly medium and high scores, with a tail of low scores

 Symmetry - mirror around central tendency

 Bimodal

 Unimodal normal or “bell-shaped” curve
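One way to put a number on skewness is the moment-based coefficient sketched below. The slides do not name a formula, so this particular coefficient (third central moment divided by the 1.5 power of the second) is an assumption, and the three data sets are invented.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (assumed formula, population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets
mostly_low = [1, 1, 2, 2, 3, 10]    # long tail of high scores
mostly_high = [10, 10, 9, 9, 8, 1]  # mirror image: long tail of low scores
symmetric = [1, 2, 3, 4, 5]

skew_pos = skewness(mostly_low)
skew_neg = skewness(mostly_high)
skew_zero = skewness(symmetric)

print(skew_pos, skew_neg, skew_zero)
```

The sign of the result matches the positive and negative labels above, and a symmetric distribution gives zero.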

Form / Distribution II

 Derived scores

 Describe the value of individual scores relative to the rest of the data set

 Three common:

 Rank - 1, 2, 3, etc.

 Percentile rank - percentage of the data that is less than the score in question

 z-score - standard deviation units above or below the mean of the data set
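The three derived scores can be sketched as small functions. The function names and the `scores` data are hypothetical, rank 1 is assumed to mean the highest value, and the z-score uses the population standard deviation.

```python
import math

def rank(data, score):
    """Rank of a score, assuming 1 = highest value in the data set."""
    return 1 + sum(x > score for x in data)

def percentile_rank(data, score):
    """Percentage of the data that is less than the score."""
    return 100 * sum(x < score for x in data) / len(data)

def z_score(data, score):
    """Standard deviation units above (+) or below (-) the mean."""
    mean = sum(data) / len(data)
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
    return (score - mean) / sd

# Hypothetical test scores
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

r = rank(scores, 90)
p = percentile_rank(scores, 90)
z = z_score(scores, 90)

print(r, p, round(z, 2))
```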

Relationships I

 Systematic (consistent) patterns of high or low values across pairs of variables

 Linear relationship - two variables form a straight line when graphed

 Positive (or direct) - high values of A go with high values of B; low values of A go with low values of B

 Negative (or indirect) - high values of A go with low values of B; low values of A go with high values of B

Relationships II

 Relationship strength - degree that patterns hold across all cases

 Correlation coefficient - measure of relationship strength; the square of the correlation gives the proportion of variance the two variables share

 Regression analysis - expresses relationship as an equation that predicts the values of Y (criterion variable) as a function of X (predictor variable)

 Monotonic relationship - goes up or down; not necessarily in a straight line
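The ideas on this slide can be sketched as follows: a Pearson correlation for linear strength, a least-squares line for regression, and a rank-based correlation for monotonic relationships. The helper names and all data are hypothetical, and the rank correlation shown (Pearson applied to ranks, assuming no tied values) is one common choice rather than something the slides specify.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept predicting Y (criterion) from X (predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def ranks(values):
    """Ranks from 1 (lowest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)

# A monotonic but non-linear relationship: y rises with x along a curve
y_curve = [v ** 3 for v in x]
r_linear = pearson_r(x, y_curve)               # below 1: not a straight line
r_rank = pearson_r(ranks(x), ranks(y_curve))   # 1: perfectly monotonic

print(round(r, 3), round(r ** 2, 3), slope, intercept, round(r_linear, 3), r_rank)
```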

Statistical Inference I

 Goal: Make informed guesses about likely patterns in the population, based on evidence from sample data

 Assign probabilities to guesses

 Sampling distribution - distribution of a sample statistic based on all possible samples of a given size, from a given population
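For a very small population, the sampling distribution of the mean can be built literally by listing every possible sample. The sketch below assumes sampling without replacement, and the five-value `population` is invented for illustration.

```python
from itertools import combinations

# Hypothetical population of five values
population = [2, 4, 6, 8, 10]
sample_size = 2

# Every possible sample of the given size (without replacement)
all_samples = list(combinations(population, sample_size))

# The sample statistic (here the mean) computed for each possible sample
sample_means = [sum(s) / sample_size for s in all_samples]

population_mean = sum(population) / len(population)
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(len(all_samples), sorted(sample_means))
print(population_mean, mean_of_sample_means)
```

The sample means vary from sample to sample, but their average equals the population mean, which is what makes the sample mean a reasonable guess about the parameter.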

Statistical Inference II

 Assumptions:

 Distribution is normal and variances are equal

 Data values are independent

 Model is correctly specified (such as linearity and inclusion of relevant predictor constructs)

Statistical Inference III

 Two approaches:

 Estimation

 Point estimate - guess about specific parameter value

 Confidence interval - range of values distributed around the point estimate, expressed as probability

 Hypothesis Testing

 Null hypothesis (H0) is about the exact point of the parameter

 Alternative hypothesis (HA) is that the exact point of the parameter is not the null
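Both approaches can be sketched on the same made-up sample. This example assumes a normal approximation (a z-based interval and test); for a sample this small a t distribution would usually be preferred. The `sample` values and the null value `mu0` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample measurements and a hypothetical null value
sample = [52, 55, 49, 58, 53, 51, 56, 54, 50, 57, 53, 55]
mu0 = 50  # H0: the population mean equals 50

n = len(sample)
point_estimate = mean(sample)                 # estimation: best single guess
std_error = stdev(sample) / math.sqrt(n)      # spread of the sampling distribution

# 95% confidence interval around the point estimate (normal approximation)
z_crit = NormalDist().inv_cdf(0.975)
ci = (point_estimate - z_crit * std_error, point_estimate + z_crit * std_error)

# Hypothesis test: how far is the estimate from the null value, in standard errors?
z = (point_estimate - mu0) / std_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
reject_h0 = p_value < 0.05

print(round(point_estimate, 2), tuple(round(v, 2) for v in ci), reject_h0)
```

With the same normal approximation and the same 5% level, rejecting H0 corresponds to the null value falling outside the 95% interval.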

Statistical Inference IV

 Four possible outcomes, based on:

 Two possible truths (H0 is true; H0 is false) and

 Two possible decisions (reject H0 and accept HA; fail to reject H0)

 Two types of errors:

 Type I - reject H0 when H0 is true

 Type II - fail to reject H0 when H0 is false
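The two error types can be illustrated by simulation. The sketch below assumes normally distributed data with a known standard deviation and a two-sided z-test at the 5% level; the function names, sample size, effect size, and seed are all arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist

def rejects_h0(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, with sigma assumed known."""
    sample_mean = sum(sample) / len(sample)
    z = (sample_mean - mu0) / (sigma / math.sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, trials=2000, seed=42):
    """Share of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        rejections += rejects_h0(sample, mu0, sigma)
    return rejections / trials

# H0 is true (true mean equals mu0): every rejection is a Type I error
type_i_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean is 0.5): every failure to reject is a Type II error
type_ii_rate = 1 - rejection_rate(true_mean=0.5)

print(type_i_rate, type_ii_rate)
```

The Type I rate should land near the chosen 5% level, while the Type II rate depends on how far the truth is from the null value and on the sample size.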

Geospatial Analysis

 Geographic data are different:

 They are spatially distributed

 Have location, extent or size, shape, pattern, connectivity, etc.

 They represent natural and human earth-surface features and processes

 Spatiality is the focus or is central to the analysis

Spatiality

 Influences the accuracy of inferential statistical analyses of nonspatial variables

 Spatial autocorrelation exists when there are patterns of spatial dependence – places are “like” other places

 Distance decay – near things are “more like” each other than things farther away
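Spatial autocorrelation is often summarized with an index such as Moran's I, which the slides do not name; it is used here as one common choice. The sketch assumes a simple one-dimensional transect in which each place neighbors the places immediately next to it, with equal weights. The helper names and both data sets are hypothetical.

```python
def transect_neighbors(n):
    """Neighbor lists for n places in a row: each place borders the adjacent ones."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def morans_i(values, neighbors):
    """Global Moran's I with binary (0/1) neighbor weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross_products = 0.0
    weight_total = 0
    for i, js in neighbors.items():
        for j in js:
            cross_products += dev[i] * dev[j]  # similar neighbors give positive products
            weight_total += 1
    return (n / weight_total) * (cross_products / sum(d * d for d in dev))

# Hypothetical transects of six places each
clustered = [1, 1, 1, 9, 9, 9]    # near places are alike
alternating = [1, 9, 1, 9, 1, 9]  # near places are unlike

nbrs = transect_neighbors(6)
i_clustered = morans_i(clustered, nbrs)
i_alternating = morans_i(alternating, nbrs)

print(i_clustered, i_alternating)
```

Positive values indicate that neighboring places tend to be alike, and negative values indicate that neighbors tend to differ.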

Areal Units

 Which areal units to use?

 Problems:

 Using data from a continuous source but treating them with discrete spatial analysis techniques

 Politicization of unit determination (like gerrymandering)

 Modifiable Areal Unit Problem (MAUP) – effect that theoretically arbitrary areal geometries have on geographic analysis
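The MAUP can be illustrated by aggregating the same cell-level data into two different zonings and comparing the resulting correlations. Everything in this sketch is hypothetical: the eight cells, the two zoning schemes, and the helper names.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def zone_means(values, zones):
    """Average the cell values falling inside each zone (zones are lists of cell indices)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]

# Hypothetical values of two variables measured in eight cells along a line
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

# Two equally arbitrary ways of drawing zone boundaries over the same cells
zoning_a = [[0, 1], [2, 3], [4, 5], [6, 7]]
zoning_b = [[0], [1, 2], [3, 4], [5, 6], [7]]

r_cells = pearson_r(x, y)
r_a = pearson_r(zone_means(x, zoning_a), zone_means(y, zoning_a))
r_b = pearson_r(zone_means(x, zoning_b), zone_means(y, zoning_b))

print(round(r_cells, 3), round(r_a, 3), round(r_b, 3))
```

The underlying cell data never change, yet the measured relationship strength differs depending on where the zone boundaries are drawn.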

Questions

 Why is data analysis in geography usually conceptualized in statistical (probabilistic) terms?

 What is meant by strength and form of statistical relationships?

 What is the purpose of statistical inference? Why are statistical inferences necessarily and ultimately uncertain?

 What are two types of correct decisions and two types of errors possible when hypothesis testing?

 What is spatial autocorrelation, what forms can it take, and why is it so important to geographic data analysis?
