# PowerPoint slides

## Large-Scale Datasets

Dr. Joni M. Lakin

Dr. Margaret Ross

Dr. Yi Han

### Presentation Files Are Available:

http://www.auburn.edu/~jml0035/

(Under “Conference materials and resources” at the bottom of the page)

### Opening questions

How many of you primarily use SPSS for data analysis?

How many are comfortable with using syntax (in SPSS or other programs)?

How many already have plans to use a specific dataset?

How many of you are just curious about what’s available?

Dr. Yi Han

NCES

PISA, PIAAC

### Started

Dr. Margaret Ross

See PDFs

Dr. Joni Lakin

### Key issues

1. Statistical weighting in SPSS
2. Practical significance and large samples
3. Matrix sampling
4. Plausible values

SPSS skills that make working with large datasets easier:

5. Keeping and managing syntax
6. Merging datasets
7. Checking for duplicate cases
8. Missing data imputation

### 1. Statistical weighting in SPSS

Weights allow us to better approximate the full population

If African American students are 18% of the population but only 9% of my sample, I could give each African American student a weight of 2.0 (so each observation counts twice in analyses) to get results that better reflect population-level effects.

Types of weights:

Scale weights: multiply observations to create a weighted sample the same size as the population

Proportional weights: may be below 1 so that the weighted sample size stays the same as the actual sample size
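A minimal sketch of the arithmetic, assuming the population and sample counts for each group are known. The group labels and counts are hypothetical, chosen to reproduce the 18% / 9% example. A scale weight is the population count divided by the sample count, so the weighted sample adds up to the population; a proportional weight rescales that by n/N, so the weighted sample adds up to the actual sample size.

```python
# Hypothetical counts chosen to match the example: 18% of the population,
# but only 9% of a sample of 1,000.
population = {"African American": 180_000, "Other": 820_000}
sample = {"African American": 90, "Other": 910}


def scale_weights(pop, samp):
    """Each sampled case stands in for pop[g] / samp[g] people,
    so the weighted sample size equals the population size."""
    return {g: pop[g] / samp[g] for g in pop}


def proportional_weights(pop, samp):
    """Scale weights multiplied by n / N, so the weighted sample size
    equals the actual sample size (some weights fall below 1)."""
    n, N = sum(samp.values()), sum(pop.values())
    return {g: w * n / N for g, w in scale_weights(pop, samp).items()}


sw = scale_weights(population, sample)
pw = proportional_weights(population, sample)
print({g: round(w, 3) for g, w in pw.items()})  # {'African American': 2.0, 'Other': 0.901}
```

The proportional weight for the under-sampled group comes out to 2.0, matching the example, while the over-sampled group gets a weight just below 1.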

Note: When reporting results, you can report the weighted sample size, but you should also report the unweighted sample size.

### Using weights

Weight variables are already provided in large datasets

ELS:2002 Race (UNWEIGHTED)

| Race | Freq. | % |
|---|---|---|
| Amer. Native | 130 | .8 |
| Asian, Hawaii/Pac. Islander | 1460 | 9.0 |
| Black or African American | 2020 | 12.5 |
| Hispanic, no race specified | 996 | 6.1 |
| Hispanic, race specified | 1221 | 7.5 |
| More than one race | 735 | 4.5 |
| White, non-Hispanic | 8682 | 53.6 |
| Total | 16197 | 100.0 |

(Pie chart, unweighted: Amer. Native 1%; Asian, Hawaii/Pac. Islander 10%; Black or African American 13%; Hispanic, no race specified 6%; Hispanic, race specified 8%; More than one race 5%; White, non-Hispanic 57%)

ELS:2002 Race (WEIGHTED)

| Race | Freq. | % |
|---|---|---|
| Amer. Native | 32781 | 1.0 |
| Asian, Hawaii/Pac. Islander | 142518 | 4.2 |
| Black or African American | 491321 | 14.4 |
| Hispanic, no race specified | 243607 | 7.1 |
| Hispanic, race specified | 298648 | 8.8 |
| More than one race | 147896 | 4.3 |
| White, non-Hispanic | 2054103 | 60.2 |
| Total | 3410873 | 100.0 |

(Pie chart, weighted: Amer. Native 1%; Asian, Hawaii/Pac. Islander 4%; Black or African American 15%; Hispanic, no race specified 7%; Hispanic, race specified 9%; More than one race 4%; White, non-Hispanic 60%)
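A weighted frequency table counts each case by its weight instead of as 1, so each weighted Freq. is a sum of weights and each weighted % is that sum divided by the weighted total (e.g., 491321 / 3410873 ≈ 14.4%). The sketch below shows that computation on a few hypothetical cases; the variable names `race` and `wt` are illustrative only, not actual ELS:2002 variable names.

```python
from collections import defaultdict


def weighted_frequencies(cases, group_var, weight_var):
    """Return {category: (weighted freq, weighted %)}: each case adds its
    weight, rather than 1, to its category's count."""
    totals = defaultdict(float)
    for case in cases:
        totals[case[group_var]] += case[weight_var]
    grand_total = sum(totals.values())
    return {g: (w, 100 * w / grand_total) for g, w in totals.items()}


# Hypothetical cases; "race" and "wt" are illustrative variable names.
cases = [
    {"race": "Group A", "wt": 150.0},
    {"race": "Group A", "wt": 250.0},
    {"race": "Group B", "wt": 100.0},
]
for group, (freq, pct) in weighted_frequencies(cases, "race", "wt").items():
    print(f"{group}: weighted freq = {freq:.0f} ({pct:.1f}%)")
```

Unweighted, Group A would be 2 of 3 cases (66.7%); its larger weights shift it to 80.0% of the weighted total.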

### 2. Practical significance and large datasets

Because of the large sample sizes, many negligible effects (and virtually ALL correlations) will be statistically significant

You must consider effect sizes and practical significance
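To see why, here is a sketch using the usual test of a correlation, t = r·√((n − 2)/(1 − r²)), with a normal approximation to the t distribution (reasonable when n is in the thousands). The r = 0.03 value is hypothetical; n = 8,595 is chosen to match df = 8593 in the ELS:2002 example.

```python
import math


def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using t = r * sqrt((n - 2) / (1 - r**2)).
    The t distribution is approximated by the standard normal, which is
    accurate for samples in the thousands."""
    t = r * math.sqrt((n - 2) / (1 - r**2))
    return math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))


def smallest_significant_r(n, z_crit=1.959964):
    """Smallest |r| with two-sided p < .05 (same approximation),
    from solving z_crit = r * sqrt((n - 2) / (1 - r**2)) for r."""
    return z_crit / math.sqrt(n - 2 + z_crit**2)


n = 8595      # df = n - 2 = 8593
r = 0.03      # hypothetical; explains less than 0.1% of the variance
print(f"p = {correlation_p_value(r, n):.4f}, r^2 = {r**2:.4f}")
print(f"any |r| above {smallest_significant_r(n):.3f} is 'significant'")
```

With this many cases, a correlation of about 0.02 already reaches p < .05, even though it is practically meaningless.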

ELS:2002 variables: Independent Samples Test

| Variable | t | df | Sig. |
|---|---|---|---|
| Math test score | 8.71 | 8593 | <.001 |
| | -4.14 | 8593 | <.001 |
| Mathematics self-efficacy | 14.65 | 8593 | <.001 |
| English self-efficacy scale | -2.19 | 8593 | .029 |

### Practical significance and large datasets

Effect sizes show that the differences are actually negligible for reading and small for math

ELS:2002 variables: Independent Samples Test

| Variable | t | df | Sig. | Cohen’s d |
|---|---|---|---|---|
| Math test score | 8.71 | 8593 | <.001 | 0.19 |
| | -4.14 | 8593 | <.001 | -0.09 |
| Mathematics self-efficacy | 14.65 | 8593 | <.001 | 0.32 |
| English self-efficacy scale | -2.19 | 8593 | .029 | -0.05 |
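Cohen’s d can be recovered from a reported t and df with the common approximation d ≈ 2t/√df, which assumes the two groups are roughly equal in size; when the group sizes are known, d = t·√(1/n1 + 1/n2). The table does not report group sizes, so the equal-size assumption is only an assumption here. As a rough guide, Cohen’s conventional benchmarks are 0.2 (small), 0.5 (medium), and 0.8 (large).

```python
import math


def cohens_d_from_t(t, df):
    """Approximate d for an independent-samples t-test: d ~= 2t / sqrt(df).
    Assumes the two groups are about equal in size."""
    return 2 * t / math.sqrt(df)


def cohens_d_exact(t, n1, n2):
    """d = t * sqrt(1/n1 + 1/n2), for when the group sizes are known."""
    return t * math.sqrt(1 / n1 + 1 / n2)


t_values = [8.71, -4.14, 14.65, -2.19]   # t column from the table
df = 8593
print([round(cohens_d_from_t(t, df), 2) for t in t_values])  # [0.19, -0.09, 0.32, -0.05]
```

Applied to the table’s t values, the approximation reproduces the reported d column: every t is highly significant, but every d is below 0.5.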

### 3. Matrix sampling (be aware of…)

Used in large-scale assessments when:

A large domain is being sampled (e.g., world history)

Many topics must be covered in limited testing time

Individual estimates of the constructs are less important than aggregate estimates (e.g., state-level achievement)

Usually requires item response theory (IRT) scoring methods so that scores are comparable across examinees who completed different items

Table from von Davier et al., http://www.ierinstitute.org/fileadmin/Documents/IERI_Monograph/IERI_Monograph_Volume_02_Chapter_01.pdf
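A toy sketch of the idea, assuming a hypothetical 12-item pool split into 4 blocks: each booklet pairs two blocks (every pair appears once), and booklets are handed out in rotation, so each student answers only half of the items while every item is still answered by many students. Operational assessments use more elaborate designs; the block sizes, names, and rotation rule here are illustrative assumptions.

```python
from itertools import combinations


def make_booklets(items, n_blocks, blocks_per_booklet):
    """Split the item pool into blocks, then build one booklet for every
    combination of blocks (with 2 blocks per booklet, each pair of
    blocks appears together exactly once)."""
    size = -(-len(items) // n_blocks)  # ceiling division
    blocks = [items[i * size:(i + 1) * size] for i in range(n_blocks)]
    return [[item for b in combo for item in blocks[b]]
            for combo in combinations(range(n_blocks), blocks_per_booklet)]


def spiral(student_ids, booklets):
    """Hand out booklets in rotation: student k gets booklet k mod B."""
    return {s: booklets[k % len(booklets)] for k, s in enumerate(student_ids)}


items = [f"item{i:02d}" for i in range(1, 13)]   # hypothetical 12-item pool
booklets = make_booklets(items, n_blocks=4, blocks_per_booklet=2)
assignment = spiral(range(1, 25), booklets)      # 24 hypothetical students

print(len(booklets), "booklets of", len(booklets[0]), "items")  # 6 booklets of 6 items
```

Because examinees see different items, raw booklet scores are not directly comparable, which is why IRT scoring is needed.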

### 4. Plausible values

Can result from matrix sampling (with IRT models), bootstrapping, and missing data imputation

In matrix sampling, individual skill estimates are less reliable, and plausible values capture this error variance better than single scores do

Results in multiple estimates of each student’s true score on the construct (these appear as multiple variables in the dataset)

Poor practice: averaging plausible values before analysis, which produces biased estimates (von Davier et al., see notes)

Better practice: using methods that analyze all of the plausible values together and produce appropriate standard errors

Refer to von Davier et al. link in notes
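One widely used way to do this is Rubin’s rules: run the analysis once per plausible value, average the results, and add the variation across plausible values to the usual sampling variance. The sketch below shows only that combination step, with hypothetical numbers; with PISA- or NAEP-style data, the per-PV sampling variance would typically come from the dataset’s replicate weights, which are not shown here.

```python
import math
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Rubin's rules for M plausible values.

    estimates          -- the statistic computed separately with each PV
    sampling_variances -- the squared standard error from each of those runs
    """
    m = len(estimates)
    point = mean(estimates)                 # final estimate: average over PVs
    within = mean(sampling_variances)       # average sampling variance
    between = variance(estimates)           # spread across PVs (measurement error)
    total = within + (1 + 1 / m) * between
    return point, math.sqrt(total)


# Hypothetical results: a mean score computed five times, once per PV.
estimates = [502.1, 499.8, 503.4, 500.9, 501.6]
sampling_variances = [2.3**2, 2.2**2, 2.4**2, 2.3**2, 2.2**2]

est, se = combine_plausible_values(estimates, sampling_variances)
print(f"estimate = {est:.2f}, SE = {se:.2f}")
```

The combined SE is larger than any single run’s SE, because it includes the uncertainty about each student’s true score that averaging the plausible values first would hide.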

### 5. Keeping and managing syntax

From any dialog box, you can click “Paste” to save the command as syntax

Keeping syntax makes sure analyses start with the same data selections:

Sample weights, split files, selecting relevant cases

Good for keeping a record of computed and recoded variables

### 6. Merging datasets

Add cases = add more participants’ data

Add variables = add variables for same participants from another dataset

### Merging datasets--Adding variables

You have to exclude duplicate variables from one of the datasets

Check that their values are really identical (if not, rename the variable)

Use key variables to match cases
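A minimal sketch of both merge types on lists of records, assuming a one-to-one match on a single key. The file contents and the names `stu_id`, `sex`, `math`, and `read` are hypothetical. A variable that appears in both files is kept once when its values agree; conflicting values stop the merge so the variable can be renamed, as described above.

```python
def add_variables(left, right, key):
    """One-to-one 'add variables' merge: attach the right file's variables
    to each left-file case with the same key value. Left cases with no
    match are kept without the extra variables."""
    right_by_key = {}
    for row in right:
        if row[key] in right_by_key:
            raise ValueError(f"duplicate {key}={row[key]!r} in second file")
        right_by_key[row[key]] = row

    merged = []
    for row in left:
        combined = dict(row)
        for var, value in right_by_key.get(row[key], {}).items():
            # A variable in both files must hold identical values;
            # otherwise rename it in one file before merging.
            if var in combined and combined[var] != value:
                raise ValueError(f"{var!r} differs for {key}={row[key]!r}")
            combined[var] = value
        merged.append(combined)
    return merged


def add_cases(first, second):
    """'Add cases' merge: stack participants from two files."""
    return first + second


# Hypothetical files sharing the key variable "stu_id".
scores = [{"stu_id": 1, "sex": "F", "math": 48.2},
          {"stu_id": 2, "sex": "M", "math": 51.7}]
survey = [{"stu_id": 1, "sex": "F", "read": 50.3},
          {"stu_id": 2, "sex": "M", "read": 47.9}]
merged = add_variables(scores, survey, key="stu_id")
print(merged[0])  # {'stu_id': 1, 'sex': 'F', 'math': 48.2, 'read': 50.3}
```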

### 7. Checking for duplicate cases: output

Duplicates are flagged in a new variable, “PrimaryLast”

You will need to decide how to handle each duplicate on a case-by-case basis

Merging datasets incorrectly can result in duplicates

If all variables are identical across the duplicate cases, delete one

If the variables differ, check that the identification variables are correct
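A sketch of the same logic in Python, assuming records are dictionaries with a hypothetical ID variable `stu_id`. The first function mimics a “PrimaryLast”-style indicator (1 for the last case with each ID, 0 for earlier ones); the second applies the two rules above to each repeated ID. This illustrates the idea and is not SPSS’s own implementation.

```python
from collections import defaultdict


def flag_primary_last(cases, id_var):
    """Add a 'PrimaryLast'-style indicator: 1 for the last case with a
    given ID (and for unique cases), 0 for earlier duplicates of that ID."""
    last_position = {}
    for i, case in enumerate(cases):
        last_position[case[id_var]] = i  # later cases overwrite earlier ones
    return [{**case, "PrimaryLast": int(last_position[case[id_var]] == i)}
            for i, case in enumerate(cases)]


def review_duplicates(cases, id_var):
    """For each repeated ID: identical rows -> delete extras;
    differing rows -> check the identification variables."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[id_var]].append(case)
    advice = {}
    for case_id, rows in groups.items():
        if len(rows) > 1:
            identical = all(row == rows[0] for row in rows)
            advice[case_id] = ("identical: delete one" if identical
                               else "different: check ID variables")
    return advice


# Hypothetical merged file with two kinds of problems.
cases = [{"stu_id": 1, "math": 48.2},
         {"stu_id": 1, "math": 48.2},   # exact duplicate
         {"stu_id": 2, "math": 51.7},
         {"stu_id": 2, "math": 39.0},   # same ID, different values
         {"stu_id": 3, "math": 45.5}]
flagged = flag_primary_last(cases, "stu_id")
print([c["PrimaryLast"] for c in flagged])  # [0, 1, 0, 1, 1]
print(review_duplicates(cases, "stu_id"))
```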

### 8. Missing data

Methods that bias results:

Mean substitution, listwise or pairwise deletion

Methods that can provide less biased estimates:

Single imputation regression (better than above, but restricts variability)

Expectation-maximization (EM): the best of the SPSS options; works well when data are missing at random

Analyze > Missing Value Analysis

Be sure to read up on “missing completely at random,” “missing at random,” and “missing not at random”
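A small illustration of why mean substitution biases results: filling every gap with the observed mean leaves the mean unchanged but shrinks the variance, which in turn weakens correlations with other variables. The scores are hypothetical, with `None` marking missing responses.

```python
from statistics import mean, variance


def mean_substitute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed_mean = mean([v for v in values if v is not None])
    return [observed_mean if v is None else v for v in values]


# Hypothetical scores; None marks a missing response.
scores = [12, 15, None, 18, 9, None, 21, 14, None, 16]
observed = [v for v in scores if v is not None]
filled = mean_substitute(scores)

print(mean(observed), mean(filled))          # same mean
print(variance(observed), variance(filled))  # variance shrinks after substitution
```

The imputed cases add no spread around the mean, so the same sum of squares is divided among more cases; regression-based single imputation has a milder version of the same problem.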

Dr. Lakin

### Dissertation Grants

“The program seeks to stimulate research on U.S. education issues using data from the large-scale, national and international data sets supported by the National Center for Education Statistics (NCES), NSF, and other federal agencies, and to increase the number of education researchers using these data sets.”

Suggestions based on personal observations and the RFP:

Must use a strong quasi-experimental design (Schneider et al., Estimating Causal Effects: Using Experimental and Observational Designs)

Regression discontinuity, propensity score matching, etc.

Bringing in new quantitative approaches from other fields (economics, epidemiology, etc.) is also very appealing

Check past grants to see which datasets are “neglected” (more recent datasets are better)

Prefer ideas that involve multiple datasets in meaningful research; these are more successful

Analyses of recent international datasets have been more successful

### Other opportunities

IES Research Grants do fund secondary data analyses under the Exploration goal (any subject area): http://ies.ed.gov/funding/

IES data training workshops http://ies.ed.gov/whatsnew/conferences/?cid=2

AERA annual meeting usually has data training events:

PDC02: Analyzing NAEP Assessment Data with Plausible Values…

PDC13: Advanced Analysis using Adult International Large Scale Assessment Databases

PDC16: Using NAEP Data on the Web for Educational Policy Research

Several on quantitative methods (including propensity scores)

AERA Institute on Statistical Analysis for Education Policy (summer)

IES/NCES hosts STATS-DC conferences and summer institutes to train researchers in using specific datasets

### Q&A

Presentation files are available from http://www.auburn.edu/~jml0035/

(Under “Conference materials and resources”)