

Working with EU-SILC: using the hierarchical data structure, matching & aggregating data

Practical computing session I – Part 2

Heike Wirth

GESIS – Leibniz Institut für Sozialwissenschaften

DwB Training Course on EU-SILC, February 13-15, 2013

Romanian Social Data Archive at the Department of Sociology

University of Bucharest, Romania


Introduction

• EU-SILC data has a hierarchical structure

• more than one level of analysis is possible

• household & individual levels are represented by separate files

• data are stored in multiple data files

Example of household level data

Example 1: Household record

#      Year of survey  Country  HH-ID  Dwelling type             Total disposable HHLD income  Ability to make ends meet
       HB010           HB020    HB030  HH010                     HY020                         HS120
1      2010            AT       1      apartment or flat in (…)  15,271                        with great difficulty
2      2010            AT       2      detached house            30,081                        fairly easily
1500   2010            RO       1      detached house            2,243                         fairly easily
1501   2010            RO       2      detached house            2,409                         with difficulty
…      …               …        …      …                         …                             …

1 observation = 1 Household

Please note: the household ID (HB030) does not differentiate between countries; the same ID can occur in more than one country.

To be on the safe side, always use the household ID together with country & year of survey.
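Why the combined key matters can be shown with a small check for duplicated keys. The following is a minimal Python sketch, not part of the SPSS course material; the four records and the helper `duplicated_keys` are illustrative assumptions modelled on the example above.

```python
from collections import Counter

# Four household records modelled on the example (illustrative values).
households = [
    {"HB010": 2010, "HB020": "AT", "HB030": 1},
    {"HB010": 2010, "HB020": "AT", "HB030": 2},
    {"HB010": 2010, "HB020": "RO", "HB030": 1},
    {"HB010": 2010, "HB020": "RO", "HB030": 2},
]


def duplicated_keys(records, key_vars):
    """Return the key values that occur in more than one record."""
    counts = Counter(tuple(rec[v] for v in key_vars) for rec in records)
    return sorted(key for key, n in counts.items() if n > 1)


# HB030 alone is ambiguous: IDs 1 and 2 occur in both countries.
print(duplicated_keys(households, ["HB030"]))

# Year + country + household ID identifies every record uniquely.
print(duplicated_keys(households, ["HB010", "HB020", "HB030"]))
```

An empty list for the three-part key means that no two records share the same key, which is the property every match relies on.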

Example of individual level data

Example 2: Individual data record

#      Year of survey  Country  HH-ID  Person-ID  Marital status  Gross monthly earnings  Highest ISCED level attained
       PB010           PB020    PX030  PB030      PB190           PY0200G                 PE040
1      2010            AT       1      11         married         3500                    (upper) secondary
2      2010            AT       1      12         married         1400                    lower secondary
3      2010            AT       1      13         never married   1450                    (upper) secondary
4      2010            AT       1      14         never married   2307                    lower secondary
30001  2010            RO       1      11         married         1500                    (upper) secondary
30002  2010            RO       1      12         married         750                     lower secondary
30003  2010            RO       1      13         never married   250                     (upper) secondary

1 observation = 1 Person

Person-ID sequential within household

Working with this kind of data requires

• A decision on the appropriate unit of analysis for your research question, e.g.

  • research interest in households or persons?

  • % of households/persons/men/women/children who live in poverty?

  • % of households with only 1 person or % of persons who live alone?

• Knowledge of procedures for manipulating the data
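The difference between the two units of analysis can be made concrete with a small calculation. This is a minimal Python sketch with hypothetical household sizes (not EU-SILC data); it only illustrates that the same households give different percentages depending on whether households or persons are counted.

```python
# Hypothetical household sizes, not EU-SILC data.
household_sizes = [1, 1, 2, 4]

n_households = len(household_sizes)
n_persons = sum(household_sizes)
n_single = sum(1 for size in household_sizes if size == 1)

# Unit of analysis = household.
share_of_households = 100 * n_single / n_households
# Unit of analysis = person (each one-person household contains one person).
share_of_persons = 100 * n_single / n_persons

print(f"{share_of_households:.1f}% of households have only one person")
print(f"{share_of_persons:.1f}% of persons live alone")
```

With these four households, half of all households are one-person households, but only a quarter of all persons live alone.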


Types of Matching

• One-to-one matching

• Household Register to Household Data;

Personal Register to Personal Data

• One-to-many matching

• Household variables to Individual data

• Many-to-one matching (‘aggregation’)

• e.g. adding information from the individual data to the household data


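EU-SILC – Types of matching

[Diagram: links between the four EU-SILC files]

• Household Register File (D) and Household Data File (H): 1:1

• Personal Register File (R) and Personal Data File (P): 1:1

• Household files (D, H) to personal files (R, P): 1:n

• Personal files (R, P) to household files (D, H): n:1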

Linking EU-SILC files (cross-sectional)

• Key variables provide links between the related records

  • between household files

  • between individual files

  • between household and individual files

• Key variables (depending on the files) are

household id (DB030; HB030; RX030; PX030)

personal id (RB030; PB030)

• To be on the safe side, always use the key variables together with

  • ‘year of survey’ (DB010; HB010; RB010; PB010) &

  • ‘country’ (DB020; HB020; RB020; PB020)
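Because the key variables carry a different name in each file, they have to be brought to common names before records can be compared. The following minimal Python sketch illustrates the idea; the mapping `KEY_MAP` and the helper `household_key` are hypothetical, and the common names ID010, ID020 and ID_HH are the ones used in the hands-on part of this session.

```python
# Key variable names per file type, as listed above, mapped to common names.
KEY_MAP = {
    "D": {"DB010": "ID010", "DB020": "ID020", "DB030": "ID_HH"},
    "H": {"HB010": "ID010", "HB020": "ID020", "HB030": "ID_HH"},
    "R": {"RB010": "ID010", "RB020": "ID020", "RX030": "ID_HH"},
    "P": {"PB010": "ID010", "PB020": "ID020", "PX030": "ID_HH"},
}


def household_key(record, file_type):
    """Return (year, country, household id) for a record of the given file."""
    renamed = {new: record[old] for old, new in KEY_MAP[file_type].items()}
    return (renamed["ID010"], renamed["ID020"], renamed["ID_HH"])


# Illustrative records from the household register and the personal data file.
d_record = {"DB010": 2010, "DB020": "AT", "DB030": 12, "DB100": "thinly populated area"}
p_record = {"PB010": 2010, "PB020": "AT", "PX030": 12, "PB030": 1201}

# Both records belong to the same household, so their keys are equal.
print(household_key(d_record, "D") == household_key(p_record, "P"))
```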

Example 1: one-to-one

• Attach household register information (D-File) to household data file (H-File)

• e.g. ‘Degree of urbanisation’ (DB100) is only included in the household register; it can be useful to have this information in the household data, too.

One-to-One Match, e.g. household information

Household Register (separate file)

DB010  DB020  DB030  DB075  (…)  DB100
2010   AT     2      3      (…)  intermediate area
2010   AT     12     2      (…)  thinly populated area
2010   AT     13     3      (…)  thinly populated area
2010   AT     19     2      (…)  thinly populated area
2010   AT     26     3      (…)  thinly populated area
2010   AT     59     4      (…)  densely populated area

Household Data (separate file)

HB010  HB020  HB030  HS090               HS120                  (…)  HX060
2010   AT     2      no - cannot afford  with great difficulty  (…)  One person household
2010   AT     12     yes                 with difficulty        (…)  Other hhlds without dep. children
2010   AT     13     no - other reason   fairly easily          (…)  One person household
2010   AT     19     yes                 fairly easily          (…)  Other hhlds without dep. children
2010   AT     26     yes                 easily                 (…)  Other hhlds without dep. children
2010   AT     59     yes                 with some difficulty   (…)  One person household

Result: Combined Household File

Household Data (combined file)

HB010  HB020  HB030  HS090               HS120                  (…)  HX060                              DB100
2010   AT     2      no - cannot afford  with great difficulty  (…)  One person household               intermediate area
2010   AT     12     yes                 with difficulty        (…)  Other hhlds without dep. children  thinly populated area
2010   AT     13     no - other reason   fairly easily          (…)  One person household               thinly populated area
2010   AT     19     yes                 fairly easily          (…)  Other hhlds without dep. children  thinly populated area
2010   AT     26     yes                 easily                 (…)  Other hhlds without dep. children  thinly populated area
2010   AT     59     yes                 with some difficulty   (…)  One person household               densely populated area
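The logic of the one-to-one match can be sketched outside SPSS as a lookup on the combined key. This is a minimal Python illustration, not the course's SPSS procedure; it uses the first three households of the example, and the variable names `register`, `household_data` and `combined` are assumptions.

```python
# D-file: one record per household (values from the example tables).
register = [
    {"DB010": 2010, "DB020": "AT", "DB030": 2, "DB100": "intermediate area"},
    {"DB010": 2010, "DB020": "AT", "DB030": 12, "DB100": "thinly populated area"},
    {"DB010": 2010, "DB020": "AT", "DB030": 13, "DB100": "thinly populated area"},
]
# H-file: also one record per household.
household_data = [
    {"HB010": 2010, "HB020": "AT", "HB030": 2, "HS120": "with great difficulty"},
    {"HB010": 2010, "HB020": "AT", "HB030": 12, "HS120": "with difficulty"},
    {"HB010": 2010, "HB020": "AT", "HB030": 13, "HS120": "fairly easily"},
]

# Index the register by (year, country, household id); a repeated key would
# mean the match is not one-to-one, so it is reported as an error.
db100_by_key = {}
for rec in register:
    key = (rec["DB010"], rec["DB020"], rec["DB030"])
    if key in db100_by_key:
        raise ValueError(f"duplicate key in register: {key}")
    db100_by_key[key] = rec["DB100"]

# Attach DB100 to every household record; None marks a record without partner.
combined = []
for rec in household_data:
    key = (rec["HB010"], rec["HB020"], rec["HB030"])
    combined.append({**rec, "DB100": db100_by_key.get(key)})

for rec in combined:
    print(rec["HB030"], "|", rec["HS120"], "|", rec["DB100"])
```

The number of records does not change in a one-to-one match; only the number of variables grows.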

Example 2: one-to-many

• Attach household register information (D-File) to personal data file (P-File)

• Attach ‘Degree of urbanisation’ (again) to the personal data file


Attaching household data to personal data (1:n)

Household Register (separate file)

DB010  DB020  DB030  DB075  (…)  DB100
2010   AT     2      3      (…)  intermediate area
2010   AT     12     2      (…)  thinly populated area
2010   AT     26     3      (…)  thinly populated area

Personal Data (combined)

PB010  PB020  PX030  PB030  PH010  PH020  PH030            PX020  DB100
2010   AT     2      201    fair   yes    yes, limited     71     intermediate area
2010   AT     12     1201   fair   no     no, not limited  32     thinly populated area
2010   AT     12     1202   fair   yes    yes, limited     31     thinly populated area
2010   AT     12     1203   good   no     no, not limited  30     thinly populated area
2010   AT     12     1204   fair   no     no, not limited  26     thinly populated area
(…)
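In a one-to-many match the household file acts as a lookup table: each household value is copied to all persons of that household. A minimal Python sketch of this idea follows, using values from the example; the helper `attach_db100` is hypothetical and not part of the course material.

```python
# Household register as lookup table: (DB010, DB020, DB030) -> DB100.
register = {
    (2010, "AT", 2): "intermediate area",
    (2010, "AT", 12): "thinly populated area",
    (2010, "AT", 26): "thinly populated area",
}
# Personal data: several persons can share one household.
persons = [
    {"PB010": 2010, "PB020": "AT", "PX030": 2, "PB030": 201},
    {"PB010": 2010, "PB020": "AT", "PX030": 12, "PB030": 1201},
    {"PB010": 2010, "PB020": "AT", "PX030": 12, "PB030": 1202},
]


def attach_db100(person_records, lookup):
    """Copy the household's DB100 onto every person of that household."""
    result = []
    for rec in person_records:
        key = (rec["PB010"], rec["PB020"], rec["PX030"])
        result.append({**rec, "DB100": lookup.get(key)})
    return result


personal_combined = attach_db100(persons, register)
for rec in personal_combined:
    print(rec["PX030"], rec["PB030"], rec["DB100"])
```

Household 26 is in the register but has no persons in this small extract, so it adds no record: the result keeps one record per person.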

Example 3: many-to-one

• e.g. number of persons in a household who are

  • unemployed, full-time employed, self-employed?

• such information is not included in the data

=> own computation

Matching: many-to-one (summarizing information)

Personal Data

                                                 Summarized variables
PB010  PB020  PX030  PB030  PL031                # unempl  # employed full time  # self employed
2010   AT     2      201    Unemployed (5)       1         0                     0
2010   AT     12     1201   Empl. full time (1)  0         2                     1
2010   AT     12     1202   Empl. full time (1)  0         2                     1
2010   AT     12     1203   Empl. part time (2)  0         2                     1
2010   AT     12     1204   Self-employed (3)    0         2                     1
(…)

Household Data (combined file)

HB010  HB020  HB030  # unempl  # employed  # self employed
2010   AT     2      1         0           0
2010   AT     12     0         2           1
2010   AT     26     ..
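The summarizing step can be sketched as counting persons per household key. The following minimal Python example reproduces the counts of the example for households 2 and 12; the summary variable names in `SUMMARY_VARS` are hypothetical, and the PL031 codes are the ones shown in the example.

```python
# PL031 codes as shown in the example:
# 1 = employed full time, 2 = employed part time, 3 = self-employed, 5 = unemployed.
persons = [
    {"PB010": 2010, "PB020": "AT", "PX030": 2, "PB030": 201, "PL031": 5},
    {"PB010": 2010, "PB020": "AT", "PX030": 12, "PB030": 1201, "PL031": 1},
    {"PB010": 2010, "PB020": "AT", "PX030": 12, "PB030": 1202, "PL031": 1},
    {"PB010": 2010, "PB020": "AT", "PX030": 12, "PB030": 1203, "PL031": 2},
    {"PB010": 2010, "PB020": "AT", "PX030": 12, "PB030": 1204, "PL031": 3},
]

# Hypothetical summary variable names and the PL031 code each one counts.
SUMMARY_VARS = {"n_unempl": 5, "n_fulltime": 1, "n_selfempl": 3}

household_summary = {}
for rec in persons:
    key = (rec["PB010"], rec["PB020"], rec["PX030"])
    # Create the household entry with zero counts on first sight.
    counts = household_summary.setdefault(key, dict.fromkeys(SUMMARY_VARS, 0))
    for name, code in SUMMARY_VARS.items():
        if rec["PL031"] == code:
            counts[name] += 1

for key, counts in sorted(household_summary.items()):
    print(key, counts)
```

The result has one entry per household and can then be attached to the household file with a one-to-one match, or back to the persons with a one-to-many match.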

Hands on – matching 1:1

• Attach ‘Degree of Urbanisation’ (DB100) to the household data file (H-File)

• Open the EU-SILC training dataset (D-File)

• Check the variables you are interested in

• Sort your data according to the key variables used for linkage

• Names of key variables in the files to be matched must be identical

=> Create new key variables (ID010, ID020, ID_HH) in such a way that

DB010 = ID010

DB020 = ID020

DB030 = ID_HH

• Create a new file with only the key variables & the variable(s) you are interested in

• Name the new file DB100.sav

SPSS–Matching: one-to-one

* Before you start.

* Specify the path where the EU-SILC training dataset is stored.

FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.

* Specify the path where you want to save your data.

FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.

* Open the EU-SILC training dataset – D-File.

GET FILE='data_path/udb_c10d_silc_course.sav'.

* Check the variables you are interested in.

CROSSTABS DB020 BY DB100.

* Step 1 - Sort your data according to the key variables used for linkage.

SORT CASES BY DB010 DB020 DB030.

* Step 2 - Names of key variables in the files to be matched must be identical.

RENAME VARIABLES (DB010 DB020 DB030 = ID010 ID020 ID_HH).

* Create a new file with the key variables & the variable(s) you are interested in.

SAVE OUTFILE='mydata_path/DB100.sav'
  /KEEP ID010 ID020 ID_HH DB100.

* Open the EU-SILC training dataset – H-File.

GET FILE='data_path/udb_c10H_silc_course.sav'.

SORT CASES BY HB010 HB020 HB030.

* Key variables.
* Either rename (like before) or, better, generate new variables.

STRING ID020 (A2).
COMPUTE ID010 = HB010.
COMPUTE ID020 = HB020.
COMPUTE ID_HH = HB030.

MATCH FILES FILE=*
  /FILE='mydata_path/DB100.sav'
  /BY ID010 ID020 ID_HH.
EXECUTE.

* Check whether it worked.

CROSSTABS HB020 BY DB100.

SPSS–Matching: One-to-many Match (1:n)

Example 2: Combining household and personal data

E.g. attach ‘Degree of Urbanisation’ (DB100) to the personal data.

GET FILE='data_path/udb_c10p_silc_course.sav'.

* Sort by the key variables used for linkage.

SORT CASES BY PB010 PB020 PX030.

* PB020 is a string variable: create a new string variable ID020 or use the RENAME VARIABLES command.

STRING ID020 (A2).
COMPUTE ID010 = PB010.
COMPUTE ID020 = PB020.
COMPUTE ID_HH = PX030.

* /TABLE uses DB100.sav as a lookup table: each household record is attached to all persons of that household.

MATCH FILES FILE=*
  /TABLE='mydata_path/DB100.sav'
  /BY ID010 ID020 ID_HH.
EXECUTE.

* Check whether it worked.

CROSSTABS PB020 BY DB100.

SAVE OUTFILE='mydata_path/personal_data.sav'.

Matching: many-to-one (n : 1)

• Create new summary variables for personal data (P-File)

  • number of persons living in the same household

  • number of unemployed persons living in a household

  • number of full-time employed persons living in a household

  • number of part-time employed persons living in a household

  • number of self-employed persons living in a household

  • sum of ‘pensions from individual private plans’ (PY080G)

*********************************************************.
* many-to-one (n:1)
* Personal Data
* example 1
* number of persons living in the same household
* number of unemployed persons living in a household
*********************************************************.

* Specify the path where the EU-SILC training dataset is stored.

FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.

* Specify the path where you want to save your data.

FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.

* Open the EU-SILC training dataset.

GET FILE='data_path/udb_c10p_silc_course.sav'.
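The SPSS listing stops after opening the personal data file. As a language-neutral illustration of what the summary variables contain, here is a minimal Python sketch for the household size and the household sum of PY080G. The PY080G amounts are hypothetical, `None` stands for a missing value, and the helper `summarize` and its variable names are assumptions, not the course's solution.

```python
# Hypothetical PY080G amounts; None = missing value.
persons = [
    {"PB010": 2010, "PB020": "AT", "PX030": 2, "PB030": 201, "PY080G": None},
    {"PB010": 2010, "PB020": "AT", "PX030": 12, "PB030": 1201, "PY080G": 1200.0},
    {"PB010": 2010, "PB020": "AT", "PX030": 12, "PB030": 1202, "PY080G": 0.0},
    {"PB010": 2010, "PB020": "AT", "PX030": 12, "PB030": 1203, "PY080G": 300.0},
]


def summarize(person_records):
    """Return household size and summed PY080G per (year, country, household)."""
    households = {}
    for rec in person_records:
        key = (rec["PB010"], rec["PB020"], rec["PX030"])
        hh = households.setdefault(key, {"n_persons": 0, "sum_PY080G": None})
        hh["n_persons"] += 1
        # Missing amounts are skipped; the sum stays None if all are missing.
        if rec["PY080G"] is not None:
            hh["sum_PY080G"] = (hh["sum_PY080G"] or 0.0) + rec["PY080G"]
    return households


for key, values in sorted(summarize(persons).items()):
    print(key, values)
```

Keeping the sum missing when every amount is missing is a design choice of this sketch; whether missing values should count as zero depends on the research question.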
