Restructuring longitudinal data


A guide to the unknown…

A dataset is longitudinal if it tracks the same type of information on the same subjects at multiple points in time or space. For example, part of a longitudinal dataset could contain specific students and their standardized test scores in six successive years.

One type of longitudinal data is known as “panel data”: data from a (usually small) number of observations over time on a (usually large) number of cross-sectional units such as individuals, households, firms, or governments.

Longitudinal data are a subset of hierarchical data: observations that are correlated because they are tied to the same unit.

For example, in educational studies we observe student j in school u, and presumably there is some tie between the observations from the same school.

In such data we observe y_{j,u}, where u indicates a unit and j indicates the j-th observation drawn from that unit. Thus there is no relationship between y_{j,u} and y_{j,u'}, even though they have the same first subscript.

In true longitudinal data, the first subscript is t, which represents comparable time: y_{t,u} and y_{t,u'} are observed at the same point in time.

One approach to working with longitudinal data sets is to restructure the data set, either going from one observation per subject to several or vice versa. For example, you may have several diagnosis codes in a single observation (visit) and want to compute the frequency of each possible diagnosis code. To do this, it is more convenient to have one observation for each diagnosis code, which may result in several observations per subject.
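The diagnosis-code case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`subject`, `dx1` to `dx3`), the codes, and the helper `wide_to_long` are all hypothetical and do not come from any real data set.

```python
from collections import Counter

# Hypothetical wide data: one record per visit, up to three diagnosis codes.
visits = [
    {"subject": 1, "dx1": "A10", "dx2": "B20", "dx3": None},
    {"subject": 2, "dx1": "B20", "dx2": None, "dx3": None},
    {"subject": 3, "dx1": "A10", "dx2": "C30", "dx3": "B20"},
]


def wide_to_long(records, id_key, code_keys):
    """Return one row per non-missing diagnosis code."""
    long_rows = []
    for record in records:
        for key in code_keys:
            code = record[key]
            if code is not None:  # skip unused diagnosis slots
                long_rows.append({id_key: record[id_key], "dx": code})
    return long_rows


long_rows = wide_to_long(visits, "subject", ["dx1", "dx2", "dx3"])

# With one observation per code, the frequency table is a simple count.
dx_counts = Counter(row["dx"] for row in long_rows)
print(dx_counts.most_common())
```

Once the data are in the long form, counting no longer depends on how many diagnosis slots each visit happens to have.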

Data structure analysis includes making sure that all the components of a data structure are closely related, that closely related data are not split across separate structures, and that the most suitable type of data structure is being used. Data are often much easier to manage and understand in a representation that abstracts their relevant similarities.

Often, in data warehouses, data restructuring involves changing some aspects of the way in which the database is logically or physically arranged.

There are generally four types of data restructuring operations:

Trimming

Flattening

Stretching

Grafting

In trimming, the data extracted from the input are placed in the output without any change in the hierarchical relationships, but with some unwanted components of the data removed.

In flattening, the operation produces a flat form from a branch of the input structure by extracting all the information at the level of the values of the branch's basic attributes.

The stretching operation can produce an output data structure that has more hierarchical levels than the input.

Finally, a grafting operation combines two hierarchies horizontally to form a wider hierarchy by matching common values.
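The four operations can be illustrated on a small nested structure. The sketch below is one possible reading of the descriptions above, not a standard implementation: the school/student hierarchy, the sample values, and the function names `trim`, `flatten`, `stretch`, and `graft` are all made up for illustration.

```python
# Hypothetical two-level hierarchy: school -> students.
schools = [
    {"school": "North", "city": "Ames",
     "students": [{"name": "Ann", "score": 81}, {"name": "Bo", "score": 74}]},
    {"school": "South", "city": "Boone",
     "students": [{"name": "Cy", "score": 90}]},
]


def trim(tree):
    """Trimming: drop an unwanted component (city), keep the hierarchy."""
    return [{"school": s["school"],
             "students": [dict(st) for st in s["students"]]} for s in tree]


def flatten(tree):
    """Flattening: one flat row per student, parent attributes repeated."""
    return [{"school": s["school"], "city": s["city"], **st}
            for s in tree for st in s["students"]]


def stretch(rows, key):
    """Stretching: add a parent level by grouping flat rows on `key`."""
    groups = {}
    for row in rows:
        child = {k: v for k, v in row.items() if k != key}
        groups.setdefault(row[key], []).append(child)
    return [{key: value, "children": children}
            for value, children in groups.items()]


def graft(left, right, key):
    """Grafting: join two hierarchies side by side on a common key value."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]}
            for l in left if l[key] in right_by_key]


flat = flatten(schools)         # 3 flat rows, no nesting
by_city = stretch(flat, "city") # flat rows regrouped under a new city level

# A second hypothetical hierarchy sharing the "school" key.
staff = [
    {"school": "North", "teachers": [{"name": "Dee"}]},
    {"school": "South", "teachers": [{"name": "Eli"}]},
]
wide = graft(trim(schools), staff, "school")  # students and teachers together
```

Note that `flatten` and `stretch` move in opposite directions: the first removes a level of nesting, the second introduces one.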

In SPSS, go to Data > Restructure. This allows you to restructure your data from multiple variables (columns) in a single case to groups of related cases (rows) or vice versa, or you can choose to transpose your data.

SPSS SYNTAX:

VARSTOCASES
  /ID=id
  /MAKE trans1 FROM VAR00001 VAR00002 VAR00003 VAR00004
  /INDEX=Index1(4)
  /KEEP=
  /NULL=KEEP.
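To make the effect of this command easier to see, here is a rough Python sketch of the same restructuring and of its inverse. It rests on assumptions: that /ID numbers the original cases, that /INDEX numbers the source variables 1 to 4, and that /NULL=KEEP retains rows whose value is missing. The functions `vars_to_cases` and `cases_to_vars` and the sample values are hypothetical; this mimics the result, not how SPSS works internally.

```python
def vars_to_cases(rows, source_vars, target,
                  id_name="id", index_name="Index1", keep_null=True):
    """One output case per (input case, source variable)."""
    cases = []
    for case_number, row in enumerate(rows, start=1):
        for index, var in enumerate(source_vars, start=1):
            value = row.get(var)
            if value is None and not keep_null:
                continue  # behaves like dropping missing values
            cases.append({id_name: case_number,
                          index_name: index,
                          target: value})
    return cases


def cases_to_vars(cases, source, prefix,
                  id_name="id", index_name="Index1"):
    """Inverse direction: gather the cases of each id into numbered columns."""
    wide = {}
    for case in cases:
        row = wide.setdefault(case[id_name], {id_name: case[id_name]})
        row[f"{prefix}{case[index_name]}"] = case[source]
    return list(wide.values())


# Made-up input: two cases, four repeated measurements each.
wide_rows = [
    {"VAR00001": 5.0, "VAR00002": 6.5, "VAR00003": None, "VAR00004": 7.0},
    {"VAR00001": 3.0, "VAR00002": 3.5, "VAR00003": 4.0, "VAR00004": 4.5},
]
source_vars = ["VAR00001", "VAR00002", "VAR00003", "VAR00004"]

long_cases = vars_to_cases(wide_rows, source_vars, "trans1")
without_nulls = vars_to_cases(wide_rows, source_vars, "trans1", keep_null=False)
round_trip = cases_to_vars(long_cases, "trans1", "trans1_")
```

Keeping the index column is what makes the round trip possible: without it, the long rows could not be placed back into the right columns.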

In SAS, you can create the observations using an array statement and a do loop, or you can simply transpose the existing data.

/* Read the tab-delimited file: one row per location, one column per year. */
data neonatal;
  infile 'F:\Thesis Docs\Data\neonatal.txt' delimiter='09'x truncover dsd missover obs=104;
  input location $ _1990_ _1991_ _1992_ _1993_ _1994_ _1995_ _1996_ _1997_ _1998_
        _1999_ _2000_ _2001_ _2002_ _2003_ _2004_ _2005_ _2006_ _2007_;
run;

/* BY-group processing requires the data to be sorted by location. */
proc sort data=neonatal;
  by location;
run;

/* Turn the year columns into rows: one observation per location and year. */
proc transpose data=neonatal out=neonatal2 name=year prefix=neonatal;
  by location;
  var _1990_ _1991_ _1992_ _1993_ _1994_ _1995_ _1996_ _1997_ _1998_
      _1999_ _2000_ _2001_ _2002_ _2003_ _2004_ _2005_ _2006_ _2007_;
run;

/* Copy the result, dropping the variable neonatal2. */
data neonatal3 (drop=neonatal2);
  set neonatal2;
run;

proc print data=neonatal3 noobs;
run;
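For readers without SAS, the same by-location transpose can be approximated with the Python standard library. This is a sketch under assumptions: the inline sample text stands in for the tab-delimited file, its values are invented for illustration, only three years are shown, and `transpose_by_location` is a hypothetical helper.

```python
import csv
import io

# Invented stand-in for the tab-delimited file (no header row); values are
# illustrative only and do not come from the real data.
sample = "Alpha\t40.1\t39.5\t38.7\nBeta\t12.3\t\t11.8\n"
years = [1990, 1991, 1992]


def transpose_by_location(lines, years):
    """Turn one row per location into one row per (location, year)."""
    long_rows = []
    for fields in csv.reader(lines, delimiter="\t"):
        location, values = fields[0], fields[1:]
        for year, raw in zip(years, values):
            # An empty field is treated as a missing value.
            rate = float(raw) if raw.strip() else None
            long_rows.append({"location": location,
                              "year": year,
                              "neonatal": rate})
    return long_rows


long_rows = transpose_by_location(io.StringIO(sample), years)
for row in long_rows:
    print(row)
```

Storing the year as a number rather than as a column name such as `_1990_` makes it directly usable as a time variable in later analysis.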

Restructuring is fun!
