paper A - Life courses in context

Putting People into Past Places:
Geographical information and British
longitudinal microdata
Humphrey Southall
(Department of Geography, University of
Portsmouth/Great Britain Historical GIS)
Introduction
This paper has two linked aims. Firstly, to discuss the potential for linking longitudinal
microdatasets – basically, individual life histories – with aggregate historical data for
localities. Secondly, to present an alternative methodology for managing large scale
aggregate datasets which differs strikingly from traditional historical applications of
Geographical Information Systems technology. The two goals are connected by an analysis
linking the Office for National Statistics Longitudinal Study (LS), which brings together
information about particular people from the 1971, 1981 and 1991 censuses, with
information about the areas these people were living in as children in the 1930s.
The paper begins with a brief survey of available longitudinal datasets. These are
arguably the most important datasets created by British empirical social science, and are
certainly the most expensive; but they are relatively little known among non-UK
historical demographers. They are mainly based on direct contact with the subjects and
are limited to the last sixty years, but the paper briefly touches on similar datasets from
earlier periods, constructed from documentary sources.
The next section briefly presents the results of an analysis, already published,
combining the LS with 1931 census data held in the Great Britain Historical GIS.
However, its main focus is a discussion of the method behind that analysis. The original
plan was to use vector overlay methods based on a detailed boundary GIS to convert
between the very different local government geographies of 1931 and 1939. Delays in
completion of the GIS necessitated development of an alternative set of methods working
more directly with the textual boundary change information provided by the census; but
this alternative methodology proved superior in any case.
The third section of the paper briefly presents the new architecture developed for the
“Great Britain Historical GIS” since 2001, heavily influenced by our experience with the
inter-war analysis. The quotation marks are necessary because the new system is not
necessarily a GIS. The conclusion argues that traditional GIS methods are poorly suited
to the needs of historical demographers because we generally lack both the data they
require and the funding.
Introducing British longitudinal microdata
Traditionally, most empirical social science has been based on cross-sectional surveys.
Historically, censuses are the most important of our surveys, but increasingly sample
surveys are preferred because they can ask more questions and be carried out more often.
In either case, even when surveys are repeated for the same populations, no linkage is
made between repeated answers from the same respondents. Meanwhile, economists
have developed very different methodologies based on time series, but generally for
highly aggregated measures lacking geography.
Repeatedly gathering information from the same individuals poses many additional
challenges, not least the need to keep the survey team itself in existence for many years.
Unsurprisingly, the initial background of the earliest studies was medical and the
immediate issues were the cost of childbirth and the quality of associated health care. The
National Survey of Health and Development (NSHD) surveyed all sixteen and a half thousand
births in England, Wales and Scotland during a specific week in March 1946. However, a
representative sample (5,362) of this population was then chosen for follow-up and has now
been studied 21 times. During childhood, interviews were at annual or two-yearly intervals (at
ages 2, 4, 6, 7, 8, 9, 10, 11, 13 and 15). Further interviews of the full cohort then took place
throughout adulthood (at 19, 20, 22, 23, 25, 26, 29, 31, 36, 43 and 53). A further follow-up at
age 60 is planned in 2006.
There are three further major birth cohort studies in the UK. The National Child
Development Study (NCDS) originally concerned the social and obstetric factors associated
with stillbirth and death in early infancy, following the lives of 17,634 children born in Great
Britain during one week of March 1958. The 1970 British Cohort Study (BCS70) covered the
births and families of all 17,634 babies born in the UK in a single week in April 1970. The
Millennium Cohort Study (MCS) began in 2000/2001, with an initial sample of 18,553 babies
born in the UK, deliberately over-representing babies born in deprived areas and areas with
high proportions of ethnic minorities. Each study has carried out further rounds of
interviewing, broadening study goals. Systematic information on the birth cohort studies is
tabulated in table??, setting out the age at which follow-ups have been conducted, the years
when follow-ups took place, and the achieved sample sizes amongst the original birth cohort
members.
Placing the birth cohort studies, and especially the 1946 study, in geographical context
currently looms large in the likely future work of the GB Historical GIS. However, this paper
is more concerned with a rather different longitudinal data set, the Office for National
Statistics Longitudinal Study (LS). This is a 1 percent representative sample of the
population of England and Wales drawn initially from the 1971 Census. The sample has been
followed up subsequently by linking information about the same individuals from the 1971,
1981 and 1991 censuses. Vital events (parenthood, death) since 1971 are also included (Fox
and Goldblatt, 1982; Britton, 1990). Because the LS data cover a large number of living
persons and were constructed without their individual consent, access is severely restricted:
firstly, each specific analysis must be approved by the LS Board; secondly, academic
researchers have no direct access to the data, instead requesting cross-tabulations; thirdly,
those cross-tabulations must be sufficiently general that cell counts of one never appear.
Unlike the birth cohort studies, the LS already contains many elderly people with declining
health. In particular, we know whether individuals died between 1981 and 1991 (the research
described here was completed before 2001 census data were added to the LS). We also know
whether individuals covered by the 1991 census experienced limiting long-term illness.
However, we clearly need information over a much longer period than 1971 to 1991 if we are
to understand the life-course as a whole, and in particular the relationship between childhood
environments and health in later life; a longer period is also needed, of course, before any
analysis of the LS could be described as ‘historical’.
This proved possible due to a fortunate accident. Individuals within the LS are identified
by their National Health Service number, the NHS Register being as close as the UK comes
to having a population register. If an individual was living in Britain on September 29th,
1939, and – a non-trivial additional requirement – they did not subsequently serve in the
armed forces, their NHS number identifies the specific local government district they were
resident in on that date. One irritating limitation of British census data is that a census has
been taken every ten years between 1801 and 2001, except 1941. A 1941 census was never
taken due to World War II, but planning for it had begun. Given the worsening political
outlook and the expectation that war would mean an unprecedented mobilisation of the
civilian population, 1941 census planning also covered a simplified population enumeration,
limited to gender and age, to be carried out on the outbreak of war. It was this National
Registration which took place on September 29th. Unfortunately, as further discussed below,
this was just after a large scale evacuation of children from many cities, especially London.
The hidden meaning of the LS ID numbers was discovered by Strachan et al (1995), and
used as the basis for an analysis of the relative importance of region of origin in 1939 and
area of residence in 1971 for ischaemic heart disease and stroke. Our aim was to go
substantially further, adding systematic statistics from the 1930s to characterise those 1939
districts, and relate recent health experience to specific risk factors in childhood. The next
section explains in detail how this was done, and what we learnt about ‘historical GIS’ in the
process. It also presents a brief summary of the results, which have already been published
elsewhere (Curtis et al, 2003). The remainder of this section is a more general discussion of
the potentials for bringing together longitudinal microdata and aggregate historical statistics,
still limited to UK examples.
Longitudinal data is historical by definition, but the examples given so far are perhaps
insufficiently historical for many historical demographers. If such datasets can be constructed
only through repeated direct contact between trained social researchers and the subjects,
datasets going significantly further back than the 1946 birth cohort will be hard to find; and
this of course means that longitudinal datasets covering all their subjects from birth to death
will not exist. However, they can be constructed retrospectively from documentary sources.
In this sense, some of the best known datasets in historical demography are longitudinal
microdata, such as all the work of the Cambridge Group and their associates on family
reconstitution from parish registers and Hollingsworth (1964) on the longevity of British
aristocrats. Both these examples have well-known limitations: family reconstitutions
inevitably exclude the geographically mobile, seriously biasing results (Ruggles, 1992), as do
most studies based on tracing individuals and families through successive census enumerators
books; aristocrats and other prominent individuals can be tracked between localities, but are
inherently unrepresentative.
In other countries, such as Belgium (Alter, 1988) and Sweden (Hoppe and Langton, 1990),
population registers and other government records enable individuals to be followed over
their lives, as they move around the country. Such official data does not exist in the UK, but
there are other ways forward. Firstly, some historical sources themselves assembled
retrospective data on individuals. Indents for convicts transported to Australia (Nicholas and
Shergold, 1987) documented their past movements within England, while Garrett et al (2001)
used the data recorded by the 1911 census on the children women had previously borne,
and on what had happened to those absent from the household in 1911, to analyse the
individual-level determinants of infant mortality. Secondly, administrative records do enable
quite detailed reconstructions of the lives of particular groups, inevitably never a random sample
from the overall population but maybe more representative than Hollingsworth’s aristocrats.
My own contribution here is a reconstruction of the lives of early 19th century engineering
workers, and especially their mobility and health histories, from trade union records
(Southall, 1991a and b; Southall and Garrett, 1991). Lastly, Pooley and Turnbull (1998)
collected the life histories assembled by family historians tracing their ancestors as they
moved between localities, mostly using the same sources as historical demographers but
willing to search through enormous volumes of material for isolated references.
However they are constructed, life histories require geographical context, and evidence
from within the life histories is often misleading. For example, the autobiographies of
tramping artisans emphasised 'travelling' as a lifecycle stage, between the completion of
apprenticeships and marriage, but statistical evidence showed much movement by older men
in response to the chronology and geography of the trade cycle (Southall, 1991a). Perhaps
the least convincing aspect of Pooley and Turnbull's analysis is their reliance on, arguably,
undocumented assertions by family historians about the reasons why their ancestors moved.
Strachan's analysis of the LS could not go deeper than 1930s regions because it lacked
systematic independent evidence on 1930s geographies.
There are broader theoretical frameworks for combining longitudinal microdata with area-based
contextual evidence. Firstly, during the 1970s and 1980s, time geographers sought not
only to add a third, temporal, dimension to quantitative geography but also to place individual
space-time trajectories within broader social processes (Carlstein et al, 1978; Parkes and
Thrift, 1980). Secondly, these ideas heavily influenced Anthony Giddens's work on
structuration, which sought to link abstract problems of social theory to developments in the
empirical methodologies of the social sciences (Giddens, 1984, esp. ch. 3: 'Time, space and
regionalisation'). It is arguable that the relatively limited impact of these ideas on empirical
social research is precisely the consequence of our limited technical tools for organising
datasets which combine individual and locality-level data over long periods of time.
Tracking individuals who are not simply static components within localities but active agents
moving within and modifying whole systems of localities is hard. One basic argument of this
paper is that large scale historical geo-demographic data will not make much of a contribution
until we abandon borrowed technologies originally designed to track the geographically
dispersed assets of utility companies, and suchlike. About the only interesting feature of an
individual electricity pylon is its particular location in Cartesian space, but this is not true of
either individuals or localities.
Contextualising the life course: linking the LS to the 1931 census
The latter parts of the previous section emphasized studies providing evidence on
migration partly because the commonest kind of longitudinal microdataset created by
British historical demographers, family reconstitution via nominal record linkage from
parish registers, necessarily neglects both migrants and migration as a process. However,
migration is precisely why we need analyses combining individual and locality-level data
over long periods. One very specific example of this need is in attempts to understand the
geography of health, and the impact of environmental factors on individual health. It is a
commonplace that people in the old industrial regions of Britain have worse health,
measured through both mortality and morbidity, than the country as a whole. One
explanation of this, however, is that these differentials are wholly or mainly the
consequence of migration: over most of the twentieth century, heavy industry employed
declining numbers of people, so the areas in which they were based experienced large
scale net out-migration. Unsurprisingly, the healthier individuals were the more likely
they were to move away, and this in itself would mean that the residual population of the
industrial areas would exhibit poorer health.
Obviously, if we are to measure the effects of the underlying environments on health
we must somehow include in our analysis individuals who grew up in the industrial areas
and then moved away. This question was the main focus of one part of a project funded
by the Economic and Social Research Council’s Health Variations Programme. Other
parts of that project used data held by the Great Britain Historical GIS and database to
analyse long-run area-level trends, from the 1920s to the present, relating annual
district-level infant mortality data to socio-economic variables from the census, but here we used
all that historical data as potential explanations for health outcomes as recorded within the
ONS Longitudinal Study. This work had clear relevance to public policy: children born
between 1925 and 1935, growing up during the worst years of the inter-war recession, are
now aged between 70 and 80. The recession had a very clear geographical focus,
especially in its later years, so could we measure the specific burden these “children of
the Great Depression” place on the modern health service?
The major practical problem with this analysis was that, as explained above, the NHS
numbers within the LS told us what local government district subjects were living in at
the time of the National Registration in 1939, but the only personal characteristics
recorded and reported on by the National Registration were gender and age. For
information on socio-economic attributes of area of residence, we must turn to the 1931
census. This is problematic because of changes in the geographical boundaries used to
define administrative areas during the 1930s. Although the basic architecture of local
government remained constant between 1931 and 1939, consisting of County Boroughs,
Municipal Boroughs, Urban Districts (all urban units), Rural Districts and London Boroughs, the detailed
geography of local government was greatly changed through a rolling programme of
county reviews: the 1931 census reports on 1,800 local government districts (LGDs), but this had been reduced to
1,472 in 1939. Further, many of the districts which were not abolished were altered
through boundary changes: 289 (19.6%) of the 1939 LGDs were new creations or had
been affected by boundary changes. A set of supplementary census reports was issued,
re-tabulating 1931 data for the units that existed after the county reviews, but
unfortunately these were again limited to gender and age.
Some method was therefore needed to re-cast 1931 data to 1939 units based only on the
published information available to us. Our original plan was to use our changing
boundary GIS to redistrict from 1931 to 1939, estimating what proportion of each 1931
district’s data should be assigned to each 1939 district using more detailed parish-level
population statistics and the assumption that population was evenly distributed over each
parish: by overlaying vector boundaries for each administrative geography, we could
then calculate a set of conversion weights. Unfortunately, technical problems and in
particular the enormous problems identifying inconsistencies in our arc-based GIS
(Gregory and Southall, 1998) meant that both our local government district-level and
parish-level changing boundary GISs were not ready for analytic use even at the start of
2001, and we had to meet the overall timetable of the Health Variations Programme. The
LGD-level changing boundary GIS has since been completed but the much larger
parish-level system had to be abandoned. We eventually completed a set of conventional static
boundary GIS coverages for each census, and are now working towards using these to
create a full time-variant system within the new polygon-based spatial database described
below.
We had therefore to find some other method for converting 1931 census data to the
1939 geography without using the non-existent GIS. The method we actually used was
based entirely on textual evidence from various official reports: the 1931 census reports
obviously list the 1931 population of each LGD as it was defined in 1931; but the 1939
National Registration report also lists the 1931 population of each LGD as defined in
1939; and, crucially, the lists of boundary changes in the Registrar General’s annual
Statistical Reviews gave the 1931 populations of the areas transferred, as well as the
names of the losing and gaining districts. Very fortunately, those lists of changes had
been computerised much earlier in the life of the project, simply to provide contextual
evidence in the construction of the changing boundary GIS.
A geography conversion table was therefore constructed from the 1931 and 1939
reports plus the 1,805 boundary changes listed for the intervening period, using 1931
populations rather than geographical areas. By linking this table to 1931 census data, we
can cut data for the 1,800 1931 districts up into 2,916 fragments, and then reassemble
them into the 1,472 1939 units. The table was very carefully cross-checked by comparing
the 1931 populations of 1939 units computed by applying the boundary change
information to the 1931 census figures, with the 1931 populations listed in the 1939
report. This method avoids any problematic assumptions about population density.
However, we are still assuming that the population of a district had uniform
socio-economic characteristics; for example, that there would have been an equal proportion of
unemployed workers in the middle of a town and on the rural fringe. That said, boundary
changes by which part of an urban area was transferred to the surrounding rural district
were very unlikely: most changes consisted of either part of a rural area being transferred
into an urban unit, or very small urban units being abolished through merger with an
adjacent urban area.
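
The logic of the conversion table can be illustrated with a minimal sketch. This is not the project's own code: the district names, populations and unemployment counts below are invented, and the function names are hypothetical. Each fragment records the 1931 population of the part of a 1931 district that ended up in a given 1939 district; weights are population shares, and the cross-check compares the fragments summed by 1939 unit with the 1931 populations published for those units.

```python
from collections import defaultdict

# Invented fragments: (district_1931, district_1939, 1931 population of fragment).
FRAGMENTS = [
    ("Exampleton RD", "Exampleton RD", 9000),
    ("Exampleton RD", "Exampleton UD", 1000),     # rural area transferred to the town
    ("Exampleton UD", "Exampleton UD", 5000),
    ("Little Example UD", "Exampleton UD", 500),  # small urban unit abolished by merger
]

def conversion_weights(fragments):
    """Share of each 1931 district's population assigned to each 1939 district."""
    totals = defaultdict(int)
    for d31, _, pop in fragments:
        totals[d31] += pop
    weights = defaultdict(float)
    for d31, d39, pop in fragments:
        weights[(d31, d39)] += pop / totals[d31]
    return dict(weights)

def redistrict(values_1931, fragments):
    """Re-cast a 1931 count onto 1939 units, assuming the count is spread
    across each 1931 district in proportion to population."""
    out = defaultdict(float)
    for (d31, d39), weight in conversion_weights(fragments).items():
        out[d39] += weight * values_1931[d31]
    return dict(out)

def cross_check(fragments, published_1931_pop_of_1939_units):
    """Return discrepancies between computed and published 1931 populations."""
    computed = defaultdict(int)
    for _, d39, pop in fragments:
        computed[d39] += pop
    return {
        unit: computed[unit] - published
        for unit, published in published_1931_pop_of_1939_units.items()
        if computed[unit] != published
    }

# Invented 1931 counts of unemployed workers by 1931 district.
unemployed_1931 = {"Exampleton RD": 800, "Exampleton UD": 600, "Little Example UD": 50}
estimate = redistrict(unemployed_1931, FRAGMENTS)
```

An empty result from `cross_check` means the fragments reproduce the published figures. The uniformity assumption discussed above enters only in `redistrict`, where a district's count is split by population share.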
[The final version will go into substantially more detail here, in particular discussing
the relative accuracy of redistricting using area-based weights derived from the
parish-level GIS and using a population weighted geography conversion table directly derived
from the census boundary change data. The basic conclusion is that population-based
weights are far more accurate, and that in urban areas quite small inaccuracies in
boundaries within the GIS can lead to large numbers of people being incorrectly reassigned; for example, across the Mersey between Liverpool and Birkenhead, areas never
linked in any pre-1974 administrative geography.]
Conditions in the area of residence during childhood appear to have had a measurable
association with health outcomes later in life, even after allowing for more recent
circumstances. Our results show that those who had lived in areas classified in 1934 as
‘depressed areas’ had a relative risk of mortality, or of illness reporting, which was
14-15% higher than those who had not been registered as living in such areas in 1939.
These ‘depressed areas’ were mainly in the north of England (and in Wales) and they had
particularly high levels of unemployment during the 1930s. They include mining areas in
regions such as the north east of England, which also had high levels of unemployment
(especially for men) in the 1990s and have been shown to have particularly high
prevalence of long term illness reported in the 1991 census. Some authors (e.g. Haynes et
al, 1997) have suggested that this may have been affected by local labour market
conditions in the late 1980s and early 1990s and that in areas of high unemployment,
people were more likely to declare themselves to have long term illnesses preventing
them from working. Our results certainly support the view that men who were
unemployed in 1981 had relatively poor health outcomes by 1991.
The effects of unemployment on health outcomes appear to have been less striking for
women. It therefore might be argued that the data on ‘depressed area’ status is acting as a
marker for areas of special health disadvantage in the 1980s, rather than the 1930s.
However, data on individual employment status in 1981, local unemployment levels in
1981 and broad region of residence were included in our models, which still show an
independent effect of area deprivation in the 1930s. Furthermore the relative risks
reported for poor health outcomes linked to ‘depressed area’ conditions in the 1930s are
similar for men and women. This suggests a broader, early influence of community level
deprivation, distinct from later effects of individual unemployment and labour market
difficulties in the 1980s. Therefore another possible explanation for poor health today, in
areas like the north east of England and Wales, may be that this is a legacy of deprived
environments experienced in childhood.
Geographical information without GIS
The previous section is a barely disguised but only partial autobiographical sketch for
2001: at the start of the year the construction of our changing boundary parish-level GIS
was clearly in deep trouble, but there appeared to be no alternative but to raise more
money and press on. By the end of the year, the bare bones of our new architecture were
in place. In this case, desperation very clearly was the mother of invention, and the most
specific source of that desperation was the need to meet the substantive goals of the
project funded by the ESRC Health Variation Programme without waiting for the
completion of the parish-level GIS. However, there is another narrative of 2001. The
year saw much travel funded by an ESRC fellowship, by an amazingly generous lecturing
fee from the University of Michigan, and via the Electronic Cultural Atlas Initiative.
Highlights included meeting Wendy Thomas and Bill Block of the University of
Minnesota, principal architects of the Data Documentation Initiative's Aggregate Data
Extension, and discussing data models for gazetteers in Taipei and Shanghai; Linda Hill
and Lex Berman need particular mentions here. Lastly, by the end of the year work had
begun on a very large new project funded by the UK National Lottery which both
required us to rebuild the system from ground up and made it possible. All that can be
presented here are the conclusions from this odyssey.

Before you construct a GIS of historic administrative boundaries, a reliable
master list of administrative units is essential. This is something the original GB
Historical GIS got half right: we did computerize the 1911 census parish-table
before digitizing the administrative area maps from the 1900s which were the
starting point for the changing boundary system, but we then added to the GIS
boundaries for units abolished before 1911 or created after 1911 without extending
the 1911-derived authority list. They were therefore identified only by names held
within label points.

Units can change their names, locations and hierarchical relationships, so the
only reliable way of identifying them is via ID numbers. Because so much can
change over time, only very limited information should be held in the master unit
record. We now hold an ID number, a type best understood as telling us what
coverage the unit belongs to, and dates of creation and abolition, plus source
information.

You need to be able to hold multiple names and hierarchical relationships for
each unit; in other words, you need to record this information not simply as a list,
in a spreadsheet or similar, but in a database where geographic names exist as a
child table to the list of units. In British records, administrative units like parishes
can be part of more than one higher level unit both concurrently and consecutively;
these many-to-many “IsPartOf” relationships need to be held in another child table
which identifies the higher and lower unit involved in each relationship. In the
British case, we also need another child table to hold status, i.e. information about
the kind of unit which is more detailed than type, and liable to change. This is very
relevant to the work with the LS: pre-1974 local government districts were divided
(mainly) into Rural Districts, Urban Districts, Municipal Boroughs and County
Boroughs. Many urban units changed status from UD to MB or MB to CB, while
Urban Districts and Rural Districts within the same county were usually named after
the same town.

Having constructed the three or four-table ontology or poly-hierarchic
thesaurus outlined above, use it as a framework for as much information as
possible (an ontology is basically a set of propositions about what exists, but the
concept is now widely used in information science to describe knowledge bases,
often in connection with semantic web development). It turns out that a structure
like this can organize a vast body of place-specific information without the
enormous costs of boundary mapping: boundary changes can be held as another
kind of relationship; statistical data can be held in another child table to the unit
table; our future plans include holding geography conversion data as yet another
kind of relationship. The way we link these additional data into the system is by
matching on names, systematically adding variant names to the names table as we
go. NB at this point you will have a system capable not just of performing the 1931
to 1939 conversion, but of holding a full audit trail for the data conversions – all
without mapping a single boundary.
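
The unit, name, status and relationship tables described above might be sketched as follows. This is an illustration only, using SQLite through Python's standard library rather than the project's actual database; every table name, column name and data row is an assumption made for the example. It shows why matching on name alone is not enough when an urban and a rural district share a town's name, and how status and date resolve the ambiguity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE unit (              -- master record: deliberately minimal
    unit_id   INTEGER PRIMARY KEY,
    unit_type TEXT NOT NULL,
    created   INTEGER,
    abolished INTEGER,
    source    TEXT
);
CREATE TABLE unit_name (         -- any number of variant names per unit
    unit_id INTEGER NOT NULL REFERENCES unit(unit_id),
    name    TEXT NOT NULL
);
CREATE TABLE unit_status (       -- status can change over a unit's life
    unit_id   INTEGER NOT NULL REFERENCES unit(unit_id),
    status    TEXT NOT NULL,
    from_year INTEGER,
    to_year   INTEGER
);
CREATE TABLE relationship (      -- many-to-many links such as IsPartOf
    lower_id  INTEGER NOT NULL REFERENCES unit(unit_id),
    higher_id INTEGER NOT NULL REFERENCES unit(unit_id),
    rel_type  TEXT NOT NULL,
    from_year INTEGER,
    to_year   INTEGER
);
""")

# Invented example rows.
conn.executemany("INSERT INTO unit VALUES (?, ?, ?, ?, ?)", [
    (1, "LG_DIST", 1894, None, "invented"),
    (2, "LG_DIST", 1894, None, "invented"),
    (3, "ADMIN_COUNTY", 1889, None, "invented"),
])
conn.executemany("INSERT INTO unit_name VALUES (?, ?)", [
    (1, "Exampleton"), (1, "Exampleton Borough"),
    (2, "Exampleton"), (3, "Exampleshire"),
])
conn.executemany("INSERT INTO unit_status VALUES (?, ?, ?, ?)", [
    (1, "UD", 1894, 1935), (1, "MB", 1935, None), (2, "RD", 1894, None),
])
conn.executemany("INSERT INTO relationship VALUES (?, ?, ?, ?, ?)", [
    (1, 3, "IsPartOf", 1894, None), (2, 3, "IsPartOf", 1894, None),
])

def match_unit(name, status, year):
    """Find unit IDs by variant name, status and date (case-insensitive)."""
    rows = conn.execute("""
        SELECT DISTINCT u.unit_id
        FROM unit u
        JOIN unit_name n   ON n.unit_id = u.unit_id
        JOIN unit_status s ON s.unit_id = u.unit_id
        WHERE lower(n.name) = lower(?) AND s.status = ?
          AND s.from_year <= ? AND (s.to_year IS NULL OR s.to_year > ?)
        ORDER BY u.unit_id
    """, (name, status, year, year)).fetchall()
    return [row[0] for row in rows]

def parts_of(higher_id):
    """List the units recorded as being part of a higher-level unit."""
    rows = conn.execute(
        "SELECT lower_id FROM relationship "
        "WHERE higher_id = ? AND rel_type = 'IsPartOf' ORDER BY lower_id",
        (higher_id,)).fetchall()
    return [row[0] for row in rows]
```

Boundary changes and statistical data would be further child tables keyed on `unit_id`; they are omitted here to keep the sketch short.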

Do everything you can to avoid or delay boundary mapping. Reconstructing
historic boundaries is very expensive in terms of time and money (and NB the
author has raised and spent about £1.6m, over ten years, on what most people view
as a small island). See how far you can get with a text database as already
described. See how far modern digital boundaries provide the lines you need.
Consider adding point coordinates for each of your lowest level units, and then
simply compute a set of Thiessen polygons around those coordinates. This may
sound unacceptably crude, but if you then construct polygons for your higher level
by assembling the generated polygons for the lower-level units, using the “IsPartOf”
relationships already constructed, the results will often be acceptably accurate.
When we print out our parish-level map for the whole country, even at wall chart
size, it is quite hard to tell that most of the parishes are not simply Thiessen
polygons.
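
The idea behind the Thiessen approach can be shown without any GIS software. By definition, the Thiessen polygon around a point is the set of locations nearer to that point than to any other, so a nearest-point test assigns any location to a parish, and the “IsPartOf” relationships then give the higher-level unit. The sketch below uses invented coordinates and identifiers, and it does not construct polygon geometry, which would need a computational geometry library.

```python
import math

# Invented point coordinates for four lowest-level units.
PARISH_POINTS = {
    "P1": (0.0, 0.0),
    "P2": (4.0, 0.0),
    "P3": (0.0, 4.0),
    "P4": (4.0, 4.0),
}

# Invented IsPartOf relationships to the higher-level units.
IS_PART_OF = {
    "P1": "District A",
    "P2": "District A",
    "P3": "District B",
    "P4": "District B",
}

def thiessen_parish(x, y, points=PARISH_POINTS):
    """Return the parish whose point is nearest, i.e. whose Thiessen
    polygon contains the location (x, y)."""
    return min(points, key=lambda pid: math.dist((x, y), points[pid]))

def thiessen_district(x, y):
    """Assign a location to a higher-level unit via its Thiessen parish."""
    return IS_PART_OF[thiessen_parish(x, y)]
```

Errors in the lower-level assignment cancel out whenever the true and the generated parish belong to the same district, which is one way to see why the assembled higher-level polygons can be acceptably accurate.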

When you do start boundary mapping, be thorough. One of the hardest lessons
we have learnt is that it is very hard to add a greater level of detail to an existing
system: the GB Historical GIS mapped the 600 or so 19th century Registration
Districts quite quickly in 1994-5, but then had to do them all over again so we had
boundaries that matched those of the component parishes.

Most of your data will be statistics and other attribute data, not boundaries
or any other locational data – so think hard about how to organize them. This is a
vast set of issues: tables of numbers in censuses and other statistical reports convert
so easily into spreadsheets and database tables that researchers rarely give this much
thought; but as you computerize more and more of a country’s “statistical heritage”
you soon discover you have hundreds and hundreds of tables, and while a
systematic catalogue will help you find data it will not be that much help in using
them. The GB Historical GIS has now migrated a large part – currently, about
10.5m data values – into a new architecture where all the data are held in a single
column of a table with about 10.5m rows. Other columns in that table hold a date
and one of the administrative unit IDs described above. The obvious question is
how do we record what each number measures. All that can be said here is that we
use a metadata structure based on the Aggregate Data Extension developed by the
Data Documentation Initiative (http://www.icpsr.umich.edu/DDI).
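
A minimal sketch of the single-column arrangement follows. The `variable` key and the `VARIABLES` lookup are placeholders standing in for the DDI-based metadata structure, which is not reproduced here; all names and figures are invented. Every number sits in one `value` field, alongside a unit ID and a date, and a conventional table is rebuilt on demand.

```python
from typing import NamedTuple

class DataValue(NamedTuple):
    unit_id: int      # administrative unit ID from the master list
    year: int         # date the value refers to
    variable: str     # placeholder key into the metadata
    value: float      # the single column holding every number

# Placeholder metadata: what each variable key measures.
VARIABLES = {
    "POP_TOTAL": "Total population",
    "UNEMP_MALE": "Unemployed males",
}

# Invented data values.
DATA = [
    DataValue(1, 1931, "POP_TOTAL", 5000),
    DataValue(1, 1931, "UNEMP_MALE", 600),
    DataValue(2, 1931, "POP_TOTAL", 10000),
    DataValue(2, 1931, "UNEMP_MALE", 800),
]

def table_for(year, variables, data=DATA):
    """Pivot the long rows back into a unit-by-variable table."""
    table = {}
    for row in data:
        if row.year == year and row.variable in variables:
            table.setdefault(row.unit_id, {})[row.variable] = row.value
    return table
```

Adding a new census table then means adding rows and metadata entries, not creating another database table.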

All this is very expensive. To justify the cost, and get the grants, the system
must be designed for use and re-use by as many different user communities as
possible. It must be based on well-documented open standards – and until
recently commercial GIS software paid scant attention to this. Our use of the DDI
standard has already been mentioned. We also support a range of standards
developed by the Open Geospatial Consortium (http://www.opengeospatial.org) and
the Alexandria Digital Library’s gazetteer development
(http://alexandria.sdc.ucsb.edu/~lhill/adlgaz).

Store your locational data in a spatial database, not a GIS. What is now the GB
Historical GIS began as the Labour Markets Database in the early 1990s, held in a
relational database management system, not a GIS. Boundary mapping work in the
late 1990s left us with a large body of statistics in much the same structure as
before, managed by Oracle RDBMS software, plus spatial data managed by
ArcInfo. Crucially, ArcInfo could access Oracle but not vice versa so all analyses
had to be done within ArcInfo – which was the tail wagging the dog. Today,
however, it is possible to hold the boundary data within the same object-relational
database as everything else. This does mean holding your data in a heavy duty
system, not in user friendly packages like Access or Filemaker Pro. Options include
Oracle and IBM’s DB2, though not MS SQL Server, and also the two main open source
database systems, MySQL and PostgreSQL. Working with PostgreSQL, the more
mature of the open source solutions, involves a spatial extension called PostGIS.
While these tools are less friendly than simpler database packages, using them will be
no harder than additionally using a quite separate GIS package.
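
The practical gain from holding boundaries alongside statistics is that a single query can combine the two. The sketch below is an illustration under stated assumptions, not our production code: SQLite stands in for a spatial database, boundaries are stored as well-known text (WKT), and a small Python function registered as `wkt_area` plays the role that a built-in function such as PostGIS's `ST_Area` would play. The table names, unit IDs, coordinates and populations are all invented.

```python
import sqlite3


def wkt_polygon_area(wkt):
    """Area of a simple single-ring WKT polygon, using the shoelace formula."""
    inner = wkt.strip()[len("POLYGON(("):-len("))")]
    pts = [tuple(float(n) for n in pair.split()) for pair in inner.split(",")]
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return abs(twice_area) / 2.0


def build_spatial_demo():
    """Statistics and boundaries held side by side in one database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stats    (unit_id TEXT PRIMARY KEY, population REAL NOT NULL);
        CREATE TABLE boundary (unit_id TEXT PRIMARY KEY, wkt TEXT NOT NULL);
    """)
    # Invented units: a 4 x 3 rectangle and a 2 x 2 square.
    conn.executemany("INSERT INTO stats VALUES (?, ?)",
                     [("UNIT_A", 1200.0), ("UNIT_B", 100.0)])
    conn.executemany("INSERT INTO boundary VALUES (?, ?)",
                     [("UNIT_A", "POLYGON((0 0, 4 0, 4 3, 0 3, 0 0))"),
                      ("UNIT_B", "POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))")])
    return conn


def population_density(conn):
    """Compute density inside the database: the query calls the spatial function."""
    conn.create_function("wkt_area", 1, wkt_polygon_area)
    rows = conn.execute("""
        SELECT s.unit_id, s.population / wkt_area(b.wkt)
        FROM stats s JOIN boundary b ON b.unit_id = s.unit_id
        ORDER BY s.unit_id
    """).fetchall()
    return dict(rows)
```

Here the database drives the analysis and calls on spatial functions as needed, which reverses the earlier arrangement in which every analysis had to begin inside the GIS package.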

Maps consisting just of boundary lines are hard to interpret, but don’t start
digitizing the railways, the roads, the individual buildings and so on; scan and
geo-reference historic maps as backdrops. Big historical GIS systems can be like
big train sets; there is always another piece that needs to be added to complete the
system. Given how relatively cheap it is to simply scan historic maps and then
geo-reference these images, it is surprising how few projects have done it. The GB
Historical GIS now holds three complete sets of one inch to the mile maps of Great
Britain, which can be called up to appear underneath our statistical maps.
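
Geo-referencing a scanned map amounts to attaching a transformation from pixel positions to map coordinates. The sketch below assumes the common six-parameter "world file" convention, in which the parameters are listed in the order A, D, B, E, C, F; the paper does not state which method the project used, and the class name, function name and coordinate values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldFile:
    a: float  # map units per pixel in x
    d: float  # rotation term: change in y per column
    b: float  # rotation term: change in x per row
    e: float  # map units per pixel in y (normally negative: rows run downwards)
    c: float  # x coordinate of the centre of the upper-left pixel
    f: float  # y coordinate of the centre of the upper-left pixel

    def pixel_to_map(self, col, row):
        """Affine transform from image (col, row) to map (x, y)."""
        x = self.a * col + self.b * row + self.c
        y = self.d * col + self.e * row + self.f
        return x, y

    def map_to_pixel(self, x, y):
        """Inverse transform, used to find which pixel lies under a map point."""
        det = self.a * self.e - self.b * self.d
        if det == 0:
            raise ValueError("world file parameters are not invertible")
        dx, dy = x - self.c, y - self.f
        col = (self.e * dx - self.b * dy) / det
        row = (-self.d * dx + self.a * dy) / det
        return col, row


def parse_world_file(text):
    """Read the six numbers of a world file, in the order A, D, B, E, C, F."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    a, d, b, e, c, f = values
    return WorldFile(a=a, d=d, b=b, e=e, c=c, f=f)
```

With an unrotated scan at ten map units per pixel, `parse_world_file("10 0 0 -10 400000 300000").pixel_to_map(100, 50)` places that pixel at a map position 1,000 units east and 500 units south of the upper-left corner. Six numbers per image are all that is needed to drop a scanned sheet underneath a statistical map, which is why this is so much cheaper than digitizing individual features.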
Conclusion
[Currently this is just the original abstract]
This paper argues that geographical analysis of demographic data does not require GIS
technology, and that historical demographers should instead explore alternative
technologies centered on data structures known as ontologies or polyhierarchic thesauri.
The first part of the paper explains the methodology used in an analysis linking the UK
Office of National Statistics' Longitudinal Study, a 1% sample of the population of
England and Wales combining individual-level data from the 1971, 1981 and 1991
censuses, plus vital registration data, with area-level statistics from the 1931 census. The
purpose was to analyse the impact of living in high-unemployment areas in childhood on
health and survival chances in later life. Our methods relied not on GIS, but on very
systematic inventories of administrative units and manipulation of the purely textual
information on boundary changes published in census reports. This work led directly into
the development of a quite new architecture for the new "Great Britain Historical GIS",
which makes no use of commercial GIS software. The heart of the system is the
demographic data, not polygons, supported by three main sub-systems. The simplest of
these identifies sources. Our data documentation system, based on the DDI standard and
structured as an ontology, records what data items measure. Our gazetteer, again
organised as an ontology or polyhierarchic thesaurus, enables us to hold data for units
whose locations are unknown but can hold a detailed record of changing boundaries when
such data are available. Because of its ability to work with incomplete data, this system is
far more appropriate to historical research than a conventional GIS. This architecture
supports our public web site.
Acknowledgments
This paper draws extensively on text written by collaborators, and any published
version will involve co-authors. I must take full responsibility for the current draft,
while noting that analytical results linking GIS data to the ONS Longitudinal Study are
taken from a paper jointly authored by Sarah Curtis and Peter Congdon of Queen Mary,
University of London, and Brian Dodgeon of the Institute of Education, as well as myself.
The description of the birth cohort studies is based on work by Alissa Goodman of the
Institute for Fiscal Studies.
References
Alter, G. (1988) Family and the Female Life Course: The women of Verviers, Belgium,
1849-1880 (Madison: U. of Wisconsin Press).
Britton, M. (1990), ‘Sources of Data and Limitations’ in Britton M. (ed) Mortality and
Geography: a Review in the mid-1980s England and Wales. OPCS series DS no. 9
(London: HMSO), chapter 2, pp. 6-17.
Carlstein, T. et al (1978), Human Activity and Time Geography (London: Edward
Arnold).
Curtis, S., Southall, H., Congdon, P., Dodgeon, B. (2003) ‘Analysis of the Longitudinal
Study sample in England using new data on area of residence in childhood’, Social
Science and Medicine, 57(12).
Fox, A. J. and Goldblatt, P. (1982), Longitudinal Study: Socio-economic mortality
differentials, 1971-75, series LS no.1 (London: HMSO).
Garrett, E., Reid, A., Schurer, K. and Szreter, S. (2001) Changing Family Size in England
and Wales: Place, Class and Demography (Cambridge: CUP).
Giddens, A. (1984), The Constitution of Society: Outline of the Theory of Structuration
(Cambridge: Polity Press).
Gregory, I.N., and Southall, H.R. (1998) ‘Putting the Past in Its Place: the Great Britain
Historical GIS’, in Carver, S. (ed.) Innovations in GIS 5 (Taylor & Francis), pp. 210-21.
Haynes, R., Bentham, G., Lovett, A., Eimermann, J. (1997) Effect of labour market
conditions on reporting of limiting long term illness and permanent sickness in
England and Wales, Journal of Epidemiology and Community Health, 51,3, 282-8.
Hollingsworth, T.H. (1964), 'The demography of the British Peerage', Population Studies,
suppl., 18: pp. 3-107.
Langton, J., and Hoppe, G., (1990) 'Urbanization, social structure and population
circulation in pre-industrial times: flows of people through Vadstena (Sweden) in the
mid-nineteenth century' in Corfield, P.J., & Keene, D. (eds.) Work in Towns 850-1850
(Leicester), pp.138-63.
Nicholas, S., and Shergold, P.R. (1987) 'Internal Migration in England, 1817-1839',
Journal of Historical Geography, vol.13, pp. 155-68.
Parkes, D.N., and Thrift, N.J. (1980) Times, Spaces, and Places: A chronogeographic
perspective (Chichester: Wiley).
Pooley, C., and Turnbull, J. (1998), Migration and mobility in Britain since the eighteenth
century (London: UCL Press).
Ruggles, S. (1992), 'Migration, Marriage, and Mortality: Correcting Sources of Bias in
English Family Reconstitutions', Population Studies, vol. 46, pp. 507-22.
Southall, H.R. (1991a) 'The Tramping Artisan Revisits: Labour mobility and economic
distress in early Victorian England', Economic History Review, II, Vol.44, pp.272-96.
Southall, H.R. (1991b) 'Mobility, the Artisan Community, and Popular Politics in early
nineteenth century England', in G.Kearns & C.W.Withers (eds.), Urbanising Britain:
class and community in the nineteenth century (Cambridge UP, 1991), pp. 103-30.
Southall, H.R., & Garrett, E. (1991) 'Morbidity and Mortality among mid-Nineteenth
Century Artisans', Social History of Medicine, vol.4, pp.231-52.
Strachan, D.P. Leon, D.A., and Dodgeon, B. (1995) ‘Mortality from cardiovascular
disease among interregional migrants in England and Wales’, British Medical
Journal, 310, pp. 423-7.
Figure 1:
The four birth cohort studies, age and year at data collection
(full cohort follow-ups only), and longitudinal sample size (compiled by Alissa Goodman)
[Chart not reproduced: rows are ages 0 to 60; columns are the four studies, each cell giving
the year and sample size of a follow-up. The column headings are:]
NSHD 1946 n=5,362
NCDS 1958 n=17,415
BCS70 1970 n=16,571
MCS 2000/2001 n=18,553
Figure 2:
“Example of an individual's path in a time-space coordinate system. … In the example, the
individual starts from the home and visits his workplace, a bank, his work place and, finally, a
post office, before returning home.” (From p. 164 of Carlstein et al, 1978)