Putting People into Past Places: Geographical information and British longitudinal microdata

Humphrey Southall (Department of Geography, University of Portsmouth/Great Britain Historical GIS)

Introduction

This paper has two linked aims. The first is to discuss the potential for linking longitudinal microdatasets (basically, individual life histories) with aggregate historical data for localities. The second is to present an alternative methodology for managing large-scale aggregate datasets, one which differs strikingly from traditional historical applications of Geographical Information Systems technology. The two aims are connected by an analysis linking the Office for National Statistics Longitudinal Study (LS), which brings together information about particular people from the 1971, 1981 and 1991 censuses, with information about the areas these people were living in as children in the 1930s.

The paper begins with a brief survey of available longitudinal datasets. These are arguably the most important datasets created by British empirical social science, and are certainly the most expensive; but they are relatively little known among non-UK historical demographers. They are mainly based on direct contact with the subjects and are limited to the last sixty years, but the paper briefly touches on similar datasets from earlier periods, constructed from documentary sources. The next section briefly presents the results of an analysis, already published, combining the LS with 1931 census data held in the Great Britain Historical GIS. However, its main focus is a discussion of the method behind that analysis. The original plan was to use vector overlay methods based on a detailed boundary GIS to convert between the very different local government geographies of 1931 and 1939.
Delays in completing the GIS forced us to develop an alternative set of methods working more directly with the textual boundary change information provided by the census, and this alternative methodology proved superior in any case. The third section of the paper briefly presents the new architecture developed for the “Great Britain Historical GIS” since 2001, heavily influenced by our experience with the inter-war analysis. The quotation marks are necessary because the new system is not necessarily a GIS. The conclusion argues that traditional GIS methods are poorly suited to the needs of historical demographers because we generally lack both the data they require and the funding.

Introducing British longitudinal microdata

Traditionally, most empirical social science has been based on cross-sectional surveys. Historically, censuses are the most important of our surveys, but increasingly sample surveys are preferred because they can ask more questions and be carried out more often. In either case, even when surveys are repeated for the same populations, no linkage is made between repeated answers from the same respondents. Meanwhile, economists have developed very different methodologies based on time series, but generally for highly aggregated measures lacking geography.

Repeatedly gathering information from the same individuals poses many additional challenges, not least the need to keep the survey team itself in existence for many years. Unsurprisingly, the initial background of the earliest studies was medical and the immediate issues were the cost of childbirth and the quality of associated health care. The National Survey of Health and Development (NSHD) surveyed all sixteen and a half thousand births in England, Wales and Scotland during a specific week in March 1946. Only a representative sample (5,362) of this population was then chosen for follow-up, and it has now been studied 21 times.
During childhood, interviews took place every one or two years (at ages 2, 4, 6, 7, 8, 9, 10, 11, 13 and 15). Further interviews of the full cohort then took place throughout adulthood (at 19, 20, 22, 23, 25, 26, 29, 31, 36, 43 and 53). A further follow-up at age 60 is planned in 2006.

There are three further major birth cohort studies in the UK. The National Child Development Study (NCDS) originally concerned the social and obstetric factors associated with stillbirth and death in early infancy, following the lives of 17,634 children born in Great Britain during one week of March 1958. The 1970 British Cohort Study (BCS70) covered the births and families of all 17,634 babies born in the UK in a single week in April 1970. The Millennium Cohort Study (MCS) began in 2000/2001, with an initial sample of 18,553 babies born in the UK, deliberately over-representing babies born in deprived areas and areas with high proportions of ethnic minorities. Each study has carried out further rounds of interviewing, broadening its goals. Systematic information on the birth cohort studies is tabulated in table??, setting out the age at which follow-ups have been conducted, the years when follow-ups took place, and the achieved sample sizes amongst the original birth cohort members.

Placing the birth cohort studies, and especially the 1946 study, in geographical context currently looms large in the likely future work of the GB Historical GIS. However, this paper is more concerned with a rather different longitudinal dataset, the Office for National Statistics Longitudinal Study (LS). This is a 1 percent representative sample of the population of England and Wales drawn initially from the 1971 Census. The sample has been followed up subsequently by linking information about the same individuals from the 1971, 1981 and 1991 censuses. Vital events (parenthood, death) since 1971 are also included (Fox and Goldblatt, 1982; Britton, 1990).
Because the LS data cover a large number of living persons and were constructed without their individual consent, access is severely restricted: firstly, each specific analysis must be approved by the LS Board; secondly, academic researchers have no direct access to the data, instead requesting cross-tabulations; thirdly, those cross-tabulations must be sufficiently general that cell counts of one never appear.

Unlike the birth cohort studies, the LS already contains many elderly people with declining health. In particular, we know whether individuals died between 1981 and 1991 (the research described here was completed before 2001 census data were added to the LS). We also know whether individuals covered by the 1991 census experienced limiting long-term illness. However, we clearly need information over a much longer period than 1971 to 1991 if we are to understand the life-course as a whole, and in particular the relationship between childhood environments and health in later life; a longer period is also needed, of course, before any analysis of the LS could be described as ‘historical’.

Extending the period proved possible due to a fortunate accident. Individuals within the LS are identified by their National Health Service number, the NHS Register being as close as the UK comes to having a population register. If an individual was living in Britain on September 29th, 1939, and did not subsequently serve in the armed forces (a non-trivial additional requirement), their NHS number identifies the specific local government district they were resident in on that date. One irritating limitation of British census data is that, although a census was taken every ten years between 1801 and 2001, there was none in 1941. The 1941 census was never taken due to World War II, but planning for it had begun.
Given the worsening political outlook and the expectation that war would mean an unprecedented mobilisation of the civilian population, 1941 census planning also covered a simplified population enumeration, limited to gender and age, to be carried out on the outbreak of war. It was this National Registration which took place on September 29th. Unfortunately, as further discussed below, this was just after a large-scale evacuation of children from many cities, especially London.

The hidden meaning of the LS ID numbers was discovered by Strachan et al (1995), and used as the basis for an analysis of the relative importance of region of origin in 1939 and area of residence in 1971 for ischaemic heart disease and stroke. Our aim was to go substantially further, adding systematic statistics from the 1930s to characterise those 1939 districts, and to relate recent health experience to specific risk factors in childhood. The next section explains in detail how this was done, and what we learnt about ‘historical GIS’ in the process. It also presents a brief summary of the results, which have already been published elsewhere (Curtis et al, 2003). The remainder of this section is a more general discussion of the potential for bringing together longitudinal microdata and aggregate historical statistics, still limited to UK examples.

Longitudinal data are historical by definition, but the examples given so far are perhaps insufficiently historical for many historical demographers. If such datasets can be constructed only through repeated direct contact between trained social researchers and the subjects, datasets going significantly further back than the 1946 birth cohort will be hard to find; and this of course means that longitudinal datasets covering all their subjects from birth to death will not exist. However, they can be constructed retrospectively from documentary sources.
In this sense, some of the best known datasets in historical demography are longitudinal microdata, such as all the work of the Cambridge Group and their associates on family reconstitution from parish registers and Hollingsworth (1964) on the longevity of British aristocrats. Both these examples have well-known limitations: family reconstitutions inevitably exclude the geographically mobile, seriously biasing results (Ruggles, 1992), as do most studies based on tracing individuals and families through successive census enumerators' books; aristocrats and other prominent individuals can be tracked between localities, but are inherently unrepresentative.

In other countries, such as Belgium (Alter, 1988) and Sweden (Langton and Hoppe, 1990), population registers and other government records enable individuals to be followed over their lives, as they move around the country. Such official data do not exist in the UK, but there are other ways forward. Firstly, some historical sources themselves assembled retrospective data on individuals. Indents for convicts transported to Australia (Nicholas and Shergold, 1987) documented their past movements within England, while Garrett et al (2001) used the data recorded by the 1911 census on the children women had previously borne, and on what had happened to those absent from the household in 1911, to analyse the individual-level determinants of infant mortality. Secondly, administrative records do enable quite detailed reconstructions of the lives of particular groups, inevitably never a random sample from the overall population but maybe more representative than Hollingsworth’s aristocrats. My own contribution here is a reconstruction of the lives of early 19th century engineering workers, and especially their mobility and health histories, from trade union records (Southall, 1991a and b; Southall and Garrett, 1991).
Lastly, Pooley and Turnbull (1998) brought together the life histories assembled by family historians tracing their ancestors as they moved between localities, mostly using the same sources as historical demographers but willing to search through enormous volumes of material for isolated references.

However they are constructed, life histories require geographical context, and evidence from within the life histories is often misleading. For example, the autobiographies of tramping artisans emphasised 'travelling' as a lifecycle stage, between the completion of apprenticeships and marriage, but statistical evidence showed much movement by older men in response to the chronology and geography of the trade cycle (Southall, 1991a). Perhaps the least convincing aspect of Pooley and Turnbull's analysis is their reliance on arguably undocumented assertions by family historians about the reasons why their ancestors moved. Strachan's analysis of the LS could not go deeper than 1930s regions because it lacked systematic independent evidence on 1930s geographies.

There are broader theoretical frameworks for combining longitudinal microdata with area-based contextual evidence. Firstly, during the 1970s and 1980s, time geographers sought not only to add a third, temporal, dimension to quantitative geography but also to place individual space-time trajectories within broader social processes (Carlstein et al, 1978; Parkes and Thrift, 1980). Secondly, these ideas heavily influenced Anthony Giddens's work on structuration, which sought to link abstract problems of social theory to developments in the empirical methodologies of the social sciences (Giddens, 1984, esp. ch. 3: 'Time, space and regionalisation'). It is arguable that the relatively limited impact of these ideas on empirical social research is precisely the consequence of our limited technical tools for organising datasets which combine individual and locality-level data over long periods of time.
Tracking individuals who are not simply static components within localities but active agents moving within and modifying whole systems of localities is hard. One basic argument of this paper is that large-scale historical geo-demographic data will not make much of a contribution until we abandon borrowed technologies originally designed to track the geographically dispersed assets of utility companies, and suchlike. About the only interesting feature of an individual electricity pylon is its particular location in Cartesian space, but this is not true of either individuals or localities.

Contextualising the life course: linking the LS to the 1931 census

The latter parts of the previous section emphasized studies providing evidence on migration partly because the commonest kind of longitudinal microdataset created by British historical demographers, the family reconstitution via nominal record linkage from parish registers, necessarily neglects both migrants and migration as a process. However, migration is precisely why we need analyses combining individual and locality-level data over long periods.

One very specific example of this need is in attempts to understand the geography of health, and the impact of environmental factors on individual health. It is a commonplace that people in the old industrial regions of Britain have worse health, measured through both mortality and morbidity, than the country as a whole. One explanation of this, however, is that these differentials are wholly or mainly the consequence of migration: over most of the twentieth century, heavy industry employed declining numbers of people, so the areas in which it was based experienced large-scale net out-migration. Unsurprisingly, the healthier individuals were, the more likely they were to move away, and this in itself would mean that the residual population of the industrial areas would exhibit poorer health.
Obviously, if we are to measure the effects of the underlying environments on health we must somehow include in our analysis individuals who grew up in the industrial areas and then moved away.

This question was the main focus of one part of a project funded by the Economic and Social Research Council’s Health Variations Programme. Other parts of that project used data held by the Great Britain Historical GIS and database to analyse long-run area-level trends, from the 1920s to the present, relating annual district-level infant mortality data to socio-economic variables from the census, but here we used all that historical data as potential explanations for health outcomes as recorded within the ONS Longitudinal Study. This work had clear relevance to public policy: children born between 1925 and 1935, growing up during the worst years of the inter-war recession, are now aged between 70 and 80. The recession had a very clear geographical focus, especially in its later years, so could we measure the specific burden these “children of the Great Depression” place on the modern health service?

The major practical problem with this analysis was that, as explained above, the NHS numbers within the LS told us what local government district subjects were living in at the time of the National Registration in 1939, but the only personal characteristics recorded and reported on by the National Registration were gender and age. For information on socio-economic attributes of area of residence, we must turn to the 1931 census. This is problematic because of changes in the geographical boundaries used to define administrative areas during the 1930s.
Although the basic architecture of local government remained constant between 1931 and 1939, consisting of County Boroughs, Municipal Boroughs and Urban Districts (all urban units), Rural Districts and London Boroughs, the detailed geography of local government was greatly changed through a rolling programme of county reviews: the 1931 census reports on 1,800 local government districts (LGDs), but this had been reduced to 1,472 in 1939. Further, many of the districts which were not abolished were altered through boundary changes: 289 (19.6%) of the 1939 LGDs were new creations or had been affected by boundary changes. A set of supplementary census reports was issued, re-tabulating 1931 data for the units that existed after the county reviews, but unfortunately these were again limited to gender and age. Some method was therefore needed to re-cast 1931 data to 1939 units based only on the published information available to us.

Our original plan was to use our changing boundary GIS to redistrict from 1931 to 1939, estimating what proportion of each 1931 district’s data should be assigned to each 1939 district using more detailed parish-level population statistics and the assumption that population was evenly distributed over each parish: by overlaying vector boundaries for each administrative geography, we could then calculate a set of conversion weights. Unfortunately, technical problems, and in particular the enormous problems identifying inconsistencies in our arc-based GIS (Gregory and Southall, 1998), meant that neither our local government district-level nor our parish-level changing boundary GIS was ready for analytic use even at the start of 2001, and we had to meet the overall timetable of the Health Variations Programme. The LGD-level changing boundary GIS has since been completed but the much larger parish-level system had to be abandoned.
We eventually completed a set of conventional static boundary GIS coverages for each census, and are now working towards using these to create a full time-variant system within the new polygon-based spatial database described below.

We had therefore to find some other method for converting 1931 census data to the 1939 geography without using the non-existent GIS. The method we actually used was based entirely on textual evidence from various official reports: the 1931 census reports obviously list the 1931 population of each LGD as it was defined in 1931; the 1939 National Registration report also lists the 1931 population of each LGD as defined in 1939; and, crucially, the lists of boundary changes in the Registrar General’s annual Statistical Reviews gave the 1931 populations of the areas transferred, as well as the names of the losing and gaining districts. Very fortunately, those lists of changes had been computerised much earlier in the life of the project, simply to provide contextual evidence in the construction of the changing boundary GIS.

A geography conversion table was therefore constructed from the 1931 and 1939 reports plus the 1,805 boundary changes listed for the intervening period, using 1931 populations rather than geographical areas. By linking this table to 1931 census data, we can cut data for the 1,800 districts of 1931 up into 2,916 fragments, and then reassemble them into the 1,472 units of 1939. The table was very carefully cross-checked by comparing the 1931 populations of 1939 units, computed by applying the boundary change information to the 1931 census figures, with the 1931 populations listed in the 1939 report.

This method avoids any problematic assumptions about population density. However, we are still assuming that the population of a district had uniform socio-economic characteristics; for example, that there would have been an equal proportion of unemployed workers in the middle of a town and on the rural fringe.
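The population-weighted conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the district names, populations and unemployment counts are invented, and the function names (`conversion_weights`, `redistrict`, `check_populations`) are hypothetical. Each row of the conversion table is one fragment: a 1931 district, the 1939 district that received it, and the fragment's 1931 population.

```python
from collections import defaultdict

# Invented conversion table: one row per fragment, giving the 1931 district,
# the 1939 district that received it, and the fragment's 1931 population.
FRAGMENTS = [
    ("Exampleton UD (1931)", "Exampleton MB (1939)", 8000),
    ("Exampleton RD (1931)", "Exampleton MB (1939)", 2000),
    ("Exampleton RD (1931)", "Exampleton RD (1939)", 6000),
]


def conversion_weights(fragments):
    """Share of each 1931 district's population passing to each 1939 district."""
    frag_pop = defaultdict(int)  # (1931 unit, 1939 unit) -> 1931 population
    src_pop = defaultdict(int)   # 1931 unit -> total 1931 population
    for src, dst, pop in fragments:
        frag_pop[(src, dst)] += pop
        src_pop[src] += pop
    return {(src, dst): pop / src_pop[src]
            for (src, dst), pop in frag_pop.items()}


def redistrict(values_1931, fragments):
    """Re-cast a 1931 count to 1939 units, assuming the characteristic is
    spread uniformly through each 1931 district's population."""
    estimates = defaultdict(float)
    for (src, dst), weight in conversion_weights(fragments).items():
        estimates[dst] += values_1931[src] * weight
    return dict(estimates)


def check_populations(fragments, reported):
    """Cross-check: return the 1939 units whose summed fragment populations
    disagree with the 1931 population listed for them in the 1939 report."""
    computed = defaultdict(int)
    for _src, dst, pop in fragments:
        computed[dst] += pop
    return {dst: (computed[dst], listed)
            for dst, listed in reported.items()
            if computed[dst] != listed}


# Invented 1931 counts of unemployed workers in each 1931 district.
unemployed_1931 = {"Exampleton UD (1931)": 400, "Exampleton RD (1931)": 160}
estimates_1939 = redistrict(unemployed_1931, FRAGMENTS)
mismatches = check_populations(
    FRAGMENTS, {"Exampleton MB (1939)": 10000, "Exampleton RD (1939)": 6000})
print(estimates_1939)
print(mismatches)
```

In this invented case a quarter of the rural district's 1931 population was transferred to the borough, so a quarter of its unemployed are reassigned with it. That proportional split is exactly the uniformity assumption noted above; no areas or boundary lines are involved at any point.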
However, boundary changes by which part of an urban area was transferred to the surrounding rural district were very unlikely: most changes consisted of either part of a rural area being transferred into an urban unit, or very small urban units being abolished through merger with an adjacent urban area.

[The final version will go into substantially more detail here, in particular discussing the relative accuracy of redistricting using area-based weights derived from the parish-level GIS and using a population-weighted geography conversion table directly derived from the census boundary change data. The basic conclusion is that population-based weights are far more accurate, and that in urban areas quite small inaccuracies in boundaries within the GIS can lead to large numbers of people being incorrectly reassigned; for example, across the Mersey between Liverpool and Birkenhead, areas never linked in any pre-1974 administrative geography.]

Conditions in the area of residence during childhood appear to have had a measurable association with health outcomes later in life, even after allowing for more recent circumstances. Our results show that those who had lived in areas classified in 1934 as ‘depressed areas’ had a relative risk of mortality, or of illness reporting, which was 14-15% higher than those who had not been registered as living in such areas in 1939. These ‘depressed areas’ were mainly in the north of England (and in Wales) and they had particularly high levels of unemployment during the 1930s. They include mining areas in regions such as the north east of England, which also had high levels of unemployment (especially for men) in the 1990s and have been shown to have particularly high prevalence of long term illness reported in the 1991 census. Some authors (e.g.
Haynes et al, 1997) have suggested that this may have been affected by local labour market conditions in the late 1980s and early 1990s and that in areas of high unemployment, people were more likely to declare themselves to have long term illnesses preventing them from working. Our results certainly support the view that men who were unemployed in 1981 had relatively poor health outcomes by 1991. The effects of unemployment on health outcomes appear to have been less striking for women. It therefore might be argued that the data on ‘depressed area’ status are acting as a marker for areas of special health disadvantage in the 1980s, rather than the 1930s. However, data on individual employment status in 1981, local unemployment levels in 1981 and broad region of residence were included in our models, which still show an independent effect of area deprivation in the 1930s. Furthermore, the relative risks reported for poor health outcomes linked to ‘depressed area’ conditions in the 1930s are similar for men and women. This suggests a broader, early influence of community-level deprivation, distinct from later effects of individual unemployment and labour market difficulties in the 1980s. Therefore another possible explanation for poor health today, in areas like the north east of England and Wales, may be that this is a legacy of deprived environments experienced in childhood.

Geographical information without GIS

The previous section is a barely disguised but only partial autobiographical sketch for 2001: at the start of the year the construction of our changing boundary parish-level GIS was clearly in deep trouble, but there appeared to be no alternative but to raise more money and press on. By the end of the year, the bare bones of our new architecture were in place.
In this case, desperation very clearly was the mother of invention, and the most specific source of that desperation was the need to meet the substantive goals of the project funded by the ESRC Health Variations Programme without waiting for the completion of the parish-level GIS.

However, there is another narrative of 2001. The year saw much travel funded by an ESRC fellowship, by an amazingly generous lecturing fee from the University of Michigan, and via the Electronic Cultural Atlas Initiative. Highlights included meeting Wendy Thomas and Bill Block of the University of Minnesota, principal architects of the Data Documentation Initiative's Aggregate Data Extension, and discussing data models for gazetteers in Taipei and Shanghai; Linda Hill and Lex Berman need particular mentions here. Lastly, by the end of the year work had begun on a very large new project funded by the UK National Lottery which both required us to rebuild the system from the ground up and made it possible.

All that can be presented here are the conclusions from this odyssey.

Before you construct a GIS of historic administrative boundaries, a reliable master list of administrative units is essential. This is something the original GB Historical GIS got half right: we did computerize the 1911 census parish-table before digitizing the administrative area maps from the 1900s which were the starting point for the changing boundary system, but we then added to the GIS boundaries for units abolished before 1911 or created after 1911 without extending the 1911-derived authority list. They were therefore identified only by names held within label points.

Units can change their names, locations and hierarchical relationships, so the only reliable way of identifying them is via ID numbers. Because so much can change over time, only very limited information should be held in the master unit record.
We now hold an ID number, a type best understood as telling us what coverage the unit belongs to, and dates of creation and abolition, plus source information.

You need to be able to hold multiple names and hierarchical relationships for each unit; in other words, you need to record this information not simply as a list, in a spreadsheet or similar, but in a database where geographic names exist as a child table to the list of units. In British records, administrative units like parishes can be part of more than one higher level unit both concurrently and consecutively, so these many-to-many “IsPartOf” relationships need to be held in another child table which identifies the higher and lower unit involved in each relationship. In the British case, we also need another child table to hold status, i.e. information about the kind of unit which is more detailed than type, and liable to change. This is very relevant to the work with the LS: pre-1974 local government districts were divided (mainly) into Rural Districts, Urban Districts, Municipal Boroughs and County Boroughs. Many urban units changed status from UD to MB or MB to CB, while Urban Districts and Rural Districts within the same county were usually named after the same town.

Having constructed the three or four-table ontology or poly-hierarchic thesaurus outlined above, use it as a framework for as much information as possible (an ontology is basically a set of propositions about what exists, but the concept is now widely used in information science to describe knowledge bases, often in connection with semantic web development). It turns out that a structure like this can organize a vast body of place-specific information without the enormous costs of boundary mapping: boundary changes can be held as another kind of relationship; statistical data can be held in another child table to the unit table; our future plans include holding geography conversion data as yet another kind of relationship.
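The four-table structure outlined above can be sketched using Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the project's actual schema: all table and column names, the helper functions `units_named` and `higher_units`, and the units themselves are invented, and end years are treated as exclusive purely for convenience.

```python
import sqlite3

SCHEMA = """
CREATE TABLE unit (            -- master list, deliberately minimal
    unit_id   INTEGER PRIMARY KEY,
    unit_type TEXT NOT NULL,   -- which coverage the unit belongs to
    created   INTEGER,
    abolished INTEGER,
    source    TEXT
);
CREATE TABLE unit_name (       -- any number of variant names per unit
    unit_id INTEGER NOT NULL REFERENCES unit(unit_id),
    name    TEXT NOT NULL
);
CREATE TABLE is_part_of (      -- dated, many-to-many hierarchy
    lower_id   INTEGER NOT NULL REFERENCES unit(unit_id),
    higher_id  INTEGER NOT NULL REFERENCES unit(unit_id),
    start_year INTEGER,
    end_year   INTEGER         -- exclusive; NULL means no recorded end
);
CREATE TABLE unit_status (     -- e.g. UD, MB, CB, RD; liable to change
    unit_id    INTEGER NOT NULL REFERENCES unit(unit_id),
    status     TEXT NOT NULL,
    start_year INTEGER,
    end_year   INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Invented units: an urban and a rural district named after the same town,
# plus a parish transferred from the rural to the urban district in 1934.
conn.executemany("INSERT INTO unit VALUES (?, ?, ?, ?, ?)", [
    (1, "LGD", 1894, 1974, "invented"),
    (2, "LGD", 1894, 1974, "invented"),
    (3, "PARISH", None, None, "invented"),
])
conn.executemany("INSERT INTO unit_name VALUES (?, ?)", [
    (1, "Exampleton"), (1, "Exampleton Borough"),
    (2, "Exampleton"), (3, "Little Example"),
])
conn.executemany("INSERT INTO unit_status VALUES (?, ?, ?, ?)", [
    (1, "UD", 1894, 1935), (1, "MB", 1935, 1974), (2, "RD", 1894, 1974),
])
conn.executemany("INSERT INTO is_part_of VALUES (?, ?, ?, ?)", [
    (3, 2, 1894, 1934), (3, 1, 1934, None),
])


def units_named(conn, name, year):
    """All units carrying this name, with their status in the given year."""
    return conn.execute(
        """SELECT u.unit_id, s.status
           FROM unit u
           JOIN unit_name n ON n.unit_id = u.unit_id
           JOIN unit_status s ON s.unit_id = u.unit_id
           WHERE n.name = ? AND s.start_year <= ?
             AND (s.end_year IS NULL OR s.end_year > ?)
           ORDER BY u.unit_id""", (name, year, year)).fetchall()


def higher_units(conn, lower_id, year):
    """IDs of the units that the given unit was part of in the given year."""
    rows = conn.execute(
        """SELECT higher_id FROM is_part_of
           WHERE lower_id = ? AND start_year <= ?
             AND (end_year IS NULL OR end_year > ?)
           ORDER BY higher_id""", (lower_id, year, year))
    return [row[0] for row in rows]


print(units_named(conn, "Exampleton", 1931))
print(units_named(conn, "Exampleton", 1939))
print(higher_units(conn, 3, 1931), higher_units(conn, 3, 1939))
```

The name alone is ambiguous here, since two districts share it; only the ID number together with the dated status distinguishes them, which is why names and statuses sit in child tables rather than in the master unit record.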
The way we link these additional data into the system is by matching on names, systematically adding variant names to the names table as we go. NB at this point you will have a system capable not just of performing the 1931 to 1939 conversion, but of holding a full audit trail for the data conversions, all without mapping a single boundary.

Do everything you can to avoid or delay boundary mapping. Reconstructing historic boundaries is very expensive in terms of time and money (and NB the author has raised and spent about £1.6m, over ten years, on what most people view as a small island). See how far you can get with a text database as already described. See how far modern digital boundaries provide the lines you need. Consider adding point coordinates for each of your lowest level units, and then simply compute a set of Thiessen polygons around those coordinates. This may sound unacceptably crude, but if you then construct polygons for your higher level units by assembling the generated polygons for the lower-level units, using the “IsPartOf” relationships already constructed, the results will often be acceptably accurate. When we print out our parish-level map for the whole country, even at wall chart size, it is quite hard to tell that most of the parishes are not simply Thiessen polygons.

When you do start boundary mapping, be thorough. One of the hardest lessons we have learnt is that it is very hard to add a greater level of detail to an existing system: the GB Historical GIS mapped the 600 or so 19th century Registration Districts quite quickly in 1994-5, but then had to do them all over again so we had boundaries that matched those of the component parishes.

Most of your data will be statistics and other attribute data, not boundaries or any other locational data, so think hard about how to organize them.
This is a vast set of issues: tables of numbers in censuses and other statistical reports convert so easily into spreadsheets and database tables that researchers rarely give this much thought; but as you computerize more and more of a country’s “statistical heritage” you soon discover you have hundreds and hundreds of tables, and while a systematic catalogue will help you find data it will not be that much help in using them. The GB Historical GIS has now migrated a large part of its holdings (currently, about 10.5m data values) into a new architecture where all the data are held in a single column of a table with about 10.5m rows. Other columns in that table hold a date and one of the administrative unit IDs described above. The obvious question is how we record what each number measures. All that can be said here is that we use a metadata structure based on the Aggregate Data Extension developed by the Data Documentation Initiative (http://www.icpsr.umich.edu/DDI).

All this is very expensive. To justify the cost, and get the grants, the system must be designed for use and re-use by as many different user communities as possible. It must be based on well-documented open standards, and until recently commercial GIS software paid scant attention to this. Our use of the DDI standard has already been mentioned. We also support a range of standards developed by the Open Geospatial Consortium (http://www.opengeospatial.org) and the Alexandria Digital Library’s gazetteer development (http://alexandria.sdc.ucsb.edu/~lhill/adlgaz).

Store your locational data in a spatial database, not a GIS. What is now the GB Historical GIS began as the Labour Markets Database in the early 1990s, held in a relational database management system, not a GIS. Boundary mapping work in the late 1990s left us with a large body of statistics in much the same structure as before, managed by Oracle RDBMS software, plus spatial data managed by ArcInfo.
Crucially, ArcInfo could access Oracle but not vice versa, so all analyses had to be done within ArcInfo, which was the tail wagging the dog. Today, however, it is possible to hold the boundary data within the same object-relational database as everything else. This does mean holding your data in a heavy duty system, not in user friendly packages like Access or Filemaker Pro. Options include Oracle and IBM’s DB2, though not MS SQL Server, and also the two main open source database systems, MySQL and PostgreSQL. Working with PostgreSQL, the more mature of the open source solutions, involves a spatial extension called PostGIS. While less friendly than simpler database packages, these tools will be no harder to use than a quite separate GIS package used alongside a database.

Maps consisting just of boundary lines are hard to interpret, but don’t start digitizing the railways, the roads, the individual buildings and so on; scan and geo-reference historic maps as backdrops. Big historical GIS systems can be like big train sets; there is always another piece that needs to be added to complete the system. Given how relatively cheap it is to simply scan historic maps and then georeference these images, it is surprising how few projects have done it. The GB Historical GIS now holds three complete sets of one inch to the mile maps of Great Britain, which can be called up to appear underneath our statistical maps.

Conclusion

[Currently this is just the original abstract]

This paper argues that geographical analysis of demographic data does not require GIS technology, and that historical demographers should instead explore alternative technologies centered on data structures known as ontologies or polyhierarchic thesauri.
The first part of the paper explains the methodology used in an analysis linking the UK Office of National Statistics' Longitudinal Study, a 1% sample of the population of England and Wales combining individual-level data from the 1971, 1981 and 1991 censuses, plus vital registration data, with area-level statistics from the 1931 census. The purpose was to analyse the impact of living in high-unemployment areas in childhood on health and survival chances in later life. Our methods relied not on GIS, but on very systematic inventories of administrative units and manipulation of the purely textual information on boundary changes published in census reports. This work led directly into the development of a quite new architecture for the new "Great Britain Historical GIS", which makes no use of commercial GIS software. The heart of the system is the demographic data, not polygons, supported by three main sub-systems. The simplest of these identifies sources. Our data documentation system, based on the DDI standard and structured as an ontology, records what data items measure. Our gazetteer, again organised as an ontology or polyhierarchic thesaurus, enables us to hold data for units whose locations are unknown but can hold a detailed record of changing boundaries when such data are available. Because of its ability to work with incomplete data, this system is far more appropriate to historical research than a conventional GIS. This architecture supports our public web site.
Acknowledgments
This paper draws extensively on text written by collaborators, and any published version will involve co-authors. I must take full responsibility for the current draft, while noting that analytical results linking GIS data to the ONS Longitudinal Study are taken from a paper jointly authored by Sarah Curtis and Peter Congdon of Queen Mary, University of London, and Brian Dodgeon of the Institute of Education, as well as myself.
The description of the birth cohort studies is based on work by Alissa Goodman of the Institute for Fiscal Studies.
References
Alter, G. (1988) Family and the Female Life Course: The women of Verviers, Belgium, 1849-1880 (Madison: U. of Wisconsin Press).
Britton, M. (1990), ‘Sources of Data and Limitations’ in Britton, M. (ed.) Mortality and Geography: a Review in the mid-1980s England and Wales. OPCS series DS no. 9 (London: HMSO), chapter 2, pp. 6-17.
Carlstein, T. et al (1978), Human Activity and Time Geography (London: Edward Arnold).
Curtis, S., Southall, H., Congdon, P., Dodgeon, B. (2003) ‘Analysis of the Longitudinal Study sample in England using new data on area of residence in childhood’, Social Science and Medicine, 57(12).
Fox, A.J., and Goldblatt, P. (1982), Longitudinal Study: Socio-economic mortality differentials, 1971-75, series LS no. 1 (London: HMSO).
Garrett, E., Reid, A., Schurer, K., and Szreter, S. (2001) Changing Family Size in England and Wales: Place, Class and Demography (Cambridge: CUP).
Giddens, A. (1984), The Constitution of Society: Outline of the Theory of Structuration (Cambridge: Polity Press).
Gregory, I.N., and Southall, H.R. (1998), ‘Putting the Past in Its Place: the Great Britain Historical GIS’, in Carver, S. (ed.) Innovations in GIS 5 (Taylor & Francis), pp. 210-21.
Haynes, R., Bentham, G., Lovett, A., Eimermann, J. (1997) ‘Effect of labour market conditions on reporting of limiting long term illness and permanent sickness in England and Wales’, Journal of Epidemiology and Community Health, 51(3), pp. 282-8.
Hollingsworth, T.H. (1964), ‘The demography of the British Peerage’, Population Studies, suppl., 18, pp. 3-107.
Langton, J., and Hoppe, G. (1990) ‘Urbanization, social structure and population circulation in pre-industrial times: flows of people through Vadstena (Sweden) in the mid-nineteenth century’, in Corfield, P.J., and Keene, D. (eds.) Work in Towns 850-1850 (Leicester), pp. 138-63.
Nicholas, S., and Shergold, P.R. (1987) ‘Internal Migration in England, 1817-1839’, Journal of Historical Geography, vol. 13, pp. 155-68.
Parkes, D.N., and Thrift, N.J. (1980) Times, Spaces, and Places: A chronogeographic perspective (Chichester: Wiley).
Pooley, C., and Turnbull, J. (1998), Migration and mobility in Britain since the eighteenth century (London: UCL Press).
Ruggles, S. (1992), ‘Migration, Marriage, and Mortality: Correcting Sources of Bias in English Family Reconstitutions’, Population Studies, vol. 46, pp. 507-22.
Southall, H.R. (1991a) ‘The Tramping Artisan Revisits: Labour mobility and economic distress in early Victorian England’, Economic History Review, II, vol. 44, pp. 272-96.
Southall, H.R. (1991b) ‘Mobility, the Artisan Community, and Popular Politics in early nineteenth century England’, in G. Kearns and C.W. Withers (eds.), Urbanising Britain: class and community in the nineteenth century (Cambridge UP), pp. 103-30.
Southall, H.R., and Garrett, E. (1991) ‘Morbidity and Mortality among mid-Nineteenth Century Artisans’, Social History of Medicine, vol. 4, pp. 231-52.
Strachan, D.P., Leon, D.A., and Dodgeon, B. (1995) ‘Mortality from cardiovascular disease among interregional migrants in England and Wales’, British Medical Journal, 310, pp. 423-7.
Figure 1: The four birth cohort studies, age and year at data collection (full cohort follow-ups only), and longitudinal sample size (compiled by Alissa Goodman). [Chart not reproduced. Horizontal axis: age, 0 to 60. One row per cohort: NSHD 1946, n=5,362; NCDS 1958, n=17,415; BCS70 1970, n=16,571; MCS 2000/2001, n=18,553. Each row marks the year and sample size of subsequent full-cohort follow-ups.]
Figure 2: “Example of an individual's path in a time-space coordinate system. … In the example, the individual starts from the home and visits his workplace, a bank, his work place and, finally, a post office, before returning home.” (From p. 164 of Carlstein et al, 1978)