Workshop introduction Linkages and anthropological holism

Not the sort of linkages usually talked about.
Big general questions which have practical and specific consequences
Different factors –
disclosure/privacy – legal constraints
archiving
administrative data – ‘new types of social data’
blogs, especially photo or video blogs
new statistical techniques – different types of connection/interrelation
E-science!
These all pull in different directions, so part of the point of the
workshop is to start some conversations across domains. Many of these domains
do have e-science involvement, but normally separately. For example, the
workshop held in London yesterday, Monday 19th, at which Peter was present,
discussed, I guess, linkage between different administrative datasets and the
legal, ethical and hence research consequences of establishing such links.
One of the people who sadly cannot attend today is Dorothy Sheridan, the
director of the Mass Observation archive in Brighton. Since she cannot be here I
want to take a moment to explain why I invited her and why I think that Mass
Observation is a provocative example.
What interests me most about MO is the diaries. These continue
to be written on a volunteer basis by a loyal band of anonymous research
volunteers. They may be anonymous but they tell us an immense amount about
themselves. They are anonymous in ways that are similar to a lot of bloggers,
where we are told almost everything but their name, and one suspects that a
curious investigative journalist probably could identify the writers in at least
some cases. Moreover, in a few cases the diarists have had a change of heart
and have asked for their diaries to be removed from the archive. Dorothy
Sheridan has told me that she has agreed to this in a handful of cases. Over the
70 years in which MO has been going this may be manageable from the point of
view of the archivists, but think of the scaling implications when one moves
from the few hundred diaries in MO to the millions of blogs which are currently
online.
I could talk more about this but for now that probably suffices: what is
important for this workshop is that MO is a pre-internet archive with many of
the same issues about privacy and disclosure that confront those dealing with
internet archives, especially those of blogs.
So let me talk a bit about blogs. They present a variety of issues.
a) In some cases the bloggers are explicit and self-identifying. They
absolutely are not anonymous. This is the blog of David Zeitlyn (actually only a
field diary from one of my Cameroon field trips). Others are more challenging
because there are no names but a lot of potentially disclosive information is
contained within them. And then there are video and photo blogs in which we
may or may not know the authors’ names but we know what they look like! A child
might make a photo blog available which will be invaluable in the distant future
because of the insight it affords into how teenagers decorate their bedrooms in
the 2000s. How are we to deal with the responsibilities of archivists, which may
well conflict: on the one hand to the author and on the other hand to future
researchers? So, how can blogs be linked to the administrative data about the
bloggers in an ethically sensitive fashion that will make their blogs more useful
for analysis in the future? Time may well resolve some of these issues: time is
the one thing archivists traditionally have on their side. Issues of privacy and
confidentiality decrease with time. I think that the dead do not have a legal right
to privacy (although their descendants do). The legal status of the contents of
online archives of genealogical data is something that perhaps it is best not to
ask about. I note in passing that this involves yet another type of linkage.
The final thought to get us started is one to do with what I’m calling a
data pyramid: again involving a different type of linkage to that usually thought
of.
At the apex of the pyramid are the few trusting souls who are disdainful
of privacy and through blogs and other techniques broadcast a lot of
information about themselves (self disclosers or 'exhibitionists').
At the bottom are people who are scarcely documented and highly
resistant to, or suspicious of, all researchers and the like. An extreme case
might be the homeless or criminals. But at this level there are things to be
learnt not from what is documented about such people but just from what
documents exist.
There are patterns in the types of documents concerning individuals from
which I think inferences can be made:
Contrast a homeless person, who might have the following:
NI number, NHS number, several scattered NHS files, criminal and social
service files
with a university lecturer:
NI number, NHS number, several scattered NHS files, employment
records, passport number, DVLA and driving licence, bank and building society
records, credit agency records, Tesco cards and so on, blogs and websites (as well
as, if they’ve got any hope for the RAE, records in the ISI citation index etc.).
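The contrast between the two document footprints can be sketched as a simple set comparison. This is only an illustration under stated assumptions: the record-type labels are taken loosely from the two lists above, the function names are hypothetical, and the Jaccard overlap is one possible summary measure, not a method proposed in the talk.

```python
# Hypothetical record-type labels, adapted from the two examples in the text.
# They are illustrative only and do not correspond to any real data schema.
HOMELESS_PERSON = {
    "NI number", "NHS number", "NHS files",
    "criminal files", "social service files",
}
UNIVERSITY_LECTURER = {
    "NI number", "NHS number", "NHS files", "employment records",
    "passport number", "DVLA and driving licence",
    "bank and building society records", "credit agency records",
    "loyalty cards", "blogs and websites", "citation index records",
}


def compare_footprints(a, b):
    """Return the record types shared by both people and unique to each."""
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
    }


def jaccard(a, b):
    """Jaccard overlap of two footprints: size of intersection over union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


result = compare_footprints(HOMELESS_PERSON, UNIVERSITY_LECTURER)
print("Shared:", sorted(result["shared"]))
print("Homeless person only:", sorted(result["only_a"]))
print("Lecturer only:", sorted(result["only_b"]))
print("Overlap:", round(jaccard(HOMELESS_PERSON, UNIVERSITY_LECTURER), 2))
```

Note that the comparison uses only which kinds of document exist, never their contents, which is the sense in which the pattern itself carries information.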
That’s the top and bottom of the data pyramid. In between there are
several different layers: anonymous bloggers, and potentially participants in
research projects such as the British Household Panel Survey, the recently
announced ESRC Longitudinal Survey, administrative and commercial data.
My question for the more statistically savvy is how the different levels can be
linked in ways which enable research questions to be asked yet are non-disclosive.
Or, to put it another way, can one level be used to provide a social
context for another, higher level? Are there ways of using data modelling on the
one hand to overcome issues about data quality and incompleteness, and on the
other to provide anonymisation where required (perhaps on a time-limited
basis)? What are the implications of this for archives? Should data be added to
blogs but not published for fifty years? Is there any realistic hope of establishing
such systems?
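One way to picture the fifty-year question is as an embargo attached to each linked record. The sketch below is a minimal illustration, not a proposal for a real archive system: the `LinkedRecord` class, its fields and both functions are hypothetical, and the only figure taken from the text is the fifty-year period.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LinkedRecord:
    """A blog entry with linked administrative data (all fields hypothetical)."""
    blog_text: str
    linked_data: dict
    deposited: date
    embargo_years: int = 50  # the fifty-year figure raised in the text


def release_date(record):
    """Date on which the linked data may first be shown."""
    d = record.deposited
    try:
        return d.replace(year=d.year + record.embargo_years)
    except ValueError:
        # Deposited on 29 February and the target year is not a leap year:
        # fall back to 1 March of the target year.
        return date(d.year + record.embargo_years, 3, 1)


def public_view(record, today):
    """Return what a researcher may see on the given day."""
    view = {"blog_text": record.blog_text}  # the blog itself is already public
    if today >= release_date(record):
        view["linked_data"] = record.linked_data
    return view


record = LinkedRecord(
    blog_text="Field diary entry",
    linked_data={"region": "example"},
    deposited=date(2007, 3, 20),
)
print(public_view(record, date(2030, 1, 1)))   # linked data still withheld
print(public_view(record, date(2057, 3, 20)))  # embargo has expired
```

The design choice here is that the link is made at deposit time, when the information is still available, while disclosure is deferred; whether such a scheme is realistic is exactly the open question.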
Outcomes? Identification of directions to pursue and blind alleys to
avoid. What are the questions for other agencies and people? An answer to the
question: what next? Where’s the e-science?
What I hope is that the talks this morning and the beginning of the
afternoon will set the stage and frame the discussion in the afternoon which
will, ideally, move us from generalised angst (‘oh, it’s all so complicated’) to the
identification of some concrete, focussed cases where we can actually plan to
try and do something!