Transcript

New literary histories: The digitisation of print culture and transformation of the
archive
A contextual point to begin with: where much of the serial fiction published in Britain
appeared in magazines and journals, this wasn’t the case in Australia, where
newspapers dominated. Australian readers could subscribe to British and American
journals and magazines – or buy them at some metropolitan newsstands. Mainly for
this reason, local attempts to set up such periodicals were almost inevitably short-lived. Newspapers, however, were numerous and widely read, and this is where the
vast majority of the fiction serialised in Australia can be found. This recognition of
the importance of newspapers as local publishers has led to a lot of interest from
literary scholars in exploring this archival record. But in Australia, as elsewhere, such
interest has been stymied by the sheer size of the archive. Manually cataloguing the
fiction in Australia’s hundreds of newspapers is simply not feasible.
This situation has now altered, profoundly, with new digital resources and methods.
In this paper I’ll be discussing a project that uses automatic search and harvesting to
identify and extract the full text and bibliographic metadata of serialised fiction in the
more than 13 million pages of Australian newspapers, digitised by the National
Library of Australia, and freely available through the Trove database. The outcomes
of this process massively expand our bibliographic record of the fiction published in
Australian newspapers, and I want to show you some early indications of the type of
insights into Australian print and literary culture this expanded record might allow.
However, I also want to pose questions about what it means to know print and literary
culture in this way, and ultimately to suggest that the new approaches to the archive
enabled by digital resources and methods necessitate new ways of conceptualising
both the relationship between literary history and the archive, and the form and nature
of the archive itself. I’ll make this latter argument via a critique of existing data-led –
or “distant reading” – approaches to literary history, and by proposing the notion of
the “fractal” archive as an alternative way of engaging with “big data” while
remaining connected to the complexity and multidimensionality of literary works.
I’ll start, then, with an overview of my current project – its methods, scope, and some
initial findings – which means starting with Trove itself … Trove provides access to
over 13 million pages from over 680 Australian newspapers published up until 1955,
when copyright becomes relevant. To put the scale of Trove into context, we can
compare its 13.1 million pages with the 8.1 million in The British Newspaper Archive
and the 7.7 million in Chronicling America. So, as far as I know, it’s the largest.
Rather than searching Trove for specific titles or authors (and thus finding records
already assumed to be present in the archive) we’re leveraging what Jim Mussell calls
the generic forms of these publications by employing words or phrases commonly
used to frame serial fiction in the nineteenth-century Australian press. When I say we,
I’m referring to myself and Carol Hetherington, a bibliographer working full-time on the project for three years, funded by the Australian Research Council.
The first search term we’re trialling is “chapter”, and it’s proven effective in
optimising results for fiction, because the word often occurs multiple times in the text
designated by Trove as an “article” (with a single instalment frequently containing
many chapters) and because it often appears in the “article” title (which is defined as
the first four lines, and is the only part of the text manually checked and transcribed,
thus reducing the effects of Optical Character Recognition, or OCR, errors on search
results). As the project continues we’ll use other search terms, including “serial
story”, “our storyteller” and “new novelist”. Each will have its own benefits and
drawbacks, and ultimately our aim is to employ a range of terms until the returned
results largely overlap with what we have already indexed. We’ll then
sample newspapers to gauge how successful this method has been in identifying
fiction in the digitised record.
The search term “chapter” returned more than 800,000 results, and we exported the
full text and bibliographic metadata for the first 250,000 using an Application
Programming Interface (or API) created by Tim Sherratt, now the manager of Trove.
Due to the usefulness of “chapter” in optimising the relevance ranking for fiction, we
found that the first 30 or so sets of 5,000 results were almost exclusively fiction, with
the proportion of non-fiction records increasing over the next 20 sets. We stopped at
the first 50 sets of 5,000 results because, beyond that point, the proportion of fiction
found is too small to warrant continuing (the relevance ranking algorithm has, in
other words, exhausted its usefulness for our purposes). Other results of the
“chapter” search not relevant to our project include
reports of meetings of a chapter of a lodge or religious association, accounts of a
chapter in the life of a town or person, or even public documents such as deeds of
grant and regulations organised in chapter divisions.
We took these 250,000 results and removed all duplicates – of which there were many
– deleted non-relevant material, and created and populated a large number of
additional metadata fields. Basically, the API extracts the full text as well as a unique
identifier for the article, the first four lines of text, dates and pages for publication,
information about the newspaper, permanent URLs for the newspaper page and the
“article”, and a count of crowd-sourced corrections to the OCR text. We’ve added
multiple fields, some based on bibliographical research (such as the author’s name
when the publication is anonymous, their gender and nationality, and other sites of
publication for the fiction) and some that relate specifically to the particular
publication event in the newspaper. These include: the author’s name and the story’s
title as they are given (including where that’s anonymous or with a signature – “by the
author of… [other titles]”), the source of the fiction (when it’s indicated in the
newspaper publication) and a range of other fields. I’m happy to discuss this process
in more detail for anyone who’s interested, but I’m basically trying to show that,
while automatic search and harvesting expedites the bibliographic process
enormously, it by no means removes the necessity of bibliographic scholarship.
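To make the clean-up stage concrete, here is a minimal sketch of the kind of deduplication pass just described. All record fields and values are invented for illustration; the project's actual pipeline and data are not reproduced here.

```python
# Hypothetical harvested records, loosely modelled on the fields the talk
# describes (unique identifier, "article" heading, date, newspaper).
# The values are invented for illustration only.
records = [
    {"id": "a1", "heading": "CHAPTER I. THE HOMESTEAD.", "date": "1885-03-14", "paper": "The Queenslander"},
    {"id": "a2", "heading": "CHAPTER I. THE HOMESTEAD.", "date": "1885-03-14", "paper": "The Queenslander"},  # duplicate harvest
    {"id": "a3", "heading": "CHAPTER II. A NEW ARRIVAL.", "date": "1885-03-21", "paper": "The Queenslander"},
]

def deduplicate(rows, keys=("heading", "date", "paper")):
    """Keep the first row for each combination of the chosen key fields."""
    seen, unique = set(), []
    for row in rows:
        fingerprint = tuple(row[k] for k in keys)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique

unique_records = deduplicate(records)
print(len(unique_records))  # 2 of the 3 harvested rows survive
```

Even this toy version shows why bibliographic judgment remains necessary: the choice of which fields constitute a "duplicate" is itself a scholarly decision.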
PP: Our aim is to go up to the beginning of WW1, but at present we’ve only
processed the results of the “chapter” search for the nineteenth century. This process
has yielded:
 58,717 unique records (or instalments – and remember, we also have the full
text for all of these)
 these instalments constitute 6,269 titles
 1,212 of those titles are completed in one issue (in some cases, these are short
stories with chapters; in other cases, the stories are more like novellas, running
over 10 or more pages in the case of some special supplements. This is a
tricky category, because in some cases, a story that is completed in one issue
in one newspaper is completed in two or many more in another).
 altogether we have found 4,076 unique titles (as you can see in the difference
between the number of titles and the number of unique titles, many stories are
published multiple times in different newspapers – and even, in some cases, in
the same newspaper, a decade or so apart).
As I said, many of the authors published anonymously, or used pseudonyms or
signatures only. We’ve been able to identify
 1,693 individual authors of these titles;
 there remain 1,301 titles by authors we have not yet been able to identify.
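The distinction between these three counts can be sketched in a few lines. The instalment records below are invented purely to show the arithmetic, treating a "title" as a story serialised in a particular newspaper and a "unique title" as the story itself.

```python
# Invented instalment records: one (newspaper, story title) pair per instalment.
instalments = [
    ("The Sydney Mail", "A Bush Romance"),
    ("The Sydney Mail", "A Bush Romance"),   # a later instalment, same serialisation
    ("The Queenslander", "A Bush Romance"),  # the same story reprinted elsewhere
    ("The Queenslander", "His Natural Life"),
]

n_instalments = len(instalments)                             # every record counts
n_titles = len(set(instalments))                             # distinct (newspaper, story) pairs
n_unique_titles = len({title for _, title in instalments})   # distinct stories

print(n_instalments, n_titles, n_unique_titles)  # 4 3 2
```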
What I want to do now is show some of the ways this dataset might allow us to
explore the publication of fiction in Australian newspapers before going on to
complicate what it might mean to know a print or literary culture on the basis of such
data. I’m only going to show three graphs – and all of these will exclude titles that are
completed in one issue to maintain the focus, however equivocally, on serial fiction.
PP: This graph shows the number of titles per year from 1830 to 1899, and looks
pretty much as we’d expect:
 very little serial fiction until the mid-1870s, around the time we know that
fiction began to be published in large volumes in British newspapers, and
when technological shifts – most importantly, the introduction into Australia
of the high-speed rotary press – led to an expansion in the number of
newspapers that could be printed and a push by newspaper editors to extend
their circulations, in part by publishing fiction.
 rapid growth in the number of titles in the 1880s and early 1890s, as fiction
syndication agencies built businesses on packaging fiction into supplements
(and there’s obviously further research we can do with respect to the incidence
of multiple publications carrying the same titles to explore this phenomenon).
 a fall-off in the number of titles serialised in the second half of the 1890s as
the expensive three volume book declined and a plethora of cheap paperback
fiction imports became available, reducing the relevance of newspaper fiction.
PP: This graph shows the nationality of the authors of serial fiction in Australian
newspapers (and nationality, as I’m sure you’ll guess, is another tricky category in
that people move around and, in some cases, can be difficult to assign to a single place).
It shows some things we would expect, such as the prevalence of British fiction, and
some surprises.
 In respect to Australian fiction: bearing in mind there are a lot of authors of
unknown nationality, this graph shows a much higher proportion of such
fiction than I would have assumed. If the ‘transnational turn’ in Australian
literary studies has done anything in the last decade it’s emphasised the lack of
interest of nineteenth-century readers in Australian writing. This graph
suggests there was a market for such titles.
 In respect to American fiction: Such fiction enters the scene as technological
changes increase the size and potential circulations of newspapers in the
1870s. Where the proportion of British fiction remains quite stable, the
proportion of Australian titles declines. Whether this trend is due to an
inadequate supply of local fiction – and the need to resort to American fiction
– or to something else is a question worth exploring.
PP: This final graph indicates the gender of authors of these serialised titles; and
gender is actually the main reason I started this whole project because, in earlier work
on the serialisation of Australian novels, I was struck by the high proportion of men’s
novels serialised locally. In contrast, Australian women’s novels tended to be first
serialised in Britain and to achieve British book publication in that context. Given the
established account of the nineteenth-century novel, and its serialisation in particular,
as dominated by women, I was interested in how the high rates of local serialisation
of Australian men’s novels related to the broader context. Was nineteenth-century
serial fiction publishing in Australia – unlike in Britain or America – exceptionally
male-dominated; that is, was there a preference for writing by men, regardless of
national origin? Again, bearing in mind the high proportion of authors of unknown
gender, this graph suggests there may very well have been a particular focus on men’s
writing in Australian newspapers. Just a note here: I’ve used the “Female?” and
“Male?” categories when the pseudonym is explicitly gendered but the gender of the
author is unknown: so “A Mildura Lady”, “A London Man”, to give just two
examples.
PP: A final table, because I thought this might be a question that would interest some
of you, and because it suggests interesting avenues of exploration for this male-domination hypothesis: it shows the top 10 most serialised authors in nineteenth-century
Australian newspapers, based on our dataset.
So, obviously, many more questions to ask: What can we make of this table? Which
were the most popular titles? How did this change over time? Does the serialisation of
fiction in metropolitan and regional newspapers differ? I could go on. But I’m not
going to.
Instead, I want to ask some different questions: epistemological ones relating to what
these graphs might be representing, and what it might mean to ‘know’ literary and
print culture in this way.
Let me segue to this issue by noting some of the complexities that these numbers and
graphs elide. I’ve mentioned some basic ones already: the fact that a story might be
completed in one issue in one newspaper while going across many in another; the
difficulty of ascribing a single nationality to certain authors. There are many more,
but I’ll specify just two:
1. The first we might call bibliographical complexities. These are present in any
publishing context, but are especially prominent in respect to nineteenth-century
serial fiction, particularly in newspapers. Not only were works
frequently published anonymously, pseudonymously, or only with initials or
signatures, but they were often reprinted with different titles and attributions
in different newspapers. We’ve found cases with up to eight different titles
accorded to substantially the same text. And as fiction moved across national
borders in the nineteenth century, it was often plagiarised, rewritten and
localised. For instance, American author “Old Sleuth’s” novel The American
Detective in Russia is serialised in several Australian newspapers as “Barnes,
the Australian Detective”.
2. Another layer of complexity relates to the collection from which this data is
drawn. Trove is a substantial database of digitised newspapers, the largest,
internationally, in terms of the number of pages and proportion of holdings
digitised. However, not all Australian newspapers are digitised, whether due to
priorities in the process or because the archival record has been damaged, lost
or destroyed. Our searches will also not uncover all instances of serialised
fiction in the digitised record, whether due to OCR errors or inappropriate
search terms. For instance, “Chapter” will not pick up titles where the word
“chapter” is not used, but where the fiction is divided into sections using
roman numerals.
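A small, purely illustrative sketch of this limitation: the project's searches run inside Trove's own engine over OCR text, but a local pattern shows why "chapter" alone misses instalments headed only with roman numerals. The patterns and sample headings below are invented.

```python
import re

# A "chapter"-style search misses stories sectioned only with roman numerals.
# Both patterns and sample headings are invented for illustration.
chapter_only = re.compile(r"\bchapter\b", re.IGNORECASE)
with_numerals = re.compile(r"\bchapter\b|^[IVXLC]+\.\s", re.IGNORECASE | re.MULTILINE)

headings = [
    "CHAPTER VII. THE RETURN.",
    "XII. A STRANGE MEETING.",   # a serial instalment, but no "chapter"
    "TOWN COUNCIL MEETING.",     # not fiction at all
]

print([bool(chapter_only.search(h)) for h in headings])   # [True, False, False]
print([bool(with_numerals.search(h)) for h in headings])  # [True, True, False]
```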
How do we reconcile this complexity with the forms of analysis I’ve depicted? Is it
possible? These questions bring us to what I think is a central challenge confronting
humanities research at a time where we have – and increasingly will have – access to
previously unimagined masses of data. I want to engage with this debate, first, by
offering my own criticism of existing data-led approaches to literary history, and
second, by proposing the alternative model of the “fractal” archive as a means of
reconciling the power and potential of “big data” with the complexity of the cultural
objects we study.
The author best known for data-led approaches to literary history
is undoubtedly Franco Moretti, whose “distant reading” has become the default term
for studies of this type.
PP: Here’s his influential book, Graphs, Maps, Trees, published in 2005, and
appearing just last year, Distant Reading.
PP: Less well known generally, but certainly very influential in digital humanities, is
Matt Jockers, and he calls data-led analysis “macroanalysis”. That’s his 2013 book
and one example of the types of visualisations he does, showing in this case how
Melville’s texts stand out from others in terms of the results of topic modelling.
This broad concept of data-led literary history, and Moretti’s work in particular, has
received a lot of criticism as well as a lot of acclaim, and I’m happy to discuss this.
But what I want to do, in the service of discussing how data-led analysis should
progress, is to make just one criticism of such research; one that is rarely, if ever,
made, but that underpins the problems I see with existing approaches: namely, the
failure of these authors to publish the datasets on which their analyses are based.
In not making their collections public, Moretti and Jockers have no need to describe
or justify the basis of their research: the extensive processes by which they collect,
clean and curate the bibliographic or textual data they analyse. In turn, these datasets
and their composition are not subjected to scrutiny, and other scholars have no way to
assess – and potentially challenge – these authors’ arguments, because they have no
access to the grounds on which they’re made. In failing to publish their datasets these
authors implicitly deny the inherently mediated nature of all data, and work with their
collections as if they provide direct access to things in the world.
One reason this denial works is because of the rhetorical power of data. For many
reasons, data appears to us – that is to say, the rhetoric surrounding data makes it
seem – as true, objective, seamless, totalising and commensurate with the world.
Based on this rhetoric, these data-led studies do not need to show us the underlying
datasets because they simply reflect the world, which is there for all of us to
see. It’s difficult to propose an analogy in non-digital literary studies for what these
authors are doing without sounding hyperbolic, but it’s akin to a literary scholar
finding a set of documents in an archive or archives and transcribing them, analysing
the transcriptions, publishing the findings as demonstrating an entirely new (and more
accurate) perspective on the literary field, and then refusing to allow anyone to read
the transcriptions or to reveal what the original documents are or where they are
located. More importantly for my purposes, it’s as if the commentators on this
research focused on criticising – or celebrating – the transcriptions and their analysis,
and ignored the fundamental concealment.
I say more importantly because the absence of criticism of this type points to a more
insidious reason for the current state of affairs. Specifically, I would suggest that these
authors avoid criticism for not publishing their datasets because this practice resonates
with established biases in traditional literary studies: and I’m referring here to a lack
of attention – even a resistance to attending – to the foundations of scholarship. As
has been noted by a number of textual scholars, most vociferously, Jerome McGann,
those forms of literary studies such as scholarly editing and bibliography that produce
the infrastructure for literary criticism and theory have been marginalized for many
years as mere service or support disciplines. From this perspective, one reason
Moretti’s work in particular has garnered such acclaim – even winning the 2014
National Book Critics Circle Award – is not only because of its interpretive flair, but
because – in refusing to describe or even mention the processes by which he arrives at
his various “abstract models” – he replicates deeper biases within the discipline: that
is, the perception of activities that might be aligned with the “lower criticism”, and
thereby associated with the routine, the technical, the operational, as essentially
beneath notice. What I’m suggesting then, is that while data-led and traditional
literary studies are often presented as fundamentally different modes of analysis, in
fact they share a lack of attention to the infrastructure on which we base our research.
We can relate this seeming contrast, but actual resonance, between digital and non-digital literary studies to the frequently proposed opposition between “close” and
“distant” reading. Earlier I noted that, in Australian newspapers, stories with
substantially similar texts are published under multiple titles and with various
attributions. Such transformations highlight the problem with analyses that render
literary works – such as Melville’s novels – as single data points. In terms of serial
fiction in Australian newspapers we can ask, which of the many publication events
constitutes “the novel”? What is the relationship between these publication events and
prior or subsequent serial or book publications? Are works published with different
titles but substantially similar texts the same novel? And so on.
Such challenges to the boundaries and definition of a work are brought into stark
relief by the chaotic world of nineteenth-century newspaper publishing (what I’ve
called its bibliographic complexities). But the implications of such questions apply
generally. However we might think of them, literary works only exist in material form.
As a result, as the cultural record is altered – for instance, as the last copy of an
edition of a newspaper is destroyed – or new cultural objects accumulate under the
banner of the work – through republication and new editions – the components,
boundaries, and hence, the definition of a work changes. In treating literary works as
stable and singular entities, “distant reading” works to obscure their multidimensional
and transformative nature.
Although routinely presented as diametrically opposed to distant reading, the vast
majority of close readings abstract literary works in similar ways. By proposing to
identify their meaning through analysis of the (stable, singular) “text,” close readings
also ignore and obscure the way literary works transform over time. As a work is
reissued and republished, both text and paratext, and their combination, change, thus
altering the meanings available. Not only is each reading an encounter with a specific
material document, but it’s an encounter by a specific reader – at a specific time and
place – and this also changes the possibilities for meaning.
The way I see it, then, we have, on the one hand, digital literary scholars gesturing to
the constructed nature of data, but not publishing their collections. And on the other
hand, literary critics routinely describing “texts” as constituted in and by reading
contexts, but more often than not failing to incorporate any discussion of the material
evidence of these contexts into their analysis.
Is there an alternative? I believe we need to embed our work much more thoroughly
and consciously in the archive, but also, that as the remediation of our cultural
inheritance changes that archive, we need to retheorise what is meant by the archive
and the relationship of literary studies to it. To this end, I want to propose fractal
geometry as a framework for conceptualising the archive in a way that enables “big
data” analyses that do not obscure the complexities of literary works, while enabling
us to work across mixed – digital and non-digital – repositories.
Many of you will be familiar with the concept of fractal geometry. It was originally
proposed by the mathematician Benoit Mandelbrot for the purpose of measuring
irregular or fragmented shapes or patterns, particularly those that exist in nature. Such
shapes or patterns cannot be confined to a single dimension – or anything less than an
infinite number – because their dimensions change depending on the scale at which
they’re measured and, thus, the degree of complexity or detail encompassed by the
measurement. The example Mandelbrot used to demonstrate this point is the coastline
of a country. If you measure that coastline with a kilometre-long stick – by which I
mean, a kilometre-long unit of measurement – you’ll get a different total length than if
you measured it with a metre-long or centimetre-long stick. The point of fractal
geometry is not that the more detailed measurement is correct, because we never
approach absolute equivalence with the object measured: which means we could
never accurately measure it by that logic. Instead, the different measurements that
occur at different scales are a ratio of – or reference to – their own degree of detail
and complexity.
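Mandelbrot's coastline point can be made concrete with the Koch curve, a textbook fractal (not anything from the project's data): each level of refinement shrinks the ruler by a factor of three and multiplies the measured length by four-thirds, so the total length indexes the scale of measurement rather than converging on a "true" value.

```python
# Measuring a unit-baseline Koch curve with ever-finer rulers:
# at refinement level k the ruler is (1/3)**k long and the curve
# resolves into 4**k segments, so measured length = (4/3)**k.
for k in range(5):
    ruler = (1 / 3) ** k
    length = (4 / 3) ** k
    print(f"ruler = {ruler:.4f}  ->  measured length = {length:.3f}")
```

The lengths grow without bound as the ruler shrinks, which is exactly why no single measurement is "correct": each is a ratio of its own scale of detail.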
PP: This concept of fractal geometry has been discussed in literary studies previously,
most importantly probably by Wai Chee Dimock, who used it as a framework for
conceptualising genre, and specifically, the relationship between the novel and the
epic. Responding, in fact, to Moretti’s “distant reading” – and his claim that supposed
access to the “literary system” justifies the disappearance of individual texts –
Dimock responds (I won’t read the passage aloud; I’ll let you read it yourselves):
If fractal geometry has anything to tell us, it is that the loss of detail is almost
always unwarranted … [because] the literary field is still incomplete, its
kinship network only partly actualized, with many new members still to be
added. Such a field needs to maintain an archive that is as broad-based as
possible, as fine-grained as possible, an archive that errs on the side of
randomness rather than on the side of undue coherence, if only to allow new
permutations to come into being. (79)
Dimock’s work points in useful directions. But her deployment of fractal geometry
remains at an entirely metaphoric level. That is to say, she uses fractal patterns (which
contain within themselves near identical copies of the whole) as a metaphor for the
shape of discursive relationships, and the archive to which she refers is a metaphoric
one, comprised of all discourse (as in Derrida’s theorisation of this term) rather than
actually existing institutional holdings. I want to think about how to “thicken” the
archive in a more literal sense: that is, how we might align our knowledge
infrastructure more explicitly, critically, and practically, with actual archives,
including digital ones. For this purpose, Jerome McGann’s latest book, A New
Republic of Letters, has proven useful. McGann argues that each cultural object is
comprised of the co-dependent relationship between its histories of production and
reception. These histories are produced by multiple actors, both organic and
inorganic, and as these histories are added to – including through our own scholarly
activities – the nature and meaning of those cultural objects change. Scholarship, from
this perspective, is both a performative act and an indeterminate endeavour, which
involves accumulating as much information as possible about the object of study,
while acknowledging that this process can never be complete because those histories
only survive in part, and because we can only ever be partially aware of the complex
system of influences we are exploring, contributing to and altering.
Based on this framework, a fractal archive would provide an approach to the archive
that thickened our understanding by accumulating information about the
histories of production and reception specifically. But it wouldn’t rest at that
point. Rather, it would incorporate into our understanding of and approach to the
archive the very relationship fractal geometry proposes between measurement, scale,
dimensions and complexity. And it would do this in a specifically self-conscious way,
by reflecting its own part in producing and altering the histories it represents. My key
claim in proposing this idea of a fractal archive is that a focus on measurement – on
what is measured and how, as well as what is not measured – might enable a
conception of the archive in terms of dimensions that are never absolute, that change
over time, including through our scholarly activities.
Let me ground these rather abstract pronouncements in my specific project. A focus
on measurement in conceiving the archive emphasises that we are not accessing serial
fiction in Australian newspapers in any sort of direct or unmediated way. Rather,
automatic search and retrieval methods provide proxies or models – in the form of CSV
and text files – for the digitized newspaper pages in Trove; these digitized pages are,
in turn, proxies – rendered through OCR – for the newspapers, or microfiche models
of those newspapers, held in libraries throughout Australia. These physical
newspapers are, in turn, proxies – by virtue of their collection in the archive – for the
many newspapers of the same name published on the same date, the vast majority of
which no longer exist. As this latter point emphasises, the print cultural objects in our
archives have always been representations of – or models or proxies for – the plethora
of print cultural objects that circulate as, and constitute, the same work. In an
important way, then, this conception of the archive is not a shift but an amplification
of a pre-existing state that now requires explicit theorisation as we move to work
across digital and non-digital repositories.
What is not measured is another important question. In respect to this project, the
search terms used to identify fiction in nineteenth-century Australian newspapers will
not uncover all relevant documents for multiple reasons, including OCR errors and
the fact that many newspapers have not been digitized (as I noted previously). More
broadly, while each instalment is related to earlier and later instalments of that story
in the particular periodical, and to the literary work as it exists as an abstract form in
literary history, the archive we are building does not – at present – measure most of
the ways in which these stories exist and operate in relation to other features of the
documentary record. I’m referring, here, to the news, advertising, illustrations, letters
and opinion which these fictions are published alongside or refer to, as well as the
innumerable systems to which each newspaper connects: whether relating to various
agents (authors, publishers, editors, typesetters, readers, and so on) or to broader
political, commercial, cultural, and economic contexts, within Australia and beyond.
This emphasis on what is and is not measured and how is not intended as an exercise
in self-flagellation (this archive is hopeless; it misses so much) but to highlight that
any change in what is measured or how will change the dimensions of the archive.
Such changes might include: improvements in OCR or alterations in the search
algorithms used by Trove. More broadly, I might focus on a particular title or author
and discover new information that is added to the archive; or at the other end of the
scale, someone else may use one of the other national newspaper digitisation projects
– more likely, a freely accessible one, than the British repository – to create an
archive of fiction in other newspapers, which could be linked to this one.
Another way of making this point is to say that emphasizing measurement
foregrounds the provisionality of the fractal archive, in that any new or different
measurement will, of necessity, change its dimensions (by changing the range of
cultural objects represented and, in the process, altering the nature and meaning of
those objects already represented). But such provisionality does not prevent analysis,
or render any analysis of the archive wrong, once the dimensions change. As long as
the fractal archive is also published, any interpretations made on the basis of it stand
in relation to – and index – what is known at a particular time and place: and
subsequent critics can access and assess this knowledge base when considering those
interpretations. In these ways, the fractal archive highlights the multilayered
composition of all cultural objects and of the histories of production and reception
that comprise them, while also standing as a marker for the current moment of
interpretation and a manifestation of how that particular interpretation transforms the
object/s represented.
There’s a lot more I could say – and a lot more that needs to be said – about this idea
of the ‘fractal archive’ if it’s to function as I’m suggesting – as a replacement for
current “distant readings” that abstract literary works from their material context. But
I want to end with two points that I hope clarify how the fractal archive might serve
this purpose, before raising some questions about how to proceed that remain for me.
The “fractal” archive I’m imagining can function as provisional, but still deployable,
and as systemic, but non-totalising, for two main reasons: first, it’s an archive, not a
theory; and second, unlike archives as we have traditionally constructed and
employed them, a “fractal” archive is built for the specific purposes of analysis.
I mean a fractal archive is an archive, not a theory, in two ways. First, I’m specifically
not using the notion of fractal geometry or fractal patterns to describe the shape of the
archive or of the literary field, as Dimock does. Second, and in relation to this, I’m
intending the fractal archive to work specifically against a strong tendency in literary
studies to understand a literary field – whether that’s Victorian literature or the “world
literary system” – based on generalisations from a select number of examples. In this
approach, a theoretical framework or metaphor is proposed on the basis of a handful
of examples, and then generalised as a law or principle for understanding how
literature works (such as, with Dimock, the fractal pattern). This type of reification of
theory or metaphor is, in my opinion, just as problematic as Moretti’s or Jockers’s
reification of data.
Using a fractal archive, one does not approach the object of study with a theoretical
construct, but from the basis of ignorance: I don’t know what I’m going to find. This
is because, as I said, while fractal archives are archives (not theories) they’re also
different from archives as we’ve traditionally employed them because they’re
constructed for the purposes of exploration: to enable representation, and analysis, of
cultural objects in multiple dimensions. At the same time, because their construction
foregrounds the many things they do not measure – and the provisionality of their
measurement – they aim to avoid the perception of data as objective and
commensurate with the world and, following on from this, the use of data to propose a
literary law that comes to function precisely in the way that theoretical constructs and
metaphors do: to ignore and obscure the irreducibility of print and literary culture.
So, that’s my thinking about fractal archives. I want to end with two questions.
First, I’m exploring this idea of the fractal archive in order to explore the
multidimensionality of literary works: for instance, using a consideration of the
publication of titles over time (as I showed in the first graph) as a framework for
focusing on particular titles, publications, authors, or years. But how might this
occur? Can it proceed on a provisional basis – things that strike me as interesting
about the data, much as textual analysis currently proceeds in respect to literary
works – or is a different, more typological or systematic approach required?
Second, if an analysis is embedded in a fractal archive – so indexed to the particular
dimensions and complexity of a particular collection of data at a particular time – do
we need to embed multidimensionality in visualisations of the data?
PP: To express this question starkly: is this graph on the left (indicating flatness but
based on an archive that considers literary works as multidimensional – constituted,
for instance, by multiple publication events) better or worse than this graph on the
right, from Jockers’ Macroanalysis (suggesting multidimensionality but based on a
conception of literary works as singular and stable; not to mention based on a dataset
that cannot be accessed and analysed)?
Those are some of the questions I’m considering, but you might have others and I’d
love to hear them.