Genealogies of Qur’anic Hermeneutics:
Tracing Trajectories through Online Data
David Vishanoff
Associate Professor, Religious Studies Program, University of Oklahoma, USA
August 6, 2015
International Conference “New Trends in Qur’anic Studies”
Co-hosted by the International Qur’anic Studies Association (IQSA) and
the State Islamic University (UIN) Sunan Kalijaga, Yogyakarta, Indonesia.
Abstract
One novel approach to tracing the complex networks of ideas that have contributed to contemporary
theories of Qur’anic interpretation is the application of data analysis and visualization techniques to the
many large databases of Islamic and non-Islamic texts that have appeared online in recent years. Online
translation engines, cluster analysis, and natural language processing techniques such as topic modeling
can be combined with visualization techniques such as network maps, dendrograms, word clouds, stream
graphs, alluvial diagrams, scatterplots, treemaps, and cluster layouts to produce schematic
representations of the relationships between texts, authors, and concepts across the various disciplines
that have contributed to Qur’anic hermeneutics—disciplines like literary theory, philosophical
hermeneutics, semiotics and pragmatics, Biblical studies, Qur’anic exegesis, Islamic legal hermeneutics,
and the Qur’anic sciences. Given the limitations of the available databases and online tools, the resulting
visualizations cannot be considered definitive genealogies of contemporary Qur’anic hermeneutics, but
they can serve as roadmaps to guide the slow, careful work of intellectual history in more efficient and
innovative directions, and they can reveal conceptual resonances and lines of influence that otherwise
might be discovered only through years of reading, or not at all. This paper presents a vision for a set of
software tools that is being adapted for this purpose by Exaptive, Inc., and the University of Oklahoma,
illustrates it with sample visualizations based on a database of personal notes, and indicates some of the
anticipated challenges and strategies to be pursued.
Introduction: from tracing genealogies to charting currents beneath an ocean of data
When I did the research for my first book, I spent most of my time combing through three or four
shelves of Arabic books on Islamic legal theory. Every time I found a hermeneutical principle for
analyzing Qur’anic language expressed by or attributed to an early Muslim author, I would write out a
little note card to that effect. Over the years I accumulated about thirty-eight hundred such note cards
relating to some twelve hundred scholars. Fortunately, it occurred to me early in the process to write
those cards electronically, in a database that I created using Microsoft Access, so that when it came time
to write a paragraph tracing the genealogies of Muʿtazilī theologians’ views on the problem of general
and particular expressions, for example, all I had to do was run a query to display in chronological order
all the note cards I had compiled on that topic by scholars I had flagged as Muʿtazilīs, and I was ready to
start writing.
All of that took time, of course, and some practice with the database software, but oh, how easy it all
seems now! I was working with a limited canon of texts in a single language reporting the views of a
manageable number of thinkers on a well-defined set of questions. Now that I have rashly decided to
wade into the much less orderly world of modern Islamic hermeneutical theory, I am faced with an
explosion of writings in multiple genres and languages approaching the problem from numerous angles.
I have set myself the foolhardy goal of tracing the genealogies of contemporary Qur’anic hermeneutical
theories from classical Islamic and modern European discourses through the various permutations and
transformations they have undergone in the twentieth century. Such an ambitious project, however, will
require me to study numerous unfamiliar discourses and several new languages. The task seems less like
genealogical research than like trying to chart the currents beneath the surface of a vast ocean, with
only a rudimentary map to go by showing only the most famous ports of call along the coastline. How
does one proceed? Not alone, surely; yet however much they collaborate, scholars must still draw their
own mental maps for themselves. If I am to produce any kind of meaningful narrative about the last
century’s developments in Qur’anic hermeneutics before I reach my limits and am forced to retire, I will
need a new approach. Note cards—even electronic ones—will no longer suffice.
If only I could get a snapshot of what is going on in the torrent of new books on Qur’anic hermeneutics
being written every year here in Indonesia and across the Muslim world without actually having to read
them all! That is the point of what people in the digital humanities call “distant reading,” which cannot
take the place of close reading, but has the potential to help us contextualize our close reading and plan
it more strategically. At present, however, technologies for distant reading typically require one to
compile for oneself a single set of digital texts, and do not offer much flexibility in analyzing and
visualizing those texts. To help remedy that situation I am engaged in a pilot project with the University
of Oklahoma Libraries and a research software firm called Exaptive to develop a customizable toolkit
that will allow me and other researchers to cobble together, on the fly, multiple existing data analysis
and visualization tools, and apply them simultaneously to multiple online databases of Islamic and
non-Islamic works on hermeneutics or any other topic. This paper puts forward my vision for how that
software might serve the research process, illustrates that process with visualizations created from my
database of personal notes, and indicates some of the anticipated challenges and strategies to be
pursued.
Proposed sources of data
I have identified several databases of digital texts and bibliographic data that look usable and that
provide sufficient coverage, for now, of the literature I wish to explore. The Exaptive application, or
“Xap,” will attempt to integrate resources such as the following:

Al-Maktaba al-Shāmila (al-maktaba al-shamila), a downloadable, locally searchable, clean and
very extensive database of classical Arabic texts on various Islamic sciences. Including this
ever-growing library will ensure that when I use the software to visualize clusters of texts on related
topics, classical Islamic works will be part of the picture. Their vocabulary will differ somewhat
from modern works, while overlapping with some of them. Visualizing which clusters of modern
texts use which parts of the classical vocabulary of Qur’anic exegesis, theology, legal theory, and
literary theory will be interesting in its own right, but some modern works will discuss classical
concepts using new vocabulary in multiple languages, and it will be necessary to use translation
equivalency tables or “thesaurus files” to link terms for related concepts in different languages
and vocabularies. Furthermore, many modern texts will deploy classical terms and concepts in
new ways, so it will be instructive to visualize which groups of authors use terms like taʾwīl
(interpretation), for example, in proximity with terms like raʾy (reasoned opinion) and
rationality, and which authors use taʾwīl alongside different notions like historical or social
context. It will also be possible to trace which modern discourses tend to cite which classical
authors most. Visualizing the connections between the classical texts in al-Maktaba al-Shāmila
and the modern texts in the other databases will be crucial for tracing the classical Islamic
dimensions of the intellectual genealogies or currents of contemporary Islamic hermeneutical
thought.
In order to put some modern Arabic writings in the picture, the software will also employ the
New Alexandrian Library’s Digital Assets Repository
(http://dar.bibalex.org/webpages/advancedsearch.jsf), a thoughtfully designed and easily
accessible resource that includes some 30,000 titles on religious topics alone, including not only
classical Arabic and modern European-language works but also a wide range of recent Arabic
titles. Because many of these works are still under copyright, their full text is not readable
online, but it is fully searchable, which will allow the software to determine which modern
works use which clusters of concepts and cite which other scholars.
A much larger database of full text books in multiple languages is the HathiTrust Digital Library
(http://www.hathitrust.org/), whose holdings overlap significantly with Google Books but are
more readily usable for academic research because the HathiTrust is governed by member
universities, not by commercial interests. In fact, the HathiTrust has just released a large
database (https://sharc.hathitrust.org/features) giving page by page word counts for the public
domain books in its collection, which will greatly facilitate the kind of vocabulary analysis and
topic modeling the Exaptive software will be doing. The HathiTrust’s Arabic holdings differ
substantially from, and should complement well, the digital archive at the Alexandrian Library,
but the Trust’s greatest strength is in European languages. This will allow me to include in my
research universe a long history of European publications dealing with language and
interpretation, and will thus help to identify which European thinkers deployed similar
combinations of concepts as modern Muslim thinkers and which European works are most cited
or appear to have been most influential in the development of various strands of modern
Muslim hermeneutical thought.
Capturing recent publications in non-European languages other than Arabic will be trickier
because these works are not as well represented in major digital collections. Some such works
have been digitized, however: for example, all the undergraduate theses and doctoral
dissertations written here at UIN Sunan Kalijaga or at any other State Islamic University in
Indonesia are available online, but they are housed in separate digital repositories
(http://digilib.uin-suka.ac.id/, etc.). Fortunately, Google Scholar (http://scholar.google.com)
indexes these sites, so it is not necessary to search them separately; incorporating Google
Scholar into the Exaptive Xap will make it possible to analyze and include these non-European
works, as well as recent European-language scholarship, in my textual universe.
Unfortunately, many recent commercially printed books are not readily searchable online. They
may be floating around the internet as pdf files, but those possibly illegal copies are not
searchable through any one interface. It will be necessary, therefore, for me to scan for my own
personal use some of the most promising-looking Indonesian books that I will be taking home
with me from the Social Agency Bookstore that is just down the street from here. Those books
whose tables of contents indicate they are likely to be most relevant for my research will be
scanned and processed using optical character recognition software, and the resulting full text
files will be analyzed by the Xap and integrated into the results it displays so that I can spot
possible connections and convergences between these very recent Indonesian titles and all the
other classical and modern works in various languages in all the other databases. That should
help me decide which Indonesian books to read first.
Finally, my own database of notes, which I created in Microsoft Access for my previous
research, and which I continue to use for this new project, will be integrated into this software
(either in Access or in some other format) as a kind of overlay, superimposed upon the
relationships and connections that the software discerns among the online books. When my
own database indicates that a certain author has quoted another, or has discussed a certain
concept, this will appear as a highlighted relationship in whatever visualization I am using. And if
the software shows a potential citation or resonance between two works, and I choose to
investigate by clicking on that relationship to pull up the texts themselves, I will be able to
annotate that relationship in my own database, and thereafter those two works will be shown
as connected in some highlighted fashion. This will allow me to build my own visual analysis of
the currents, genealogies and resonances between works and authors right on top of the
connections that are suggested by the software.
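The overlay idea in this last item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the work identifiers, the annotation text, and the function names `key` and `overlay` are all hypothetical, and nothing here describes how the Exaptive Xap itself stores or draws relationships. The sketch treats links as undirected and marks a link as highlighted whenever a personal note confirms it.

```python
# Hypothetical edges: pairs of works linked by the software or confirmed in personal notes.
suggested = {("work1", "work2"), ("work2", "work3"), ("work3", "work4")}
confirmed = {("work2", "work1"): "quotes a passage on commands"}

def key(pair):
    """Treat a link as undirected by sorting its endpoints."""
    return tuple(sorted(pair))

def overlay(suggested, confirmed):
    """Label each link as highlighted (annotated by the researcher) or merely suggested."""
    notes = {key(p): note for p, note in confirmed.items()}
    edges = {key(p) for p in suggested} | set(notes)
    return {e: ("highlighted", notes[e]) if e in notes else ("suggested", None)
            for e in sorted(edges)}

layered = overlay(suggested, confirmed)
```

Keeping the researcher's annotations in a separate table, and merging them only at display time, means the software's suggestions can be recomputed without disturbing the notes.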
Proposed methods of analyzing and visualizing relationships
For now, that personal database of notecards provides a small body of data—several thousand records
relating to about a thousand authors—that can be used to illustrate some of the analysis and
visualization techniques that the Xap could perform on the larger universe of online books and
bibliographic data. Let me illustrate how I imagine using the Exaptive Xap to explore the world of
Qur’anic hermeneutics.
When one ventures into a new field of research in the humanities, one has only a starting point: a basic
set of ideas or texts or people in which one is interested. The first thing I will want to do is explore
outward: starting from those ideas and those authors’ names, I want to know what other concepts have
been discussed in association with those ideas, which authors have written about them, and how those
concepts and people could be grouped into connected yet distinguishable “discourses” around which I
might structure my research.
Ideally, I would start by looking at a giant network map of ideas and authors from the whole history of
human literature. This, however, is too large a computational problem: it would require “topic
modeling” on the whole Google Books corpus, something not even Google has attempted. Fortunately,
however, this is unnecessary, because one can use the concepts and names one already knows as seeds
to help one define what parts of the vast online corpus to analyze and map. Let’s see what we get, using
the relatively small dataset of my own electronic notecards, if we start with my seed concepts of
“Qur’an” and “hermeneutics” and generate a list of other terms that occur most commonly alongside
those two words. Extracting all the notecards that use either of these two terms, and running the text of
those notecards through the clustering and mapping algorithms of the freely available VOSviewer
software, we obtain the following visual map:
The software has sorted the most common terms, from all my notecards involving Qur’an or
hermeneutics, into colored clusters of words that tend to occur together. From this picture it becomes
immediately apparent that there are several different kinds of discourse around my two seed terms:
there is a discussion of God’s attribute of speech (in blue); there is a related but distinct discourse about
khalq al-qurʾān (the doctrine of the created Qur’an, in purple—Qur’an has been truncated to “qur,”
apparently because the software does not allow a word to contain an apostrophe); there is a separate
discourse (in red) about the nature of language and revelation; there is another discourse (in green)
involving interpretive problems such as apparent meaning (ẓāhir) and figurative language (majāz); and
another discussion (in yellow) about commands and prohibitions. This first research step, starting with
two seed words, has given me a longer list of terms that tend to come up in discussions of the Qur’an
and hermeneutics: speech, created and eternal, revelation, language, figurative language (majāz),
apparent meaning (ẓāhir), command, and prohibition. This expanded list of terms defines my research
universe—the subset of the universe of human writing that I will proceed to explore.
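The seed-expansion step just described can be sketched as follows. This is a minimal Python illustration, not the procedure VOSviewer applies: the sample notecards, the stopword list, and the function names `tokenize` and `expand_seeds` are assumptions made for the example. It keeps only notecards that mention a seed term and counts the other terms that appear on them.

```python
from collections import Counter
import re

# Hypothetical stand-ins for notecards; the real database is not reproduced here.
notecards = [
    "The Qur'an is God's speech, and speech may be created or eternal.",
    "Hermeneutics of command and prohibition in revealed language.",
    "Figurative language (majaz) versus apparent meaning (zahir) in the Qur'an.",
    "A note on grammar that mentions neither seed term.",
]

SEEDS = {"qur'an", "hermeneutics"}
STOPWORDS = {"the", "is", "and", "of", "in", "or", "may", "be", "a", "on", "that", "versus"}

def tokenize(text):
    """Lowercase and split into words, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def expand_seeds(cards, seeds, top_n=10):
    """Count terms that share a notecard with at least one seed term."""
    counts = Counter()
    for card in cards:
        tokens = set(tokenize(card))
        if tokens & seeds:
            counts.update(tokens - seeds - STOPWORDS)
    return counts.most_common(top_n)

related_terms = expand_seeds(notecards, SEEDS)
```

Because the tokenizer keeps internal apostrophes, a term such as "qur'an" survives intact here, which avoids the truncation to "qur" noted above.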
The next step outwards in the exploratory process would be to identify a universe of authors who have
discussed these terms. In the real world of online databases, this will be a very large set of texts. For
now, let’s pretend that my entire database of notecards is that universe of authors and texts, and let’s
see how we could go about exploring it. If we use the VOSviewer to map and cluster that entire
universe, this is what we get:
At first glance this looks like an unstructured mass of names and terms, but in fact both the colors and
the position of the dots are meaningful: the more two words tend to appear together, the closer
together they are on the map, and those that occur together most often are grouped by colors
indicating distinct discourses. Indeed, we can see that the VOSviewer’s algorithms have identified some
very significant discourses and groups of authors. Off on the fringe to the right is a cluster of brown dots
including the names of several famous and rather cutting-edge contemporary Muslim writers about the
Qur’an and hermeneutics: Nasr Hamid Abu Zayd, Ebrahim Moosa, Farid Esack, Fazlur Rahman, and, a
little further in, the less innovative Muhammad Quraish Shihab. How did the software know that they
belong together? It didn’t; it just noticed that they happen to come up in similar contexts, and if I did not
just happen to be already familiar with them, this map would be my first clue that maybe I should read
them together. In addition to names, terms also are helpfully grouped: in red we have general
considerations relating to Islamic law: God, knowledge, law, reason, person, state, theology…. In blue we
have considerations of speech and meaning, as well as the related modern notion of a “speech act” and
the pair “expression” and lafẓ (verbal expression). In green we have amr, which means command and
thus fits well with wujūb (obligation), nahy (prohibition), act, and irāda (will). In purple we have
figurative (majāz) and literal (ḥaqīqa); in brown we have general (ʿāmm), particular (khāṣṣ), and
particularization (takhṣīṣ). In yellow we have the famous scholar al-Shāfiʿī (whose name the software has
truncated to Shafi), who wrote a great deal on Hadith and Qur’an (truncated to Qur) as well as clarity
(bayān). Finally in light blue we have qiyās (reasoning by analogy) right next to the name of the Ẓāhiriyya
(“literalists”) who argued fiercely against analogical reasoning, as well as the Ẓāhirī thinkers Dāʾūd and
Abū Saʿīd al-Nahrawānī. In other words, the software has taken an amorphous corpus of texts (or in this
case my notes about them) and sorted them out to show what the main topics are and who some of the
main thinkers are who wrote about each. In the VOSviewer we can zoom in on any part of this map to
see other terms and names associated with each colored discourse. But even at this distance we gain
from this map a pretty clear sense of the main topics in Islamic hermeneutics. We can also visualize the
same data in a more readable “density” map that colors names and terms by their frequency of
occurrence. This format is less helpful in identifying separate discourses, but it does still group together
spatially those names and terms that tend to occur together:
This completes our first outward, expansive exploration; we have now defined a universe of terms,
texts, and authors, and we are in a good position to select a few key terms and start focusing on them
more narrowly.
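The principle behind these maps, that terms appearing together more often are drawn closer together, rests on counting co-occurrences. The Python sketch below shows one way to compute such counts and a simple normalized similarity (co-occurrences divided by the product of the two terms' frequencies, often called association strength). The sample data and function names are hypothetical, and the sketch illustrates the general idea without reproducing VOSviewer's internal computation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical notecards, each reduced to the set of terms it mentions.
cards = [
    {"amr", "nahy", "wujub"},
    {"amr", "nahy"},
    {"majaz", "haqiqa"},
    {"amr", "majaz"},
]

def cooccurrence(cards):
    """Return per-term card counts and per-pair co-occurrence counts."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in cards:
        term_counts.update(terms)
        pair_counts.update(combinations(sorted(terms), 2))
    return term_counts, pair_counts

def association_strength(a, b, term_counts, pair_counts):
    """Co-occurrences of a and b divided by the product of their frequencies."""
    pair = tuple(sorted((a, b)))
    return pair_counts[pair] / (term_counts[a] * term_counts[b])

term_counts, pair_counts = cooccurrence(cards)
near = association_strength("amr", "nahy", term_counts, pair_counts)
far = association_strength("amr", "haqiqa", term_counts, pair_counts)
```

A layout algorithm would then place "amr" and "nahy" close together (high similarity) and "amr" and "haqiqa" far apart (no shared notecards).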
One way to start whittling down the list of terms is to let the software select only the most common
ones, or the ones to which it assigns the highest “relevance scores,” and rerun the clustering and
mapping algorithms on only those most important terms, resulting in a much simpler picture:
This smaller set of terms can, in turn, be expanded to show all the people who, in my database of
notecards, have had anything to say about just those several dozen terms:
Even with just a few dozen terms, this is still a complex picture when you add all the names of authors,
but if we use the VOSviewer to zoom in on particular sections to read the names that are not yet visible
at this scale, we will soon discover that this map is actually quite a bit simpler than the previous network
map that showed all the significant terms and names in my database. One thing this diagram already
shows quite clearly is that there is an especially distinct cluster of people, shown in purple and bulging
off the cloud on the left, who are engaged only or mainly with tafsīr, the formal discipline of Qur’anic
exegesis. Most terms fall toward the middle of the diagram, because they are linked to many different
names, and most names (except those of the most wide-ranging authors) tend to be around the margins
because they are linked only to a modest number of terms. But the term tafsīr is off to the side because
most of the people who use it can be grouped together, and they tend not to be concerned with other
terms (at least not in the notecards in my database). This suggests that tafsīr (Qur’anic exegesis) can be
treated as a somewhat separate discourse from the remainder of the discussions going on in this little
literary universe.
The same information can be made more readable as a “density” map, this time colored to indicate the
distinct discourses:
Our literary universe has now been reduced to writers dealing with just a few dozen of the most
common terms, but we aren’t equally interested in all of them. A next step inward would be to select for
ourselves just a handful of the concepts shown here, and create a clearer map showing in more detail all
the authors involved with those few terms. At this stage, when the researcher is manually selecting a list
of terms on which to focus, it can be useful to create a “thesaurus file” that equates synonyms so that
the Arabic “amr” and the English “command,” for example, are treated as a single term, and references
to both of them are aggregated and displayed together. We can even define little discourses—clusters
of different but closely related terms that tend to be discussed together, such as the related set amr /
command, nahy / prohibition, and wujūb / obligation, which we noticed occurring together in an earlier
diagram. At present such thesaurus files (or translation equivalency tables, or discourse definition
tables) must be created manually, but ideally the Exaptive Xap will allow the user to link related terms
that appear in a diagram at the click of a mouse, and immediately collapse them into a single dot on the
map. The user will be able to look at the terms that have been grouped together by color based on their
frequency of co-occurrence, and select those terms that he or she wants to treat as parts of a single
discourse. Thereafter all references to commands, prohibitions, and obligations, in any language, would
always be displayed together as references to the “command discourse,” and authors who frequently
use any of those terms would all be linked together around a large dot labelled “COMMAND.”
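A thesaurus file of this kind amounts to a lookup table from variant terms to a single label. The Python sketch below is a minimal illustration: the table contents, the ASCII transliterations, and the function names `normalize` and `collapse` are assumptions for the example, and it does not follow any particular tool's file format.

```python
# Hypothetical thesaurus: each variant term maps to one discourse label.
THESAURUS = {
    "amr": "COMMAND", "command": "COMMAND",
    "nahy": "COMMAND", "prohibition": "COMMAND",
    "wujub": "COMMAND", "obligation": "COMMAND",
    "kalam": "SPEECH", "speech": "SPEECH",
}

def normalize(terms, thesaurus=THESAURUS):
    """Replace each term with its discourse label; unknown terms pass through."""
    return [thesaurus.get(t.lower(), t.lower()) for t in terms]

def collapse(terms, thesaurus=THESAURUS):
    """Count references per discourse after normalization."""
    counts = {}
    for label in normalize(terms, thesaurus):
        counts[label] = counts.get(label, 0) + 1
    return counts

discourse_counts = collapse(["amr", "Command", "nahy", "speech", "tafsir"])
```

Terms missing from the table, such as "tafsir" here, are left as they are, so the researcher can see which terms still need to be assigned to a discourse.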
For example, let me select from my universe of terms five main discourses that interest me—commands
(amr, nahy, wujūb), speech (kalām, khiṭāb), general expressions (ʿāmm, khāṣṣ, takhṣīṣ), literal meaning
(ḥaqīqa, ẓāhir), and clarity (bayān, as well as ambiguity, iḥtimāl). Suppose I link the main terms
associated with each of these discourses using a thesaurus file. Then I display all the authors who have
had anything to say (or at least a minimum number of things to say) about any of the terms associated
with these discourses. The result would look something like this:
Or, more helpfully, like this:
Suddenly we have a much more manageable diagram, and it reveals several interesting things. First, the
whole discourse about general and particular expressions (which was relevant for legal questions such
as whether “kill the unbelievers” means “kill all unbelievers” or only applies to some of them) is rather
isolated from the others; many writers in my database are concerned with this problem to the exclusion
of my other chosen discourses. (If we zoomed in using the VOSviewer we would see that many of these
thinkers, who appear as a tight constellation of red dots off to the right, are Muʿtazilī theologians and
Ḥanafī jurists.) On the other hand, authors who discussed commands also tended to discuss the problem
of the clarity or ambiguity of revealed texts; we can tell that because the software has placed those two
terms close together. The more theoretical question of what God’s speech is and how it communicates
is a bit more separate, in blue; it attracted its own little contingent of thinkers, including Ashʿarī and
proto-Ashʿarī theologians like al-Bāqillānī and Ibn Kullāb. To my surprise, the famous jurist al-Shāfiʿī
appears here in purple, quite close to the discourse about literal meaning; I do not consider him a
literalist at all, but he must at least have had a strong interest in literal interpretation because he
appears here right next to the Ẓāhirī literalist Ibn Ḥazm.
Another way of picturing the data about these five discourses would be to order it chronologically, using
the death dates instead of the names of the authors to show how discussion of each of these five topics
ebbed and flowed over several centuries. Using several chart types from DensityDesign’s online
RAW tool, we can visualize the number of references to these five discourses, over the span of the
second through fifth Islamic centuries (the eighth through eleventh centuries C.E.), like this:
Or like this:
Or this:
This last visualization, a Voronoi tessellation, gives perhaps the best sense of the relative dominance of
each discourse in any given time period. Discussions of general expressions (in red) were clearly very
important for a couple of centuries; commands (in green) were discussed early on and then became a
major concern again in the late fourth and fifth Islamic centuries (tenth and eleventh C.E.); issues of
clarity and ambiguity (in yellow) remained a regular theme; but the notion of literal meaning (in purple)
was a hot topic only for a brief period: after that, in fact, it came to be taken for granted that legal
interpretation (which is the main focus of my database) was literal, and the distinction between literal
and figurative meaning came to hold less interest for legal interpreters.
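The data behind such chronological charts is a table of reference counts per discourse per period. The Python sketch below shows one way to build that table from death dates. The records are invented for illustration, and the function names `century_of` and `counts_by_century` are hypothetical; a charting tool would take a table like this as input.

```python
from collections import defaultdict

# Hypothetical records: (author death year in the Islamic calendar, discourse label).
references = [
    (204, "GENERAL"), (241, "GENERAL"), (270, "LITERAL"),
    (324, "SPEECH"), (403, "COMMAND"), (456, "COMMAND"), (456, "LITERAL"),
]

def century_of(year):
    """Years 1-100 are century 1, 101-200 are century 2, and so on."""
    return (year - 1) // 100 + 1

def counts_by_century(records):
    """Tally references to each discourse within each century."""
    table = defaultdict(lambda: defaultdict(int))
    for year, discourse in records:
        table[century_of(year)][discourse] += 1
    return {c: dict(d) for c, d in sorted(table.items())}

timeline = counts_by_century(references)
```

Binning by century is only one choice; narrower bins such as half-centuries would show finer ebbs and flows at the cost of sparser counts.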
Let’s say we want to focus more narrowly on a single topic such as commands. We can select only those
authors who wrote about commands (and other related terms), and visualize which of those authors
cited which others most often:
Notice that the clustering algorithm has divided these thinkers in a way that happens to correspond
remarkably closely to their legal and theological schools (about which the software itself knows
nothing): Most of the red dots are Muʿtazilī theologians, most of the purple ones are Ashʿarī theologians,
the blue dots are Ḥanbalī jurists, the orange dots are Ḥanafīs, the aqua dots are Mālikīs and the little
green dots largely Ẓāhirīs. This tells us that with some exceptions medieval Muslim scholars tended to
cite others within their own schools of thought far more often than they cited outsiders. This is not
surprising, but it could have been otherwise, and this network diagram shows that in fact conversations
about commands did tend to occur within school lines.
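Once school affiliations are known independently, the observation that scholars mostly cite within their own schools can be checked with a simple ratio. The Python sketch below uses invented authors and counts, and the function name `within_school_share` is hypothetical. Unlike the clustering algorithm, which knows nothing about schools, this check takes the school labels as given.

```python
# Hypothetical school labels and citation counts (citing author, cited author, times).
school = {"A": "Mu'tazili", "B": "Mu'tazili", "C": "Ash'ari", "D": "Ash'ari"}
citations = [("A", "B", 5), ("B", "A", 3), ("C", "D", 4), ("A", "C", 1), ("D", "B", 2)]

def within_school_share(citations, school):
    """Fraction of all citations whose citing and cited authors share a school."""
    total = sum(n for _, _, n in citations)
    same = sum(n for src, dst, n in citations if school[src] == school[dst])
    return same / total if total else 0.0

share = within_school_share(citations, school)
```

A share close to 1 would support the reading that conversations ran along school lines; a share near the level expected by chance would undercut it.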
Could we break down our entire universe of authors along school lines in this way, leaving aside
concepts and terms and running a clustering algorithm only on scholars’ names, linking figures who cite
each other and placing closer together those who cite each other most? Sure. This is what we get:
The picture is more complex, but again the clustering algorithm finds that people tend to cite each other
within school lines, and the schools are demarcated in much the same way as when analyzing only
citations about commands. This network diagram, viewed in the VOSviewer, provides a scalable map in
which one can zoom in, move around, and see counts of how many times specific figures cite each other
by hovering over the line connecting two figures.
Which of these figures was most influential? Was it those with the largest dots? Not necessarily; a large
dot in this diagram means either that a figure is cited frequently by others, or that he frequently cites
others; only the former indicates “influence.” To gauge influence we need to see only how many times
an author is cited by others. By ordering that information chronologically, we can look at how each
author’s influence waxed and waned over time. Limiting ourselves to just the most influential (or most
frequently cited) scholars, we get the following graph (which was created using the “stacked area chart”
feature in Microsoft Excel):
The authors who lived earliest are listed at the bottom of the graph, but the earliest are not always the
first to become influential. Among other things, this graph confirms what many have long said, that
al-Shāfiʿī (the lower pink band) was the “founding figure” of Islamic legal hermeneutics (as well as the
Shāfiʿī school of law); he continued to be cited throughout the period that my database principally
covers. On the other hand, the five Muʿtazilī thinkers shown here (all in red) were not cited frequently
until the first half of the eleventh century, even though some of them lived much earlier; and thereafter
their influence waned. By the end of the eleventh century Ashʿarī thinkers (in purple) and Ẓāhirī
“literalists” (in green) were all the rage, at least in the works I studied. Ibn Ḥanbal, the “founder” of the
Ḥanbalī school of law, also finally saw his stock begin to rise, some two and a half centuries after his
death. This reflects the relatively late formation of the Ḥanbalī school of law as a distinct institution with
a formal curriculum that included legal hermeneutics. Notice that the “founders” of the other two Sunnī
schools, Abū Ḥanīfa (the lowest orange bar) and Mālik (aqua) continued to be cited, but were not nearly
as influential as later scholars because they did not develop formal hermeneutical theories.
One is struck again by how well this visualized data depicts historical reality about which the software
has no direct information. This bodes well for the prospect of data visualization in the much larger
universe of online books and metadata: the picture that arises by computational analysis of strings of
words is not definitive, but even with this small sample of data the visualizations often seem to correlate
remarkably well with independent historical benchmarks, while deviating from them often enough to
suggest interesting possible revisions to the standard historical narratives.
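The distinction drawn above between citing and being cited can be made concrete with a small Python sketch. The records and function names are hypothetical. It separates the number of times an author is cited (the measure of influence used here) from total citation activity (what dot size reflects in the network diagram), and then bins citations received into fixed periods so that influence can be charted over time.

```python
from collections import Counter, defaultdict

# Hypothetical citation records: (citing author, cited author, year of the citing work).
citations = [
    ("B", "A", 950), ("C", "A", 1010), ("C", "B", 1010),
    ("D", "A", 1060), ("D", "C", 1060), ("A", "X", 820),
]

def times_cited(records):
    """In-degree only: how often each author is cited by others."""
    return Counter(cited for _, cited, _ in records)

def total_degree(records):
    """Citations made plus citations received, which is what dot size reflects."""
    degree = Counter()
    for citing, cited, _ in records:
        degree[citing] += 1
        degree[cited] += 1
    return degree

def cited_by_period(records, span=50):
    """Count citations received per author within fixed-width year bins."""
    table = defaultdict(Counter)
    for _, cited, year in records:
        table[year // span * span][cited] += 1
    return table

influence = times_cited(citations)
activity = total_degree(citations)
series = cited_by_period(citations)
```

In this toy data author "D" cites others twice but is never cited, so "D" would have a visible dot in the network diagram while contributing nothing to the influence chart.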
But our question about which figures were most influential has gotten us far afield from the five topics
on which we were trying to focus. That is the nature of research in the humanities: a constant interplay
between expansion and contraction of focus and interest, like breathing in and breathing out. Let’s
contract our gaze again and get back to our five main topics.
Another question we might want to ask about those five topics, for instance, is which discourses tend to
correlate with which others: when an author participates in one discourse, which other discourses is he
or she most likely to take up as well? We noted earlier that authors who discuss commands also tend to
discuss clarity, since those two topics were close together on our network map. We can compare the
extent of individual authors’ participation in each discourse, at least for a few of the most prominent
thinkers, using a parallel coordinates diagram (also from RAW):
Each line represents one thinker, colored according to his school of thought. Not surprisingly, al-Shāfiʿī,
in pink, tops the charts. The two Ashʿarī theologians (in purple) are engaged in all five of our selected
aspects of legal hermeneutics, including theoretical reflection on God’s speech and the question of
literal versus figurative meaning. By contrast, Ḥanafī jurists (in orange) are interested especially in the
three more practical questions on the left, and less in the two more theoretical questions on the right.
More surprising is the fact that Muʿtazilī theologians (in red) also show relatively little interest in the two
theoretical questions on the right, while the two Ḥanbalī jurists (in blue) show somewhat more interest
in those theoretical questions. This challenges the old stereotype that the Muʿtazilīs were sophisticated
rationalists and the Ḥanbalīs naïve traditionalists. Visual analysis of textual data may often correlate
with a historian’s expectations, but sometimes it runs against those expectations, and that can be a
valuable clue that an old assumption is worth a second look.
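The question of which discourses tend to travel together can be made concrete with a very simple calculation. The following Python sketch is only an illustration and not the method behind the RAW diagram. It assumes that each author has already been tagged with the set of discourses he or she engages. The function name `co_participation`, the author labels, and the tags are invented placeholders, not data from my notes.

```python
def co_participation(authors, given, other):
    """Share of the authors engaging `given` who also engage `other`."""
    base = [discourses for discourses in authors.values() if given in discourses]
    if not base:
        return 0.0
    return sum(1 for discourses in base if other in discourses) / len(base)


# Placeholder data: four hypothetical authors and the discourses they take up.
authors = {
    "author A": {"commands", "clarity", "general expressions"},
    "author B": {"commands", "clarity"},
    "author C": {"speech", "literal meaning"},
    "author D": {"commands"},
}

# Of the three authors who discuss commands, two also discuss clarity.
share = co_participation(authors, "commands", "clarity")
print(round(share, 2))
```

A measure like this is asymmetric (the share of command authors who discuss clarity need not equal the reverse), which is one reason a diagram that shows every author's profile at once can be more informative than a single number.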
At some point in the research cycle of expanding and contracting our range of vision, of breathing in and
breathing out, we need to home in on a particular book and read it. Have these visualizations given us
any direction? Well, al-Shāfiʿī has emerged in several diagrams as a key thinker, especially for the five
discourses we have singled out. If we zoom in on him in the Exaptive Xap, we should be able to pull up
information about the bibliographic records in which he is the author, and even identify which of his
texts should be most relevant for each of these discourses. We could visualize where al-Shāfiʿī wrote
about each topic, and also how influential each of his works was regarding each topic. For such purposes
an alluvial diagram is ideal. This one (again produced with RAW) shows which authors (colored according
to their schools of thought) cited which of al-Shāfiʿī’s writings, and how much each of those writings
dealt with each of our five topics:
This alluvial diagram speaks volumes about al-Shāfiʿī’s influence. He was cited not only by members of
his own school (who are shown in pink), but also by theologians (especially Ashʿarīs, some of whom
were also Shāfiʿīs in law) and even by Ḥanafī, Ẓāhirī, Mālikī and Ḥanbalī jurists. Most tended to cite his
views directly, without mentioning any of his written works, or else cited his Risāla; his other works
were cited mainly by scholars within his own school.
The diagram also reveals something about the topics of al-Shāfiʿī’s works—or at least those topics
concerning which they were most often cited. His famous Risāla appears to have been quite influential,
mainly with regard to general expressions and the issue of clarity and ambiguity, and to a lesser extent
with regard to literal or apparent meaning. The Risāla does not appear to have dealt as extensively with
commands or with theories of speech, however, or at least was not remembered by the scholars in my
database as an important source on those topics.
We learn something different from this data, however, if we flip-flop the two middle columns: rather
than which works each of the later authors cited, we can see on which topics each author cited him:
Now we can see that although my own notes referred frequently enough to al-Shāfiʿī’s views on literal
or apparent meaning, almost no Muslim thinkers quoted him on that subject. All kinds of thinkers cited
him on general expressions and the issue of clarity, while an almost entirely different group of thinkers
cited his views on commands, and a subset of those scholars cited him on speech, mostly citing works
other than the Risāla.
With al-Shāfiʿī’s subsequent influence on at least four of our five topics now well established, there
remains one step to be taken before we return to the old-fashioned method of actually reading a book:
we need to pick one of al-Shāfiʿī’s books to start with, and the most obvious candidate, according to
these diagrams, is his famous Risāla (The Epistle on Legal Theory).
Before we actually crack the book, however, we can get a fuller picture of what it discusses by means of
the familiar device of a word cloud. The most sophisticated word clouds display not simply the most
common words in a book, but those words and phrases that are “statistically improbable” and therefore
distinguish the book from others. Google Books displays such word clouds for its full text books, but it is
not yet very good at handling works in Arabic. For al-Shāfiʿī’s Risāla it gives us:
All this tells us is that Google has not yet bothered to install a list of Arabic “stop words,” which would
have eliminated most of the words in this cloud—prepositions, conjunctions and the like, which tell us
nothing about the book’s contents.
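To see what a stop-word list accomplishes, consider the following minimal Python sketch. It is not a description of how Google Books or Wordle works. It simply counts whitespace-separated tokens and discards those found in a stop-word list. The list `ARABIC_STOP_WORDS` is a deliberately tiny, hypothetical sample (a usable list would be far longer), and the sample sentence is an invented example, not a quotation from the Risāla.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list of common Arabic particles.
ARABIC_STOP_WORDS = {"في", "من", "على", "أن", "إلى", "عن", "ما", "لا"}


def top_terms(text, stop_words, n=5):
    """Return the n most frequent whitespace-separated tokens not in stop_words."""
    tokens = [token for token in text.split() if token not in stop_words]
    return Counter(tokens).most_common(n)


# Invented sample sentence for illustration only.
sample = "في الكتاب بيان من الله على أن الكتاب بيان"
result = top_terms(sample, ARABIC_STOP_WORDS, n=2)
print(result)
```

Without the filter the prepositions would crowd out the content words. With it, only the terms that say something about the subject matter remain. Real Arabic text would also require handling of attached prefixes such as the conjunction wa- and the definite article, which this sketch ignores.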
Google’s word cloud of “common terms and phrases” for Joseph Lowry’s new English translation and
Arabic edition of the Risāla is a bit more helpful:
The character encoding for Arabic words is incompatible with Google’s web browser, but at least we find
relevant terms. The words “command,” “obligation,” and “prohibited” confirm that we will at least find
here a discussion of commands (even though the alluvial diagrams above showed that the Risāla was
not often cited regarding commands). We also find the word “unrestricted,” which Lowry uses instead of
“general” to translate ʿāmm. Those words, however, are rather overshadowed by other less obviously
relevant phrases, and our other interests (clarity, speech, and literal meaning) are nowhere to be seen,
so we will try generating our own word cloud using the free online word cloud generator Wordle. Before
we do, though, it is worth noting how helpful Google Books’ lists of “common terms and phrases” can be
for works in English. Here is Google’s word cloud for Abdullah Saeed’s book Approaches to the Qur’an in
Contemporary Indonesia:
This is far better than a mere “frequency of occurrence” cloud; by identifying not just single words but
also statistically improbable phrases it brings to the fore full names (Fazlur Rahman, Hamzah Fansuri,
Nurcholish Madjid, [Muhammad] Quraish Shihab) and even titles of books (H.B. Jassin’s al-Quranu’l-Karim Bacaan Mulia). If one clicks on a word, Google Books will display all the snippets of text from the
book in which that word or phrase appears, allowing one to get a quick sense of what kinds of things the
book has to say about each term. The word cloud also reveals immediately that the classical disciplines
of exegesis (tafsīr) and Islamic law are key domains for Qur’anic interpretation in Indonesia today, and it
suggests that Indonesia’s political history has shaped that interpretive discourse, with Pancasila,
Soeharto, and “Indonesian politics” featuring prominently. At least that is what one can expect to find in
this book; if that is not one’s interest, one knows to look elsewhere first—and learning what not to read
is almost as helpful for a researcher as learning what one should read. This must be one of the most
important benefits of the OU/Exaptive Xap: it should not only give us cosmic-looking diagrams of vast
fields of literature; it must also help us home in on those specific books that are most important for us to
read right now.
For al-Shāfiʿī’s famous Epistle on Legal Theory, al-Risāla, we will have to produce our own word cloud,
which is quite simple using the full Arabic text extracted from al-Maktaba al-Shāmila and the online
service Wordle:
In addition to al-Shāfiʿī’s own name (الشافعي, in red) we see that he will be quoting a lot of hadith (حديث,
Prophetic traditions, in red) because he refers frequently to the Sunna (السنة, also in red) and because
one of the largest words in the diagram (and therefore one of the most frequent in the book) is “he
related to us” (أخبرنا, also in red) which is used repeatedly in relating the chain of narrators who
transmitted a hadith. The book is also clearly concerned with the Qur’an, which al-Shāfiʿī refers to as
“the book” (الكتاب, in dark blue), and with meaning (معنى, in orange) and knowledge (العلم, in bright
green). Surprisingly, prayer (الصلاة, in dark red) and women (النساء, in yellow) also appear as frequent
topics, indicating the kinds of specific examples al-Shāfiʿī will use to illustrate his theoretical points. We
also find “command” (أمر, in dark blue, as well as “his command,” أمره, in yellow green) and “literal or
apparent meaning” (al-ẓāhir, الظاهر, in red); this is noteworthy, because the alluvial diagrams above
showed that subsequent scholars cited the Risāla on commands and literal meaning less than they cited
it on general expressions and clarity, which, strangely, are nowhere to be found in this word cloud. It
appears that the Risāla used the terms “clarity” (bayān) and “general” (ʿāmm) too seldom for them to
make it into the word cloud based on frequency alone, but that nevertheless what the book had to say
about them proved quite influential. To see how significant those other terms are in this book, we can
display their occurrence graphically, using a word trends graph produced with Voyant Tools:
Here it becomes obvious why “command” showed up first in the word cloud: it is an ongoing concern,
discussed quite intensively at some points—the humps in the line of blue dots. The term ẓāhir, “literal or
apparent meaning” (in green) shows up principally toward the end of the book (where it actually has a
different sense that is not directly relevant to our interests, as we would soon learn if we opened the
book, or just followed a link from the Xap to view snippets of context in which the term appears).
General expressions (in pink), though not mentioned often enough to show up in the word cloud, are
nevertheless a subsidiary but ongoing concern throughout the book, while bayān (clarity, the line of
blue triangles) is discussed only in the first section of the book (as anyone familiar with the Risāla will
remember). Speech (here represented only by the word khiṭāb) shows up in light blue, being mentioned
occasionally in the middle part of the book, but perhaps, this graph would suggest, never being
discussed as a subject in its own right. That gives us a very good clue about our next step: if we are
interested mainly in theoretical discussions of speech, we might as well start reading elsewhere, but if
we are interested in the problem of clarity and ambiguity, we should dive in and read the first section of
the Risāla.
I have read that first section many times over the years, and through that very close reading I have come
to the conclusion that the Risāla is in fact not one but a sequence of three distinct books, each with its
own argument, the last two having been appended to the first in response to questions and objections
that had been raised by al-Shāfiʿī’s opponents. There is plenty of internal evidence for this claim, which I
will reserve for another paper I am presently writing, but I found that word trend graphs of key terms
supplemented that evidence and helped bolster my argument. For example, I believe that Book One
(and only Book One) of the Risāla constitutes al-Shāfiʿī’s argument that all of Islamic law comes
ultimately from the Qur’an but is clarified by the Prophet’s Sunna (his practice recorded in hadith). A
word trend graph showing the distribution of various terms and word forms designating the Qur’an
bears this out:
The Qur’an is mentioned frequently throughout Book One, which occupies the first 35% of the text. The
first five percent of Book Two, represented by the dot over the number 8, contains a detailed summary
of al-Shāfiʿī’s argument from Book One; this is itself a clue that Book Two was written subsequently, and
it explains why the Qur’an is mentioned again a great deal in that section. Thereafter, however, the
Qur’an is not a major topic. Instead, Book Two responds to a question about how to resolve conflicts
between hadith (traditions from the Prophet) that appear contradictory. Consequently, hadith do not
become an important concern until Book Two:
Book Two does not use the word Sunna much, however; that was the abstract term al-Shāfiʿī used in
Book One (and in the summary of Book One at the start of Book Two) when discussing how the
Prophet’s Sunna clarifies the Qur’an:
Nor does al-Shāfiʿī use the term khabar, or report, in Book Two (even though he is constantly citing
reports about the Prophet’s sunna—he just tends to refer to them as hadith):
The term khabar becomes important only in Book Three, where a type of hadith called an “individually
transmitted report” (khabar al-wāḥid) is the first of several types of disputable evidence that al-Shāfiʿī
discusses in an attempt to explain why it is okay that Islamic law is often based on uncertain
interpretations of disputable evidence. That discussion of uncertainty constitutes a new and separate
argument, never foreshadowed in Books One or Two, centering around the terms knowledge (ʿilm),
apparent truth (ẓāhir), and real truth (ḥaqq):
In this case the resolution of the graph is not quite fine enough, and it appears that knowledge is
discussed at the end of Book Two, but by breaking the text into one hundred parts instead of twenty we
can see that the blue peak representing a discussion of knowledge actually occurs at the very beginning
of Book Three:
What all these word trend graphs show is that the three sections I have identified not only discuss
different topics, they do so using different sets of vocabulary. The evidence of vocabulary distribution,
which I would not have noticed without these visualizations, corroborates my argument from other
internal evidence that the Risāla was composed in three stages. Such digital visualization techniques, all
by themselves, cannot prove such a structural claim, but they can supplement arguments based on close
reading, and they can bring to light structural features that we had not noticed before. They can also
help us find the sections of an unfamiliar book that are most likely to be relevant to our research. A
good index is still useful, but a graphical index—which is all a word trends graph really is—can be even
more eye-opening. Even in the slow work of close reading, digital tools remain an important resource.
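The idea behind a word trends graph, and the effect of changing its resolution, can be sketched in a few lines of Python. This is an assumption about the general technique, not a description of how Voyant Tools computes its graphs. The text is cut into equal segments and the relative frequency of a term is measured in each. The function name `term_trend` and the toy text are hypothetical.

```python
def term_trend(tokens, terms, segments=20):
    """Relative frequency of any of `terms` in each of `segments` equal slices."""
    terms = set(terms)
    total = len(tokens)
    trend = []
    for i in range(segments):
        chunk = tokens[i * total // segments:(i + 1) * total // segments]
        hits = sum(1 for token in chunk if token in terms)
        trend.append(hits / len(chunk) if chunk else 0.0)
    return trend


# Toy text of 100 tokens in which the term occurs only at positions 50 to 54.
tokens = ["x"] * 100
for position in range(50, 55):
    tokens[position] = "ilm"

coarse = term_trend(tokens, {"ilm"}, segments=4)   # slices of 25 tokens
fine = term_trend(tokens, {"ilm"}, segments=20)    # slices of 5 tokens
print(coarse)                  # the peak is smeared across a whole quarter
print(fine.index(max(fine)))   # the finer graph pinpoints the slice
```

With four segments the short burst of occurrences is averaged over a quarter of the text. With twenty it stands out in a single slice. This is the same effect that moved the apparent discussion of knowledge from the end of Book Two to the beginning of Book Three when the Risāla was divided into one hundred parts instead of twenty.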
Promises and challenges
Our visual tour through the sample research universe of my own database of notes has given us a
picture of how digital analysis and visual representation of textual data can facilitate the pursuit of a
traditional research project in the humanities. It has also revealed some important priorities and
challenges for the OU/Exaptive project.
First, humanities research requires different kinds of analysis and visualization at different points in the
natural rhythm of scholarship. We started with seed words, expanded our world of concepts by
searching for co-occurring words, clustered concepts into discourses, mapped the universe of authors
dealing with those discourses, grouped thinkers into networks or schools of thought, graphed
chronologically the evolution of discourses and the influence of particular authors, zeroed in on an
especially influential author, and diagrammed who cited which of that author’s books on which topics.
This allowed us to choose a particular book to read, but we analyzed it first by means of word clouds
and word trend graphs, learning in the process that it discusses some topics at greater or lesser length
than the diagrams of its influence on those topics had suggested. As we settle in to read that book (or
the portions of it suggested by the word trend graph), visualization remains a useful tool, and the
annotations we make about the text as we read it—whom it cites and what concepts it discusses—can
and should become part of the research universe visualized in the Exaptive Xap. And as we read, we will
again be led to new ideas, sparking a new cycle of outward exploration followed by contraction and a
new focus on yet another author and book, contextualized within a somewhat different colored map of
authors, networks, and discourses. At each point in the process a new tool is needed, so one key feature
of the Xap must be the ability to move seamlessly, using the same growing universe of data, from one
type of visualization to another.
Second, multiple sources of data are necessary. Visualizing my own notes revealed some things I did not
previously know about them, but really stretching the boundaries of our knowledge and challenging our
presuppositions and limited imaginations requires incorporating new sources of data that change the
picture. The Xap must not be limited to a single, controlled dataset, but must be expandable so that new
datasets can be tied in as research interests grow and new materials become available. And as online
data is aggregated into a manipulable dataset, the source of each record—its original bibliographic
record or full text access point—must always remain traceable so that at any point the user can zero in
on a particular use of a particular term by a particular author in a particular text, go find it, and read it.
The multiplicity of tools and datasets presents some major challenges. I have suggested several
promising databases that would be useful for my own current project, but other scholars will need to
access other data sets, and each will have its own legal and practical limitations. Each will offer a
different kind of data: some will provide full text or word counts on which topic modeling and other
algorithms can be performed; others will provide only search capabilities, so that word counts will have
to be built up through repeated searches; and others will provide only bibliographic data.
Moreover, each source will present data in its own format. It will be necessary to develop a robust and
realistic format for temporarily storing data from multiple sources in a single data structure that
supports all the different types of analysis and visualization that the Xap offers.
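One possible shape for such a structure is sketched below in Python. It is only a thought experiment, not the format used by the Exaptive platform. Every class and field name is hypothetical, and the three sample records are placeholders standing in for the three kinds of sources just described. The essential features are that each record remembers where it came from and that the same structure can hold full text, word counts, or bare bibliographic data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WorkRecord:
    """One work as reported by one source; all field names are hypothetical."""
    source: str                        # which database supplied the record
    source_id: str                     # identifier there, so the record stays traceable
    title: str
    author: str
    full_text: Optional[str] = None    # present only for full-text sources
    term_counts: Dict[str, int] = field(default_factory=dict)

    def data_level(self) -> str:
        """Classify what kind of analysis this record can support."""
        if self.full_text is not None:
            return "full_text"
        if self.term_counts:
            return "term_counts"
        return "bibliographic"


# Placeholder records for three kinds of sources.
records = [
    WorkRecord("full-text-library", "ft-001", "Example Work", "Example Author",
               full_text="sample text of the work"),
    WorkRecord("search-only-service", "so-042", "Example Work", "Example Author",
               term_counts={"command": 12}),
    WorkRecord("library-catalogue", "lc-777", "Example Work", "Example Author"),
]
levels = [record.data_level() for record in records]
print(levels)
```

A visualization could then check `data_level()` before attempting an analysis, running topic modeling only on full-text records while still placing bibliographic records on a network map.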
Furthermore, much of the available data—especially that based on optical character recognition—will
be seriously flawed. Fortunately, a considerable amount of noise and inaccuracy is quite acceptable in a
project like this, where the results are meant to be suggestive rather than definitive.
Additionally, even accurate data will take variable forms: words will be truncated or represented in
multiple ways; names of authors will be recorded and spelled in numerous ways; and single books will
be described differently in different databases or even within a single database. Some means will have
to be provided for linking the most important synonymous or related terms, names, or titles, and
identifying them with specific concepts, persons, and works. One way of doing this would be to let the
user build thesaurus tables (on the fly and without too much manual typing) to group items he or she
encounters, performing functions such as:
• Linking terms (in multiple languages) with single concepts. A translation suggestion tool would aid in developing lists of terms to represent concepts.
• Grouping concepts into discourses.
• Identifying multiple names and spellings with single historical persons.
• Grouping persons into schools, movements or networks.
• Identifying multiple bibliographic records with single works, editions, or copies.
Some such identifications might be performed or suggested automatically, and the user should be able
to add and modify identifications whenever he or she sees an important name or term showing up in
multiple forms. For example, the terms “ʿāmm” and “general” could be linked automatically and shown
side by side, and the user could then choose to drag one onto the other to confirm their identification. It
is important to remember, however, that these identifications need not be perfect or complete. The
visualizations that the Xap produces are not research conclusions; they are suggestions, whose
usefulness is proven only by their being left behind in pursuit of some new thought or reading.
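A thesaurus table of the kind described above could be as simple as a lookup from variant forms to a single label. The Python sketch below is a hypothetical design, not part of the Xap. The class name `Thesaurus`, its methods, and the counts are invented for illustration. Only the three linked terms come from the discussion above (ʿāmm, "general," and Lowry's "unrestricted").

```python
from collections import Counter


class Thesaurus:
    """Maps variant terms or spellings to one canonical label (hypothetical design)."""

    def __init__(self):
        self._canonical = {}

    @staticmethod
    def _key(variant):
        # Ignore case and surrounding whitespace when matching variants.
        return variant.strip().casefold()

    def link(self, variant, canonical):
        self._canonical[self._key(variant)] = canonical

    def resolve(self, variant):
        # Unlinked terms pass through unchanged, so the table may stay incomplete.
        return self._canonical.get(self._key(variant), variant)


def merge_counts(counts, thesaurus):
    """Add up counts of all variants that resolve to the same label."""
    merged = Counter()
    for term, n in counts.items():
        merged[thesaurus.resolve(term)] += n
    return merged


concepts = Thesaurus()
for variant in ("ʿāmm", "general", "unrestricted"):
    concepts.link(variant, "general expression")

# Invented counts for illustration only.
raw_counts = {"ʿāmm": 3, "general": 2, "bayān": 4}
merged = merge_counts(raw_counts, concepts)
print(merged["general expression"], merged["bayān"])
```

Because unlinked terms simply pass through, the table can grow gradually as the user notices important variants, in keeping with the point that these identifications need not be perfect or complete.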
Another challenge will be the sheer quantity of available and relevant data. It will be necessary to set
adjustable thresholds for the frequency of occurrence of terms as well as other criteria for including a
work or author in the research universe upon which analysis and visualizations are performed. It will not
be possible to store even temporarily, or analyze in a reasonable timeframe, data on every instance of
every relevant term in every available work, so it will be necessary to determine an appropriate level of
statistical detail to record (when available) about each work: complete word counts per page or per text,
or only the density of occurrence per text of a small list of key terms.
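An adjustable threshold of this kind might work roughly as follows. The Python sketch assumes that only a total word count and the counts of a few key terms have been stored for each work. The function names, the per-thousand-words measure, and the two sample works are all hypothetical.

```python
def density_per_thousand(term_counts, total_words, key_terms):
    """Occurrences of the key terms per thousand words of the work."""
    hits = sum(term_counts.get(term, 0) for term in key_terms)
    return 1000 * hits / total_words if total_words else 0.0


def select_works(works, key_terms, min_density):
    """Titles of works whose key-term density meets the threshold."""
    return [
        work["title"]
        for work in works
        if density_per_thousand(work["term_counts"], work["total_words"], key_terms)
        >= min_density
    ]


# Invented sample works for illustration only.
works = [
    {"title": "Work A", "total_words": 10000, "term_counts": {"bayān": 30}},
    {"title": "Work B", "total_words": 20000, "term_counts": {"bayān": 10}},
]

strict = select_works(works, ["bayān"], min_density=1.0)
loose = select_works(works, ["bayān"], min_density=0.5)
print(strict, loose)
```

Raising or lowering `min_density` shrinks or enlarges the research universe without any need to store every occurrence of every term.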
At the same time the data available online, though abundant, is far from a complete representation of
the universe of relevant literature, and that selectivity is far from innocent. Works from regions with less
developed publishing and distribution networks, or by persons without access to those networks, are
likely to be underrepresented. This will have an invisible but real impact on the resulting scholarship.
Once again, however, it is important to remember that we are not aiming at perfection, just
improvement and advancement. Scholarship has always been blind to most of what human beings have
had to say on any given subject. Present Western scholarship is limited in its range of vision by its
tradition of relying primarily on citations in prior Western scholarship. The internet may contain only a
skewed slice of human cultural productivity, but it is a far wider slice than scholars have worked with
heretofore. The OU/Exaptive Xap should result in scholarship that brings to light creative thinkers who
have not been noticed before outside their own immediate circles, and among better known figures it
should reveal hitherto unimagined connections.
In other words, the purpose of the OU/Exaptive Xap is to expand and stimulate the choices that
humanities scholars make every day. Data mining and visualization cannot take the place of close
reading of neglected texts, of travelling to distant places to find those books and speak with their
authors face to face, or of thinking long and hard to understand what one reads and hears. The
humanities necessarily require the very human processes of listening, interacting, reflecting, translating,
and synthesizing. Google Earth lets you zoom in to find a specific address, and zoom back out to see its
relation to any other place on earth, but it is not a substitute for travel. What it does very well is help
you to discover and imagine places you have not visited, and get some orientation to them before you
arrive. The OU/Exaptive Xap has a similar function: to help a scholar start where he or she is, get a big
picture of the context of his or her ideas within the universe of human literature, rise up above the
disciplinary tunnels and self-referential world of modern Western scholarship, give voice to figures who
have previously been marginalized, spot new potential connections and influences and watershed
moments, and make strategic choices about what to read next and how to frame it.
Finally, Exaptive’s platform is designed not only to help generate better published scholarship, but to
publish the scholarly process itself. Out of competitiveness and sheer inertia, our universities (and
funding sources) still tend to reward accomplishments that can be claimed by one institution, a small
team, or—especially in the humanities—one individual. One of Exaptive’s main goals is for work being
done by one scholar in one way with one set of data to spark new ways of working and thinking for
another scholar working on a different but somehow comparable set of data. The processes—the
research steps illustrated above, the visualizations employed, the connections drawn—should be
tracked so that the Exaptive platform can propose those methods to others who seem to be doing
similar kinds of analysis on different data. Even the data tables constructed by one user might
occasionally be useful to another. Few scholars will want to dig through my own database of notes on
Qur’anic hermeneutics, and I would not want anyone to have access to all of the comments that I have
written over the years (some of which doubtless fall short of charity or even rationality). But perhaps my
thesaurus tables would provide a useful filter for someone else working on Biblical exegesis or Islamic
law; or perhaps I could improve my way of visualizing influences by adopting techniques from someone
analyzing social networks. If Xaps created using the Exaptive platform can offer a limited amount of
transparency (with appropriate controls for privacy and proprietary information), not only my published
results but also the ways I go about discovering and representing them might become part of the
publicly available record and fruit of my research.
Indeed, it might eventually come to pass that my results themselves will become public to some degree
even as they are being created, long before they are finalized and published, and will remain available
for others to continue exploring and manipulating using the same interface I used to create them. That
would expose my research in a new way, making it vulnerable to a whole new level of critique; but such
vulnerability is a good thing in the sciences, where data is (or should be) made accessible so that others
can use it to confirm or challenge the conclusions that were based upon it. More such vulnerability
would make scholarship in the humanities less territorial, more collaborative, and more open to critique,
correction, and stimulation. Publication would become less individual, less closed, less final, and more
open-ended, taking the form not just of a written conclusion but also of a live and accessible world of
data with a built-in forum for exchange, in which my own written and visual results can be viewed and
analyzed by others in ways I never thought to look at them myself.
Such a shift in the methods of scholarship and publication would, in turn, require a shift in scholarly
ethics that has been too long coming in the humanities: a shift toward valuing humility over authority,
vulnerability and openness over definitiveness, and finding the possibilities in others’ work over finding
fault with it. A difficult shift, to be sure, for reasons as deeply personal as they are institutional. But a
worthy shift it would be, making the humanities more humane and scholars more human—ironically, by
digital means.