Genealogies of Qur’anic Hermeneutics: Tracing Trajectories through Online Data

David Vishanoff
Associate Professor, Religious Studies Program, University of Oklahoma, USA

August 6, 2015
International Conference “New Trends in Qur’anic Studies”
Co-hosted by the International Qur’anic Studies Association (IQSA) and the State Islamic University (UIN) Sunan Kalijaga, Yogyakarta, Indonesia.

Abstract

One novel approach to tracing the complex networks of ideas that have contributed to contemporary theories of Qur’anic interpretation is the application of data analysis and visualization techniques to the many large databases of Islamic and non-Islamic texts that have appeared online in recent years. Online translation engines, cluster analysis, and natural language processing techniques such as topic modeling can be combined with visualization techniques such as network maps, dendrograms, word clouds, stream graphs, alluvial diagrams, scatterplots, treemaps, and cluster layouts to produce schematic representations of the relationships between texts, authors, and concepts across the various disciplines that have contributed to Qur’anic hermeneutics: disciplines such as literary theory, philosophical hermeneutics, semiotics and pragmatics, Biblical studies, Qur’anic exegesis, Islamic legal hermeneutics, and the Qur’anic sciences. Given the limitations of the available databases and online tools, the resulting visualizations cannot be considered definitive genealogies of contemporary Qur’anic hermeneutics, but they can serve as roadmaps to guide the slow, careful work of intellectual history in more efficient and innovative directions, and they can reveal conceptual resonances and lines of influence that otherwise might be discovered only through years of reading, or not at all.
This paper presents a vision for a set of software tools that is being adapted for this purpose by Exaptive, Inc., and the University of Oklahoma, illustrates it with sample visualizations based on a database of personal notes, and indicates some of the anticipated challenges and strategies to be pursued.

Introduction: from tracing genealogies to charting currents beneath an ocean of data

When I did the research for my first book, I spent most of my time combing through three or four shelves of Arabic books on Islamic legal theory. Every time I found a hermeneutical principle for analyzing Qur’anic language expressed by or attributed to an early Muslim author, I would write out a little note card to that effect. Over the years I accumulated about thirty-eight hundred such note cards relating to some twelve hundred scholars. Fortunately, it occurred to me early in the process to write those cards electronically, in a database that I created using Microsoft Access, so that when it came time to write a paragraph tracing the genealogies of Muʿtazilī theologians’ views on the problem of general and particular expressions, for example, all I had to do was run a query to display in chronological order all the note cards I had compiled on that topic by scholars I had flagged as Muʿtazilīs, and I was ready to start writing. All of that took time, of course, and some practice with the database software, but oh, how easy it all seems now! I was working with a limited canon of texts in a single language reporting the views of a manageable number of thinkers on a well-defined set of questions. Now that I have rashly decided to wade into the much less orderly world of modern Islamic hermeneutical theory, I am faced with an explosion of writings in multiple genres and languages approaching the problem from numerous angles.
I have set myself the foolhardy goal of tracing the genealogies of contemporary Qur’anic hermeneutical theories from classical Islamic and modern European discourses through the various permutations and transformations they have undergone in the twentieth century. Such an ambitious project, however, will require me to study numerous unfamiliar discourses and several new languages. The task seems less like genealogical research than like trying to chart the currents beneath the surface of a vast ocean, with only a rudimentary map to go by showing only the most famous ports of call along the coastline. How does one proceed? Not alone, surely, yet however much they collaborate scholars must still draw their own mental maps for themselves. If I am to produce any kind of meaningful narrative about the last century’s developments in Qur’anic hermeneutics before I reach my limits and am forced to retire, I will need a new approach. Note cards—even electronic ones—will no longer suffice. If only I could get a snapshot of what is going on in the torrent of new books on Qur’anic hermeneutics being written every year here in Indonesia and across the Muslim world without actually having to read them all! That is the point of what people in the digital humanities call “distant reading,” which cannot take the place of close reading, but has the potential to help us contextualize our close reading and plan it more strategically. At present, however, technologies for distant reading typically require one to compile for oneself a single set of digital texts, and do not offer much flexibility in analyzing and visualizing those texts. 
To help remedy that situation I am engaged in a pilot project with the University of Oklahoma Libraries and a research software firm called Exaptive to develop a customizable toolkit that will allow me and other researchers to cobble together, on the fly, multiple existing data analysis and visualization tools, and apply them simultaneously to multiple online databases of Islamic and non-Islamic works on hermeneutics or any other topic. This paper puts forward my vision for how that software might serve the research process, illustrates that process with visualizations created from my database of personal notes, and indicates some of the anticipated challenges and strategies to be pursued.

Proposed sources of data

I have identified several databases of digital texts and bibliographic data that look usable and that provide sufficient coverage, for now, of the literature I wish to explore. The Exaptive application, or “Xap,” will attempt to integrate resources such as the following:

Al-Maktaba al-Shāmila (al-maktaba al-shamila), a downloadable, locally searchable, clean and very extensive database of classical Arabic texts on various Islamic sciences. Including this ever-growing library will ensure that when I use the software to visualize clusters of texts on related topics, classical Islamic works will be part of the picture. Their vocabulary will differ somewhat from that of modern works, while overlapping with some of them. Visualizing which clusters of modern texts use which parts of the classical vocabulary of Qur’anic exegesis, theology, legal theory, and literary theory will be interesting in its own right, but some modern works will discuss classical concepts using new vocabulary in multiple languages, and it will be necessary to use translation equivalency tables or “thesaurus files” to link terms for related concepts in different languages and vocabularies.
Furthermore, many modern texts will deploy classical terms and concepts in new ways, so it will be instructive to visualize which groups of authors use terms like taʾwīl (interpretation), for example, in proximity with terms like raʾy (reasoned opinion) and rationality, and which authors use taʾwīl alongside different notions like historical or social context. It will also be possible to trace which modern discourses tend to cite which classical authors most. Visualizing the connections between the classical texts in al-Maktaba al-Shāmila and the modern texts in the other databases will be crucial for tracing the classical Islamic dimensions of the intellectual genealogies or currents of contemporary Islamic hermeneutical thought. In order to put some modern Arabic writings in the picture, the software will also employ the New Alexandrian Library’s Digital Assets Repository (http://dar.bibalex.org/webpages/advancedsearch.jsf), a thoughtfully designed and easily accessible resource that includes some 30,000 titles on religious topics alone, including not only classical Arabic and modern European-language works but also a wide range of recent Arabic titles. Because many of these works are still under copyright, their full text is not readable online, but it is fully searchable, which will allow the software to determine which modern works use which clusters of concepts and cite which other scholars. A much larger database of full text books in multiple languages is the HathiTrust Digital Library (http://www.hathitrust.org/), whose holdings overlap significantly with Google Books but are more readily usable for academic research because the HathiTrust is governed by member universities, not by commercial interests. 
In fact, the HathiTrust has just released a large database (https://sharc.hathitrust.org/features) giving page-by-page word counts for the public domain books in its collection, which will greatly facilitate the kind of vocabulary analysis and topic modeling the Exaptive software will be doing. The HathiTrust’s Arabic holdings differ substantially from, and should complement well, the digital archive at the Alexandrian Library, but the Trust’s greatest strength is in European languages. This will allow me to include in my research universe a long history of European publications dealing with language and interpretation, and will thus help to identify which European thinkers deployed combinations of concepts similar to those of modern Muslim thinkers, and which European works are most cited or appear to have been most influential in the development of various strands of modern Muslim hermeneutical thought.

Capturing recent publications in non-European languages other than Arabic will be trickier because these works are not as well represented in major digital collections. Some such works have been digitized, however: for example, all the undergraduate theses and doctoral dissertations written here at UIN Sunan Kalijaga or at any other State Islamic University in Indonesia are available online, but they are housed in separate digital repositories (http://digilib.uin-suka.ac.id/, etc.). Fortunately, Google Scholar (http://scholar.google.com) indexes these sites, so it is not necessary to search them separately; incorporating Google Scholar into the Exaptive Xap will make it possible to analyze and include these non-European works, as well as recent European-language scholarship, in my textual universe.

Unfortunately, many recent commercially printed books are not readily searchable online. They may be floating around the internet as PDF files, but those possibly illegal copies are not searchable through any one interface.
It will be necessary, therefore, for me to scan for my own personal use some of the most promising–looking Indonesian books that I will be taking home with me from the Social Agency Bookstore that is just down the street from here. Those books whose tables of contents indicate they are likely to be most relevant for my research will be scanned and processed using optical character recognition software, and the resulting full text files will be analyzed by the Xap and integrated into the results it displays so that I can spot possible connections and convergences between these very recent Indonesian titles and all the other classical and modern works in various languages in all the other databases. That should help me decide which Indonesian books to read first. Finally, my own database of notes, which I created in Microsoft Access for my previous research, and which I continue to use for this new project, will be integrated into this software (either in Access or in some other format) as a kind of overlay, superimposed upon the relationships and connections that the software discerns among the online books. When my own database indicates that a certain author has quoted another, or has discussed a certain concept, this will appear as a highlighted relationship in whatever visualization I am using. And if the software shows a potential citation or resonance between two works, and I choose to investigate by clicking on that relationship to pull up the texts themselves, I will be able to annotate that relationship in my own database, and thereafter those two works will be shown as connected in some highlighted fashion. This will allow me to build my own visual analysis of the currents, genealogies and resonances between works and authors right on top of the connections that are suggested by the software. 
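The overlay idea described above, in which personal annotations are superimposed on relationships suggested by the software, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Link` record, the `overlay` function, and the placeholder work labels are hypothetical and do not describe the Exaptive Xap or the actual structure of the Access database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    citing: str
    cited: str
    suggested_weight: float  # strength proposed by automated analysis (0.0 if none)
    highlighted: bool        # True if the researcher's own notes record this link
    note: str                # the researcher's annotation, if any


def overlay(suggested, annotations):
    """Merge software-suggested links with hand-made annotations.

    suggested:   dict mapping (citing, cited) -> numeric weight
    annotations: dict mapping (citing, cited) -> free-text note
    """
    links = []
    for pair in sorted(set(suggested) | set(annotations)):
        links.append(Link(
            citing=pair[0],
            cited=pair[1],
            suggested_weight=suggested.get(pair, 0.0),
            highlighted=pair in annotations,
            note=annotations.get(pair, ""),
        ))
    return links


# Hypothetical sample data; the work labels are placeholders, not real titles.
suggested = {("Work B", "Work A"): 0.8, ("Work C", "Work A"): 0.3}
annotations = {
    ("Work B", "Work A"): "quotes a passage on general expressions",
    ("Work C", "Work B"): "responds to B without naming it",
}

for link in overlay(suggested, annotations):
    marker = "*" if link.highlighted else " "
    print(marker, link.citing, "->", link.cited, link.suggested_weight, link.note)
```

Keeping the suggested weight and the annotation as separate fields means that a link confirmed by reading never overwrites what the software proposed, so the two layers can be compared later.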
Proposed methods of analyzing and visualizing relationships

For now, that personal database of notecards provides a small body of data (several thousand records relating to about a thousand authors) that can be used to illustrate some of the analysis and visualization techniques that the Xap could perform on the larger universe of online books and bibliographic data. Let me illustrate how I imagine using the Exaptive Xap to explore the world of Qur’anic hermeneutics.

When one ventures into a new field of research in the humanities, one has only a starting point: a basic set of ideas or texts or people in which one is interested. The first thing I will want to do is explore outward: starting from those ideas and those authors’ names, I want to know what other concepts have been discussed in association with those ideas, which authors have written about them, and how those concepts and people could be grouped into connected yet distinguishable “discourses” around which I might structure my research. Ideally, I would start by looking at a giant network map of ideas and authors from the whole history of human literature. This, however, is too large a computational problem: it would require “topic modeling” on the whole Google Books corpus, something not even Google has attempted. Fortunately, however, this is unnecessary, because one can use the concepts and names one already knows as seeds to help one define what parts of the vast online corpus to analyze and map. Let’s see what we get, using the relatively small dataset of my own electronic notecards, if we start with my seed concepts of “Qur’an” and “hermeneutics” and generate a list of other terms that occur most commonly alongside those two words.
Extracting all the notecards that use either of these two terms, and running the text of those notecards through the clustering and mapping algorithms of the freely available VOSviewer software, we obtain the following visual map: The software has sorted the most common terms, from all my notecards involving Qur’an or hermeneutics, into colored clusters of words that tend to occur together. From this picture it becomes immediately apparent that there are several different kinds of discourse around my two seed terms: there is a discussion of God’s attribute of speech (in blue); there is a related but distinct discourse about khalq al-qurʾān (the doctrine of the created Qur’an, in purple—Qur’an has been truncated to “qur,” apparently because the software does not allow a word to contain an apostrophe); there is a separate discourse (in red) about the nature of language and revelation; there is another discourse (in green) involving interpretive problems such as apparent meaning (ẓāhir) and figurative language (majāz); and another discussion (in yellow) about commands and prohibitions. This first research step, starting with two seed words, has given me a longer list of terms that tend to come up in discussions of the Qur’an and hermeneutics: speech, created and eternal, revelation, language, figurative language (majāz), apparent meaning (ẓāhir), command, and prohibition. This expanded list of terms defines my research universe—the subset of the universe of human writing that I will proceed to explore. The next step outwards in the exploratory process would be to identify a universe of authors who have discussed these terms. In the real world of online databases, this will be a very large set of texts. For now, let’s pretend that my entire database of notecards is that universe of authors and texts, and let’s see how we could go about exploring it. 
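The seed-expansion step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sample notes, the stopword list, and the function names `tokenize` and `expand_seeds` are hypothetical, and the counting is far simpler than the term extraction and relevance scoring that VOSviewer performs.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "in", "as", "by", "a", "is"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping internal apostrophes
    so that a word like "qur'an" is not cut in two."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def expand_seeds(notes, seeds, top_n=10):
    """Return the terms that most often share a note with any seed term."""
    seeds = {s.lower() for s in seeds}
    counts = Counter()
    for note in notes:
        terms = set(tokenize(note)) - STOPWORDS
        if terms & seeds:                  # keep only notes that mention a seed
            counts.update(terms - seeds)   # count each co-occurring term once per note
    return counts.most_common(top_n)


# Hypothetical notecards, invented for this example.
notes = [
    "Hermeneutics of command and prohibition in the Qur'an",
    "The Qur'an as created speech",
    "Speech and revelation in the Qur'an",
    "Analogy rejected by the literalists",
]

for term, n in expand_seeds(notes, ["Qur'an", "hermeneutics"]):
    print(term, n)
```

The last sample note mentions neither seed, so its terms never enter the expanded list; that filtering is what keeps the research universe tied to the starting point.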
If we use the VOSviewer to map and cluster my entire database of notecards, this is what we get:

At first glance this looks like an unstructured mass of names and terms, but in fact both the colors and the position of the dots are meaningful: the more two words tend to appear together, the closer together they are on the map, and those that occur together most often are grouped by colors indicating distinct discourses. Indeed, we can see that the VOSviewer’s algorithms have identified some very significant discourses and groups of authors. Off on the fringe to the right is a cluster of brown dots including the names of several famous and rather cutting-edge contemporary Muslim writers about the Qur’an and hermeneutics: Nasr Hamid Abu Zayd, Ebrahim Moosa, Farid Esack, Fazlur Rahman, and, a little further in, the less innovative Muhammad Quraish Shihab. How did the software know that they belong together? It didn’t; it just noticed that they happen to come up in similar contexts, and if I were not already familiar with them, this map would be my first clue that maybe I should read them together.

In addition to names, terms also are helpfully grouped: in red we have general considerations relating to Islamic law: God, knowledge, law, reason, person, state, theology…. In blue we have considerations of speech and meaning, as well as the related modern notion of a “speech act” and the pair “expression” and lafẓ (verbal expression). In green we have amr, which means command and thus fits well with wujūb (obligation), nahy (prohibition), act, and irāda (will). In purple we have figurative (majāz) and literal (ḥaqīqa); in brown we have general (ʿāmm), particular (khāṣṣ), and particularization (takhṣīṣ). In yellow we have the famous scholar al-Shāfiʿī (whose name the software has truncated to Shafi), who wrote a great deal on Hadith and Qur’an (truncated to Qur) as well as clarity (bayān).
Finally, in light blue we have qiyās (reasoning by analogy) right next to the name of the Ẓāhiriyya (“literalists”), who argued fiercely against analogical reasoning, as well as the Ẓāhirī thinkers Dāʾūd and Abū Saʿīd al-Nahrawānī. In other words, the software has taken an amorphous corpus of texts (or in this case my notes about them) and sorted them out to show what the main topics are and who some of the main thinkers are who wrote about each. In the VOSviewer we can zoom in on any part of this map to see other terms and names associated with each colored discourse. But even at this distance we gain from this map a pretty clear sense of the main topics in Islamic hermeneutics.

We can also visualize the same data in a more readable “density” map that colors names and terms by their frequency of occurrence. This format is less helpful in identifying separate discourses, but it does still group together spatially those names and terms that tend to occur together:

This completes our first outward, expansive exploration; we have now defined a universe of terms, texts, and authors, and we are in a good position to select a few key terms and start focusing on them more narrowly.
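The principle behind these maps, that terms appearing together more often are drawn closer together, rests on some measure of co-occurrence. The sketch below counts how often pairs of terms share a notecard and normalizes each count by how common the two terms are individually. The normalization chosen here is an assumption made for illustration and should not be read as a description of VOSviewer's internals; the sample data and function names are hypothetical, and the transliterations are simplified to plain ASCII.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(cards):
    """Count term occurrences and pairwise co-occurrences across notecards.

    cards: iterable of lists of terms, one list per notecard.
    Returns (occurrences, pair_counts); pairs are alphabetically ordered tuples.
    """
    occurrences = Counter()
    pair_counts = Counter()
    for terms in cards:
        unique = sorted(set(terms))
        occurrences.update(unique)
        pair_counts.update(combinations(unique, 2))
    return occurrences, pair_counts


def similarity(occurrences, pair_counts):
    """Divide each co-occurrence count by the product of the two terms' own
    frequencies, so that very common terms do not dominate the picture."""
    return {
        (a, b): n / (occurrences[a] * occurrences[b])
        for (a, b), n in pair_counts.items()
    }


# Hypothetical notecards, each reduced to its list of key terms.
cards = [
    ["amr", "nahy", "wujub"],
    ["amr", "nahy"],
    ["majaz", "haqiqa"],
    ["majaz", "haqiqa", "amr"],
]

occ, pairs = cooccurrence(cards)
for pair, score in sorted(similarity(occ, pairs).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A layout or clustering algorithm would then take these scores as input; that step is left out here because it is where the real tools do most of their work.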
One way to start whittling down the list of terms is to let the software select only the most common ones, or the ones to which it assigns the highest “relevance scores,” and rerun the clustering and mapping algorithms on only those most important terms, resulting in a much simpler picture:

This smaller set of terms can, in turn, be expanded to show all the people who, in my database of notecards, have had anything to say about just those several dozen terms:

Even with just a few dozen terms, this is still a complex picture when you add all the names of authors, but if we use the VOSviewer to zoom in on particular sections to read the names that are not yet visible at this scale, we will soon discover that this map is actually quite a bit simpler than the previous network map that showed all the significant terms and names in my database. One thing this diagram already shows quite clearly is that there is an especially distinct cluster of people, shown in purple and bulging off the cloud on the left, who are engaged only or mainly with tafsīr, the formal discipline of Qur’anic exegesis. Most terms fall toward the middle of the diagram, because they are linked to many different names, and most names (except those of the most wide-ranging authors) tend to be around the margins because they are linked only to a modest number of terms. But the term tafsīr is off to the side because most of the people who use it can be grouped together, and they tend not to be concerned with other terms (at least not in the notecards in my database). This suggests that tafsīr (Qur’anic exegesis) can be treated as a somewhat separate discourse from the remainder of the discussions going on in this little literary universe.
The same information can be made more readable as a “density” map, this time colored to indicate the distinct discourses:

Our literary universe has now been reduced to writers dealing with just a few dozen of the most common terms, but we aren’t equally interested in all of them. A next step inward would be to select for ourselves just a handful of the concepts shown here, and create a clearer map showing in more detail all the authors involved with those few terms. At this stage, when the researcher is manually selecting a list of terms on which to focus, it can be useful to create a “thesaurus file” that equates synonyms so that the Arabic “amr” and the English “command,” for example, are treated as a single term, and references to both of them are aggregated and displayed together. We can even define little discourses, that is, clusters of different but closely related terms that tend to be discussed together, such as the related set amr / command, nahy / prohibition, and wujūb / obligation, which we noticed occurring together in an earlier diagram. At present such thesaurus files (or translation equivalency tables, or discourse definition tables) must be created manually, but ideally the Exaptive Xap will allow the user to link related terms that appear in a diagram at the click of a mouse, and immediately collapse them into a single dot on the map. The user will be able to look at the terms that have been grouped together by color based on their frequency of co-occurrence, and select those terms that he or she wants to treat as parts of a single discourse.
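A thesaurus file of the kind just described amounts to a lookup table from individual terms, in any language, to a shared label. The sketch below shows one possible shape for such a table and how it could be used to tally each author's references by discourse. The table contents, the discourse labels, the function names, and the sample cards are all hypothetical, and the transliterations are simplified to plain ASCII for the sake of the example.

```python
from collections import Counter, defaultdict

# Hypothetical thesaurus: every term on the left is collapsed into the label on the right.
THESAURUS = {
    "amr": "COMMAND", "command": "COMMAND",
    "nahy": "COMMAND", "prohibition": "COMMAND",
    "wujub": "COMMAND", "obligation": "COMMAND",
    "kalam": "SPEECH", "khitab": "SPEECH", "speech": "SPEECH",
    "bayan": "CLARITY", "clarity": "CLARITY",
    "haqiqa": "LITERAL", "zahir": "LITERAL",
}


def normalize(terms, thesaurus):
    """Replace each term with its discourse label; unknown terms pass through lowercased."""
    return [thesaurus.get(t.lower(), t.lower()) for t in terms]


def references_by_author(cards, thesaurus):
    """Tally, for each author, how many term references fall under each label.

    cards: iterable of (author, list_of_terms) pairs, one per notecard.
    """
    tally = defaultdict(Counter)
    for author, terms in cards:
        tally[author].update(normalize(terms, thesaurus))
    return tally


# Hypothetical notecards, invented for this example.
cards = [
    ("Author A", ["amr", "bayan"]),
    ("Author A", ["command", "nahy"]),
    ("Author B", ["zahir", "Obligation"]),
]

for author, counts in references_by_author(cards, THESAURUS).items():
    print(author, dict(counts))
```

Because the table is plain data, linking two terms "at the click of a mouse" would simply mean adding one more entry to it and recomputing the tallies.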
Once terms have been linked in this way, all references to commands, prohibitions, and obligations, in any language, would always be displayed together as references to the “command discourse,” and authors who frequently use any of those terms would all be linked together around a large dot labelled “COMMAND.”

For example, let me select from my universe of terms five main discourses that interest me: commands (amr, nahy, wujūb), speech (kalām, khiṭāb), general expressions (ʿāmm, khāṣṣ, takhṣīṣ), literal meaning (ḥaqīqa, ẓāhir), and clarity (bayān, as well as ambiguity, iḥtimāl). Suppose I link the main terms associated with each of these discourses using a thesaurus file. Then I display all the authors who have had anything to say (or at least a minimum number of things to say) about any of the terms associated with these discourses. The result would look something like this:

Or, more helpfully, like this:

Suddenly we have a much more manageable diagram, and it reveals several interesting things. First, the whole discourse about general and particular expressions (which was relevant for legal questions such as whether “kill the unbelievers” means “kill all unbelievers” or only applies to some of them) is rather isolated from the others; many writers in my database are concerned with this problem to the exclusion of my other chosen discourses. (If we zoomed in using the VOSviewer we would see that many of these thinkers, who appear as a tight constellation of red dots off to the right, are Muʿtazilī theologians and Ḥanafī jurists.) On the other hand, authors who discussed commands also tended to discuss the problem of the clarity or ambiguity of revealed texts; we can tell that because the software has placed those two terms close together. The more theoretical question of what God’s speech is and how it communicates is a bit more separate, in blue; it attracted its own little contingent of thinkers, including Ashʿarī and proto-Ashʿarī theologians like al-Bāqillānī and Ibn Kullāb.
To my surprise, the famous jurist al-Shāfiʿī appears here in purple, quite close to the discourse about literal meaning; I do not consider him a literalist at all, but he must at least have had a strong interest in literal interpretation because he appears here right next to the Ẓāhirī literalist Ibn Ḥazm.

Another way of picturing the data about these five discourses would be to order it chronologically, using the death dates instead of the names of the authors to show how discussion of each of these five topics ebbed and flowed over several centuries. Using several visualization tools from Density Design’s online RAW tool, we can visualize the number of references to these five discourses, over the span of the second through fifth Islamic centuries (the eighth through eleventh centuries C.E.), like this:

Or like this:

Or this:

This last visualization, a Voronoi tessellation, gives perhaps the best sense of the relative dominance of each discourse in any given time period. Discussions of general expressions (in red) were clearly very important for a couple of centuries; commands (in green) were discussed early on and then became a major concern again in the late fourth and fifth Islamic centuries (tenth and eleventh C.E.); issues of clarity and ambiguity (in yellow) remained a regular theme; but the notion of literal meaning (in purple) was a hot topic only for a brief period: after that, in fact, it came to be taken for granted that legal interpretation (which is the main focus of my database) was literal, and the distinction between literal and figurative meaning came to hold less interest for legal interpreters.

Let’s say we want to focus more narrowly on a single topic such as commands.
We can select only those authors who wrote about commands (and other related terms), and visualize which of those authors cited which others most often:

Notice that the clustering algorithm has divided these thinkers in a way that happens to correspond remarkably closely to their legal and theological schools (about which the software itself knows nothing): most of the red dots are Muʿtazilī theologians, most of the purple ones are Ashʿarī theologians, the blue dots are Ḥanbalī jurists, the orange dots are Ḥanafīs, the aqua dots are Mālikīs and the little green dots largely Ẓāhirīs. This tells us that with some exceptions medieval Muslim scholars tended to cite others within their own schools of thought far more often than they cited outsiders. This is not surprising, but it could have been otherwise, and this network diagram shows that in fact conversations about commands did tend to occur within school lines.

Could we break down our entire universe of authors along school lines in this way, leaving aside concepts and terms and running a clustering algorithm only on scholars’ names, linking figures who cite each other and placing closer together those who cite each other most? Sure. This is what we get:

The picture is more complex, but again the clustering algorithm finds that people tend to cite each other within school lines, and the schools are demarcated in much the same way as when analyzing only citations about commands. This network diagram, viewed in the VOSviewer, provides a scalable map in which one can zoom in, move around, and see counts of how many times specific figures cite each other by hovering over the line connecting two figures. Which of these figures was most influential? Was it those with the largest dots?
Not necessarily; a large dot in this diagram means either that a figure is cited frequently by others, or that he frequently cites others; only the former indicates “influence.” To gauge influence we need to see only how many times an author is cited by others. By ordering that information chronologically, we can look at how each author’s influence waxed and waned over time. Limiting ourselves to just the most influential (or most frequently cited) scholars, we get the following graph (which was created using the “stacked area chart” feature in Microsoft Excel):

The authors who lived earliest are listed at the bottom of the graph, but the earliest are not always the first to become influential. Among other things, this graph confirms what many have long said, that al-Shāfiʿī (the lower pink band) was the “founding figure” of Islamic legal hermeneutics (as well as the Shāfiʿī school of law); he continued to be cited throughout the period that my database principally covers. On the other hand, the five Muʿtazilī thinkers shown here (all in red) were not cited frequently until the first half of the eleventh century, even though some of them lived much earlier; and thereafter their influence waned. By the end of the eleventh century Ashʿarī thinkers (in purple) and Ẓāhirī “literalists” (in green) were all the rage, at least in the works I studied. Ibn Ḥanbal, the “founder” of the Ḥanbalī school of law, also finally saw his stock begin to rise, some two and a half centuries after his death. This reflects the relatively late formation of the Ḥanbalī school of law as a distinct institution with a formal curriculum that included legal hermeneutics. Notice that the “founders” of the other two Sunnī schools, Abū Ḥanīfa (the lowest orange bar) and Mālik (aqua), continued to be cited, but were not nearly as influential as later scholars because they did not develop formal hermeneutical theories.
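The distinction drawn above between being cited and citing others, and the idea of following citations received over time, can be made concrete with a short sketch. The data format (a list of citing author, cited author, and year of the citing author), the fifty-year bins, the function names, and the placeholder authors are all assumptions made for illustration; this is not the procedure used to prepare the Excel chart.

```python
from collections import Counter, defaultdict


def degree_counts(citations):
    """Separate 'cited by others' from 'cites others' for every author.

    citations: iterable of (citing, cited, year) tuples.
    Returns (cited_by, cites): two Counters keyed by author name.
    """
    cited_by = Counter()
    cites = Counter()
    for citing, cited, _year in citations:
        cites[citing] += 1
        cited_by[cited] += 1
    return cited_by, cites


def influence_by_period(citations, period=50):
    """Count citations received by each author, grouped into bins of `period` years."""
    table = defaultdict(Counter)
    for _citing, cited, year in citations:
        start = (year // period) * period  # first year of the bin containing `year`
        table[cited][start] += 1
    return table


# Hypothetical citation records; authors and years are placeholders.
citations = [
    ("B", "A", 310), ("C", "A", 360), ("C", "B", 360),
    ("D", "A", 420), ("D", "C", 420), ("D", "B", 430),
]

cited_by, cites = degree_counts(citations)
print("most cited:", cited_by.most_common(2))
print("cites most:", cites.most_common(1))
for author, bins in sorted(influence_by_period(citations).items()):
    print(author, dict(sorted(bins.items())))
```

In this toy data, author D has the most connections overall but is never cited, which is exactly the case in which dot size in the network diagram would overstate influence.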
One is struck again by how well this visualized data depicts historical reality about which the software has no direct information. This bodes well for the prospect of data visualization in the much larger universe of online books and metadata: the picture that arises by computational analysis of strings of words is not definitive, but even with this small sample of data the visualizations often seem to correlate remarkably well with independent historical benchmarks, while deviating from them often enough to suggest interesting possible revisions to the standard historical narratives.

But our question about which figures were most influential has gotten us far afield from the five topics on which we were trying to focus. That is the nature of research in the humanities: a constant interplay between expansion and contraction of focus and interest, like breathing in and breathing out. Let’s contract our gaze again and get back to our five main topics.

Another question we might want to ask about those five topics, for instance, is which discourses tend to correlate with which others: when an author participates in one discourse, which other discourses is he or she most likely to take up as well? We noted earlier that authors who discuss commands also tend to discuss clarity, since those two topics were close together on our network map. We can compare the extent of individual authors’ participation in each discourse, at least for a few of the most prominent thinkers, using a parallel coordinates diagram (also from RAW):

Each line represents one thinker, colored according to his school of thought. Not surprisingly, al-Shāfiʿī, in pink, tops the charts. The two Ashʿarī theologians (in purple) are engaged in all five of our selected aspects of legal hermeneutics, including theoretical reflection on God’s speech and the question of literal versus figurative meaning.
By contrast, Ḥanafī jurists (in orange) are interested especially in the three more practical questions on the left, and less in the two more theoretical questions on the right. More surprising is the fact that Muʿtazilī theologians (in red) also show relatively little interest in the two theoretical questions on the right, while the two Ḥanbalī jurists (in blue) show somewhat more interest in those theoretical questions. This challenges the old stereotype that the Muʿtazilīs were sophisticated rationalists and the Ḥanbalīs naïve traditionalists. Visual analysis of textual data may often correlate with a historian’s expectations, but sometimes it runs against those expectations, and that can be a valuable clue that an old assumption is worth a second look.

At some point in the research cycle of expanding and contracting our range of vision, of breathing in and breathing out, we need to home in on a particular book and read it. Have these visualizations given us any direction? Well, al-Shāfiʿī has emerged in several diagrams as a key thinker, especially for the five discourses we have singled out. If we zoom in on him in the Exaptive Xap, we should be able to pull up information about the bibliographic records in which he is the author, and even identify which of his texts should be most relevant for each of these discourses. We could visualize where al-Shāfiʿī wrote about each topic, and also how influential each of his works was regarding each topic. For such purposes an alluvial diagram is ideal. This one (again produced with RAW) shows which authors (colored according to their schools of thought) cited which of al-Shāfiʿī’s writings, and how much each of those writings dealt with each of our five topics:

This alluvial diagram speaks volumes about al-Shāfiʿī’s influence.
He was cited not only by members of his own school (who are shown in pink), but also by theologians (especially Ashʿarīs, some of whom were also Shāfiʿīs in law) and even by Ḥanafī, Ẓāhirī, Mālikī and Ḥanbalī jurists. Most tended to cite his views directly, without mentioning any of his written works, or else cited his Risāla; his other works were cited mainly by scholars within his own school. The diagram also reveals something about the topics of al-Shāfiʿī’s works, or at least those topics concerning which they were most often cited. His famous Risāla appears to have been quite influential, mainly with regard to general expressions and the issue of clarity and ambiguity, and to a lesser extent with regard to literal or apparent meaning. The Risāla does not appear to have dealt as extensively with commands or with theories of speech, however, or at least was not remembered by the scholars in my database as an important source on those topics.

We learn something different from this data, however, if we swap the two middle columns: rather than which works each of the later authors cited, we can see on which topics each author cited him:

Now we can see that although my own notes referred frequently enough to al-Shāfiʿī’s views on literal or apparent meaning, almost no Muslim thinkers quoted him on that subject. All kinds of thinkers cited him on general expressions and the issue of clarity, while an almost entirely different group of thinkers cited his views on commands, and a subset of those scholars cited him on speech, mostly citing works other than the Risāla. With al-Shāfiʿī’s subsequent influence on at least four of our five topics now well established, there remains one step to be taken before we return to the old-fashioned method of actually reading a book: we need to pick one of al-Shāfiʿī’s books to start with, and the most obvious candidate, according to these diagrams, is his famous Risāla (The Epistle on Legal Theory).
Before we actually crack the book, however, we can get a fuller picture of what it discusses by means of the familiar device of a word cloud. The most sophisticated word clouds display not simply the most common words in a book, but those words and phrases that are “statistically improbable” and therefore distinguish the book from others. Google Books displays such word clouds for its full text books, but it is not yet very good at handling works in Arabic. For al-Shāfiʿī’s Risāla it gives us:

All this tells us is that Google has not yet bothered to install a list of Arabic “stop words,” which would have eliminated most of the words in this cloud: prepositions, conjunctions and the like, which tell us nothing about the book’s contents. Google’s word cloud of “common terms and phrases” for Joseph Lowry’s new English translation and Arabic edition of the Risāla is a bit more helpful:

The character encoding for Arabic words is incompatible with Google’s web browser, but at least we find relevant terms. The words “command,” “obligation,” and “prohibited” confirm that we will at least find here a discussion of commands (even though the alluvial diagrams above showed that the Risāla was not often cited regarding commands). We also find the word “unrestricted,” which Lowry uses instead of “general” to translate ʿāmm. Those words, however, are rather overshadowed by other less obviously relevant phrases, and our other interests (clarity, speech, and literal meaning) are nowhere to be seen, so we will try generating our own word cloud using the free online word cloud generator Wordle. Before we do, though, it is worth noting how helpful Google Books’ lists of “common terms and phrases” can be for works in English.
Here is Google’s word cloud for Abdullah Saeed’s book Approaches to the Qur’an in Contemporary Indonesia:

This is far better than a mere “frequency of occurrence” cloud; by identifying not just single words but also statistically improbable phrases it brings to the fore full names (Fazlur Rahman, Hamzah Fansuri, Nurcholish Madjid, [Muhammad] Quraish Shihab) and even titles of books (H.B. Jassin’s al-Quranu’l-Karim Bacaan Mulia). If one clicks on a word, Google Books will display all the snippets of text from the book in which that word or phrase appears, allowing one to get a quick sense of what kinds of things the book has to say about each term. The word cloud also reveals immediately that the classical disciplines of exegesis (tafsīr) and Islamic law are key domains for Qur’anic interpretation in Indonesia today, and it suggests that Indonesia’s political history has shaped that interpretive discourse, with Pancasila, Soeharto, and “Indonesian politics” featuring prominently. At least that is what one can expect to find in this book; if that is not one’s interest, one knows to look elsewhere first, and learning what not to read is almost as helpful for a researcher as learning what one should read. This must be one of the most important benefits of the OU/Exaptive Xap: it should not only give us cosmic-looking diagrams of vast fields of literature; it must also help us home in on those specific books that are most important for us to read right now.
For al-Shāfiʿī’s famous Epistle on Legal Theory, al-Risāla, we will have to produce our own word cloud, which is quite simple using the full Arabic text extracted from al-Maktaba al-Shāmila and the online service Wordle:

In addition to al-Shāfiʿī’s own name (الشافعي, in red) we see that he will be quoting a lot of hadith (حديث, Prophetic traditions, in red) because he refers frequently to the Sunna (السنة, also in red) and because one of the largest words in the diagram (and therefore one of the most frequent in the book) is “he related to us” (أخبرنا, also in red) which is used repeatedly in relating the chain of narrators who transmitted a hadith. The book is also clearly concerned with the Qur’an, which al-Shāfiʿī refers to as “the book” (الكتاب, in dark blue), and with meaning (معنى, in orange) and knowledge (العلم, in bright green). Surprisingly, prayer (الصلاة, in dark red) and women (النساء, in yellow) also appear as frequent topics, indicating the kinds of specific examples al-Shāfiʿī will use to illustrate his theoretical points. We also find “command” (أمر, in dark blue, as well as “his command,” أمره, in yellow green) and “literal or apparent meaning” (al-ẓāhir, الظاهر, in red); this is noteworthy, because the alluvial diagrams above showed that subsequent scholars cited the Risāla on commands and literal meaning less than they cited it on general expressions and clarity, which, strangely, are nowhere to be found in this word cloud. It appears that the Risāla used the terms “clarity” (bayān) and “general” (ʿāmm) too seldom for them to make it into the word cloud based on frequency alone, but that nevertheless what the book had to say about them proved quite influential.
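A frequency-based word cloud of this kind boils down to counting words after discarding stop words. The following Python sketch illustrates that step only; it is not Wordle’s or Google’s implementation, the function name and the tiny stop-word set are assumptions for the example, and it presumes unvocalized text, since the simple `\w+` tokenizer used here would split words at Arabic vowel marks.

```python
import re
from collections import Counter

def word_frequencies(text, stop_words, top_n=50):
    """Return the top_n most frequent words that are not stop words.

    Tokenization is deliberately naive: runs of Unicode word characters,
    lower-cased. Real Arabic text would also need normalization of
    prefixes such as the definite article, which is not attempted here.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(top_n)

# Hypothetical sample text and stop-word list, for illustration only.
sample_text = "The book and the Sunna clarify the book"
sample_stop_words = {"the", "and"}
print(word_frequencies(sample_text, sample_stop_words, top_n=3))
```

The sizes of the words in the cloud are then simply scaled from these counts, which is why a term the book uses seldom cannot appear, however influential the passage in which it occurs.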
To see how significant those other terms are in this book, we can display their occurrence graphically, using a word trends graph produced with Voyant Tools:

Here it becomes obvious why “command” showed up first in the word cloud: it is an ongoing concern, discussed quite intensively at some points (the humps in the line of blue dots). The term ẓāhir, “literal or apparent meaning” (in green) shows up principally toward the end of the book (where it actually has a different sense that is not directly relevant to our interests, as we would soon learn if we opened the book, or just followed a link from the Xap to view snippets of context in which the term appears). General expressions (in pink), though not mentioned often enough to show up in the word cloud, are nevertheless a subsidiary but ongoing concern throughout the book, while bayān (clarity, the line of blue triangles) is discussed only in the first section of the book (as anyone familiar with the Risāla will remember). Speech (here represented only by the word khiṭāb) shows up in light blue, being mentioned occasionally in the middle part of the book, but perhaps, this graph would suggest, never being discussed as a subject in its own right. That gives us a very good clue about our next step: if we are interested mainly in theoretical discussions of speech, we might as well start reading elsewhere, but if we are interested in the problem of clarity and ambiguity, we should dive in and read the first section of the Risāla.

I have read that first section many times over the years, and through that very close reading I have come to the conclusion that the Risāla is in fact not one but a sequence of three distinct books, each with its own argument, the last two having been appended to the first in response to questions and objections that had been raised by al-Shāfiʿī’s opponents.
There is plenty of internal evidence for this claim, which I will reserve for another paper I am presently writing, but I found that word trend graphs of key terms supplemented that evidence and helped bolster my argument. For example, I believe that Book One (and only Book One) of the Risāla constitutes al-Shāfiʿī’s argument that all of Islamic law comes ultimately from the Qur’an but is clarified by the Prophet’s Sunna (his practice recorded in hadith). A word trend graph showing the distribution of various terms and word forms designating the Qur’an bears this out:

The Qur’an is mentioned frequently throughout Book One, which occupies the first 35% of the text. The first five percent of Book Two, represented by the dot over the number 8, contains a detailed summary of al-Shāfiʿī’s argument from Book One; this is itself a clue that Book Two was written subsequently, and it explains why the Qur’an is mentioned again a great deal in that section. Thereafter, however, the Qur’an is not a major topic. Instead, Book Two responds to a question about how to resolve conflicts between hadith (traditions from the Prophet) that appear contradictory. Consequently, hadith do not become an important concern until Book Two:

Book Two does not use the word Sunna much, however; that was the abstract term al-Shāfiʿī used in Book One (and in the summary of Book One at the start of Book Two) when discussing how the Prophet’s Sunna clarifies the Qur’an:

Nor does al-Shāfiʿī use the term khabar, or report, in Book Two (even though he is constantly citing reports about the Prophet’s sunna; he just tends to refer to them as hadith):

The term khabar becomes important only in Book Three, where a type of hadith called an “individually transmitted report” (khabar al-wāḥid) is the first of several types of disputable evidence that al-Shāfiʿī discusses in an attempt to explain why it is okay that Islamic law is often based on uncertain interpretations of disputable evidence.
That discussion of uncertainty constitutes a new and separate argument, never foreshadowed in Books One or Two, centering around the terms knowledge (ʿilm), apparent truth (ẓāhir), and real truth (ḥaqq):

In this case the resolution of the graph is not quite fine enough, and it appears that knowledge is discussed at the end of Book Two, but by breaking the text into one hundred parts instead of twenty we can see that the blue peak representing a discussion of knowledge actually occurs at the very beginning of Book Three:

What all these word trend graphs show is that the three sections I have identified not only discuss different topics, they do so using different sets of vocabulary. The evidence of vocabulary distribution, which I would not have noticed without these visualizations, corroborates my argument from other internal evidence that the Risāla was composed in three stages. Such digital visualization techniques, all by themselves, cannot prove such a structural claim, but they can supplement arguments based on close reading, and they can bring to light structural features that we had not noticed before. They can also help us find the sections of an unfamiliar book that are most likely to be relevant to our research. A good index is still useful, but a graphical index (which is all a word trends graph really is) can be even more eye-opening. Even in the slow work of close reading, digital tools remain an important resource.

Promises and challenges

Our visual tour through the sample research universe of my own database of notes has given us a picture of how digital analysis and visual representation of textual data can facilitate the pursuit of a traditional research project in the humanities. It has also revealed some important priorities and challenges for the OU/Exaptive project. First, humanities research requires different kinds of analysis and visualization at different points in the natural rhythm of scholarship.
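The word trend graphs used for the Risāla rest on a simple computation that is worth making explicit, because the choice between twenty and one hundred segments mattered in the example of the term knowledge. The Python sketch below is an assumption-laden illustration, not Voyant Tools’ code: it splits a list of tokens into equal segments and reports the relative frequency of a set of term forms in each one. The function name and the toy token list are hypothetical.

```python
def word_trend(tokens, terms, segments=20):
    """Relative frequency of `terms` in each of `segments` equal slices.

    `tokens` is the text as a list of words; `terms` holds every word
    form to be counted together (e.g. several forms designating one
    concept). Returns one value per segment, in reading order.
    """
    terms = set(terms)
    n = len(tokens)
    trend = []
    for i in range(segments):
        start = i * n // segments
        end = (i + 1) * n // segments
        chunk = tokens[start:end]
        hits = sum(1 for t in chunk if t in terms)
        trend.append(hits / len(chunk) if chunk else 0.0)
    return trend

# Toy text: the term occurs only at the start of the second half.
toy_tokens = ["x"] * 10 + ["ilm"] * 2 + ["x"] * 8
print(word_trend(toy_tokens, {"ilm"}, segments=2))  # coarse resolution
print(word_trend(toy_tokens, {"ilm"}, segments=4))  # finer resolution
```

With only two segments the term appears spread across the whole second half; with four it is localized to a single quarter. The same effect, at a larger scale, is what moving from twenty to one hundred parts revealed about where the discussion of knowledge begins.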
We started with seed words, expanded our world of concepts by searching for co-occurring words, clustered concepts into discourses, mapped the universe of authors dealing with those discourses, grouped thinkers into networks or schools of thought, graphed chronologically the evolution of discourses and the influence of particular authors, zeroed in on an especially influential author, and diagrammed who cited which of that author’s books on which topics. This allowed us to choose a particular book to read, but we analyzed it first by means of word clouds and word trend graphs, learning in the process that it discusses some topics at greater or lesser length than the diagrams of its influence on those topics had suggested. As we settle in to read that book (or the portions of it suggested by the word trend graph), visualization remains a useful tool, and the annotations we make about the text as we read it (whom it cites and what concepts it discusses) can and should become part of the research universe visualized in the Exaptive Xap. And as we read, we will again be led to new ideas, sparking a new cycle of outward exploration followed by contraction and a new focus on yet another author and book, contextualized within a somewhat different colored map of authors, networks, and discourses. At each point in the process a new tool is needed, so one key feature of the Xap must be the ability to move seamlessly, using the same growing universe of data, from one type of visualization to another.

Second, multiple sources of data are necessary. Visualizing my own notes revealed some things I did not previously know about them, but really stretching the boundaries of our knowledge and challenging our presuppositions and limited imaginations requires incorporating new sources of data that change the picture.
The Xap must not be limited to a single, controlled dataset, but must be expandable so that new datasets can be tied in as research interests grow and new materials become available. And as online data is aggregated into a manipulable dataset, the source of each record—its original bibliographic record or full text access point—must always remain traceable so that at any point the user can zero in on a particular use of a particular term by a particular author in a particular text, go find it, and read it. The multiplicity of tools and datasets presents some major challenges. I have suggested several promising databases that would be useful for my own current project, but other scholars will need to access other data sets, and each will have its own legal and practical limitations. Each will offer a different kind of data: some will provide full text or word counts on which topic modeling and other algorithms can be performed; others will provide only search capabilities, so that word counts will have to be built up through repeated searches; and others will provide only bibliographic data. Moreover, each source will present data in its own format. It will be necessary to develop a robust and realistic format for temporarily storing data from multiple sources in a single data structure that supports all the different types of analysis and visualization that the Xap offers. Furthermore, much of the available data—especially that based on optical character recognition—will be seriously flawed. Fortunately, a considerable amount of noise and inaccuracy is quite acceptable in a project like this, where the results are meant to be suggestive rather than definitive. Additionally, even accurate data will take variable forms: words will be truncated or represented in multiple ways; names of authors will be recorded and spelled in numerous ways; and single books will be described differently in different databases or even within a single database. 
Some means will have to be provided for linking the most important synonymous or related terms, names, or titles, and identifying them with specific concepts, persons, and works. One way of doing this would be to let the user build thesaurus tables (on the fly and without too much manual typing) to group items he or she encounters, performing functions such as:

- Linking terms (in multiple languages) with single concepts. A translation suggestion tool would aid in developing lists of terms to represent concepts.
- Grouping concepts into discourses.
- Identifying multiple names and spellings with single historical persons.
- Grouping persons into schools, movements or networks.
- Identifying multiple bibliographic records with single works, editions, or copies.

Some such identifications might be performed or suggested automatically, and the user should be able to add and modify identifications whenever he or she sees an important name or term showing up in multiple forms. For example, the terms “ʿāmm” and “general” could be linked automatically and shown side by side, and the user could then choose to drag one onto the other to confirm their identification. It is important to remember, however, that these identifications need not be perfect or complete. The visualizations that the Xap produces are not research conclusions; they are suggestions, whose usefulness is proven only by their being left behind in pursuit of some new thought or reading.

Another challenge will be the sheer quantity of available and relevant data. It will be necessary to set adjustable thresholds for the frequency of occurrence of terms as well as other criteria for including a work or author in the research universe upon which analysis and visualizations are performed.
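One very simple way to picture a thesaurus table combined with an adjustable frequency threshold is a mapping from term variants to concepts, applied before counting. The Python sketch below is purely illustrative and does not describe how the Xap stores or links its data; the function name, the `min_count` parameter, and the sample tokens are assumptions, while the term pairings themselves (ʿāmm with “general” and Lowry’s “unrestricted,” bayān with “clarity”) are those mentioned in this paper.

```python
from collections import Counter

def concept_counts(tokens, thesaurus, min_count=1):
    """Count concepts rather than raw terms, then apply a threshold.

    `thesaurus` maps each known term variant (in any language) to a
    single concept label. Tokens missing from the table are ignored.
    Concepts occurring fewer than `min_count` times are dropped.
    """
    counts = Counter(thesaurus[t] for t in tokens if t in thesaurus)
    return {concept: k for concept, k in counts.items() if k >= min_count}

# A small user-built thesaurus table linking variants to concepts.
thesaurus = {
    "ʿāmm": "general expressions",
    "general": "general expressions",
    "unrestricted": "general expressions",
    "bayān": "clarity",
    "clarity": "clarity",
}

# Hypothetical tokens from some record, for illustration only.
tokens = ["ʿāmm", "general", "bayān", "unrestricted", "other"]
print(concept_counts(tokens, thesaurus))
print(concept_counts(tokens, thesaurus, min_count=2))
```

Because the table is an ordinary mapping, a user can add or correct an identification at any time and simply recount; an imperfect table still yields usable, suggestive totals.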
It will not be possible to store even temporarily, or analyze in a reasonable timeframe, data on every instance of every relevant term in every available work, so it will be necessary to determine an appropriate level of statistical detail to record (when available) about each work: complete word counts per page or per text, or only the density of occurrence per text of a small list of key terms. At the same time the data available online, though abundant, is far from a complete representation of the universe of relevant literature, and that selectivity is far from innocent. Works from regions with less developed publishing and distribution networks, or by persons without access to those networks, are likely to be underrepresented. This will have an invisible but real impact on the resulting scholarship. Once again, however, it is important to remember that we are not aiming at perfection, just improvement and advancement. Scholarship has always been blind to most of what human beings have had to say on any given subject. Present Western scholarship is limited in its range of vision by its tradition of relying primarily on citations in prior Western scholarship. The internet may contain only a skewed slice of human cultural productivity, but it is a far wider slice than scholars have worked with heretofore. The OU/Exaptive Xap should result in scholarship that brings to light creative thinkers who have not been noticed before outside their own immediate circles, and among better known figures it should reveal hitherto unimagined connections. In other words, the purpose of the OU/Exaptive Xap is to expand and stimulate the choices that humanities scholars make every day. Data mining and visualization cannot take the place of close reading of neglected texts, of travelling to distant places to find those books and speak with their authors face to face, or of thinking long and hard to understand what one reads and hears. 
The humanities necessarily require the very human processes of listening, interacting, reflecting, translating, and synthesizing. Google Earth lets you zoom in to find a specific address, and zoom back out to see its relation to any other place on earth, but it is not a substitute for travel. What it does very well is help you to discover and imagine places you have not visited, and get some orientation to them before you arrive. The OU/Exaptive Xap has a similar function: to help a scholar start where he or she is, get a big picture of the context of his or her ideas within the universe of human literature, rise up above the disciplinary tunnels and self-referential world of modern Western scholarship, give voice to figures who have previously been marginalized, spot new potential connections and influences and watershed moments, and make strategic choices about what to read next and how to frame it.

Finally, Exaptive’s platform is designed not only to help generate better published scholarship, but to publish the scholarly process itself. Out of competitiveness and sheer inertia, our universities (and funding sources) still tend to reward accomplishments that can be claimed by one institution, a small team, or, especially in the humanities, one individual. One of Exaptive’s main goals is for work being done by one scholar in one way with one set of data to spark new ways of working and thinking for another scholar working on a different but somehow comparable set of data. The processes (the research steps illustrated above, the visualizations employed, the connections drawn) should be tracked so that the Exaptive platform can propose those methods to others who seem to be doing similar kinds of analysis on different data. Even the data tables constructed by one user might occasionally be useful to another.
Few scholars will want to dig through my own database of notes on Qur’anic hermeneutics, and I would not want anyone to have access to all of the comments that I have written over the years (some of which doubtless fall short of charity or even rationality). But perhaps my thesaurus tables would provide a useful filter for someone else working on Biblical exegesis or Islamic law; or perhaps I could improve my way of visualizing influences by adopting techniques from someone analyzing social networks. If Xaps created using the Exaptive platform can offer a limited amount of transparency (with appropriate controls for privacy and proprietary information), not only my published results but also the ways I go about discovering and representing them might become part of the publicly available record and fruit of my research. Indeed, it might eventually come to pass that my results themselves will become public to some degree even as they are being created, long before they are finalized and published, and will remain available for others to continue exploring and manipulating using the same interface I used to create them. That would expose my research in a new way, making it vulnerable to a whole new level of critique; but such vulnerability is a good thing in the sciences, where data is (or should be) made accessible so that others can use it to confirm or challenge the conclusions that were based upon it. More such vulnerability would make scholarship in the humanities less territorial, more collaborative, and more open to critique, correction, and stimulation. Publication would become less individual, less closed, less final, and more open-ended, taking the form not just of a written conclusion but also of a live and accessible world of data with a built-in forum for exchange, in which my own written and visual results can be viewed and analyzed by others in ways I never thought to look at them myself.
Such a shift in the methods of scholarship and publication would, in turn, require a shift in scholarly ethics that has been too long coming in the humanities: a shift toward valuing humility over authority, vulnerability and openness over definitiveness, and finding the possibilities in others’ work over finding fault with it. A difficult shift, to be sure, for reasons as deeply personal as they are institutional. But a worthy shift it would be, making the humanities more humane and scholars more human, ironically, by digital means.