Can case citations tell us what a legal opinion is about?

If so, why not use them to sort legal information?
Submitted by:
Eric Dunn and Michael Ruiz
May 31, 2014
Professor Vogl and Professor Genesereth
Legal Informatics (Law 729) Spring 2013-2014
Justice Oliver Wendell Holmes described lawyering as a practice of “gather[ing]
the scattered prophecies of the past” to bear on a particular case.1 In legal opinions these
“prophecies” are gathered in the form of citations to precedent – prior cases that provide
the basis for an argument. These citations are signals that transmit information in a
standard format. For a lawyer, a citation means a particular argument depends upon
another. Citations reference a larger “legal network” in which “a web of opinions [are]
connected to each other through stacked sets of citations.”2
Our paper seeks to leverage citations to precedent in order to intuit the subject of
a legal opinion based on how it taps into these “legal network[s].” In Part I, we describe
why leveraging citations can and should be done automatically. In Part II, we describe
the algorithm we developed to automatically categorize legal opinions based on the
citations they make to other cases and then test this approach to see how well it sorts
recently decided Federal Circuit cases. In Part III, we discuss the limitations of our initial
observations and make recommendations for future research. Our results suggest that
algorithms can accurately predict a case’s subject matter based purely on the cases cited
within the opinion. Applying these algorithms to the growing amount of legal
information available to the public will help make legal information accessible – not just
available, but useful.
1. Oliver Wendell Holmes Jr., The Path of the Law, 10 Harv. L. Rev. 457 (1897).
2. Jay P. Kesan, David L. Schwartz & Ted Sichelman, Paving the Path to Accurately Predicting Legal Outcomes: A Comment on Professor Chien's Predicting Patent Litigation, 90 Tex. L. Rev. 97 (2012).
Part I: The value of automatically sorting legal information
The amount of legal information available for free has increased, but it will remain
inaccessible until it is organized.
The first obstacle to obtaining access to the law is being able to gather, as Justice
Holmes would say, the precedents and statutes that give the law its shape. While legal
information was once “buried in dimly lit basements of federal courthouses” it is now
possible for “anyone with a computer, Internet connection, and credit card” to access
legal information.3 For most of the twentieth century Westlaw and Lexis, which provide
online access to legal opinions, drove this increase in access.4 In the past decade the
competition to provide basic access to legal information has intensified and the cost of
access has fallen.5 Large companies, start-ups, and open source projects have exploded
the amount of open, available legal information.6
More information is not necessarily better, however. Access to legal information
is meaningless if the information is not organized and searchable. Growing the haystack
without making the needle easier to find does not increase accessibility; it simply
provides access to those who know where to look.
This problem has already been solved, at least partially. The primary tools that
modern lawyers use to navigate the law combine access to information with tools that
3. Dru Stevenson & Nicholas J. Wagoner, Bargaining in the Shadow of Big Data, 66 Fla. L. Rev. 22-23 (forthcoming 2014).
4. Paul Lomio, Lecture at Stanford Law School (Apr. 3, 2014).
5. Stevenson & Wagoner, supra note 3, at 25.
6. Id. Additionally, several examples of emerging legal technology illustrate the ongoing revolution. Ravel Law provides access to legal opinions and legal analytics. Casetext provides access to legal opinions and “crowd-sourced” tags, which attempt to replicate Lexis and Westlaw headnotes. Google has also made some court documents and legal documents (such as patents) available via Google Scholar. The Library of Congress now maintains a free online database of U.S. statutes. While PACER is maintained by the U.S. courts and accessible for a fee, projects such as RECAP have emerged to make even these documents accessible for free.
make this information accessible. For example, Westlaw and Lexis allow users to search
for legal information, quickly summarize it, and discover whether it is still applicable
law. This information is sorted by subject, court, terms, and other pieces of metadata.
But these tools exist behind a pay wall. They are the product of “an army of pricey legal
experts [who] manually sift through, summarize, and classify each source before making
it available online.”7 Other projects, such as Casetext, have used “crowd-sourcing” to
fuel the process of organizing legal information. These projects ultimately still depend
on an army of unpaid legal experts who sift, summarize, and classify.8
Resolving the tension between growing the size of available information and
making the information accessible need not depend on pricey legal experts. Well-tuned
algorithms provide an opportunity to resolve this tension because they make information
accessible independent of its size. In other words, finding the needle in a larger haystack
merely requires more computing power rather than more manpower. Combined with
existing tools that cull cases from court websites and statutes from government websites,
these algorithms could grow the pool of available information and automatically sort it.
Legal opinions are ripe for automatic sorting because lawyers already pair
structured, machine-readable metadata with their arguments.
Computer algorithms that analyze language thrive on pattern recognition.9
Although computers do not speak the language of humans, at least not yet, programs can
be trained to recognize patterns and leverage those patterns to analyze text. For example,
7. Stevenson & Wagoner, supra note 3, at 24.
8. During our research we found the available tags at Casetext inadequate. Some cases have well-developed tags, suggesting that increased use could allow Casetext to develop into a valuable legal resource, but many cases, especially critical intellectual property cases, were untagged and unsorted.
9. See generally Christopher Bishop, PATTERN RECOGNITION AND MACHINE LEARNING (Springer-Verlag New York, Inc. 2006).
e-discovery tools prove that supervised algorithms, which analyze text (keywords, etc.)
and metadata (information about the text), can help sort legal information.10 Scholars
have begun to apply these natural language processing approaches to legal arguments,11
but our paper argues that the most useful patterns in legal opinions remain underutilized.
It is often said that learning the law is like learning a different language.12
Lawyers learn how to understand and parse legal language, but they also learn another
language that lawyers speak. This language is highly structured, governed by a single set
of rules, and largely uniform.13 This “second language” is the language of legal citations
– often called “blue booking” among lawyers, scholars, and exhausted law students.14
Legal citations are highly structured signals that link together legal arguments,
connecting a legal argument to one that has been made previously.
When lawyers “gather[] the scattered prophecies of the past” they cite to
precedent – cases in which a court has previously decided an issue and articulated a legal rule.15 A
citation hints that an argument being provided in the current opinion is substantively
similar, or identical, to the argument to which the lawyer (or judge) has cited. To allow
10. Daniel M. Katz, Quantitative Legal Prediction – or – How I Learned to Stop Worrying and Start Preparing for the Data Driven Future of the Legal Services Industry, 62 Emory L. J. 946 (2011).
11. Marie-Francine Moens, Erik Boiy, Raquel Mochales Palau & Chris Reed, Automatic Detection of Arguments in Legal Texts, Address at the 11th International Conference on Artificial Intelligence and Law (June 4, 2007).
12. William M. Sullivan, Anne Colby, Judith Welch Wegner, Lloyd Bond & Lee S. Shulman, EDUCATING LAWYERS: PREPARATION FOR THE PROFESSION OF LAW (2007).
13. It is worth noting that the Bluebook provides universal citations in most federal courts, but all courts (especially at the state level) have subtle differences. Nevertheless, these variations are systematic. Expanding a citator to read different formats would be a matter of programming specific use cases rather than departing from the value that structured citations offer.
14. The term makes reference to the Bluebook. THE BLUEBOOK: A UNIFORM SYSTEM OF CITATION (Columbia Law Review Ass'n et al. eds., 19th ed. 2010).
15. Holmes, supra note 1, at 457.
the reader to easily refer to the previous argument, citations appear in a structured format
governed by a universal style manual.16
For example, Figure 1.1 shows an example of a legal citation:17
[Figure 1.1: an annotated case citation]
The value of legal citations is that they offer a way to judge the substance of an
argument without reading the opinion or processing unstructured text. Figure 1.1 shows
the type of information available in a citation, including the court that decided the case
and the year it was decided. To sort the contents of a legal opinion an algorithm need
only know some information about a subset of cases. Then if a future case cites to a
“known” case the algorithm can learn about the new case based on the “known” case.
Finding these citations can be done automatically, leveraging the structured format
lawyers use to communicate legal information to each other.
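Because reporter citations share a rigid volume/reporter/page shape, a parser can find them with a simple pattern. The snippet below is an illustrative Python sketch, not the parser described in this paper; the pattern, function name, and sample text are hypothetical and cover only a few common federal reporters.

```python
import re

# Matches Bluebook-style reporter citations such as "52 F.3d 967" or
# "517 U.S. 370": a volume number, a reporter abbreviation, and a page.
CITATION_PATTERN = re.compile(
    r"\b(?P<volume>\d{1,4})\s+"
    r"(?P<reporter>U\.S\.|S\. Ct\.|F\.(?:2d|3d)?|F\. Supp\.(?: 2d)?)\s+"
    r"(?P<page>\d{1,5})\b"
)

def extract_citations(opinion_text):
    """Return (volume, reporter, page) tuples found in an opinion."""
    return [m.group("volume", "reporter", "page")
            for m in CITATION_PATTERN.finditer(opinion_text)]

sample = ("We review claim construction de novo. Markman v. Westview "
          "Instruments, Inc., 52 F.3d 967 (Fed. Cir. 1995), aff'd, "
          "517 U.S. 370 (1996).")
print(extract_citations(sample))
# [('52', 'F.3d', '967'), ('517', 'U.S.', '370')]
```

Extending such a pattern to state reporters or court-specific citation rules is, as noted above, a matter of adding cases rather than abandoning the structured format.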
This approach does not eliminate the need for an “army” of legal experts, but it
dramatically reduces the amount of manpower necessary to sort legal information. From
a small pool of cases, hand coded by humans, an algorithm can expand outwards –
searching new documents for references to known cases and tagging those new cases accordingly.
16. Even where The Bluebook does not provide the universally followed set of citation rules, citations merely follow a subtly different set of uniform standards. These standards are available on the website of the relevant court. For example, New York courts have specific citation rules which subtly vary from The Bluebook.
17. THE BLUEBOOK: A UNIFORM SYSTEM OF CITATION 87 (Columbia Law Review Ass'n et al. eds., 19th ed. 2010).
Part II: Testing an algorithm to sort Federal Circuit cases
Designing the algorithm
The goal of our project was to design a program to parse citations to precedent
and use these citations to approximate the subject of a legal opinion. In essence, the
program finds the propositions used to support a lawyer’s argument and leverages its
knowledge about those propositions to understand the underlying argument.
Given the information contained in case citations and the machine-readable
pattern in which case citations appear, our first step was to write code to parse a legal
opinion and pull out citations to previous cases. Once we developed a parsing algorithm
we needed to “teach” the algorithm about precedent. To do this we used Casetext, a
website which provides access to legal opinions and the ability to “tag” legal opinions
with descriptive terms. Based on a combination of substantive expertise and Casetext’s
ability to sort opinions based on the number of citations they have received we collected
and tagged thirty “critical cases” to feed into the program. Each page of a case was
tagged with descriptive information, such as labels indicating whether the page discussed
patents. The critical cases included key cases from the Federal Circuit and the Supreme
Court, which are the two appellate courts with subject matter jurisdiction in patent cases.
Figure 2.1 shows an example of a page from Casetext with tags applied in the right-hand margin.
Next, we “trained” the algorithm by integrating the tags applied through Casetext
into the program. In essence, we matched pages in the Federal Reporter to
substantive tags in a dataset.
Leveraging our existing parsing algorithm we calibrated the program to read a
legal opinion, find any references to a page in the Federal Reporter (a citation to a
Supreme Court or Federal Circuit case), and match this reference to the dataset described
above in order to apply tags from “known” cases. Based on the applied tags, the program
determined whether to classify a case as a patent case.18
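A minimal sketch of this classification step might look like the following. This is an assumed structure, not the authors' code: the critical-case dictionary, tag names, and threshold logic are hypothetical, though the three cases listed are real patent precedents.

```python
# Hypothetical critical-case pool: citation string -> set of hand-coded tags.
CRITICAL_CASES = {
    "52 F.3d 967": {"patent", "claim construction"},   # Markman (Fed. Cir.)
    "517 U.S. 370": {"patent", "claim construction"},  # Markman (S. Ct.)
    "383 U.S. 1": {"patent", "obviousness"},           # Graham v. John Deere
}

def classify_opinion(citations, threshold=1):
    """Count references to known patent cases among the opinion's
    citations; classify the opinion as 'patent' if the count meets
    the threshold. Duplicate citations count separately."""
    patent_refs = sum(
        1 for c in citations
        if "patent" in CRITICAL_CASES.get(c, set())
    )
    label = "patent" if patent_refs >= threshold else "non-patent"
    return label, patent_refs

# An opinion citing Markman once is tagged "patent" at threshold 1,
# but would fall below the stricter threshold of 2.
cites = ["52 F.3d 967", "131 F.3d 1"]
print(classify_opinion(cites, threshold=1))  # ('patent', 1)
```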
Testing the algorithm
We tested the efficacy of the algorithm by measuring its accuracy via a binary
classification test. First, we drew a sample of 100 random legal opinions issued by the
United States Court of Appeals for the Federal Circuit in the final quarter of 2013. The
Court of Appeals for the Federal Circuit decides more patent cases than any other single
U.S. court, with approximately half of its cases being patent cases.19 We chose to limit
our sample to this court in particular to ensure that our sample contained a critical mass
of patent cases for the algorithm to detect.
We examined these cases manually and sorted them into “patent” and “non-patent” categories. We categorized a case as a patent case if the legal issue was
substantially related to patent subject matter. Opinions based overwhelmingly on
procedural grounds without discussion of underlying patent issues were categorized as
non-patent cases even if the original complaints were based on patent claims. These
18. The program has additional functionality not explored in this project. Essentially, it is able to carry forward all the tags and draw more complicated conclusions – such as whether to apply test-specific tags.
19. United States Court of Appeals for the Federal Circuit, Appeals Filed, by Category FY 2013.
cases might contain useful general precedent regarding subjects like venue change or
court filing deadlines, but the lack of reference to patent doctrine or patent issues grants
them little value to an attorney searching for patent cases. After adjusting for errors
(explained below), our final sample contained 38 patent cases and 51 non-patent cases.
After manually sorting the cases, we compared the algorithm’s categorization for
each case against our manual categorization. By comparing the differences we
determined how often the algorithm correctly identified what we determined to be patent
cases. We ran the binary classification test for the algorithm under two different
thresholds. In the first trial (threshold = 1), the algorithm categorized a case as “patent” if
the body of the text contained at least one reference to patents. For example, if the
opinion cited a “critical” case it was assigned as a “patent” case under this first threshold.
In the second trial (threshold = 2), the body of the text had to contain at least two
references to patents. For example, if the opinion cited to a critical case multiple times,
cited at least two critical cases, or cited to a page in a critical case that had been tagged as
especially relevant to patents, the case was designated as a “patent” case under this second
threshold.
Five cases were removed from the analysis for having no citations. These opinions
were short orders or affirmations without fully written decisions and no cases were cited
in the body of the text. We removed these cases from the analysis because they were not
the target type of cases that we were testing since they have little precedential value. An
additional 6 cases had to be removed for technical reasons. For these cases, either the
URL was unstable and produced an error or the webpage was incompatible with the
algorithm and also produced an error.
The first trial (threshold = 1, N = 89) tested at .75 accuracy with a negative
prediction value of .7 and a positive prediction value of 1. See Figure 2.2 for a breakdown
of individual case results.
Algorithm Test Outcome       Patent Case    Non-Patent Case
Tagged “patent”                   16               0
Tagged “non-patent”               22              51
Figure 2.2
The second trial (threshold = 2, N = 89) tested at .67 accuracy with a negative
prediction value of .63 and a positive prediction value of 1. See Figure 2.3 for a
breakdown of individual case results.
Algorithm Test Outcome       Patent Case    Non-Patent Case
Tagged “patent”                    9               0
Tagged “non-patent”               29              51
Figure 2.3
[Figure 2.4: chart of trial 1 outcomes as shares of the sample – true positives, true negatives, and false negatives; false positives: 0 (0%)]
Figure 2.4 displays results of trial 1 presented as a ratio of the total sample. At a
threshold value of one, the algorithm correctly categorized 16 of the cases as patent cases
(true positives) and 51 cases as non-patent (true negatives). The algorithm also
miscategorized 22 cases as non-patent cases when they in fact were patent cases (false
negatives). In proportionate terms, the algorithm correctly identified 75% of all cases,
but misidentified 58% of the pool of patent cases.
The algorithm set to a threshold value of two performed strictly worse than the
threshold one trial. The number of false negatives rose to 29 with the algorithm only
correctly tagging 24% of patent cases. Unsurprisingly, the number of false positives
remained zero. Given these results, it’s clear that a threshold value of one critical case is
a more accurate filter for the algorithm under our trial conditions.
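The reported figures can be reproduced from the confusion counts stated in the text (trial 1: 16 true positives, 51 true negatives, 0 false positives, 22 false negatives; trial 2: 29 false negatives, which implies 9 true positives among the 38 patent cases). The metric names below are standard definitions, not code from the project:

```python
# Reproducing the reported metrics from the confusion counts in the text.
def metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),          # positive prediction value
        "npv": tn / (tn + fn),          # negative prediction value
        "sensitivity": tp / (tp + fn),  # share of patent cases caught
    }

trial1 = metrics(tp=16, tn=51, fp=0, fn=22)
trial2 = metrics(tp=9, tn=51, fp=0, fn=29)

print(round(trial1["accuracy"], 2), trial1["ppv"], round(trial1["npv"], 2))
# 0.75 1.0 0.7
print(round(trial2["accuracy"], 2))  # 0.67; trial 2 npv = 51/80 = .6375
```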
The algorithm was successfully able to categorize many of the cases, but there is
still room for improvement. Even at a threshold value of 1 the algorithm missed over
half the patent cases in the sample, meaning that those patent cases in our sample did not
cite to any of the critical cases that the algorithm is built to recognize. Despite these
errors, it is encouraging to note that the algorithm did not produce any false positives.
Although the sensitivity of our algorithm is lower than we would prefer, its reliability is
much higher than expected.
The algorithm may have missed several patent cases, but it was correct about
every case that it did categorize as a patent case. This indicates that none of the non-patent cases cited critical patent cases in their opinions. If this pattern holds, then
our automated case tagging system has only a minimal chance to be thrown off by cases
citing precedent from other bodies of case law. This is an important success because it
suggests that our automated sorting system can be used without fear of erroneously
tagging cases. At worst, the algorithm will leave some cases untagged that would have
remained untagged in the absence of our automated system.
Part III: Limitations on our test and future adjustments
Our conclusions are limited by a small critical case pool and the narrow conditions
of our initial test.
Our primary limitation on the accuracy of the sorting algorithm is the small
number of critical cases that the system is based on. The 22 patent cases that were
erroneously flagged as non-patent in our test contain citations to other patent cases, but
not the critical cases that our algorithm relies on. If we build up the underlying web of
critical cases, we should see a corresponding increase in the number of patent cases that
our sorting system catches. A second limitation to our findings is that our sample was drawn
from a single court that only hears limited types of cases. The Court of Appeals for the
Federal Circuit has subject matter jurisdiction over several areas of law, but the majority
of its cases are patent and administrative law cases. While our sample is an accurate
representation of the cases before the Court of Appeals for the Federal Circuit, the
population from which our sample is drawn is not necessarily a good representation of
the larger body of case law across different courts. It’s possible that our sorting
algorithm will be less accurate at separating patent cases from other types of cases not
present in our sample such as tech transactions, licensing contract disputes, or even
unexpected case types such as civil rights or general torts cases.
As we move forward into applying the algorithm to different case types, another
possible limitation we may discover is that our lack of false positives in this test is unique
to patent law. Our trial did not produce any false positives because the non-patent cases
in our sample did not cite to our critical patent cases, but this effect may be dependent on
an insular quality of patent law. Patent law is a highly technical and specific body of law,
so it makes sense that other types of cases would have little use for findings from such a
specialized legal field. This might not be the case for more general bodies of law, such as
tort. For example, one might imagine that an opinion issued for a contract case
discussing negligence might cite a seminal tort case on the nature of the negligence
standard. Under our current model, our algorithm would erroneously flag this contract
case as a tort case based on that citation – at least at the low threshold used in our first
trial. One possible solution for this potential problem might be adjusting the algorithm's threshold values based on the type of tag we want to apply, an adjustment that will have less of an effect on false negatives as the pool of critical cases grows. Additionally, the layering of page-specific tags will help address this problem. For example, if a human can distinguish parts of an opinion that reflect one area of law from another, then the algorithm can trace a citation not just to a specific opinion but to a particular page within that opinion.
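Page-level tagging of this kind can be sketched as a finer-grained lookup. The table layout below is a hypothetical structure (the paper does not show its data format), keying tags to a pin-cited reporter page rather than to a whole case:

```python
# Hypothetical page-level tag table: (citation, pin page) -> tags.
# Keying on the pin-cited page lets the algorithm apply a doctrine-level
# tag only when an opinion cites the page where that doctrine appears.
PAGE_TAGS = {
    ("52 F.3d 967", 976): {"patent", "claim construction"},
    ("52 F.3d 967", 986): {"patent"},
}

def tags_for_pin_cite(citation, pin_page):
    """Return the tags recorded for a specific page of a known case."""
    return PAGE_TAGS.get((citation, pin_page), set())

print(sorted(tags_for_pin_cite("52 F.3d 967", 976)))
# ['claim construction', 'patent']
```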
Expanding our critical case pool and moving into additional bodies of law will
allow us to sort more varied and more granular information.
Our next steps involve expanding the scope of our proposed sorting system by
addressing our limitations from this first test. We first have to expand our selection of
critical opinions for patent case identification before establishing pools of critical cases in
other bodies of law. The algorithm’s sorting accuracy and its optimal threshold value
will have to be tested for separate areas of law to ensure that our automated system
remains an effective tool for categorizing types of cases other than patent.
After collecting a robust selection of critical cases, we can work to create more
granular tags beyond simple categorization of case types. Because our algorithm bases
its conclusions on case citations, the system could theoretically identify cases that are
often cited for specific propositions. Ideally, our automated system will be able to not
only categorize the type of case that it is reading, but also to link often cited cases with
particular doctrines so that specific passages could be categorized as well. For example,
the algorithm might tag a case generally as a Title VII case while also tagging specific
pages or paragraphs with doctrines specific to Title VII cases such as “tiers of scrutiny.”
In experiments with our algorithm we were able to successfully deploy this approach in
the area of patent law, tagging specific passages as not only “patent” but with tags such
as “patentable subject matter” – refining our algorithm to recognize where a test was used
in an opinion.
One step we could take to improve the functionality of our system is to have the
system automatically integrate the cases it tags into its base of critical cases. We did not
employ this approach for this trial because we wanted to strictly test whether citation
categorization was useful in the first instance. If done successfully, every case that our
system reads and tags as a patent case, for example, would immediately be used to
identify other patent cases in the future that cite to the cases that were tagged by the
algorithm. This machine learning would improve accuracy by increasing the size of the
critical case pool and it may also keep the system up to date as more recent legal opinions
are issued. We will have to approach this form of machine learning with caution
however, because any cases that are erroneously tagged and placed into the critical case
pool could perpetuate other erroneous tags. It is possible that a few initial false positives
could have an effect that spirals outward into more false positives once those tags are
used to identify further cases, seriously hampering the algorithm's accuracy in a systemic
fashion. Currently, the lack of false positives in our initial test lessens these concerns, but
it is a possibility we will have to monitor.
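The proposed bootstrapping step could be sketched as follows. This is a hypothetical illustration, not part of the trial described above; in particular, the promotion threshold used as a guard against error propagation is our assumption, not the authors' design, and the "700/701 F.3d" citations are invented placeholders.

```python
# Hypothetical bootstrapping: opinions tagged as patent cases join the
# critical pool so that later opinions citing them can be tagged in turn.
# Only opinions with several critical citations are promoted, as a crude
# guard against propagating an erroneous tag through the pool.

def bootstrap(critical_pool, new_opinions, promote_threshold=2):
    """critical_pool: citation -> tags; new_opinions: the opinion's own
    citation -> list of citations it makes. Mutates and returns the pool."""
    for own_cite, cited in new_opinions.items():
        hits = sum(1 for c in cited
                   if "patent" in critical_pool.get(c, set()))
        if hits >= promote_threshold:
            # Promote: the opinion itself becomes a known patent case.
            critical_pool[own_cite] = {"patent"}
    return critical_pool

pool = {"52 F.3d 967": {"patent"}, "517 U.S. 370": {"patent"}}
new = {"700 F.3d 100": ["52 F.3d 967", "517 U.S. 370"],  # 2 hits: promoted
       "701 F.3d 200": ["52 F.3d 967"]}                  # 1 hit: not promoted
bootstrap(pool, new)
print(sorted(pool))
# ['517 U.S. 370', '52 F.3d 967', '700 F.3d 100']
```

Raising `promote_threshold` trades growth of the pool for protection against the spiral of false positives described above.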
Legal data is increasingly available, but the sheer volume of legal information
becoming available makes it unwieldy to use. As with any large body of data, properly
sorting and labeling this data is an essential first step before useful analysis becomes
possible. Unfortunately, the sheer amount of legal text that is produced every year
requires a large number of legal professionals to read and annotate the massive body of
information. As an alternative to costly legal analysts, we propose an automated system
that reads, categorizes, and labels legal opinions. If we are to depend on automatic
algorithms to cull legal information we must develop automatic algorithms to sort this
information as well.
Our initial test shows that an automated system built to categorize legal opinions
based on the recognition of case citations can accurately sort legal information. Our
algorithm correctly labeled 75% of cases in our sample as patent or non-patent and we
expect this accuracy rate to improve as we add more cases to the critical pool of citations
that the algorithm recognizes. Additionally, our algorithm was 100% correct whenever it
classified a case as a patent case. This lack of false positives is important because it
suggests that while our automated classification system may miss some cases and leave
them untagged, it does not erroneously classify a case with the wrong tag. If this trend
remains true, our automated system’s positive tags can be relied on as a completely
accurate way to categorize large bodies of cases.
While our test results are somewhat limited by the homogeneity of the population
our sample was taken from and by our exclusive focus on patent cases, our findings
suggest that our automated method is already a useful tool for sorting data. Ideally, our
system will eventually be able to tag specific doctrines within legal opinions and
automatically incorporate newly tagged cases into its own underlying web of critical
cases. Regardless of these possible future features, we are confident that our current
approach provides an efficient and accurate way to categorize legal information.