DGO 2011

Bio-REGNET
Developing an Ontology for the U.S. Patent System
Siddharth Taduri, Hang Yu, Gloria T. Lau, Kincho H. Law, Jay P. Kesan
Stanford University
University of Illinois Urbana-Champaign
06/13/2011
Problem Statement
[Diagram of information sources: Issued Patents and Applications; File Wrappers; Court Cases; Regulations and Laws; Technical Publications]
• Patent validity and enforcement questions involve analysis of documents in various domains: worldwide patents, PTO file wrappers, scientific publications and court documents
• The information is siloed in several diverse information sources
Problem Statement
[Diagram of information sources around a Specific Technical Domain: Issued Patents and Applications; File Wrappers; Court Cases; Regulations and Laws; Technical Publications]
• The sources are diverse in structure, format, semantics and syntax
• How can one develop comprehensive knowledge of the patents in a particular technological space?
Patent Documents
• Over 7 million U.S. patents
• In 2009, 485,312 patent applications were filed
• Information is contained in various sections of the documents, so a full-text search alone is not sufficient; other metrics such as classification and citations need to be considered
• Documents are available in HTML format and can be easily parsed
Court Cases
927 F.2d 1200 (1991)
AMGEN, INC., Plaintiff/Cross-Appellant,
v.
CHUGAI PHARMACEUTICAL CO., LTD., and Genetics Institute, Inc., Defendants-Appellants.
Nos. 90-1273, 90-1275.
United States Court of Appeals, Federal Circuit.
March 5, 1991.
Suggestion for Rehearing Declined May 20, 1991.
…
…
Before MARKEY, LOURIE and CLEVENGER, Circuit Judges.
…
THE PATENTS
On June 30, 1987, the United States Patent and Trademark Office (PTO) issued to Dr. Rodney
Hewick U.S. Patent 4,677,195, entitled "Method for the Purification of Erythropoietin and
Erythropoietin Compositions" (the '195 patent). The patent claims both homogeneous EPO and
compositions thereof and a method for purifying human EPO using reverse phase high
performance liquid chromatography. The method claims are not before us. The relevant claims
of the '195 patent are:
1. Homogeneous erythropoietin characterized by a molecular weight of about 34,000 daltons on SDS PAGE, movement as a single peak on reverse phase high performance liquid chromatography and a specific activity of at least 160,000 IU per absorbance unit at 280 nanometers.
******
3. A pharmaceutical composition for the treatment of anemia comprising a therapeutically effective amount of the homogeneous erythropoietin of claim 1 in a pharmaceutically acceptable vehicle.
4. Homogeneous erythropoietin characterized by a molecular weight of about 34,000 daltons on SDS PAGE, movement as a single peak on reverse phase high performance liquid chromatography and a specific activity of at least about 160,000 IU per absorbance unit at 280 nanometers.
• Court cases are not very well structured!
• It is comparatively more difficult to parse information from them
• PACER, an electronic system for accessing U.S. court databases, requires one to know the party/assignee name, case number/type, etc., which may not be known
Patent File Wrappers
[Figure labels: Events; Text]
• File wrappers are folders that contain all documents exchanged between a patent applicant and the patent office
• Every file wrapper is different! There is no standardized ordering of events
• The relevant information is embedded within a large amount of irrelevant text
• File wrappers are available as images, requiring additional processing to extract text
Cross-Referencing
• Many aspects of these documents can be utilized, especially the cross-referencing between documents
COURT CASE
314 F.3d 1313 (2003)
AMGEN INC., Plaintiff-Cross Appellant v. HOECHST MARION ROUSSEL, INC. (now known as Aventis Pharmaceuticals, Inc.) and Transkaryotic Therapies, Inc., Defendants-Appellants.
…
Plaintiff-Cross Appellant Amgen Inc. is the owner of numerous patents directed to the production of erythropoietin ("EPO"), …alleging that TKT's Investigational New Drug Application ("INDA") infringed United States Patent Nos. 5,547,933; 5,618,698; and 5,621,080. The complaint was amended in October 1999 to include United States Patent Nos. 5,756,349 and 5,955,422, which issued after suit was filed.
BIOPORTAL: DOMAIN KNOWLEDGE
REGULATIONS: U.S. Code Title 35, C.F.R. Title 37, M.P.E.P. …
Publication Database
FILE WRAPPER
U.S. Patent 5,955,422
…
PATENT
United States Patent 5,955,422
September 21, 1999
Production of erythropoietin
Abstract: Disclosed are novel polypeptides possessing part or all of the primary structural conformation and one or more of the biological properties of mammalian erythropoietin ("EPO") …
Inventors: Lin; Fu-Kuen (Thousand Oaks, CA)
Assignee: Kirin-Amgen, Inc. (Thousand Oaks, CA)
Appl. No.: 08/100,197
Filed: August 2, 1993
Claims 61-63 are rejected under 35 U.S.C. § 103 as being unpatentable over any one of Miyake et al., 1977 (R) …
In accordance with the provisions of 37 C.F.R. §1.607, the present continuation is being filed for the purpose of …
Basis for Developing a Patent System Ontology
• Established semantics allow us to reason over the classes, properties and instances to infer new facts
• Documents can be connected to form a network similar to a citation network, except that it carries not just citations but also other metadata such as co-inventorship, technological classification and other cross-domain relevancy metrics between documents (e.g., patents occurring in court cases)
• Allows us to perform link analysis using algorithms such as PageRank to establish importance
• Rules can be developed to perform additional inferences over the knowledge
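As a rough illustration of the link-analysis idea, the following Python sketch runs a basic PageRank power iteration over a tiny, made-up document network. The node names, the edges, the damping factor and the iteration count are all assumptions chosen for illustration; none of them come from the actual corpus or from the authors' implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    links maps each node to the list of nodes it points to.
    Nodes without outgoing links spread their rank evenly over all nodes.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: distribute its rank over the whole network.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical network: edges mix patent citations and
# "patent occurs in court case" links.
example_links = {
    "patent_A": ["patent_C"],
    "patent_B": ["patent_C"],
    "case_1": ["patent_C", "patent_A"],
    "patent_C": [],
}
ranks = pagerank(example_links)
most_important = max(ranks, key=ranks.get)
```

In this toy network the document with the most incoming links ends up with the highest score. How the different link types (citations, co-inventorship, court-case occurrences) should be weighted against each other is left open here.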
Competency Questions
Single domain:
• Return all patent documents which contain the keyword “erythropoietin” in the “claims”
• Return all court cases which involve “Amgen_Inc” as the plaintiff, the defendant or both, and which come from the court “courtA”
Multi-domain:
• Return all patents which contain the keyword “erythropoietin” in the “claims” and which have been challenged in the courts
The complexity of the queries depends on the user’s requirements.
In general, the ontology should be able to answer:
1. Textual queries
2. Metadata queries, with numeric filters
3. Multi-source queries
Class Hierarchy - I
Class Hierarchy - II
Class Hierarchy - III
Parsing Documents to Instantiate the Ontology
• Documents are automatically parsed using a regular-expression-based script
• Separate scripts are needed for each document domain
• The ontology is automatically instantiated using the Protégé-OWL API
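The parsing step can be sketched in Python as follows. This is only an illustration of regular-expression-based extraction: the patterns, the function name `caption_to_triples` and the identifier `Case_1` are hypothetical, and the output is a list of plain (subject, predicate, object) tuples rather than individuals created through the Protégé-OWL API. The property names `hasPlaintiff` and `hasDefendant` are the ones shown on the slide.

```python
import re

# Caption of the court case quoted earlier in the slides.
CAPTION = """927 F.2d 1200 (1991)
AMGEN, INC., Plaintiff/Cross-Appellant,
v.
CHUGAI PHARMACEUTICAL CO., LTD., and Genetics Institute, Inc., Defendants-Appellants.
Nos. 90-1273, 90-1275.
"""

# Hypothetical patterns; real captions vary and would need more rules.
PLAINTIFF_RE = re.compile(r"^(?P<name>.+?),\s*Plaintiff", re.MULTILINE)
DEFENDANT_RE = re.compile(r"^(?P<names>.+?),\s*Defendants?", re.MULTILINE)


def caption_to_triples(case_id, text):
    """Return (subject, predicate, object) tuples extracted from a case caption."""
    triples = []
    plaintiff = PLAINTIFF_RE.search(text)
    if plaintiff:
        triples.append((case_id, "hasPlaintiff", plaintiff.group("name")))
    defendants = DEFENDANT_RE.search(text)
    if defendants:
        # Several defendants are joined by ", and" in this caption.
        for name in re.split(r",\s+and\s+", defendants.group("names")):
            triples.append((case_id, "hasDefendant", name))
    return triples


triples = caption_to_triples("Case_1", CAPTION)
for triple in triples:
    print(triple)
```

A separate set of patterns would be needed for patents and file wrappers, which is consistent with the point that each document domain needs its own script.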
[Diagram: Case 1, with hasPlaintiff pointing to Amgen and hasDefendant pointing to Chugai]
What can you ask the Patent Ontology?
• Simple questions can be answered by currently existing systems
  - Return all patents by the inventor “Fu-Kuen Lin”
  - Return all court cases prior to yyyy-mm-dd
  - Return all the patent documents which contain the keyword “erythropoietin” in the claims and are assigned to “Amgen_Inc”
• The Patent System Ontology is intended to answer simple queries as well as complex queries which span more than a single information domain
  - Return a court case which involves 3 or more patents
  - From a file wrapper, identify the patents involved in an interference and display information about the inventor, assignee and claims of each patent. Further, list the other patents the inventor owns, if any.
Note: The patent system ontology allows inferring details about one document type (patents) based on information from other document types (file wrappers)
Example Query
• Return all the patent documents which contain the keyword “erythropoietin” in the claims and are assigned to “Amgen_Inc”. What technology classes do these patent documents belong to?
• SPARQL query:
SELECT DISTINCT ?patent ?inventor
FROM <http://localhost:8890/PatentOntologyInferred>
WHERE {
  ?patent a ont:Patent .
  ?patent ont:hasAbstract ?abs .
  ?abs ont:resourceVal ?val .
  ?val bif:contains "erythropoietin" .
  ?patent ont:hasAssignee ont:Amgen_Inc .
  ?patent ont:hasInventor ?inventor .
}
LIMIT 10
Patent   Inventor
5856298  Strickland_Thomas_W
5885574  Elliott_Steven_G
7304150  Egrie_Joan_C
7304150  Elliott_Steven_G
7304150  Browne_Jeffrey_K
7304150  Sitney_Karen_C
7217689  Elliott_Steven_G
7217689  Byrne_Thomas_E
6319499  Elliott_Steven_G
5756349  Lin_Fu-Kuen
So Far …
• 54 classes, 40 properties and over 15,000 individuals from 1150 patents, 30 court cases and one partially instantiated file wrapper
• Used Protégé-OWL to edit the ontology and the Protégé-OWL API to programmatically instantiate individuals from the physical documents
• The ontology can be queried through any SPARQL endpoint, such as Protégé or Virtuoso’s triple store
• SWRL can also be used to query (we have not yet developed SWRL query rules)
Use-Case: Erythropoietin
Current corpus: an experimental platform to test the overall effectiveness of the framework
• 5 core patents: U.S. Patents 5,621,080, 5,756,349, 5,955,422, 5,547,933 and 5,618,698
• 135 directly related patents (through citations) form our gold standard for computing formal measures such as precision and recall
• Total patent corpus of 1150 patents
• Identified over 3,000 related publications through citations. These are available on PubMed and can be accessed through Entrez, a tool that provides a search interface to the PubMed database
• Around 30 court cases covering patent litigation involving major companies including Amgen, Hoechst Marion Roussel, Inc. and Transkaryotic Therapies, Inc.
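For reference, precision and recall against a gold standard can be computed as in the Python sketch below. The retrieved list is made up, and the five core patents stand in for the gold standard only to keep the example small; the slides use the 135 directly related patents for that purpose.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a gold standard."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy gold standard: the five core patents named on the slide.
gold = {"5621080", "5756349", "5955422", "5547933", "5618698"}
# Hypothetical search result, not an actual system output.
retrieved = ["5621080", "5756349", "5955422", "4703008"]

precision, recall = precision_recall(retrieved, gold)
print(precision, recall)
```

Here three of the four retrieved patents are in the gold standard, and three of the five gold-standard patents were found.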
Querying BioPortal to Extract Concepts and Terms
Expanded Query
Original Term: Erythropoietin
Synonyms: Erythropoietin, Recombinant Erythropoietin, erythropoietin receptor binding, Hematopoietin, Recombinant EPO, Erythrocyte Colony Stimulating Factor, Epoetin, EPO …
Children: Darbepoetin Alfa, Epoetin Alfa, Epoetin Beta …
Parents: Colony Stimulating Factors, cytokine receptor binding, recombinant hematopoietic growth factors …
Grand-Parents: hematopoietic growth factor, receptor binding, recombinant growth factor …
• An appropriate ranking function must be applied to balance out the more general terms. Heuristically, we assign a higher weight to synonyms and a lower weight as we traverse away from the concept node
• Resulting query: “original term” OR [synonyms]^weight OR [children]^weight OR ….
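The shape of the resulting query can be sketched in Python as follows. The function name `expand_query` and the numeric weights are assumptions made for illustration; the slides state only that synonyms get a higher weight and that the weight drops with distance from the concept node. The terms are a subset of those listed above.

```python
def expand_query(term, synonyms, children, parents, weights=None):
    """Build a weighted boolean query string from an ontology neighborhood."""
    # Hypothetical weights: higher for synonyms, lower further from the concept.
    weights = weights or {"synonyms": 0.8, "children": 0.6, "parents": 0.4}
    groups = [("synonyms", synonyms), ("children", children), ("parents", parents)]

    clauses = ['"%s"' % term]
    for label, terms in groups:
        if terms:
            quoted = " OR ".join('"%s"' % t for t in terms)
            clauses.append("(%s)^%s" % (quoted, weights[label]))
    return " OR ".join(clauses)


query = expand_query(
    "erythropoietin",
    synonyms=["Epoetin", "EPO"],
    children=["Epoetin Alfa", "Epoetin Beta"],
    parents=["Colony Stimulating Factors"],
)
print(query)
```

Whether a given search engine accepts the `^weight` boost notation on a parenthesized group is itself an assumption; the string format simply mirrors the one shown on the slide.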
Current prototype framework
1. Use bio-ontologies to expand the user’s query, covering broader terms and concepts
2. Search the document domain using the expanded query
3. Use the patent system ontology’s properties to relate documents (from all document domains)
4. Support user feedback to ensure the search progresses in the right direction
[Figure label: Patent System Ontology]
Querying with SPARQL
• SPARQL is a query language for RDF
• Syntactically very similar to SQL for relational databases
• Any number of variables can be specified
• Many triples can be used in conjunction to form more complex queries
• We will use Virtuoso’s triple store to query the ontology
Query structure (operation, variables, triples):
SELECT ?subject ?predicate ?object
WHERE {
  ?subject ?predicate ?object
}
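To make the idea of combining triple patterns concrete, the Python sketch below evaluates a conjunction of patterns over a small in-memory list of triples. It is a toy model of the matching semantics only, not how Virtuoso or any SPARQL engine is implemented, and the triples are illustrative.

```python
def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple; return None on failure."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def run_query(patterns, triples):
    """Evaluate a conjunction of triple patterns against a list of triples."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Toy data; identifiers echo the slides but the triples are illustrative.
TRIPLES = [
    ("Case_1", "a", "CourtCase"),
    ("Case_1", "hasPlaintiff", "Amgen"),
    ("Case_1", "hasDefendant", "Chugai"),
    ("P5955422", "a", "Patent"),
]

rows = run_query(
    [("?case", "a", "CourtCase"), ("?case", "hasPlaintiff", "?who")],
    TRIPLES,
)
print(rows)
```

Each additional pattern narrows or extends the set of variable bindings, which is what allows several triples to be used in conjunction.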
Court Cases with “Erythropoietin”
SELECT DISTINCT ?cases
WHERE {
  ?cases a :CourtCase .
  ?cases :hasBody ?caseBody .
  ?caseBody :resourceVal ?comment .
  FILTER REGEX (?comment, "erythropoietin", "i") .
}
Results
Case_4: Amgen v/s Chugai …
Case_5: Amgen v/s Genetics …
Case_2: Amgen v/s Chugai …
Case_3: Amgen v/s F. Hoffma…
….
30 Cases retrieved
Patents Involved in the Court Cases
SELECT DISTINCT ?patents
WHERE {
  ?cases a :CourtCase .
  ?cases :hasBody ?caseBody .
  ?caseBody :resourceVal ?comment .
  FILTER REGEX (?comment, "erythropoietin", "i") .
  ?cases :patentsInvolved ?patents .
}
Results
5411868
5621080: Production of Erythropoietin
5547933: Production of Erythropoietin
5618698: Production of Erythropoietin
5756349: Production of Erythropoietin
5955422: Production of Erythropoietin
5441868
4703008
4677195
5322837
Core patents (bold on the original slide) are the five entries titled “Production of Erythropoietin”
List of Events in the File Wrapper
SELECT DISTINCT ?doc
WHERE {
  :FileWrapper_5955422 :contains ?doc .
  ?doc :hasDate ?date
}
ORDER BY ?date
Results
07_609741
07_609741_Amendment_1
07_609741_Interference_1
07_609741_Rejection_1
07_957073_Amendment_1
…
P5955422 (Issued Patent)
Initial Claims of File Wrapper
SELECT DISTINCT ?claim
WHERE {
:07_609741 :hasClaim ?claim .
}
ORDER BY ?claim
07_609741_claim_1
07_609741_claim_2
07_609741_claim_3
…
07_609741_claim_60
A purified and isolated polypeptide having part or all of the
primary structural conformation and one or more of the
biological properties of naturally occurring erythropoietin and
characterized by being the product of procaryotic or eucaryotic
expression of an exogenous DNA sequence.
Summary of Interference Record
SELECT DISTINCT ?claim
WHERE {
  :07_609741_Interference_1 :InterferingClaims ?claimInt .
  :07_609741_Interference_1 :affectedClaims ?claim .
}
ORDER BY ?claim
Results
07_609741_claim_60
07_609741_claim_61
07_609741_claim_62
P4879272_claim_2
P4879272_claim_3
An erythropoietin-containing, pharmaceutically-acceptable composition wherein human serum albumin is mixed with erythropoietin either during the preparation of said composition or just before administration thereof.
An erythropoietin-containing, pharmaceutically-acceptable preparation wherein human serum albumin is mixed with erythropoietin.
Current Limitations
• One needs to know SPARQL in order to query
• One needs to know the semantics of the ontology, such as the relations and the domain and range restrictions
• Manual querying can be very time consuming; automation is needed
• Domain-specific semantics need to be integrated separately
• Probabilistic weighting (ranking inventors, assignees, patents, etc.) is not possible using the SPARQL endpoint
• We are developing a user-friendly automated tool to search the patent system
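One way such ranking could be layered on top of the endpoint is to post-process the returned rows on the client side. The Python sketch below counts distinct patents per inventor using the patent/inventor rows shown in the Example Query results. A plain frequency count is used here as a stand-in; it is an assumption, not the probabilistic weighting the slides refer to, which is not specified.

```python
from collections import Counter

# (patent, inventor) rows copied from the Example Query result table.
rows = [
    ("5856298", "Strickland_Thomas_W"),
    ("5885574", "Elliott_Steven_G"),
    ("7304150", "Egrie_Joan_C"),
    ("7304150", "Elliott_Steven_G"),
    ("7304150", "Browne_Jeffrey_K"),
    ("7304150", "Sitney_Karen_C"),
    ("7217689", "Elliott_Steven_G"),
    ("7217689", "Byrne_Thomas_E"),
    ("6319499", "Elliott_Steven_G"),
    ("5756349", "Lin_Fu-Kuen"),
]

# De-duplicate rows first so each patent counts once per inventor.
patents_per_inventor = Counter(inventor for _, inventor in set(rows))
ranking = patents_per_inventor.most_common()
print(ranking[0])
```

Because the query behind these rows was limited to 10 results, the counts describe only this sample and not the whole corpus.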
Future Work
• Include other information sources: publications, regulations, laws
• Develop an automated tool and search framework (currently under development)
• Experiment with more use cases outside of the biomedical domain
Tool Snapshot
Acknowledgement
This research is partially supported by NSF
Grant Number 0811975 awarded to the
University of Illinois at Urbana-Champaign
and NSF Grant Number 0811460 to Stanford
University. Any opinions and findings are
those of the authors, and do not necessarily
reflect the views of the National Science
Foundation.
Please Visit the System Demonstration
Thank You!
Questions?
Extra Slides
Common US Classes, Inventors and Assignees
SELECT DISTINCT ?inv ?class ?assignee
WHERE {
?cases a :CourtCase .
?cases :hasBody ?caseBody .
?caseBody :resourceVal ?comment .
FILTER REGEX (?comment,
"erythropoietin", "i") .
?cases :patentsInvolved ?patents .
?patents :hasInventor ?inv .
?patents :hasUSClass ?class .
?patents :hasAssignee ?assignee .
}
Results
?inv: Lin_Fu-Kuen, Hewick_Rodney_M, Seehra_Jasbir_S
?class: USPC 530/380, USPC 530/399, USPC 530/397, USPC 514/8, USPC 435/69_6, USPC 530/835, USPC 530/388_7, …
?assignee: Kirin-Amgen_Inc, Genetics_Institute_Inc, …
Extracting Citations
SELECT DISTINCT ?forw ?backw
WHERE {
  ?cases a :CourtCase .
  ?cases :hasBody ?caseBody .
  ?caseBody :resourceVal ?comment .
  FILTER REGEX (?comment, "erythropoietin", "i") .
  ?cases :patentsInvolved ?patents .
  ?patents :hasCitation ?forw .
  ?backw :hasCitation ?patents .
}
Results
6541033
4710473
4358535
4558005
4465624
4757006
4399216
4558006
3865801
3033753
…
Generated Results
• Around 30 court cases
• Several patents, including core patents and forward/backward citations
• Can search patents by the inventors, assignees and/or US classes identified
• What’s more? Can go on to search court cases with new keywords or information gathered
Gathered Results
Case_4: Amgen v/s Chugai …
Case_5: Amgen v/s Genetics In.
Case_2: Amgen v/s Chugai …
….
5621080: Production of Erythropoietin
5547933: Production of Erythropoietin
5618698: Production of Erythropoietin
5756349: Production of Erythropoietin
5955422: Production of Erythropoietin
…
5441868
4703008
4677195
5322837
…
Patents with Inventor: Lin_Fu-Kuen
Patents owned by Genetics_Inc
…