Patent Classifications as ‘Knowledge’
…towards a more conscious (auto)categorization of patents
Arcanum Development
2013
A usual hierarchic categorization task…
Given a hierarchic taxonomy (classification system)
Provide a list of taxonomy nodes (classification symbols) whose subject matter best matches that of the document
“Best” is based on…
– For experts: understanding the subject matter and considering classification rules
– For computers: providing the hierarchy and training with sample categorized documents …and the rules?!
However, the patent classification task is somewhat more complicated…
Typical methods of categorization
Roots (sections); non-classifying (‘preclassification’) levels; … during training…
– Flat: the best is the winner on each level
– Greedy hierarchic: traversing down on the best only
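The two strategies can be contrasted in a few lines of Python. This is a minimal sketch, not a real categorizer. The miniature taxonomy and the `SCORES` values are hypothetical; the scores stand in for a trained classifier's output for one document.

```python
# Hypothetical miniature taxonomy: parent -> children
CHILDREN = {
    "ROOT": ["A", "B"],
    "A": ["A01", "A47"],
    "B": ["B60", "B65"],
}

# Hypothetical classifier scores for one document
SCORES = {"A": 0.6, "B": 0.4, "A01": 0.2, "A47": 0.3, "B60": 0.1, "B65": 0.9}

def levels(root="ROOT"):
    """Yield the categories of each hierarchy level, top down."""
    level = CHILDREN.get(root, [])
    while level:
        yield level
        level = [c for node in level for c in CHILDREN.get(node, [])]

def flat(scores):
    """Flat: every level is scored independently; the best wins on each level."""
    return [max(level, key=scores.get) for level in levels()]

def greedy_hierarchic(scores, root="ROOT"):
    """Greedy hierarchic: descend only into the best child at each step."""
    path, node = [], root
    while CHILDREN.get(node):
        node = max(CHILDREN[node], key=scores.get)
        path.append(node)
    return path

print(flat(SCORES))               # ['A', 'B65']
print(greedy_hierarchic(SCORES))  # ['A', 'A47']
```

With these toy scores the flat strategy returns an inconsistent path, because B65 is not below A. The greedy strategy never reaches B65, even though B65 has the highest score on its level.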
Common features of patent classification schemes
Hierarchic
– the subject matter covered by a higher level contains the subject matter of a lower level
– but: a document may be assigned to a higher level when none of the lower levels fits
Some nodes (symbols) cannot be used (in general or alone) for classification
– hierarchy levels
– indexing schemes
Schemes contain specific rules – relations – between symbols
– place / priority / precedence / limiting rules
– indexing rules
– references to symbols to be taken into consideration
Rules given in the scheme are extended by definitions / classification manuals
Schemes can be multilingual
Used by various offices and cultures (maybe slightly differently)
Using relations in patent categorization
Last place rule
Takes precedence
Hierarchic, rules, references
Why is a (more) formal analysis and presentation advantageous?
Currently, the rules are presented as text in various master files, in a machine-readable but not machine-interpretable way (it is ‘content’ but not yet ‘knowledge’)
Lots of complex rules are spread over multiple sources (e.g. definitions) and places (e.g. reverse references)
For both humans and computer programs, it is hard to collect and apply all the rules systematically
It is therefore worth converting IPC content into more explicit IPC ‘knowledge’…
Hypotheses
Tests were made
– to verify the confusions of patent examiners and of various autocategorizers
– …and whether they correlate with relations given in IPC
Assumption
– more references in IPC: higher overlap between subject matter areas
Hypotheses
– the more references in IPC between two areas, the higher the confusion of humans and computers
– the knowledge coded in IPC is, indeed, used by patent categorizers
Testing the hypotheses
If patent examiners take references in IPC seriously
– the more references between two symbols, the higher the number of co-classifications
Practice can differ between two offices
– the more references in IPC, the higher the likelihood of different decisions
Confusion of autocategorizers
– more failures where subject matter areas overlap
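Here is a minimal sketch of how the first hypothesis could be checked. It counts IPC references and co-classifications per unordered class pair and correlates the two series. The reference list and documents below are made-up toy inputs, not data from these tests. Pearson correlation is only one possible measure.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical IPC references, reduced to (class of referring symbol, class of referred symbol)
REFERENCES = [("A47", "A61"), ("A61", "A47"), ("B60", "B65"), ("C07", "C12")]

# Hypothetical documents, each given as the set of classes it was co-classified in
DOCUMENTS = [{"A47", "A61"}, {"A47", "A61"}, {"B60", "B65"}, {"C07"}, {"F16"}]

def pair(a, b):
    """Unordered class pair, so (A61, A47) and (A47, A61) are counted together."""
    return tuple(sorted((a, b)))

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

ref_counts = Counter(pair(a, b) for a, b in REFERENCES if a != b)
cocl_counts = Counter(pair(a, b) for doc in DOCUMENTS
                      for a, b in combinations(sorted(doc), 2))

pairs = sorted(set(ref_counts) | set(cocl_counts))
x = [ref_counts[p] for p in pairs]    # references per class pair
y = [cocl_counts[p] for p in pairs]   # co-classified documents per class pair
print(pairs, x, y)
print(round(pearson(x, y), 3))        # 0.866 for these toy inputs
```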
Cocategorization vs. IPC references
When there are references in IPC, patent examiners take them seriously
– References mark overlapping subject matter areas and/or
– References propose the use of secondary (indexing) symbols
On class level, the frequency of references in IPC is similar to the frequency of common use of symbols of both classes in patent documents
[Chart: class-level comparison; labeled areas: A47, A61; B65; B60-B65; A61 vs C07 and C12; C07-C12; F16; G-H]
Differences in examiners’ practice
When there are references in IPC, patent examiners may assign symbols differently
– References mark overlapping subject matter areas
On class level, the frequency of references in IPC is similar to the differences between the selected first symbols (pre-reform practice, to simulate preclassification)
[Chart: class-level comparison; labeled areas: A47, A61; B65; B60-B65; A61 vs C07 and C12; C07-C12; F16; G-H]
Confusion of autocategorizers
When there are references in IPC, autocategorizers fail more frequently
– the “first symbol” may be selected differently
On class level, the frequency of references in IPC is similar to the differences between the first symbols selected by an autocategorizer (2002 data, to simulate preclassification)
[Chart: class-level comparison; labeled areas: A47, A61; B65; B60-B65; A61 vs C07 and C12; C07-C12; F16; G-H]
Conclusion
1. Reference statistics in IPC
2. Co-classification
3. Human classification differences
4. Preclassification autocategorization errors
show similar characteristics on higher levels of IPC
This may be even more important on lower levels, where the rules are more complex
Therefore, easier access to the rules may be welcome to both human and machine categorizers
Presentation of IPCInfo
An analysis and data preparation was performed as in-house research
– defining relevant relation types (about 15 main relations and ~20 further ones) (excerpts below)
– parsing the IPC scheme, definitions, catchwords and RCL
– building a relation graph in an RDBMS (>1.5 million relations)
The result is presented on a user interface and is convertible to RDF or OWL for further use
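A relation graph of this kind can be kept as plain triples in a relational table. The sketch below uses Python's built-in `sqlite3` module. The table layout, column names and relation type labels are assumptions for illustration, not the IPCInfo schema. The rows come from the reference and precedence samples in this deck. The transitive reference closure is computed with a recursive common table expression.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE relation (
                   source TEXT NOT NULL,   -- referring IPC symbol
                   target TEXT NOT NULL,   -- referred IPC symbol
                   type   TEXT NOT NULL,   -- hypothetical labels: 'reference', 'precedence'
                   PRIMARY KEY (source, target, type))""")
con.executemany("INSERT INTO relation VALUES (?, ?, ?)", [
    ("A01B 1/00", "A01G 3/06", "reference"),
    ("A01G 3/06", "A01D 43/16", "reference"),
    ("A01B 3/24", "A01B 3/04", "precedence"),   # A01B 3/04 takes precedence
])

# Every symbol reachable from A01B 1/00 through 'reference' relations.
# UNION (not UNION ALL) drops duplicates, so reference cycles terminate.
rows = con.execute("""
    WITH RECURSIVE reach(symbol) AS (
        SELECT target FROM relation WHERE source = ? AND type = 'reference'
        UNION
        SELECT r.target FROM relation AS r JOIN reach ON r.source = reach.symbol
        WHERE r.type = 'reference')
    SELECT symbol FROM reach ORDER BY symbol""", ("A01B 1/00",)).fetchall()
print([s for (s,) in rows])  # ['A01D 43/16', 'A01G 3/06']
```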
Patent taxonomy relations, samples
reference: (transitive!)
A01B 1/00 Hand tools (edge trimmers for lawns A01G 3/06)
A01G 3/06 Hand-held edge trimmers or shears for lawns (mowers
combined with lawn edgers A01D 43/16)
precedence: (over 600 transitive cases, e.g. A61M 3/00 → A61M 5/00 → A61M 36/00 [in definitions!])
A01B 3/24 Tractor-drawn ploughs (A01B 3/04 takes precedence)
A01B 3/04 Animal-drawn ploughs
limiting:
A01N PRESERVATION OF BODIES…; BIOCIDES, e.g. AS
DISINFECTANTS, AS PESTICIDES OR AS HERBICIDES; …
in Definitions for A01N subclass:
Fungicidal, bactericidal, insecticidal, disinfecting or antiseptic paper D21H
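One way to use precedence relations, once several categories have been judged applicable, is to drop every candidate that another candidate overrides, directly or transitively. The sketch below assumes that reading. The `PRECEDENCE` dictionary and the function names are hypothetical; only the A01B 3/24 entry comes from the sample above. Limiting references could probably be handled the same way.

```python
# Hypothetical precedence table: symbol -> symbols that take precedence over it.
# Only the A01B 3/24 entry is taken from the sample above.
PRECEDENCE = {"A01B 3/24": {"A01B 3/04"}}

def overriding(symbol, rules):
    """All symbols taking precedence over `symbol`, followed transitively."""
    seen, stack = set(), [symbol]
    while stack:
        for s in rules.get(stack.pop(), ()):
            if s not in seen:          # `seen` also protects against cycles
                seen.add(s)
                stack.append(s)
    return seen

def apply_precedence(candidates, rules):
    """Keep only the candidates that no other candidate overrides."""
    cands = set(candidates)
    return sorted(c for c in cands if not (overriding(c, rules) & (cands - {c})))

print(apply_precedence(["A01B 3/24", "A01B 3/04"], PRECEDENCE))  # ['A01B 3/04']
```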
Patent taxonomy relations, samples
indexing: guidance heading before A61K 101/00
Indexing scheme associated with group A61K 51/00, relating to the nature of the
radioactive substance
placerule: note before A01N 25/00, even specifying an exception…
In groups A01N 27/00-A01N 65/00, in the absence of an indication to the contrary, an
active ingredient is classified in the last appropriate place.
priorities (standardized sequence): for main groups in IPC where no place rule is applied
cooccurrence: e.g. in catchwords; here the text of IPC also mentions the reference
CONDITIONING harvested crops A01D 43/10, A01D 82/00
A01D 43/10 with means for crushing or bruising the mown crop
A01D 82/00 Crop conditioners, i.e. machines for crushing or bruising stalks (mowers
combined with means for crushing or bruising the mown crop A01D 43/10)
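Here is a sketch of the last place rule quoted above: among the groups judged appropriate inside A01N 27/00-A01N 65/00, keep the last one in scheme order. Plain string comparison does not give scheme order for IPC group symbols ('A01N 3/00' would sort after 'A01N 27/00'). The hypothetical `sort_key` therefore parses the symbol. It assumes the 'A01N 43/40' format and assumes that subgroup digits compare as a decimal fraction. The candidate list is made up, and the "indication to the contrary" exceptions are not modelled.

```python
import re
from decimal import Decimal

# Assumed symbol format: subclass, space, main group "/" subgroup, e.g. "A01N 43/40"
SYMBOL = re.compile(r"^([A-H]\d{2}[A-Z])\s*(\d+)/(\d+)$")

def sort_key(symbol):
    """Scheme-order key: main groups compare as integers, subgroup digits as a
    decimal fraction (so /005 < /013 < /02 < /10 < /102)."""
    m = SYMBOL.match(symbol)
    if m is None:
        raise ValueError(f"unexpected symbol: {symbol!r}")
    subclass, main, sub = m.groups()
    return (subclass, int(main), Decimal("0." + sub))

def last_place(appropriate, first="A01N 27/00", last="A01N 65/00"):
    """Last place rule: among the appropriate groups whose main group lies in
    [first, last], return the last one in scheme order (None if there is none)."""
    lo, hi = sort_key(first)[:2], sort_key(last)[:2]
    inside = [s for s in appropriate if lo <= sort_key(s)[:2] <= hi]
    return max(inside, key=sort_key, default=None)

# Hypothetical candidates judged appropriate for one active ingredient
print(last_place(["A01N 43/40", "A01N 37/18", "A01N 25/00"]))  # A01N 43/40
```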
Presentation of IPCInfo / 2
Thank you…
And keep reading if interested…
Formalization
With mathematical notations
Targeted at an audience not familiar with IPC
The ‘patent’ (auto)categorization task
Regular multiclass hierarchic categorization task
– Given a hierarchic taxonomy (a patent classification) with categories
– Given a set of training documents, each associated with multiple categories
…or…
an expert knowing both the state of the art of the field and the taxonomy
– For a document, provide a list of potential categories (preferably with relevance)
– Categorization level may be fixed (preclassification) or full
But…
The ‘patent’ (auto)categorization task, but…
Really a regular multiclass hierarchic categorization task?
– Taxonomy: text and definitions (manuals or handbooks) and revisions, and therefore:
known relations between categories (rules of classification, e.g. last place rule, takes precedence)
secondary and non-primary categories (indexing codes, ‘not used as first symbol’)
some categories excluded from ‘final’ categorization (top levels of the hierarchy) but required in preclassification (where secondary categories cannot be used)
– Documents
contain metadata (priorities, inventor, applicant)
various “fields” (title, abstract, description, claims)
some fields are subject to independent categorization (claims), while others may be used only globally (abstract, description)
– Changes: subject matter of a symbol, classification rules and procedures
provided categories may require revisions, since the taxonomy can be revised at regular intervals or immediately
e.g. there is no longer a ‘main classification symbol’
preclassification may help to reduce the scope but requires handling failures
Notations: Hierarchic taxonomy
Taxonomy: T
Category: C, supercategory: ⊗ ∉ C
Parent function: p: C → C ∪ {⊗}, describing a tree rooted at ⊗
Ancestors: p+: C → C+ ∪ {⊗}, transitive closure of p
Child function (subcategories): c: C → C* = p⁻¹
Descendants: c+: C → C*, transitive closure of c
Roots of taxonomy (‘sections’): C⊗ ⊂ C, C⊗ = {r ∈ C | p(r) = ⊗}
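The notation maps directly onto dictionaries. In this minimal sketch `None` plays the role of the supercategory ⊗. The miniature taxonomy is a hypothetical excerpt, and ⊗ itself is left out of the ancestor list.

```python
# Hypothetical miniature excerpt of a taxonomy: category -> parent.
# None stands for the supercategory ⊗ (the parent of every root).
PARENT = {"A": None, "B": None, "A01": "A", "A01B": "A01", "B24": "B", "B24B": "B24"}

def p(c):
    """Parent function p: C -> C ∪ {⊗}."""
    return PARENT[c]

def ancestors(c):
    """p+: transitive closure of p, nearest ancestor first (⊗ omitted)."""
    out = []
    while (c := p(c)) is not None:
        out.append(c)
    return out

def children(c):
    """Child function c = p⁻¹; children(None) yields the roots."""
    return sorted(k for k, v in PARENT.items() if v == c)

def descendants(c):
    """c+: transitive closure of the child function (depth first)."""
    out = []
    for k in children(c):
        out.append(k)
        out.extend(descendants(k))
    return out

ROOTS = children(None)   # C⊗ = {r ∈ C | p(r) = ⊗}
print(ancestors("A01B"), descendants("A"), ROOTS)
```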
Notations: Patent taxonomy
Level of category: L, l: C → L (e.g. ‘subclass’)
Classifying category levels: Lc ⊂ L
Classifying categories: Cc ⊂ C, Cc = { c ∈ C | l(c) ∈ Lc }
Non-classifying categories: Cc‾ ⊂ C, Cc‾ = C ∖ Cc
Category symbol: s: C ↔ $ ($ stands for string)
Category sort relation: c1 < c2 ⇔ s(c1) < s(c2); min and max are also applicable to C+
Category interval: [f,t] = {c ∈ C | f ≤ c ∧ c ≤ t}
Usually, descendants form a contiguous interval, i.e.
∀ a ∈ C with c+(a) ≠ ∅, ∀ d ∈ C: d ∈ [min(c+(a)), max(c+(a))] ⇔ d ∈ c+(a)
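The level, classifying-category and interval definitions, together with the contiguity property, can be checked on a toy taxonomy. The excerpt and the choice of ‘subclass’ as the only classifying level are assumptions made for this sketch. All symbols here have the same shape, so plain string comparison (the s(c1) < s(c2) sort relation) matches scheme order.

```python
# Hypothetical excerpt: symbol -> (parent, level); None stands for ⊗.
TAXONOMY = {
    "A":    (None,  "section"),
    "A01":  ("A",   "class"),
    "A01B": ("A01", "subclass"),
    "A01G": ("A01", "subclass"),
    "A47":  ("A",   "class"),
    "A47J": ("A47", "subclass"),
}
C = sorted(TAXONOMY)                 # c1 < c2 iff s(c1) < s(c2)
L_C = {"subclass"}                   # assumed classifying level(s) Lc, for this toy only
C_c = {c for c in C if TAXONOMY[c][1] in L_C}   # classifying categories Cc
C_nc = set(C) - C_c                  # non-classifying categories C ∖ Cc

def interval(f, t):
    """[f, t] = {c ∈ C | f ≤ c ∧ c ≤ t}, returned in sort order."""
    return [c for c in C if f <= c <= t]

def descendants(a):
    """c+(a): all categories below a."""
    kids = [c for c in C if TAXONOMY[c][0] == a]
    return kids + [d for k in kids for d in descendants(k)]

def is_contiguous(a):
    """True if c+(a) is empty or exactly the interval [min(c+(a)), max(c+(a))]."""
    d = descendants(a)
    return not d or set(d) == set(interval(min(d), max(d)))

print(sorted(C_c), interval("A01", "A01G"), all(is_contiguous(a) for a in C))
```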
Notations: category relations
Relation types: R ⊂ (C → (℘(C) ∪ {⊗}))
All relations in a taxonomy: TR ⊂ C × C × R
All relations for a category: r∀: C → (R × C)*
Obvious relation types in hierarchies:
{ parent, child, ancestor, descendant } ⊂ R
defined as parent ≈ p, child ≈ c etc.
Further obvious relation: sibling, i.e. another child of the same parent:
sibling(c) = {s ∈ C⊗ | s ≠ c}, if c ∈ C⊗
sibling(c) = {s ∈ c(p(c)) | s ≠ c}, if c ∈ C ∖ C⊗
Interval and set relations: union of the single-category form, e.g.
descendant({c1,[c2,c3]})
result abbreviated as an interval or set:
descendant(a) = [min(c+(a)),max(c+(a))]
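The sibling relation, and the lifting of a single-category relation to sets and intervals, can be sketched as follows. Intervals are written as `(from, to)` tuples. The helper names and the miniature taxonomy are hypothetical. Because `None` stands for ⊗ and the children of ⊗ are the roots, one expression covers both cases of the sibling definition.

```python
# Hypothetical miniature taxonomy: category -> parent (None stands for ⊗)
PARENT = {"A": None, "B": None, "A01": "A", "A47": "A", "A61": "A", "B65": "B"}
ORDER = sorted(PARENT)   # category order by symbol

def children(c):
    """c(x) = p⁻¹(x); children(None) are the roots C⊗."""
    return {k for k, v in PARENT.items() if v == c}

def sibling(c):
    """Other roots for a root, otherwise the other children of p(c)."""
    return children(PARENT[c]) - {c}

def expand(arg):
    """Expand a mix of single categories and (from, to) intervals into a set."""
    out = set()
    for item in arg:
        if isinstance(item, tuple):
            f, t = item
            out |= {c for c in ORDER if f <= c <= t}
        else:
            out.add(item)
    return out

def lift(relation, arg):
    """Set/interval form of a relation: union of its single-category results."""
    return set().union(*(relation(c) for c in expand(arg)))

print(sorted(sibling("A47")), sorted(lift(sibling, ["B65", ("A01", "A47")])))
```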
Patent taxonomy relations on a single version
Invertible relations
– Simple reference: a category ‘refers’ to another
– ‘Takes precedence’ reference
– Limiting references, very similar to precedence
– Allowed indexing symbols on an interval
Precedence relations on siblings
– placerule: first place rule or last place rule
– priority: siblings prioritized by ‘standardized sequence’
Cooccurrence of references (commutative)
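Invertibility means every stored relation can also be read from its target. This hypothetical sketch builds r∀ (all relations for a category) from triples. It adds an inverse entry for invertible types and a symmetric entry for the commutative cooccurrence. The inverse type names are made up for illustration; the triples come from the samples in this deck.

```python
from collections import defaultdict

# Hypothetical names for the inverse direction of invertible relation types
INVERSE = {"reference": "referenced_by", "precedence": "precedes", "limiting": "limits"}

# (source, target, type) triples taken from the samples in this deck.
# ("A01B 3/24", "A01B 3/04", "precedence") reads: A01B 3/04 takes precedence over A01B 3/24.
TRIPLES = [
    ("A01B 1/00", "A01G 3/06", "reference"),
    ("A01B 3/24", "A01B 3/04", "precedence"),
    ("A01D 43/10", "A01D 82/00", "cooccurrence"),
]

def all_relations(triples):
    """r∀: category -> list of (relation type, related category), inverses included."""
    rel = defaultdict(list)
    for source, target, kind in triples:
        rel[source].append((kind, target))
        if kind in INVERSE:                # invertible: store the reverse reading
            rel[target].append((INVERSE[kind], source))
        elif kind == "cooccurrence":       # commutative: same type in both directions
            rel[target].append((kind, source))
    return rel

rel = all_relations(TRIPLES)
print(rel["A01B 3/04"])   # [('precedes', 'A01B 3/24')]
```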
Patent taxonomy relations, samples
reference: (may refer further!)
A01B 1/00 Hand tools (edge trimmers for lawns A01G 3/06)
A01G 3/06 Hand-held edge trimmers or shears for lawns (mowers
combined with lawn edgers A01D 43/16)
precedence:
A01B 3/24 Tractor-drawn ploughs (A01B 3/04 takes precedence)
A01B 3/04 Animal-drawn ploughs
limiting:
A01N PRESERVATION OF BODIES…; BIOCIDES, e.g. AS
DISINFECTANTS, AS PESTICIDES OR AS HERBICIDES; …
in Definitions for A01N subclass:
Fungicidal, bactericidal, insecticidal, disinfecting or antiseptic paper D21H
Patent taxonomy relations, samples
indexing: guidance heading before A61K 101/00
Indexing scheme associated with group A61K 51/00, relating to the nature of the
radioactive substance
placerule: note before A01N 25/00, even specifying an exception…
In groups A01N 27/00-A01N 65/00, in the absence of an indication to the contrary, an
active ingredient is classified in the last appropriate place.
priorities (standardized sequence): main groups in IPC where no place rule is applied
cooccurrence: in catchwords; here the text of IPC also mentions the reference
CONDITIONING harvested crops A01D 43/10, A01D 82/00
A01D 43/10 with means for crushing or bruising the mown crop
A01D 82/00 Crop conditioners, i.e. machines for crushing or bruising stalks (mowers
combined with means for crushing or bruising the mown crop A01D 43/10)
Patent taxonomy relations, multiple versions
Patent taxonomies change over time
A former category (or a set of categories) may be
– transferred to a single new category or a set of new categories, or it is recognized that its subject matter is
– covered by a single existing category or a set of existing categories
In the newer version, all the categories associated with a single former category or a set of former categories are in concordance relation with it
The concordance relation may be computed by transitively traversing category changes over multiple versions
Patent taxonomy relations: concordance relation sample
2011: B24B 49/00
Measuring or gauging equipment for controlling the feed movement of the
grinding tool or work; Arrangements of indicating or measuring equipment,
e.g. for indicating the start of the grinding operation
2012: B24B 49/00 → B24B 37/005 - 37/015, B24B 49/00
B24B 37/005 . Control means for lapping machines or devices
B24B 37/013 . . Devices or means for detecting lapping completion
B24B 37/015 . . Temperature control
B24B 49/00 Measuring or gauging equipment for controlling the feed
movement of the grinding tool or work; Arrangements of indicating or
measuring equipment, e.g. for indicating the start of the grinding operation
(B24B 33/06, B24B 37/005 takes precedence; if applicable to other machine
tools, B23Q 15/00-B23Q 17/00 take precedence)
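The transitive computation of the concordance relation can be sketched as repeated mapping through one change table per revision. The first step encodes the B24B 49/00 sample above, with the interval expanded to the three listed groups. The second step and the symbol `X99X 1/00` are hypothetical placeholders. Symbols missing from a step are assumed unchanged.

```python
# One change table per revision step: former symbol -> symbols covering its
# subject matter in the next version. Symbols not listed are assumed unchanged.
CHANGES = [
    # 2011 -> 2012: the B24B 49/00 sample, interval 37/005-37/015 expanded
    {"B24B 49/00": ["B24B 37/005", "B24B 37/013", "B24B 37/015", "B24B 49/00"]},
    # hypothetical later step; "X99X 1/00" is a made-up placeholder symbol
    {"B24B 37/015": ["B24B 37/015", "X99X 1/00"]},
]

def concordance(symbols, changes):
    """Transitively follow category changes over all given revision steps."""
    current = set(symbols)
    for step in changes:
        current = {new for old in current for new in step.get(old, [old])}
    return sorted(current)

print(concordance(["B24B 49/00"], CHANGES[:1]))
print(concordance(["B24B 49/00"], CHANGES))
```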
Effect of relations on categorization
A weighted directed graph can be built between categories
Whenever an ‘oracle’ (e.g. a flat categorizer, a fielded search etc.) proposes a category, related categories must be evaluated and verified, possibly in a given order and also considering the weights
Training may also benefit from knowing in advance
– the order of evaluation, e.g. standardized sequences, priority rules
– relations:
to enhance a good hit or suppress a false hit
or to co-classify
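Here is a minimal sketch of the idea. Oracle scores are propagated one step along weighted relation edges. A referenced category gains support, while a category overridden by a ‘takes precedence’ relation is suppressed. The edge weights, the threshold and the proposal scores are invented for illustration. Evaluation order (standardized sequences, priority rules) is not modelled.

```python
from collections import defaultdict

# Hypothetical weighted relation edges: (from, to) -> weight.
# Positive weights support a related category; negative weights suppress it.
EDGES = {
    ("A01G 3/06", "A01D 43/16"): 0.3,    # reference, as in the A01G 3/06 sample
    ("A01B 3/04", "A01B 3/24"): -0.5,    # A01B 3/04 takes precedence over A01B 3/24
}

def rerank(proposals, edges, threshold=0.5):
    """One step of weighted propagation from the oracle's proposals,
    then ranking and cutting at a threshold."""
    scores = defaultdict(float, proposals)
    for (src, dst), weight in edges.items():
        if src in proposals:                 # only proposed categories pass support on
            scores[dst] += weight * proposals[src]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(c, round(s, 3)) for c, s in ranked if s >= threshold]

# Hypothetical oracle output (e.g. from a flat categorizer) for one document
proposals = {"A01B 3/24": 0.7, "A01B 3/04": 0.6, "A01G 3/06": 0.8}
print(rerank(proposals, EDGES))   # [('A01G 3/06', 0.8), ('A01B 3/04', 0.6)]
```

In this toy run, A01B 3/24 drops below the threshold because A01B 3/04, which takes precedence over it, was also proposed.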