Overcoming language barriers in patent information search

Sep. 2010, Geneva
Daeshik Jeh
Director General, Information Policy Bureau
Korean Intellectual Property Office (KIPO)
Contents
1. Introduction
2. KIPO’s Activities
3. Global Efforts
4. Conclusion
1. Introduction
Background
Convertibility of information based on automatic
translation or interpretation may shake up everything
from employment and the organization of the office,
to the role of literacy in daily life…
- Power Shift by Alvin Toffler
As the world continues to come together in forms such as the UN,
WTO, WIPO, EU, BRICs, NAFTA, and APEC, it has become
increasingly important to exchange, convert and analyze
information across various languages.
The EU secretariat has approximately 4,000 translators
and interpreters on its payroll, which consumed around
800 million Euros in 2006. This translates to 1% of its
total budget and 40% of its administrative budget.
In spite of all this effort, difficulties remain in multilingual translation,
particularly when a translation passes through an intermediate language
(e.g., Finnish → English → Hungarian).
* Source: EU website
Necessities – Patent examination
[Figure: Patent applications, 2001–2007. Source: WIPO website]
- Number of patent applications: a 26% increase from 2001 to 2007
- Patent applications by non-residents: continuously increasing; reached 43.3% of the total number of applications filed in 2007

Year    Total applications    By non-residents    Non-resident share    Resident share
2001    1,460,536             613,379             42%                   58%
2003    1,491,494             621,294             41.7%                 58.3%
2005    1,701,179             725,506             42.6%                 57.4%
2007    1,854,416             802,853             43.3%                 56.7%
[Figure: PCT applications, 2001–2007. Source: WIPO website. Country groups: English: US, EP, GB, CA, AU; Non-English: JP, KR, CN, DE, RU. Percentage labels on the chart: 55.2%, 52.5%, 56.0%, 47.3%]
- PCT applications: a 48% increase from 2001 to 2007
- PCT applications in non-native English speaking countries: gradually increasing
- The PCT’s publication languages now include English, French, German, Japanese, Russian, Chinese, Spanish, Arabic, Korean, and Portuguese

Year    PCT applications
2001    108,236
2003    115,206
2005    136,753
2007    159,953
Consequently, patent examiners now need to cite and refer to foreign
documents as much as domestic ones.
Necessities – R&D
As technologies develop, they become globalized, reaching beyond an
enterprise’s home country and the conventional characteristics of any one region.
To improve R&D projects:
- Make prior art searches of patent databases a mandatory part of the
planning and evaluation of R&D projects.
- Use patent information widely in R&D activities. The recent advent of
“Open Innovation” has made it more necessary than ever to refer to
foreign patent information.
How to overcome language barriers
Study the language of the target country
- Merits: faster, high-quality prior art searches
- Demerits: it takes a long time to learn and become fluent in a foreign language
Hire multilingual search personnel
- Merits: more understandable translations and flexible management of human resources
- Demerits: poor prior art searches when such personnel lack expert knowledge
Use a machine translation system (fast and cost-effective)
- Merits: many translations in a short time
- Demerits: low-quality translations; a large initial investment is required
Commercial Machine Translation Services
Many commercial MT services, including Google, are available to the public.
They offer diverse services such as web page translation and translation toolbars.
Google
- Languages supported: 57
- Service: free online service; translation of web pages; cross-lingual retrieval system; Google toolbar and translator toolkit
- Remark: statistics-based translation service; convenient user feedback
Yahoo BABEL FISH
- Languages supported: 12
- Service: free online service (max. 150 words); translation of web pages; Yahoo toolbar
- Remark: technologies offered by SYSTRAN; based on English and French
SYSTRAN
- Languages supported: 52
- Service: fee-based service; translation service for use in multinational corporations
- Remark: available at the USPTO and web portals such as Yahoo, Lycos, and Altavista; based on English and French
WorldLingo
- Languages supported: 33
- Service: free online translation services; fee-based services for web sites and a translation API
- Remark: available at the EPO; machine translation services for enterprises including Microsoft
Use of Commercial Machine Translation Services
Merits
- Since commercial MT services are continuously being extended to cover more
languages, almost all patent documents in the world can be translated through
them.
- Many free services are available to the public.
- As they cover general sentences, they can be applied to both patent and non-patent literature.
Demerits
- Commercial MT services are inconvenient for prior art searches: search queries
are hard to edit, and queries and results have to be copied and pasted one by one.
- Since commercial services are designed to support broad areas, they may be
inefficient for a specialized area like patents.
Many IPOs, including KIPO, EPO, and JPO, use either customized
commercial translation engines or engines developed in-house.
Machine Translation Service Status of Some Major Countries in Asia
Patent specific MT services targeting non-native English speaking countries
such as China, Japan, and Korea
KIPO and JPO have customized commercial translation engines, while SIPO’s was developed
in-house.
KIPO
- MT provider: Sirius (commercial service provider)
- Languages supported: Korean ↔ English; Japanese → Korean
- Services:
  - K-PION: Korean patent and utility model gazettes and examination information in English
  - KOMPASS: English/Japanese documents in Korean, targeting KIPO examiners
  - KIPRIS: overseas documents for the Korean public (English/Japanese into Korean)
JPO
- MT provider: Toshiba (commercial service provider)
- Languages supported: Japanese ↔ English; Japanese ↔ Chinese
- Services:
  - AIPN: Japanese patent information in English, targeting overseas examiners
  - IPDL: Japanese patent information in English for the public
SIPO
- MT provider: Chinese Patent Information Center
- Languages supported: Chinese → English
- Services:
  - CPMT (China Patent Machine Translation): free public service for translating specifications and claims of gazettes into English
2. KIPO’s Activities
2.1 MT Services
2.2 Patent Information Search
2. KIPO’s Activities – MT Services
Status of KIPO’s MT Services
J2K (Japanese → Korean) Translation Service
- Launched in 2000
- Patent literature (PL) and non-patent literature (NPL) written in Japanese, for KIPO’s examiners
- PL written in Japanese, for the general public
K2E (Korean → English) Translation Service
- Launched in 2005
- Korean patent documents, for examiners of foreign IPOs and KIPO
- K-PION service: 37 IPOs
E2K (English → Korean) Translation Service
- Launched in 2008
- PL/NPL written in English, for KIPO’s examiners
- PL written in English, for the general public
Specialized Machine Translation Services for Patent Documents
To improve the quality of machine translation engines, the following
issues have been considered:
Linguistic features
- Word order: Korean and Japanese share the same word order (Subject +
Object + Verb phrase), while Chinese and English use Subject + Verb
phrase + Object.
- Scripts: English, German, and French are written in the Latin alphabet,
while Korean, Japanese, and Chinese each have their own characters.
Digitization of patent documents
- The accuracy of digitizing patent documents through OCR greatly influences the
quality of machine translations.
Building of a patent-specific terminology dictionary
Dictionary size by service type (slide columns: ~2007, 2008, 2009, Total):
- K2E: 3,200,000 and 300,000; total 3,500,000
- E2K: 3,000,000 and 300,000; total 3,300,000
- J2K: 1,200,000 and 300,000; total 1,500,000
Use of markup documents such as XML
- e.g., KIPO has published patent gazettes in XML since February 2005.
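One way a patent-specific terminology dictionary can be applied is to look up multi-word technical terms before general translation, preferring the longest match. The sketch below only illustrates that idea: the dictionary entries, the function name `apply_term_dictionary`, and the longest-match strategy are assumptions for illustration, not a description of KIPO’s engines.

```python
# Hypothetical patent-term dictionary (English -> Korean); entries are illustrative only.
TERM_DICT = {
    "prior art": "선행기술",
    "utility model": "실용신안",
    "claim": "청구항",
}

def apply_term_dictionary(tokens, term_dict, max_len=3):
    """Replace dictionary terms in a token list, preferring the longest match."""
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in term_dict:
                out.append(term_dict[phrase])
                i += n
                matched = True
                break
        if not matched:
            out.append(tokens[i])  # leave unknown tokens for the general engine
            i += 1
    return out

result = apply_term_dictionary("the prior art utility model".split(), TERM_DICT)
```

Matching longer phrases first keeps a term such as "utility model" from being translated word by word.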
Methods of improving translation quality
Features of patent documents (Korean Patent Gazette)
- Abstract: usually a single long sentence, and thus highly prone to errors
when machine translated.
- Specification: the brief explanation of the drawings is written in simple
sentences, and the other parts in general descriptive sentences.
- Claims: a hierarchical tree structure made of independent and dependent
claims, written as noun phrases.
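The tree structure of independent and dependent claims can be recovered from the claim texts themselves. The following is a minimal sketch, assuming English claim texts in which a dependent claim names exactly one parent as "claim N"; the sample claims and the function name `build_claim_tree` are made up for illustration.

```python
import re
from collections import defaultdict

# Illustrative English claim texts; a real gazette would supply these.
CLAIMS = {
    1: "A method of treating waste leather, comprising dissolving it in a solvent.",
    2: "The method of claim 1, wherein the solvent is heated.",
    3: "The method of claim 2, wherein the heating lasts one hour.",
    4: "An apparatus for treating waste leather.",
}

# Assumed pattern: a dependent claim refers to its parent as "claim N".
PARENT_RE = re.compile(r"\bclaim (\d+)\b", re.IGNORECASE)

def build_claim_tree(claims):
    """Return (independent claim numbers, parent -> children mapping)."""
    children = defaultdict(list)
    roots = []
    for number in sorted(claims):
        match = PARENT_RE.search(claims[number])
        if match:
            children[int(match.group(1))].append(number)
        else:
            roots.append(number)  # no parent reference: independent claim
    return roots, dict(children)

roots, children = build_claim_tree(CLAIMS)
```

Multiple dependencies such as "claims 1 to 3" are not handled here; they would need a richer pattern.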
In XML documents, the tags identify the different sections described above:
- Name
- Abstract, Summary
- Description, Drawings
- Claims
- Others
Different translation protocols are applied depending on the tag information of the
patent gazette.
Example – Korean Patent Gazette
Flow: XML of Korean Patent Gazette → Application Server → K2E Translation Server
1. Analyze XML tag information
2. Select the appropriate translation protocol
3. Translate
Request types and sample translations:
- REQ_HNM_KE: 오은영 → Oh Eun Young
- REQ_KE: 본 발명은… → This invention…
- REQ_ABS_KE: 본 발명은… → This invention…
- REQ_DRDES_KE: 도1은 본 발명에.. → Drawing 1 is a…
- REQ_CLAIM_KE: 폐피혁을 용매에.. → Methodology of…
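The first two steps (reading the tags, then choosing a translation protocol) can be sketched as a simple lookup. In the sketch below the request-type names come from the slide, but the XML element names, the tag-to-request mapping, and the use of REQ_KE as a default for general text are assumptions for illustration; the actual gazette schema and server interface are not shown in the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical gazette fragment; element names are assumptions for illustration.
GAZETTE_XML = """
<gazette>
  <inventor>오은영</inventor>
  <abstract>본 발명은…</abstract>
  <drawing-description>도1은 본 발명에..</drawing-description>
  <claim>폐피혁을 용매에..</claim>
  <description>본 발명은…</description>
</gazette>
"""

# Assumed mapping from XML tag to the request type named on the slide.
TAG_TO_REQUEST = {
    "inventor": "REQ_HNM_KE",
    "abstract": "REQ_ABS_KE",
    "drawing-description": "REQ_DRDES_KE",
    "claim": "REQ_CLAIM_KE",
}
DEFAULT_REQUEST = "REQ_KE"  # assumed fallback for general sentences

def plan_requests(xml_text):
    """Steps 1 and 2: read each section's tag and pick a request type for it."""
    root = ET.fromstring(xml_text)
    return [(TAG_TO_REQUEST.get(el.tag, DEFAULT_REQUEST), el.text) for el in root]

requests = plan_requests(GAZETTE_XML)
```

In a real system, step 3 would send each (request type, text) pair to the translation server; that call is left out because its interface is not described.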
Applicability to Patent Documents Produced by Other IPOs
Each item follows a consistent pattern at the IPOs considered (EPO, USPTO, JPO, SIPO).
Abstract and Description patterns:
- Abstract: short sentences of less than 150 words. Description: low possibility of errors when translated, since it consists of short sentences and general statements.
- Abstract: summarized in less than 400 words. Description: “Brief description of drawings” is written in short sentences; the entire “Description” consists of general statements.
- Abstract: a concise statement in a single sentence, or described respectively. Description: “Brief description of drawings” is written in short noun phrases without commas or periods; the other parts of the “Description” are written in general statements.
Claims: a tree structure of independent and dependent claims, written in noun phrases or clauses.
These patterns can be distinguished in markup documents such as XML.
2. KIPO’s Activities – Patent Information Search
Patent Information Search using MT engines
To use MT engines for patent information search, the following issues have
been considered:
- Target users and objectives of MT services: internal examiners or foreign examiners
- Building of a database: original documents or machine-translated documents
  (a) Users → Machine Translator → DB (original docs.)
  (b) Users → DB (machine-translated docs.) ← Machine Translator ← DB (original docs.)
  * In terms of cost-benefit analysis, (a) is better when foreign documents are used
  infrequently, while (b) is better when they are used frequently.
- Formulation of search queries (e.g., operators, terminology dictionary)
- Screen layout / organization
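The cost-benefit trade-off between translating on demand from a database of original documents and building a machine-translated database in advance can be made concrete with a toy model. This is a sketch under stated assumptions: the cost formulas, parameter names, and figures below are invented for illustration and do not come from KIPO.

```python
def on_demand_cost(views_per_year, cost_per_translation, years):
    """Translate a document each time a user views it."""
    return views_per_year * cost_per_translation * years

def pretranslated_cost(corpus_size, cost_per_translation, storage_per_year, years):
    """Translate the whole collection once, then pay for storing the translations."""
    return corpus_size * cost_per_translation + storage_per_year * years

def cheaper_option(views_per_year, corpus_size, cost_per_translation,
                   storage_per_year, years):
    """Name the cheaper approach under this simplified model."""
    a = on_demand_cost(views_per_year, cost_per_translation, years)
    b = pretranslated_cost(corpus_size, cost_per_translation, storage_per_year, years)
    return "on-demand" if a <= b else "pre-translated"

# Made-up figures: a 1,000,000-document collection over 5 years.
low_use = cheaper_option(10_000, 1_000_000, 1.0, 5_000.0, 5)
high_use = cheaper_option(500_000, 1_000_000, 1.0, 5_000.0, 5)
```

The model ignores response time, which can matter as much as cost when many users search at once.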
KOMPASS (Korean Multifunctional Patent Search System)
KOMPASS targets KIPO examiners and supports patent information searches
in English and Japanese.
It provides separate integrated search functions for Korean, English, and Japanese:
- Korean integrated search covers Korean and Japanese documents
(for Japanese documents, the database is built from machine-translated documents).
- English integrated search covers all kinds of data retrieved from
English documents; the search results can be translated into Korean.
- Japanese integrated search covers all kinds of data retrieved from
Japanese documents; the search results can be translated into Korean (only
for patents and utility models).
Earlier configuration: Users → Machine Translator → DB (original docs.)
- Japanese gazettes were previously searchable through machine translation.
- Due to the rapid increase in use by KIPO examiners, the search speed
was getting slower.
Configuration since 2009: Users → DB (machine-translated docs.) ← Machine Translator ← DB (original docs.)
- In 2009, for faster search, all the Japanese gazettes were machine-translated
and used to build a database.
- KIPO examiners’ convenience has been greatly improved.
Korean search: Korean keyword search of Japanese documents (using the J2K database)
K-PION (Korean Patent Information Online Network)
K-PION is a free search service that helps foreign examiners better
understand Korean patent information (examinations, gazettes, etc.).
It also supports an English keyword search service.
Services:
- Retrieval of Korean patent and utility model gazettes and examination
information from original and machine-translated documents
- An English keyword search service for Korean Patent Abstracts (KPAs)
- A service for Korean industrial designs and trademarks, including PCT-related documents
- An English keyword search service for Korean patent and utility model gazettes
K-PION patent information retrieval flow (for applicants and foreign examiners):
1. Input English keywords
2. Keywords are automatically translated into Korean keywords
3. Keywords are extended to Korean synonyms
4. Search Korean gazettes
5. Translate search results into English
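The keyword part of this retrieval flow (English keywords translated into Korean, extended to Korean synonyms, then matched against Korean gazettes) can be sketched with toy data. The dictionaries, the three sample documents, and the function names below are assumptions for illustration; K-PION’s actual components are not described in the source.

```python
# Illustrative dictionaries; a real service would use a full MT engine and thesaurus.
EN_TO_KO = {"leather": "피혁", "solvent": "용매"}
KO_SYNONYMS = {"피혁": ["가죽"], "용매": ["솔벤트"]}

# Tiny stand-in for the Korean gazette collection.
GAZETTES = {
    "KR-A": "폐피혁을 용매에 녹이는 방법",
    "KR-B": "가죽 가공 장치",
    "KR-C": "반도체 소자 제조 방법",
}

def expand_query(english_keywords):
    """Translate each English keyword into Korean and add Korean synonyms."""
    groups = []
    for word in english_keywords:
        korean = EN_TO_KO.get(word.lower())
        if korean is None:
            continue  # no translation available; skipped in this sketch
        groups.append([korean] + KO_SYNONYMS.get(korean, []))
    return groups

def search(groups, documents):
    """Return ids of documents containing at least one term from every group."""
    return sorted(
        doc_id for doc_id, text in documents.items()
        if all(any(term in text for term in group) for group in groups)
    )

groups = expand_query(["leather"])
hits = search(groups, GAZETTES)
```

Synonym expansion is what lets the second document match even though it does not contain the direct translation. Translating the hits back into English, the last step of the flow, is omitted here.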
3. Global Efforts
3.1 IP5 Foundation Project on Mutual Machine Translation
3.2 Cross-Lingual Information Retrieval
IP5 Foundation Project on Mutual Machine Translation
The IP5 offices will improve the quality of their machine translation (MT) services
and harmonize these services among themselves.
This will be achieved by:
Improving the quality of MT
- Joint quality review of non-English to English MT by the English-speaking
offices
- MT system upgrades based on the quality review results
- Reduction of errors in original documents
Harmonizing MT services
- Harmonization of the contents of MT services
Regarding searches, this project will help each office better
understand the prior art documents of the other offices and use them in
citations.
WIPO’s CLIR (Cross-Lingual Information Retrieval)
CLIR has been newly added to PATENTSCOPE, and the beta version is
currently in public testing.
When searching PCT and national application data, the keywords entered can be
extended into other languages such as English, French, German, Japanese,
and Spanish.
- Linked to the Google translation service; search results are available in all the
languages it supports.
- Covers over 1.7 million published international patent applications (PCT),
and more than 3 million documents when patent documents from regional and
national collections are included.
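Extending a keyword into other languages amounts to building one query out of several language variants. The sketch below assumes a small hand-made translation table and a generic boolean OR syntax; neither reflects PATENTSCOPE’s actual implementation or query language.

```python
# Hypothetical translation table; a real CLIR service would derive variants automatically.
TRANSLATIONS = {
    "battery": {"fr": "batterie", "de": "Batterie", "es": "batería", "ja": "電池"},
}

def expand_keyword(keyword, languages):
    """Return the keyword plus its known variants in the requested languages."""
    variants = TRANSLATIONS.get(keyword.lower(), {})
    terms = [keyword]
    for lang in languages:
        if lang in variants:
            terms.append(variants[lang])
    return terms

def to_boolean_query(terms):
    """Join the variants with OR; the syntax is generic, not PATENTSCOPE's own."""
    return " OR ".join(f'"{t}"' for t in terms)

query = to_boolean_query(expand_keyword("battery", ["fr", "de", "ja"]))
```

A keyword with no entry in the table is passed through unchanged, so the search still runs in the original language.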
4. Conclusion
Considering the tremendous amount of global patent information, machine
translation services will be the most practical and efficient way to search patent
information of other IPOs.
There are many ways to implement a patent search system using an MT engine. In
selecting a specific methodology, each IPO should consider the frequency of use,
budget, and linguistic features.
To improve the performance of MT and search systems, each IPO may consider
options such as building a machine-translated database, compiling a patent-specific
terminology dictionary, and adopting markup technologies such as XML.
International cooperation among IPOs is very important for the improvement of MT
quality. KIPO has done its utmost in order to overcome language barriers and
enable non-Korean speakers to better access Korean patent information. KIPO will
continue to collaborate with other IPOs in this regard.
E-mail: daeshik@kipo.go.kr