Overcoming language barriers in patent information search
Sep. 2010, Geneva

Daeshik Jeh
Director General, Information Policy Bureau
Korean Intellectual Property Office (KIPO)

Contents
1 Introduction
2 KIPO's Activities
3 Global Efforts
4 Conclusion

1. Introduction

Background

"Convertibility of information based on automatic translation or interpretation may shake up everything from employment and the organization of the office, to the role of literacy in daily life…"
- Alvin Toffler, Power Shift

As the world continues to come together in forms such as the UN, WTO, WIPO, EU, BRICs, NAFTA and APEC, it has become increasingly important to exchange, convert and analyze information across languages.

The EU secretariat has approximately 4,000 translators and interpreters on its payroll, who cost around 800 million euros in 2006. This amounts to 1% of the EU's total budget and 40% of its administrative budget. In spite of all this effort, multilingual translation remains difficult, for example when a text has to be relayed through an intermediate language (Finnish → English → Hungarian). (Source: EU website)

Necessities – Patent examination

- Patent applications: an increase of about 27% from 2001 to 2007.
- The number of applications by non-residents has increased continuously, reaching 43.3% of all applications filed in 2007.

Patent applications worldwide, total and by non-residents (source: WIPO website):

Year | Total applications | By non-residents | Non-resident share
2001 | 1,460,536 | 613,379 | 42%
2003 | 1,491,494 | 621,294 | 41.7%
2005 | 1,701,179 | 725,506 | 42.6%
2007 | 1,854,416 | 802,853 | 43.3%
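The growth rate and non-resident shares quoted for patent applications can be rechecked directly from the chart values. The short Python sketch below does that arithmetic; the year-to-value assignment is read from the chart, and the variable and function names are illustrative only.

```python
# Worldwide patent application counts read from the WIPO-sourced chart.
total = {2001: 1_460_536, 2003: 1_491_494, 2005: 1_701_179, 2007: 1_854_416}
non_resident = {2001: 613_379, 2003: 621_294, 2005: 725_506, 2007: 802_853}


def pct_change(old: int, new: int) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


growth = pct_change(total[2001], total[2007])
shares = {year: non_resident[year] / total[year] * 100 for year in total}

print(f"Growth 2001-2007: {growth:.1f}%")
for year, share in shares.items():
    print(f"{year}: non-resident share {share:.1f}%")
```

The computed growth is just under 27%, and the yearly shares round to the 42%, 41.7%, 42.6% and 43.3% shown on the chart.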
Necessities – Patent examination (continued)

- PCT applications: an increase of about 48% from 2001 to 2007 (108,236 in 2001; 115,206 in 2003; 136,753 in 2005; 159,953 in 2007). (Source: WIPO website)
- PCT applications from non-native-English-speaking countries have been gradually increasing. (The chart compares English-speaking origins, US, EP, GB, CA and AU, with non-English origins, JP, KR, CN, DE and RU.)
- The PCT's languages of publication now comprise English, French, German, Japanese, Russian, Chinese, Spanish, Arabic, Korean and Portuguese.

Consequently, patent examiners now need to cite and consult foreign documents as much as domestic ones.

Necessities – R&D

As technologies develop and improve, they become globalized beyond a company's nationality and beyond conventional regional boundaries.

- Improving R&D projects: prior art searches of patent databases should be a mandatory part of planning and evaluating R&D projects.
- Patent information should be widely used in R&D activities, and the recent rise of "open innovation" has made it more necessary than ever to consult foreign patent information.
How to overcome language barriers

- Study the language of the target country
  Merits: faster, higher-quality prior art searches
  Demerits: it takes a long time to learn a foreign language and become fluent
- Hire multilingual search personnel
  Merits: more understandable translations and flexible management of human resources
  Demerits: poor prior art searches when such personnel lack expert technical knowledge
- Use a machine translation (MT) system
  Merits: many translations in a short time; fast and cost-effective
  Demerits: lower-quality translations; a large initial investment is required

Commercial Machine Translation Services

Many commercial MT services, including Google's, are available to the public, offering diverse services such as web page translation and translation toolbars.

- Google: 57 languages. Free online service; translation of web pages; cross-lingual retrieval; Google Toolbar and Translator Toolkit. Remarks: statistics-based translation; convenient user feedback.
- Yahoo! Babel Fish: 12 languages. Free online service (max. 150 words); translation of web pages; Yahoo! toolbar. Remarks: technology provided by SYSTRAN; based on English and French.
- SYSTRAN: 52 languages. Fee-based service; translation services for multinational corporations; available at the USPTO and on web portals such as Yahoo!, Lycos and AltaVista. Remarks: based on English and French.
- WorldLingo: 33 languages. Free online translation service; fee-based services (web sites, translation API); available at the EPO; MT services for enterprises including Microsoft.

Use of Commercial Machine Translation Services

Merits
- Commercial MT services keep extending their language coverage, so almost all patent documents in the world can be translated with them.
- Many of these services are available to the public free of charge.
- Because they are built for general sentences, they can be applied to both patent and non-patent literature.
Use of Commercial Machine Translation Services (continued)

Demerits
- Prior art searches through commercial MT services make it inconvenient to edit search queries; moreover, queries and results have to be copied and pasted one by one.
- Because commercial services are designed to cover broad domains, they may be inefficient for a specialized domain such as patents.

Many IPOs, including KIPO, the EPO and the JPO, therefore use either customized commercial translation engines or engines developed in-house.

Machine Translation Services of Some Major Asian Offices

Patent-specific MT services exist in non-native-English-speaking countries such as China, Japan and Korea. KIPO and the JPO use customized commercial translation engines, while SIPO's engine was developed in-house.

- KIPO (MT provider: Sirius, a commercial service provider); Korean ↔ English, Japanese → Korean
  - K-PION: Korean patent and utility model gazettes and examination information in English
  - KOMPASS: English and Japanese documents in Korean, for KIPO examiners
  - KIPRIS: overseas documents (English and Japanese) in Korean, for the Korean public
- JPO (MT provider: Toshiba, a commercial service provider); Japanese ↔ English, Japanese ↔ Chinese
  - AIPN: Japanese patent information in English, for overseas examiners
  - IPDL: Japanese patent information in English, for the public
- SIPO (MT provider: Chinese Patent Information Center); Chinese → English
  - CPMT (China Patent Machine Translation): a free public service that translates the specifications and claims of gazettes into English

2. KIPO's Activities

2.1 MT Services

Status of KIPO's MT Services

- J2K translation service (Japanese → Korean), launched in 2000: patent and non-patent literature (PL/NPL) written in Japanese, for KIPO examiners; PL written in Japanese, for the general public.
- K2E translation service (Korean → English), launched in 2005 and provided through K-PION: Korean patent documents, for examiners of KIPO and of foreign IPOs (37 IPOs).
- E2K translation service (English → Korean), launched in 2008: PL/NPL written in English, for KIPO examiners; PL written in English, for the general public.

Specialized Machine Translation Services for Patent Documents

To improve the quality of the MT engines, the following issues have been considered:

- Linguistic features
  - Word order: Korean and Japanese share the same order (subject + object + verb), whereas Chinese and English use subject + verb + object.
  - Scripts: English, German and French use the Latin alphabet, while Korean, Japanese and Chinese have their own writing systems.
- Digitization of patent documents: the accuracy of OCR when digitizing patent documents strongly affects the quality of machine translation.
- Building a patent-specific terminology dictionary (number of terms):

Service | ~2007 | 2008 | 2009 | Total
K2E | 3,200,000 | - | 300,000 | 3,500,000
E2K | - | 3,000,000 | 300,000 | 3,300,000
J2K | 1,200,000 | - | 300,000 | 1,500,000

- Use of markup documents such as XML: for example, KIPO has published its patent gazettes in XML since February 2005.
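Because the gazettes are published in XML, each section can be recognized by its tag and sent to the translation engine with settings suited to that kind of text. The Python sketch below illustrates such tag-based routing with the standard library's ElementTree. It is a minimal sketch under assumptions: the tag names (`name`, `abstract`, `drawing-desc`, `claim`) are hypothetical placeholders rather than KIPO's actual gazette schema, and while the request-type labels are borrowed from KIPO's K2E example (REQ_HNM_KE, REQ_ABS_KE, REQ_DRDES_KE, REQ_CLAIM_KE, REQ_KE), the mapping from tags to those labels is assumed.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical tag -> request-type mapping. Tag names are placeholders, not
# KIPO's schema; the request labels follow KIPO's K2E example.
TAG_TO_REQUEST = {
    "name": "REQ_HNM_KE",            # personal names: romanized, not translated
    "abstract": "REQ_ABS_KE",        # often a single long sentence
    "drawing-desc": "REQ_DRDES_KE",  # short "Drawing n is ..." sentences
    "claim": "REQ_CLAIM_KE",         # claims written as noun phrases
}
DEFAULT_REQUEST = "REQ_KE"           # any other descriptive text


def build_requests(gazette_xml: str) -> list[tuple[str, str]]:
    """Return (request_type, text) pairs in document order.

    Only each element's own leading text is used; mixed content and tail
    text are ignored to keep the sketch short.
    """
    root = ET.fromstring(gazette_xml)
    requests = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:
            requests.append((TAG_TO_REQUEST.get(elem.tag, DEFAULT_REQUEST), text))
    return requests


sample = """<gazette>
  <name>오은영</name>
  <abstract>본 발명은 ...</abstract>
  <description>본 발명은 ...</description>
  <drawing-desc>도1은 본 발명에 ...</drawing-desc>
  <claim>폐피혁을 용매에 ...</claim>
</gazette>"""

for request_type, text in build_requests(sample):
    print(request_type, text)
```

The sketch stops at the routing step; in a working system each pair would be passed to the MT engine, which could then apply a protocol specific to that request type.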
Methods of Improving Translation Quality

Features of patent documents (Korean patent gazette):
- Abstract: usually a single long sentence, so it is highly prone to errors when machine-translated.
- Specification: the brief description of the drawings is written in simple sentences, and the other parts in general descriptive sentences.
- Claims: a hierarchical tree of independent and dependent claims, each written as a noun phrase.

In the XML gazettes, tags identify these sections (name, abstract/summary, description, drawings, claims and others), so a different translation protocol can be applied depending on each section's tag.

Example – Korean Patent Gazette

The application server takes the XML of a Korean patent gazette, (1) analyzes the XML tag information, (2) selects the appropriate translation protocol, and (3) sends the translation requests to the K2E translation server. For example:
- REQ_HNM_KE: 오은영 → Oh Eun Young
- REQ_KE: 본 발명은… → This invention…
- REQ_ABS_KE: 본 발명은… → This invention…
- REQ_DRDES_KE: 도1은 본 발명에… → Drawing 1 is a…
- REQ_CLAIM_KE: 폐피혁을 용매에… ("waste leather … in a solvent …") → Methodology of…

Applicability to Patent Documents Produced by Other IPOs

Documents from the EPO, USPTO, JPO and SIPO each follow a consistent pattern, item by item:
- Abstract: depending on the office, short sentences of less than 150 words; a summary of less than 400 words; or a concise single-sentence statement (or separate statements for each item).
- Description: either short sentences and general statements throughout, with a low possibility of errors when translated; or a "Brief description of drawings" written in short sentences (or in short noun phrases without commas or periods), with the rest of the "Description" written in general statements.
- Claims: a tree structure of independent and dependent claims written as noun phrases or clauses.

These patterns can be distinguished in markup documents such as XML.

2.2 Patent Information Search

Patent Information Search Using MT Engines

To use MT engines for patent information search, the following issues have been considered:
- Target users and objectives of the MT service: internal examiners or foreign examiners.
- Building the database: from original documents or from machine-translated documents. In the first design, users search a database of original documents and a machine translator translates what they need; in the second, the original documents are machine-translated in advance and users search the translated database. In cost-benefit terms, the former is better when foreign documents are used infrequently, and the latter when they are used frequently.
- Formulation of search queries (e.g., operators, terminology dictionary).
- Screen layout and organization.
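Query formulation with a terminology dictionary and operators can be sketched as translating each keyword through the dictionary, expanding it with synonyms, and joining the results with Boolean operators; K-PION's English keyword search (English keywords translated into Korean and extended to Korean synonyms) works along these lines. The Python below is a minimal sketch under stated assumptions: the dictionary and synonym entries are toy examples, not KIPO data, and the `AND`/`OR` syntax assumes a generic Boolean search engine.

```python
from __future__ import annotations

# Toy English -> Korean terminology dictionary (illustrative, not KIPO's).
TERMS = {
    "battery": "전지",
    "leather": "피혁",
    "solvent": "용매",
}
# Toy Korean synonym sets used to widen recall after translation.
SYNONYMS = {
    "전지": ["배터리", "축전지"],
    "피혁": ["가죽"],
    "용매": ["용제"],
}


def translate_query(keywords: list[str]) -> str:
    """Translate each English keyword via the dictionary, OR it with its
    Korean synonyms, and AND the keyword groups together."""
    groups = []
    for kw in keywords:
        ko = TERMS.get(kw.lower())
        if ko is None:
            groups.append(kw)  # unknown term: keep it untranslated
            continue
        variants = [ko] + SYNONYMS.get(ko, [])
        groups.append("(" + " OR ".join(variants) + ")")
    return " AND ".join(groups)


print(translate_query(["waste", "leather", "solvent"]))
```

Keeping unknown keywords untranslated is one possible design choice; a real service might instead flag them or fall back to the general MT engine.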
KOMPASS (Korean Multifunctional Patent Search System)

KOMPASS is intended for KIPO examiners and supports patent information searches in English and Japanese. It provides three integrated search functions, in Korean, English and Japanese:
- Korean integrated search covers Korean and Japanese documents; the Japanese documents are searched in a database built from machine-translated documents (the J2K database), so they can be found with Korean keywords.
- English integrated search covers all kinds of data retrieved from English documents, and the search results can be translated into Korean.
- Japanese integrated search covers all kinds of data retrieved from Japanese documents, and the search results can be translated into Korean (patents and utility models only).

Japanese gazettes were previously searched in a database of original documents and machine-translated as needed. As KIPO examiners' use of this function grew rapidly, searches became slower. In 2009, to speed up searching, all Japanese gazettes were machine-translated and used to build a database, which has greatly improved convenience for KIPO examiners.
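The cost-benefit rule of thumb for building a search database (translate on demand when foreign documents are rarely used; build a machine-translated database when they are used often) can be expressed as a simple break-even calculation. The sketch below is a toy model under assumed inputs: one flat cost per on-demand translation request, one flat cost per document translated in advance, and a fixed upkeep cost for the database. None of the parameter names or numbers come from KIPO.

```python
import math


def on_demand_cost(requests: int, cost_per_request: float) -> float:
    """Total cost if every request is machine-translated at query time."""
    return requests * cost_per_request


def prebuilt_db_cost(corpus_docs: int, cost_per_doc: float, upkeep: float) -> float:
    """One-off cost of translating the whole corpus in advance, plus upkeep."""
    return corpus_docs * cost_per_doc + upkeep


def break_even_requests(corpus_docs: int, cost_per_doc: float,
                        upkeep: float, cost_per_request: float) -> int:
    """Smallest request volume at which the pre-built database costs no more
    than translating on demand."""
    return math.ceil(prebuilt_db_cost(corpus_docs, cost_per_doc, upkeep)
                     / cost_per_request)


def cheaper_option(expected_requests: int, corpus_docs: int, cost_per_doc: float,
                   upkeep: float, cost_per_request: float) -> str:
    """Pick the cheaper design for an expected request volume."""
    if on_demand_cost(expected_requests, cost_per_request) >= prebuilt_db_cost(
            corpus_docs, cost_per_doc, upkeep):
        return "machine-translated database"
    return "translate on demand"


# Illustrative numbers in arbitrary cost units.
print(break_even_requests(1_000_000, 1.0, 200_000, 2.0))
print(cheaper_option(100_000, 1_000_000, 1.0, 200_000, 2.0))
print(cheaper_option(1_000_000, 1_000_000, 1.0, 200_000, 2.0))
```

The model counts translation cost only. KIPO's stated reason for its 2009 switch was search speed as examiner use grew, which a fuller model would add as a response-time term.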
K-PION (Korean Patent Information Online Network)

K-PION is a free search service that helps foreign examiners better understand Korean patent information (examination information, gazettes, etc.). It provides:
- retrieval of Korean patent and utility model gazettes and examination information from original and machine-translated documents, including PCT-related documents;
- an English keyword search service for Korean patent and utility model gazettes;
- an English keyword search service for KPAs (Korean Patent Abstracts);
- a service for Korean industrial designs and trademarks.

In the English keyword search, a user (an applicant or a foreign examiner) enters English keywords, which are automatically translated into Korean keywords and extended to Korean synonyms. K-PION then searches the Korean gazettes and translates the search results into English.

3. Global Efforts

3.1 IP5 Foundation Project on Mutual Machine Translation

The IP5 offices will improve the quality of their machine translation (MT) services and harmonize those services among themselves.
This is to be achieved by:
- Improving MT quality
  - joint quality review of non-English-to-English MT by the English-speaking offices;
  - MT system upgrades based on the results of the quality review;
  - reduction of errors in the original documents.
- Harmonizing MT services
  - harmonization of the content of the offices' MT services.

For searching, the project will help each office better understand other offices' prior art documents and use them in citations.

3.2 Cross-Lingual Information Retrieval

WIPO's CLIR (Cross-Lingual Information Retrieval)

CLIR has recently been added to PATENTSCOPE, and its beta version is currently being tested by the public.
- When searching PCT and national application data, the keywords entered can be extended into other languages such as English, French, German, Japanese and Spanish.
- It is linked to Google's translation service, so search results are available in all the languages that service supports.
- It covers over 1.7 million published international (PCT) applications, and more than 3 million documents when the regional and national patent collections are included.

4. Conclusion

- Given the tremendous amount of patent information worldwide, machine translation services will be the most practical and efficient way to search the patent information of other IPOs.
- There are many ways to implement a patent search system using an MT engine. In choosing one, each IPO should consider the frequency of use, its budget and the linguistic features involved.
- To improve the performance of MT and search systems, each IPO may consider options such as building a machine-translated database, building a patent-specific terminology dictionary, and adopting current IT technologies such as XML.
- International cooperation among IPOs is very important for improving MT quality.
- KIPO has done its utmost to overcome language barriers and to enable non-Korean speakers to better access Korean patent information, and it will continue to collaborate with other IPOs to this end.

E-mail: daeshik@kipo.go.kr