Beyond Paper Dictionaries: Mining the Web for Technical Terminology in Chinese

Paper prepared for the Translation Teacher’s Certificate of the Consortium for the Training of Translation Teachers

author: Nicky Harman, Imperial College, University of London
n.harman@ic.ac.uk
date: 28th March 2002

APPENDICES

APPENDIX 1 - THE KINGSOFT CIBA2001 DICTIONARY COLLECTION
APPENDIX 2 - WORLD DIGESTION MEDICAL GLOSSARY
APPENDIX 3 - HIV/AIDS ENGLISH-CHINESE GLOSSARY
APPENDIX 4 - ENGLISH-CHINESE GLOSSARY OF ELECTORAL TERMS
APPENDIX 5 - ENGLISH-CHINESE GLOSSARY OF LEGAL TERMS
APPENDIX 6 - ENGLISH-CHINESE GLOSSARY OF COMPUTING TERMS
APPENDIX 7 - EXCERPT FROM CHINESE GOVERNMENT URL ON 绿色食品标志 [GREEN FOOD LABELLING]
APPENDIX 8 - THE SOIL ASSOCIATION DEFINITION OF “ORGANIC”
APPENDIX 9 - “GREEN FERTILIZER”
APPENDIX 10 - BIOSAFETY
APPENDIX 11 - HOW TO USE SEARCH ENGINES IN CHINESE WEB MINING

Which search engine?

Even US and UK engines without a .cn domain (such as Altavista, Google and the metasearch engine Copernic) are successful in locating a wide range of Chinese-based, Chinese-language URLs, so the translator has a wide choice of search engines.

A metasearch engine like Copernic from www.copernic.com (the downloadable freeware version is called Copernic 2001 Basic) has the advantage of covering a number of different search engines in a matter of minutes. It provided me with many excellent hits, for example medical material in Chinese on AIDS/HIV. Its disadvantage is that the basic version will not allow you to narrow the search down by specifying dates, languages or domains. Other search engines will allow you to refine your search in various ways.
My personal favourite is the Google Advanced Search facility, which supports all Boolean queries and also allows you to specify, amongst other things, the language of search, the domain (for instance .cn or .hk) or even specific groups of websites: .edu.hk for Hong Kong universities, .gov.cn for Chinese government sources.

By a combination of the above methods and routes and sometimes, indeed, just by chance, I have found a number of useful terminology resources. In the table which follows, I have summarised the different syntax used by search engines and given examples of how to write queries.

SEARCH ENGINE: www.altavista.com
SYNTAX FOR THE QUERY: anchor:english AND anchor:chinese AND [source word/topic]
Example: anchor:english AND anchor:chinese AND autism
COMMENTS: The query must be entered on the Advanced Search page. This will look for pages which have link buttons labelled with the specified language, indicating that the [English] version of that page is to be found there. Altavista will not accept Chinese characters in the query box, so this method only works in the English to Chinese direction. To find the required word quickly on a results page, open the Edit menu on the browser toolbar and type the word into the Find (on this page) box.

SEARCH ENGINE: www.yahoo.com
SYNTAX FOR THE QUERY: chinese AND [source word/topic]
Example: chinese AND autism
COMMENTS: The query must be entered on the Advanced Search page. Yahoo will not accept the syntax “anchor” or similar. Yahoo will not accept Chinese characters in the query box, so this method only works in the English to Chinese direction.

SEARCH ENGINE: www.copernic.com
SYNTAX FOR THE QUERY: chinese [or China] [space] [source word/topic]
COMMENTS: Copernic will not accept the syntax “anchor” or similar. Copernic will not accept Chinese characters in the query box, so this method only works in the English to Chinese direction.

SEARCH ENGINE: www.yahoo.com.cn
SYNTAX FOR THE QUERY: [source word] [space] [target language] entered into the general query box.
Example: 艾滋病 english
COMMENTS: www.yahoo.com.cn does not have an Advanced page, but the general query box will treat two words as a Boolean AND query. Note that the useful Find (on this page) function will work equally well with Chinese characters.

SEARCH ENGINE: www.google.com
SYNTAX FOR THE QUERY: [source word] [space] [target language] entered into the “with all words” query box.
Example: 艾滋病 english
In the “languages” box, specify Chinese Simp. or Trad. You must specify Chinese as the language of search if you are searching on a word in Chinese, even if you are looking for pages in English; otherwise the Chinese characters will be corrupted when the search runs.
COMMENTS: The query must be entered on the Advanced Search page. Google will not accept the syntax “anchor” or similar. It is not necessary to input Boolean query words like AND. If you do so, Google will tell you that it automatically does a Boolean search and has discounted that word. The domain box is very useful: for example, requesting .edu.hk or .edu.cn will take you to the academic URLs of Hong Kong and China. It will only accept one domain at a time, however. In addition, be aware that you need to specify the correct kind of Chinese (Simp/Trad) in the languages box for a particular domain, for example Chinese Traditional, not Simplified, for Taiwan; otherwise you will get unrepresentative results, or none at all. Use the Find (on this page) function as described.
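
The query patterns in the table above can be sketched as a small Python helper. This is only an illustration: the function name, the engine labels and the character check are assumptions made for this sketch, and the engine behaviour it encodes is simply what the table reports, which may not match how these engines behave today.

```python
import re

# Query templates transcribed from the table above. The dictionary keys are
# illustrative labels chosen for this sketch, not identifiers used by the engines.
TEMPLATES = {
    "altavista": "anchor:english AND anchor:chinese AND {term}",
    "yahoo": "chinese AND {term}",
    "copernic": "chinese {term}",
    "yahoo.cn": "{term} {target}",
    "google": "{term} {target}",
}

# Engines that, according to the table, reject Chinese characters in the query box.
ENGLISH_ONLY = {"altavista", "yahoo", "copernic"}

# Matches any character in the basic CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def build_query(engine, term, target="english"):
    """Return the text to type into the query box of the given engine."""
    if engine not in TEMPLATES:
        raise ValueError(f"no query template for engine: {engine}")
    if engine in ENGLISH_ONLY and CJK.search(term):
        # Per the table, these engines only work in the English to Chinese direction.
        raise ValueError(f"{engine} does not accept Chinese characters")
    return TEMPLATES[engine].format(term=term, target=target)


print(build_query("altavista", "autism"))  # anchor:english AND anchor:chinese AND autism
print(build_query("google", "艾滋病"))      # 艾滋病 english
```

The helper only assembles the query text. Settings such as the Advanced Search page, the languages box and the domain box still have to be chosen by hand as described in the comments column.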