What is happening at NTCIR

Noriko Kando
National Institute of Informatics, Japan
http://research.nii.ac.jp/ntcir/
kando@nii.ac.jp
ntcir at clef, 2010-09-21

Thanks to Teruko Mitamura, Tetsuya Sakai, Fred Gey, Yohei Seki, Daisuke Ishikawa, Atsushi Fujii, Hidetsugu Nanba, and Terumasa Ehara for preparing slides.

NTCIR: NII Test Collection for Information Retrieval
Research Infrastructure for Evaluating Information Access (IA)
A series of evaluation workshops designed to enhance research in information-access technologies by providing an infrastructure for large-scale evaluations.
■ Data sets, evaluation methodologies, and forum
- Project started in late 1997; held once every 18 months
- Data sets (test collections, or TCs): scientific documents, news, patents, and web pages in Chinese, Korean, Japanese, and English
- Tasks (research areas):
  IR: cross-lingual tasks, patents, web, geographic IR
  QA: monolingual tasks, cross-lingual tasks
  Summarization, trend information, patent maps
  Opinion analysis, text mining
- Community-based research activities

Tasks at Past NTCIRs
[Matrix slide: tasks by workshop round, NTCIR-1 through NTCIR-8; the years shown are when the meetings were held, and each task started 18 months before. Task areas: Ad Hoc IR / IR for QA; Non-English Search; Cross-Lingual IR; Statistical MT; Web; Text Summarization; Trend Info Visualization; Text Mining / Classification; Cross-Lingual Factoid and List QA; Dialog; Complex / Any-Type Question Answering; Summarization / Consolidation; Cross-Lingual Web Retrieval; Patent; Geo-Temporal; Module-Based IR for Focused Domain; Cross-Lingual QA + IR; Opinion Analysis; Community QA (User-Generated Contents).]

NTCIR-8 Tasks (2008.07 - 2009.06)
3rd Int'l Workshop on Evaluating Information Access (EVIA), refereed
1.
Advanced Cross-Lingual Information Access
- CCLQA (CS, CT, J, E -> CS, CT, J) ※ any type of question [New]
- GeoTime (E, J): geo-temporal information retrieval [New]
- IR for QA (CS, CT, J, E -> CS, CT, J)
2. User-Generated Contents (CGM)
- Opinion Analysis (multilingual news; blog)
- [Pilot] Community QA (Yahoo! Chiebukuro) [New]
3. Focused Domain: Patent
- Patent Translation: English -> Japanese [New] ※ statistical MT; the world's largest training data (J-E sentence alignment); summer school; extrinsic evaluation by CLIR
- Patent Mining: mapping papers to IPC codes
- Evaluation of SMT

NTCIR-7 & -8 Program Committee
Mark Sanderson, Doug Oard, Atsushi Fujii, Tatsunori Mori, Fred Gey, Noriko Kando (and Ellen Voorhees, Sung Hyon Myaeng, Hsin-Hsi Chen, Tetsuya Sakai)

NTCIR-8 Coordination
• NTCIR-8 is coordinated by the NTCIR Project at NII, Japan. The following organizations contributed to NTCIR-8 as task organizers:
- Academia Sinica
- Carnegie Mellon Univ
- Chinese Academy of Sciences
- Hiroshima City Univ
- Hitachi, Ltd.
- Hokkai Gakuen Univ
- IBM
- Microsoft Research Asia
- National Institute of Information and Communications Technology
- National Institute of Informatics
- National Taiwan Univ
- National Taiwan Ocean Univ
- Oki Electric Industry Co., Ltd.
- Tokyo Institute of Technology
- Univ of Tokyo
- Toyohashi Univ of Technology
- Univ of California, Berkeley
- Univ of Tsukuba
- Yamanashi Eiwa College
- Yokohama National Univ

NTCIR-8 Active Participants
[GeoTime]
Dublin City Univ; Hokkaido Univ; INESC-ID, Portugal; International Institute of Technology, Hyderabad; Keio Univ; National Institute of Materials Science; Osaka Kyoiku Univ; Univ of California, Berkeley; Univ of Iowa; Univ of Lisbon; Yokohama National Univ
[IR4QA]
Carnegie Mellon Univ; Chaoyang Univ of Technology; Dalian Univ of Technology; Dublin City Univ; Inner Mongolia Univ; National Taiwan Ocean Univ; Queensland Univ of Technology; Shenyang Institute of Aeronautical Engineering; Trinity College Dublin; Univ of Tokushima; Wuhan Univ; Wuhan Univ (Computer School); Wuhan Univ of Science and Technology; Yokohama City Univ
[MOAT]
Beijing Univ of Posts and Telecommunications; Chaoyang Univ of Technology; Chinese Univ of Hong Kong + Tsinghua Univ; City Univ of Hong Kong (2 groups); Hong Kong Polytechnic Univ; KAIST; National Taiwan Univ; NEC Laboratories China; Peking Univ; Pohang Univ of Science and Technology; SICS; Toyohashi Univ of Technology; Univ of Alicante; Univ of California, Berkeley; Univ of Neuchatel; Wuhan Univ; Yuan Ze Univ
[CCLQA] [Patent Mining]
Hiroshima City Univ; Hitachi, Ltd.; IBM Japan, Ltd.;
Institute of Scientific and Technical Information of China; KAIST; National Univ of Singapore; NEC; Shanghai Jiao Tong Univ; Shenyang Institute of Aeronautical Engineering; Toyohashi Univ of Technology; Univ of Applied Sciences - UNIGE
[Patent Translation]
Dublin City Univ, CNGL; Hiroshima City Univ; Kyoto Univ; NICT; Pohang Univ of Science and Technology; Tottori Univ; Toyohashi Univ of Technology; Yamanashi Eiwa College
[Community QA]
Microsoft Research Asia; National Institute of Informatics; Shirayuri College

Complex Cross-Lingual Question Answering (CCLQA) Task
- Different teams can exchange modules and create a "dream-team" QA system
- Small teams that do not possess an entire QA system can still contribute
- The IR and QA communities can collaborate
http://aclia.lti.cs.cmu.edu/ntcir8/

Evaluation Topics: any type of question
Type (# of topics): example question [related past NTCIR task]
- DEFINITION (10): What is the Human Genome Project? [ACLIA]
- BIOGRAPHY (10): Who is Howard Dean? [ACLIA]
- RELATIONSHIP (20): What is the relationship between Saddam Hussein and Jacques Chirac? [ACLIA]
- EVENT (20): What are the major conflicts between India and China on border issues? [ACLIA]
- WHY (20): Why doesn't the U.S. ratify the Kyoto Protocol? [QAC-4]
- PERSON (5): Who is Finland's first woman president? [QAC 1-3, CLQA 1-2]
- ORGANIZATION (5): What is the name of the company that produced the first Fairtrade coffee? [QAC 1-3, CLQA 1-2]
- LOCATION (5): What is the name of the river that separates North Korea from China? [QAC 1-3, CLQA 1-2]
- DATE (5): When did Queen Victoria die?
[QAC 1-3, CLQA 1-2]

CT/JA-T IR4QA Run Rankings
[Charts: run rankings by Mean Q and Mean nDCG.]

CCLQA Human Evaluation, Preliminary Results (JA-JA)
JA-JA runs (ALL topics):
- LTI-JA-JA-01-T: 0.1069
- LTI-JA-JA-02-T: 0.1443
- LTI-JA-JA-03-T: 0.1438
JA-JA automatic evaluation:
- LTI-JA-JA-01-T: 0.2024
- LTI-JA-JA-02-T: 0.2259
- LTI-JA-JA-03-T: 0.2252

Effect of Combination: IR4QA + CCLQA (JA-JA)
Collaboration track: F3 scores based on automatic evaluation, for BRKLY IR4QA runs fed into the LTI CCLQA system:
- BRKLY-JA-JA-01-DN: 0.2934
- BRKLY-JA-JA-02-T: 0.2686
- BRKLY-JA-JA-03-DN: 0.2074
- BRKLY-JA-JA-04-DN: 0.3000
- BRKLY-JA-JA-05-T: 0.2746

NTCIR-GeoTime: Geo-Temporal Information Retrieval (new track in NTCIR Workshop 8)
Fredric Gey, Ray Larson, and Noriko Kando
Relevance judgment system by Jorge Machado and Hideki Shima; evaluation: Tetsuya Sakai; judgments: U Iowa, U Lisbon, U California Berkeley, NII
- Search with a specific focus on geography
- To distinguish the track from past GIR evaluations, we introduced a temporal component
- Asian-language geographic search had not previously been evaluated, even though about 50 percent of the NTCIR-6 cross-language topics had a geographic component (usually a restriction to a particular country).
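The IR4QA run rankings above, and several GeoTime results below, are reported in Mean Q and Mean nDCG. As a rough illustration of what nDCG measures (a minimal textbook-style sketch, not the exact graded-relevance formulation used by the NTCIR organizers):

```python
import math

def dcg(gains):
    # Discounted cumulative gain: graded relevance discounted by log2(rank + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(run_gains, all_gains):
    # Normalize by the DCG of an ideal ranking (all known gains, sorted descending).
    ideal = dcg(sorted(all_gains, reverse=True)[:len(run_gains)])
    return dcg(run_gains) / ideal if ideal > 0 else 0.0

# Toy example: relevance grades of the documents one run retrieved, in rank
# order, against the full set of known grades for the topic.
print(round(ndcg([3, 0, 2], [3, 2, 0]), 4))  # prints 0.9386
```

Q-measure likewise rewards placing highly relevant documents early, but blends graded gains with average-precision-style cumulation; see Sakai's NTCIR papers for the definitions actually used.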
NTCIR-GeoTime Participants
• Japanese runs were submitted by eight groups (including two anonymous pooling supporters):
- Anonymous: anonymous submission
- BRKLY: University of California, Berkeley, USA
- FORST: Yokohama National University, Japan
- HU-KB: Hokkaido University, Japan
- KOLIS: Keio University, Japan
- ANON2: anonymous submission, group 2
- M: National Institute of Materials Science, Japan
- OKSAT: Osaka Kyoiku University, Japan
• English runs were submitted by six groups:
- BRKLY: University of California, Berkeley, USA
- DCU: Dublin City University, Ireland
- IITH: International Institute of Technology, Hyderabad, India
- INESC: INESC-ID, Lisbon, Portugal
- UIOWA: University of Iowa, USA
- XLDB: University of Lisbon, Portugal

NTCIR-GeoTime Approaches
• BRKLY: baseline approach; probabilistic retrieval plus pseudo-relevance feedback
• DCU, IITH, XLDB (U Lisbon): geographic enhancements
• KOLIS (Keio U): count the geographic and temporal expressions in the top-ranked documents of an initial search, then re-rank
• FORST (Yokohama National U): use factoid-QA techniques for question decomposition
• HU-KB (Hokkaido U), UIOWA: hybrid approach combining a probabilistic model with weighted Boolean query formulation

NTCIR-GeoTime: English Topic Difficulty by Average Precision
[Chart: per-topic AP, Q, and nDCG averaged over 25 English runs for 25 topics, sorted by topic difficulty (AP ascending).]
Most difficult English topic (21): When and where was the 2010 Winter Olympics host city location announced?
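The KOLIS re-ranking idea described above (count geographic and temporal expressions in the top-ranked documents of an initial search, then re-rank) can be sketched as follows. This is a hypothetical illustration, not the team's actual system: the gazetteer, the temporal pattern, and the interpolation weight are all invented placeholders.

```python
import re

# Toy gazetteer and a crude temporal pattern (years, month names) -- placeholders.
GAZETTEER = {"tokyo", "vancouver", "madrid", "london"}
TEMPORAL = re.compile(r"\b(19|20)\d{2}\b|\b(january|february|march|april|may|june|"
                      r"july|august|september|october|november|december)\b", re.I)

def geo_temporal_score(text):
    # Count geographic and temporal expressions mentioned in the document text.
    words = re.findall(r"[a-z]+", text.lower())
    geo = sum(1 for w in words if w in GAZETTEER)
    temporal = len(TEMPORAL.findall(text))
    return geo + temporal

def rerank(initial, weight=0.1):
    # initial: list of (doc_text, retrieval_score) from the first-pass search.
    return sorted(initial,
                  key=lambda d: d[1] + weight * geo_temporal_score(d[0]),
                  reverse=True)

docs = [("Report filed with no places or dates.", 1.0),
        ("Vancouver was announced in July 2003 as host city.", 0.9)]
print(rerank(docs)[0][0][:9])  # prints Vancouver
```

The second document overtakes the first because its one place name and two temporal expressions boost its initial score; a real system would use proper NE recognition and a full gazetteer.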
NTCIR-GeoTime: Japanese Topic Difficulty by Average Precision
[Chart: per-topic AP, Q, and nDCG averaged over 34 Japanese runs for 24 topics, sorted by topic difficulty (AP ascending).]
Most difficult Japanese topic (18): What date was a country invaded by the United States in 2002?

NTCIR-GeoTime: Topic-by-Topic Analysis on Japanese
[Chart: per-topic nDCG for HU-KB-JA-JA03-D, KOLIS-JA-JA04-D, FORST-JA-JA04-D, and BRKLY-JA-JA02-D over topics 1-25; #17 is missing.]

NTCIR-GeoTime Challenges
• Geographic reference resolution is difficult enough, but temporal expressions ("last Wednesday") are even harder to resolve
• Can indefinite answers ("a few hours") be accepted?
• Japanese gazetteers are needed
• An NE-annotated corpus is needed for further refinement

Multilingual Opinion Analysis Task (MOAT)
(Example themes: CO2 reduction? The Lehman shock?)

Opinion Question List
- N01: What negative prospects were discussed about the Euro when it was introduced in January 2002?
- (N03): What reasons were discussed for the terror bombing on Bali Island in October 2002?
- N04: What reasons have been given for the Space Shuttle Columbia accident in February 2003?
- N05: What negative comments were discussed about Bush's decision to start the Iraq war in March 2003?
- N06: What negative prospects and opinions were discussed about SARS, which started spreading in March 2003?
- N07: What reasons are given for the blackout across North America in August 2003?
- N08: What reasons and background information were discussed about the terrorist train bombing that happened in Madrid in March 2004?
- N11: Why did supporters want to elect George W.
Bush in the November 2004 American presidential election?
- N13: What positive comments were discussed about helping the victims of the earthquake and tsunami in Sumatra, Indonesia, in December 2004?
- N14: What objections are given to the US opposition to the Kyoto Protocol, which took effect in February 2005?
- N16: What reasons have been given for the anti-Japanese demonstrations that took place in April 2005 in Beijing and Shanghai, China?
- N17: In July 2005 there were terrorist bombings in London. What reasons and background were given, and what controversies were discussed?
- N18: What actions by President George Bush in response to Hurricane Katrina's August 2005 landfall were criticized?
- N20: What negative opinions and discussion arose about the bird flu that started spreading in October 2005?
- N24: Identify opinions indicating that Arnold Schwarzenegger was a bad choice to be elected the new governor of California in the October 2003 election.
- N26: Find positive opinions about the reaction of Nuclear and Industrial Safety Agency officials to the Mihama nuclear power plant accident in August 2004.
- N27: What were the commercial advantages and disadvantages of the direct flights between Taiwan and Mainland China?
- N32: What are good and bad approaches to losing weight?
- N36: What are the complaints about the XIX Olympic Winter Games, held in and around Salt Lake City, Utah, United States, in 2002?
- N39: What are the comments about China's first manned space flight, carried out successfully in October 2003?
- N41: What negative comments were discussed when, in April 2004, CBS made public pictures showing cruel abuse of Iraqi prisoners of war by the U.S. military?

Effective Approaches
• The following teams attained good results through careful feature filtering, machine learning, and rich lexicon resources.
- UNINE (EN): Z-score feature filtering; logistic regression; lexicon: SentiWordNet
- PKUTM (SC): iterative classifier; SVM (better than NB, ME, and DT); lexicons: in-house, NTU, and Jun Li's
- CityUHK (TC): supervised lexicon ensemble; lexicons: NTUSD, LCPW, LCNW, CPWP, SKPI

Community QA Pilot Task
- Rank all posted answers to every question by answer quality, as estimated by the system.
- Training: Best Answer (BA), the answer selected by the questioner
- Test: Good Answers (GA), assessed individually by four assessors (university students)
- Test collection for CQA: 1,500 questions selected at random from the Yahoo! Chiebukuro data, version 1.0 (3 million questions)
http://research.nii.ac.jp/ntcir/ntcirws8/yahoo/index-en.html

Results
[Chart: runs sorted by GA-nG@1, comparing BA-Hit@1, GA-Hit@1, GA-nG@1, GA-nDCG, and GA-Q.]
- GA-Hit@1 is not useful; GA-nDCG and Q behave similarly.
- The 5th run from MSRA used the BA information for the test questions directly, so it does not represent practical performance.
- In general, BA-Hit@1 and the GA metrics agree with each other.

[Chart: number of easy/medium/hard questions per category under BA-Hit@1.] LOVE is HARD according to BA: systems cannot find the askers' Best Answers!
[Chart: number of easy/medium/hard questions per category under GA-nG@1.] LOVE is EASY according to GA: systems can find many good answers that are not the BA (BA-based evaluation is not good enough for social questions).
- GA-based evaluation alleviates the bias and non-exhaustiveness problems of BA.
- GA-based evaluation is more discriminative: it detects system differences that BA-based evaluation overlooks.

Goal: automatic creation of technical trend maps from a set of research papers and patents.
[Diagram: technologies 1-3 linked to effects 1-3, with citations such as [AAA 1993], [BBB 2002], [CCC 2000], [DDD 2001], and several US patents.]
Research papers and patents are classified in terms of elemental technologies and their effects.

Evaluation
Subtask 1 (Research Paper Classification); metric: Mean Average Precision (MAP)
• The k-NN-based approach is superior to the machine-learning approach.
• Re-ranking of IPC codes is effective.
Subtask 2 (Technical Trend Map Creation); metrics: recall, precision, and F-measure
• Top systems employed CRFs, and the following features are effective:
- dependency structure
- document structure
- domain adaptation

History of Patent IR at NTCIR
• NTCIR-3 (2001-2002): technology survey; applied conventional IR problems to patent data (2 years of JPO patent applications; JPO = Japan Patent Office)
• NTCIR-4 (2003-2004): invalidity search; addressed patent-specific IR problems (5 years of JPO patent applications)
• NTCIR-5 (2004-2005): enlarged invalidity search (10 years of JPO patent applications; the JPO document sets were published in 1993-2002)
• NTCIR-6 (2006-2007): added English patents (10 years of granted USPTO patents; USPTO = US Patent & Trademark Office)

Extrinsic Evaluation, E-J: BLEU and IR Measures
[Chart: CLIR effectiveness (MAP and Recall@N) plotted against BLEU for each MT system: Recall@1000 (R = 0.77), Recall@500 (R = 0.84), Recall@200 (R = 0.86), Recall@100 (R = 0.86), MAP (R = 0.77).]
Recall@100 and Recall@200 are highly correlated with BLEU (R = 0.86).

Overall Structure
General chairs; evaluation task chairs + PCs (core challenge tasks, grand challenge tasks); EVIA chairs + PCs
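The R values above are Pearson correlation coefficients computed across the participating MT systems, pairing each system's BLEU score with the IR effectiveness of a CLIR run built on its translations. A minimal sketch with invented scores (illustrative values only, not the actual NTCIR-8 data):

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length score lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-MT-system scores: BLEU vs Recall@100 of the CLIR run
# built on each system's output (made-up numbers).
bleu = [14.0, 17.5, 20.1, 22.8, 25.3]
recall_at_100 = [0.45, 0.40, 0.55, 0.58, 0.66]
print(round(pearson(bleu, recall_at_100), 2))  # prints 0.89
```

A high R at shallow cutoffs such as Recall@100 is what makes CLIR attractive as an extrinsic, task-based check on MT quality.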
New NTCIR Structure
- NTCIR general chairs: Noriko Kando (NII), Eiichiro Sumita (NICT), Tsuneaki Kato (U Tokyo)
- NTCIR evaluation chairs: Hideo Joho (U of Tsukuba), Tetsuya Sakai (MSRA)
- Task Selection Committee: grand challenges, pilot tasks, core challenge tasks
- EVIA chairs: Mark Sanderson (RMIT), William Webber (U of Melbourne), plus the EVIA Program Committee: refereed papers on evaluating information access

Tasks Accepted for NTCIR-9
CORE TASKS
• INTENT (with One-Click Access subtask)
• Recognizing Natural Language Inference
• Geo-temporal information retrieval (GeoTime2)
• IR for Spoken Documents
PILOT TASKS
• Cross-Lingual Link Discovery
• Evaluation Framework of Interactive Information Access
• Patent Machine Translation (PATMT)

INTENT (with 1CLICK) Task (CS, J)
Organisers: Min Zhang, Yiqun Liu (Tsinghua U), Ruihua Song, Tetsuya Sakai, Youngin Song (MSRA), Makoto Kato (Kyoto U)
• Subtopic mining: find the different intents behind an ambiguous or underspecified query (e.g. "Harry Potter": the books? the movies?)
• Document ranking: selectively diversify web search results
• One-click access (1CLICK): satisfy the information need with the system's first result page (X bytes of text, not a document list); no further clicks should be needed after clicking SEARCH

Recognizing Natural Language Inference (CS, CT, J)
Organisers: Hideki Shima, Teruko Mitamura (CMU), Chuan-Jie Lin (NTOU), Cheng-Wei Lee (Academia Sinica)
• YES/NO task: does text 1 entail text 2 (text 1 ⇒ text 2)?
• Main task: given text 1 and text 2, choose from (1) forward entailment, (2) backward entailment, (3) equivalent, (4) contradiction, (5) independent
• Extrinsic evaluation via CLIA will be conducted

GeoTime2 (E, J, K?)
Organisers: Fred Gey, Ray Larson (UCB), Noriko Kando (NII)
• Second round of ad hoc IR for WHEN and WHERE
• GeoTime1 topic example: How old was Max Schmeling when he died, and where did he die?
• At GeoTime1, documents containing both the WHEN and the WHERE were treated as relevant; those containing only the WHEN or only the WHERE were treated as partially relevant.
Can we do better? Can we create more realistic topics?

IR for Spoken Documents (J)
Organisers: Tomoyosi Akiba, Seiichi Nakagawa, Kiyoaki Aikawa (Toyohashi U of Technology), Yoshiaki Itoh (Iwate Prefectural U), Tatsuya Kawahara (Kyoto U), Xinhui Hu (NICT), Hiroaki Nanjo (Ryukoku U), Hiromitsu Nishizaki (U of Yamanashi), Tomoko Matsui (Institute of Statistical Mathematics), Yoichi Yamashita (Ritsumeikan U)
• Handling spontaneous speech data (spoken lectures)
• Spoken term detection: find the positions of a given query term within the spoken documents
• Spoken document retrieval: find passages in the spoken-document data that match a given query
• Reference speech-recognition results are provided, so groups outside the speech community can easily participate

Cross-Lingual Link Discovery (E -> C, K, J)
Organisers: Ling-Xiang Tang, Shlomo Geva, Yue Xu (Queensland U of Technology), Andrew Trotman (U of Otago)
INEX linking meets NTCIR!
• Given an English Wikipedia page: (1) identify anchors, and (2) for each anchor, provide a list of Chinese/Korean/Japanese documents to be linked
(Example: anchors such as "sun" and "solar eclipse" are linked to the corresponding Wikipedia entries 太陽 and 日食; "lunar eclipse" to 月食.)

Evaluation Framework of Interactive Information Access (J)
Organisers: Tsuneaki Kato (U Tokyo) and Mitsunori Matsushita (Kansai U)
• Explore the evaluation of interactive and exploratory information access
• Shared modules, and shared interfaces between modules
• Interactive complex question answering subtask (visual/textual output)
• Trend information summarization subtask (e.g.
cabinet support rate change)

PATMT (C-E, J-E, E-J)
Organisers: Benjamin Tsou, Bin Lu (CUHK), Isao Goto (NICT)
• C-E MT (new at NTCIR-9), J-E MT, E-J MT; manual and automatic evaluation (BLEU, NIST)
• PATMT@NTCIR-10 will include extrinsic evaluation by CLIR
• Lessons from PATMT@NTCIR-7 and -8, with multiple reference translations (RTs): [chart contrasting systems rated high or low by humans against their MAP and BLEU scores]

TALK OUTLINE
• About myself
• Looking back at NTCIR: NTCIR-5 and -6 CLIR (as a participant); NTCIR-7 and -8 ACLIA; NTCIR-8 GeoTime
• Looking forward: NTCIR grand challenges; NTCIR-9 tasks
• Summary

TALK SUMMARY
• Looking back (1999-2010): more teams are now using straight MT for CLIR, often reaching over 90% of monolingual performance. What are CLIR researchers doing now? CLIA?
• Looking forward (2010-): grand challenges [tentative] = NN2S + E2E = No Need To Search + Easy To Explore. Multilinguality will still be important, especially in the system-to-user direction. Please participate in the NTCIR-9 tasks!

NTCIR-9 Important Dates (tentative)
- 1 Oct 2010: first call for participation
- Winter-Autumn 2011: tasks run
- November 2011: camera-ready papers due
- December 2011: NTCIR-9 at NII, Tokyo
Watch http://research.nii.ac.jp/ntcir

Thanks! Merci! Danke schön! Grazie! Gracias! Ta! Tack! Köszönöm! Kiitos! Terima kasih! Khap khun! Asante! Tak! 謝謝! ありがとう!
http://research.nii.ac.jp/ntcir/ (will be moved to http://ntcir.nii.ac.jp/)