Workshop on Corpus-Based Natural Language Processing
17 – 31 December 2001

Organised by
AU-KBC Research Centre, Anna University, Chennai

Jointly with
Language Technology Research Centre, IIT, Hyderabad
National Centre for Software Technology, Mumbai
RCILTS-Tamil, Anna University, Chennai
Tamil University, Thanjavur

The workshop introduced participants to emerging corpus-based techniques for the machine processing of natural languages. Its main emphasis was on Statistical Processing and Machine Translation. The main Workshop was preceded by a three-day Preparatory Workshop for participants who lacked the required background.

Faculty of the Workshop

The workshop was conducted by eminent researchers from India and abroad who have worked in this field for many years. Their role included giving lectures, guiding the projects taken up during the workshop, and motivating the participants to take up further research in NLP. The faculty members were:

Prof. Aravind K Joshi, University of Pennsylvania, U.S.A.
Dr. B. Srinivas, AT&T Research Labs, New Jersey
Dr. Anoop Sarkar, University of Pennsylvania, U.S.A.
Prof. Rajeev Sangal, Director, LTRC, Indian Institute of Information Technology, Hyderabad
Mr. Durgesh D Rao, Research Scientist, KBCS, National Centre for Software Technology, Mumbai
Lecture Topics

The lectures covered a broad spectrum of corpus-based natural language processing, including:

1. Finite State Machines
2. Finite State Transducers
3. Chunking
4. Tagging
5. Statistical Parsing
6. Language Models
7. Hidden Markov Model
8. Tree Adjoining Grammar Formalism
9. Paninian Grammar Formalism
10. Information Retrieval
11. Classifiers
12. Clustering
13. Latent Semantic Indexing
14. Word Sense Disambiguation
15. Machine Translation models
16. Statistical Machine Translation

Lectures were given in the morning sessions; in the afternoon sessions, participants worked on the projects assigned to them. The lecture topics were chosen to cover the techniques used in Statistical Natural Language Processing and to help participants understand the underlying concepts well enough to take up individual research in their areas of interest.

List of Participants

A total of 40 persons with different academic, linguistic and professional backgrounds participated in the workshop.

1. Ananthakrishnan            NCST, Mumbai
2. Arun C H                   NMC College, Marthandam
3. Bhadran V K                ERDC, Thiruvananthapuram
4. Chenna Kesava Murthy M     University of Hyderabad, Hyderabad
5. Deepa Gupta                IIT, Delhi
6. Dhanabalan T               Anna University, Chennai
7. Dija S                     ERDC, Thiruvananthapuram
8. Elangaiyan R               CIIL, Mysore
9. Jayaprasad Hedge           NCST, Mumbai
10. Kamakshi S                Tamil University, Thanjavur
11. Lehal G S                 Punjabi University, Patiala
12. Malliga P                 Anna University, Chennai
13. Manjula D                 Anna University, Chennai
14. Meenakshi                 Chennai
15. Mona Parakh               IIIT, Hyderabad
16. Niva Das                  IIIT, Hyderabad
17. Nobby Varghese            ERDC, Thiruvananthapuram
18. Prakash Rao               IIIT, Hyderabad
19. Pranjali Kanade           IIIT, Hyderabad
20. Ravikumar K E             Chennai
21. Samir Kumar Borgohain     IIT, Guwahati
22. Santhosh T Varghese       ERDC, Thiruvananthapuram
23. Satheesh Kumar K          ERDC, Thiruvananthapuram
24. Shivshankar B Nair        IIT, Guwahati
25. Srinivasan C J            Amrita Institute of Technology, Coimbatore
26. Sriram V                  IIIT, Hyderabad
27. Subramaniam A             Mozhi, Chennai
28. Tangirala Papi Reddy      IIIT, Hyderabad
29. Vamshi Krishna A          IIIT, Hyderabad
30. Varanasi Kiran Babu       IIIT, Hyderabad
31. Vibhav Agarwal            IIIT, Hyderabad
32. Vinish Jain               IIIT, Hyderabad
33. Vivek Mehta               NCST, Mumbai
34. Arulmozi S                AU-KBC, Chennai
35. Baskaran S                AU-KBC, Chennai
36. Kumarashanmugam B         AU-KBC, Chennai
37. Ramanan S V               AU-KBC, Chennai
38. Ramesh Kumar S            AU-KBC, Chennai
39. Thiyagarajan S            AU-KBC, Chennai
40. Visvanathan S             AU-KBC, Chennai

Background of Participants

Educational Background: Comp. Sci. 74%, Linguistics 18%, Others 8%
Exposure to NLP: Prior Knowledge 33%, Nil 67%

Projects

Group projects carried out by the participants formed an integral part of the workshop. The idea was to enable the participants to continue work on their projects after returning to their institutions. There were 11 groups, each working on one project, and the faculty members of the Workshop guided these projects.
The following projects were taken up during the workshop:

1. Morphological Analyzer for Tamil
2. Rule based Parser
   Tamil
   Telugu
3. Statistical Parser for Hindi
4. Document Classifier
5. Document retrieval
6. Word Sense Disambiguation: Mapping of English senses to Indian languages
7. Named Entity Recognizer
8. Terminology Extraction from Medical Literature
9. Example Based Machine Translation
   English - Hindi
   English - Tamil
10. Super Tags for Indian Languages
   Hindi
   Tamil
11. Cross-linking of bilingual dictionaries
   Hindi - Telugu
   Hindi - Tamil
   Hindi - Kannada
   Hindi - Marathi

Feedback from the Participants

[Figure: About Workshop. Participants' ratings of Content and Presentation for the Preparatory Workshop and the Main Workshop. Data labels: 74, 79, 79, 75.]

[Figure: About Projects. Percentage of participants rating Guidance, Interaction, Tools Provided and Computing Facility as Adequate or Not Adequate. Data labels: 0, 32, 24, 32; 100, 68, 76, 68.]

[Figure: About Arrangements. Ratings for Classroom, Lab, Handouts, Food and Admin/Support Staff. Data labels: 93, 85.4, 84, 83, 78.]

Overall Opinion

Usefulness: Yes 97%, No 3%
Duration of the Workshop: Correct 97%, Too long 3%