Corpus-Based NLP Workshop - AU

Workshop on Corpus-Based Natural Language Processing
17 – 31 December 2001
Organised by
AU-KBC Research Centre, Anna University, Chennai
Jointly with
Language Technology Research Centre, IIT, Hyderabad
National Centre for Software Technology, Mumbai
RCILTS-Tamil, Anna University, Chennai
Tamil University, Thanjavur
The workshop introduced participants to emerging corpus-based techniques for
the machine processing of natural languages. It emphasized Statistical Processing
and Machine Translation. The main Workshop was preceded by a three-day
Preparatory Workshop for participants who lacked the required background.
Faculty of the Workshop
Eminent researchers from India and abroad, who have long worked in this field,
conducted the workshop. Their role included giving lectures, guiding the projects
taken up during the workshop and motivating the participants to take up further
research in NLP. The faculty members were:
Prof. Aravind K Joshi
University of Pennsylvania
U.S.A
Dr. B. Srinivas
AT&T Research Labs
New Jersey.
Dr. Anoop Sarkar
University of Pennsylvania
U.S.A
Prof. Rajeev Sangal
Director, LTRC
Indian Institute of Information Technology
Hyderabad.
Mr. Durgesh D Rao
Research Scientist, KBCS
National Centre for Software Technology
Mumbai.
Lecture Topics
Lectures covering the broad spectrum of Corpus-based Natural Language
Processing were delivered during the workshop, on the following topics:
1.Finite State Machines
2.Finite State Transducers
3.Chunking
4.Tagging
5.Statistical Parsing
6.Language Models
7.Hidden Markov Model
8.Tree Adjoining Grammar Formalism
9.Paninian Grammar Formalism
10.Information Retrieval
11.Classifiers
12.Clustering
13.Latent Semantic Indexing
14.Word Sense Disambiguation
15.Machine Translation models
16.Statistical Machine Translation
Lectures were given in the morning sessions, and in the afternoon sessions
participants worked on the projects assigned to them. The lecture topics were
chosen to cover the techniques used in Statistical Natural Language Processing
and to help participants understand the underlying concepts, so that they could
take up individual research work in their areas of interest.
List of Participants
A total of 40 persons with different academic, language and professional
backgrounds participated in the workshop.
1.Ananthakrishnan           NCST, Mumbai
2.Arun C H                  NMC College, Marthandam
3.Bhadran V K               ERDC, Thiruvananthapuram
4.Chenna Kesava Murthy M    University of Hyderabad, Hyderabad
5.Deepa Gupta               IIT, Delhi
6.Dhanabalan T              Anna University, Chennai
7.Dija S                    ERDC, Thiruvananthapuram
8.Elangaiyan R              CIIL, Mysore
9.Jayaprasad Hedge          NCST, Mumbai
10.Kamakshi S               Tamil University, Thanjavur
11.Lehal G S                Punjabi University, Patiala
12.Malliga P                Anna University, Chennai
13.Manjula D                Anna University, Chennai
14.Meenakshi                Chennai
15.Mona Parakh              IIIT, Hyderabad
16.Niva Das                 IIIT, Hyderabad
17.Nobby Varghese           ERDC, Thiruvananthapuram
18.Prakash Rao              IIIT, Hyderabad
19.Pranjali Kanade          IIIT, Hyderabad
20.Ravikumar K E            Chennai
21.Samir Kumar Borgohain    IIT, Guwahati
22.Santhosh T Varghese      ERDC, Thiruvananthapuram
23.Satheesh Kumar K         ERDC, Thiruvananthapuram
24.Shivshankar B Nair       IIT, Guwahati
25.Srinivasan C J           Amrita Institute of Technology, Coimbatore
26.Sriram V                 IIIT, Hyderabad
27.Subramaniam A            Mozhi, Chennai
28.Tangirala Papi Reddy     IIIT, Hyderabad
29.Vamshi Krishna A         IIIT, Hyderabad
30.Varanasi Kiran Babu      IIIT, Hyderabad
31.Vibhav Agarwal           IIIT, Hyderabad
32.Vinish Jain              IIIT, Hyderabad
33.Vivek Mehta              NCST, Mumbai
34.Arulmozi S               AU-KBC, Chennai
35.Baskaran S               AU-KBC, Chennai
36.Kumarashanmugam B        AU-KBC, Chennai
37.Ramanan S V              AU-KBC, Chennai
38.Ramesh Kumar S           AU-KBC, Chennai
39.Thiyagarajan S           AU-KBC, Chennai
40.Visvanathan S            AU-KBC, Chennai
Background of Participants
Educational Background
Comp. Sci.         74%
Linguistics        18%
Others              8%
Exposure to NLP
Prior Knowledge    33%
Nil                67%
Projects
Group projects carried out by the participants constituted an integral part of
the workshop. The idea was to enable the participants to continue working on their
projects even after returning to their institutions. There were 11 groups, each
working on one project, and the faculty members of the Workshop guided these projects.
The following projects were taken up during the workshop:
1.Morphological Analyzer for Tamil
2.Rule-based Parser
    Tamil
    Telugu
3.Statistical Parser for Hindi
4.Document Classifier
5.Document Retrieval
6.Word Sense Disambiguation: Mapping of English senses to Indian languages
7.Named Entity Recognizer
8.Terminology Extraction from Medical Literature
9.Example Based Machine Translation
    English - Hindi
    English - Tamil
10.Super Tags for Indian Languages
    Hindi
    Tamil
11.Cross-linking of bilingual dictionaries
    Hindi - Telugu
    Hindi - Tamil
    Hindi - Kannada
    Hindi - Marathi
Feedback from the Participants
About Workshop
[Chart: participants' ratings of Content and Presentation for the Preparatory Workshop and the Main Workshop]
About Projects
[Chart: percentage of participants rating Guidance, Interaction, Tools Provided and Computing Facility as Adequate or Not Adequate]
About Arrangements
[Chart: ratings for Classroom, Lab, Handouts, Food and Admin/Support Staff]
Overall Opinion
Usefulness
Yes         97%
No           3%
Duration of the Workshop
Correct     97%
Too long     3%