Presentation of the CLIA Project
On the occasion of FIRE at Kolkata
Pushpak Bhattacharyya, IIT Bombay, on behalf of the CLIA Consortium
12 Dec 2008

Motivation

CLIA is a real need
- Great language diversity in India
- Low comfort level with English: less than 5% of the total population of about 700 million can use English effectively
- Need for critical information in large quantity and high quality, especially in agriculture, health, tourism, education and other sectors
- CLIA project started in 2006; domains: tourism and health

12 Dec 08 FIRE– Kolkata - CLIA Project

Geographically speaking
- World rank in terms of number of speakers: Hindi-Urdu: 5th; Bengali: 7th; Marathi: 14th; …
- Languages shown: Punjabi, Bengali, Marathi, Telugu, Tamil

CLIA: basic information

Defining Diagram [figure]

CLIA Consortium Members
Name of Institute: Assigned Language(s)
- IIT Bombay (Consortium Leader): Marathi, Hindi
- IIT-Kharagpur (Consortium Co-leader): Bengali
- IIIT Hyderabad: Telugu, Hindi
- Anna University-KBC: Tamil
- Anna University-College of Engg: Tamil
- ISI Kol: Bengali
- Jadavpur University Kolkata: Bengali
- CDAC-Pune: Marathi, Hindi, Tamil
- CDAC-Noida: Punjabi
- Utkal University: --

Principal Investigators
Name of Institute: Name
- IITB: Prof. Pushpak Bhattacharyya
- IIT-Kgp: Prof. Sudeshna Sarkar
- IIITH: Prof. Vasudev Verma
- AU-KBC: Prof. Sobha L.
- AU-CEG: Prof. Ranjani Parthasarthy
- ISI Kol: Prof. Mandar Mitra
- JU Kol: Prof. Sivaji Bandyopadhyay
- CDAC-P: Dr. Ajai Kumar
- CDAC-N: Dr. Karunesh Arora
- Utkal University: Prof. Sanghamitra Mohanty

Some prominent research members
- Institutes: IITB, IIT-Kgp, IIITH, AU-KBC, AU-CEG, ISI Kol, JU Kol, CDAC-P, CDAC-N, Utkal University
- Names: Manoj, Vishal, Vishaal, Ashish Nimesh, Dr. Rajendra Bhupal, Praneet Pattavi, Vijay, Vijay Kaviha, Subha Lalitha Prasenjt, Deepashri, Ayan Asif, Pinaki Swati, Abhishek Gaur Mohan, Ankur Balbant Rai

Prior expertise brought to the project (horizontal, i.e., language independent)
Name of Institute: Areas of prior expertise/experience
- IITB: NLP (LR, WSD, MT), Semantic Search
- IIT-Kgp: Search and Ranking, Shallow Parsing
- IIITH: Commercial level search engine building, query processing
- AU-KBC: NER, Information Extraction, Summarization, Anaphora
- AU-CEG: Morphology, Interlingua
- ISI Kol: IR Evaluation, large scale IR system building (SMART)
- JU Kol: Example based MT, Summarization, NER
- CDAC-P: Converters, File format processors, MT
- CDAC-N: Parallel corpora, Query processing
- Utkal University: Machine Translation, Lexical Resources

Prior expertise brought to the project (vertical, i.e., language specific)
Name of Institute: Areas of prior expertise/experience
- IITB: Hindi Marathi wordnet building, Hindi Marathi shallow parsing
- IIT-Kgp: Bengali shallow parsing including MA
- IIITH: Telugu-Eng CLIR, Telugu query processing
- AU-KBC: Tamil NER, Tamil IE, Tamil Morph
- AU-CEG: Tamil Morph, Eng-Tamil MT
- ISI Kol: Bengali statistical stemming, large scale corpora for Bengali
- JU Kol: Bengali NER, EBMT involving Bengali
- CDAC-P: Various Indian language converters
- CDAC-N: Aligned parallel corpora for Indian languages
- Utkal University: --

Horizontal tasks of CLIA and the organizations responsible [diagram; labels in slide order]
- Input Query processing
- Crawling, Indexing
- IIT KGP, IIITH, IITB
- User Interface
- IIT KGP, IIITH, IITB
- Searching, Ranking
- IIIT Hyderabad
- CDAC Noida
- File format processing
- CDAC Pune

Horizontal tasks of CLIA and the organizations responsible (contd.) [diagram; labels in slide order]
- Document Processing (index time NER, IE)
- Document Processing (Post Retrieval: Snippet, Summary)
- IIT KGP, Utkal, CDACP
- Evaluation, Relevance Judgement
- Jadavpur University
- Distributed Search
- AU KBC
- ISI Kolkata
- UNL based semantic search (for Tamil)
- AU CEG

Languages and the organizations responsible
Language: Organization(s)
- Bengali: IIT KGP (c), JU, ISI
- Hindi: IIITH (c), IITB, CDAC Noida
- Marathi: IITB (c), CDAC Pune
- Punjabi: CDAC Noida
- Tamil: AUKBC (c), AUCEG
- Telugu: IIITH

CLIA Important Dates
- Project start date: 29th Aug 06 (effectively Jan 2007)
- First meeting of the Project Review and Steering Group (PRSG): 2nd March 2007
- Second PRSG: 30th Aug 2007
- Third PRSG: 08th March 2008
- Fourth PRSG: 15th July 2008
- Alpha version released: 15th July 2008
- Beta version to be released (along with the 5th PRSG): January 2009

Related consortium: E-IL MT project
- English to Indian Language MT
- Indian languages: Hindi, Marathi, Bengali, Urdu, Oriya, Telugu, Tamil
- Approaches: Statistical MT, Example Based MT
- Members: CDAC Pune (c), IIT Bombay, JU, UU, IIITH, IIITA

Related consortium: IL-IL MT project
- Indian Language to Indian Language MT
- Indian languages: Hindi, Marathi, Bengali, Punjabi, Tamil, Telugu, Kannada
- Approach: Transfer Based
- Members: IIITH (c), CDAC Pune, IIT Bombay, JU, University of Hyderabad, AU KBC

All three projects are time bound and result oriented
- 2-year time frame (extension granted for 1 year)
- Strict deliverables
- For each project the budget outlay is about Rs 80 million (USD 2 million)

CLIA: Top level technological information

Process Flow [figure]

CLIA: achievements in 2 years (Jan 2007 to Dec 2008)
Tools and resources (copyrightable code and data)

Steps towards overall evaluation
- Yet to be completed
- Large relevance judgment base under construction
- Precision, Recall, MAP, F-score etc.
- 50 queries per language (6 languages)
- About 5000 documents per language (6 languages)
- Crawled and indexed document base of English: approx. 600,000 pages

Copyright for CLIA (code)
Code: Details
Input Processing
- Soft Keyboard (Hindi, Bengali, Tamil, Telugu, Punjabi, Marathi languages) (CDAC-P)
- Algorithm for transliteration of Devanagari words to English using Segment Based Transliteration (IIITH, IITB)
- Implementation of Multilingual Sense Dictionary along with API for accessing MSD during lexical substitution (IITB)
- Implementation of automatic multi-word extraction algorithm for populating the multi-word field of index (IITB)
Bengali
- Bengali stemmer (IITKGP)
- Bengali Hindi transliteration (IITKGP)
Marathi
- Implementation of Language Analyzers (Morphological Analyzer) for Marathi (IITB)

Copyright for CLIA (code), contd.
Code: Details
Punjabi
- Punjabi Spell Normalizer (CDAC-N)
- Punjabi Stemmer (CDAC-N)
- Font transcoders (Unicode - proprietary fonts), map files etc. (CDAC-N)
Tamil
- Stemmer for Tamil (AUKBC)
- Named Entity Recognition engine (AUKBC)
- Information Extraction (AUKBC)
- Font transcoders (Tamil proprietary fonts) (AUKBC)
- IE template translation (AUKBC)

Copyright for CLIA (code), contd.
Code: Details
Telugu
- Language Analyzer for Telugu (IIITH)
- Query Translation for Telugu and Hindi (IIITH)
- Query Transliteration for all languages (IIITH)
- Transcoder (IIITH)
Indexing
- CML converter (IITKGP)
- Focused Crawler (IIITH)
- Language Identifier (IIITH)
- File Format Processors (CDACP)

Copyright for CLIA (code), contd.
Code: Details
Ranking
- Ranker implementation (IITKGP)
Output Processing
- Snippet Generation (JU)
- Summary Generation (JU)
- Snippet Translation (JU)
UNL
- Sentence constituent UNL enconverter (AUCEG)
- UNL indexer (AUCEG)
- UNL template based information extractor (AUCEG)
- UNL template based summarizer (AUCEG)
- UNL based search and ranking (ranking module under development) (AUCEG)

Copyright for CLIA (data)
Data: Details
Input Processing
Bengali
- Synset dictionary entries for Bengali (shared with JU and CDAC Pune)
- English to Bengali transliteration of NE list (shared with JU and IIT KGP)
- NE annotated corpora (IITKGP)
- NE list transliterated (IITKGP)
Telugu
- Telugu to English Dictionary (IIITH)
- Telugu to English transliteration list (IIITH)
- NE annotated corpora for Telugu and Hindi (IIITH)
- Telugu corpus developed for IE module (IIITH)

Copyright for CLIA (data), contd.
Data: Details
Input Processing
Tamil
- English - Tamil Parallel Named Entity List (AUKBC)
- Tamil - English Dictionary (AUKBC)
- Synset dictionary entries for Tamil (AUKBC)
- Tamil Named Entity annotated corpus (AUKBC)
- English Named Entity annotated corpus (AUKBC)
- Named Entity Tagset (AUKBC)

Copyright for CLIA (data), contd.
Data: Details
Punjabi
- Punjabi translations (for parallel corpora) (CDAC-N)
- English - Hindi - Punjabi parallel named entity list (CDAC-N)
- Punjabi Named Entity Tagged Corpus (under development) (CDAC-N)
- Database for Punjabi stemmer (prior development) (CDAC-N)
Marathi
- English to Marathi transliteration of NE list (IITB and CDAC Pune)
- Marathi-English parallel corpora in tourism domain used for training the snippet translation SMT system (IITB)
- List of Multi-Word Expressions in Marathi and Hindi (IITB)
- English-Marathi parallel list of named entities used for IE template translation (shared with C-DAC Pune)
Hindi
- Hindi to English Dictionary (IIITH)
- Hindi to English transliteration list (IIITH)
- Hindi MW list (IITB)

Copyright for CLIA (data), contd.
Data: Details
Evaluation of the IR system
- Set of test topics (general domain, tourism domain) (ISIK)
- Relevance judgments for the above pair (ISIK)
UNL
- UW list - Tourism domain (AUCEG)

Conclusion
- Large scale national level activity
- Large number of tools and resources developed under the consortium
- Alpha release done in July 2008
- Beta release to take place in Jan 2009
- Look forward to more detailed interactions and suggestions from the international audience

Introducing people…

Principal Investigators
Name of Institute: Name
- IITB: Prof. Pushpak Bhattacharyya
- IIT-Kgp: Prof. Sudeshna Sarkar
- IIITH: Prof. Vasudev Verma
- AU-KBC: Prof. Sobha Nair
- AU-CEG: Prof. Ranjani Parthasarthy
- ISI Kol: Prof. Mandar Mitra
- JU Kol: Prof. Sivaji Bandyopadhyay
- CDAC-P: Dr. Ajai Kumar
- CDAC-N: Dr. Karunesh Arora
- Utkal University: Prof. Sanghamitra Mohanty

Overview
- Technical Status of the Project
- Technical Documentation
- Shared resources
- Testing methodology
- Software Documentation
- Alpha and Beta versions

Technical Summary: Work Flow
Input Query in IL -> Input Query Processing -> Search -> Document Processing -> Output Generation -> Evaluation

Project Status

Status - Input Processing
Stemmer
- All language stemmers developed
- Integrated with Nutch through plug-ins
- Monolingual retrievals are working
MWE
- Guidelines are under discussion (IITB)
- Marathi: ~2000 MWE
- Bangla: ~600 MWE
- Tamil: ~600 MWE
- Punjabi: ~4000 MWE

Status - Input Processing: NER
Language: NE-tagged corpus size; accuracy; NE list details
- Hindi (IIITH): 50K words; 68%; 31,177 entries
- English: 50K (AUKBC); 88.5% (Precision), 73.7% (Recall), F-score 80.44%; 7,500 entries (AUKBC), gazetteer list size (IITKgp): Health 39,819 entries, Tourism 90,848 entries, General 479,427 entries
- Punjabi (CDACN): not started; NA; Person 10,004, City 500, Company 500, Hospital 20,603
- Marathi (IITB): 50K; 61.43% (F-score); Total 4763, Time 361, Numerical 706, Names 3666
- Bengali (IITKgp): 125K (all domains); ~75-78%; Bangla: 90,000 names (all domains), gazetteer list is being transliterated to Bangla
- Tamil (AUKBC): 94K; 88.5% (Precision), 73.7% (Recall), F-score 80.44%; NE 23,000 entries, dictionary of personal names 70,000 (tagged corpus + dictionary used for NER)
- Telugu (IIITH): 60K; 74%; 38,000 entries

Status - Input Processing
WSD (IITB)
- 2nd version
- WSD interface for sense-marking of corpus developed by IITB
Dictionary
- IITB working on E-Hin linkage
- All LVs (language verticals) working on IL-IL linking and E-IL linking
- ~10,000 synsets generated from Tourism corpora

Status: Dictionary
- Eng-Hin linkage: ~2500 synsets linked (IITB)
IL-IL dictionary status (as on 30 Sept 07)
Language: #Synsets linked
- Bengali: 2005
- Marathi: 4298 (all cross-linked)
- Punjabi: 559
- Tamil: 1890
- Telugu: 461
(without cross-linking)

Sample Input screen: Input screen [screenshot]

Sample Input screen: Advanced search option [screenshot]

Status - Search
Size of indexed corpus
Language: No. of pages; No. of URLs
- English: 10,000; 115
- Hindi: 21,000; 25
- Bangla: 3,000; 25
- Tamil: 20,000; 25
- Punjabi: 17,000; 25
- Marathi: 3,300; 42

Status - Search
cML-Text Converter (IIT-Kgp)
- First version of the engine is ready
- Software extracts the fields and body, but does not identify paragraphs and blocks in this version
- Has been tested for Bengali
- Ready to be integrated with Nutch

Status - Document Processing
- Basic IE engine and eleven IE templates are ready (AUKBC)
- Has been tested with sample documents (EILMT corpus)
- First template, "How to reach the place", is being translated to Tamil and Telugu
- For other languages, the inflectional markers are being provided

Sample Output Screens [screenshots]
- Output screen if input language is Hindi
- Output screen if input language is Hindi, and English tab is selected
- Output screen of translation of snippet (English to Bengali)
- Advanced output screen with Hindi summary
- Sample screen with Information Extraction

Status - Output Generation
Snippet Generation (JU)
- Working for monolingual retrieval
- Integrated with Nutch
- Has been tested for Bengali

Status - Evaluation
Corpora
- Tourism and Health corpora being collected for all languages
- News corpora also being collected.
- Period of news corpora ranges from 2002 to 2007
- For news corpora, ISI Kol is in dialogue with TOI and Hindustan Times for permission to use their multilingual corpora

Details of Corpora (crawled)
- Assumption in SRS: each language corpus has at least 50,000 documents from General / News + all available documents in Tourism and Health

Evaluation: Topics
Topics (ISI Kol)
- A set of 95 topics is ready for evaluation: 30 topics for training, 50 topics for testing and 15 topics as stand-by
- Each topic = Title + Narration + Description
- Translation of these 95 topics has been completed by all six language verticals
Sample topic:
<title> Euro Inflation</title>
<desc> Find documents about rises in prices after the introduction of the Euro</desc>
<narr> Any document is relevant that provides information on the rise of prices in any country that introduced the common European currency.</narr>

Evaluation Methodology: Benchmark data creation [diagram]
- Diagram labels: Corpus; Queries; IR engine 1, IR engine 2, IR engine n; Pool; Human judges; Relevance Judgements

Evaluation Methodology: Benchmark data creation
- Sample documents (corpus)
- Sample queries / topics (95)
- Relevance judgement
  - No. of relevance-judged Bangla documents: ~4,500
  - Independently judged against 23 topics by each of two judges
- Pooling
  - Pooling strategies adopted by TREC
  - Lists of the top ~100 documents are taken
  - Pool = union of these

Evaluation methodology: Evaluation engine [diagram]
- Diagram labels: 30 Topics/Queries; Corpus > 50,000 docs; Retrieval Engine; Top 100 Docs; Relevance Judgments; Evaluation Engine; Metrics

UNL
- Monolingual retrieval is working for Tamil documents
- 6500 words in UNL Dictionary
- Words + MWE indexed
- Documents indexed
  - No. of documents processed in Tourism: 564
  - No. of Concept-Relation-Concept indexed: 11,754
  - No. of Concept-Relation indexed: 11,754
  - No. of Concepts indexed: 17,650

Testing Methodology
- Black box testing based on SRS and design documents
- Unit testing by each sub-system
- Test cases (format) and test reports
- Integration testing
- Top down / bottom-up based on dependencies
- Stubs and drivers
- Sub-system wise testing (module-wise): Input processing, Search and Retrieval, Document processing, Output Generation, Evaluation, UNL
- System testing
- Performance testing

Integration
- Use of controlled corpora for integration
- Use of EILMT English and Hindi parallel corpus
- ISI generates the queries for the corpus
- Translation of queries by all LVs
- English and Hindi synsets identified for building multilingual dictionary by each LV
- Each language vertical will be tested for its respective cross-lingual retrieval
- Information Extraction and output generation will be done on the same corpora
- Integration of each LV into Nutch at IITKgp

Test and Integration (contd.)
- Bug tracking system (Bugzilla) to be installed
  - Currently planned for installation at IITB on the same server as CVS
- Bugzilla
  - Web-based general-purpose bug tracker tool
  - Tracks not only software bugs but also all other user-submitted tracking tickets
  - Eases communication between team members
  - Can be integrated with CVS and Wiki

Bugzilla Requirements
- A compatible database management system: MySQL, PostgreSQL
- A suitable release of Perl 5
- A compatible web server
- A suitable mail transfer agent, or any SMTP server
Bugzilla Demo: https://landfill.bugzilla.org/bugzilla-tip/index.cgi

Bugzilla - Design
- Bugs can be submitted by anybody, and will be assigned to a particular developer

Deployment diagram
Deployment Diagram for Nutch-based Search Subsystem [figure]
- Quoted from Mike Cafarella, Doug Cutting, Building Nutch: Open Source Search, Queue, v.2 n.2, April 2004
- The real-life scenario would have four more such index servers, one for every Indian language, and (maybe) more search servers to ensure a greater number of searches per unit time

Hosting of Alpha and Beta versions
Alpha Version
  - ~10,000 documents in each language
  - Low complexity system, hence simple hardware configuration sufficient
  - Does not include Summary generation and Output translation
  - Planned for Dec 2008
Beta Version
  - ~1,000,000 documents in each language
  - Hardware configuration being worked out, based on disk space requirements, throughput of system, response times, simultaneous users etc.
  - Following details are being worked out: connectivity, where to host, support for hosting
  - Planned for July 2008

Elitex08: Demo of Alpha Version
- Plan to demonstrate the following:
  - Cross-lingual information retrieval for all languages
  - Information Extraction and translation of at least one template to Tamil / Telugu
  - Snippet Generation (monolingual)
- Hardware integration: IITKgp
- Publicity management / poster design: JU
- Funds: participation fees to be shared
- Demonstrate the same at IJCNLP08 exhibition (in Hyderabad, Jan 2008)

Gantt chart (as on Aug 30) [figure]

Software documentation
- SRS (based on IEEE)
- Design document v2.0 (based on RUP)
- User Requirements Document (Ver 5.0)
- Java docs
- Test cases template
- File naming conventions
- Testing and integration guidelines
- Code review guidelines

Software documentation: SRS
- Introduction
- Overall description
- External interface requirements
- System features (module-wise)
- Advanced Search system for Tamil using UNL

Software documentation: DD
Design document (v 2.0), simplified to suit project needs
- Introduction
- System Architecture
- System Design
- Solution Architecture (brief description of systems, subsystems)
- Software Architecture (block diagrams)
- Logical Design (class diagrams)
- Component Design (component diagrams)
- Appendix: other details

Software documentation: URD
- Introduction
- Objective
- Scope of the project
- Product perspective
- Capabilities of the product
- User characteristics
- Assumptions and dependencies
- Operational environment
- Input / Output scenarios
- Definitions, acronyms and abbreviations
- References

Software documentation: Test
Test case template (for all tests)
- Test case
- Test data
- Expected result
- Actual result
- Remarks

Software documentation: File naming
File naming convention captures the following:
- Subject & domain of document
- Content type (ppt / doc / rpt / Tr / etc.)
- Name of institute (IITB / ISI / IIITH etc.)
- Date of creation of doc (dd-mon-yy)
- Version no.
Format: <Subject>_<Content_type>_<Institute>_<date>_<ver.no>.<file ext>
E.g. PRSG_Pres_IITB_08dec07_v1.ppt

Shareable Resources and Tools
Shared resources across projects
- From ILILMT and from EILMT to CLIA: Morph Analyzer, POS Tagger, Chunker, Dictionary Standardization, IL-IL Synsets, E-IL Synsets
- From CLIA to other projects: NER engine, NE list, MWE

Collaborative tools used - CLIA
Tool: Purpose
- Googlegroups: group e-mailing
- Wiki: project documents, member contact details, minutes of meeting, presentations, timelines, progress reports, fund details etc.
- CVS: source code
- Google docs: sharing and editing of documents
- Webex audioconferencing: weekly teleconferences

CLIA Wiki site
http://www.cfilt.iitb.ac.in/~consortia/dokuwiki
CLIA Wiki contents
- Project team contact details
- Project documentation (SRS, Design doc, URD..)
- Meeting minutes and presentations
- Project fund details
- Progress reports and timelines
- Project resources
- Corpus
- Collaborative platform for audio conferences

CLIA Wiki site [screenshot]

Wiki - Upload notification [screenshot]

Thank You
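
Backup: checking the file naming convention
The file naming convention from "Software documentation: File naming" can be checked mechanically. What follows is only a minimal sketch in Python, not a CLIA tool. It assumes that the date is written without separators, as in the slide's example (08dec07), that two-digit years mean 20yy, and that no field contains an underscore. The names `DocName`, `parse_name` and `build_name` are hypothetical, introduced here for illustration.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Month abbreviations as they appear in the slide's example ("dec").
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# <Subject>_<Content_type>_<Institute>_<date>_<ver.no>.<file ext>
# Assumption: fields never contain "_" and the date is ddmonyy (e.g. 08dec07).
NAME_RE = re.compile(
    r"(?P<subject>[^_]+)_(?P<content_type>[^_]+)_(?P<institute>[^_]+)"
    r"_(?P<day>\d{2})(?P<mon>[A-Za-z]{3})(?P<yy>\d{2})"
    r"_v(?P<version>\d+)\.(?P<ext>[A-Za-z0-9]+)"
)


class DocName(NamedTuple):
    subject: str
    content_type: str
    institute: str
    created: date
    version: int
    ext: str


def parse_name(filename: str) -> Optional[DocName]:
    """Return the fields of a conforming file name, or None if it does not conform."""
    m = NAME_RE.fullmatch(filename)
    if m is None:
        return None
    try:
        month = MONTHS.index(m["mon"].lower()) + 1
        # Assumption: two-digit years are 20yy.
        created = date(2000 + int(m["yy"]), month, int(m["day"]))
    except ValueError:
        # Unknown month abbreviation or impossible calendar date.
        return None
    return DocName(m["subject"], m["content_type"], m["institute"],
                   created, int(m["version"]), m["ext"])


def build_name(name: DocName) -> str:
    """Compose a file name from its fields, mirroring the slide's example."""
    stamp = (f"{name.created.day:02d}"
             f"{MONTHS[name.created.month - 1]}"
             f"{name.created.year % 100:02d}")
    return (f"{name.subject}_{name.content_type}_{name.institute}"
            f"_{stamp}_v{name.version}.{name.ext}")


if __name__ == "__main__":
    parsed = parse_name("PRSG_Pres_IITB_08dec07_v1.ppt")
    print(parsed)
    if parsed is not None:
        print(build_name(parsed))
```

Parsing and rebuilding the slide's example should give back the same name, which is a quick way to test whether a document on the Wiki or in CVS follows the convention.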