Research Project - Web Technology Laboratory

Hot News
Presenters:
Hossein Kamyar
Asef Poormasoomi
Supervisor:
Dr. Mohsen Kahani
Tehran University


Database Research Group
Natural Language and Text Processing Group
Database Research Group
http://ece.ut.ac.ir/dbrg
Members:
Faculty Staff: 8
Students: 9
Alumni: 17
Dr. Caro Lucas, Dr. Behzad Moshiri, Dr. Rohani Rankouhi
Research Projects
Areas: Modernization of Systems, Information Retrieval, Data Mining, Data Management

Project Title | Supervisor
Question Answering with Human Plausible Reasoning | Dr. Farhad Oroumchian
Improving XML Information Retrieval by Means of Human Plausible Reasoning | Dr. Farhad Oroumchian
Question Answering with Dynamic Functions and Plausible Inferences | Dr. Farhad Oroumchian
Distributed Information Retrieval on the Web | Dr. Farhad Oroumchian
Concept-Based Searching in a Semantic Web Environment | Dr. Farhad Oroumchian

Project Title | Supervisor
XML Data Mining | Dr. Masoud Rahgozar
Mining for Conceptual Associations in Unstructured and Semi-Structured Text for Reasoning with Human Plausible Reasoning | Dr. Masoud Rahgozar
Spatial Data Mining and Its Application in Bank Business Intelligence | Dr. Masoud Rahgozar
XML Mining by Frequent Tree Patterns | Dr. Masoud Rahgozar
Bioinformatic Database Integration Using a Data Fusion Approach | Dr. Behzad Moshiri

Project Title | Supervisor
An Efficient Framework for XML Data Management | Dr. Masoud Rahgozar
XML Query Processing and Optimization | Dr. Fatemi

Industrial Projects
Project Title | Organization
Iranian Welfare and Social Security Database Analysis | Ministry of Welfare and Social Security
MAVA-Vista: Advanced Digital Library System | ICT Department of MUT
Business Intelligence System | Bank Mellat
Geographical Information System | Statistics Center of Iran
Chizar Digital Archive | Management and Planning Organization

Related Courses:
1. Introduction to Database Systems
2. Advanced Database Systems
3. Special Topics in Database Systems
4. Database Laboratory
5. Data Mining
6. Information Retrieval
7. Natural Language Processing

Persian Corpus
Hamshahri Corpus
Version 1 of the Hamshahri collection is officially maintained and distributed by the CLEF organizers. It was used in CLEF2008 and CLEF2009 and comes with 100 queries.
Version 2 of the Hamshahri collection was prepared in 1388 SH (2009/2010) with the UTIRE system in the Database Research Group of the University of Tehran, following the TREC standard.
Criteria | Version 1 | Version 2
Size (Unicode CLEF XML format) | 700 MB | 1400 MB
Number of documents | 160,000 | 318,000
Documents from | 1996/4/23 | 1996/4/23
Documents to | 2003/2/11 | 2007/5/13
Document categories | Yes | Yes
Links to images | No | Yes
Links to original web pages | No | Yes
Queries + relevance judgments | Yes | Yes
[Figure: documents time span]
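Version 2 is described as following the TREC standard. TREC relevance judgments are conventionally distributed as "qrels" files whose lines read `<topic> <iteration> <docno> <relevance>`. Assuming the Hamshahri judgments use that layout (the slide does not say so explicitly), a minimal Python sketch for loading them and scoring a ranked result list with precision at k could look like this; the topic and document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def load_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse TREC-style qrels lines: '<topic> <iteration> <docno> <relevance>'."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, relevance = parts
        qrels[topic][docno] = int(relevance)
    return dict(qrels)


def relevant_docs(qrels: Dict[str, Dict[str, int]], topic: str) -> Set[str]:
    """Documents judged relevant (relevance > 0) for one topic."""
    return {doc for doc, rel in qrels.get(topic, {}).items() if rel > 0}


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


if __name__ == "__main__":
    # Hypothetical topic and document IDs, for illustration only.
    sample = ["q1 0 doc-1 1", "q1 0 doc-2 0", "q2 0 doc-3 1"]
    qrels = load_qrels(sample)
    rel = relevant_docs(qrels, "q1")
    print(rel)                                       # {'doc-1'}
    print(precision_at_k(["doc-2", "doc-1"], rel, 2))  # 0.5
```

Keeping the judgments as a nested dictionary (topic, then document) makes per-topic lookups cheap when evaluating many runs against the same 100 queries.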



Bijankhan Corpus
The Bijankhan corpus is a tagged corpus suitable for natural language processing research on the Persian (Farsi) language. It was gathered from daily news and common texts, and all documents are categorized into subjects such as political, cultural and so on; in total there are 4,300 different subjects. The collection contains about 2.6 million manually tagged words, annotated with a tag set of 40 Persian POS tags.

dotIR Web Benchmark Collection
• This collection was created by crawling the .ir web domain and contains one million documents. Using the in-house UTIRE software, 25 users then constructed 50 queries. These queries were run against the collection, and the retrieved pages (18,424 documents in total, on average 369 documents per query) were judged by the same 25 users. In this way, the documents relevant to each query were identified.
• In addition, in a parallel effort to study and compare ranking algorithms, 56 features were extracted from the retrieved documents for each query, following the LETOR standard (introduced by Microsoft Research Asia). Researchers can use the feature-value vectors and relevance labels to compare their proposed ranking algorithms, or to train and tune algorithms.
• This project was supported by the Iran Telecommunication Research Center and the Database Laboratory of the University of Tehran.
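LETOR feature files, as distributed by Microsoft Research Asia, store one query-document pair per line in the form `<relevance> qid:<id> <feature>:<value> ... # <comment>`. Assuming the dotIR feature vectors follow that layout (the slide only says they follow the LETOR standard), a minimal Python sketch for reading one line into a dense 56-dimensional vector might look like this; the sample line and document name are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_letor_line(line: str) -> Tuple[int, str, Dict[int, float], str]:
    """Split '<relevance> qid:<id> <fid>:<value> ... # <comment>' into its parts."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    relevance = int(tokens[0])
    qid = tokens[1].split(":", 1)[1]  # strip the 'qid:' prefix
    features: Dict[int, float] = {}
    for token in tokens[2:]:
        fid, value = token.split(":", 1)
        features[int(fid)] = float(value)
    return relevance, qid, features, comment.strip()


def to_dense(features: Dict[int, float], num_features: int = 56) -> List[float]:
    """Expand sparse 1-based feature IDs into a dense list; missing IDs become 0.0."""
    return [features.get(i, 0.0) for i in range(1, num_features + 1)]


if __name__ == "__main__":
    # Hypothetical line showing only three of the 56 features.
    line = "2 qid:10 1:0.5 3:1.25 56:0.0 # docid = example-doc"
    rel, qid, feats, comment = parse_letor_line(line)
    vector = to_dense(feats)
    print(rel, qid, len(vector), comment)  # 2 10 56 docid = example-doc
```

Converting to a dense vector up front lets the rows be fed directly to most learning-to-rank toolkits, at the cost of storing explicit zeros.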
Natural Language and Text Processing Group
Members:
10 members
Heshaam Faili
[Assistant Professor; Ph.D. in Artificial Intelligence, Sharif University of Technology]

Research Projects:
More than 23 papers
Project Title
English-Persian Statistical Machine Translation
English-Persian Rule-based Machine Translation
Statistical Word Sense Disambiguation
Automatic Persian WordNet Construction
Parallel and Comparable Corpus Construction
Monolingual Corpus Construction
Spell, Grammatical and Real-word Error Detection and Correction
Grammar Induction
Semantic Role Labeling
Statistical Parsing
Text Classification using Neural Networks

Industrial Projects
Project Title | Organization
Vafa Spell Checker | Software and Information Technology Group, Information Technology Research Center, Iran Telecommunication Research Center
• Detection and correction of typing, grammatical and semantic errors
• Installable on the widely used Microsoft Word editor
• Learns and improves its performance automatically
• Accurate and efficient
• Free of charge

Persian Corpus
1. TEP: Tehran English-Persian Parallel Corpus
   • First free English-Persian corpus
   • 4 million tokens on each side
   • Sentence-aligned
2. TMC: Tehran Monolingual Corpus
   • Largest freely available monolingual corpus for the Persian language
   • Tokenized
   • Suitable for language modeling
3. Mutual Information
http://ece.ut.ac.ir/nlp/resources.html
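The slide lists "Mutual Information" as a resource without describing its contents. Purely as an illustration, and assuming it refers to word co-occurrence statistics derived from a tokenized corpus such as TMC, pointwise mutual information (PMI) of adjacent word pairs can be computed as in this minimal Python sketch; it is not the group's actual procedure.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bigram_pmi(tokens: List[str]) -> Dict[Tuple[str, str], float]:
    """PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ) for adjacent word pairs,
    using maximum-likelihood (relative-frequency) probabilities."""
    if len(tokens) < 2:
        return {}
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores: Dict[Tuple[str, str], float] = {}
    for (x, y), c_xy in bigrams.items():
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Since TMC is already tokenized, splitting on whitespace is assumed to suffice.
    tokens = "a b a b".split()
    for pair, score in sorted(bigram_pmi(tokens).items()):
        print(pair, round(score, 3))
```

Raw PMI overrates rare pairs, so in practice pairs below a minimum frequency are usually filtered out before ranking.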

Related Courses:
Introduction to Natural Language Processing (Dr. Heshaam Faili)
Advanced Database Systems
Shahid Beheshti University
The Natural Language Processing Research Laboratory was founded by Dr. Mehrnoush Shamsfard at the beginning of 2006 in the Computer Engineering Department of Shahid Beheshti University.
• More than 25 members
• More than 92 papers
http://nlp.sbu.ac.ir/
Research Projects
A. Developing Linguistic Resources
• Developing a semantically annotated corpus
• Developing a chunked corpus
• Developing a parallel corpus
• Developing a Persian verbs database
• Semi-automatic lexicon acquisition
Start: 2006
Researchers: Maliheh Monshizadeh, Elham Fekri
B. Fundamental Persian Text Processing Tools
• Standard Text Preparation for Persian
• Stemmer / morphological analyzer / lemmatizer
• Tokenizer
• POS tagger
• Spell checker
• Chunker
• Syntax parser
• Persian named entity recognition: SBUNER
• Persian anaphora resolution
• Semantic role labelling
Start: 2006
Researchers: Samira Noferesti, Rana Forsati, Pooneh Mortazavi, Hoda Sadat Jafari
C. NLP Applications
• Machine translation: PenTrans project
  ◦ English to Persian translation system
  ◦ Persian to English translation system
  ◦ Machine translation evaluation toolkit
• Persian text summarization: PARSUMIST
• Question answering
  ◦ Persian
  ◦ English: SBUQA
• Information extraction: Mersad
• Text understanding
  ◦ Conversion between Persian sentences and first-order logic
• Text generation
Start: 2006
Researchers: Chakaveh Saedi, Yasaman Motazedi, Mostafa Nazari
D. Ontology Engineering
• Ontology development
  ◦ Development of the CMMI-ACQ ontology
  ◦ Collaborative development of an ontology of computer science and engineering (COMON)
  ◦ Fuzzy ontologies
• Ontology learning
  ◦ Ontology learning from text
  ◦ Ontology learning from the web
  ◦ Relation extraction
• Ontology mapping
  ◦ Evolutionary ontology matching
  ◦ A Linguistic-Structural Approach to Bilingual Ontology Mapping
• Ontology population and instantiation
Start: 2006
Researchers: Aynaz Taheri, Hakimeh Fadaei, Tara Akhavan, Rahim Dehkharghani, Valeh Montaghami, Bahareh Sarrafzadeh, Amir Sharifloo, Rana Forsati
E. Semantic Web
• Semantic annotation of documents
• Converting web documents into semantic web resources
• Semantic search
• Semantic web service discovery and composition
Start: 2006
Researchers: Bahareh Sarrafzadeh, Hoda Mirzaie, Maryam Haghollahi, Homan Farrokhzad
F. Hybrids
• Application of fuzzy ontologies in qualitative reasoning
• E-learning
  ◦ Ontology-Based Content Rearrangement for Intelligent Tutoring Systems: OCRITS
• Intelligent content management
Start: 2006
Researchers: Hamzeh Motahari, Marzieh Shariati
Courseware
• Ontology Engineering
• Natural Language Processing
• Semantic Web
• Advanced Natural Language Processing, Fall 2005, by Regina Barzilay and Michael Collins
MIT
Columbia University
Tools
• FarsNet: the first Persian WordNet
• STeP-1: Standard Text Preparation for Persian
  ◦ Tokenizer
  ◦ Stemmer
  ◦ POS tagger
  ◦ Spell checker
Sharif University of Technology
Natural Language Processing
Web Intelligence Laboratory
Natural Language Processing
Dr. Ghasem Sani
Dr. Heshaam Faili
Active since 2003, after three years of inactivity
• Eliza
• POS tagger
• Unsupervised natural grammar induction
Web Intelligence Laboratory
Supervisor: Dr. Abolhasani
28 members
Advanced Research:
• Semantic search engines
• Semantic web services
• Semantic web for pervasive computing
• Annotation
• Semantic grids
• Social network analysis
• Ontology alignment and learning
• Web clustering
• Business intelligence
New Research:
• Composite web service execution framework
• Tracking news to find hot topics
• Semantic programming
• Trust model in the Semantic Web
• New models for recommender systems
• Using the web to create a lecture on a subject
• A Farsi framework for information retrieval
• A semantics-based framework for business intelligence applications
Science & Technology University
Unknown laboratory, but it offers an online POS tagger
In collaboration with the Aruz (Persian prosody) project, supported by the High Council of Informatics
http://persianp.ir/index.php?option=com_wrapper&view=wrapper&Itemid=7
http://www.prosody.ir
Conferences
The Cross-Language Evaluation Forum (CLEF) aims at:
(i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts;
(ii) creating test suites of reusable data that can be employed by system developers for benchmarking purposes.
CLEF conferences have been held since 2000.
CLEF2011 will be hosted by the University of Amsterdam.
Computational Approaches to Arabic Script-based Languages (CAASL)
CAASL2011 will be held in Geneva.

Corporation
Asr Gooyesh Pardaz
We do:
• Extraction of n-gram statistics for Persian
• Extraction of a Persian grammar
• Compilation of a Persian lexicon
• Extraction of frequently used Persian words, by topic
Ongoing research projects:
• Probabilistic models of single words, two-word, three-word and four-word sequences for Persian and English
• GPSG grammar rules for Persian
• Probabilistic grammar
• Parsers suited to the language model
• Word clustering methods
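The n-gram statistics and the one- to four-word probabilistic models listed above can be illustrated with unsmoothed maximum-likelihood estimates. This is a minimal Python sketch assuming whitespace-tokenized text; the slide does not describe Asr Gooyesh Pardaz's actual method or smoothing, so it is an illustration only.

```python
from collections import Counter
from typing import List, Tuple


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram (stored as a tuple) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mle_prob(tokens: List[str], ngram: Tuple[str, ...]) -> float:
    """Unsmoothed maximum-likelihood estimate.
    Unigram: C(w) / N.
    Higher order: C(w1..wn) / total count of n-grams sharing the history w1..w(n-1)."""
    n = len(ngram)
    grams = ngram_counts(tokens, n)
    if n == 1:
        return grams[ngram] / len(tokens) if tokens else 0.0
    history_total = sum(c for g, c in grams.items() if g[:-1] == ngram[:-1])
    return grams[ngram] / history_total if history_total else 0.0


if __name__ == "__main__":
    tokens = "a b a c a b".split()
    print(mle_prob(tokens, ("a", "b")))       # P(b | a)
    print(mle_prob(tokens, ("a", "b", "a")))  # P(a | a, b)
```

The same two functions cover unigrams through 4-grams by changing the tuple length; a practical language model would add smoothing (for example Kneser-Ney) so that unseen n-grams do not receive zero probability.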