Big Data Integration

Scuola di Dottorato in ICT
Doctoral School in ICT
Research project for a PhD curriculum in ICT – Computer Engineering and Science
Tutor: Prof.ssa Sonia Bergamaschi
(*) Italian Co-tutor: prof. Domenico Beneventano
(**) Foreign Co-tutor: prof. Divesh Srivastava
Proposed Title of the research:
Big Data Integration
Keywords: (3)
Big Data Integration, heterogeneous data, semi-automatic annotation.
Research objectives: --(max 10 rows)
Big data is a popular term for the exponential growth, availability and use of
information, both structured and unstructured. Much has been written on the big data trend and
its potential for innovation and growth of enterprises. The advice of IDC (one of the premier
advisory firms specialized in information technology) to organizations and IT leaders is to
focus on the ever-increasing volume, variety and velocity of the information that forms big data.
In most cases, such huge volumes of data come from multiple sources and across heterogeneous
systems; thus, data have to be linked, matched, cleansed and transformed. Moreover, it is
necessary to determine how disparate data relate to common definitions and how to
systematically integrate structured and unstructured data assets to produce useful, high-quality
and up-to-date information.
The research area of Data Integration (DI), active since the 1990s, has provided good techniques for
facing the above issues in a unifying framework, Relational Databases (RDB), with reference to
a less complex scenario (smaller volume, variety and velocity). MOMIS, distributed as open
source by the UNIMORE spin-off DATARIVER, is a successful DI system.
The goal of the project is to study and develop an extension of the MOMIS DI system in two main
directions: exploiting more semantics to semi-automatically integrate new data types
(unstructured, multimedia, video, open data, etc.) and ensuring the scalability of the system when a
high number of data sources have to be integrated.
Proposed research activity --(max 10 rows)
- Study and development of innovative methods for the semi-automatic annotation of highly
heterogeneous data sources;
- design and implementation of computing solutions for large-scale data integration problems.
Supporting research projects (and Department)
Dipartimento di Ingegneria “Enzo Ferrari”
Possible connections with research groups, companies, universities...
Divesh Srivastava – Bell Labs (USA)
F. Naumann – Hasso Plattner Institute (Germany)
(*) optional
(**) optional/ to be completed on the second year