Doctoral School in ICT

Research project for a PhD curriculum in ICT – Computer Engineering and Science

Tutor: Prof. Sonia Bergamaschi (*)
Italian Co-tutor: Prof. Domenico Beneventano (**)
Foreign Co-tutor: Prof. Divesh Shivrastava

Proposed title of the research: Big Data Integration

Keywords (3): Big Data Integration, heterogeneous data, semi-automatic annotation.

Research objectives (max 10 rows):
Big data is a popular term for the exponential growth, availability and use of information, both structured and unstructured. Much has been written on the big data trend and its potential for innovation and enterprise growth. The advice of IDC (one of the premier advisory firms specialized in information technology) to organizations and IT leaders is to focus on the ever-increasing volume, variety and velocity of the information that forms big data. In most cases, such a huge volume of data comes from multiple sources and across heterogeneous systems; thus, data have to be linked, matched, cleansed and transformed. Moreover, it is necessary to determine how disparate data relate to common definitions and how to systematically integrate structured and unstructured data assets to produce useful, high-quality and up-to-date information. The research area of Data Integration (DI), active since the 1990s, has provided good techniques for addressing the above issues within a unifying framework, Relational Databases (RDB), but with reference to a less complex scenario (smaller volume, variety and velocity). MOMIS, distributed as open source by the UNIMORE spin-off DATARIVER, is a successful DI system. The goal of the project is to study and develop an extension of the MOMIS DI system in two main directions: exploiting more semantics to semi-automatically integrate new data types (unstructured, multimedia, video, open data, etc.) and ensuring the scalability of the system when a high number of data sources has to be integrated.
Proposed research activity (max 10 rows):
- Study and development of innovative methods for the semi-automatic annotation of highly heterogeneous data sources;
- Design and implementation of computing solutions for large-scale data integration problems.

Supporting research projects (and Department):
Dipartimento di Ingegneria “Enzo Ferrari”

Possible connections with research groups, companies, universities:
- Divesh Shivrastava – Bell Labs (USA)
- F. Neumann – Hasso Plattner Institute (Germany)

(*) optional
(**) optional / to be completed in the second year