Data analysis techniques and tools
PROFESSOR FRANCESCO CIVARDI
COURSE AIMS
“To avoid the danger of "drowning in information, but starving for knowledge" the branch of research known as data analysis has emerged, and a considerable number of methods and software tools have been developed. However, it is not these tools alone but the intelligent application of human intuition in combination with computational power, of sound background knowledge with computer-aided modeling, and of critical reflection with convenient automatic model construction, that results in successful intelligent data analysis projects.” (Berthold et al., 2010).
The aim of the course is to give students a good command of the concepts that allow them to apply data analysis techniques, data warehousing, OLAP, data mining and machine learning algorithms to several application areas.
These concepts derive from the synergy between various subjects: artificial intelligence, statistics, Bayesian methods, information theory, control theory, computational complexity theory, neurophysiology, research into databases and information retrieval techniques. Last but not least, they draw on the new field of Network Science (social, biological, etc.).
The areas of application include medical diagnosis, banking customer credit risk analysis, supermarket and e-commerce customer purchase behaviour analysis, industrial process optimization, the detection of fraud and prediction of terrorist attacks.
COURSE CONTENT
– Introduction to business intelligence, OLAP and data mining
– Concepts of data warehousing
– Multi-dimensional analysis. Dimensional modelling
– Relational and multidimensional databases
– Overview of SQL
– Introduction to MDX language
– Data mining topics: classification, prediction, clustering and association
– Decision trees. Entropy and information gain
– Overview of probability theory. Bayes' theorem
– Naive Bayes classifier. Bayesian networks
– Linear and multiple regression. Logistic regression
– Neural networks
– Support vector machines
– Validation and model comparison
– Cluster analysis: EM, k-means clustering and hierarchical algorithms
– Association analysis
– Introduction to Network Science
READING LIST
- Slides and lesson notes
- websites and papers announced during the lectures
BERTHOLD, M.R., BORGELT, C., HÖPPNER, F., KLAWONN, F., Guide to intelligent data analysis, Springer, 2010.
C. VERCELLIS, Business Intelligence - Modelli matematici e sistemi per le decisioni, McGraw-Hill, 2006.
Reference texts:
R. KIMBALL, Data Warehouse: La guida completa, Hoepli, 2002.
I.H. WITTEN, E. FRANK, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999.
J. HAN and M. KAMBER, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
TEACHING METHOD
Lectures and computer projects using free software (KNIME, Gephi, NodeXL).
ASSESSMENT METHOD
The assessment is based on active participation in the course and a final project, with presentation and discussion of the results.
NOTES
Further information can be found on the lecturer's webpage at http://docenti.unicatt.it or on the Faculty notice board.