Last updated: 26/01/20
Course name and number:
Applications in Machine Learning
Practical topics in Machine Learning
89-6541-01
Course type: Lecture
Hours: 2
Semester: B (second semester)
Academic year: 5780 (2019-2020)
Course website:
We explore different use cases in which machine-learning algorithms are used to handle real-life problems involving large amounts of data items in different formats, with a special focus
on medical problems.
After a brief introduction to machine learning, we will get familiar with some of the most
common technologies used in practice. We will get to know the relevant libraries
and platforms that provide tools for processing and cleaning data of different types, including
unstructured data such as images and text. Our main coding language is Python. The
libraries we will work with during the course include numpy, pandas, sklearn,
XGBoost, LightGBM (LGBM), PyTorch, and more. Every class is divided into a preliminary part, in which we
present the theory behind the technology in focus, followed by a practice session
with code examples. During the course, we will also present several
case studies: recently published works and projects, for which we study
the machine-learning problem, the data, and the proposed technology.
We will cover various topics in machine learning, including decision trees, random forests,
neural networks, boosting, and more. However, our focus is on the practical side;
therefore, a background in machine learning is required, and the introductory machine
learning course is a prerequisite for this course.
Class breakdown:
Class 1: Intro
Description: Background in machine learning: a quick reminder of linear/logistic regression and evaluation philosophy
Teaching material:
Slides 1
Slides 2
Notebook - numpy, pandas
Notebook - sklearn, bike sharing data
Notebook - linear regression
Notebook - logistic regression, ROC, AUC
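The evaluation part of Class 1 ends with ROC and AUC. The following is a minimal, dependency-free sketch, not the course notebook; the function name `roc_auc` and the toy scores are illustrative assumptions. It computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. This pairwise definition equals the area under the ROC curve. In the notebooks one would normally call `sklearn.metrics.roc_auc_score` instead.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 0/1, scores are real numbers (higher = more likely positive).
    Ties count as half a correct ranking. O(P * N), fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75: 3 of the 4 pairs are ordered correctly
```

The pairwise loop is quadratic. Sorting-based implementations compute the same value in O(n log n), which matters for large datasets.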
Classes 2-3: Feature handling, data exploration (case study)
Description: Feature cross and non-linear regression; handling different feature types (numeric, ordinal, categorical, string); exploring a dataset (types of visualization: contingency tables, normal/scatter plots, box plots); data imputation
Teaching material:
Slides 1
Slides 2 (TBD: data exploration, imputation)
Notebook - one-hot encoding
Notebook - feature cross
Notebook - feature cross on bike sharing
Case study (practical notebook): Notebook (TBD) using the dataset https://www.kaggle.com/osmi/mental-health-in-tech-survey
Topics covered: encoding different types of features, data exploration (contingency tables, the distributions of different features, and building intuition about what to look for in the data), data cleansing (imputation)
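Here is a rough sketch of three of the feature-handling steps listed above, written in plain Python so that each step is explicit: median imputation of a numeric column, one-hot encoding of a categorical column, and a feature cross of two categorical columns. The toy columns are loosely modeled on the bike sharing data's season and working-day fields. They and all function names are hypothetical. With pandas, the first two steps would typically be `series.fillna(series.median())` and `pandas.get_dummies`.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns.

    Returns (categories, rows), with categories in sorted order.
    """
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def feature_cross(a, b):
    """Cross two categorical columns into one column of combined categories."""
    return [f"{x}_{y}" for x, y in zip(a, b)]


ages = [25, None, 40, 31]                           # hypothetical numeric column
season = ["summer", "winter", "summer", "spring"]   # hypothetical categorical column
workingday = ["yes", "no", "no", "yes"]             # hypothetical categorical column

print(impute_median(ages))                # [25, 31, 40, 31]
print(one_hot(season))
print(feature_cross(season, workingday))  # ['summer_yes', 'winter_no', 'summer_no', 'spring_yes']
```

A crossed column can then be one-hot encoded like any other categorical feature. This lets a linear model learn an effect for each combination, such as "summer and a working day", rather than only one effect per feature.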
Class 4: Overfitting and regularization (2 case studies)
Description: Bias/variance, feature selection, L1/L2 regularization
Assignment: Ex 1 (out)
Teaching material:
Slides 1
Notebook - regularization
Case study (paper): Development and validation of a predictive model for detection of colorectal cancer in primary care by analysis of complete blood counts: a binational retrospective study
Case study (practical notebook): TBD - a notebook about tuning the regularization parameter (inspired by https://www.kaggle.com/kashnitsky/topic4-linear-models-part-3-regularization)
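The difference between L1 and L2 regularization is easiest to see in the simplest case: a single feature, no intercept, and squared-error loss. Setting the derivative to zero (for L1, the subgradient) gives closed forms. The L1 solution is a soft threshold, which is why L1 can drive a coefficient exactly to zero and acts as feature selection. This toy setup and the names `ridge_1d` and `lasso_1d` are assumptions for illustration. Note that sklearn's `Lasso` scales its squared loss by 1/(2n), so its `alpha` is not the same quantity as `lam` here.

```python
def ridge_1d(x, y, lam):
    """argmin_w sum((y - w*x)^2) + lam * w^2  (one feature, no intercept).

    Closed form: w = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def lasso_1d(x, y, lam):
    """argmin_w sum((y - w*x)^2) + lam * |w|  (one feature, no intercept).

    Closed form (soft-thresholding): w = sign(sxy) * max(|sxy| - lam/2, 0) / sxx.
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # exactly y = 2x, so the unregularized slope is 2
for lam in [0.0, 14.0, 28.0, 56.0]:
    print(lam, ridge_1d(x, y, lam), lasso_1d(x, y, lam))
```

Here sum(x*y) = 28 and sum(x^2) = 14. As `lam` grows, the ridge slope shrinks smoothly (2.0, 1.0, about 0.667, 0.4) but never reaches zero. The lasso slope (2.0, 1.5, 1.0, 0.0) becomes exactly zero once `lam / 2` reaches 28.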
Class 5: Multiclass classification, intro to neural networks (reminder)
Description: Basic image representation, convolution, max entropy (softmax) classifier, intro to feed-forward networks and conv nets, introduction to PyTorch with examples
Assignments: Ex 1 (in), Ex 2 (out)
Teaching material:
Slides 1
Slides 2
Slides 3 (intro to feed-forward and conv nets - TBD)
Slides 4 - GPU vs. CPU
Notebook - image classification
Notebook - PyTorch tutorial
Notebook - simple FF network
Notebook - CIFAR10 with conv net
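The sketch below covers two of the building blocks listed for Class 5, in plain Python and assuming tiny inputs given as nested lists. The first is the softmax (max entropy) classifier's output layer with its cross-entropy loss. The second is a "valid" 2D convolution. As in most deep learning libraries, including PyTorch's `torch.nn.Conv2d`, the "convolution" here is technically cross-correlation: the kernel is not flipped. The function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation (no padding, stride 1) of a grayscale image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


print(softmax([2.0, 1.0, 0.1]))                 # class probabilities, summing to 1
print(cross_entropy([2.0, 1.0, 0.1], target=0))

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]
print(conv2d_valid(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0]]: responds at the dark-to-bright edge
```

Subtracting the maximum logit does not change the softmax output, but it prevents `math.exp` from overflowing on large logits such as 1000.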
Class 6: Case studies
Description: Presentation of two (or more) works that use deep learning to predict medical conditions:
1. International evaluation of an AI system for breast cancer screening, Nature 2020
2. A clinically applicable approach to continuous prediction of future acute kidney injury, Nature 2019
Classes 7-8: Predicting with trees (case study)
Description: Trees, random forests, bagging, boosting (AdaBoost, gradient boosting, XGBoost). Subtopics: optimization with grid search, input normalization
Assignments: Ex 2 (in), Ex 3 (out)
Teaching material:
Slides 1
Notebook - housing data with trees
Case study (paper): Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption
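Every decision tree, and therefore every random forest and boosted ensemble, rests on a search for the split that best separates the labels. Below is a minimal sketch for one numeric feature. It assumes Gini impurity as the criterion (the default in sklearn's `DecisionTreeClassifier`) and uses midpoints between sorted distinct values as candidate thresholds. The function names are hypothetical. A full tree repeats this search over all features and then recursively on each side. A random forest does the same on bootstrap samples with random feature subsets.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x, y):
    """Threshold on one numeric feature that minimizes weighted Gini impurity.

    Candidates are midpoints between consecutive distinct sorted values;
    samples with x <= threshold go left. Returns (threshold, weighted_impurity),
    or (None, gini(y)) if no split improves on leaving the node unsplit.
    """
    values = sorted(set(x))
    best = (None, gini(y))  # baseline: no split
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (t, score)
    return best


print(best_split([1, 2, 3, 4], [0, 0, 1, 1]))  # (2.5, 0.0): a perfect split
```

A grid search, such as the one covered in these classes, would then wrap the tree-growing procedure and try combinations of hyperparameters such as maximum depth, keeping the combination that scores best on held-out data.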
Classes 9-10: Time series (case study)
Description: Definition, univariate/multivariate series, stochastic processes, stationarity, seasonality, moving average, exponential smoothing, time-series clustering techniques (e.g., hierarchical clustering, DTW, Ward's method, self-organizing maps (SOM))
Assignments: Ex 3 (in), Ex 4 (out)
Teaching material:
Slides 1 (TBD)
Notebook (TBD), using the dataset: https://www.kaggle.com/c/dsg-hackathon/data
Case study (paper): The emotional arcs of stories are dominated by six basic shapes (EPJ Data Science, 2016)
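Three of the time-series techniques listed above are sketched here in plain Python. The first is a trailing moving average. The second is simple exponential smoothing (s_0 = x_0, s_t = αx_t + (1 - α)s_{t-1}). The third is the dynamic time warping (DTW) distance, which clustering methods such as hierarchical clustering can use instead of Euclidean distance when series have similar shapes but are shifted or stretched in time. The initialization s_0 = x_0, the absolute-difference local cost, and the function names are assumptions of this sketch. The hackathon dataset above is not used.

```python
def moving_average(series, window):
    """Trailing simple moving average; returns len(series) - window + 1 points."""
    return [sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha*x_t + (1 - alpha)*s_{t-1}."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def dtw_distance(a, b):
    """Dynamic time warping distance with |a_i - b_j| as the local cost.

    cost[i][j] is the cheapest alignment of a[:i] with b[:j]; each step may
    advance in a, in b, or in both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


print(moving_average([1, 2, 3, 4, 5], 3))           # [2.0, 3.0, 4.0]
print(exponential_smoothing([1.0, 2.0, 3.0], 0.5))  # [1.0, 1.5, 2.25]
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))        # 0.0: same shape, one point repeated
```

A Euclidean distance cannot compare the last two series directly because their lengths differ. DTW warps the time axis so that the repeated 2 aligns with a single 2.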
Class 11: Text analysis
Description: Text representation, embeddings, tagging and classification with RNNs
Teaching material:
Slides 1 - TF-IDF
Slides 2 - LSA, embeddings
Notebook - tweet classification with trees
Slides 3 - RNN (TBD)
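As a rough illustration of the TF-IDF representation from Slides 1, here is a plain-Python sketch. It uses one common variant, tf = count / document length and idf = log(N / df), and assumes the documents are already tokenized. Libraries differ in the details: sklearn's `TfidfVectorizer`, for example, smooths the idf and L2-normalizes each row by default, so its numbers will not match these. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """TF-IDF weights for tokenized documents (one dict of term -> weight per doc).

    Variant used here: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: each term counted once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {term: (c / len(doc)) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return weights


docs = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "a cat and a dog".split(),
]
for w in tf_idf(docs):
    print({t: round(v, 3) for t, v in sorted(w.items())})
```

A term that appears in every document, such as "a" in a corpus where every document contains it, gets idf = log(1) = 0 and therefore zero weight. This is the intended effect: ubiquitous words carry little information for telling documents apart.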
Class 12: Common deep learning architectures and their applications
Description: Encoder-decoder architectures: image captioning; seq2seq: translation; attention models and the Transformer
Assignment: Ex 4 (in)
Teaching material: TBD
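The teaching material for Class 12 is still TBD. As a hedged illustration of the attention models it lists, the sketch below implements single-query scaled dot-product attention: the scores q·k_i / √d go through a softmax, and the resulting weights average the value vectors. This is the core operation of the Transformer. The sketch is plain Python, processes one query at a time, and omits the learned projections and multiple heads of a real Transformer layer. The function name is illustrative.

```python
import math


def scaled_dot_product_attention(query, keys, values):
    """Single-query attention: softmax(q . k_i / sqrt(d)) weights over value vectors.

    Returns (output_vector, attention_weights).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim_v = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return output, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out, weights = scaled_dot_product_attention(query, keys, values)
print(weights)  # the first key matches the query, so it receives the larger weight
print(out)      # hence the output leans toward the first value vector
```

In an encoder-decoder model, the query comes from the decoder state and the keys and values come from the encoder outputs, such as image regions in captioning or source tokens in translation.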
Class 13: Case study and project proposals
Description: Case study (paper): On the Automatic Generation of Medical Imaging Reports, ACL 2018. Project proposals and discussion.
Assignment: Project ideas (out)
Prerequisites:
89511 - Introduction to machine learning
Grade structure:
4 home assignments - 40%
Final project - 60%