CS 6140: Machine Learning Time and Location Course Webpage

2/26/16
CS 6140: Machine Learning
Spring 2016
Instructor: Lu Wang
College of Computer and Information Science
Northeastern University
Webpage: www.ccs.neu.edu/home/luwang
Email: luwang@ccs.neu.edu
Time and Location
•  Time: Thursdays from 6:00pm – 9:00pm
•  Location: Behrakis Health Sciences Center 325
Course Webpage
•  http://www.ccs.neu.edu/home/luwang/courses/cs6140_sp2016.html
Prerequisites
•  Programming
–  Being able to write code in some programming languages (e.g. Python, Java, C/C++) proficiently
•  Courses
–  Algorithms
–  Probability and statistics
–  Linear algebra
•  Some knowledge in machine learning
•  Background Check
Textbook and References
•  Main Textbook
–  Kevin Murphy, "Machine Learning - a Probabilistic Perspective", MIT Press, 2012.
•  Other textbooks
–  Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
–  Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
•  Machine learning lectures
Content of the Course
•  Regression: linear regression, logistic regression
•  Dimensionality Reduction: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis
•  Probabilistic Models: Naive Bayes, maximum likelihood estimation, Bayesian inference
•  Statistical Learning Theory: VC dimension
•  Kernels: Support Vector Machines (SVMs), kernel tricks, duality
•  Sequential Models and Structural Models: Hidden Markov Model (HMM), Conditional Random Fields (CRFs)
•  Clustering: spectral clustering, hierarchical clustering
•  Latent Variable Models: K-means, mixture models, expectation-maximization (EM) algorithms, Latent Dirichlet Allocation (LDA), representation learning
•  Deep Learning: feedforward neural network, restricted Boltzmann machine, autoencoders, recurrent neural network, convolutional neural network
•  and others, including advanced topics for machine learning in natural language processing and text analysis
The Goal
•  Not only what, but also why!
Grading
•  Assignments
–  3 assignments, 10% for each
•  Exam
–  1 exam, 30%
–  March 31, 2016
•  Project
–  1 project, 35%
•  Participation
–  5%
–  Classes
–  Piazza
Exam
•  Open book
•  March 31, 2016
Course Project
•  Option 1: A machine learning relevant research project
•  Option 2: Yelp Challenge
•  2-3 students as a team
Research Project
•  Machine learning relevant
–  Natural language processing
–  Computer vision
–  Robotics
–  Bioinformatics
–  Health informatics
–  …
•  Novelty
Yelp Challenge
Course Project Grading
•  We want to see novel and interesting projects!
–  The problem needs to be well-defined, novel, useful, and practical
–  Machine learning techniques
–  Reasonable results and observations
•  Three reports
–  Proposal (5%)
–  Progress, with code (10%)
–  Final, with code (10%)
•  One presentation
–  In class (10%)
Submission and Late Policy
•  Each assignment or report, both electronic copy and hard copy, is due at the beginning of class on the corresponding due date.
•  Electronic version
–  On Blackboard
•  Hard copy
–  In class
•  An assignment or report turned in late will be charged 10 points (out of 100 points) off for each late day (i.e. 24 hours).
•  Each student has a budget of 5 days throughout the semester before a late penalty is applied.
How to find us?
•  Course webpage:
–  http://www.ccs.neu.edu/home/luwang/courses/cs6140_sp2016.html
•  Office hours
–  Lu Wang: Thursdays from 4:30pm to 5:30pm, or by appointment, 448 WVH
–  Gabriel Bakiewicz (TA): Mondays and Tuesdays from 4:00pm to 5:00pm, 362 WVH
•  Piazza
–  http://piazza.com/northeastern/spring2016/cs6140
–  All course relevant questions go here
What is Machine Learning?
•  “A set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty.”
Real World Applications
Relations with Other Areas
•  Natural Language Processing
•  Computer Vision
•  Robotics
•  A lot of other areas…
Today’s Outline
•  Basic concepts in machine learning
•  K-nearest neighbors
•  Linear regression
•  Ridge regression
Supervised vs. Unsupervised Learning
•  Supervised learning
–  Training set, training sample, gold-standard label
–  Classification, if categorical
–  Regression, if numerical
Supervised Learning
•  Goal:
–  Generalizable to new input samples
–  Overfitting vs. underfitting
–  We use probabilistic models
•  Typical setup:
–  Training set, test set, development set
–  Features
–  Evaluation
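The typical setup above (training set, development set, test set) can be sketched as a simple random split. This is a minimal illustration, not the course's prescribed procedure: the function name `split_dataset`, the 80/10/10 proportions, and the toy data are all assumptions made for the example.

```python
import random

def split_dataset(samples, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle samples and split them into training, development, and test sets."""
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    n_test = round(len(shuffled) * test_fraction)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test

# Toy data: each sample is an (input, label) pair.
data = [(x, x % 2) for x in range(100)]
train, dev, test = split_dataset(data)
print(len(train), len(dev), len(test))
```

The development set is used for tuning choices such as K in K-nearest neighbors, so that the test set is only touched for the final evaluation.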
Supervised Learning
•  Regression
–  Predicting stock price
–  Predicting temperature
–  Predicting revenue
–  …
Supervised vs. Unsupervised Learning
•  Unsupervised learning
–  More about “knowledge discovery”
Unsupervised Learning
•  Dimension reduction
–  Principal component analysis
Unsupervised Learning
•  Clustering (e.g. graph mining)
–  RolX: Role Extraction and Mining in Large Networks, by Henderson et al, 2011
•  Topic modeling
Parametric vs. Non-parametric model
•  Fixed number of parameters?
–  If yes, parametric model
•  Number of parameters grows with the amount of training data?
–  If yes, non-parametric model
A non-parametric classifier: K-nearest neighbors (KNN)
•  Basic idea: memorize all the training samples
–  The more you have in training data, the more the model has to remember
•  Nearest neighbor:
–  Testing phase: find the closest sample, and return the corresponding label
•  K-nearest neighbors:
–  Testing phase: find the K nearest neighbors, and return the majority vote of their labels
•  Computational tractability
About K
•  K=1: just piecewise constant labeling
•  K=N: global majority vote (class)
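The K-nearest-neighbor rule and the two extremes of K described above can be sketched in a few lines. This is a minimal brute-force version under stated assumptions: the function name `knn_predict`, the use of Euclidean distance, and the toy points are illustrative choices, and ties in the vote are broken by whichever label was counted first.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every stored training sample.
    distances = [math.dist(p, query) for p in train_points]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: two "a" points near the origin, three "b" points near (1, 1).
points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
labels = ["a", "a", "b", "b", "b"]
query = (0.2, 0.1)

nearest_label = knn_predict(points, labels, query, k=1)            # closest sample only
local_label = knn_predict(points, labels, query, k=3)              # vote among 3 neighbors
global_label = knn_predict(points, labels, query, k=len(points))   # K = N
print(nearest_label, local_label, global_label)
```

With K equal to the number of training samples, every query receives the globally most frequent label regardless of where it lies, which is why the query near the two "a" points is labeled "b" in the last call.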
Problems of kNN
•  Can be slow when training data is big
–  Searching for the neighbors takes time
•  Needs lots of memory to store training data
•  Needs to tune k and distance function
•  Not a probability distribution
•  Distance function
–  Euclidean distance
–  Mahalanobis distance: weights on components
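To make the two distance functions concrete, here is a small sketch using NumPy. It assumes the usual definition of the Mahalanobis distance, sqrt((x - y)^T S^{-1} (x - y)) for a covariance matrix S; the function names and the example matrices are illustrative, not taken from the slides.

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance between two vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def mahalanobis(x, y, cov):
    """Mahalanobis distance sqrt((x - y)^T cov^{-1} (x - y))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))

x, y = [0.0, 0.0], [3.0, 4.0]
identity = np.eye(2)               # with the identity, Mahalanobis equals Euclidean
stretched = np.diag([9.0, 1.0])    # first component has variance 9, so it counts less

d_euclid = euclidean(x, y)
d_identity = mahalanobis(x, y, identity)
d_stretched = mahalanobis(x, y, stretched)
print(d_euclid, d_identity, d_stretched)
```

A diagonal matrix such as `stretched` acts as per-component weights, which matches the "weights on components" reading above; a full covariance matrix additionally accounts for correlations between components.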
Probabilistic kNN
•  We prefer a probabilistic output because sometimes we may get an “uncertain” result
–  99 samples as “yes”, 101 samples as “no” → ?
•  Probabilistic kNN:
[Figure: 3-class synthetic training data]
Smoothing
•  Class 1: 3, class 2: 0, class 3: 1
•  Original probability:
–  P(y=1) = 3/4, P(y=2) = 0/4, P(y=3) = 1/4
•  Add-1 smoothing:
–  Class 1: 3+1, class 2: 0+1, class 3: 1+1
–  P(y=1) = 4/7, P(y=2) = 1/7, P(y=3) = 2/7
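The smoothing arithmetic above can be reproduced with a short sketch. It assumes the probabilistic kNN output is the fraction of the K neighbors carrying each label, with an optional pseudo-count added to every class; the function name `class_probabilities` and the `pseudo_count` parameter are illustrative names, not from the slides.

```python
from collections import Counter
from fractions import Fraction

def class_probabilities(neighbor_labels, classes, pseudo_count=0):
    """Turn the labels of the K nearest neighbors into class probabilities."""
    counts = Counter(neighbor_labels)
    # Each class receives pseudo_count extra votes, so the total grows accordingly.
    total = len(neighbor_labels) + pseudo_count * len(classes)
    return {c: Fraction(counts[c] + pseudo_count, total) for c in classes}

# K = 4 neighbors: three from class 1, none from class 2, one from class 3.
neighbors = [1, 1, 1, 3]
classes = [1, 2, 3]

original = class_probabilities(neighbors, classes)
smoothed = class_probabilities(neighbors, classes, pseudo_count=1)
print(original)
print(smoothed)
```

Exact fractions are used so the results can be compared directly with the 3/4, 0/4, 1/4 and 4/7, 1/7, 2/7 values on the slide. After smoothing, class 2 no longer has probability zero even though none of the neighbors belong to it.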
Softmax
•  Class 1: 3, class 2: 0, class 3: 1
•  Original probability:
–  P(y=1) = 3/4, P(y=2) = 0/4, P(y=3) = 1/4
•  Redistribute probability mass into different classes
–  Define a softmax as
A parametric classifier: linear regression
•  Assumption: the response is a linear function of the inputs
–  Inner product between input sample X and weight vector W
–  Residual error: difference between prediction and true label
•  Assume residual error has a normal distribution
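The slide formulas did not survive extraction. The linear model with normally distributed residual error is conventionally written as below; the symbols (weights w, input x, noise variance sigma squared, parameters theta = (w, sigma squared)) follow the notation of the main textbook and are assumed rather than recovered from the slides.

```latex
y = \mathbf{w}^\top \mathbf{x} + \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, \sigma^2)
\quad\Longrightarrow\quad
p(y \mid \mathbf{x}, \boldsymbol{\theta})
  = \mathcal{N}\!\left(y \mid \mathbf{w}^\top \mathbf{x},\, \sigma^2\right)
```

Here the inner product of w and x is the prediction, and epsilon is the residual error between the prediction and the true label.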
•  We can further assume
•  Basis function expansion
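Basis function expansion replaces the raw input x with a feature vector phi(x), so the model stays linear in the weights while becoming non-linear in x. The sketch below assumes a polynomial basis phi(x) = [1, x, x^2, ..., x^d], which is one common choice; the function name `polynomial_basis` and the noise-free toy data are illustrative.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Map scalar inputs x to the features [1, x, x^2, ..., x^degree]."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x ** 2          # noise-free quadratic, for illustration only
Phi = polynomial_basis(x, degree=2)        # design matrix of shape (20, 3)

# Fit the weights by least squares on the expanded features.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)
```

Because the data here are an exact quadratic and the basis has degree 2, the fitted weights match the generating coefficients up to floating-point error.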
Learning with Maximum Likelihood Estimation (MLE)
•  Maximum Likelihood Estimation (MLE)
•  Log-likelihood
•  Maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood (NLL)
•  With our normal distribution assumption
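The equations for these steps were lost in extraction. A standard way to write them, assuming N independent training pairs (x_i, y_i) in a dataset D and the Gaussian noise model with variance sigma squared, is:

```latex
\begin{aligned}
\hat{\boldsymbol{\theta}} &= \arg\max_{\boldsymbol{\theta}} \log p(\mathcal{D} \mid \boldsymbol{\theta}) \\
\ell(\boldsymbol{\theta}) &= \sum_{i=1}^{N} \log p(y_i \mid \mathbf{x}_i, \boldsymbol{\theta}),
\qquad \mathrm{NLL}(\boldsymbol{\theta}) = -\ell(\boldsymbol{\theta}) \\
\ell(\boldsymbol{\theta}) &= \sum_{i=1}^{N} \log\!\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
   \exp\!\left( -\frac{(y_i - \mathbf{w}^\top \mathbf{x}_i)^2}{2\sigma^2} \right) \right]
 = -\frac{1}{2\sigma^2}\,\mathrm{RSS}(\mathbf{w}) - \frac{N}{2}\log(2\pi\sigma^2) \\
\mathrm{RSS}(\mathbf{w}) &= \sum_{i=1}^{N} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2
\end{aligned}
```

For a fixed noise variance, maximizing the log-likelihood over w is therefore the same as minimizing the residual sum of squares.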
Derivation of MLE for Linear Regression
•  Rewrite our objective function as
•  Residual sum of squares (RSS) → We want to minimize it!
•  Get the derivative (or gradient)
•  Set our derivative to 0
–  Ordinary least squares solution
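The three steps above can be written out as follows. This is the standard derivation rather than a recovered copy of the slide: X is assumed to be the N-by-D matrix whose rows are the training inputs, y the vector of N labels, and the objective is taken as half the residual sum of squares, which equals the NLL up to terms and a positive scale factor that do not depend on w.

```latex
\begin{aligned}
\mathrm{NLL}(\mathbf{w}) &= \tfrac{1}{2}(\mathbf{y} - \mathbf{X}\mathbf{w})^\top(\mathbf{y} - \mathbf{X}\mathbf{w})
 = \tfrac{1}{2}\mathbf{w}^\top(\mathbf{X}^\top\mathbf{X})\mathbf{w}
   - \mathbf{w}^\top(\mathbf{X}^\top\mathbf{y})
   + \tfrac{1}{2}\mathbf{y}^\top\mathbf{y} \\
\nabla_{\mathbf{w}}\,\mathrm{NLL}(\mathbf{w}) &= \mathbf{X}^\top\mathbf{X}\mathbf{w} - \mathbf{X}^\top\mathbf{y} \\
\mathbf{X}^\top\mathbf{X}\mathbf{w} = \mathbf{X}^\top\mathbf{y}
 \quad&\Longrightarrow\quad
 \hat{\mathbf{w}}_{\mathrm{OLS}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}
\end{aligned}
```

The last step assumes that X^T X is invertible, which holds when the columns of X are linearly independent.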
Geometric Interpretation
•  Remember we have
•  Therefore the projected value of y is
→ This corresponds to an orthogonal projection of y onto the column space of X
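The projection claim can be checked numerically. The sketch below assumes the ordinary least squares estimate w_hat = (X^T X)^{-1} X^T y, so that the projected value is y_hat = X w_hat; the synthetic data, the seed, and the variable names are illustrative. If y_hat is an orthogonal projection onto the column space of X, the residual y - y_hat must be orthogonal to every column of X.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                 # 50 samples, 3 features
w_true = np.array([2.0, -1.0, 0.5])          # hypothetical ground-truth weights
y = X @ w_true + 0.1 * rng.normal(size=50)   # Gaussian residual error

# Ordinary least squares: solve the normal equations (X^T X) w = X^T y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Projected value of y and the residual.
y_hat = X @ w_hat
residual = y - y_hat

# Each entry is the inner product of one column of X with the residual.
print(X.T @ residual)
```

The printed inner products are zero up to floating-point error, which is the orthogonality condition. Solving the normal equations with `np.linalg.solve` is fine for a small well-conditioned example; `np.linalg.lstsq` is the more robust choice in general.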
Overfitting
A Prior on the Weight
•  Zero-mean Gaussian prior
•  New objective function
Ridge Regression
•  We want to minimize
•  New estimation for the weight
•  L2 regularization
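A minimal sketch of ridge regression, assuming the objective is the residual sum of squares plus an L2 penalty, ||y - Xw||^2 + lam * ||w||^2, whose minimizer is w_hat = (lam * I + X^T X)^{-1} X^T y. With a zero-mean Gaussian prior of variance tau^2 on the weights, lam corresponds to sigma^2 / tau^2. The name `ridge_fit`, the symbol `lam`, and the synthetic data are assumptions for illustration, and this version penalizes all weights, with no separate unpenalized intercept.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (lam * I + X^T X)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(lam * np.eye(n_features) + X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 3.0]) + 0.1 * rng.normal(size=30)

w_ols = ridge_fit(X, y, lam=0.0)      # lam = 0 recovers ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)   # larger lam shrinks the weights toward zero
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The ridge weights have a smaller norm than the least squares weights, which is the shrinkage effect of the penalty. In practice lam would be tuned on the development set.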
What we learned
•  Basic concepts in machine learning
•  K-nearest neighbors – non-parametric
•  Linear regression – parametric
•  Ridge regression – parametric
Homework
•  Reading: Murphy ch1, ch2, and ch7
•  Sign up at Piazza
–  http://piazza.com/northeastern/spring2016/cs6140
•  Start thinking about course project and find a team!
–  Project proposal due Jan 28th