CSCI598A: Robot Intelligence
Apr. 07, 2015
Technical Details of PfD

Technical Details of PfD
• Input Trajectories

Technical Details of PfD
• Trajectories are modeled as GMMs.
• The joint probability p(s, t) is encoded in a GMM, which is a continuous model.

Technical Details of PfD
• Trajectories are generated using Gaussian mixture regression (GMR).
• GMR is used to retrieve p(s | t), namely the expected position at each time step.

Technical Details of PfD
• Another example

Technical Details of PfD
• Another example

Technical Details of PfD
Have we solved the problem?
• How to estimate the parameters of Gaussian Mixture Models?
• How to estimate the number of Gaussian components?
• How to align trajectories from demonstrations with different speeds?
• How to address the curse of dimensionality?

Estimate parameters of GMMs
• Maximum likelihood:
• We have a density function p(x | \Theta) that is governed by the set of parameters \Theta.
• We also have a data set of size N, supposedly drawn from this distribution, X = \{x_1, \ldots, x_N\}, whose samples are independent and identically distributed.
• The resulting density for the samples is
  p(X | \Theta) = \prod_{i=1}^{N} p(x_i | \Theta) = L(\Theta | X).
• This function is called the likelihood of the parameters given the data, or just the likelihood function.

Estimate parameters of GMMs
• Maximum likelihood:
• Likelihood function: L(\Theta | X) = \prod_{i=1}^{N} p(x_i | \Theta).
• The likelihood is thought of as a function of the parameters \Theta, where the data X is fixed.
• In the maximum-likelihood problem, our goal is to find the \Theta^* that maximizes L(\Theta | X):
  \Theta^* = \arg\max_{\Theta} L(\Theta | X).
• Often we maximize \log L(\Theta | X) instead because it is analytically easier.

Estimate parameters of GMMs
• Does maximum likelihood work for GMMs?

Estimate parameters of GMMs
• Does maximum likelihood work for GMMs?
• The answer is no, because we do not know which Gaussian component each data point was drawn from, so the likelihood cannot be maximized in closed form.

Estimate parameters of GMMs
https://www.youtube.com/watch?v=REypj2sy_5U

Estimate parameters of GMMs
• Expectation-Maximization
• The EM (Expectation-Maximization) algorithm is an iterative algorithm that starts from some initial estimate of \Theta (e.g., random) and then proceeds to iteratively update \Theta until convergence is detected.
• Each iteration consists of an E-step and an M-step.
• Given the GMM
  p(x | \Theta) = \sum_{k=1}^{K} \alpha_k \, \mathcal{N}(x | \mu_k, \Sigma_k),
  where the \alpha_k are the mixture weights satisfying \sum_{k=1}^{K} \alpha_k = 1.

Estimate parameters of GMMs
• Expectation-Maximization
• E-step: Given the current parameter values \Theta, compute the membership weight of each data point x_i in each component k:
  w_{ik} = \frac{\alpha_k \, \mathcal{N}(x_i | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \alpha_j \, \mathcal{N}(x_i | \mu_j, \Sigma_j)},
  where \mathcal{N}(x | \mu, \Sigma) denotes the Gaussian density.

Estimate parameters of GMMs
• Expectation-Maximization
• M-step: Given the current membership weights and the data, compute new parameter values (a code sketch of both steps follows below):
  N_k = \sum_{i=1}^{N} w_{ik}, \qquad \alpha_k^{new} = \frac{N_k}{N},
  \mu_k^{new} = \frac{1}{N_k} \sum_{i=1}^{N} w_{ik} \, x_i, \qquad
  \Sigma_k^{new} = \frac{1}{N_k} \sum_{i=1}^{N} w_{ik} (x_i - \mu_k^{new})(x_i - \mu_k^{new})^{\top}.
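As a companion to the E-step and M-step above, here is a minimal NumPy sketch of EM for a GMM. It is my own illustration rather than the lecture's code: the function name em_gmm, the random-mean initialization, the small diagonal regularization of the covariances, and the log-likelihood convergence test are all assumed choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    """Fit a K-component GMM to data X (N x D) with plain EM."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialization: uniform mixture weights, random means, shared data covariance.
    alpha = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    prev_ll = -np.inf

    for _ in range(n_iter):
        # E-step: dens[i, k] = alpha_k * N(x_i | mu_k, Sigma_k); normalize to memberships.
        dens = np.column_stack([
            alpha[k] * multivariate_normal.pdf(X, mu[k], Sigma[k]) for k in range(K)
        ])
        w = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and covariances from the memberships.
        Nk = w.sum(axis=0)                      # effective number of points per component
        alpha = Nk / N
        mu = (w.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (w[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)

        # Convergence check on the (incomplete-data) log-likelihood.
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    return alpha, mu, Sigma, ll
```

Example usage: alpha, mu, Sigma, ll = em_gmm(np.random.randn(500, 2), K=3). In practice a library implementation such as sklearn.mixture.GaussianMixture would normally be used instead.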
Technical Details of PfD
Have we solved the problem?
• How to estimate the parameters of Gaussian Mixture Models?
• How to estimate the number of Gaussian components?
• How to align trajectories from demonstrations with different speeds?
• How to address the curse of dimensionality?

Estimate the number of Gaussian components
• Model selection:
• Definition: Given different models (defined by different hyper-parameter values), select the best model (i.e., the hyper-parameter values resulting in the best performance).
• Many methods have been introduced for different purposes:
• Cross-validation
• Structural risk minimization
• Bayesian information criterion
• ...

Estimate the number of Gaussian components
• Bayesian information criterion (BIC):
• BIC is a criterion for model selection among a finite set of models:
  \mathrm{BIC} = -2 \ln L + n_p \ln N,
  where \ln L is the log-likelihood of the trained model, n_p is the number of free parameters (which grows with the number of Gaussian components K), and N is the number of data points.
• The BIC selects hyper-parameters by balancing model accuracy against model complexity.
• The model with the lowest BIC is preferred.
• It is widely used to estimate the number of Gaussian components, because
• a GMM has a discrete set of candidate models (defined by K), and
• the likelihood function of the GMM is directly used for parameter estimation.
• (A code sketch of BIC-based selection appears at the end of these notes.)

Technical Details of PfD
Have we solved the problem?
• How to estimate the parameters of Gaussian Mixture Models?
• How to estimate the number of Gaussian components?
• How to align trajectories from demonstrations with different speeds?
• How to address the curse of dimensionality?

Align Trajectories
• Dynamic Time Warping (DTW)
• DTW is a time-series alignment algorithm developed originally for speech recognition.
• It aims at aligning two sequences by warping the time axis iteratively until an optimal match between the two sequences is found.
• Consider two sequences of feature vectors:
  X = (x_1, x_2, \ldots, x_n) and Y = (y_1, y_2, \ldots, y_m).

Align Trajectories
• Dynamic Time Warping (DTW)
• The two sequences can be arranged on the sides of a grid, one along the top and the other along the left-hand side.
• Both sequences start at the bottom left of the grid.
• Inside each cell a distance measure can be placed, comparing the corresponding elements of the two sequences.
• To find the best match or alignment between these two sequences, one needs to find a path through the grid that minimizes the total distance between them.
• This shortest path can be found using dynamic programming (a code sketch appears at the end of these notes).

Align Trajectories
• Dynamic Time Warping (DTW)

Technical Details of PfD
Have we solved the problem?
• How to estimate the parameters of Gaussian Mixture Models?
• How to estimate the number of Gaussian components?
• How to align trajectories from demonstrations with different speeds?
• How to address the curse of dimensionality?

Address curse of dimensionality
• Dimension reduction: Principal Component Analysis (a code sketch appears at the end of these notes)
https://www.youtube.com/watch?v=_bSMW1Q9_Ks
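As a companion to the BIC slide above, here is a minimal sketch of choosing the number of Gaussian components by BIC. Using scikit-learn's GaussianMixture here is my own assumption (the lecture does not prescribe a library); its bic() method implements the standard criterion, with lower values preferred.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_num_components(X, k_range=range(1, 11), seed=0):
    """Fit a GMM for each candidate K and keep the one with the lowest BIC."""
    best_gmm, best_bic = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type='full',
                              random_state=seed).fit(X)
        bic = gmm.bic(X)  # -2 log-likelihood + (number of free parameters) * log N
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic
    return best_gmm, best_bic
```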
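For the DTW grid described in the trajectory-alignment slides, here is a minimal dynamic-programming sketch. The Euclidean local distance and the three predecessor moves (match, insertion, deletion) are standard choices I am assuming; the slides do not fix a particular local distance or step pattern.

```python
import numpy as np

def dtw(X, Y):
    """Align two sequences of feature vectors X (n x d) and Y (m x d) with DTW.

    Returns the total alignment cost; D[i, j] is the minimum cumulative
    distance for matching the first i elements of X with the first j of Y.
    """
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])  # local distance in this grid cell
            # Each cell extends the cheapest of the three allowed predecessor moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

The warping path itself can be recovered by backtracking through D from cell (n, m) to (0, 0), which is how demonstrations recorded at different speeds would be re-timed before GMM encoding.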
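Finally, for the dimension-reduction slide, here is a minimal PCA sketch based on the SVD of the centered data matrix; the function name and return values are my own choices, not anything specified in the lecture.

```python
import numpy as np

def pca(X, n_components):
    """Project X (N x D) onto its top n_components principal directions."""
    X_centered = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                      # (n_components x D)
    explained_var = (S[:n_components] ** 2) / (len(X) - 1)
    return X_centered @ components.T, components, explained_var
```

Keeping only the first few principal directions reduces the dimensionality of the demonstrated trajectories before they are modeled with a GMM.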