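# Learning a Discriminative Hidden Part Model for Human Action Recognition
Presenter: 李哲 (Li Zhe), JDL, Institute of Computing Technology, Chinese Academy of Sciences
April 23, 2010
2015/4/9

# Outline
• Authors
• Abstract
• Problem statement
• Background
• Paper structure
• Model overview
• Motion features
• Part model
• Experimental results
• Conclusion

# First author
Yang Wang
PhD student, School of Computer Science, Simon Fraser University, Canada
• Research interests: computer vision, statistical machine learning
• Publications: human action recognition and human pose detection; 2005~2010: PAMI(2), NIPS(3), ICCV(3), CVPR(3), ACCV(1), ECCV(1), ICML(1)
• Background: Ph.D. (recently finished), Simon Fraser University, Canada; M.S., University of Alberta, Canada; B.S., Harbin Institute of Technology, China
• Homepage: http://www.sfu.ca/~ywang12/

# Second author
Greg Mori
Assistant Professor, School of Computer Science, Simon Fraser University, Canada
• Research interests: human body pose estimation, pedestrian detection, activity recognition, object recognition, machine learning
• Awards: Outstanding Reviewer, ICCV 2009
• Background: Ph.D. (2004), University of California at Berkeley, US; B.S. (1999), University of Toronto, Canada
• Homepage: http://www.cs.sfu.ca/~mori/

# Abstract
We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (hCRF) for object recognition. Similar to hCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Different from object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying hCRF on local patches alone.

# Abstract (presenter's summary)
The paper proposes a part-based discriminative method that uses motion features to recognize human actions in video sequences. The method is inspired by the hidden conditional random field (hCRF) approach used in object recognition. As in that approach, a part model of the human action is built conditioned on the image; unlike it, global features and local patch features are combined to distinguish different actions. The experiments show that the model is comparable to other state-of-the-art methods, and that, for hCRF, combining global and local patch features performs better than using local patch features alone.

# Problem statement
• The task in action recognition: determine the type of action in a given video clip.
• From object detection to action recognition:
– Global template → global template (extended to space-time volumes)
– Bag of words → bag of words (extended to space-time volumes)
– Part model → ?? how should the part model be extended ??
• hCRF has been applied successfully in object recognition: Ariadna Quattoni, Michael Collins and Trevor Darrell. Conditional Random Fields for Object Recognition, NIPS 2005.
• This paper partially modifies that method and brings it to human action recognition.

# Hidden Markov model
• First-order chain hidden Markov model.
• The model estimates the joint probability, based on Bayes' rule:

$$p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(y)\,p(x \mid y)}{p(x)} \propto p(y)\,p(x \mid y)$$

• Independence assumption:

$$p(x \mid y) = \prod_i p(x_i \mid y_i)$$

# Conditional Random Fields
• Conditioned on all the observations.
• Definition: a graph structure $G$ over the label set is a CRF if and only if it satisfies Markovianity:

$$p(y_i \mid x, Y_{S \setminus \{y_i\}}) = p(y_i \mid x, Y_{N_i})$$

• Model: $p(y \mid x)$. The independence assumption is relaxed and the posterior distribution is modeled directly.
• To predict:

$$y^* = \arg\max_y p(y \mid x)$$

# Conditional Random Fields
According to the fundamental theorem of MRFs:

$$p(y \mid x) = \frac{1}{Z} \prod_{c \in C} \exp\big(-E(y_c \mid x)\big) = \frac{1}{Z} \exp\big(\Psi(y \mid x)\big)$$

If the graph structure $G = (V, E)$ of the label set forms a tree:

$$\Psi(y \mid x; \theta) = \sum_{(i,j) \in E,\,k} \lambda_k f_k(y_i, y_j, x) + \sum_{i \in V,\,k} \mu_k g_k(y_i, x)$$

$$p(y \mid x; \theta) = \frac{1}{Z(x)} \exp\Big(\sum_{(i,j) \in E,\,k} \lambda_k f_k(y_i, y_j, x) + \sum_{i \in V,\,k} \mu_k g_k(y_i, x)\Big)$$

• $f_k, g_k$: association and interaction potentials, or features
• $\theta = (\lambda_1, \lambda_2, \lambda_3, \ldots; \mu_1, \mu_2, \mu_3, \ldots)$: model parameters

# Hidden Conditional Random Fields
The model:
• Hidden variables $h_i$ are introduced into the CRF.
• The hidden variables $h_i$ are related to each other through an undirected graph.
• The conditional probability is estimated directly.
The posterior: $p(y, h \mid x)$, with

$$p(y \mid x) = \sum_h p(y, h \mid x)$$

To predict:

$$y^* = \arg\max_y \sum_h p(y, h \mid x)$$

# Hidden Conditional Random Fields
• Given a potential function $\Psi(y, h, x; \theta)$.
• Conditional probabilistic model:

$$p(y, h \mid x) = \frac{\exp\big(\Psi(y, h, x; \theta)\big)}{\sum_{y', h'} \exp\big(\Psi(y', h', x; \theta)\big)}$$

$$p(y \mid x) = \frac{\sum_{h'} \exp\big(\Psi(y, h', x; \theta)\big)}{\sum_{y', h'} \exp\big(\Psi(y', h', x; \theta)\big)}$$

• If the graph forms a tree, the potential function takes the form:

$$\Psi(y, h, x; \theta) = \sum_{(i,j) \in E,\,k} \lambda_k f_k(h_i, h_j, y, x) + \sum_{i \in V,\,k} \mu_k g_k(h_i, y, x)$$

# CRF & hCRF resources
Papers and code, tracked and compiled by Hanna M. Wallach: http://www.inference.phy.cam.ac.uk/hmw26/crf/

# Paper structure
1. Introduction
2. Our Model
2.1 Motion feature
2.2 Hidden conditional random field (hCRF)
3. Learning and Inference
3.1 Learning root filter
3.2 Patch initialization
3.3 Inference
4. Experiments
4.1 Weizmann datasets
4.2 KTH datasets
5. Conclusion

# Model overview: Framework
Input videos → each frame → motion feature $x$ → hidden part model → class label $y$
The model maps the motion feature of one frame to a class label.

# Motion Feature: introduction to optical flow
• Optical flow describes the motion information in a video and can be regarded as a motion feature.
• Underlying assumption:
– The gray value of an image point corresponding to a point on a 3D object, and of its neighborhood, stays constant during motion (brightness constancy); only the position of the object changes.
• Let $I(x, y, t)$ be the brightness function of the video sequence, and suppose the image point $(x, y)$ at time $t$ moves to $(x + dx, y + dy)$ at time $t + dt$. Brightness constancy gives

$$I(x + dx, y + dy, t + dt) = I(x, y, t)$$

A first-order Taylor expansion, with the higher-order terms treated as approximately 0 and neglected, yields the optical flow equation

$$I_x v_x + I_y v_y + I_t = 0$$

where $v = (v_x, v_y)$ is called the optical flow field.

# Motion Feature: feature extraction
• Optical flow feature [1]
– Video preprocessing: a sub-window centered on the moving object of interest is cropped from every frame.
– Optical flow is computed with the Lucas-Kanade algorithm [2].
– The optical flow field $F$ is split into 4 channels:
• $F$ is split by direction into $F_x$, $F_y$.
• $F_x$ and $F_y$ are each split by sign into $F_x^+, F_x^-, F_y^+, F_y^-$.
• Gaussian blur and normalization are applied to each of $F_x^+, F_x^-, F_y^+, F_y^-$.
[1] Recognizing Action at a Distance, ICCV 2003
[2] An Iterative Image Registration Technique with an Application to Stereo Vision, IJCAI 1981

# Hidden Part Model (1)
Figure 1. Diagram of the model. Each vertex represents a variable; each small square represents a parameter of the model, describing how strongly 2 or 3 variables are related.
Notation:
• $I$: one frame of the video sequence, containing a set of salient patches $\{I_1, I_2, \ldots, I_m\}$.
• $x$: the motion feature of image $I$; the features of the patch regions of $I$ are $\{x_1, x_2, \ldots, x_m\}$.
• $y$: the action class label of the video that the frame belongs to. The set of labels is $\mathcal{Y}$.
• $h$: the hidden variables (part labels). The set of part labels is $\mathcal{H}$. Each patch $I_j$ has a corresponding $h_j$, so $\{I_1, I_2, \ldots, I_m\}$ corresponds to $\{h_1, h_2, \ldots, h_m\}$. The edges between $h_i$ and $h_j$ form the minimum spanning tree of a graph whose edge weights are the similarities between patches.
With this graphical model in place, the full hCRF machinery can be applied directly to build a mapping from $x$ to $y$.

# Hidden Part Model (2)
By hidden conditional random field theory:

$$p(y, h \mid x; \theta) = \frac{\exp\big(\Psi(y, h, x; \theta)\big)}{\sum_{\hat{y} \in \mathcal{Y}} \sum_{\hat{h} \in \mathcal{H}^m} \exp\big(\Psi(\hat{y}, \hat{h}, x; \theta)\big)}$$

where $\theta$ denotes the model parameters and $\Psi(y, h, x; \theta)$ is the potential function parameterized by $\theta$. Marginalizing over the hidden parts:

$$p(y \mid x; \theta) = \sum_{h \in \mathcal{H}^m} p(y, h \mid x; \theta) = \frac{\sum_{h \in \mathcal{H}^m} \exp\big(\Psi(y, h, x; \theta)\big)}{\sum_{\hat{y} \in \mathcal{Y}} \sum_{\hat{h} \in \mathcal{H}^m} \exp\big(\Psi(\hat{y}, \hat{h}, x; \theta)\big)}$$

# Hidden Part Model (3)
Assume $\Psi(y, h, x; \theta)$ is linear in the parameters $\theta = \{\alpha, \beta, \gamma, \eta\}$. Then:

$$\Psi(y, h, x; \theta) = \sum_{j \in V} \alpha^T \phi(x_j, h_j) + \sum_{j \in V} \beta^T \varphi(y, h_j) + \sum_{(j,k) \in E} \gamma^T \psi(y, h_j, h_k) + \eta^T \omega(y, x)$$

# Hidden Part Model (4)
$\alpha^T \phi(x_j, h_j)$: how likely the patch $x_j$ is labeled as part $h_j$.

$$\alpha^T \phi(x_j, h_j) = \sum_{c \in \mathcal{H}} \alpha_c^T \, 1_{\{h_j = c\}} \, [f^a(x_j) \;\; f^s(x_j)]$$

• Motion information: $f^a(x_j) = [Fb_x^+(x_j) \;\; Fb_x^-(x_j) \;\; Fb_y^+(x_j) \;\; Fb_y^-(x_j)]$
• Location information:
1. The image is divided uniformly into $L$ bins.
2. $f^s(x_j)$ is an $L$-dimensional vector. The $k$-th dimension of $f^s(x_j)$ is set to 1 if and only if $x_j$ falls in the $k$-th bin of the image; all other dimensions are 0.
Illustration of $f^s(x_j)$: $f^s(x_j) = [0, 0, 0, 1]$

# Hidden Part Model (5)
$\beta^T \varphi(y, h_j)$: how likely an image with label $y$ contains a patch with part label $h_j$.

$$\beta^T \varphi(y, h_j) = \sum_{a \in \mathcal{Y}} \sum_{b \in \mathcal{H}} \beta_{a,b} \, 1_{\{y = a\}} \, 1_{\{h_j = b\}}$$

$\gamma^T \psi(y, h_j, h_k)$: how likely an image with label $y$ contains a pair of patches with part labels $h_j$ and $h_k$.

$$\gamma^T \psi(y, h_j, h_k) = \sum_{a \in \mathcal{Y}} \sum_{b \in \mathcal{H}} \sum_{c \in \mathcal{H}} \gamma_{a,b,c} \, 1_{\{y = a\}} \, 1_{\{h_j = b\}} \, 1_{\{h_k = c\}}$$

$\eta^T \omega(y, x)$: root model (the novel contribution). It measures the compatibility of class label $y$ and the large-scale global feature of the whole image.

$$\eta^T \omega(y, x) = \sum_{a \in \mathcal{Y}} \eta_a^T \, 1_{\{y = a\}} \, g(x)$$

where $g(x) = [Fb_x^+(x) \;\; Fb_x^-(x) \;\; Fb_y^+(x) \;\; Fb_y^-(x)]$.

# Learning and Inference (1)
Learning the model parameters:

$$\theta^* = \arg\max_\theta L(\theta) = \arg\max_\theta \sum_t \log p(y^t \mid x^t; \theta)$$

Because of the hidden variables $h$, the hCRF objective is non-convex, but gradient descent can still reach a local optimum. With belief propagation, the quantities needed to update the parameters $\theta$ can be computed in $O(|\mathcal{Y}||E||\mathcal{H}|^2)$ time.

# Learning and Inference (2)
Setting the initial parameter values. Given training data $(x^t, y^t)$, first solve the following optimization problem:

$$\eta^* = \arg\max_\eta \sum_t \log L(y^t \mid x^t; \eta) = \arg\max_\eta \sum_t \log \frac{\exp\big(\eta^T \omega(y^t, x^t)\big)}{\sum_y \exp\big(\eta^T \omega(y, x^t)\big)}$$

$\eta^*$ is used as the initial value of $\eta$ in gradient descent; the other parameters ($\alpha, \beta, \gamma$) are initialized randomly.

# Learning and Inference (3)
Selecting the patches: training set versus test set.
• For a test frame the action class is unknown, so which root model, and which parameters, should be used?
• Solution: for a motion feature $x$, patches are selected with each root model in turn, giving $|\mathcal{Y}|$ sets of patches $\{x^{(1)}, x^{(2)}, \ldots, x^{(|\mathcal{Y}|)}\}$, where $x^{(k)}$ denotes the set of patches obtained with root filter $k$. The final action label of the image is:

$$y^* = \arg\max_y \Big[\max\big\{p(y \mid x^{(1)}; \theta),\, p(y \mid x^{(2)}; \theta),\, \ldots,\, p(y \mid x^{(|\mathcal{Y}|)}; \theta)\big\}\Big]$$

# Experiments: root filter results
[Figure: learned root filters]

# Experiments: Weizmann Dataset
The Weizmann dataset contains 83 video sequences of 9 actions performed by 9 people. The videos of 5 randomly chosen people are used for training, and the videos of the remaining 4 people for testing. The figure shows the average recognition accuracy of the root model, the local hCRF, and the proposed method. $|\mathcal{H}|$ is the number of possible hidden part labels.
[Figure: average recognition accuracy of the three methods]

# Experiments: Weizmann Dataset
[Figures: results with $|\mathcal{H}| = 10$]

# Experiments: KTH Datasets
The KTH dataset contains 600 videos of 6 actions performed by 25 people in 4 scenarios. Half of the videos are randomly chosen for training and the other half for testing. From each video, 10 frames are randomly sampled as experimental data. The figure shows the average recognition accuracy of the root model, the local hCRF, and the proposed method.
[Figure: average recognition accuracy of the three methods]

# Experiments: KTH Datasets
[Figures: results with $|\mathcal{H}| = 10$]

# Conclusion
• The paper proposes a part model for human action recognition. The parts are initialized by the root filter, so they do not have to be specified by hand.
• The model uses global and local features together.
• The results are comparable to current state-of-the-art methods, and they show that combining the two kinds of features (global and local) outperforms using either kind alone.

# Discussion
• For an ordinary frame of an action video, is it meaningful to decide which action class it belongs to? Should the key frames of the action be selected first?
• The model uses inter-frame information only through the motion feature. Could a suitable inter-frame model be built to integrate more temporally continuous information?

Thank you!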
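The four-channel motion feature from the feature-extraction slide (split the flow by direction and sign, then blur and normalize) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the blur width `sigma`, and the choice to normalize the whole descriptor to unit L2 norm are assumptions, and the flow field itself is taken as given (for example from a Lucas-Kanade implementation).

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1-D Gaussian kernel, normalized to sum to 1."""
    xs = np.arange(-radius, radius + 1)
    k = np.exp(-(xs ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(channel, sigma=1.5):
    """Separable Gaussian blur (rows, then columns).
    Assumes both image sides are at least as long as the kernel."""
    k = gaussian_kernel(sigma, int(3 * sigma))
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, channel)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def motion_channels(flow_x, flow_y, sigma=1.5, eps=1e-8):
    """Return the stacked channels [Fx+, Fx-, Fy+, Fy-], blurred and normalized."""
    channels = [
        np.maximum(flow_x, 0.0),   # Fx+: rightward motion
        np.maximum(-flow_x, 0.0),  # Fx-: leftward motion
        np.maximum(flow_y, 0.0),   # Fy+
        np.maximum(-flow_y, 0.0),  # Fy-
    ]
    blurred = np.stack([blur(c, sigma) for c in channels])
    # Assumption: unit L2 norm over the whole descriptor; the slides only say "normalization".
    return blurred / (np.linalg.norm(blurred) + eps)
```

Splitting by sign keeps all channels non-negative, so opposite motions do not cancel each other when the channels are blurred.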
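The class posterior of the hidden part model, p(y | x) obtained by summing over all part-label assignments h, can be checked on toy data with a brute-force sketch. The code below enumerates every assignment, which costs O(|Y| |H|^m) and is only practical for a handful of patches; the slides instead rely on belief propagation over the tree. The array layouts and the names `potential` and `class_posterior` are assumptions made for this illustration: `alpha` weights patch features per part label, `beta` scores class and part pairs, `gamma` scores class and part-pair triples along tree edges, and `eta` is the per-class root filter.

```python
import itertools
import numpy as np

def potential(y, h, patch_feats, global_feat, edges, alpha, beta, gamma, eta):
    """Linear potential: root term + patch terms + class-part terms + edge terms."""
    score = eta[y] @ global_feat                 # root model on the global feature
    for j, hj in enumerate(h):
        score += alpha[hj] @ patch_feats[j]      # patch j labeled as part hj
        score += beta[y, hj]                     # class y contains part hj
    for j, k in edges:
        score += gamma[y, h[j], h[k]]            # class y contains the part pair
    return score

def class_posterior(patch_feats, global_feat, edges, alpha, beta, gamma, eta):
    """p(y | x) by exhaustive summation over hidden part labels (toy sizes only)."""
    n_classes, n_parts = beta.shape
    m = len(patch_feats)
    log_scores = np.empty(n_classes)
    for y in range(n_classes):
        vals = np.array([
            potential(y, h, patch_feats, global_feat, edges, alpha, beta, gamma, eta)
            for h in itertools.product(range(n_parts), repeat=m)
        ])
        mx = vals.max()
        log_scores[y] = mx + np.log(np.exp(vals - mx).sum())  # stable log-sum-exp
    p = np.exp(log_scores - log_scores.max())
    return p / p.sum()
```

A result from this enumeration can serve as a reference when testing a belief-propagation implementation on small trees.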
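The location feature described on the Hidden Part Model (4) slide, an L-dimensional indicator of the image bin that contains a patch, can be written as a short function. This is a sketch under stated assumptions: the bins form a regular grid numbered row by row, the patch is represented by its center `(cx, cy)`, and the name `location_feature` is hypothetical.

```python
import numpy as np

def location_feature(cx, cy, width, height, grid=(2, 2)):
    """One-hot vector of length rows*cols marking the bin that contains (cx, cy)."""
    rows, cols = grid
    # Clamp so that points on the right or bottom border fall in the last bin.
    col = min(int(cx * cols / width), cols - 1)
    row = min(int(cy * rows / height), rows - 1)
    f = np.zeros(rows * cols)
    f[row * cols + col] = 1.0  # assumption: bins are numbered row by row
    return f

# A patch centered in the bottom-right quadrant of a 100x100 frame, L = 4 bins.
print(location_feature(75, 80, 100, 100))  # [0. 0. 0. 1.]
```

With a 2x2 grid this reproduces the slide's illustration, where a patch in the fourth bin gives the vector [0, 0, 0, 1].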
