PLSA

An Analysis of the Modeling Ideas Behind PLSA
Zhang Xiaohong, isse.cqu.edu.cn

Contents
- What is modeling?
- The LSA approach
- Modeling images with PLSA
- Conditions and assumptions for applying PLSA
- Applications and further development of PLSA

What is modeling?
Modeling in software development: business modeling, requirements model, design model, implementation model, database model.
Lexical analysis → extract objects → characterize objects (attributes or methods) → relationships between objects.
A model reflects the relationships between things or objects.

What is a model? Examples
Example: a mapping from pictures to the categories building, car, telephone, portrait, bicycle, book, tree.
Example: a mapping from a human hand to horse, turtle, elephant, dog, crocodile.

What is modeling? A mapping
The target is a function from x to y (the objective function). In the learning setting, a generator G produces x, a supervisor S returns y, and the learning machine LM learns a mapping (function) that predicts y from x.

What is modeling? A function
- Mathematical modeling: model → function → functional; modeling is the process of finding a function that satisfies the objective and the constraints.
- Modeling from empirical data is the machine learning problem:
  - The learning problem is the problem of choosing the desired dependency on the basis of empirical data.
  - The learning process is the process of selecting an appropriate function from a given set of functions.
- Pattern recognition: the function value Y ranges over an index set (the class labels).

What is modeling? Pattern recognition
Collect data → select features → choose a set of model functions → train the classifier → evaluate the classifier. The whole pipeline is a process of selecting a function.

LSA method
Problem: how should documents be classified?
Technical Memo Titles
c1: Human machine interface for ABC computer applications
c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS
c5: Relation of user perceived response time to error measurement
m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey

LSA method: representing documents with the Vector Space Model
1. Build the vocabulary: human, interface, computer, user, system, response, time, EPS, survey, trees, graph, minors.
2. Count word frequencies (matrix A, terms × documents):

             c1  c2  c3  c4  c5  m1  m2  m3  m4
human         1   0   0   1   0   0   0   0   0
interface     1   0   1   0   0   0   0   0   0
computer      1   1   0   0   0   0   0   0   0
user          0   1   1   0   1   0   0   0   0
system        0   1   1   2   0   0   0   0   0
response      0   1   0   0   1   0   0   0   0
time          0   1   0   0   1   0   0   0   0
EPS           0   0   1   1   0   0   0   0   0
survey        0   1   0   0   0   0   0   0   1
trees         0   0   0   0   0   1   1   1   0
graph         0   0   0   0   0   0   1   1   1
minors        0   0   0   0   0   0   0   1   1

Problem: measured by the Pearson correlation of their rows, r(human, user) = -0.378 and r(human, minors) = -0.286. In the raw counts the related terms human and user look even less alike than the unrelated pair human and minors.

LSA method: SVD
Singular value decomposition: $A = U S V^T$.
Dimension reduction: keep the k largest singular values (k = 2 here), giving $A \approx \tilde{A} = \tilde{U}\tilde{S}\tilde{V}^T$.

U (12 × 9; rows = terms, columns = latent dimensions; reducing to 2 dimensions keeps columns 1 and 2):

              1      2      3      4      5      6      7      8      9
human       0.22  -0.11   0.29  -0.41  -0.11  -0.34   0.52  -0.06  -0.41
interface   0.20  -0.07   0.14  -0.55   0.28   0.50  -0.07  -0.01  -0.11
computer    0.24   0.04  -0.16  -0.59  -0.11  -0.25  -0.30   0.06   0.49
user        0.40   0.06  -0.34   0.10   0.33   0.38   0.00   0.00   0.01
system      0.64  -0.17   0.36   0.33  -0.16  -0.21  -0.17   0.03   0.27
response    0.27   0.11  -0.43   0.07   0.08  -0.17   0.28  -0.02  -0.05
time        0.27   0.11  -0.43   0.07   0.08  -0.17   0.28  -0.02  -0.05
EPS         0.30  -0.14   0.33   0.19   0.11   0.27   0.03  -0.02  -0.17
survey      0.21   0.27  -0.18  -0.03  -0.54   0.08  -0.47  -0.04  -0.58
trees       0.01   0.49   0.23   0.03   0.59  -0.39  -0.29   0.25  -0.23
graph       0.04   0.62   0.22   0.00  -0.07   0.11   0.16  -0.68   0.23
minors      0.03   0.45   0.14  -0.01  -0.30   0.28   0.34   0.68   0.18

S (diagonal; reducing to 2 dimensions keeps 3.34 and 2.54):
diag(3.34, 2.54, 2.35, 1.64, 1.50, 1.31, 0.85, 0.56, 0.36)

V (9 × 9; rows = documents, columns = latent dimensions; reducing to 2 dimensions keeps columns 1 and 2):

        1      2      3      4      5      6      7      8      9
c1    0.20  -0.06   0.11  -0.95   0.05  -0.08   0.18  -0.01  -0.06
c2    0.61   0.17  -0.50  -0.03  -0.21  -0.26  -0.43   0.05   0.24
c3    0.46  -0.13   0.21   0.04   0.38   0.72  -0.24   0.01   0.02
c4    0.54  -0.23   0.57   0.27  -0.21  -0.37   0.26  -0.02  -0.08
c5    0.28   0.11  -0.51   0.15   0.33   0.03   0.67  -0.06  -0.26
m1    0.00   0.19   0.10   0.02   0.39  -0.30  -0.34   0.45  -0.62
m2    0.01   0.44   0.19   0.02   0.35  -0.21  -0.15  -0.76   0.02
m3    0.02   0.62   0.25   0.01   0.15   0.00   0.25   0.45   0.52
m4    0.08   0.53   0.08  -0.03  -0.60   0.36   0.04  -0.07  -0.45

LSA method: SVD and the synonymy problem
Rank-2 reconstruction Ã:

              c1     c2     c3     c4     c5     m1     m2     m3     m4
human       0.16   0.40   0.38   0.47   0.18  -0.05  -0.12  -0.16  -0.09
interface   0.14   0.37   0.33   0.40   0.16  -0.03  -0.07  -0.10  -0.04
computer    0.15   0.51   0.36   0.41   0.24   0.02   0.06   0.09   0.12
user        0.26   0.84   0.61   0.70   0.39   0.03   0.08   0.12   0.19
system      0.45   1.23   1.05   1.27   0.56  -0.07  -0.15  -0.21  -0.05
response    0.16   0.58   0.38   0.42   0.28   0.06   0.13   0.19   0.22
time        0.16   0.58   0.38   0.42   0.28   0.06   0.13   0.19   0.22
EPS         0.22   0.55   0.51   0.63   0.24  -0.07  -0.14  -0.20  -0.11
survey      0.10   0.53   0.23   0.21   0.27   0.14   0.31   0.44   0.42
trees      -0.06   0.23  -0.14  -0.27   0.14   0.24   0.55   0.77   0.66
graph      -0.06   0.34  -0.15  -0.30   0.20   0.31   0.69   0.98   0.85
minors     -0.04   0.25  -0.10  -0.21   0.15   0.22   0.50   0.71   0.62

r(human, user) = 0.94, r(human, minors) = -0.83. Human and user now correlate strongly because they occur in similar contexts, even though they never appear in the same title.

LSA method: SVD. LSA titles example
Correlations between titles in the raw data:

       c1     c2     c3     c4     c5     m1     m2     m3
c2   -0.19
c3    0.00   0.00
c4    0.00   0.00   0.47
c5   -0.33   0.58   0.00  -0.31
m1   -0.17  -0.30  -0.21  -0.16  -0.17
m2   -0.26  -0.45  -0.32  -0.24  -0.26   0.67
m3   -0.33  -0.58  -0.41  -0.31  -0.33   0.52   0.77
m4   -0.33  -0.19  -0.41  -0.31  -0.33  -0.17   0.26   0.56

Correlations in the space of the first two dimensions:

       c1     c2     c3     c4     c5     m1     m2     m3
c2    0.91
c3    1.00   0.91
c4    1.00   0.88   1.00
c5    0.85   0.99   0.85   0.81
m1   -0.85  -0.56  -0.85  -0.88  -0.45
m2   -0.85  -0.56  -0.85  -0.88  -0.44   1.00
m3   -0.85  -0.56  -0.85  -0.88  -0.44   1.00   1.00
m4   -0.81  -0.50  -0.81  -0.84  -0.37   1.00   1.00   1.00

LSA method: discussion
Why does the SVD method work? What does it assume?
- LSA does not define a properly normalized probability distribution.
- There is no obvious interpretation of the directions in the latent space.
- From a statistical point of view, the use of the L2 norm in LSA corresponds to a Gaussian error assumption, which is hard to justify for count variables.
- It does not handle polysemy.
How can the results of the SVD be visualized?

PLSA: the problem
[Figure: example images of the categories building, car, telephone, portrait, bicycle, book, tree]
Questions:
- How is an image represented as feature vectors?
- How do feature vectors form "visual words"?
- How is a training image set represented as a co-occurrence matrix (word-frequency matrix)?
- Which model should be chosen?
[Figure: histogram of codeword frequencies; x-axis: codewords, y-axis: frequency]

PLSA: bag of "words"
[Figure: an object and its bag of "words"]
Pipeline: 1. feature detection & representation, 2. codewords dictionary, 3. image representation; learning produces category models (and/or) classifiers, and recognition ends in a category decision.

PLSA: feature detection and representation
Detect patches [Mikolajczyk and Schmid '02] [Matas, Chum, Urban & Pajdla '02] [Sivic & Zisserman '03] → normalize patch → compute SIFT descriptor [Lowe '99].
Slide credit: Josef Sivic

PLSA: codewords dictionary formation
Vector quantization of the patch descriptors into codewords.

PLSA: image representation
[Figure: histogram of codeword frequencies; x-axis: codewords, y-axis: frequency]
Representation: 1. feature detection & representation, 2. codewords dictionary, 3. image representation.
Learning and recognition: codewords dictionary → category models (and/or) classifiers → category decision.

PLSA: learning and recognition
1. Generative methods: graphical models
2. Discriminative methods: SVM

Generative models
1. Naïve Bayes classifier: Csurka, Bray, Dance & Fan, 2004
2. Hierarchical Bayesian text models (pLSA and LDA)
   - Background: Hofmann 2001; Blei, Ng & Jordan, 2004
   - Object categorization: Sivic et al. 2005; Sudderth et al. 2005
   - Natural scene categorization: Fei-Fei et al. 2005

First, some notation
- $w_n$: each patch in an image, $w_n = [0,0,\ldots,1,\ldots,0,0]^T$
- $\mathbf{w}$: the collection of all N patches in an image, $\mathbf{w} = [w_1, w_2, \ldots, w_N]$
- $d_j$: the j-th image in an image collection
- $c$: category of the image
- $z$: theme or topic of the patch

Case #1: the Naïve Bayes model (Csurka et al. 2004)
[Figure: graphical model c → w, with a plate over the N patches]
Object class decision:
$$c^* = \arg\max_c p(c \mid \mathbf{w}), \qquad p(c \mid \mathbf{w}) \propto p(c)\, p(\mathbf{w} \mid c) = p(c) \prod_{n=1}^{N} p(w_n \mid c)$$
where $p(c)$ is the prior probability of the object classes and $p(\mathbf{w} \mid c)$ is the image likelihood given the class.

Case #2: hierarchical Bayesian text models. Probabilistic Latent Semantic Analysis (pLSA) (Sivic et al. ICCV 2005)
[Figure: graphical model d → z → w, with plates over the N patches and the D images; example topic "face"]

The pLSA model
$$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$$
$p(w_i \mid d_j)$ is the observed codeword distribution, $p(w_i \mid z_k)$ are the codeword distributions per theme (topic), and $p(z_k \mid d_j)$ are the theme distributions per image.

Recognition using pLSA
$$z^* = \arg\max_z p(z \mid d)$$

Learning the pLSA parameters
Maximize the likelihood of the data using EM:
$$\prod_{i=1}^{M} \prod_{j=1}^{N} p(w_i \mid d_j)^{n(w_i, d_j)}$$
where $n(w_i, d_j)$ is the observed count of word i in document j, M is the number of codewords and N is the number of images.
Slide credit: Josef Sivic

PLSA: discussion
What characteristics must the data have, and what are the conditions and assumptions for applying PLSA?
- Not a well-defined generative model of documents; d is a dummy index into the list of documents in the training set (as many values as documents).
- No natural way to assign probability to a previously unseen document.
- The number of parameters to be estimated grows with the size of the training set.

Applications and development of PLSA
Image classification, text classification, face recognition, shape classification, video behavior analysis, face detection, log classification, ……

iiec.cqu.edu.cn
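The two steps on the Vector Space Model slide (build a vocabulary, count word frequencies) and the term correlations quoted there can be reproduced with a short script. This is a minimal sketch in plain Python, not the author's code; the tokenizer (drop `:` and `,`, split on whitespace, ignore case) and the helper names `count_matrix` and `pearson` are assumptions made for illustration. It recovers the 12 × 9 count matrix from the memo titles, with r(human, user) ≈ -0.378 and r(human, minors) ≈ -0.286.

```python
from math import sqrt

# Vocabulary and memo titles as listed on the LSA slides.
TERMS = ["human", "interface", "computer", "user", "system", "response",
         "time", "EPS", "survey", "trees", "graph", "minors"]
TITLES = {
    "c1": "Human machine interface for ABC computer applications",
    "c2": "A survey of user opinion of computer system response time",
    "c3": "The EPS user interface management system",
    "c4": "System and human system engineering testing of EPS",
    "c5": "Relation of user perceived response time to error measurement",
    "m1": "The generation of random, binary, ordered trees",
    "m2": "The intersection graph of paths in trees",
    "m3": "Graph minors IV: Widths of trees and well-quasi-ordering",
    "m4": "Graph minors: A survey",
}


def count_matrix(titles, terms):
    """Terms x documents matrix of raw counts (the VSM word-count step)."""
    index = {t.lower(): i for i, t in enumerate(terms)}
    counts = [[0] * len(titles) for _ in terms]
    for j, text in enumerate(titles.values()):
        # Assumed tokenizer: drop ':' and ',' then split on whitespace, case-insensitive.
        for token in text.replace(":", " ").replace(",", " ").split():
            i = index.get(token.lower())
            if i is not None:
                counts[i][j] += 1
    return counts


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


A = count_matrix(TITLES, TERMS)
row = {t: A[i] for i, t in enumerate(TERMS)}
print(round(pearson(row["human"], row["user"]), 3))    # -0.378
print(round(pearson(row["human"], row["minors"]), 3))  # -0.286
```

Correlating term rows this way is exactly where the raw representation breaks down: words that never share a title get a negative score regardless of meaning.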
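The SVD slides keep only the two largest singular values. The sketch below computes that rank-2 approximation without external libraries. It assumes power iteration with deflation on $A^T A$ as the numerical method (a swapped-in choice; any SVD routine gives the same result), and the helper names `gram`, `top_eigenpair` and `truncated_svd` are hypothetical. Because $A V_2 = U_2 S_2$, the reconstruction $\tilde{A} = U_2 S_2 V_2^T$ equals $A V_2 V_2^T$, so only the two leading right singular vectors are needed. The singular values should come out close to 3.34 and 2.54, and the reduced-space correlations close to 0.94 and -0.83.

```python
from math import sqrt

# Terms x documents count matrix from the VSM slide
# (rows: human ... minors, columns: c1..c5, m1..m4).
A = [
    [1, 0, 0, 1, 0, 0, 0, 0, 0],  # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],  # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],  # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],  # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],  # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],  # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],  # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],  # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],  # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],  # minors
]


def gram(a):
    """Return A^T A (documents x documents)."""
    cols = len(a[0])
    return [[sum(r[i] * r[j] for r in a) for j in range(cols)] for i in range(cols)]


def top_eigenpair(m, iters=2000):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def truncated_svd(a, k=2):
    """Top-k singular values of A and the rank-k reconstruction A V_k V_k^T."""
    b = gram(a)
    n = len(b)
    sigmas, vs = [], []
    for _ in range(k):
        lam, v = top_eigenpair(b)
        sigmas.append(sqrt(lam))  # eigenvalues of A^T A are squared singular values
        vs.append(v)
        # Deflation: remove the found component so the next power iteration
        # converges to the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    proj = [[sum(vec[i] * vec[j] for vec in vs) for j in range(n)] for i in range(n)]
    approx = [[sum(r[t] * proj[t][j] for t in range(n)) for j in range(n)] for r in a]
    return sigmas, approx


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    return cov / sqrt(sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y))


sigmas, A2 = truncated_svd(A, k=2)
print([round(s, 2) for s in sigmas])  # should be close to 3.34 and 2.54
print(round(pearson(A2[0], A2[3]), 2), round(pearson(A2[0], A2[11]), 2))
```

Power iteration is slow when consecutive singular values are close (here 2.54 vs 2.35), which is why the sketch uses a generous fixed iteration count instead of a convergence test.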
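The codeword-dictionary and image-representation slides describe vector quantization of patch descriptors followed by a frequency histogram over codewords. A minimal sketch, assuming k-means (Lloyd's algorithm) as the quantizer and toy 2-D points standing in for 128-dimensional SIFT descriptors; the data and the names `kmeans`, `nearest` and `bow_histogram` are hypothetical.

```python
import random
from math import dist


def nearest(point, centers):
    """Index of the closest center (Euclidean distance): the vector-quantization step."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centers become the codewords."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers


def bow_histogram(descriptors, codebook):
    """Image representation: how many patches fall on each codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist


# Hypothetical 2-D "descriptors" pooled from the training images.
train = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (-0.1, 0.2),
         (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]
codebook = kmeans(train, k=2)
image_patches = [(0.1, 0.0), (0.0, 0.2), (5.0, 5.1)]
hist = bow_histogram(image_patches, codebook)
print(hist)
```

The order of the codewords depends on the random initialization, so only the multiset of counts in the histogram (two patches on one codeword, one on the other) is stable in this toy example.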
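In the Naïve Bayes case the product over the N patches can be regrouped by codeword, $\prod_{n} p(w_n \mid c) = \prod_i p(w_i \mid c)^{n_i}$, where $n_i$ is the number of patches quantized to codeword i. The decision therefore compares $\log p(c) + \sum_i n_i \log p(w_i \mid c)$ across classes. A minimal sketch, assuming maximum-likelihood priors, Laplace (add-alpha) smoothing of $p(w \mid c)$, and made-up histograms for two hypothetical classes `face` and `car`; the function names are illustrative.

```python
from math import log


def train_naive_bayes(histograms, labels, alpha=1.0):
    """Estimate p(c) and p(codeword | c) from codeword-count histograms."""
    classes = sorted(set(labels))
    m = len(histograms[0])
    prior, cond = {}, {}
    for c in classes:
        rows = [h for h, y in zip(histograms, labels) if y == c]
        prior[c] = len(rows) / len(histograms)
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * m
        # Add-alpha smoothing keeps unseen codewords from zeroing the likelihood.
        cond[c] = [(t + alpha) / denom for t in totals]
    return prior, cond


def classify(hist, prior, cond):
    """c* = argmax_c log p(c) + sum_i n_i log p(w_i | c)."""
    def score(c):
        return log(prior[c]) + sum(n * log(p) for n, p in zip(hist, cond[c]))
    return max(prior, key=score)


# Hypothetical training histograms over a 3-word codebook.
train_hists = [[5, 1, 0], [4, 2, 0], [0, 1, 5], [1, 0, 4]]
train_labels = ["face", "face", "car", "car"]
prior, cond = train_naive_bayes(train_hists, train_labels)
print(classify([3, 1, 0], prior, cond))  # face
print(classify([0, 0, 3], prior, cond))  # car
```

Scoring in log space avoids numerical underflow when an image contains hundreds of patches.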
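The learning slide only states that the pLSA likelihood is maximized with EM. A minimal sketch of the usual updates, run on the 12 × 9 memo-title count matrix with K = 2: the E-step computes $p(z_k \mid d_j, w_i) \propto p(w_i \mid z_k)\, p(z_k \mid d_j)$, and the M-step renormalizes the expected counts $n(w_i, d_j)\, p(z_k \mid d_j, w_i)$ over words (for $p(w \mid z)$) and over topics (for $p(z \mid d)$). Random initialization, the fixed iteration count, K = 2 and the names `plsa_em` and `normalize` are assumptions. The last lines apply the recognition rule $z^* = \arg\max_z p(z \mid d)$ to each training document.

```python
import random
from math import log


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def plsa_em(counts, k, iters=200, seed=0):
    """EM for p(w_i|d_j) = sum_z p(w_i|z) p(z|d_j); counts[i][j] = n(w_i, d_j).

    Returns p_w_z[z][i] = p(w_i|z), p_z_d[j][z] = p(z|d_j) and the
    log-likelihood sum n(w,d) log p(w|d) recorded before each iteration.
    """
    rng = random.Random(seed)
    m, n = len(counts), len(counts[0])
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(m)]) for _ in range(k)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(k)]) for _ in range(n)]
    history = []
    for _ in range(iters):
        acc_w_z = [[0.0] * m for _ in range(k)]
        acc_z_d = [[0.0] * k for _ in range(n)]
        loglik = 0.0
        for j in range(n):
            for i in range(m):
                c = counts[i][j]
                if c == 0:
                    continue
                joint = [p_w_z[z][i] * p_z_d[j][z] for z in range(k)]
                p_wd = sum(joint)  # current p(w_i | d_j)
                loglik += c * log(p_wd)
                for z in range(k):
                    r = c * joint[z] / p_wd  # E-step: n(w_i, d_j) * p(z | d_j, w_i)
                    acc_w_z[z][i] += r
                    acc_z_d[j][z] += r
        history.append(loglik)
        # M-step: renormalize the expected counts.
        p_w_z = [normalize(row) for row in acc_w_z]
        p_z_d = [normalize(row) for row in acc_z_d]
    return p_w_z, p_z_d, history


# Terms x documents counts from the VSM slide (rows: human ... minors).
A = [[1, 0, 0, 1, 0, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0, 0, 0, 0],
     [1, 1, 0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 1, 0, 0, 0, 0],
     [0, 1, 1, 2, 0, 0, 0, 0, 0], [0, 1, 0, 0, 1, 0, 0, 0, 0],
     [0, 1, 0, 0, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1, 1, 1, 0],
     [0, 0, 0, 0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 1, 1]]
DOCS = ["c1", "c2", "c3", "c4", "c5", "m1", "m2", "m3", "m4"]

p_w_z, p_z_d, history = plsa_em(A, k=2)
# Recognition: z* = argmax_z p(z | d) for each document.
topics = {d: max(range(2), key=lambda z: p_z_d[j][z]) for j, d in enumerate(DOCS)}
print(topics)
```

Because EM only reaches a local maximum, different seeds can produce different topic assignments; what EM does guarantee is that the values in `history` never decrease.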
