Latent Tree Analysis of Unlabeled Data
Nevin L. Zhang
Dept. of Computer Science & Engineering
The Hong Kong Univ. of Sci. & Tech.
http://www.cse.ust.hk/~lzhang

Page 2: Outline
- Latent tree models
- Latent tree analysis algorithms
- What can LTA be used for:
  - Discovery of co-occurrence/correlation patterns
  - Discovery of latent variables/structures
  - Multidimensional clustering
- Examples:
  - Danish beer survey data
  - Text data
  - TCM survey data

Page 3: Latent Tree Models
- Tree-structured probabilistic graphical models
- Leaves are observed (manifest variables), discrete or continuous
- Internal nodes are latent (latent variables), discrete
- Each edge is associated with a conditional distribution; one node carries a marginal distribution
- Together these define a joint distribution over all the variables (Zhang, JMLR 2004)

Page 4: Latent Tree Analysis
- From data on the observed variables, obtain a latent tree model
- Learning latent tree models means determining:
  - The number of latent variables
  - The number of possible states for each latent variable
  - The connections among nodes
  - The probability distributions
- Model selection criterion: find the model that maximizes the BIC score
  BIC(m | D) = log P(D | m, θ*) - (d/2) log N
  where D is the data, N the sample size, m the model, θ* the MLE of the parameters, and d the number of free parameters

Page 5: Algorithms: EAST
- Search-based: Extension, Adjustment, Simplification until Termination
- Can deal with ~100 observed variables
- (Chen, Zhang et al. AIJ 2011; Liu, Zhang et al. MLJ 2013)

Pages 6-9: Algorithms: BI (Liu, Zhang et al. MLJ 2013)
- Built around a unidimensionality test and Chow-Liu trees (1968)
- Close to EAST in terms of model quality
- Can deal with 1,000 observed variables

Page 10: Outline (revisited): what LTA can be used for, illustrated with the Danish beer survey data, text data and TCM survey data

Page 11: Danish Beer Market Survey
- 463 consumers, 11 beer brands
- Questionnaire, for each brand:
  - Never seen the brand before (s0)
  - Seen before, but never tasted (s1)
  - Tasted, but do not drink regularly (s2)
  - Drink regularly (s3)
- (Mourad et al. JAIR 2013)

Page 12: Why are the variables grouped as such?
- GronTuborg and Carlsberg: main mass-market beers
- TuborgClas and CarlSpec: frequent beers, a bit darker than the above
- CeresTop, CeresRoyal, Pokal, …: minor local beers
- The brands are grouped as such because the responses on the brands in each group are strongly correlated.
- Intuitively, latent tree analysis partitions the observed variables into groups such that:
  - Variables in each group are strongly correlated, and
  - The correlations within each group can be properly modeled using a single latent variable.

Page 13: Multidimensional Clustering
- Each latent variable gives a partition of the consumers. For H1:
  - Class 1: likely to have tasted TuborgClas, CarlSpec and Heineken, but do not drink them regularly
  - Class 2: likely to have seen or tasted the beers, but do not drink them regularly
  - Class 3: likely to drink TuborgClas and CarlSpec regularly
- K-means and mixture models give only one partition. Intuitively, latent tree analysis is a technique for multiple clustering.

Page 14: Binary Text Data: WebKB (Liu et al. PGM 2012, MLJ 2013)
- 1,041 web pages collected from 4 CS departments in 1997
- 336 words

Page 15: Latent Tree Model for the WebKB Data by the BI Algorithm
- 89 latent variables

Pages 16-18: Latent tree model for the WebKB data (structure shown across three figure slides)

Page 19: Why are the variables grouped as such?
- The words are grouped as such because words in each group tend to co-occur.
- On binary data, latent tree analysis partitions the observed word variables into groups such that:
  - Words in each group tend to co-occur, and
  - The correlations can be properly explained using a single latent variable.
- LTA is thus a method for identifying co-occurrence relationships (see the sketch below).
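The grouping criterion above (words in a group tend to co-occur) can be made concrete by scoring pairwise co-occurrence on the binary document-word matrix. Below is a minimal sketch, not from the original slides, that uses pointwise mutual information; the five words and the tiny matrix are made-up stand-ins for the 1,041-page, 336-word WebKB data.

```python
import numpy as np

# Toy binary document-word matrix: rows = documents, columns = words.
# The words and the 0/1 entries are made-up stand-ins for the WebKB data.
words = ["class", "object", "inherit", "exam", "homework"]
X = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
], dtype=float)

n = X.shape[0]
p = X.mean(axis=0)        # marginal occurrence probability of each word
joint = (X.T @ X) / n     # P(word_i = 1, word_j = 1)

# Pointwise mutual information between word pairs (smoothed to avoid log 0).
eps = 1e-9
pmi = np.log((joint + eps) / (np.outer(p, p) + eps))

for i in range(len(words)):
    for j in range(i + 1, len(words)):
        print(f"{words[i]:8s} {words[j]:8s} "
              f"P(both)={joint[i, j]:.2f}  PMI={pmi[i, j]:+.2f}")
```

Word pairs with high joint probability and positive PMI (e.g. the programming words versus the coursework words in this toy matrix) are exactly the kind of co-occurring groups that LTA places under a common latent variable.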
Page 20: Multidimensional Clustering
- LTA is also an approach to topic detection. For example, the latent variable Y66 partitions the web pages into topics:
  - Y66 = 4: object-oriented programming (OOP)
  - Y66 = 2: non-OOP programming
  - Y66 = 1: programming languages
  - Y66 = 3: not on programming

Page 21: Outline (revisited): turning to the TCM survey data example

Page 22: Background of the Research
- TCM is common practice in China, and increasingly in the Western world.
- Patients with a WM (Western medicine) disease are divided into several TCM classes, and different classes are treated differently using TCM treatments.
- Example. WM disease: depression. TCM classes:
  - Liver-Qi Stagnation (肝气郁结). Treatment principle: soothe the liver and relieve stagnation (疏肝解郁). Prescription: 柴胡疏肝散 (Chaihu Shugan Powder)
  - Deficiency of Liver Yin and Kidney Yin (肝肾阴虚). Treatment principle: nourish the kidney and the liver (滋肾养肝). Prescription: 逍遥散合六味地黄丸 (Xiaoyao Powder combined with Liuwei Dihuang Pills)
  - Deficiency of both Heart and Spleen (心脾两虚). Treatment principle: boost qi and fortify the spleen (益气健脾). Prescription: 归脾汤 (Guipi Decoction)
  - …

Page 23: Key Question
- How should patients with a WM disease be divided into subclasses from the TCM perspective?
  - What are the TCM classes? What are the characteristics of each class? How do we differentiate between the classes?
- Important for:
  - Clinical practice
  - Research: randomized controlled trials of efficacy; modern biomedical understanding of TCM concepts
- There is no consensus: different doctors/researchers use different schemes. This is a key weakness of TCM.

Page 24: Key Idea
- Objective: provide an evidence-based method for TCM patient classification.
- Key idea: cluster analysis of symptom data yields an empirical partition of the patients; check whether it corresponds to a TCM class concept.
- Key technology: multidimensional clustering, the motivation for developing latent tree analysis.

Page 25: Symptom Data of Depressive Patients (Zhao et al. JACM 2014)
- Subjects: 604 depressive patients aged between 19 and 69 from 9 hospitals
  - Selected using the Chinese Classification of Mental Disorders clinical guideline (CCMD-3)
  - Exclusions: subjects who took antidepressant drugs within two weeks prior to the survey; women who were pregnant or breastfeeding; etc.
- Symptom variables:
  - Taken from the TCM literature on depression between 1994 and 2004, searched with the phrase "抑郁 and 证" (depression and syndrome) in the CNKI (China National Knowledge Infrastructure) database
  - Kept only studies where patients were selected using the ICD-9, ICD-10, CCMD-2, or CCMD-3 guidelines
  - 143 symptoms were reported in those studies altogether

Page 26: The Depression Data
- Data as a table: 604 rows, one per patient; 143 columns, one per symptom
- Table cells: 0 = symptom not present, 1 = symptom present
- Symptoms occurring fewer than 10 times were removed; 86 symptom variables entered latent tree analysis
- The structure of the latent tree model obtained is shown on the next two slides

Page 27: Model obtained for the depression data (top part; figure)

Page 28: Model obtained for the depression data (bottom part; figure)

Page 29: The Empirical Partitions
- The first cluster (Y29 = s0) consists of 54% of the patients, while the second cluster (Y29 = s1) consists of 46%.
- The two symptoms 'fear of cold' and 'cold limbs' do not occur often in the first cluster, while they both tend to occur with high probabilities (0.8 and 0.85) in the second cluster (a small sketch of reading this partition follows below).
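To make the reading of such a partition concrete, the sketch below treats Y29 and its two symptom children as a small latent class model and computes the posterior over the two clusters for a patient. The class proportions (0.54/0.46) and the probabilities 0.8 and 0.85 under Y29 = s1 come from the slide above; the low probabilities assumed under Y29 = s0 are illustrative placeholders, not numbers from the learned model.

```python
import numpy as np

# Minimal latent class sketch around Y29 and its two symptom children.
# Prior over Y29 = (s0, s1) and P(symptom = 1 | Y29 = s1) come from the slide;
# P(symptom = 1 | Y29 = s0) is an assumed low value, for illustration only.
prior = np.array([0.54, 0.46])            # P(Y29 = s0), P(Y29 = s1)
p_symptom = np.array([[0.05, 0.80],       # 'fear of cold': P(=1 | s0), P(=1 | s1)
                      [0.05, 0.85]])      # 'cold limbs':   P(=1 | s0), P(=1 | s1)

def posterior_y29(symptoms):
    """Posterior P(Y29 | observed symptoms); symptoms is a 0/1 vector."""
    lik = np.ones(2)
    for s, row in zip(symptoms, p_symptom):
        lik *= row if s == 1 else (1.0 - row)
    joint = prior * lik
    return joint / joint.sum()

# A patient showing both 'fear of cold' and 'cold limbs' lands almost surely in cluster s1.
print(posterior_y29([1, 1]))    # roughly [0.004, 0.996]
print(posterior_y29([0, 0]))    # mostly cluster s0
```

This is the sense in which each latent variable gives a partition: every patient can be assigned, softly or by the most probable state, to one of the clusters of Y29, and likewise for every other latent variable in the model.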
Page 30: Probabilistic Symptom Co-occurrence Patterns
- The table indicates that the two symptoms 'fear of cold' and 'cold limbs' tend to co-occur in the cluster Y29 = s1.
- The pattern is meaningful from the TCM perspective: TCM asserts that YANG DEFICIENCY (阳虚) can lead to, among other symptoms, 'fear of cold' and 'cold limbs'.
- The co-occurrence pattern therefore suggests the TCM syndrome type (证型) YANG DEFICIENCY (阳虚).
- The partition Y29 suggests that, among depressive patients, there is a subclass of patients with YANG DEFICIENCY; in this subclass, 'fear of cold' and 'cold limbs' co-occur with high probabilities (0.8 and 0.85).

Page 31: Probabilistic Symptom Co-occurrence Patterns
- Y28 = s1 captures the probabilistic co-occurrence of 'aching lumbus', 'lumbar pain like pressure' and 'lumbar pain like warmth'. This pattern is present in 27% of the patients.
- It suggests that, among depressive patients, there is a subclass corresponding to the TCM concept of KIDNEY DEPRIVED OF NOURISHMENT (肾虚失养).
- The characteristics of the subclass are given by the distributions for Y28 = s1.

Page 32: Probabilistic Symptom Co-occurrence Patterns
- Y27 = s1 captures the probabilistic co-occurrence of 'weak lumbus and knees' and 'cumbersome limbs'. This pattern is present in 44% of the patients.
- It suggests that, among depressive patients, there is a subclass corresponding to the TCM concept of KIDNEY DEFICIENCY (肾虚).
- The characteristics of the subclass are given by the distributions for Y27 = s1.
- Y27, Y28 and Y29 together provide evidence for defining KIDNEY YANG DEFICIENCY.

Page 33: Probabilistic Symptom Co-occurrence Patterns
- Y21 = s1: evidence for defining STAGNANT QI TURNING INTO FIRE (气郁化火)
- Y15 = s1: evidence for defining QI DEFICIENCY
- Y17 = s1: evidence for defining HEART QI DEFICIENCY
- Y16 = s1: evidence for defining QI STAGNATION
- Y19 = s1: evidence for defining QI STAGNATION IN HEAD

Page 34: Probabilistic Symptom Co-occurrence Patterns
- Y9 = s1: evidence for defining DEFICIENCY OF BOTH QI AND YIN (气阴两虚)
- Y10 = s1: evidence for defining YIN DEFICIENCY (阴虚)
- Y11 = s1: evidence for defining DEFICIENCY OF STOMACH/SPLEEN YIN (脾胃阴虚)

Page 35: Symptom Mutual-Exclusion Patterns
- Some empirical partitions reveal symptom mutual-exclusion patterns.
- Y1 reveals the mutual exclusion of 'white tongue coating', 'yellow tongue coating' and 'yellow-white tongue coating'.
- Y2 reveals the mutual exclusion of 'thin tongue coating', 'thick tongue coating' and 'little tongue coating'.
- Such patterns can be checked directly against the 0/1 data, as in the sketch below.
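A mutual-exclusion pattern such as the one revealed by Y1 can be checked directly in the 0/1 symptom table: each symptom occurs on its own, but the symptoms (almost) never occur together. The sketch below performs this check; the column names follow the tongue-coating symptoms from the slide, but the records and the co-occurrence threshold are illustrative assumptions, not the actual study data.

```python
from itertools import combinations
import numpy as np

# Made-up 0/1 records for three tongue-coating symptoms (columns), one row per patient.
symptoms = ["white tongue coating", "yellow tongue coating", "yellow-white tongue coating"]
X = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
])

def mutually_exclusive(X, names, max_joint_rate=0.01):
    """Report symptom pairs that individually occur but (almost) never co-occur."""
    n = X.shape[0]
    for i, j in combinations(range(len(names)), 2):
        joint_rate = np.mean((X[:, i] == 1) & (X[:, j] == 1))
        if joint_rate <= max_joint_rate and X[:, i].any() and X[:, j].any():
            print(f"'{names[i]}' and '{names[j]}' look mutually exclusive "
                  f"(co-occur in {joint_rate:.1%} of {n} records)")

mutually_exclusive(X, symptoms)
```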
Page 36: Summary of the TCM Data Analysis
- By analyzing 604 cases of depressive patient data using latent tree models, we have discovered a host of probabilistic symptom co-occurrence patterns and symptom mutual-exclusion patterns.
- Most of the co-occurrence patterns have clear TCM syndrome connotations, and the mutual-exclusion patterns are also reasonable and meaningful.
- The patterns can be used as evidence for defining TCM classes in the context of depressive patients and for differentiating between those classes.

Page 37: Another Perspective: Statistical Validation of TCM Postulates (Zhang et al. JACM 2008)
- (Figure: TCM syndrome concepts such as Kidney Deprived of Nourishment and Yang Deficiency linked to the empirical patterns Y28 = s1 and Y29 = s1.)
- TCM terms such as Yang Deficiency were introduced to explain symptom co-occurrence patterns observed in clinical practice.

Page 38: Value of the Work in the View of Others
- D. Haughton and J. Haughton, Living Standards Analytics: Development through the Lens of Household Survey Data, Springer, 2012:
- "Zhang et al. provide a very interesting application of latent class (tree) models to diagnoses in traditional Chinese medicine (TCM). The results tend to confirm known theories in Chinese traditional medicine. This is a significant advance, since the scientific bases for these theories are not known. The model proposed by the authors provides at least a statistical justification for them."

Page 39: Summary
- Latent tree models: tree-structured probabilistic graphical models
  - Leaf nodes: observed variables
  - Internal nodes: latent variables
- What can LTA be used for:
  - Discovery of co-occurrence patterns in binary data
  - Discovery of correlation patterns in general discrete data
  - Discovery of latent variables/structures
  - Multidimensional clustering
  - Topic detection in text data
  - A key role in TCM patient classification

Page 40: References
- N. L. Zhang (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5(6): 697-723.
- T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1): 2246-2269.
- T. F. Liu, N. L. Zhang, A. H. Liu, L. K. M. Poon (2012). A novel LTM-based method for multidimensional clustering. European Workshop on Probabilistic Graphical Models (PGM-12), 203-210.
- T. F. Liu, N. L. Zhang, P. X. Chen, A. H. Liu, L. K. M. Poon, Y. Wang (2013). Greedy learning of latent tree models for multidimensional clustering. Machine Learning, doi:10.1007/s10994-013-5393-0.
- R. Mourad, C. Sinoquet, N. L. Zhang, T. F. Liu, P. Leray (2013). A survey on latent tree models and applications. Journal of Artificial Intelligence Research, 47: 157-203. doi:10.1613/jair.3879.
- N. L. Zhang, S. H. Yuan, T. Chen, Y. Wang (2008). Statistical validation of TCM theories. Journal of Alternative and Complementary Medicine, 14(5): 583-587.
- N. L. Zhang, S. H. Yuan, T. Chen, Y. Wang (2008). Latent tree models and diagnosis in traditional Chinese medicine. Artificial Intelligence in Medicine, 42: 229-245.
- Z. X. Xu, N. L. Zhang, Y. Q. Wang, G. P. Liu, J. Xu, T. F. Liu, A. H. Liu (2013). Statistical validation of traditional Chinese medicine syndrome postulates in the context of patients with cardiovascular disease. The Journal of Alternative and Complementary Medicine.
- Y. Zhao, N. L. Zhang, T. F. Wang, Q. G. Wang (2014). Discovering symptom co-occurrence patterns from 604 cases of depressive patient data using latent tree models. The Journal of Alternative and Complementary Medicine.

Thank You!