Part IV - Department of Computer Science and Engineering

AAAI 2014 Tutorial
Latent Tree Models
Part IV: Applications
Nevin L. Zhang
Dept. of Computer Science & Engineering
The Hong Kong Univ. of Sci. & Tech.
Applications of Latent Tree Analysis (LTA)
What can LTA be used for:
Discovery of co-occurrence patterns in binary data
Discovery of correlation patterns in general discrete data
Discovery of latent variable/structures
Multidimensional clustering
Topic detection in text data
Probabilistic modelling
Analysis of survey data
Analysis of text data
Market survey data, social survey, medical survey data
Topic detection
Approximate probabilistic inference
Approximate Inference in Bayesian Networks
Analysis of social survey data
Topic detection in text data
Analysis of medical symptom survey data
LTMs for Probabilistic Modelling
Attractive Representation of Joint Distributions
Computationally very simple to work with.
Represent complex relationships among observed variables.
What does the structure look like without the latent variables?
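A minimal sketch of why the representation is simple to work with, assuming the smallest possible latent tree: one binary latent variable Y with three binary observed children. All parameter values below are hypothetical. Summing out Y gives the joint distribution over the observed variables, and the last lines show that X1 and X2 become dependent once Y is removed.

```python
import itertools
import numpy as np

# Hypothetical parameters (not from the tutorial): one binary latent
# variable Y with three binary observed children X1, X2, X3.
p_y = np.array([0.6, 0.4])            # P(Y)
p_x_given_y = np.array([              # P(Xi = 1 | Y = y), one row per Xi
    [0.1, 0.8],
    [0.2, 0.9],
    [0.3, 0.7],
])

def joint_observed(x):
    """P(X1=x1, X2=x2, X3=x3), obtained by summing out the latent variable."""
    total = 0.0
    for y in range(len(p_y)):
        term = p_y[y]
        for i, xi in enumerate(x):
            p1 = p_x_given_y[i, y]
            term *= p1 if xi == 1 else 1.0 - p1
        total += term
    return total

# The joint over the observed variables is a proper distribution.
total_mass = sum(joint_observed(x) for x in itertools.product([0, 1], repeat=3))

# X1 and X2 are independent given Y, but dependent after Y is summed out.
p_x1 = sum(joint_observed((1, b, c)) for b in (0, 1) for c in (0, 1))
p_x2 = sum(joint_observed((a, 1, c)) for a in (0, 1) for c in (0, 1))
p_x1_x2 = sum(joint_observed((1, 1, c)) for c in (0, 1))
print(p_x1_x2, p_x1 * p_x2)
```

Without the latent variable, every pair of observed variables is directly dependent, so a model over the observed variables alone would need a densely connected structure.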
Approximate Inference in Bayesian Networks
In a Bayesian network over observed variables,
exact inference can be computationally prohibitive.
Two-phase approximate inference (Wang et al. AAAI 2008):
Offline:
Sample a data set from the original network
Learn a latent tree model (secondary representation)
Online:
Make inference using the latent tree model. (Fast)
Empirical Evaluations
Original networks
LTM (1k), LTM (10k), LTM (100k): with different sample size for Phase 1.
CL (100k): Phase 1 learns Chow-Liu tree
LCM (100k): Phase 1 learns latent class model
Loopy Belief Propagation (LBP)
500 random queries
Quality of approximation measured using KL from exact answer.
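The quality measure above can be sketched as follows. This assumes the direction KL(exact || approximate) and discrete posteriors over the same states; the function name and the example numbers are illustrative, not taken from the experiments.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same states."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # states with p = 0 contribute nothing to the sum
    # eps guards against division by zero when q assigns no mass to a state
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

# Hypothetical answers to one query: exact posterior vs. approximate posterior.
exact = [0.7, 0.2, 0.1]
approx = [0.6, 0.3, 0.1]
print(kl_divergence(exact, approx))
```

Averaging this quantity over the random queries would give one number per method and network.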
Empirical Results
C: cardinality of latent
When C is large enough, LTM achieves good approximation in all cases.
Better than LBP on g, d, h
Better than CL on d, h.
Key advantage: the online phase is 2 to 3 orders of magnitude faster than exact inference
Approximate Inference in Bayesian networks
Analysis of social survey data
Topic detection
Analysis of medical symptom survey data
Social Survey Data
// Survey on corruption in Hong Kong and performance of the anti-corruption agency -- ICAC
//31 questions, 1200 samples
s0 s1 s2 s3
// very common, quite common, uncommon, very uncommon
s0 s1 s2 s3
s0 s1 s2 s3
s0 s1 s2 s3
s0 s1 s2 s3
s0 s1 s2
// yes, no, depends
s0 s1
// yes, no
//totally intolerable, intolerable, tolerable, totally tolerable
s0 s1 s2 s3 s4
// very sufficient, sufficient, average, ...
s0 s1 s2 s3 s4
//very e, e, a, in-e, very in-e
s0 s1 s2 s3 s4
// very sufficient, sufficient, average, ...
-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
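A small sketch of reading one record from the listing above. Two things are assumptions here, not stated on the slide: that -1 marks a missing answer, and that the trailing 1.0 is a record weight. The helper name `parse_record` is hypothetical.

```python
def parse_record(line, n_questions=31):
    """Split one data row into 31 answers and a trailing weight.

    Assumptions: -1 means the answer is missing, and the last field
    is the weight of the record.
    """
    tokens = line.split()
    if len(tokens) != n_questions + 1:
        raise ValueError(f"expected {n_questions + 1} fields, got {len(tokens)}")
    answers = [None if t == "-1" else int(t) for t in tokens[:n_questions]]
    weight = float(tokens[n_questions])
    return answers, weight

# First data row from the listing above.
row = ("-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 "
       "1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0")
answers, weight = parse_record(row)
n_missing = sum(a is None for a in answers)
print(len(answers), n_missing, weight)
```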
Latent Structure Discovery
Y2: Demographic info;
Y3: Tolerance toward corruption;
Y4: ICAC performance;
Y5: Change in level of corruption;
Y6: Level of corruption;
Y7: ICAC accountability
Multidimensional Clustering
Y2=s0: Low income youngsters;
Y2=s1: Women with no/low income;
Y2=s2: people with good education and good income;
Y2=s3: people with poor education and average income.
Multidimensional Clustering
Y3=s0: people who find corruption totally intolerable; 57%
Y3=s1: people who find corruption intolerable; 27%
Y3=s2: people who find corruption tolerable; 15%
Interesting finding:
Y3=s2: 29+19=48% find C-Gov totally intolerable or intolerable; 5% for C-Bus
Y3=s1: 54% find C-Gov totally intolerable; 2% for C-Bus
Y3=s0: Same attitude toward C-Gov and C-Bus
People who are tough on corruption are equally tough toward C-Gov and C-Bus.
People who are lenient about corruption are more lenient toward C-Bus than C-Gov
Multidimensional Clustering
Who are the toughest toward corruption among the 4 groups?
Y2=s2 (good education and good income): the least tolerant. 4% tolerable
Y2=s3 (poor education and average income): the most tolerant. 32% tolerable
The other two classes are in between.
Summary: Latent tree analysis of social survey data can reveal
• Interesting latent structures
• Interesting clusters
• Interesting relationships among the clusters.
Approximate Inference
Analysis of social survey data
Topic detection (Analysis of text data)
Analysis of medical symptom survey data
Latent Tree Models for Topic Detection
Aggregation of miniature topics
Topic extraction and characterization
Empirical results
What is a topic in LTA?
LTM for
toy text data
Topic: State of latent variable, soft collection of documents
Characterized by: Conditional probability of word given latent state, or, document
frequency of word in collection:
# docs containing the word / total # of docs in the topic
Probabilities of all words for a topic (in a column) do not sum to 1.
Y1=2: oop; Y1=1: Programming; Y1=0: background
Background topics for other latent variables not shown.
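The characterization above can be sketched as a weighted document frequency, assuming each document has a soft membership P(Y = s | document) in the topic. The tiny document-word matrix, the word list and the membership values are all hypothetical.

```python
import numpy as np

# Hypothetical binary document-word matrix: rows are documents,
# columns are words (1 = the word occurs in the document).
words = ["programming", "oop", "windows"]
docs = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
])

# Hypothetical soft membership of each document in one topic,
# i.e. P(Y = s | document).
membership = np.array([0.9, 0.7, 0.2, 0.1])

def topic_word_probs(docs, membership):
    """Weighted document frequency of each word within the soft collection."""
    return membership @ docs / membership.sum()

probs = topic_word_probs(docs, membership)
for word, p in zip(words, probs):
    print(f"{word}: {p:.3f}")
print("sum over words:", probs.sum())
```

Each entry is the probability that a document on the topic contains the word, so the entries are separate Bernoulli parameters and their sum is not constrained to 1.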
How are topics and documents related?
Topic: A collection of documents
A document is a member of a topic
Can belong to multiple topics with different probabilities
Probabilities for each document (in each row) do not sum to 1.
D97, D115, D205, D528 are documents from the toy text data
Table shows:
D97 is a web page on OOP from U of Wisconsin Madison
D528 is a web page on AI from U of Texas Austin
LTA Differs from Latent Dirichlet Allocation (LDA)
LDA Topic: Distribution over vocabulary
Frequencies with which a writer would use each word when writing about the topic
Probabilities for a topic (in a column) sum to 1
In LDA a document is a mixture of topics (LTA: Topic is a collection of documents)
Probabilities in each row sum to 1
Latent Tree Models for Topic Detection
Aggregation of miniature topics
Topic extraction and characterization
Empirical results
Latent Tree Model for a Subset of Newsgroup Data
Latent variables give miniature topics.
Intuitively, more interesting topics can be detected if we combine
Z11, Z12, Z13
Z14, Z15, Z16
Z17, Z18, Z19
The BI algorithm produces flat models: Each latent variable is directly
connected to at least one observed variable.
Hierarchical Latent Tree Analysis (HLTA)
Convert the latent variables into observed ones via hard assignment.
Afterwards, Z11-Z19 become observed.
Run BI on Z11-Z19
Hierarchical Latent Tree Analysis (HLTA)
Stack model for Z11-Z19 on top of model for the words
Repeat until no more than 2 latent variables or predetermined level
The result is called a hierarchical latent tree model (HLTM)
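The hard-assignment step can be sketched as below, assuming the posterior P(Z = s | document) of every level-1 latent variable has already been computed. The function name and the posterior values are hypothetical, and the BI run on the resulting table is not shown.

```python
import numpy as np

def hard_assign(posteriors):
    """Turn latent variables into observed ones.

    posteriors maps a latent variable name to an array of shape
    (n_docs, n_states) holding P(Z = s | document). Each document is
    assigned the most probable state of each latent variable.
    """
    names = sorted(posteriors)
    columns = [np.argmax(posteriors[name], axis=1) for name in names]
    return names, np.column_stack(columns)

# Hypothetical posteriors for two latent variables and three documents.
posteriors = {
    "Z11": np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]),
    "Z12": np.array([[0.3, 0.7], [0.1, 0.9], [0.8, 0.2]]),
}
names, level2_data = hard_assign(posteriors)
print(names)
print(level2_data)
```

The resulting table has one row per document and one column per former latent variable, so it can be fed to the same learning algorithm that was used on the words.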
Hierarchical Latent Tree Analysis (HLTA)
Part II: Cannot determine edge orientations based solely on data.
Here, hierarchical structure is introduced to improve model interpretability.
Data + interpretability → hierarchical structure.
It does not necessarily improve model fit.
Latent Tree Models for Topic Detection
Aggregation of miniature topics
Topic extraction and characterization
Empirical results
Semantic Base
Interpreting states of Z21
Z11, Z12, and Z13 introduced because of co-occurrence of
“computer”, “Science”;
“card”, “display”, …., “video”; and
“dos” , “windows”
Z21 introduced because of correlations among Z11, Z12, Z13
So, interpretation of the states of Z21 is to be based on the words in
the sub-tree rooted at Z21. They form the semantic base of Z21.
Effective Semantic Base
Semantic base might be too large to handle.
Effective base: Subset of semantic base that matters.
Sort variables Xi from the semantic base in descending order of I(Z; Xi).
I(Z; X1, …, Xi): Mutual information between Z and the first i variables
Estimated via sampling, increases with i. (Chen et al. AIJ 2012)
I(Z; X1, …, Xm): Mutual information between Z and all m variables in the semantic base
Information coverage of the first i variables:
I(Z; X1, …, Xi) / I(Z; X1, …, Xm)
Effective semantic base:
Set of leading variables with information coverage higher than a certain level,
e.g., 95%.
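A sketch of the selection procedure under two simplifying assumptions that are not from the tutorial: the variables in the semantic base are direct children of Z (so they are independent given Z), and mutual information is computed by exact enumeration rather than by sampling, which is only feasible for a handful of variables. All parameter values are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical model: latent Z with 2 states and 4 binary words as children.
p_z = np.array([0.7, 0.3])            # P(Z)
p_x1_given_z = np.array([             # P(Xi = 1 | Z = z), one row per Xi
    [0.05, 0.90],
    [0.10, 0.60],
    [0.20, 0.30],
    [0.50, 0.52],
])

def mutual_information(idx):
    """I(Z; X_idx) in nats, by exact enumeration over the chosen variables."""
    mi = 0.0
    for x in itertools.product([0, 1], repeat=len(idx)):
        p_zx = p_z.copy()             # becomes P(Z = z, X_idx = x) for each z
        for i, xi in zip(idx, x):
            p1 = p_x1_given_z[i]
            p_zx = p_zx * (p1 if xi == 1 else 1.0 - p1)
        p_x = p_zx.sum()
        for z in range(len(p_z)):
            if p_zx[z] > 0:
                mi += p_zx[z] * np.log(p_zx[z] / (p_z[z] * p_x))
    return mi

def effective_semantic_base(n_vars, level=0.95):
    """Leading variables, sorted by I(Z; Xi), whose coverage reaches `level`."""
    order = sorted(range(n_vars),
                   key=lambda i: mutual_information([i]), reverse=True)
    total = mutual_information(order)
    for i in range(1, n_vars + 1):
        coverage = mutual_information(order[:i]) / total
        if coverage >= level:
            return order[:i], coverage
    return order, 1.0

base, coverage = effective_semantic_base(len(p_x1_given_z))
print(base, coverage)
```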
Upper: Information coverage
Lower: Mutual Information
Effective semantic bases are typically smaller than Semantic bases.
Z22: Semantic base has 10 variables; effective semantic base has 8 variables.
Differences are much larger in models with hundreds of variables.
Words at the front are more informative in distinguishing between
the states of the latent variable.
Topic Characterizations
HLTA characterizes latent states (topics) using
probabilities of words from effective semantic base
Topic Z22=s1 characterized using words
NOT sorted according to probability, but mutual information
Occur with high probabilities in documents on the topic,
Occur with low probability in documents NOT on the topic.
Topic characterized using words that occur with highest
probability in the topic.
Not necessarily the best words to distinguish the topic from
other topics.
Latent Tree Models for Topic Detection
Aggregation of miniature topics
Topic extraction and characterization
Empirical results
Empirical Results
Show the results of HLTA on real-world data
Compare HLTA with HLDA and LDA
1,740 papers published at NIPS between 1988 – 1999.
HLTA produced a model with 382 latent variables, arranged on 5 levels:
Level 1 – 279; Level 2 – 72; Level 3 - 21; Level 4 - 8; Level 5 - 2
Example topics on next few slides
1,000 words selected using average TF-IDF.
Topic characterizations, topic sizes,
Topic groups, topic group labels.
For details:
HLTA Topics: Level-3
likelihood bayesian statistical gaussian
0.34 likelihood bayesian statistical conditional
0.16 gaussian covariance variance matrix
0.21 eigenvalues matrix gaussian covariance
0.20 markov speech speaker hmms hmm
0.14 speech hmm speaker hmms markov
0.13 reinforcement sutton barto policy actions
0.10 reinforcement sutton barto actions policy
trained classification classifier
regression classifiers
0.25 validation regression svm machines
0.07 svm machines vapnik regression
0.38 trained test table train testing
0.30 classification classifier classifiers class cl
0.27 cells cortex cortical activity visual
0.33 neurons neuron synaptic synapses
images image pixel pixels object
0.18 membrane potentials spike spikes firing
0.15 firing spike membrane spikes potentials
0.18 circuit voltage circuits vlsi chip
0.26 dynamics dynamical attractor stable
hidden propagation layer backpropagation
0.40 hidden backpropagation multilayer
architecture architectures
0.40 propagation layer units back net
cells neurons cortex firing visual
0.17 visual cells cortical cortex activity
0.25 images image pixel pixels texture
0.16 receptive orientation objects object
0.21 object objects perception receptive
reinforcement markov speech hmm
HLTA Topics: Level-2
markov speech hmm speaker hmms
reinforcement sutton barto actions policy
0.14 markov stochastic hmms sequence hmm
0.12 transition states reinforcement reward
0.10 hmm hmms sequence markov stochastic
0.10 reinforcement policy reward states
0.15 speech language word speaker acoustic
0.14 trajectory trajectories path adaptive
0.06 speech speaker acoustic word language
0.12 actions action control controller agent
0.16 delay cycle oscillator frame sound
0.09 sutton barto td critic moore
0.10 frame sound delay oscillator cycle
0.14 strings string length symbol
HLTA Topics: Level-2
likelihood bayesian statistical conditional
0.34 likelihood statistical conditional density
0.35 entropy variables divergence mutual
0.19 probabilistic bayesian prior posterior
0.11 bayesian posterior prior bayes
0.15 mixture mixtures experts latent
0.14 mixture mixtures experts hierarchical
0.34 estimate estimation estimating estimated
0.21 estimate estimation estimates estimated
regression validation vapnik svm
0.24 regression svm vapnik margin kernel
0.05 svm vapnik margin kernel regression
0.19 validation cross stopping pruning
0.07 machines boosting machine boltzmann
classification classifier classifiers class
gaussian covariance matrix variance
0.28 classification classifier classifiers class
0.09 matrix pca gaussian covariance variance
0.23 gaussian covariance variance matrix pca
0.09 pca gaussian matrix covariance variance
0.18 eigenvalues eigenvalue eigenvectors ij
0.15 blind mixing ica coefficients inverse
0.13 handwritten digit character digits
0.24 discriminant label labels discrimination
trained test table train testing
0.38 trained test table train testing
0.44 experiments correct improved
improvement correctly
HLTA Topics: Level-1
likelihood statistical conditional density log
mixture mixtures experts hierarchical latent
0.30 likelihood conditional log em maximum
0.19 mixture mixtures
0.42 statistical statistics
0.34 multiple individual missing hierarchical
0.19 density densities
0.15 hierarchical sparse missing multiple
0.07 experts expert
0.32 weighted sum
entropy variables variable divergence
0.16 entropy divergence mutual
0.31 variables variable
estimate estimation estimated estimates
0.38 estimate estimation estimated estimating
bayesian posterior probabilistic prior bayes
0.19 bayesian prior bayes posterior priors
0.09 bayesian posterior prior priors bayes
0.29 probabilistic distributions probabilities
0.16 inference gibbs sampling generative
0.19 estimate estimates estimation estimated
0.29 estimator true unknown
0.33 sample samples
0.40 assumption assume assumptions assumed
0.27 observations observation observed
0.19 mackay independent averaging ensemble
0.08 belief graphical variational
0.09 monte carlo
0.09 uk ac
… for aggregate miniature topics:
Many Level 1 topics correspond to
trivial word co-occurrences , not
HLTA Topics: Level-4 & 5
Level 4
visual cortex cells neurons firing
0.34 cells cortex firing neurons visual
0.28 cells neurons cortex firing visual
0.41 approximation gradient optimization
0.29 algorithms optimal approximation
0.39 likelihood bayesian statistical gaussian
images image trained hidden pixel
0.22 regression classification classifier
0.29 trained classification classifier classifiers
0.02 classification classifier regression
0.28 learn learned structure feature features
0.23 feature features structure learn learned
0.24 images image pixel pixels object
0.13 reinforcement transition markov speech
0.14 speech hmm markov transition
0.40 hidden propagation layer
backpropagation units
Level 5
visual cortex cells neurons firing
0.37 visual cortex firing neurons cells
0.39 visual cells firing cortex neurons
0.25 images image pixel hidden trained
0.09 hidden trained images image pixel
0.20 trained hidden images image pixel
0.15 image images pixel trained hidden
Summary of HLTA Results on NIPS Data
Level 1: 279 latent variables
Many capture trivial word co-occurrence patterns
Level 2: 72 latent variables
Meaningful topics, and meaningful topic groups
Level 3: 21 latent variables
Meaningful topics, and meaningful topic groups
More general than Level 2 topics
Level 4: 8 latent variables
Meaningful topics, very general
Level 5: 2 latent variables
Too few
In application, one can choose to output the topics at a certain level
according to the desired number of topics.
For NIPS data, either level-2 topics or level-3 topics.
HLDA Topics
units hidden layer unit weight
gaussian log density likelihood estimate
margin kernel support xi bound
generalization student weight teacher optimal
gaussian bayesian kernel evidence posterior
chip analog circuit neuron voltage
classifier rbf class classifiers classification
speech recognition hmm context word
ica independent separation source sources
image images matching level object
tree trees node nodes boosting
variables variable bayesian conditional
family face strategy differential functional
weighting source grammar sequences
polynomial regression derivative
em machine annealing max min
regression prediction selection criterion query
validation obs generalization cross pruning
mlp risk classifier classification confidence
loss song transfer bounds wt
principal curve eq curves rules
control optimal algorithms approximation step
policy action reinforcement states actions
experts mixture em expert gaussian
convergence gradient batch descent means
control controller nonlinear series forward
distance tangent vectors euclidean distances
robot reinforcement position control path
bias variance regression learner exploration
blocks block length basic experiment
td evaluation features temporal expert
path reward light stimuli paths
long hmms recurrent matrix term
channel call cell channels rl
image images recognition pixel feature
video motion visual speech recognition
face images faces recognition facial
ocular dominance orientation cortical cortex
character characters pca coding field
resolution false true detection context
LDA Topics
inputs outputs trained produce actual
dynamics dynamical stable attractor
synaptic synapses inhibitory excitatory
correlation power correlations cross
states stochastic transition dynamic
basis rbf radial gaussian centers
solution constraints solutions constraint
type elements group groups element
edge light intensity edges contour
recurrent language string symbol strings
propagation back rumelhart bp hinton
ii region regions iii chain
graph matching annealing match
context mlp letter nn letters
fig eq proposed fast proc
variables variable belief conditional i
pp vol ca eds ieee
units unit hidden connections connected
hmm markov probabilities hidden hybrid
object objects recognition view shape
robot environment goal grid world
entropy natural statistical log statistics
experts expert gating architecture jordan
trajectory arm inverse trajectories hand
sequence step sequences length s
gaussian density covariance densities
positive negative instance instances np
target detection targets false normal
activity active module modules brain
mixture likelihood em log maximum
channel stage channels call routing
term long scale factor range
Comparisons between HLTA and HLDA
HLTA Topics
HLDA Topics
likelihood bayesian statistical conditional
gaussian log density likelihood estimate
margin kernel support xi bound
generalization student weight teacher optimal
gaussian bayesian kernel evidence posterior
chip analog circuit neuron voltage
classifier rbf class classifiers classification
speech recognition hmm context word
control optimal algorithms approximation step
policy action reinforcement states actions
experts mixture em expert gaussian
convergence gradient batch descent means
control controller nonlinear series forward
distance tangent vectors euclidean distances
robot reinforcement position control path
bias variance regression learner exploration
blocks block length basic experiment
0.34 likelihood statistical conditional density
0.35 entropy variables divergence mutual
0.19 probabilistic bayesian prior posterior
0.11 bayesian posterior prior bayes
0.15 mixture mixtures experts latent
0.14 mixture mixtures experts hierarchical
reinforcement sutton barto actions policy
0.12 transition states reinforcement reward
0.10 reinforcement policy reward states
0.14 trajectory trajectories path adaptive
0.12 actions action control controller agent
0.09 sutton barto td critic moore
HLTA topics have sizes, HLDA/LDA topics do not
HLTA produces better hierarchy
HLTA gives better topic characterizations
Measure of Topic Quality
Suppose a topic t is described using M words
The topic coherence score for t is:
The words for a topic would tend to co-occur.
Given a list of words, the more often the words co-occur, the better the list is
as a definition of a topic.
Score decreases with M.
Topics being compared should be described using the same number of words
D. Mimno, H. M. Wallach, E. Talley, M. Leenders, and A. McCallum. Optimizing semantic
coherence in topic models. In Proceedings of the Conference on Empirical Methods in
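A sketch of the coherence score as defined in the Mimno et al. paper cited above: for a topic described by words w1, …, wM, the score is the sum over all pairs l < m of log((D(wm, wl) + 1) / D(wl)), where D(w) is the number of documents containing w and D(w, w') the number containing both. The toy corpus and the function name are hypothetical, and every topic word is assumed to occur in at least one document.

```python
import math

def topic_coherence(topic_words, documents):
    """Coherence of a word list: sum over pairs l < m of
    log((D(w_m, w_l) + 1) / D(w_l)), with D counting documents."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*ws):
        # number of documents that contain all the given words
        return sum(all(w in d for w in ws) for d in doc_sets)

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            co = doc_freq(topic_words[m], topic_words[l])
            score += math.log((co + 1) / doc_freq(topic_words[l]))
    return score

# Hypothetical toy corpus; each document is a list of words.
documents = [
    ["speech", "hmm", "markov"],
    ["speech", "hmm"],
    ["speech", "kernel"],
    ["kernel", "svm"],
]
coherent = topic_coherence(["speech", "hmm"], documents)
mixed = topic_coherence(["speech", "svm"], documents)
print(coherent, mixed)
```

The word list whose members co-occur in documents scores higher than the list whose members never appear together.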
HLTA Found More Coherent Topics than LDA and HLDA
HLTA (L3-L4): All non-background topics from Levels 3 and 4: 47
HLTA (L2-L3-L4): All non-background topics from Levels 2, 3 and 4: 140
LDA was instructed to find two sets of topics, with 47 and 140 topics
HLDA found more: 179.
HLDA-s: A subset of the HLDA topics were sampled for fair comparison.
Comparisons in Terms of Model Fit
Regard LDA, HLDA and HLTA as methods for text modeling
Build a probabilistic model for the corpus
Per-document held-out loglikelihood (-log(perplexity)).
Measure performance of model on predicting unseen data
NIPS: 1,740 papers from NIPS, 1,000 words.
JACM: 536 abstracts from J of ACM, 1,809 words.
NEWSGROUP: 20,000 newsgroup posts, 1,000 words.
HLTA results robust w.r.t. UD-test threshold
• The values 1, 3, 5 are from literature on Bayes factor (see Part III)
LDA produced by far the worst models in all cases.
HLTA out-performed HLDA on NIPS, tied on JACM, and was beaten on NEWSGROUP.
Caution: A better model does not imply better topics
Running time on NIPS:
• LDA – 3.6 hours, HLTA – 17 hours, HLDA – 68 hours.
HLTA
Topic: collection of documents
Have sizes
Characterization: Words occur with high probability in topic, low probability in other documents
Document: A member of topic, can belong to multiple topics with probability 1.
LDA, HLDA
Topic: Distribution over vocabulary
Don’t have sizes
Characterization: Words occur with high probability in topic
Document: A mixture of topics
HLTA produces better hierarchy than HLDA
HLTA produces more coherent topics than LDA and HLDA
Approximate Inference in Bayesian networks
Analysis of social survey data
Topic detection
Analysis of medical symptom survey data
Background of Research
Common practice in China, increasingly in Western world
• Patients of a WM disease are divided into several TCM classes
• Different classes are treated differently using TCM treatments.
• WM disease: Depression
• TCM Classes:
Liver-Qi Stagnation (肝气郁结). Treatment principle: 疏肝解郁 (soothe the liver and relieve stagnation). Prescription:
Deficiency of Liver Yin and Kidney Yin (肝肾阴虚). Treatment principle: 滋肾养肝 (nourish the kidney and the liver). Prescription: 逍遥散合六味地黄丸 (Xiaoyao San combined with Liuwei Dihuang Wan)
Vacuity of both heart and spleen (心脾两虚). Treatment principle: 益气健脾 (boost qi and strengthen the spleen). Prescription: 归脾汤 (Guipi Tang)
Key Question
How should patients of a WM disease be divided into
subclasses from the TCM perspective?
• What TCM classes?
• What are the characteristics of each TCM class?
• How to differentiate different TCM classes?
Important for
• Clinical practice
• Research
Randomized controlled trials for efficacy
Modern biomedical understanding of TCM concepts
No consensus. Different doctors/researchers use different
schemes. Key weakness of TCM.
Key Idea
Our objective:
• Provide an evidence-based method for TCM patient classification
Key Idea
• Cluster analysis of symptom data => empirical partition of patients
• Check to see whether it corresponds to TCM class concept
Key technology: Multidimensional clustering
• Motivation for developing latent tree analysis
Symptoms Data of Depressive Patients
604 depressive patients aged between 19 and 69 from 9 hospitals
Selected using the Chinese classification of mental disorder clinic
guideline CCMD-3
(Zhao et al. JACM 2014)
Excluded: subjects who took anti-depression drugs within two weeks prior to the survey;
women in the gestational and suckling periods; etc.
Symptom variables
From the TCM literature on depression between 1994 and 2004.
Searched with the phrase “抑郁 and 证” (“depression and syndrome”) on the CNKI (China National
Knowledge Infrastructure) database
Kept only those on studies where patients were selected using the ICD-9,
ICD-10, CCMD-2, or CCMD-3 guidelines.
143 symptoms reported in those studies altogether.
The Depression Data
Data as a table
• 604 rows, each for a patient
• 143 columns, each for a symptom
• Table cells: 0 – symptom not present, 1 – symptom present
Removed: Symptoms occurring <10 times
86 symptom variables entered latent tree analysis.
The structure of the latent tree model obtained is shown on the next two slides.
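The filtering step on the data table can be sketched as below, assuming the table is held as a 0/1 NumPy array with one row per patient and one column per symptom. The function name, the symptom names and the toy table are hypothetical.

```python
import numpy as np

def drop_rare_symptoms(data, names, min_count=10):
    """Keep only symptom columns that occur at least `min_count` times."""
    counts = data.sum(axis=0)
    keep = counts >= min_count
    kept_names = [name for name, k in zip(names, keep) if k]
    return data[:, keep], kept_names

# Hypothetical table: 604 patients, 3 symptoms with 300, 9 and 10 occurrences.
names = ["symptom_a", "symptom_b", "symptom_c"]
data = np.zeros((604, 3), dtype=int)
data[:300, 0] = 1
data[:9, 1] = 1
data[:10, 2] = 1

filtered, kept = drop_rare_symptoms(data, names)
print(kept, filtered.shape)
```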
Model Obtained for a Depression Data (Top)
Model obtained for a Depression Data (Bottom)
The Empirical Partitions
The first cluster (Y29= s0) consists of 54% of the patients, while the second cluster
(Y29= s1) consists of 46% of the patients.
The two symptoms ‘fear of cold’ and ‘cold limbs’ do not occur often in the first cluster,
while they both tend to occur with high probabilities (0.8 and 0.85) in the
second cluster.
Probabilistic Symptom co-occurrence pattern
Probabilistic symptom co-occurrence pattern:
The table indicates that the two symptoms ‘fear of cold’ and ‘cold limbs’ tend
to co-occur in the cluster Y29= s1
Pattern meaningful from the TCM perspective.
TCM asserts that YANG DEFICIENCY (阳虚) can lead to, among other
symptoms, ‘fear of cold’ and ‘cold limbs’
So, the co-occurrence pattern suggests the TCM syndrome type (证型)
The partition Y29 suggests that
Among depressive patients, there is a subclass of
patients with YANG DEFICIENCY.
In this subclass, ‘fear of cold’ and ‘cold limbs’
co-occur with high probabilities (0.8 and 0.85)
Probabilistic Symptom co-occurrence pattern
Y28= s1 captures the probabilistic co-occurrence of ‘aching lumbus’, ‘lumbar pain
like pressure’ and ‘lumbar pain like warmth’.
This pattern is present in 27% of the patients.
It suggests that
Among depressive patients, there is a subclass that corresponds to the TCM
Characteristics of the subclass given by distributions for Y28= s1
Probabilistic Symptom co-occurrence pattern
Y27= s1 captures the probabilistic co-occurrence of ‘weak lumbus and knees’ and ‘cumbersome
This pattern is present in 44% of the patients
It suggests that,
Among depressive patients, there is a subclass that corresponds to the TCM concept of
Characteristics of the subclass given by distributions for Y27= s1
Y27, Y28, Y29 together provide evidence for defining KIDNEY YANG DEFICIENCY
Pattern Y21= s1: evidence for defining STAGNANT QI TURNING INTO FIRE
Y15= s1 : evidence for defining QI DEFICIENCY
Y17 = s1 : evidence for defining HEART QI DEFICIENCY
Y16= s1 : evidence for defining QI STAGNATION
Y19= s1: evidence for defining QI STAGNATION IN HEAD
Probabilistic Symptom co-occurrence pattern
Y9= s1: evidence for defining DEFICIENCY OF BOTH QI AND YIN (气阴两虚)
Y10= s1: evidence for defining YIN DEFICIENCY (阴虚)
Y11= s1: evidence for defining DEFICIENCY OF STOMACH/SPLEEN YIN (脾胃
Symptom Mutual-Exclusion Patterns
Some empirical partitions reveal
symptom exclusion patterns
Y1 reveals the mutual exclusion of
‘white tongue coating’, ‘yellow tongue
coating’ and ‘yellow-white tongue coating’.
Y2 reveals the mutual exclusion of ‘thin
tongue coating’, ‘thick tongue coating’
and ‘little tongue coating’.
Summary of TCM Data Analysis
By analyzing 604 cases of depressive patient data using latent tree models
we have discovered a host of probabilistic symptom co-occurrence patterns
and symptom mutual-exclusion patterns.
Most of the co-occurrence patterns have clear TCM syndrome connotations,
while the mutual-exclusion patterns are also reasonable and meaningful.
The patterns can be used as evidence for the task of defining TCM classes
in the context of depressive patients and for differentiating between those classes.
(Zhang et al. JACM 2008)
Another Perspective: Statistical Validation of TCM Postulates
Y28 = s1
Kidney deprived of
Y29 = s1
Yang Deficiency
TCM terms such as Yang Deficiency were introduced to explain symptom co-occurrence patterns observed in clinical practice.
Value of Work in View of Others
D. Haughton and J. Haughton. Living Standards Analytics:
Development through the Lens of Household Survey Data.
Springer. 2012
Zhang et al. provide a very interesting application of latent class
(tree) models to diagnoses in traditional Chinese medicine
The results tend to confirm known theories in Chinese traditional
This is a significant advance, since the scientific bases for these
theories are not known.
The model proposed by the authors provides at least a statistical
justification for them.
Approximate Inference in Bayesian networks
Analysis of social survey data
Topic detection
Analysis of medical symptom survey data
Implementation of LTM learning algorithms: EAST, BI
Tool for manipulating LTMs: Lantern
LTM for topic detection: HLTA
Implementation of other LTM learning algorithms
• NJ, RG, CLRG and regCLRG:
  - NJ (fast implementation):