Uploaded by Stats Work

Panel Data Analysis A Survey On Model-Based Clustering Of Time Series - Statswork

Panel Data Analysis:
A Survey on Model-Based
Clustering of Time Series
An Academic presentation by
Dr. Nancy Agens, Head, Technical Operations, Statswork
Group www.statswork.com
Email: [email protected]
Outline of Topics
In Brief
Dirichlet Prior
Longitudinal Data
MCMC Simulation
Model Based Clustering
Example on Model Based Clustering
In Brief
Clustering is a statistical analysis technique that identifies subsets
(clusters) in the data using a specified distance measure.
We discuss some of the methods used for modeling
longitudinal or panel data with cluster analysis.
Longitudinal Data
Longitudinal data are a sample of observations that are measured repeatedly
over time.
Longitudinal (repeated measures) or panel data now arise in all areas of
applied statistics, such as finance, psychology, economics and the social sciences.
Most studies deal with analyzing heterogeneity in such time series data.
The most common method of capturing the heterogeneity is to assume the presence of
latent classes, with each class stratified using the covariates.
Model Based Clustering
Measuring the distance between time series is not
appropriate; instead, a clustering strategy based on
finite mixture models is adopted, in which Bayes' rule
assigns each series to a class.
Model based clustering treats each time series as a
single unit belonging to an unknown latent class.
An excellent review of finite mixture models for
longitudinal data, especially in psychology, biostatistics
and other applied areas, is given in Vermunt (2010).
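As a minimal sketch of how Bayes' rule assigns a time series to a latent class, the Python snippet below assumes that each class is described by a first-order Markov transition matrix over three categories. The class weights, the two transition matrices and the example series are hypothetical values chosen only for illustration; they are not estimates from the survey.

```python
import numpy as np

def log_likelihood(series, trans):
    """Log-likelihood of a categorical series under a first-order Markov
    chain, conditioning on the first observation."""
    prev, nxt = series[:-1], series[1:]
    return float(np.sum(np.log(trans[prev, nxt])))

def class_posterior(series, weights, trans_by_class):
    """Posterior class probabilities via Bayes' rule:
    Pr(class k | series) is proportional to weight_k * likelihood_k(series)."""
    log_post = np.array([
        np.log(w) + log_likelihood(series, t)
        for w, t in zip(weights, trans_by_class)
    ])
    log_post -= log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical two-class model over states 0, 1, 2 (illustrative values only).
weights = np.array([0.6, 0.4])
stayers = np.array([[0.90, 0.08, 0.02],
                    [0.30, 0.60, 0.10],
                    [0.10, 0.20, 0.70]])
movers = np.array([[0.50, 0.30, 0.20],
                   [0.10, 0.50, 0.40],
                   [0.05, 0.15, 0.80]])

series = np.array([0, 0, 0, 0, 0])  # a series that never leaves state 0
posterior = class_posterior(series, weights, [stayers, movers])
print(posterior)
```

Working on the log scale avoids numerical underflow when the series are long.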
Example on Model Based Clustering
The data consist of 237 teenagers whose marijuana use was recorded over the years 1976-1980.
Marijuana use is categorized into three levels: never, not more than once a month, and more
than once a month.
The following figure shows a sample of 10 observed response series of marijuana use
among the 237 teenagers.
The model considered for analyzing marijuana use is based on a generalized transition model.
Figure: Model Based Clustering
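To illustrate what a transition-type model works with, the sketch below counts year-to-year transitions between the three usage categories and normalises them into a transition matrix. It is a plain first-order Markov chain, which is a simplification and not the generalized transition model of the study. The coding 0 = never, 1 = not more than once a month, 2 = more than once a month and the three short series are hypothetical, not taken from the 237 teenagers.

```python
import numpy as np

def transition_counts(panel, n_states):
    """Count transitions j -> k pooled over all series in the panel."""
    counts = np.zeros((n_states, n_states), dtype=int)
    for series in panel:
        for prev, nxt in zip(series[:-1], series[1:]):
            counts[prev, nxt] += 1
    return counts

def transition_matrix(counts):
    """Row-normalise counts into estimated transition probabilities.
    Rows with no observed transitions are left as zeros."""
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums,
                     out=np.zeros(counts.shape, dtype=float),
                     where=row_sums > 0)

# Hypothetical panel: three teenagers observed in five consecutive years.
panel = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 2],
    [0, 1, 2, 2, 2],
]

counts = transition_counts(panel, n_states=3)
probs = transition_matrix(counts)
print(counts)
print(probs)
```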
Dirichlet Prior
A Dirichlet prior is chosen in this case because the observed response variable is categorical.
Five different kernel classes are considered, the model is evaluated under the Dirichlet prior
distribution, and the results are presented in the following table.
The clustering kernels M2 to M5 show that there is a common behaviour in marijuana usage.
If the value is smaller than one, one may conclude that the method is overfitting; in this case, the H3
class of kernel seems to be overfitting.
Table: Dirichlet Prior
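The reason a Dirichlet prior suits categorical responses is conjugacy: the posterior is again a Dirichlet whose parameters are the prior parameters plus the observed category counts. The sketch below shows this update for a single row of transition counts. The uniform prior Dirichlet(1, 1, 1) and the counts are assumptions made for illustration, not the prior settings or data of the study.

```python
import numpy as np

def dirichlet_posterior(prior, counts):
    """Conjugate update: posterior parameters = prior parameters + counts."""
    return np.asarray(prior, dtype=float) + np.asarray(counts, dtype=float)

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution with parameter vector alpha."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()

prior = np.ones(3)             # hypothetical uniform prior over 3 categories
counts = np.array([5, 2, 0])   # hypothetical transition counts out of one state

post = dirichlet_posterior(prior, counts)
mean = dirichlet_mean(post)

# Draws from the posterior, e.g. for use inside an MCMC step.
rng = np.random.default_rng(0)
draws = rng.dirichlet(post, size=1000)
print(post, mean)
```

Note that the category with zero observed transitions still receives positive posterior mass, which is one practical benefit of the prior.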
MCMC Simulation
An MCMC simulation is carried out for M3 with H2, and the following figure shows sample
boxplots of the posterior probabilities for the male and female groups.
Comparing the likelihood results obtained from the above table (598.5) and the previous table
(596.5), the stratified model based clustering reduces to standard model based clustering, and it
is clear that the use of marijuana is not associated with gender.
From these results, it is concluded that the use of marijuana among teenagers may be clustered
into two groups: never-users and more frequent users.
Figure: Boxplots for MCMC
Table: Gender Specific Posterior Inference
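The general shape of such an MCMC simulation can be sketched with a small Gibbs sampler for a finite mixture of first-order Markov chains. This is a simplified stand-in, not the M3 kernel with H2 used in the study. The Dirichlet(1, ..., 1) priors, the function names and the example panel are all assumptions made for illustration. Because class labels can switch during sampling, the example summarises the draws by the probability that two series fall in the same class, which does not depend on the labelling.

```python
import numpy as np

def series_counts(series, n_states):
    """Transition count matrix of one categorical series."""
    counts = np.zeros((n_states, n_states))
    for prev, nxt in zip(series[:-1], series[1:]):
        counts[prev, nxt] += 1
    return counts

def gibbs_markov_mixture(panel, n_classes, n_states, n_iter=200, seed=0):
    """Gibbs sampler for a finite mixture of first-order Markov chains.

    Assumed priors: Dirichlet(1, ..., 1) on the class weights and on every
    row of every class-specific transition matrix.
    Returns the sampled class labels, one row per iteration.
    """
    rng = np.random.default_rng(seed)
    counts = np.array([series_counts(s, n_states) for s in panel])
    n_series = len(panel)
    labels = rng.integers(n_classes, size=n_series)  # random initial allocation
    draws = np.empty((n_iter, n_series), dtype=int)

    for it in range(n_iter):
        # 1. Class weights given the current allocation.
        sizes = np.bincount(labels, minlength=n_classes)
        weights = rng.dirichlet(1.0 + sizes)

        # 2. Transition rows of each class given its members' pooled counts.
        trans = np.empty((n_classes, n_states, n_states))
        for k in range(n_classes):
            pooled = counts[labels == k].sum(axis=0)
            for j in range(n_states):
                trans[k, j] = rng.dirichlet(1.0 + pooled[j])

        # 3. Allocation of each series via Bayes' rule.
        log_post = np.log(weights) + np.einsum(
            "nij,kij->nk", counts, np.log(trans))
        log_post -= log_post.max(axis=1, keepdims=True)
        prob = np.exp(log_post)
        prob /= prob.sum(axis=1, keepdims=True)
        labels = np.array([rng.choice(n_classes, p=p) for p in prob])
        draws[it] = labels
    return draws

# Hypothetical panel: 10 series that stay in state 0, 10 that move up quickly.
panel = [[0, 0, 0, 0, 0]] * 10 + [[0, 1, 2, 2, 2]] * 10
draws = gibbs_markov_mixture(panel, n_classes=2, n_states=3)

# Discard the first half as burn-in, then estimate co-clustering probabilities.
kept = draws[100:]
same_class = (kept[:, :, None] == kept[:, None, :]).mean(axis=0)
print(same_class.round(2))
```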
To sum up, model-based clustering combined with a Bayesian approach yields better
results, since it provides an answer to the most troublesome problems in cluster analysis.
In longitudinal or panel data studies, the use of Euclidean distance may not be valid;
hence a kernel based clustering for time series data is considered, and the
best method is selected using different information criteria.
An MCMC simulation is carried out to find the optimal clustering methodology.