Case Study: 2004 Movies

1 Description

The data were extracted from www.imdb.com and cover movies that appeared in 2004. There are 63
movies and 4 variables:

budget    How much the movie cost to make
length    Length in minutes
rating    Average user rating
votes     Number of users logging into web site to rate movie

The primary question is “Can the movies be grouped into a small number of clusters according to their
similarity?”
Other possible questions might be:
• Does a bigger budget suggest a better user rating?
• Which low-budget movies are rated unusually highly?
• Do shorter movies have lower budgets?
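
The secondary questions above can be explored with very little code. What follows is a minimal sketch only, not the method used in the case study. The rows in `movies` are invented placeholders (they are NOT the IMDb data), the function names are hypothetical, and the rule for "unusually highly" (budget below the median, rating more than one standard deviation above the mean) is an arbitrary assumption made for illustration.

```python
from math import sqrt
from statistics import mean, median, stdev

# Placeholder rows, invented for illustration only. NOT the IMDb data.
movies = [
    {"title": "Movie A", "budget": 1_000_000, "length": 85, "rating": 8.5, "votes": 400},
    {"title": "Movie B", "budget": 5_000_000, "length": 92, "rating": 5.0, "votes": 900},
    {"title": "Movie C", "budget": 20_000_000, "length": 100, "rating": 6.0, "votes": 5_000},
    {"title": "Movie D", "budget": 40_000_000, "length": 110, "rating": 6.5, "votes": 12_000},
    {"title": "Movie E", "budget": 80_000_000, "length": 120, "rating": 6.2, "votes": 30_000},
    {"title": "Movie F", "budget": 150_000_000, "length": 140, "rating": 7.0, "votes": 60_000},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def low_budget_unusually_high_rating(rows):
    """Titles with budget below the median and rating above mean + 1 sd.

    Both cutoffs are assumptions chosen for this sketch.
    """
    ratings = [r["rating"] for r in rows]
    budget_cutoff = median(r["budget"] for r in rows)
    rating_cutoff = mean(ratings) + stdev(ratings)
    return [
        r["title"]
        for r in rows
        if r["budget"] < budget_cutoff and r["rating"] > rating_cutoff
    ]


if __name__ == "__main__":
    budgets = [m["budget"] for m in movies]
    # Does a bigger budget suggest a better user rating?
    print(pearson(budgets, [m["rating"] for m in movies]))
    # Do shorter movies have lower budgets?
    print(pearson(budgets, [m["length"] for m in movies]))
    # Which low-budget movies are rated unusually highly?
    print(low_budget_unusually_high_rating(movies))
```

A correlation coefficient only summarizes linear association, so a scatterplot of the same pairs of variables would normally accompany it.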

2 Plan for Analysis

Approach: Summary statistics (marginal and conditional)
Reason: Extract location/scale information.
Type of questions addressed: “What movie is rated highest by users?” “What is the longest movie?”

Approach: Plots
Reason: Explore data distributions.
Type of questions addressed: “Are most movies short or long?” “Is there any obvious clustering of the movies?”

Approach: Numerical clustering
Reason: Group the movies into clusters with similar attributes. Use hierarchical, k-means and model-based clustering, and self-organizing maps.
Type of questions addressed: “Which movies might be considered alike?”
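
To make the numerical clustering step concrete, here is a minimal k-means sketch in plain Python. It covers only one of the listed methods and is not the implementation used in the case study. The rows are invented placeholders (NOT the IMDb data), the function names are hypothetical, and starting from the first k points is an assumption made to keep the example deterministic; practical k-means usually uses several random starts. The variables are standardized first because budget and votes are on much larger scales than length and rating.

```python
from statistics import mean, pstdev

# Placeholder rows (budget, length, rating, votes), invented for illustration.
# These are NOT the IMDb data.
rows = [
    (2_000_000, 88, 6.1, 300),
    (150_000_000, 140, 7.2, 60_000),
    (3_000_000, 92, 6.4, 450),
    (120_000_000, 132, 6.9, 48_000),
    (1_500_000, 85, 5.8, 200),
    (180_000_000, 150, 7.5, 75_000),
]


def standardize(data):
    """Convert each column to z-scores so no single variable dominates distance."""
    cols = list(zip(*data))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in data]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids).

    Assumption: the first k points serve as the initial centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(standardize(rows), k=2)
    print(labels)
```

Movies that share a label are the ones the algorithm considers alike. The number of clusters k has to be chosen by the analyst, which is one reason the plan pairs clustering with plots.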