Assignment 3

1. Pull the swap rates data for different maturities from Quandl (please create and use your own API
key):
import quandl
quandl.ApiConfig.api_key = "YOUR_API_KEY"  # replace with your own key
data = quandl.get(['FRED/DSWP1', 'FRED/DSWP2', 'FRED/DSWP3', 'FRED/DSWP4',
                   'FRED/DSWP5', 'FRED/DSWP7', 'FRED/DSWP10', 'FRED/DSWP30'])
Rename the columns appropriately, check for missing data, and drop any missing data points. Plot
all of the rate time series in a single figure. [5]
2. Compute the covariance and correlation matrices of the swap rates. [5]
3. Run a 3-component PCA on the swap rates data (note that fit in the sklearn package runs on the
original data, so you do not need to use the covariance or correlation matrix for that) [5]. Print the
loadings and the PVE (proportion of variance explained) [5]. Make a plot of PVE with PVE on the y axis
and components on the x axis [5]. Make a plot of the principal components [5]. Reconstruct the time
series using three components and print the reconstructed time series [20] (hint: check the transform
function with PCA). Total: 40
4. Use the attached customers_data.csv (the data represents units purchased by customers in
different product categories such as Fresh, Milk, Grocery, etc.). Run hierarchical clustering to
segment similar customers into clusters. Use all 8 features, normalize them first, and then use
scipy.cluster.hierarchy to make a dendrogram showing the clusters. [25]
5. Run k-means clustering (from sklearn) on customers_data.csv for k = 1 to 20 and make an elbow
plot of inertia. For k = 2, make a scatter plot showing the two clusters for the variables Milk and
Grocery. [25]
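As an illustration of the PCA workflow in items 2 and 3, here is a minimal sketch. It runs on a synthetic table, not on the FRED swap rates: the `rates` frame, its column labels, and the level/slope construction are all assumptions made so that the example runs without an API key. It uses `inverse_transform` to map the three component scores back to rate space, which is one common way to do the reconstruction; the matrix product in `manual` is the same calculation written out by hand.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the cleaned swap-rate table (NOT real FRED data):
# a random-walk "level" factor, a "slope" factor, and small noise.
rng = np.random.default_rng(0)
maturities = np.array([1, 2, 3, 4, 5, 7, 10, 30], dtype=float)
n_days = 500
level = np.cumsum(rng.normal(0, 0.05, n_days))
slope = np.cumsum(rng.normal(0, 0.02, n_days))
values = (3.0
          + level[:, None]
          + slope[:, None] * np.log(maturities)[None, :]
          + rng.normal(0, 0.01, (n_days, len(maturities))))
rates = pd.DataFrame(values, columns=[f"{int(m)}Y" for m in maturities])

# Item 2: covariance and correlation matrices of the columns.
cov_matrix = rates.cov()
corr_matrix = rates.corr()

# Item 3: fit a 3-component PCA directly on the data.
# PCA centers the columns internally, so no covariance matrix is passed in.
pca = PCA(n_components=3)
scores = pca.fit_transform(rates.to_numpy())   # principal-component time series

# Loadings: one column per component, one row per maturity.
loadings = pd.DataFrame(pca.components_.T,
                        index=rates.columns,
                        columns=["PC1", "PC2", "PC3"])

# PVE: proportion of variance explained by each component.
pve = pca.explained_variance_ratio_

# Reconstruction from three components.
reconstructed = pd.DataFrame(pca.inverse_transform(scores),
                             index=rates.index,
                             columns=rates.columns)

# The same reconstruction by hand: scores times loadings, plus the column means.
manual = scores @ pca.components_ + pca.mean_

max_abs_error = float(np.abs(rates.to_numpy() - reconstructed.to_numpy()).max())

print(loadings.round(3))
print("PVE:", pve.round(4))
print("max abs reconstruction error:", max_abs_error)
```

Plotting is left out to keep the sketch short; `pve` and the columns of `scores` are the arrays that would go on the y axis of the PVE plot and the principal-component plot.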
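The clustering steps in items 4 and 5 can be sketched the same way. The example below does not read customers_data.csv; it builds a small synthetic `customers` table with four hypothetical columns (the real file has 8 features) so that it is self-contained. Standardizing with `StandardScaler` and using Ward linkage are assumptions of this sketch, since the assignment only says to normalize and to use scipy.cluster.hierarchy.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for customers_data.csv (hypothetical columns and values):
# 60 "small" buyers and 40 "large" buyers.
rng = np.random.default_rng(1)
cols = ["Fresh", "Milk", "Grocery", "Frozen"]
small = rng.normal(loc=[3000, 2000, 2500, 1000], scale=300, size=(60, 4))
large = rng.normal(loc=[9000, 8000, 9500, 3000], scale=300, size=(40, 4))
customers = pd.DataFrame(np.vstack([small, large]), columns=cols)

# Normalize first: zero mean and unit variance per feature (one possible choice).
X = StandardScaler().fit_transform(customers)

# Item 4: hierarchical clustering with scipy.
Z = linkage(X, method="ward")              # Ward linkage is an assumption here
tree = dendrogram(Z, no_plot=True)         # drop no_plot=True to draw the dendrogram
hier_labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters

# Item 5: k-means for k = 1..20, collecting inertia for the elbow plot.
ks = list(range(1, 21))
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]

# k = 2: labels to color a Milk vs. Grocery scatter plot.
km2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
kmeans_labels = km2.labels_

print("hierarchical cluster sizes:", np.bincount(hier_labels)[1:])
print("k-means (k=2) cluster sizes:", np.bincount(kmeans_labels))
print("inertia for k=1, 2, 3:", np.round(inertias[:3], 2))
```

With matplotlib, the elbow plot would be `ks` against `inertias`, and the scatter plot would use `customers["Milk"]` and `customers["Grocery"]` colored by `kmeans_labels`.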
Submit the notebook by May 02, 11:59 PM.
- The notebook should not depend on any additional files except customers_data.csv, and should use
no packages other than pandas, numpy, scipy, sklearn, matplotlib, seaborn, and quandl. Keep
the file in the same folder as the notebook and do not specify any path while reading the file.
- No errors (-20).
- File name should be FirstName_LastName.