Algorithms_for_DS_HW3
July 3, 2023
Johns Hopkins University
Whiting School of Engineering
Engineering for Professionals
685.621 Algorithms for Data Science
Homework 3
Assigned at the start of Module 6
Due at the end of Module 8
Total Points 100/100
Class, the below is a standard set of instructions for each HW; in this assignment, groups will be
set up for collaboration.
Make sure your group starts one thread for the collaborative problems. You are required to
participate in each collaborative problem and its subproblems separately. Please do not directly
post a complete solution; the goal is for the group to develop a solution after everyone has
participated.
Please ensure you have a write-up with solutions to each problem and its subproblems. You are
also required to submit code, which will be compiled when grading the assignment. In each of the
problems you are allowed to use built-in functions.
1 Module 6 - Note this is not a Collaborative Problem
10 Points Total
In this problem the goal is to build a set of numerical images from a set of arrays. The data set
from the Kaggle web site will be used: https://www.kaggle.com/c/digit-recognizer/data This data
set has train.csv, test.csv and sample_submission.csv files. In this exercise the focus will be on the
train.csv data. The web site has the following data description:
The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero
through nine.
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a
single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher
numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
The training data set, (train.csv), has 785 columns. The first column, called "label", is the digit
that was drawn by the user. The rest of the columns contain the pixel-values of the associated
image.
Each pixel column in the training set has a name like pixel x, where x is an integer between 0 and
783, inclusive. To locate this pixel on the image, suppose that we have decomposed x as x = 28i + j,
where i and j are integers between 0 and 27, inclusive. Then pixel x is located on row i and column
j of a 28 × 28 matrix (indexing by zero).
For example, pixel 31 indicates the pixel that is in the fourth column from the left and the second
row from the top.
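As a quick check of the decomposition, integer division by 28 recovers the row and column (a
small illustration, not part of the assignment):
[ ]: # x = 28*i + j, so divmod(x, 28) yields (row i, column j), zero-indexed
i, j = divmod(31, 28)
print(i, j)  # -> 1 3: second row from the top, fourth column from the left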
This data is set up in a csv file, which will require reshaping the data into 28 × 28 matrices
representing images. There are 42,000 images in the train.csv file. For this problem it is only
necessary to process approximately 100 images, 10 each of the numbers from 0 through 9. The goal
is to learn how to generate features from images using transforms and first order statistics.
1. [5 points] Read in and store the data in a data structure of your choice so that the data is
reshaped into a matrix of size 28 × 28 which represents each digit as an image.
2. [5 points] Display the images for indices 0, 1, 3, 6, 7, 8, 10, 11, 16 and 21. These indices
represent the numerical values from 0 to 9.
[ ]: # import additional libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# import dataset
train_df = pd.read_csv("train.csv")
show_outputs = False # define if outputs will be printed / shown
[ ]: ## Type code here for part 1 ##
# Convert data to numpy array, ignoring first column
train_array = train_df.iloc[:, 1:].to_numpy()
1.0.1 plot_pixels
plot_pixels: function that takes in a list of numpy matrices, each representing the pixels of an
image, creates subplots and shows each image in a different subplot.
• pixel_arrays: list[np.ndarray] - list of matrices with the pixel data
• title: str - string with the upper title of the plot
returns: none, shows the plots
[ ]: def plot_pixels(pixel_arrays: list[np.ndarray], title="Plot of pixels"):
    """Function that takes in a list of numpy matrices, each representing
    the pixels of an image, creates subplots and shows each image in a
    different subplot.
    Keyword arguments
    pixel_arrays: list[np.ndarray] - list of matrices with the pixel data
    title: str - string with the upper title of the plot
    """
    # Determine the size m x n of the subplot grid needed
    t = len(pixel_arrays)
    m = int(np.floor(np.sqrt(t)))  # rows: floor of the square root of t
    n = int(np.ceil(t / m))  # columns: enough so that m * n >= t
    # Create subplots (squeeze=False keeps ax 2-D even when m or n is 1)
    fg, ax = plt.subplots(m, n, figsize=(10, 10), tight_layout=True, squeeze=False)
    for k in range(m * n):
        a = k // n  # index of the row of the subplot
        b = k % n  # index of the column of the subplot
        if k >= t:
            ax[a, b].axis("off")  # hide unused subplots
        else:
            ax[a, b].set_title(f"Pixel Array {k}")
            ax[a, b].imshow(pixel_arrays[k], cmap="gray")  # show image k
    plt.suptitle(title)  # Add superior title
    plt.show()
[ ]: ## Type code here for part 2 ##
image_indices = [0, 1, 3, 6, 7, 8, 10, 11, 16, 21]
image_pixels = []  # list to keep the arrays with the pixels of each number
for i in image_indices:
    image_pixels.append(train_array[i, :].reshape((28, 28)))
if show_outputs:
    plot_pixels(image_pixels, title="Figure 1. Numbers from train.csv file")
2 Module 6 - Note this is a Collaborative Problem
20 Points Total
In this problem each image from the train.csv (42,000 images in total) is to be processed to generate
a set of features using the discrete cosine transform and Eigen decomposition.
1. [5 points] Take the 2 dimensional Discrete Cosine Transform (DCT) of each matrix from
Problem 1, where each matrix represents a number (0-9).
2. [2.5 points] Extract the vertical, horizontal and diagonal coefficients from the transform
(using the indexes indicated by the masks provided).
3. [5 points] For each of the three sets of DCT coefficients perform Eigen decomposition.
4. [2.5 points] Retain the top 20 Eigen vectors of each direction.
5. [5 points] Using your top Eigen vectors, reduce the DCT transformed data. This will create
a new data set that represents each image as a smaller subset of values.
6. Save the new data in a file of your choice, .txt, .csv, etc. The name is up to you (you will
use this in the subsequent question).
[ ]: ## Type code here for part 1 ##
from scipy.fft import dctn

image_dct2 = []
for im in image_pixels:
    # 2-D DCT (type II, orthonormal) of each 28 x 28 image
    image_dct2.append(dctn(im, type=2, norm="ortho"))
if show_outputs:
    plot_pixels(image_dct2, title="Figure 3. DCT2 transformed numbers")
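As an optional sanity check, the orthonormal DCT-II is invertible, so scipy.fft.idctn should
recover the original pixels (up to floating point error):
[ ]: # Optional: verify invertibility of the orthonormal 2-D DCT
from scipy.fft import idctn
recovered = idctn(image_dct2[0], type=2, norm="ortho")
assert np.allclose(recovered, image_pixels[0])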
[ ]: ## Type code here for part 2 ##
mask_files = ["diagMask.csv", "vertMask.csv", "horizMask.csv"]  # list of masks
masks = []  # list with the arrays with the masks
image_masked = []  # list to store masked images
for i, m in enumerate(mask_files):
    image_masked.append([])
    this_mask = np.loadtxt(m, delimiter=",").astype(bool)  # get csv data with mask
    masks.append(this_mask)
    for im in image_dct2:
        # Apply mask and store in a list
        image_masked[i].append(np.multiply(im, this_mask))
    if show_outputs:
        plot_pixels(image_masked[i], title=f"Figure {3+i}. DCT2 images with {m[:-4]}")
[ ]: ## Type code here for part 3 ##
n_obs = 42000  # number of observations that will be considered
print(f"Working with {n_obs} observations\n")
image_pixels = []  # list to keep the matrices with the pixels of each number
image_dct2 = []  # list to keep the dct2 transformed matrices
for i in range(n_obs):
    image_pixels.append(train_array[i, :].reshape((28, 28)))
    # use the built-in dctn function for the 2-D DCT
    image_dct2.append(dctn(image_pixels[i], type=2, norm="ortho"))
# Use the masks to flatten out each dct2 image and form matrices
# with the masked dct2 image for each mask (diag, vert, horiz)
data_masked = []
for mask in masks:
    data_masked.append(np.array([im[mask] for im in image_dct2]))
# Convert each masked matrix to zero mean, then get the
# covariance matrices and then do the eigen decomposition
data_masked_zm = data_masked.copy()  # make a copy to zero-mean it
cov_data = []
eig_values = []
eig_vectors = []
for i in range(len(data_masked_zm)):
    # convert to zero-mean
    data_masked_zm[i] = data_masked_zm[i] - np.mean(data_masked_zm[i], axis=0)
    # get the covariance matrix using columns as features
    cov_data.append(np.cov(data_masked_zm[i], rowvar=False))
    # do the eigen decomposition
    vals, vecs = np.linalg.eig(cov_data[i])
    # Even though the eigenvalues are real (because the covariance matrix is
    # symmetric), numpy may report the imaginary part as 0j, so keep only
    # the real part
    vals, vecs = np.real(vals), np.real(vecs)
    # np.linalg.eig does not guarantee any ordering, so sort the eigenvalues
    # (and matching eigenvectors) in descending order before taking the top ones
    order = np.argsort(vals)[::-1]
    eig_values.append(vals[order])
    eig_vectors.append(vecs[:, order])
    print(f"For the {mask_files[i][:-4]}, there are {eig_values[i].shape[0]} eigenvalues")
Working with 42000 observations
For the diagMask, there are 335 eigenvalues
For the vertMask, there are 224 eigenvalues
For the horizMask, there are 224 eigenvalues
[ ]: ## Type code here for part 4 ##
n_top = 20  # number of top eigenvalues to retain
print(f"The top {n_top} eigenvalues and eigenvectors are retained")
top_eig_val = []  # list with the top eigenvalues in each direction
top_eig_vec = []  # list with the top eigenvectors in each direction
for i in range(len(eig_values)):
    # retain the top eigenvalues and eigenvectors
    top_eig_val.append(eig_values[i][0:n_top])
    top_eig_vec.append(eig_vectors[i][:, 0:n_top])
The top 20 eigenvalues and eigenvectors are retained
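As an optional check (not required by the assignment), the fraction of total variance captured by
the retained eigenvalues gives a feel for whether n_top = 20 is a reasonable cut-off:
[ ]: # Optional: fraction of total variance captured by the retained
# eigenvalues in each direction (a common sanity check on n_top)
for i in range(len(mask_files)):
    retained = top_eig_val[i].sum() / eig_values[i].sum()
    print(f"{mask_files[i][:-4]}: {retained:.1%} of total variance retained")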
[ ]: ## Type code here for part 5 ##
new_features = []  # list with the matrices of new features
first_loop = True  # flag for the first pass through the loop
for i in range(len(data_masked_zm)):
    # Do the transformation by multiplying by the eigenvector matrix
    new_feature = np.matmul(data_masked[i], top_eig_vec[i])
    if first_loop:
        new_features = new_feature
        first_loop = False
    else:
        new_features = np.append(new_features, new_feature, axis=1)
print(f"The new features shape is {new_features.shape}")
The new features shape is (42000, 60)
[ ]: ## Type code here for part 6 ##
# Create column labels, create dataframe and get number labels
column_labels = [f"feat_{i}" for i in range(new_features.shape[1])]
features_df = pd.DataFrame(new_features, columns=column_labels)
features_df["label"] = train_df["label"].iloc[:n_obs]
features_df = features_df[["label", *features_df.columns[:-1]]]
# Export as csv
if show_outputs:
    features_df.to_csv("transformed_features.csv", index=False)
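A quick round-trip check (a sketch, only meaningful when the csv was actually written) confirms
the exported file has the expected shape:
[ ]: # Optional: re-read the exported file and confirm it has n_obs rows
# and one label column plus the 60 feature columns
if show_outputs:
    check_df = pd.read_csv("transformed_features.csv")
    assert check_df.shape == (n_obs, new_features.shape[1] + 1)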
3 Module 4 - Note this is not a Collaborative Problem
20 Points Total
In this problem use the developed numerical features from Question 2 (if you are not able to
generate the features, they are provided in the module for HW 3). In this problem the following is
to be completed:
Use the Fisher's Linear Discriminant Ratio (FDR) from the Data Processing document, specifically
Equation 20.
1. [10 points] For each feature and combination of numbers apply the FDR, e.g., 0 vs 1, 0 vs 2,
…, 0 vs 9, …, 7 vs 8, 7 vs 9, and 8 vs 9 (which should result in a 60 x 45 matrix, where 60
represents the number of features and 45 the number of pairwise comparisons).
2. [10 points] Place the results in a table and provide an initial analysis of which feature provides
the best class separation.
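For reference, a minimal sketch of the two-class FDR used below, assuming Equation 20 takes the
standard form FDR = (μ1 − μ2)² / (σ1² + σ2²):
[ ]: # Reference sketch of the two-class Fisher's Discriminant Ratio
import numpy as np

def fdr_two_class(x1: np.ndarray, x2: np.ndarray) -> float:
    """FDR of one feature for two classes given as 1-D sample arrays."""
    # ddof=1 matches the sample standard deviation used by pandas .std()
    return (x1.mean() - x2.mean()) ** 2 / (x1.var(ddof=1) + x2.var(ddof=1))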
[ ]: # import libraries
import pandas as pd
# read in the train features data
file_name = "trainFeatures42k.xls"
features_df = pd.read_excel(file_name, header=None)
# new_features = features_df.iloc[:,1:].to_numpy()
# create column labels, considering that the first column is the class label
column_labels = [f"feat_{i}" for i in range(features_df.shape[1] - 1)]
column_labels = ["label", *column_labels]
features_df.columns = column_labels
[ ]: ## Type the code for part 1 here ##
# Get the unique values of the labels (classes)
classes = sorted(features_df["label"].unique())
FDRs = []  # list to keep the Fisher's Discriminant Ratio values
FDR_classes = []  # list for the names of the rows
k = 0  # counter to use in loop
for i in range(len(classes)):
    for j in range(i):
        FDR_classes.append(f"{classes[i]}-{classes[j]}")
        FDRs.append([])
        for feat in features_df.columns:
            if feat != "label":
                # Calculate means and standard deviations of the 2 classes
                mean1 = features_df[features_df["label"] == classes[j]][feat].mean()
                mean2 = features_df[features_df["label"] == classes[i]][feat].mean()
                std1 = features_df[features_df["label"] == classes[j]][feat].std()
                std2 = features_df[features_df["label"] == classes[i]][feat].std()
                # Calculate the FDR and append to list
                this_FDR = (mean1 - mean2) ** 2 / (std1**2 + std2**2)
                FDRs[k].append(this_FDR)
        k = k + 1
# Convert the list of FDR lists to a DataFrame
FDR_df = pd.DataFrame(FDRs, columns=features_df.columns[1:])
FDR_df["Classes"] = FDR_classes
FDR_df = FDR_df[["Classes", *FDR_df.columns[:-1]]]
[ ]: ## Type the code for part 2 here ##
print("Table 1. FDR of each feature for each pair of classes:")
print(FDR_df.T.to_string())
Table 1. FDR of each feature for each pair of classes:
[Output omitted: a 60 × 45 table of FDR values with rows feat_0 through feat_59 and columns the
45 pairwise class comparisons 1-0, 2-0, 2-1, …, 9-8.]
To evaluate the "usefulness" of each feature for separating the classes, we will use the sum of each
feature's FDRs over all class pairs as an indicator.
[ ]: ## Provide code for analysis ##
feature_ranking = (
FDR_df.iloc[:, 1:].sum(axis=0).sort_values(ascending=False)
) # get ranking of features
n_top = 10 # number of top ranked features to consider
print(f"Table 2. Top {n_top} best ranked features.")
print(feature_ranking.iloc[:n_top])
3.1 Type analysis for part 2 here
Since the Fisher's Discriminant Ratio is the squared difference of the class means divided by the
sum of the class variances, a larger FDR indicates that the class means are further apart and/or
the class variances are smaller; so the larger the FDR value, the better the separability between
classes. Considering this, the features that provide the best separability between the classes are
features 41, 40, 20, 2 and 42. Table 2 shows the top 10 features.
4 Cross-Validation [2], [7] This is a Collaborative Problem
Not covered in lecture notes
20 Points Total
In this problem you are to develop and implement a k-fold cross validation algorithm. You are
allowed to use either the Iris data set or the developed numerical features from HW2 to test your
implementation. In this problem the following is to be completed:
1. [5 points] Develop (pseudocode) an algorithm to randomly shuffle the input data, then divide
the data into groups of testing and training sets based on the number of desired
folds/experiments; the term used will be k-fold cross validation. Use the 5-fold cross
validation in Figure 1 as a reference.
2. [5 points] Implement your k-fold cross validation algorithm.
3. [5 points] Test your implementation using the numerical features generated in question 2.
4. [5 points] Perform analysis to determine if your implementation is correct. Explain your
method of analysis and conclusions.
[ ]: %matplotlib inline
from IPython.display import Image
Image('cross_val.png')
4.1 Type your pseudocode for part 1 here
You may use this cell in markdown or python code based on your preference.
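A compact sketch of the procedure (shuffle first, then split into k folds); the fold indexing below
matches the implementation that follows:
[ ]: # Pseudocode: k-fold cross validation
# 1. shuffled <- randomly permute the rows of the input data
# 2. fold_size <- ceil(number_of_rows / k)
# 3. for i in 0 .. k-1:
#        test  <- rows [i*fold_size, (i+1)*fold_size) of shuffled
#        train <- all remaining rows of shuffled
#        record the pair (test, train)
# 4. return the k recorded (test, train) pairs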
4.1.1 shuffle_data
shuffle_data: function that takes in a dataframe, shuffles its rows randomly and returns the
shuffled dataframe.
• x_df: pandas DataFrame - pandas dataframe to shuffle
returns: DataFrame with rows shuffled randomly
[ ]: ## Type response to part 2 here ##
import random

def shuffle_data(x_df: pd.DataFrame):
    """Function that takes in a dataframe, shuffles its rows randomly
    and returns the shuffled dataframe
    Input Arguments:
    x_df - pandas dataframe to shuffle
    Returns: the dataframe with rows shuffled randomly
    """
    indices = list(x_df.index)  # indices as a list so elements can be popped
    n_rows = len(indices)  # number of indices
    shuffled_indices = []  # list to store the shuffled indices
    for i in range(n_rows):
        # pop a random remaining index and append it to the shuffled order
        shuffled_indices.append(indices.pop(random.randrange(0, len(indices))))
    shuffled_df = x_df.loc[shuffled_indices]
    return shuffled_df
[ ]: def k_fold(x_df: pd.DataFrame, k: int) -> list[tuple[tuple, tuple]]:
    """Function that takes in a dataframe and an integer k and returns
    the necessary indices to perform a cross validation k times.
    Input Arguments:
    x_df - pandas dataframe with the train and test data
    k - int with the number of cross validations that will be performed
    Returns: list with k tuples, each containing the indices of the test
    dataset followed by the indices of the training dataset
    """
    n_rows = x_df.shape[0]  # number of rows in dataframe
    indices = x_df.index
    # make sure k is not too big
    if k > n_rows:
        raise ValueError("k cannot be greater than the number of rows of the dataframe")
    n_fold = int(np.ceil(n_rows / k))  # number of rows per fold
    k_fold_list = []
    for i in range(k):
        test_idx = tuple(indices[i * n_fold:(i + 1) * n_fold])
        train_idx = tuple([*indices[0:i * n_fold], *indices[(i + 1) * n_fold:]])
        k_fold_list.append((test_idx, train_idx))
    return k_fold_list
[ ]: ## Type the code for part 3 here ##
import numpy as np
import pandas as pd
iris_df = pd.read_csv("iris.csv")
x = k_fold(iris_df, 5)
print(x)
[ ]: ## Type your code for part 4 here ##
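# One possible correctness check (a sketch, using the folds in `x` from
# part 3): each row index appears in exactly one test fold, and within
# every fold the test and train sets are disjoint and jointly cover the
# whole dataframe.
all_test = [idx for test_idx, _ in x for idx in test_idx]
assert sorted(all_test) == sorted(iris_df.index)  # exact cover, no repeats
for test_idx, train_idx in x:
    assert set(test_idx).isdisjoint(train_idx)
    assert set(test_idx) | set(train_idx) == set(iris_df.index)
print("k-fold partition checks passed")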
4.2 Explain your methods for part 4 here
The checks above verify that the k folds partition the data: each row index appears in exactly one
test fold, and in every fold the test and training indices are disjoint and together cover the entire
dataset.
5 Module 8 - Note this is a Collaborative Problem - Parzen Window
30 Points Total
In this problem the following is to be completed:
1. [10 points] Using your 5-fold cross validation implementation from Problem 4 and the
Gaussian kernel in Eq. 27 (Parzen Window) of the Machine Learning document, implement
an algorithm to process training observations and compare with test observations.
2. [10 points] Using all observations and the petal length from the Iris data, replicate the
subfigures in Figure 2.
3. [10 points] Using all observations and the petal length and the petal width from the Iris
data, replicate the subfigures in Figure 3 without contour lines.
[ ]: Image("gk11d.png")
[ ]: Image("gk251d.png")
[ ]: Image("gk51d.png")
[ ]: Image("gk12d.png")
[ ]: Image("gk252d.png")
[ ]: Image("gk52d.png")
[ ]: ## Type the code for part 1 here ##
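# A minimal sketch of part 1's building block, assuming Eq. 27 is the
# standard isotropic Gaussian kernel, so that the Parzen window estimate
# is p(x) = (1/n) * sum_i N(x; x_i, h^2 I). The function name, signature
# and the iris column name below are illustrative assumptions.
import numpy as np

def parzen_gaussian(x, train, h):
    """Estimate the density at point x from the samples in `train`
    (n samples by d features) with bandwidth h."""
    train = np.asarray(train, dtype=float)
    if train.ndim == 1:
        train = train[:, None]  # treat 1-D samples as n x 1
    n, d = train.shape
    diff = train - np.asarray(x, dtype=float)  # (n, d) differences
    sq_dist = np.sum(diff**2, axis=1)  # squared Euclidean distances
    norm = (2.0 * np.pi * h**2) ** (d / 2.0)  # Gaussian normalizing constant
    return np.exp(-sq_dist / (2.0 * h**2)).mean() / norm

# Example (hypothetical column name): density of petal length at 4.0, h = 0.25
# p_hat = parzen_gaussian(4.0, iris_df["petal_length"].to_numpy(), 0.25)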
[ ]: ## Write the code for part 2 here ##
[ ]: ## Write the code for part 3 here ##
6 References
[1] Bishop, Christopher M., Neural Networks for Pattern Recognition, Oxford University Press,
1995
[2] Bishop, Christopher M., Pattern Recognition and Machine Learning, Springer, 2006,
https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
[3] Duin, Robert P.W., Tax, David and Pekalska, Elzbieta, PRTools, http://prtools.tudelft.nl/
[4] Franc, Vojtech and Hlavac, Vaclav, Statistical Pattern Recognition Toolbox,
https://cmp.felk.cvut.cz/cmp/software/stprtool/index.html
[5] Fukunaga, Keinosuke, Introduction to Statistical Pattern Recognition, Academic Press, 1972
[6] Goodfellow, Ian, Bengio, Yoshua, and Courville, Aaron, Deep Learning, MIT Press, 2016,
https://www.deeplearningbook.org/contents/ml.html
[7] Russell, S., and Norvig, P., Artificial Intelligence: A Modern Approach, 4th Edition, Pearson,
2020
[8] Fisher, R.A., The Use of Multiple Measurements in Taxonomic Problems, Annals of Human
Genetics, Vol. 7, Issue 2, pp. 179-188, 1936
[9] Hotelling, H., Analysis of a Complex of Statistical Variables into Principal Components,
Journal of Educational Psychology, Number 24, pp. 417-441, 1933
[10] Rao, K. R. and Yip, P., Discrete Cosine Transform: Algorithms, Advantages, Applications,
San Diego, CA: Academic Press, Inc., 1990