Algorithms_for_DS_HW3
July 3, 2023
Johns Hopkins University
Whiting School of Engineering
Engineering for Professionals
685.621 Algorithms for Data Science
Homework 3
Assigned at the start of Module 6
Due at the end of Module 8
Total Points 100/100
Class, the below is a standard set of instructions for each HW; in this assignment, groups will be
set up for collaboration.
Make sure your group starts one thread for the collaborative problems. You are required to
participate in each collaborative problem and its subproblems separately. Please do not directly
post a complete solution; the goal is for the group to develop a solution after everyone has
participated.
Please ensure you have a write-up with solutions to each problem and its subproblems. You are
also required to submit code, which will be compiled when grading the assignment. In each of the
problems you are allowed to use built-in functions.
1 Module 6 - Note this is not a Collaborative Problem
10 Points Total
In this problem the goal is to build a set of numerical images from a set of arrays. The data set
from the Kaggle web site will be used: https://www.kaggle.com/c/digit-recognizer/data This data
set has train.csv, test.csv and sample_submission.csv files. In this exercise the focus will be on the
train.csv data. The web site has the following data description:
The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero
through nine.
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a
single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher
numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
The training data set, (train.csv), has 785 columns. The first column, called "label", is the digit
that was drawn by the user. The rest of the columns contain the pixel-values of the associated
image.
Each pixel column in the training set has a name like pixel x, where x is an integer between 0 and
783, inclusive. To locate this pixel on the image, suppose that we have decomposed x as x = 28i + j,
where i and j are integers between 0 and 27, inclusive. Then pixel x is located on row i and column
j of a 28 × 28 matrix (indexing by zero).
For example, pixel 31 indicates the pixel that is in the fourth column from the left and the second
row from the top.
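As a quick check of the decomposition, integer division by 28 recovers the row and column (a
small illustration, not part of the assignment):
[ ]: # x = 28*i + j, so divmod(x, 28) yields (row i, column j), zero-indexed
i, j = divmod(31, 28)
print(i, j)  # -> 1 3: second row from the top, fourth column from the left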
This data is set up in a csv file, which will require reshaping the data into 28 × 28 matrices
representing images. There are 42,000 images in the train.csv file. For this problem it is only
necessary to process approximately 100 images, 10 each of the numbers from 0 through 9. The goal
is to learn how to generate features from images using transforms and first order statistics.
1. [5 points] Read in and store the data in a data structure of your choice so that the data is
reshaped into a matrix of size 28 × 28 which represents each digit as an image.
2. [5 points] Display the images for indices 0, 1, 3, 6, 7, 8, 10, 11, 16 and 21. These indices
represent the numerical values from 0 to 9.
[ ]: # import additional libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# import dataset
train_df = pd.read_csv("train.csv")
show_outputs = False # define if outputs will be printed / shown
[ ]: ## Type code here for part 1 ##
# Convert data to numpy array, ignoring first column
train_array = train_df.iloc[:, 1:].to_numpy()
1.0.1 plot_pixels
plot_pixels: function that takes in a list of numpy matrices, each representing the pixels of an
image, creates subplots and shows each image in a different subplot.
• pixel_arrays: list[np.ndarray] - list of matrices with the pixel data
• title: str - string with the upper title of the plot
returns: none, shows the plots
[ ]: def plot_pixels(pixel_arrays: list[np.ndarray], title="Plot of pixels"):
    """Function that takes in a list of numpy matrices, each representing
    the pixels of an image, creates subplots and shows each image in a
    different subplot.
    Keyword arguments
    pixel_arrays: list[np.ndarray] - list of matrices with the pixel data
    title: str - string with the upper title of the plot
    """
    # Determine the size m x n of the subplot grid needed
    t = len(pixel_arrays)
    m = int(np.floor(np.sqrt(t)))  # rows: floor of the square root of t
    n = int(np.ceil(t / m))  # columns: enough so that m * n >= t
    # Create subplots (squeeze=False keeps ax 2-D even when m or n is 1)
    fg, ax = plt.subplots(m, n, figsize=(10, 10), tight_layout=True, squeeze=False)
    for k in range(m * n):
        a = k // n  # index of the row of the subplot
        b = k % n  # index of the column of the subplot
        if k >= t:
            ax[a, b].axis("off")  # hide unused subplots
        else:
            ax[a, b].set_title(f"Pixel Array {k}")
            ax[a, b].imshow(pixel_arrays[k], cmap="gray")  # show image k
    plt.suptitle(title)  # Add superior title
    plt.show()
[ ]: ## Type code here for part 2 ##
image_indices = [0, 1, 3, 6, 7, 8, 10, 11, 16, 21]
image_pixels = []  # list to keep the arrays with the pixels of each number
for i in image_indices:
    image_pixels.append(train_array[i, :].reshape((28, 28)))
if show_outputs:
    plot_pixels(image_pixels, title="Figure 1. Numbers from train.csv file")
2 Module 6 - Note this is a Collaborative Problem
20 Points Total
In this problem each image from the train.csv (42,000 images in total) is to be processed to generate
a set of features using the discrete cosine transform and Eigen decomposition.
1. [5 points] Take the 2 dimensional Discrete Cosine Transform (DCT) of each matrix from
Problem 1, where each matrix represents a number (0-9).
2. [2.5 points] Extract the vertical, horizontal and diagonal coefficients from the transform
(using the indexes indicated by the masks provided).
3. [5 points] For each of the three sets of DCT coefficients perform Eigen decomposition.
4. [2.5 points] Retain the top 20 Eigen vectors of each direction.
5. [5 points] Using your top Eigen vectors, reduce the DCT transformed data. This will create
a new data set that represents each image as a smaller subset of values.
6. Save the new data in a file of your choice, .txt, .csv, etc. The name is up to you (you will
use this in the subsequent question).
[ ]: ## Type code here for part 1 ##
from scipy.fft import dctn

image_dct2 = []
for im in image_pixels:
    # 2-D DCT (type II, orthonormal) of each 28 x 28 image
    image_dct2.append(dctn(im, type=2, norm="ortho"))
if show_outputs:
    plot_pixels(image_dct2, title="Figure 3. DCT2 transformed numbers")
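As an optional sanity check, the orthonormal DCT-II is invertible, so scipy.fft.idctn should
recover the original pixels (up to floating point error):
[ ]: # Optional: verify invertibility of the orthonormal 2-D DCT
from scipy.fft import idctn
recovered = idctn(image_dct2[0], type=2, norm="ortho")
assert np.allclose(recovered, image_pixels[0])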
[ ]: ## Type code here for part 2 ##
mask_files = ["diagMask.csv", "vertMask.csv", "horizMask.csv"]  # list of masks
masks = []  # list with the arrays with the masks
image_masked = []  # list to store masked images
for i, m in enumerate(mask_files):
    image_masked.append([])
    this_mask = np.loadtxt(m, delimiter=",").astype(bool)  # get csv data with mask
    masks.append(this_mask)
    for im in image_dct2:
        # Apply mask and store in a list
        image_masked[i].append(np.multiply(im, this_mask))
    if show_outputs:
        plot_pixels(image_masked[i], title=f"Figure {3+i}. DCT2 images with {m[:-4]}")
[ ]: ## Type code here for part 3 ##
n_obs = 42000  # number of observations that will be considered
print(f"Working with {n_obs} observations\n")
image_pixels = []  # list to keep the matrices with the pixels of each number
image_dct2 = []  # list to keep the dct2 transformed matrices
for i in range(n_obs):
    image_pixels.append(train_array[i, :].reshape((28, 28)))
    # use the built-in dctn function for the 2-D DCT
    image_dct2.append(dctn(image_pixels[i], type=2, norm="ortho"))
# Use the masks to flatten out each dct2 image and form matrices
# with the masked dct2 image for each mask (diag, vert, horiz)
data_masked = []
for mask in masks:
    data_masked.append(np.array([im[mask] for im in image_dct2]))
# Convert each masked matrix to zero mean, then get the
# covariance matrices and then do the eigen decomposition
data_masked_zm = data_masked.copy()  # make a copy to zero-mean it
cov_data = []
eig_values = []
eig_vectors = []
for i in range(len(data_masked_zm)):
    # convert to zero-mean
    data_masked_zm[i] = data_masked_zm[i] - np.mean(data_masked_zm[i], axis=0)
    # get the covariance matrix using columns as features
    cov_data.append(np.cov(data_masked_zm[i], rowvar=False))
    # do the eigen decomposition
    vals, vecs = np.linalg.eig(cov_data[i])
    # Even though the eigenvalues are real (because the covariance matrix is
    # symmetric), numpy may report the imaginary part as 0j, so keep only
    # the real part
    vals, vecs = np.real(vals), np.real(vecs)
    # np.linalg.eig does not guarantee any ordering, so sort the eigenvalues
    # (and matching eigenvectors) in descending order before taking the top ones
    order = np.argsort(vals)[::-1]
    eig_values.append(vals[order])
    eig_vectors.append(vecs[:, order])
    print(f"For the {mask_files[i][:-4]}, there are {eig_values[i].shape[0]} eigenvalues")
Working with 42000 observations
For the diagMask, there are 335 eigenvalues
For the vertMask, there are 224 eigenvalues
For the horizMask, there are 224 eigenvalues
[ ]: ## Type code here for part 4 ##
n_top = 20  # number of top eigenvalues to retain
print(f"The top {n_top} eigenvalues and eigenvectors are retained")
top_eig_val = []  # list with the top eigenvalues in each direction
top_eig_vec = []  # list with the top eigenvectors in each direction
for i in range(len(eig_values)):
    # retain the top eigenvalues and eigenvectors
    top_eig_val.append(eig_values[i][0:n_top])
    top_eig_vec.append(eig_vectors[i][:, 0:n_top])
The top 20 eigenvalues and eigenvectors are retained
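As an optional check (not required by the assignment), the fraction of total variance captured by
the retained eigenvalues gives a feel for whether n_top = 20 is a reasonable cut-off:
[ ]: # Optional: fraction of total variance captured by the retained
# eigenvalues in each direction (a common sanity check on n_top)
for i in range(len(mask_files)):
    retained = top_eig_val[i].sum() / eig_values[i].sum()
    print(f"{mask_files[i][:-4]}: {retained:.1%} of total variance retained")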
[ ]: ## Type code here for part 5 ##
new_features = []  # list with the matrices of new features
first_loop = True  # flag for the first pass through the loop
for i in range(len(data_masked_zm)):
    # Do the transformation by multiplying by the eigenvector matrix
    new_feature = np.matmul(data_masked[i], top_eig_vec[i])
    if first_loop:
        new_features = new_feature
        first_loop = False
    else:
        new_features = np.append(new_features, new_feature, axis=1)
print(f"The new features shape is {new_features.shape}")
The new features shape is (42000, 60)
[ ]: ## Type code here for part 6 ##
# Create column labels, create dataframe and get number labels
column_labels = [f"feat_{i}" for i in range(new_features.shape[1])]
features_df = pd.DataFrame(new_features, columns=column_labels)
features_df["label"] = train_df["label"].iloc[:n_obs]
features_df = features_df[["label", *features_df.columns[:-1]]]
# Export as csv
if show_outputs:
    features_df.to_csv("transformed_features.csv", index=False)
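A quick round-trip check (a sketch, only meaningful when the csv was actually written) confirms
the exported file has the expected shape:
[ ]: # Optional: re-read the exported file and confirm it has n_obs rows
# and one label column plus the 60 feature columns
if show_outputs:
    check_df = pd.read_csv("transformed_features.csv")
    assert check_df.shape == (n_obs, new_features.shape[1] + 1)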
3 Module 4 - Note this is not a Collaborative Problem
20 Points Total
In this problem use the developed numerical features from Question 2 (if you are not able to
generate the features, they are provided in the module for HW 3). In this problem the following is
to be completed:
Use the Fisher's Linear Discriminant Ratio (FDR) from the Data Processing document, specifically
Equation 20.
1. [10 points] For each feature and combination of numbers apply the FDR, e.g., 0 vs 1, 0 vs 2,
…, 0 vs 9, …, 7 vs 8, 7 vs 9, and 8 vs 9 (which should result in a 60 x 45 matrix, where 60
represents the number of features and 45 the number of pairwise comparisons).
2. [10 points] Place the results in a table and provide an initial analysis of which feature provides
the best class separation.
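For reference, a minimal sketch of the two-class FDR used below, assuming Equation 20 takes the
standard form FDR = (μ1 − μ2)² / (σ1² + σ2²):
[ ]: # Reference sketch of the two-class Fisher's Discriminant Ratio
import numpy as np

def fdr_two_class(x1: np.ndarray, x2: np.ndarray) -> float:
    """FDR of one feature for two classes given as 1-D sample arrays."""
    # ddof=1 matches the sample standard deviation used by pandas .std()
    return (x1.mean() - x2.mean()) ** 2 / (x1.var(ddof=1) + x2.var(ddof=1))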
[ ]: # import libraries
import pandas as pd
# read in the train features data
file_name = "trainFeatures42k.xls"
features_df = pd.read_excel(file_name, header=None)
# new_features = features_df.iloc[:,1:].to_numpy()
# create column labels, considering that the first column is the class label
column_labels = [f"feat_{i}" for i in range(features_df.shape[1] - 1)]
column_labels = ["label", *column_labels]
features_df.columns = column_labels
[ ]: ## Type the code for part 1 here ##
# Get the unique values of the labels (classes)
classes = sorted(features_df["label"].unique())
FDRs = []  # list to keep the Fisher's Discriminant Ratio values
FDR_classes = []  # list for the names of the rows
k = 0  # counter to use in loop
for i in range(len(classes)):
    for j in range(i):
        FDR_classes.append(f"{classes[i]}-{classes[j]}")
        FDRs.append([])
        for feat in features_df.columns:
            if feat != "label":
                # Calculate means and standard deviations of the 2 classes
                mean1 = features_df[features_df["label"] == classes[j]][feat].mean()
                mean2 = features_df[features_df["label"] == classes[i]][feat].mean()
                std1 = features_df[features_df["label"] == classes[j]][feat].std()
                std2 = features_df[features_df["label"] == classes[i]][feat].std()
                # Calculate the FDR and append to list
                this_FDR = (mean1 - mean2) ** 2 / (std1**2 + std2**2)
                FDRs[k].append(this_FDR)
        k = k + 1
# Convert the list of FDR lists to a DataFrame
FDR_df = pd.DataFrame(FDRs, columns=features_df.columns[1:])
FDR_df["Classes"] = FDR_classes
FDR_df = FDR_df[["Classes", *FDR_df.columns[:-1]]]
[ ]: ## Type the code for part 2 here ##
print("Table 1. FDR of each feature for each pair of classes:")
print(FDR_df.T.to_string())
Table 1. FDR of each feature for each pair of classes:
[Output omitted: a 60 × 45 table of FDR values with rows feat_0 through feat_59 and columns the
45 pairwise class comparisons 1-0, 2-0, 2-1, …, 9-8.]
To evaluate the "usefulness" of each feature for separating the classes, we will use the sum of each
feature's FDRs over all class pairs as an indicator.
[ ]: ## Provide code for analysis ##
feature_ranking = (
FDR_df.iloc[:, 1:].sum(axis=0).sort_values(ascending=False)
) # get ranking of features
n_top = 10 # number of top ranked features to consider
print(f"Table 2. Top {n_top} best ranked features.")
print(feature_ranking.iloc[:n_top])
3.1 Type analysis for part 2 here
Since the Fisher's Discriminant Ratio is the squared difference of the class means divided by the
sum of the class variances, a larger FDR indicates that the class means are further apart and/or
the class variances are smaller; so the larger the FDR value, the better the separability between
classes. Considering this, the features that provide the best separability between the classes are
features 41, 40, 20, 2 and 42. Table 2 shows the top 10 features.
4 Cross-Validation [2], [7] This is a Collaborative Problem
Not covered in lecture notes
20 Points Total
In this problem you are to develop and implement a k-fold cross validation algorithm. You are
allowed to use either the Iris data set or the developed numerical features from HW2 to test your
implementation. In this problem the following is to be completed:
1. [5 points] Develop (pseudocode) an algorithm to randomly shuffle the input data, then divide
the data into groups of testing and training sets based on the number of desired
folds/experiments; the term used will be k-fold cross validation. Use the 5-fold cross
validation in Figure 1 as a reference.
2. [5 points] Implement your k-fold cross validation algorithm.
3. [5 points] Test your implementation using the numerical features generated in question 2.
4. [5 points] Perform analysis to determine if your implementation is correct. Explain your
method of analysis and conclusions.
[ ]: %matplotlib inline
from IPython.display import Image
Image('cross_val.png')
4.1 Type your pseudocode for part 1 here
You may use this cell in markdown or python code based on your preference.
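A compact sketch of the procedure (shuffle first, then split into k folds); the fold indexing below
matches the implementation that follows:
[ ]: # Pseudocode: k-fold cross validation
# 1. shuffled <- randomly permute the rows of the input data
# 2. fold_size <- ceil(number_of_rows / k)
# 3. for i in 0 .. k-1:
#        test  <- rows [i*fold_size, (i+1)*fold_size) of shuffled
#        train <- all remaining rows of shuffled
#        record the pair (test, train)
# 4. return the k recorded (test, train) pairs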
4.1.1 shuffle_data
shuffle_data: function that takes in a dataframe, shuffles its rows randomly and returns the
shuffled dataframe.
• x_df: pandas DataFrame - pandas dataframe to shuffle
returns: DataFrame with rows shuffled randomly
[ ]: ## Type response to part 2 here ##
import random

def shuffle_data(x_df: pd.DataFrame):
    """Function that takes in a dataframe, shuffles its rows randomly
    and returns the shuffled dataframe
    Input Arguments:
    x_df - pandas dataframe to shuffle
    Returns: the dataframe with rows shuffled randomly
    """
    indices = list(x_df.index)  # indices as a list so elements can be popped
    n_rows = len(indices)  # number of indices
    shuffled_indices = []  # list to store the shuffled indices
    for i in range(n_rows):
        # pop a random remaining index and append it to the shuffled order
        shuffled_indices.append(indices.pop(random.randrange(0, len(indices))))
    shuffled_df = x_df.loc[shuffled_indices]
    return shuffled_df
[ ]: def k_fold(x_df: pd.DataFrame, k: int) -> list[tuple[tuple, tuple]]:
    """Function that takes in a dataframe and an integer k and returns
    the necessary indices to perform a cross validation k times.
    Input Arguments:
    x_df - pandas dataframe with the train and test data
    k - int with the number of cross validations that will be performed
    Returns: list with k tuples, each containing the indices of the test
    dataset followed by the indices of the training dataset
    """
    n_rows = x_df.shape[0]  # number of rows in dataframe
    indices = x_df.index
    # make sure k is not too big
    if k > n_rows:
        raise ValueError("k cannot be greater than the number of rows of the dataframe")
    n_fold = int(np.ceil(n_rows / k))  # number of rows per fold
    k_fold_list = []
    for i in range(k):
        test_idx = tuple(indices[i * n_fold:(i + 1) * n_fold])
        train_idx = tuple([*indices[0:i * n_fold], *indices[(i + 1) * n_fold:]])
        k_fold_list.append((test_idx, train_idx))
    return k_fold_list
[ ]: ## Type the code for part 3 here ##
import numpy as np
import pandas as pd
iris_df = pd.read_csv("iris.csv")
x = k_fold(iris_df, 5)
print(x)
[ ]: ## Type your code for part 4 here ##
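# One possible correctness check (a sketch, using the folds in `x` from
# part 3): each row index appears in exactly one test fold, and within
# every fold the test and train sets are disjoint and jointly cover the
# whole dataframe.
all_test = [idx for test_idx, _ in x for idx in test_idx]
assert sorted(all_test) == sorted(iris_df.index)  # exact cover, no repeats
for test_idx, train_idx in x:
    assert set(test_idx).isdisjoint(train_idx)
    assert set(test_idx) | set(train_idx) == set(iris_df.index)
print("k-fold partition checks passed")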
4.2 Explain your methods for part 4 here
The checks above verify that the k folds partition the data: each row index appears in exactly one
test fold, and in every fold the test and training indices are disjoint and together cover the entire
dataset.
5 Module 8 - Note this is a Collaborative Problem - Parzen Window
30 Points Total
In this problem the following is to be completed:
1. [10 points] Using your 5-fold cross validation implementation from Problem 4 and the
Gaussian kernel in Eq. 27 (Parzen Window) of the Machine Learning document, implement
an algorithm to process training observations and compare with test observations.
2. [10 points] Using all observations and the petal length from the Iris data, replicate the
subfigures in Figure 2.
3. [10 points] Using all observations and the petal length and the petal width from the Iris
data, replicate the subfigures in Figure 3 without contour lines.
[ ]: Image("gk11d.png")
[ ]: Image("gk251d.png")
[ ]: Image("gk51d.png")
[ ]: Image("gk12d.png")
[ ]: Image("gk252d.png")
[ ]: Image("gk52d.png")
[ ]: ## Type the code for part 1 here ##
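# A minimal sketch of part 1's building block, assuming Eq. 27 is the
# standard isotropic Gaussian kernel, so that the Parzen window estimate
# is p(x) = (1/n) * sum_i N(x; x_i, h^2 I). The function name, signature
# and the iris column name below are illustrative assumptions.
import numpy as np

def parzen_gaussian(x, train, h):
    """Estimate the density at point x from the samples in `train`
    (n samples by d features) with bandwidth h."""
    train = np.asarray(train, dtype=float)
    if train.ndim == 1:
        train = train[:, None]  # treat 1-D samples as n x 1
    n, d = train.shape
    diff = train - np.asarray(x, dtype=float)  # (n, d) differences
    sq_dist = np.sum(diff**2, axis=1)  # squared Euclidean distances
    norm = (2.0 * np.pi * h**2) ** (d / 2.0)  # Gaussian normalizing constant
    return np.exp(-sq_dist / (2.0 * h**2)).mean() / norm

# Example (hypothetical column name): density of petal length at 4.0, h = 0.25
# p_hat = parzen_gaussian(4.0, iris_df["petal_length"].to_numpy(), 0.25)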
[ ]: ## Write the code for part 2 here ##
[ ]: ## Write the code for part 3 here ##
6 References
[1] Bishop, Christopher M., Neural Networks for Pattern Recognition, Oxford University Press,
1995
[2] Bishop, Christopher M., Pattern Recognition and Machine Learning, Springer, 2006,
https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
[3] Duin, Robert P.W., Tax, David and Pekalska, Elzbieta, PRTools, http://prtools.tudelft.nl/
[4] Franc, Vojtech and Hlavac, Vaclav, Statistical Pattern Recognition Toolbox,
https://cmp.felk.cvut.cz/cmp/software/stprtool/index.html
[5] Fukunaga, Keinosuke, Introduction to Statistical Pattern Recognition, Academic Press, 1972
[6] Goodfellow, Ian, Bengio, Yoshua, and Courville, Aaron, Deep Learning, MIT Press, 2016,
https://www.deeplearningbook.org/contents/ml.html
[7] Russell, S., and Norvig, P., Artificial Intelligence: A Modern Approach, 4th Edition, Pearson,
2020
[8] Fisher, R.A., The Use of Multiple Measurements in Taxonomic Problems, Annals of Human
Genetics, Vol. 7, Issue 2, pp. 179-188, 1936
[9] Hotelling, H., Analysis of a Complex of Statistical Variables into Principal Components,
Journal of Educational Psychology, Number 24, pp. 417-441, 1933
[10] Rao, K. R. and Yip, P., Discrete Cosine Transform: Algorithms, Advantages, Applications,
San Diego, CA: Academic Press, Inc., 1990