Solution-1
Correct order: E, D, A, C, B
- Visualizing the histograms of each feature of the data.
- Feature transformations (taking the log or not).
- Feature normalization (z-scoring).
- Clustering the data using k-means.
- Projection of the data using PCA.

Solution-2
Complexity increases by:
a) Increasing the dimension.
b) Using a full covariance matrix / non-equal diagonal.
c) Increasing K (the number of clusters).
d) Decreasing sigma.
e) Increasing the number of Gaussians.

Solution-3
a) PCA.
b) Single-linkage agglomerative clustering.
c) MDS.
d) Histogram.
e) Parzen window.
f) Self-Organizing Map [SOM].
g) Spectral clustering.
h) Itemset mining.
i) Hierarchical clustering.
j) Mixture of Gaussians [MoG].

Similarity & Difference
(a) PCA vs. SOM
Similarity: Both can be thought of as projection and dimensionality-reduction algorithms. PCA defines new principal components that are orthogonal to each other and are linear combinations of all the existing dimensions. SOM projects the entire data onto a lower-dimensional space (usually 2D) in which each observation is a point, with similar observations lying close to each other.
Difference: SOM is a visualization tool: the entire data is projected onto a 2D grid so that similar observations end up close together, which effectively clusters the data. PCA does not produce any direct clustering. Also, PCA works only on numeric data, while SOM can work on numeric as well as text data.

(b) PCA vs. MDS
Similarity: Both are projection methods. PCA projects the data onto orthogonal principal components, while MDS projects the data onto an X-Y plane such that proximity is preserved. Both algorithms also reduce dimensionality.
Difference: PCA works on multi-dimensional numeric data, whereas MDS can be used when only pairwise distances are given or can be computed.

(c) SOM vs. MDS
Similarity: Again, both are projection methods. SOM projects the data onto a grid, while MDS projects the data on the basis of the distances between observations. Both therefore result in dimensionality reduction.
Difference: The main purpose of SOM is clustering, i.e. grouping similar observations together. The main purpose of MDS is to create a layout from the distance data while preserving the proximity between points. SOM works on the original vector data, while MDS works on pairwise distance data.

(d) Spectral clustering vs. Partitional clustering
Similarity: Both rest on the same underlying principle of dividing the data into clusters, and the number of clusters must be decided before clustering starts, i.e. it is an input to the algorithm.
Difference: Spectral clustering is applied to graph data, or where pairwise distance data is given, whereas partitional clustering is done on vector data.

(e) K-means clustering vs. Mixture of Gaussians (MoG)
Similarity: Both algorithms divide the data into groups, called clusters in k-means and mixtures in MoG. In a sense, k-means is a more specialized version of a Mixture of Gaussians.
Difference: K-means looks only at the mean vectors, while a Mixture of Gaussians looks at both the mean vectors and the covariance matrices. As a result, MoG allows elliptical Gaussians with different radii, whereas k-means effectively assumes circular Gaussians of fixed radius (see the sketch below).
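To make the contrast in (e) concrete, here is a minimal sketch using NumPy and scikit-learn (the synthetic blobs and variable names are illustrative, not part of the original problem): k-means exposes only the fitted cluster centers, while the Gaussian mixture also exposes a covariance matrix per component.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Illustrative synthetic data: two elongated (elliptical) blobs.
rng = np.random.default_rng(0)
blob1 = rng.normal([0, 0], [3.0, 0.5], size=(200, 2))   # stretched along x
blob2 = rng.normal([6, 4], [0.5, 3.0], size=(200, 2))   # stretched along y
X = np.vstack([blob1, blob2])

# K-means: estimates only the K mean vectors (circular, equal-radius clusters).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("k-means centers:\n", kmeans.cluster_centers_)

# Mixture of Gaussians: estimates means AND covariances (elliptical clusters).
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
print("MoG means:\n", gmm.means_)
print("MoG covariances:\n", gmm.covariances_)
```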
(f) Parzen window density estimation vs. Mixture of Gaussians (MoG) density estimation
Similarity: Both are density-estimation algorithms, i.e. they compute the probability of a point given the existing data. In a way, the Parzen window is a special form of a mixture of Gaussians in which the number of mixture components equals the number of observations.
Difference: The Parzen window is non-parametric and therefore requires the entire data set to compute the density at any point. MoG is a parametric form of density estimation: we estimate the parameters of the assumed probability density function and then compute the density of a point using that function. The Parzen window is not suited to real-time analysis, while MoG can be used for real-time density estimation and hence for real-time outlier/fraud detection.

(g) Frequent Itemset Mining vs. Logical Itemset Mining
Similarity: They share the same purpose: to understand the latent intent of the customer and to find patterns in the purchase of items, which can then be used for product promotion, inventory planning, or store-layout optimization.
Difference: Frequent itemset mining finds large and frequent itemsets from data on products purchased together (retail data), but it fails to capture the complete latent intent of the customer, because one purchase (or a group of purchases) can represent either a subset of the complete latent intent or a mixture of two latent intents. Logical itemset mining overcomes these deficiencies by looking at pairwise co-occurrence across customers to build a logical itemset that represents the complete latent intent, and it is able to capture rare events as well.

(h) Support of an itemset vs. Coherence of an itemset
Similarity: Both are properties of a group of products that help us understand whether the products in the group are bought together or not, and both should have a high value in their respective algorithms.
Difference: They are used in different contexts. Support is used in itemset mining and represents the number of times the group of products (the itemset) was bought together. Coherence is used in co-occurrence analysis, in which we look for the soft maximal clique, i.e. a clique that has higher coherence than both its up-neighbours and its down-neighbours.

(i) Frequent itemset mining vs. N-gram pattern mining
Similarity: In both algorithms we are trying to find things that occur together: in itemset mining, items that are bought together; in n-gram pattern mining, words that are used together.
Difference: In itemset mining the order does not matter, whereas in n-gram pattern mining it does. Also, n-gram pattern mining needs more than one consistency matrix: for example, to find an n-gram of length 4 you need to check consistency one step ahead, two steps ahead, and three steps ahead, and all of these should be high.

(j) K-means clustering vs. Spherical k-means clustering
Similarity: Both are partitional clustering algorithms that cluster using the EM algorithm, which involves two iterative steps: assigning each data point to a cluster center and then recomputing the cluster centers. Both also require the number of clusters as an input, i.e. K is a hyperparameter that must be chosen before starting the algorithm.
Difference: K-means is applied to vector data, whereas spherical k-means is used for clustering text data. As a result, two additional steps are required in spherical k-means. First, the text data is normalized so that document length does not matter and only the relative proportion of words does; TF-IDF is used for this purpose. Second, the same normalization is applied again when the new cluster centers are computed at the end of each iteration, to make the length of the new mean vector the same as that of the data points (see the sketch below).
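To make the two extra normalization steps in (j) concrete, here is a minimal NumPy sketch of spherical k-means, assuming the documents have already been converted to TF-IDF row vectors; the function names, the toy matrix, and the simple random initialization are illustrative choices, not part of the original solution.

```python
import numpy as np

def l2_normalize(rows, eps=1e-12):
    """Scale each row to unit length so only relative word proportions matter."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    return rows / np.maximum(norms, eps)

def spherical_kmeans(tfidf, k, n_iter=20, seed=0):
    """Cluster unit-normalized TF-IDF vectors using cosine similarity."""
    rng = np.random.default_rng(seed)
    X = l2_normalize(tfidf)                      # step 1: normalize the data
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: cosine similarity reduces to a dot product.
        labels = np.argmax(X @ centers.T, axis=1)
        # Update step: mean of each cluster, then re-normalize the centers.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
        centers = l2_normalize(centers)          # step 2: normalize the centers
    return labels, centers

# Tiny illustrative example with made-up TF-IDF rows.
docs = np.array([[0.9, 0.1, 0.0],
                 [0.8, 0.2, 0.1],
                 [0.0, 0.1, 0.9],
                 [0.1, 0.0, 0.8]])
labels, centers = spherical_kmeans(docs, k=2)
print(labels)
```

Normalizing both the rows and the centers makes the dot product in the assignment step equivalent to cosine similarity, which is why document length stops mattering.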
Solution-5
a) Parameters: the PCA projections [K x D]. Hyperparameter: K = no. of projections.
b) Parameters: the K mean vectors [K x D]. Hyperparameter: K = no. of clusters.
c) Parameters: the projected points [K x N]. Hyperparameter: K = no. of MDS dimensions.
d) Parameters: the SOM grid clusters [K x K x D]. Hyperparameter: K = size of the grid.
e) Parameters: the mean and the covariance of each Gaussian [K x D + K x D]. Hyperparameter: K = no. of Gaussians (using the fact that the covariance matrix is diagonal).
f) Parameters: the means of the top-level and next-level clusters [K1 x D + K1 x K2 x D]. Hyperparameters: K1, K2 = no. of clusters at the two levels.

Solution-6
a) $M_1 \times M_2 \times \cdots \times M_D - 1 = \prod_{d=1}^{D} (d+1)^2 - 1 = \big((D+1)!\big)^2 - 1$
b) $(M_1 - 1) + (M_2 - 1) + \cdots + (M_D - 1) = \sum_{d=1}^{D} \big((d+1)^2 - 1\big) = \sum_{d=1}^{D} (d^2 + 2d + 1) - D = \frac{D(D+1)(2D+1)}{6} + D(D+1)$
c) $(M_1 - 1) + (M_2 - 1)M_1 + (M_3 - 1)M_2 + \cdots + (M_D - 1)M_{D-1}$
$= (M_1 - 1) + (M_1 M_2 - M_1) + (M_2 M_3 - M_2) + \cdots + (M_{D-1} M_D - M_{D-1})$
$= \sum_{d=2}^{D} M_{d-1} M_d - \sum_{d=2}^{D-1} M_d - 1 = \sum_{d=2}^{D} d^2 (d+1)^2 - \sum_{d=2}^{D-1} (d+1)^2 - 1$

Solution-7
a) PARAMETERS: the library locations $(u_j, v_j)$, $j = 1, \ldots, M$ [2 x M parameters].
b) LATENT PARAMETERS: $\delta_{i,j}$ — whether apartment $i$ is associated with library $j$ [N x M parameters].
c) OPTIMIZATION FUNCTION:
minimize $J(\mathbf{U}, \mathbf{V} \mid \boldsymbol{\Delta}) = \sum_{i=1}^{N} k_i \sum_{j=1}^{M} \delta_{i,j}\, C(i,j) = \sum_{i=1}^{N} \sum_{j=1}^{M} k_i\, \delta_{i,j} \left[ (u_j - x_i)^2 + (v_j - y_i)^2 \right]$
d) OPTIMIZATION SOLUTION:
$\frac{\partial J(\mathbf{U}, \mathbf{V} \mid \boldsymbol{\Delta})}{\partial u_j} = \sum_{i=1}^{N} \delta_{i,j} k_i \frac{\partial C(i,j)}{\partial u_j} = \sum_{i=1}^{N} \delta_{i,j} k_i \frac{\partial \left[ (u_j - x_i)^2 + (v_j - y_i)^2 \right]}{\partial u_j} = 2 \sum_{i=1}^{N} \delta_{i,j} k_i (u_j - x_i) = 0$
$u_j \sum_{i=1}^{N} \delta_{i,j} k_i = \sum_{i=1}^{N} \delta_{i,j} k_i x_i$, so: $\hat{u}_j = \frac{\sum_{i=1}^{N} \delta_{i,j} k_i x_i}{\sum_{i=1}^{N} \delta_{i,j} k_i}$
Similarly, setting $\frac{\partial J(\mathbf{U}, \mathbf{V} \mid \boldsymbol{\Delta})}{\partial v_j} = \sum_{i=1}^{N} \delta_{i,j} k_i \frac{\partial C(i,j)}{\partial v_j} = 2 \sum_{i=1}^{N} \delta_{i,j} k_i (v_j - y_i) = 0$ gives
$v_j \sum_{i=1}^{N} \delta_{i,j} k_i = \sum_{i=1}^{N} \delta_{i,j} k_i y_i$, so: $\hat{v}_j = \frac{\sum_{i=1}^{N} \delta_{i,j} k_i y_i}{\sum_{i=1}^{N} \delta_{i,j} k_i}$
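The closed-form updates at the end of Solution-7 are just weighted means. A minimal NumPy sketch for a single library j (the array names and toy numbers below are illustrative, not from the original problem) might look like:

```python
import numpy as np

# Illustrative inputs for one library j:
# x, y  : coordinates of the N apartments
# k     : number of people in each apartment (the weights k_i)
# delta : 1 if apartment i is assigned to this library, else 0 (delta_{i,j})
x = np.array([1.0, 2.0, 5.0, 6.0])
y = np.array([1.0, 0.0, 4.0, 5.0])
k = np.array([10, 20, 30, 40])
delta = np.array([1, 1, 0, 1])

w = delta * k                         # effective weight of each apartment
u_hat = np.sum(w * x) / np.sum(w)     # weighted mean of the x-coordinates
v_hat = np.sum(w * y) / np.sum(w)     # weighted mean of the y-coordinates
print(u_hat, v_hat)                   # optimal (u_j, v_j) for this assignment
```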
Solution-8
What are the parameters? How many parameters are we estimating?
PARAMETER: $p$ (1 parameter).
What is the observed data and knowledge that is given to us?
DATA: the coin-toss observations $x_1, x_2, \ldots, x_{2N}$.
KNOWLEDGE: the weights $w_n = k \times n$.
Pose this as an optimization problem and simplify it to make it more solvable. [HINT: Make sure you take the weights into account.]
OBJECTIVE: maximize the likelihood of seeing the data:
$J(p) = \prod_{n=1}^{2N} P(x_n)^{w_n} = \prod_{n=1}^{2N} \left[ p^{x_n} (1-p)^{1-x_n} \right]^{w_n}$
Taking the log to maximize the log-likelihood:
$J(p) = \sum_{n=1}^{2N} w_n \log P(x_n) = \sum_{n=1}^{2N} w_n \log \left[ p^{x_n} (1-p)^{1-x_n} \right] = \sum_{n=1}^{2N} w_n \left[ x_n \log p + (1 - x_n) \log(1 - p) \right]$
Estimate the parameter in terms of the data and weights.
$\frac{\partial J(p)}{\partial p} = \sum_{n=1}^{2N} w_n \left[ \frac{x_n}{p} - \frac{1 - x_n}{1 - p} \right] = \frac{1}{p} \sum_{n=1}^{2N} w_n x_n - \frac{1}{1-p} \sum_{n=1}^{2N} w_n (1 - x_n) = 0$
$\frac{1}{p} \sum_{n=1}^{2N} w_n x_n = \frac{1}{1-p} \sum_{n=1}^{2N} w_n (1 - x_n)$
$(1 - p) \sum_{n=1}^{2N} w_n x_n = p \sum_{n=1}^{2N} w_n (1 - x_n)$
$\sum_{n=1}^{2N} w_n x_n = p \left[ \sum_{n=1}^{2N} w_n x_n + \sum_{n=1}^{2N} w_n (1 - x_n) \right] = p \sum_{n=1}^{2N} w_n$
$\hat{p} = \frac{\sum_{n=1}^{2N} w_n x_n}{\sum_{n=1}^{2N} w_n} = \frac{\sum_{n=1}^{2N} k \cdot n \cdot x_n}{\sum_{n=1}^{2N} k \cdot n} = \frac{\sum_{n=1}^{2N} n\, x_n}{\sum_{n=1}^{2N} n}$
If every even-numbered coin toss is a heads and every odd-numbered coin toss is a tails, estimate the value of the parameter $p$ in terms of $N$.
Odd-numbered coin tosses are tails: $x_1 = x_3 = x_5 = \cdots = 0$. Even-numbered coin tosses are heads: $x_2 = x_4 = x_6 = \cdots = 1$.
$\hat{p} = \frac{\sum_{n=1}^{2N} n\, x_n}{\sum_{n=1}^{2N} n} = \frac{2 + 4 + 6 + \cdots + 2N}{1 + 2 + 3 + \cdots + 2N} = \frac{2 \times (1 + 2 + \cdots + N)}{1 + 2 + \cdots + 2N}$
Numerator: $2 \times (1 + 2 + \cdots + N) = 2 \times \frac{N(N+1)}{2} = N(N+1)$
Denominator: $1 + 2 + \cdots + 2N = \frac{2N(2N+1)}{2} = N(2N+1)$
$\hat{p} = \frac{N(N+1)}{N(2N+1)} = \frac{N+1}{2N+1}$

Solution-9
We are given the following facts about this data:
$\text{Confidence}(\{a,b,c\} \to \{d\}) = \frac{1}{3}$, $\text{Confidence}(\{a,b\} \to \{c,d\}) = \frac{1}{6}$, $\text{Confidence}(\{b,d\} \to \{a\}) = \frac{2}{5}$, $\text{Confidence}(\{a\} \to \{c\}) = \frac{7}{10}$
[5 points] Given the above four confidence values, find the relationships among the counts $U, V, W, X,$ and $Y$. [Hint: you can write each confidence in terms of $U, V, W, X, Y$ and simplify.]
$\text{Confidence}(\{a,b,c\} \to \{d\}) = \frac{\text{Support}(\{a,b,c,d\})}{\text{Support}(\{a,b,c\})} = \frac{U}{U+V} = \frac{1}{3} \Rightarrow V = 2U$
$\text{Confidence}(\{a,b\} \to \{c,d\}) = \frac{\text{Support}(\{a,b,c,d\})}{\text{Support}(\{a,b\})} = \frac{U}{U+V+W} = \frac{1}{6} \Rightarrow W = 3U$
$\text{Confidence}(\{b,d\} \to \{a\}) = \frac{\text{Support}(\{a,b,d\})}{\text{Support}(\{b,d\})} = \frac{U+W}{U+W+Y} = \frac{2}{5} \Rightarrow Y = 6U$
$\text{Confidence}(\{a\} \to \{c\}) = \frac{\text{Support}(\{a,c\})}{\text{Support}(\{a\})} = \frac{U+V+X}{U+V+W+X} = \frac{7}{10} \Rightarrow X = 4U$
[2 points] Compute $\text{Confidence}(\{a\} \to \{b,c,d\})$ given (a):
$\text{Confidence}(\{a\} \to \{b,c,d\}) = \frac{\text{Support}(\{a,b,c,d\})}{\text{Support}(\{a\})} = \frac{U}{U+V+W+X} = \frac{U}{U+2U+3U+4U} = \frac{1}{10}$
Compute $\text{Confidence}(\{b,d\} \to \{a\})$ given (a):
$\text{Confidence}(\{b,d\} \to \{a\}) = \frac{\text{Support}(\{a,b,d\})}{\text{Support}(\{b,d\})} = \frac{U+W}{U+W+Y} = \frac{U+3U}{U+3U+6U} = \frac{2}{5}$
Compute $\text{Confidence}(\{c\} \to \{a,d\})$ given (a):
$\text{Confidence}(\{c\} \to \{a,d\}) = \frac{\text{Support}(\{a,c,d\})}{\text{Support}(\{c\})} = \frac{U+X}{U+V+X+Y} = \frac{U+4U}{U+2U+4U+6U} = \frac{5}{13}$
If $\text{Support}(\{a,c\}) = 49$, what is $\text{Support}(\{b,d\})$? Given (a):
$\text{Support}(\{a,c\}) = U+V+X = U+2U+4U = 7U = 49 \Rightarrow U = 7$
$\text{Support}(\{b,d\}) = U+W+Y = U+3U+6U = 10U = 70$
What is the minimum support threshold that will give $\{a,b,c,d\}$ as a candidate?
For $\{a,b,c,d\}$ to be a candidate, all of its subsets of size 3 must be above the threshold, i.e.
$\text{Support}(\{a,b,c\}) = U+V = 3U = 21$
$\text{Support}(\{a,b,d\}) = U+W = 4U = 28$
$\text{Support}(\{a,c,d\}) = U+X = 5U = 35$
$\text{Support}(\{b,c,d\}) = U+Y = 7U = 49$
For all of these supports to clear the threshold, the support threshold can be at most 21 (i.e. 3U).
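As a quick sanity check of the Solution-9 arithmetic, the sketch below (plain Python; the dictionary of supports written in terms of U follows the relationships derived above, and the layout is my own) recomputes the confidences and the two supports:

```python
from fractions import Fraction

U = 7                                     # from Support({a,c}) = 7U = 49
V, W, X, Y = 2 * U, 3 * U, 4 * U, 6 * U   # relationships derived above

support = {
    "abcd": U,
    "abc": U + V,  "abd": U + W,  "acd": U + X,  "bcd": U + Y,
    "ab":  U + V + W,  "ac": U + V + X,  "bd": U + W + Y,
    "a":   U + V + W + X,  "c": U + V + X + Y,
}

# Recompute the confidences used in the solution.
print(Fraction(support["abcd"], support["abc"]))   # Conf(abc -> d)  = 1/3
print(Fraction(support["abcd"], support["ab"]))    # Conf(ab -> cd)  = 1/6
print(Fraction(support["abd"],  support["bd"]))    # Conf(bd -> a)   = 2/5
print(Fraction(support["ac"],   support["a"]))     # Conf(a -> c)    = 7/10
print(Fraction(support["abcd"], support["a"]))     # Conf(a -> bcd)  = 1/10
print(Fraction(support["acd"],  support["c"]))     # Conf(c -> ad)   = 5/13
print(support["ac"], support["bd"])                # 49 and 70
```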
Solution-10
a) What would be the MAP step for the three examples above, and what would be the REDUCE step?
MAP: for each pair in the LHS of the input, we emit both the total count (n) and the unique count (1).
INPUT: {a, b, c} → n1
MAP OUTPUT: {a, b} → n1; {a, c} → n1; {b, c} → n1
INPUT: {b, c, e, f} → n2
MAP OUTPUT: {b, c} → n2; {b, e} → n2; {b, f} → n2; {c, e} → n2; {c, f} → n2; {e, f} → n2
INPUT: {b, c, g} → n3
MAP OUTPUT: {b, c} → n3; {b, g} → n3; {c, g} → n3
REDUCE: for each key, take the sum of the values and divide by the number of values.
Example: the key {b, c} receives n1, n2, n3.
Output: Sum(values) / Count(values)

b) You are given a weighted undirected graph as the following adjacency list:
a → {<b, 1>, <c, 3>, <d, 5>}, b → {<d, 4>}, c → {<d, 3>}
Here the edge (a, b) is the same as (b, a) and has a weight of 1, and (c, d) has a weight of 3. (Note that since the graph is undirected, the data only contains edges where the neighbour index is higher than the node index, i.e. b → {<a, 1>} is not given since it is already captured in a → {<b, 1>, …}.)
We want to write a MAP/REDUCE job to generate the AVERAGE of all the edge weights associated with each node. So the output we want is:
a → (w(a,b) + w(a,c) + w(a,d)) / 3 (a is connected to three nodes)
b → (w(a,b) + w(b,d)) / 2 (b is connected to two nodes)
c → (w(a,c) + w(c,d)) / 2 (c is connected to only two nodes)
d → (w(a,d) + w(b,d) + w(c,d)) / 3 (d is connected to three nodes)
What would be the MAP and REDUCE?
MAP:
INPUT: a → {<b, 1>, <c, 3>, <d, 5>}
MAP OUTPUT: a → 1; a → 3; a → 5; b → 1; c → 3; d → 5
INPUT: b → {<d, 4>}
MAP OUTPUT: b → 4; d → 4
INPUT: c → {<d, 3>}
MAP OUTPUT: c → 3; d → 3
REDUCE: for each key, Sum(values) / Count(values).
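Neither mapper is tied to a particular framework; a minimal in-memory Python simulation of part (b) (the function names and the dict-based "shuffle" are illustrative stand-ins for a real MapReduce runtime) could look like this:

```python
from collections import defaultdict

# Adjacency list from the problem: each edge stored once, on the lower-index node.
graph = {"a": [("b", 1), ("c", 3), ("d", 5)],
         "b": [("d", 4)],
         "c": [("d", 3)]}

def map_node(node, neighbours):
    """MAP: emit the edge weight under BOTH endpoints of every edge."""
    for neighbour, weight in neighbours:
        yield node, weight
        yield neighbour, weight

def reduce_node(node, weights):
    """REDUCE: average of all edge weights seen for this node."""
    return node, sum(weights) / len(weights)

# Simulate the shuffle phase: group all mapped values by key.
grouped = defaultdict(list)
for node, neighbours in graph.items():
    for key, value in map_node(node, neighbours):
        grouped[key].append(value)

for node in sorted(grouped):
    print(reduce_node(node, grouped[node]))
# Expected: a -> 3.0, b -> 2.5, c -> 3.0, d -> 4.0
```

The same reduce (sum divided by count) works unchanged for part (a); only the mapper differs, emitting each pair from the LHS as the key.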