Chapter 3 cont’d. Adjacency, Histograms, & Thresholding

RAGs (Region Adjacency Graphs)
Define graph G=(V,E), where V is a set of vertices (nodes) and E is a set of edges, represented by either unordered or ordered pairs of vertices.

RAGs (Region Adjacency Graphs)
Steps:
1. label image
2. scan and enter adjacencies in graph
(RAGs also represent containment.)
[Figure: example RAG with nodes 0 (background), 1, 2, 3, -1, -2, -3.]
(Personally, I’d draw it this way!)
Define degree of a node. What is special about nodes with degree 1?

But how do we obtain binary images (from gray or color images)?

Histograms & Thresholding

Gray to binary
Thresholding G → B

    const int t = 200;
    if (G[r][c] > t) B[r][c] = 1;
    else B[r][c] = 0;

How do we choose t?
1. interactively
2. automatically

Gray to binary
1. Interactively. How?
2. Automatically. Many, many, many, …, many methods.
   a) Experimentally (using a priori information).
   b) Supervised / training methods.
   c) Unsupervised.
      Otsu’s method (among many, many, many, many, … other methods).

Histogram
“Probability” of a given gray value in an image.
h(g) = count of pixels w/ gray value equal to g. What data type is used for counts?
p(g) = h(g) / (w*h)
w*h = # of pixels in entire image (here w and h are the image width and height, not the histogram h(g))
What is the range of possible values for p(g)? So what data type is p(g)? What happens when h(g) is divided by w*h?

Histogram
Note: Sometimes we need to group gray values together in our histogram into “bins” or “buckets.”
E.g., we have 10 bins in our histogram and 100 possible different gray values. So we put 0..9 into bin 0, 10..19 into bin 1, …

Histogram
[Figure: histogram plot.]
Something is missing here!

Example of histogram
[Figure: example histogram.]
We can even analyze the histogram just as we analyze images.
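
The counts h(g) and probabilities p(g) defined on the histogram slides can be sketched in C as follows. This is only a minimal sketch under stated assumptions: the function names build_histogram and normalize_histogram are hypothetical, the image is assumed to be a flat array of n = w*h gray values of type unsigned short, and bins are assumed to be of (roughly) equal width.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many pixels fall into each bin.
   gray:       n pixel values, each expected in [0, num_levels)
   num_levels: number of possible gray values (e.g., 100)
   num_bins:   number of histogram bins (e.g., 10)
   h:          output array of num_bins counts */
void build_histogram(const unsigned short *gray, size_t n,
                     unsigned num_levels, unsigned num_bins,
                     unsigned long *h)
{
    assert(num_bins > 0 && num_levels >= num_bins);
    for (unsigned b = 0; b < num_bins; b++)
        h[b] = 0;
    for (size_t i = 0; i < n; i++) {
        /* With 100 levels and 10 bins: 0..9 -> bin 0, 10..19 -> bin 1, ... */
        unsigned long long scaled = (unsigned long long)gray[i] * num_bins;
        unsigned b = (unsigned)(scaled / num_levels);
        if (b >= num_bins)
            b = num_bins - 1;   /* clamp values outside [0, num_levels) */
        h[b]++;
    }
}

/* Convert counts to "probabilities": p[b] = h[b] / n, where n = w*h. */
void normalize_histogram(const unsigned long *h, unsigned num_bins,
                         size_t n, double *p)
{
    assert(n > 0);
    for (unsigned b = 0; b < num_bins; b++) {
        /* Cast before dividing; integer division would truncate to 0
           for every bin that does not hold all n pixels. */
        p[b] = (double)h[b] / (double)n;
    }
}
```

Using an integer type for the counts and a floating-point type for p(g) is one reasonable choice here, not the only one; the cast in normalize_histogram is the step that keeps the division from truncating.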
One common measure is entropy:

Entropy
Ice melting in a warm room is a common example of “entropy increasing”, described in 1862 by Rudolf Clausius as an increase in the disgregation of the molecules of the body of ice.
from http://en.wikipedia.org/wiki/Entropy

Entropy
“My greatest concern was what to call it. I thought of calling it ‘information’, but the word was overly used, so I decided to call it ‘uncertainty’. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, ‘You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.’”
– Conversation between Claude Shannon and John von Neumann regarding what name to give to the “measure of uncertainty” or attenuation in phone-line signals (1949)

Calculating entropy

$$H = -\sum_{k} p(k) \log_2 p(k)$$

Notes:
1. p(k) is in [0,1].
2. If p(k)=0 then don’t calculate log(p(k)). Why?
3. My calculator only has log base 10. How do I calculate log base 2?
4. Why ‘-’ to the left of the summation?

Let’s calculate some histogram entropy values.
Say we have 3 bits per gray value. So our histogram has 8 bins.
Calculate the entropy for the following histograms (image size is 10x10):

    #    counts in bins 0..7           entropy
    1.   99  0  0  0  0  0  0  1       0.08
    2.   99  0  1  0  0  0  0  0       0.08
    3.   20 20 10 10 10 10 10 10       2.92   (most disorder)
    4.   50  0  0 50  0  0  0  0       1.00
    5.   25  0 25  0 25  0 25  0       2.00

Example histograms
Same subject but different images and histograms (because of a difference in contrast).

Example of different thresholds
So how can we determine the threshold value automatically?

Example automatic thresholding methods
1. Otsu’s method
2. K-means clustering

Otsu’s method
Automatic thresholding method: automatically picks the “best” threshold t given an image histogram.
Assumes 2 groups are present in the image:
1. Those that are <= t.
2. Those that are > t.

Otsu’s method
[Figure: best choices for t.]

Otsu’s method
For every possible t:
A. Calculate within group variances:
   1. probability of being in group 1; probability of being in group 2
   2. determine mean of group 1; determine mean of group 2
   3. calculate variance for group 1; calculate variance for group 2
   4. calculate weighted sum of group variances
B. Remember which t gave rise to minimum.

Otsu’s method: probability of being in each group

$$q_1(t) = \sum_{i=0}^{t} p(i) \qquad q_2(t) = \sum_{i=t+1}^{\mathrm{max}} p(i)$$

Otsu’s method: mean of individual groups

$$\mu_1(t) = \sum_{i=0}^{t} i\,p(i) / q_1(t) \qquad \mu_2(t) = \sum_{i=t+1}^{\mathrm{max}} i\,p(i) / q_2(t)$$

Otsu’s method: variance of individual groups

$$\sigma_1^2(t) = \sum_{i=0}^{t} [i - \mu_1(t)]^2\,p(i) / q_1(t) \qquad \sigma_2^2(t) = \sum_{i=t+1}^{\mathrm{max}} [i - \mu_2(t)]^2\,p(i) / q_2(t)$$

Otsu’s method: weighted sum of group variances

$$\sigma_W^2(t) = q_1(t)\,\sigma_1^2(t) + q_2(t)\,\sigma_2^2(t)$$

Calculate for all t’s and minimize.

$$\min \{ \sigma_W^2(t) \mid 0 \le t \le \mathrm{max} \}$$

Demo Otsu.

Demo of Otsu’s method
[Figures: before; Otsu’s report; Otsu’s threshold.]

Generalized thresholding
Single range of gray values

    const int t1 = 200;
    const int t2 = 500;
    if (G[r][c] > t1 && G[r][c] < t2) B[r][c] = 1;
    else B[r][c] = 0;

Even more general thresholding
Union of ranges of gray values.

    const int t1 = 200, t2 = 500;
    const int t3 = 1200, t4 = 1500;
    if (G[r][c] > t1 && G[r][c] < t2) B[r][c] = 1;
    else if (G[r][c] > t3 && G[r][c] < t4) B[r][c] = 1;
    else B[r][c] = 0;

Something is missing here!

K-means clustering

K-Means Clustering
In statistics and machine learning, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.
It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data.
from wikipedia

K-Means Clustering
Clustering = the process of partitioning a set of pattern vectors into subsets called clusters.
K = number of clusters (must be known in advance).
Not an exhaustive search so it may not find the globally optimal solution. (see section 10.1.1)

Iterative K-Means Clustering Algorithm
Form K-means clusters from a set of nD feature vectors.
1. Set ic=1 (iteration count).
2. Choose randomly a set of K means m1(1), m2(1), … mK(1).
3. For each vector xi compute D(xi,mj(ic)) for each j=1,…,K.
4. Assign xi to the cluster Cj with the nearest mean.
5. ic = ic+1; update the means to get a new set m1(ic), m2(ic), … mK(ic).
6. Repeat 3..5 until Cj(ic+1) = Cj(ic) for all j.

K-Means Clustering Example
0. Let K=3.
1. Randomly choose 3 means (i.e., cluster centers). These need not be actual data points.
2. Assign each point to the nearest cluster center (mean).
3. Calculate the centroid (mean) of each cluster. These will become the new centers.
4. Repeat steps 2 and 3 until convergence.
- figures from wikipedia

K-Means for Optimal Thresholding
What are the features? Individual pixel gray values
What value for K should be used? K=2 to be like Otsu’s method.

Iterative K-Means Clustering Algorithm
Form 2 clusters from a set of pixel gray values.
1. Set ic=1 (iteration count).
2. Choose 2 random gray values as our initial K means, m1(1) and m2(1). This can be derived from the original image or from the histogram.
3. For each pixel gray value xi compute fabs(xi - mj(ic)) for each j=1,2.
4. Assign xi to the cluster Cj with the nearest mean.
5. ic = ic+1; update the means to get a new set m1(ic), m2(ic).
6. Repeat 3..5 until Cj(ic+1) = Cj(ic) for all j.

Iterative K-Means Clustering Algorithm
Example.

    ic    m1(ic)    m2(ic)
    1     260.83     539.00
    2      39.37    1045.65
    3      52.29    1098.63
    4      54.71    1106.28
    5      55.04    1107.24
    6      55.10    1107.44
    7      55.10    1107.44
    . . .

Demo K-Means.

Demo of K Means method
[Figures: before; K Means reports for K=2 & K=3; K Means t=56; K Means t=54; K Means t=128.]

Otsu vs. K-Means
Otsu’s method as presented determines the single best threshold.
How many objects can it discriminate?
Suggest a modification to discriminate more.

Otsu vs. K-Means
How is Otsu’s method similar to K-Means?
What does Otsu’s method determine? a single threshold, t
What does K-Means determine (for K=2)? m1 and m2 are cluster means (centers)
Once m1 and m2 are determined, how can they be used to determine the threshold?

Otsu vs. K-Means
A final word, . . .
K-Means readily generalizes to:
1. an arbitrary number of classes (K)
2. many, many features (i.e., feature vectors in higher dimensions instead of only gray values)
K-Means will find a local optimum, but that may not be the global optimum!
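
To make the Otsu vs. K-Means comparison concrete, here is a minimal C sketch of both methods working on the same normalized histogram p (n gray levels, p[i] summing to 1). Several things here are assumptions rather than part of the slides: the function names otsu_threshold and kmeans2_threshold are hypothetical, the K-Means version works on histogram entries instead of individual pixels, the caller supplies the initial means instead of drawing them at random, and the K-Means threshold is taken as the midpoint between the two final means (one natural choice, since that is where the nearest mean changes).

```c
#include <assert.h>
#include <float.h>

/* Otsu as described in the slides: try every t and keep the one that
   minimizes the weighted sum of within-group variances.
   Returns 0 if no t splits the histogram into two non-empty groups. */
int otsu_threshold(const double *p, int n)
{
    assert(n > 0);
    int best_t = 0;
    double best_var = DBL_MAX;
    for (int t = 0; t < n; t++) {
        double q1 = 0, q2 = 0, mu1 = 0, mu2 = 0, v1 = 0, v2 = 0;
        for (int i = 0; i <= t; i++)    { q1 += p[i]; mu1 += i * p[i]; }
        for (int i = t + 1; i < n; i++) { q2 += p[i]; mu2 += i * p[i]; }
        if (q1 <= 0 || q2 <= 0)
            continue;                    /* one group is empty: skip this t */
        mu1 /= q1;
        mu2 /= q2;
        for (int i = 0; i <= t; i++)
            v1 += (i - mu1) * (i - mu1) * p[i] / q1;
        for (int i = t + 1; i < n; i++)
            v2 += (i - mu2) * (i - mu2) * p[i] / q2;
        double vw = q1 * v1 + q2 * v2;   /* weighted sum of group variances */
        if (vw < best_var) { best_var = vw; best_t = t; }
    }
    return best_t;
}

/* K-Means with K=2 on gray values, weighted by the histogram.
   m1, m2: initial means (assumed m1 < m2, chosen by the caller).
   Returns the midpoint of the final means, truncated to an integer. */
int kmeans2_threshold(const double *p, int n,
                      double m1, double m2, int max_iter)
{
    assert(n > 0);
    for (int it = 0; it < max_iter; it++) {
        double s1 = 0, w1 = 0, s2 = 0, w2 = 0;
        for (int i = 0; i < n; i++) {
            /* Comparing squared distances is equivalent to comparing
               fabs(i - m1) and fabs(i - m2). */
            double d1 = (i - m1) * (i - m1);
            double d2 = (i - m2) * (i - m2);
            if (d1 <= d2) { s1 += i * p[i]; w1 += p[i]; }
            else          { s2 += i * p[i]; w2 += p[i]; }
        }
        double new1 = (w1 > 0) ? s1 / w1 : m1;   /* keep old mean if empty */
        double new2 = (w2 > 0) ? s2 / w2 : m2;
        int converged = (new1 == m1 && new2 == m2);
        m1 = new1;
        m2 = new2;
        if (converged)
            break;
    }
    return (int)((m1 + m2) / 2.0);   /* means are non-negative, so this floors */
}
```

Both functions return t in the sense used by the thresholding code earlier (B = 1 where G > t). For a clearly bimodal histogram, both should land in the gap between the two modes; on less well-separated histograms they can disagree, and K-Means can additionally depend on the initial means.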