Measures of similarity and structural equivalence
Jing Zhou

Contents
• Measuring similarity/dissimilarity
  • Valued relations
  • Binary relations
• Visualizing similarity and distance
  • Clustering tools
  • Multi-dimensional scaling tools
• Describing structural equivalence sets
  • Clustering similarities or distance profiles
  • Concor
  • Optimization by Tabu search
• Recommendation

Measuring similarity and dissimilarity
• Actors 2, 5, and 7 might be structurally similar in that they seem to have reciprocal ties with each other and with almost everyone else.
• Actors 6, 8, and 10 are “regularly” similar in that they are rather isolated, but they are not structurally similar.
• Two actors are said to be structurally equivalent if they have the same patterns of ties with other actors.

Adjacency matrix (rows send ties, columns receive ties)
          COUN COMM EDUC INDU MAYR WRO  NEWS UWAY WELF WEST
 1 COUN    -    1    0    0    1    0    1    0    1    0
 2 COMM    1    -    1    1    1    0    1    1    1    0
 3 EDUC    0    1    -    1    1    1    1    0    0    1
 4 INDU    1    1    0    -    1    0    1    0    0    0
 5 MAYR    1    1    1    1    -    0    1    1    1    1
 6 WRO     0    0    1    0    0    -    1    0    1    0
 7 NEWS    0    1    0    1    1    0    -    0    0    0
 8 UWAY    1    1    0    1    1    0    1    -    1    0
 9 WELF    0    1    0    0    1    0    1    0    -    0
10 WEST    1    1    1    0    1    0    1    0    0    -

Tie profiles: the ten rows above followed by the ten columns of the same matrix, written as rows (ties received)
          COUN COMM EDUC INDU MAYR WRO  NEWS UWAY WELF WEST
COUN       -    1    0    1    1    0    0    1    0    1
COMM       1    -    1    1    1    0    1    1    1    1
EDUC       0    1    -    0    1    1    0    0    0    1
INDU       0    1    1    -    1    0    1    1    0    0
MAYR       1    1    1    1    -    0    1    1    1    1
WRO        0    0    1    0    0    -    0    0    0    0
NEWS       1    1    1    1    1    1    -    1    1    1
UWAY       0    1    0    0    1    0    0    -    0    0
WELF       1    1    0    0    1    1    0    1    -    0
WEST       0    0    1    0    1    0    0    0    0    -

Valued relations: Pearson correlation coefficients
• -1: the two actors have exactly opposite ties to each other actor.
• 0: knowing one actor’s tie to a third party does not help at all in guessing the other actor’s tie to that third party.
• 1: the two actors always have exactly the same tie to other actors.
• Actors 1 and 9 have identical patterns of ties.
• Actors 6 and 7 have different patterns of ties.
• Actor 6 tends to have ties to actors that actor 7 does not.

Euclidean distance
          COUN COMM EDUC INDU MAYR WRO  NEWS UWAY WELF WEST
 1         -    1    0    0    1    0    1    0    1    0
 2         1    -    1    1    1    0    1    1    1    0
 3         0    1    -    1    1    1    1    0    0    1
 4         1    1    0    -    1    0    1    0    0    0
 5         1    1    1    1    -    0    1    1    1    1
 6         0    0    1    0    0    -    1    0    1    0
 7         0    1    0    1    1    0    -    0    0    0
 8         1    1    0    1    1    0    1    -    1    0
 9         0    1    0    0    1    0    1    0    -    0
10         1    1    1    0    1    0    1    0    0    -

What is Euclidean distance?
• The square root of the sum of squared differences between two actors' ties to each of the other actors. For actors 1 and 2, compared across their ties to actors 3 to 10:
$ED_{1,2} = \sqrt{(1-0)^2 + (1-0)^2 + (1-1)^2 + (0-0)^2 + (1-1)^2 + (1-0)^2 + (1-1)^2 + (0-0)^2} = \sqrt{3} \approx 1.732$

Binary relations: matches, Jaccard and Hamming
Ties received by COUN (actor 1) and COMM (actor 2)
     COUN COMM
 1    -    1
 2    1    -
 3    0    1
 4    1    1
 5    1    1
 6    0    0
 7    0    1
 8    1    1
 9    0    1
10    1    1
• Matches: 5/8 = 0.625
• Comparing actors 1 and 2, they have the same tie to the other actors 62.5% of the time.

Jaccard coefficient
• The number of third actors to which both actors have a tie, as a proportion of the number of third actors to which at least one of them has a tie.
• For the COUN and COMM columns above, 4 ties are shared and 7 third actors are tied to at least one of the two, so the coefficient is 4/7 ≈ 0.571. Dividing by the 11 ties counted separately for each actor (4 + 7) gives 4/11 = 36.4%, which counts the shared ties twice.
• Note: the hand calculation has not been reconciled with the software output.

Hamming distance
• The number of entries in the vector for one actor that would need to be changed in order to make it identical to the vector of the other actor. For actors 1 and 2 this is 3, the complement of the 5 matches out of 8. The vectors come from the adjacency matrix shown under Euclidean distance.

Visualizing similarity and distance
Clustering tool
• The E-I index is often most helpful. It compares the number of ties between clusters (E) with the number of ties within clusters (I). As usually defined, E-I = (E - I)/(E + I), so it ranges from -1 (all ties within clusters) to +1 (all ties between clusters) and is negative whenever within-cluster ties outnumber between-cluster ties.
• The output also gives information on the relative size of the clusters at each stage.

Multi-dimensional scaling tools
• MDS represents the patterns of similarity or dissimilarity among actors as positions in a space with a small number of underlying dimensions.
• The output reports a measure of badness of fit (stress) and the coordinates of each case in two dimensions.
• Graph the nodes according to their coordinates. In this case, cases 1 and 2 may be similar.
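The pairwise measures covered so far (Pearson correlation, Euclidean distance, matches, Jaccard, Hamming) can be checked by hand with a short Python sketch. It is only an illustration under stated assumptions: the matrix is typed in from the adjacency table in this deck, the diagonal is stored as 0 and skipped, each comparison uses only the eight "other" actors, and the function names (`profiles`, `euclidean`, and so on) are made up for this example. Network software may combine sending and receiving profiles or treat the diagonal differently, so its output need not agree with these numbers.

```python
import math

# Adjacency matrix from the deck (row = sender, column = receiver).
# Order: COUN, COMM, EDUC, INDU, MAYR, WRO, NEWS, UWAY, WELF, WEST.
# The diagonal is stored as 0 and never used in a comparison.
TIES = [
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
]


def profiles(i, j, direction="send"):
    """Tie vectors of actors i and j (1-based) toward the other eight actors."""
    others = [k for k in range(len(TIES)) if k not in (i - 1, j - 1)]
    if direction == "send":  # rows: ties sent by i and j
        return ([TIES[i - 1][k] for k in others],
                [TIES[j - 1][k] for k in others])
    # columns: ties received by i and j
    return ([TIES[k][i - 1] for k in others],
            [TIES[k][j - 1] for k in others])


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    # Number of entries that differ between the two vectors.
    return sum(x != y for x, y in zip(a, b))


def matches(a, b):
    # Share of entries on which the two vectors agree (1/1 or 0/0).
    return sum(x == y for x, y in zip(a, b)) / len(a)


def jaccard(a, b):
    # Shared ties divided by ties held by at least one of the two actors.
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def pearson(a, b):
    # Undefined (division by zero) if either vector is constant.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


sent_1, sent_2 = profiles(1, 2, "send")
recv_1, recv_2 = profiles(1, 2, "receive")
print("Euclidean 1-2 (sent):", round(euclidean(sent_1, sent_2), 3))
print("Hamming 1-2 (sent):", hamming(sent_1, sent_2))
print("Matches 1-2 (received):", matches(recv_1, recv_2))
print("Jaccard 1-2 (received):", round(jaccard(recv_1, recv_2), 3))
print("Pearson 1-9 (sent):", round(pearson(*profiles(1, 9, "send")), 3))
print("Pearson 6-7 (sent):", round(pearson(*profiles(6, 7, "send")), 3))
```

Separating sending from receiving profiles makes it easy to see which vector a given slide used: the Euclidean example follows the rows of actors 1 and 2, while the matches table shows their columns.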
Describing structural equivalence sets
• Two actors that are structurally equivalent have the same ties to all other actors. They are perfectly substitutable. In real data exact equivalence is rare, so approximate equivalence may be the more meaningful notion.
• Three approaches:
  • Cluster analysis
  • Concor
  • Numerical optimization by tabu search

Clustering similarities or distance profiles
• We measure dissimilarity in this case.
• The output is a rough character-mapped graphic of the clustering and a dendrogram of the structural equivalence data.

CONCOR
• Correlate the tie profiles of each pair of actors to produce an actor-by-actor correlation matrix.
• Take each row of this correlation matrix and correlate it with each other row.
• Repeat the process until the elements of this “iterated correlation matrix” converge on values of either 1 or -1.
• Concor then divides the data into two sets on the basis of these correlations.
• Within each set, the process is repeated.
• It continues until all actors are separated.
• In the Concor model, 22% of the variance in the ties can be accounted for by a perfect structural block model.

Optimization by Tabu search
• Search for sets of actors who, if placed into a block, produce the smallest sum of within-block variances in the tie profiles.
• That is, if the actors in a block have similar ties, their variance around the block mean profile will be small.
• This analysis can help us answer three questions:
  • How many equivalence classes, or approximate equivalence classes, are there?
  • How well does this simplification into equivalence classes summarize the information about all the nodes?
  • What is the position of each class, as defined by its relations to the other classes?
• The output is annotated to show which part answers each of questions 1, 2, and 3.

Recommendation
• Structural equivalence in a journal network
• Describes the structure of the journal network at three time periods in terms of its block structure.
• Two ways of grouping journals
  • Journal roles: the aims and objectives of the journals in the network
  • Journal positions: patterns of citation activity
• Construct an image structure of the network; this reveals a simple core-periphery structure.
• Each of the peripheral sociological journals is tied to the core but not to the other peripheral journals.

Citation network

Grouping journals
• Journal roles
  • Disciplinary
    • Comprehensive sociology: AJS, ASR, SF
    • Social structural: HR, SSPQ
    • Methodology/Models: SM, SSR, SMR, JMS, QQ
  • Interdisciplinary
    • Behavioral science: BS
    • Public opinion: POQ
    • Structural analysis: SN
    • Social conflict: JCR

Journal positions
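To make the idea of an image structure concrete, here is a minimal Python sketch of how block densities and an image matrix could be computed once actors have been assigned to positions. Everything in it is an assumption for illustration: the five-node network `TOY` is invented (it is not the journal citation data), the block labels and the function names `block_densities` and `image_matrix` are hypothetical, and coding a block as 1 when its density reaches the overall network density is one common convention rather than the rule used in the recommended study.

```python
# Made-up network: nodes 0 and 1 form a core, nodes 2-4 a periphery.
# Peripheral nodes are tied to the core but not to each other.
TOY = [
    [0, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
]
BLOCKS = {"core": [0, 1], "periphery": [2, 3, 4]}


def block_densities(ties, blocks):
    """Share of possible ties present from each block to each block."""
    table = {}
    for sender, rows in blocks.items():
        for receiver, cols in blocks.items():
            # Diagonal cells (self-ties) are left out of every block.
            cells = [ties[i][j] for i in rows for j in cols if i != j]
            table[(sender, receiver)] = sum(cells) / len(cells) if cells else 0.0
    return table


def image_matrix(ties, blocks):
    """Code a block 1 if its density is at least the overall density."""
    n = len(ties)
    total = sum(ties[i][j] for i in range(n) for j in range(n) if i != j)
    overall = total / (n * (n - 1))
    densities = block_densities(ties, blocks)
    return {pair: int(d >= overall) for pair, d in densities.items()}


for pair, value in image_matrix(TOY, BLOCKS).items():
    print(pair, value)
```

In this toy case the only zero block is periphery-to-periphery, which is the core-periphery pattern described for the journal network: peripheral members are linked to the core but not to one another.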