Social Network Something Interesting Ruizhi Gao Contents • • • • The Born of Social Networks New types of Social Networks My Social Networks Research related History Early Years They are client-based application which can allow you to add friends and have friends list. You can communicate with each other by send text message, image and other information ICQ AIM Problems: You have to remember the account ID if you want to add new friends. Your friends list is not visible It’s more like a communication tool but SNS SixDrgree.com Combine the function of many different tools like ICQ and AIM, and provide a new search approach based on users’ own information. SixDegrees.com was a social network service website that lasted from 1997 to 2001 and was based on the Web of Contacts model of social networking. It was named after the six degrees of separation concept and allowed users to list friends, family members and acquaintances both on the site and externally; external contacts were invited to join the site. Users could send messages and post bulletin board items to people in their first, second, and third degrees, and see their connection to any other user on the site. It was one of the first manifestations of social networking websites in the format now seen today. Six Degrees was followed by more successful social networking sites based on the "social-circles network model" such as Friendster, MySpace, LinkedIn, XING and Facebook. ----- Wikipedia http://en.wikipedia.org/wiki/SixDegrees.com LiveJournal.com It gives a new way that people mark others as Friends to follow their journals and manage privacy settings SNS on business networks • Ryze • Tribe.net • Friendster MySpace • Myspace could grow rapidly because many other SNS are trying to collect fees. • MySpace open some public webpages to famous bands or singers so their fans can follow. • MySpace did not restrict users from add HTML code into the forms to make their own pages special. Today Twitter is an online social networking service and microblogging service that enables its users to send and read text-based messages of up to 140 characters, known as "tweets" Harvard - Only High School Networks Corporate Networks Reference • Wikipedia • Boyd, d. m., & Ellison, N. B. (2007). Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication, 13(1), article 11. • Visual Academy http://www.onlineschools.org/visualacademy/history-of-social-networking/ New Types of Social Networks • General Facebook, Twitter … • School, college Classmates.com, Friends Reunited(UK)… • Art Community deviantART, Taltopia • Movies , TV series Youtube, douban, Flixster, Filmow … • Photo sharing Flickr, Fotki, DailyBooth … • … SoLoMo • SoLoMo, short for social-local-mobile, refers to a more mobile-centric version of the addition of local entries to search engine results.(GPS) • SoLoMo based social network. The most famous one is Foursquare Foursquare You can check-in in different place (restaurant, movie center … ) and leave your comments. The more you checked in, more point you earn. You also can be the “mayor of one place” if you check-in many times Private Social Network • General social network may not allow you to upload some “sensitive” information or private information. However private social network allow you to create a group-centric SNS in which you can share more if you want. Sgrouples • Sgrouples (sounds like ‘scruples’) is our very own private, group-centric social network designed to mimic how small groups of people interact in their real lives. Sgrouples allows you to easily post content to different groups based on your real life interests – friends, family, work, sports teams, and hobby groups. Reference • http://blog.sgrouples.com/new-socialnetworking-sites/ • http://en.wikipedia.org/wiki/List_of_social_ne tworking_websites Experience X-land Project • We designed the Xland project as a 3D immersive blog. Xland was part of the CHIPS (CHina Innovation Program for Students) program sponsored by Sun Microsystems and the Chinese Education Department. • It may be not called as a blog but a social space. Every user has its own room and we provide a open space like a community. People can decorate their own room and share information with each other. The most important thing is it’s not a client based but can be accessed from your web browser, which is very hard at that time. You need to consider very carefully about using the resource. Trailer Functions Friend list and live chat HUD Keyboard piano Decorate your room Albums Functions Background music of your room Dairy We tried to transfer the “my page” in facebook Into “my room” in a 3D world. Good idea but impractical like what Kaifu Li did 3D browser Our blog: http://blogs.openwonderland.org/2010/07/23/xland-3d-blog/ Visitor AlienSandal • A real start up project SoLoMo based social network cooperated with students in UC Berkley • AlienSandal is a social platform based on Google Map, which enable users to trace and customize Life Track and connect to people who have similar tracks. • Trailer http://www.youtube.com/watch?v=ACYFEsodwrY Different Types of check-in Free draw in the map Difficulties • What is the most important thing in SNS? • Why do you want to use a SNS application? • Why can SNS do? • Answer from VC, “friends, money and …” How to make your SNS popular? RECOMMENDATION Clustering Why clustering is useful • Grouping users in social networks is an important process that improves matching and recommendation activities in social networks. The data mining methods of clustering can be used in grouping the users in social networks [1] • [1]S. Alsaleh, R. Nayak, Y. Xu, “Grouping People in Social Networks Using a Weighted Multi-Constraints Clustering Method,” in Proceedings of WCCI 2012, June,10-15. Brisbane, Australia. [2] GAN, G., MA, C. & WU, J. (2007) Data clustering: theory, algorithms,and applications. ASASIAM Series on Statistics and Applied Probability, 20, 219-230. [3] NAYAK, R. (2011) Utilizing past relations and user similarities in a social matching system. Advances in Knowledge Discovery and Data Mining, 99-110. First Question • How many cluster??????? K-means, Fuzzy C means, K-medoids … Rule of Thumb [1] But this is not reliable…. [1]Kanti Mardia et al. (1979). Multivariate Analysis. Academic Press. Cluster Estimation 1 • Stephen L. Chiu, “Fuzzy Model Identification Based on Cluster Estimation,” Journal of Intelligent and Fuzzy Systems, Vol.2, pp 267-278, 1994. • Suppose we have a collection of n data points {x1,x2,…,xn}, in our case. For each data points, we can assign a potential value. n Pi e || xi x j ||2 j 1 and 4 r2 ra is a positive constant. We have ra = 0.5 here (suggested by paper). and ||xi – xj|| is the distance function between data xi and xj. Kendall tau Distance • Kendall tau distance Example: Two sample data R1: 1,2,3,4,5 R2: 3,4,1,2,5 There are 4 disorder between R1 and R2 The distance will be 4 1 5(5 1) 2 For R1 For R2 Count 1<2 3<4 1<3 3>1 * 1<4 3>2 * 1<5 3<5 2<3 4>1 * 2<4 4>2 * 2<5 4<5 3<4 1<2 3<5 1<5 4<5 2<5 Euclidean Distance Two sample data R1: 1,2,3,4,5 R2: 3,4,1,2,5 • De = (1 3) 2 (2 4) 2 (3 1) 2 (4 2) 2 (5 5) 2 Hamming Distance Two sample data R1: 1,2,3,4,5 R2: 3,4,1,2,5 4 and 2 are different • DH = 1+1+1+1+0 1 and 3 are different 2 and 4 are different 5 and 5 are different 3 and 1 are different Cluster Estimation 1 • After we have potential value for each data point, we select the * x highest potential as the first cluster center. Let 1 be the data point of the first cluster center and P1* be its potential value. Then we will revise the potential of each data point xi by Pi Pi P e * 1 • In which, we have: and rb 1.5ra 4 rb2 || xi x1* ||2 Cluster Estimation 1 • Next we select the data point with the highest remaining potential as the second cluster center. We further reduce the potential of each data point according to their distance to the second cluster center. In general, after the kth cluster center has been obtained, we revise the potential of each data point by Pi Pi P e * k || xi xk* ||2 * Where xk is the location of the kth cluster center and Pk* is its potential value. Cluster Estimation 1 • Every time we got the Pk*, we need to decide whether we should select that as a new center , we have following process. * * if Pk P1 Accept xk* as a cluster center and continue. * * else if Pk P1 Reject x* and end the selecting process k else * Let dmin = [shortest of the distance between xk and all previously found cluster centers] if dmin Pk* * 1 ra P1 * Accept xk as a cluster center and continue the whole process else * * Reject xk and set the potential at xk to 0. * Select the data point with the next highest potential as new xk and re-test end if end if In which, we have 0.5 and 0.15 (suggested in paper) Running Example 1 • Suppose we have the following rankings set (which may represents different pages you viewed. 1 is page 1, 2 is page 2) R1 = {1,2,3,4,5,6,7}, R2 = {1,2,4,3,5,7,6}, R3 = {7,6,4,5,3,1,2}, R4 = {7,6,5,4,1,3,2}. First we will assign potential value for each of them by We have P for each of them P1 = 1.9311 P2 = 1.9336 P3 = 1.7451 P4 = 1.7496 n Pi e j 1 || xi x j ||2 Running Example 1 • We choose the highest value P2 = 1.9336, that is, we choose R2 as our first center and P1* 1.9336 • After this, we need to revise the potential value for each ranking by Pi Pi P e * 1 || xi x1* ||2 So we got: P1 = 0.06006 P2 = 0.0 P3 = 1.69382 P4 = 1.57027 We will choose P3 then go on the process. Running Example 1 • Since we have P1* 0.290049 and P1* 0.966833 P3 = 1.69382 which is greater than P1* 0.966833 Refer to if Pk* P1* * Accept xk as a cluster center and continue. So P3 will be our second center. Then we will revise the potential value again by Pi Pi P e Then we will have P1 = -0.02683 P2 = -0.04499 P3 = 0.0 P4 = 0.085356 So P4 will be our next choice. * 1 || xi x2* ||2 Running Example 1 • Since we have P1* 0.290049 and P1* 0.966833 P4 = 0.085356 which is less than P1* 0.290049 Refer to if Pk* P1* * Reject xk and end the selecting process So we will stop the whole process • Out of 4 data set, we have estimate 2 clusters and 2 centers which are R2 and R3, this result makes sense. Cluster Estimation 2: Max-min Distance • • • ZHOU Shi-bing, XU Zhen-yuan, TANG Xu-qing, “New method for determining optimal number of clusters in K-means clustering algorithm,” Computer Engineering and Applications, Vol.46, pp 27-31, 2010. Beliakov, Gleb and King, Matthew 2006, Density based fuzzy c-means clustering of non-convex patterns, European journal of operational research, vol. 173, no. 3, pp. 717-728. Suppose we have a collection of n data points {x1,x2,…,xn}, in our case, they are the ranks. For each two data points, we calculate the distance.(We use Kendall tau distance here) Step 1(in paper): Randomly choose one data point as the first center c1. Step 2(in paper): Choose the data point which has largest distance from the first center c1 as the second center c2. Cluster Estimation 2: Max-min Distance • Step 3: Get the distance between the rest of the data and all the centers. dij xi c j j 1, 2....k (# center ) then we get di min(di1 , di 2 ...dik ) i 1, 2....n(# data) Then, if D max{di } || c1 c2 || then, we choose xi as our next center. , we choose 0.6 Repeat this process until we cannot find any more centers. Running Example 2 • Suppose we have the following rankings set. R1 = {1,2,3,4,5,6,7}, R2 = {1,2,4,3,5,7,6}, R3 = {7,6,4,5,3,1,2}, R4 = {7,6,5,4,1,3,2}. • The Kendall tau distance between each other is: R1 R2 R3 R4 R1 0 0.095238 0.904762 0.904762 R2 0.095238 0 1 0.809524 R3 0.904762 1 0 0.190476 R4 0.904762 0.809524 0.190476 0 Running Example 2 • The largest distance is between R2 and R3 which is 1. So we choose these 2 as two centers then d1 min(di1 , di 2 ) 0.095238 d4 min(di1 , di 2 ) 0.190476 Since D max{di } 0.190476 0.6 || c1 c2 || We stop the process and choose R2 and R3 as the centers. Pros and Cons • Estimation 1 ra, , , at least 3 parameters needs to be decided in the process, and all of this parameters are application sensitive. For now, there is no adaptive way to adjust such parameter. • Estimation 2 The problem is on the initial center selection. The result will be affected by the noise data point • Question