International Journal of Engineering Trends and Technology (IJETT) – Volume 21 Number 6 – March 2015 Finding High-Quality Content in Social Media Ms. Aparna Todwal1, Prof. M. Wanjari2 1 2 M.Tech Scholar, CSE Department,SRCOEM, Nagpur, India1 Associate Professor, CSE Department, SRCOEM, Nagpur, India 2 Abstract— The quality of user-generated content differs drastically from excellent to abuse. As the availability of such content grows, the task of identifying high-quality content in sites based on user contributions in social media sites which becomes drastically important. Social media in general serves a rich variety of information sources: in addition to the content itself, there is a broad range of non-content information available, such as connectivity between items and explicit quality ratings from teamworker of the community. This paper investigates methods for exploiting such community feedback to automatically identify high quality content. As a test case, the concentration is on Yahoo! Answers system, a large community question/answering portal. In particular, for the community question/answering domain, this paper shows that our system is able to separate high-quality items from the rest. Keywords— Social Media; Community Question Answering; User Interaction. I. INTRODUCTION From the early 2000s, user-generated contents has become increasingly popular on www; so majority of users participate in content Creation, rather than just consumption. Popular social media domain contains blogs, social bookmarking sites, photo and video sharing communities and social networking platform such as MySpace and Facebook, which offers a combination of all these and relationship among users. Community-driven question/answering portals are the specific form of user-generated content that is gaining a large audience in recent years. These systems provide an alternative way. For gaining information on the web. An important difference between user-generated content and traditional content that is particularly significant for knowledge-based media such as question/answering portals is the difference in the quality of the content. The main challenge posed by content in social media sites is the fact that the distribution of quality has high variance: from very highquality items to not concerning quality items, sometimes bad content. This makes the tasks of filtering and ranking in such systems more complex than in other domains. 1. What are the elements of social media that can be used to facilitate automated discovery of high-quality con-tent? In addition to the content itself, there is a heavy range of noncontent information present, from links between items to explicit and implicit quality rating from members of the community. What is the utility of each source of information to the task of estimating quality? 2. How are these different factors co- related? Is content alone sufficient for identifying high-quality items? 3. Can community feedback approximate judgments of specialists? A. Yahoo! Answers Yahoo! Answers1 is a question/answering system where people ask and answer questions on any topic. A user can vote for answers of other users, mark interesting questions and answers. Thus, overall, each user has a three role: asker, answerer and evaluator. The central element of the Yahoo! Answers system are basically questions. Each question has its lifecycle. It starts in an “open" state where it receives the answers. Then at a point the question is considered closed and can receive no answers. At this point, a “best answer" is selected either by the asker or through a voting procedure from other users; once a better answer is selected, the question is solved. Yahoo! Answers is a very popular service as a result, it hosts a very large amount of questions and answers in a wide variety of areas, making it a particularly useful domain for evaluating content quality in social media. II. BASIC ELEMENTS OF QUESTION/ANSWERING PORTAL In this paper we address the task of identifying highquality content in community-driven question/answering sites. As a test case, we focus on Yahoo! Answers, a large portal that is particularly rich in the amount and types of content and social interaction available in it. We focus on the following research questions: Fig:-1 Entity relationship diagram of answers of question. ISSN: 2231-5381 http://www.ijettjournal.org Page 321 International Journal of Engineering Trends and Technology (IJETT) – Volume 21 Number 6 – March 2015 (Adopted from [1] ) Now, in such a system users mark stars to the question and asks the question. Again user can give answer to the question, can vote for the answers and also can evaluate the answer. The relationships between questions, users asking and answering questions, and answers can be captured by a tripartite graph given in Figure 2, edge represents an explicit relationship between various node types. Fig:- Interrelation of users-question-answers modeled as tri-partite graph. (Adopted from [1]) As given in figure 2, a user is not allowed to answer his/her own questions, there are no triangles in the graph, so in fact all cycles in the graph have their length at least 6.[1] III. RELATED WORK Link analysis in social media. In particular, link-based ranking algorithms that were triumphant in estimating the quality of web pages have been applied in this context. Linkbased methods have been shown to be triumphant for several tasks in social media [2].Two of the most prominent linkbased ranking algorithms are Page Rank[4] and HITS [3] Consider a graph G = (V;E) with vertex set V with respect to the users of a question/answer system and contains a directed edge e = (u; v) є E from a user u є V to a user v є V if user u has answered to at least one question of user v. Expertise Rank [8] respect to Page Rank over the transposed graph Gʹ= (V;Eʹ), that is, a score is cultivated from the person receiving the answer to the person giving the answer. The recursion implies that if person u was able to give an answer to person v, and person v was able to provide an answer to person w, then u should receive some extra points given that he/she was able to provide an answer to a person with a certain degree of expertise. Propagating reputation. Guha et al. [5] study the problem of propagating trust and distrust among Epinions users, who may assign positive (trust) and negative (distrust) ratings to each other. The authors study ways of integrating trust and distrust and observe that, while appraising trust as a transitive property makes sense, distrust cannot be considered transitive. ISSN: 2231-5381 Ziegler and Lausen [7] has also studied the models for propagation of trust. They present a anotomy of trust metrics and discuss ways of incorporating information about distrust into the rating scores. Question/answering portals and forums. The particular framework of question/answering communities we focus on in this paper has been the object of some study in recent years. According to Su et al. [6], the quality of answers in question/answering portals is good on average, but the quality of specific answers varies considerably. In particular, in a study of the answers to a set of questions in Yahoo! Answers, the authors found that the fragment of correct answers to specific questions asked by the authors of the study, varied from 17% to 45%. The fragment of questions in their sample with atleast one good answer was much higher, varying from 65% to 90%, meaning that a method for finding high-quality answers can have a remarkable effect in the user's satisfaction with the system. Expert Finding. Zhang et al. [8] examine data from an online forum, seeking to identify users with high expertise. They research the user answers graph in which there is association between users u and v if u answers a question by v, pertaining both Expertise Rank and HITS to identify users with high expertise. Their results gives high interrelationship between link-based metrics and the answer quality. The authors also develop synthetic models that record some of the characteristics of the interactions among users in their dataset. The HITS algorithm is run on the user-answer graph. Jurczyk and Agichtein [9] presents an application of the HITS algorithm [3] to a question/answering portal. The results finds that HITS is a promising approach, as the obtained authority score is better interrelated with the number of votes that the items receive, than simply counting the number of answers the answerer has given in the past. Dom et al. [10] studied the interpretation of several link-based algorithms to rank people by expertise on a network of e-mail exchanges. Text analysis for content quality. Most work on evaluating the quality of text has been in the field of Automated Essay Grading (AES), where writings of students are sorted by machines on several aspects, including compositionality, style, precision, and soundness. AES systems are typically shown to correlate very well with human judgments. Although simplistic and disputable, these methods are widely-used and provide a rough estimation of the difficulty of text. Implicit feedback for ranking. Implicit feedback from millions of web users has been shown to be very important source of result quality and ranking information. Authors had incorporate the results on click interpretation on web search results from these studies, as a source of quality information in social media. In particular, clicks on results and methods for interpreting the clicks have been studied in references [11]. http://www.ijettjournal.org Page 322 International Journal of Engineering Trends and Technology (IJETT) – Volume 21 Number 6 – March 2015 IV. PROPOSED APPROACH Using the literature survey of Question/Answering portals in Social Media we can say can that the finding high quality contents in such a systems becomes efficient by sorting answers with respect to that of the question or the standard answer. Sorting of the answers can be based on scoring each answers and sorting them on the basis of scores. Scoring mechanism can be accomplished with the help of word net. Word net is the lexical data base, which contains synonyms for the English words. After scoring answers positioning them in descending order scores. Now in the first approach the question will get compared with that of the answers and for each answer for a question the score will get generated and positioning of answers will be done. In the second approach, the users rating will come into an account and the scoring will done on the basis of users rating. And positioning of answers will be done on the basis of scores come out of user rating. The third approach is to compare each answers of a question to that of the standard answer. The standard answer will get generated by Wikipedia and each answer will get its score and positioning of answers will done. The main perspective of this approach are: i. Efficient way of getting answers on the basis of scores. ii. Useful for the users those who are seeking answers on the question answering portals. [3] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604{632, 1999. [4] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: bringing order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998. [5] R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of trust and distrust. In WWW '04: Proceedings of the 13th international conference on World Wide Web, pages 403{412, New York, NY, USA, 2004. ACM Press [6] Q. Su, D. Pavlov, J.-H. Chow, and W. C. Baker. Internet-scale collection of human-reviewed data. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 231{240, New York, NY, USA, 2007. ACM Press. [7] C.-N. Ziegler and G. Lausen. Propagation models for trust and distrust in social networks. Information Systems Frontiers, 7(4-5):337{358, December 2005. [8] J. Zhang, M. S. Ackerman, and L. Adamic. Expertise networks in online communities: structure and algorithms. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 221{230, New York, NY, USA, 2007. ACM Press. [9] P. Jurczyk and E. Agichtein. HITS on question answer portals: an exploration of link analysis for author ranking. In SIGIR (posters). ACM, 2007. [10] B. Dom, I. Eiron, A. Cozzi, and Y. Zhang. Graph-based ranking algorithms for e-mail expertise analysis. In Proceedings of Workshop on Data Mining and Knowledge Discovery, pages 42{48, San Diego, CA, USA, 2003. ACM Press. [11] T. Joachims, L. A. Granka, B. Pan, H. Hembrooke, and G. Gay. Accurately interpreting click through data as implicit feedback. In SIGIR, pages 154{161, 2005. [12] K. Ali and M. Scarr. Robust methodologies for modeling web click distributions. In WWW, pages 511{520, 2007. [13] C. Anderson. The Long Tail: Why the Future of Business Is Selling Less of More. Hyperion, July 2006. [14] Y. Attali and J. Burstein. Automated essay scoring with e-rater v.2. Journal of Technology, Learning, and Assessment, 4(3), February 2006. [15] A. L. Berger, V. J. D. Pietra, and S. A. D. Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39{71, 1996. V. CONCLUSION Community Question Answering Portal is a popular information seeking paradigm. Variance of contents in such a sites is drastically high. So the task of quality estimation in such a sites or portals is important for the users those who require relevant answer to their question. So the proposed approach could be the efficient way for finding high quality contents in social media by calculating scores for each answers and positioning them accordingly. User will get ease of getting answers by having highly ranked answer on top and relevancy in answer at certain level. REFERENCES [1] Carlos Castillo, Debora Donato et al., “FINDING HIGH QUALITY CONTENTS IN SOCIAL MEDIA”Yahoo! Research Barcelona, Spain, WSDM'08, February 11.12, 2008. [2] J. P. Scott. Social Network Analysis: A Handbook.SAGE Publications, January 2000. ISSN: 2231-5381 http://www.ijettjournal.org Page 323