ANALYSIS OF POLITICS AND INDUSTRY NEXUS: INDIA
Project Supervisor: Prof. Aaditeshwar Seth
Himanshu Sharma (2010CS50284)
Mayank Srivastava (2010CS10224)

OBJECTIVES
Extract information about the political-industry nexus and the intra-political nexus from newspapers and from structured sources available on the web.
Represent it as a graph whose nodes are entities and whose edges are relations between entities.
Analyze the resulting graph, rank the entities, and find correlations between the news in different newspapers.

IMPLEMENTATION
Structured information collected from netapedia.in, myneta.info, PPPIndia.com and capitaline.info.
Continuous collection of RSS feeds from different newspapers.
Processing of the news through OpenCalais, an NLP tool.
Storing the information in database tables, filtering it and ranking the entities.

SYSTEM IN DETAIL

RANKING OF ENTITIES
Ranking by an exponential moving average (called Fame from here on), updated on an occurrence basis: highly sensitive to changing news, so entities that become important in the news rise while less important ones fall.
Ranking by the PageRank algorithm, with the Fame values used as the personalization vector: less sensitive to changing news; shows the overall influence of an entity in the network.

CORRELATION BETWEEN NEWSPAPERS
Used Spearman's rank correlation coefficient.
High correlation when entities are ranked by PageRank values.
Correlation coefficients as on 1st March (with respect to the overall data):
DNA (Business Section): 0.99118
Hindustan Times: 0.99147
DNA (Political Section): 0.99290
The Times of India: 0.99305
The Hindu: 0.99336
Low correlation when entities are ranked by Fame values.
Correlation coefficients as on 1st March (with respect to the overall data):
DNA (Business Section): 0.33939
Hindustan Times: 0.41778
DNA (Political Section): 0.52837
The Times of India: 0.54673
The Hindu: 0.57951
Low correlation suggests that the newspapers are biased.

MORE ON CORRELATION
Plotted week-to-week correlation between newspapers:
Higher correlation between DNA (Business Section) and DNA (Political Section).
The Hindu shows a somewhat lower correlation with Hindustan Times and The Times of India, suggesting that it carries some news different from the two "Times" papers.
Plotted inter-week correlation coefficients for each newspaper: they mostly vary between 0.2 and 0.4.
Increased the time duration to see the longevity of news: correlation values approach an asymptotic value of around 0.15 for the political newspapers, and touch 0.05 for DNA (Business Section).
DNA (Business Section) has the lowest maximum longevity: it frequently switches news.
Longevity is generally lower for The Hindu and Hindustan Times than for DNA (Political Section) and The Times of India.
DNA (Political Section) and TOI cling to the same news and repeat it over a prolonged period, while HT and The Hindu prefer to switch news.

BIAS BY NEWSPAPERS: EXAMPLES
In August 2012, TOI gave a lot of emphasis to Nitish Kumar, while The Hindu chose to neglect him.
During mid-March 2013, The Hindu, Hindustan Times and DNA (Political Section) gave a lot of emphasis to Manmohan Singh, but The Times of India gave him less importance. Instead, it carried a number of news items about Xi Jinping, whom the rest ignored.
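The IMPLEMENTATION section mentions storing the processed news in database tables, filtering it, and then ranking entities, while MORE ON CORRELATION works at weekly granularity. The poster does not give the schema, so the sketch below is hypothetical: it assumes one row per entity mention (table `mention`, columns `newspaper`, `day`, `entity`, `etype`) and uses Python's built-in `sqlite3` to count mentions per newspaper per week, dropping entities below an illustrative `min_mentions` threshold.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one row per entity mention (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE mention (
               newspaper TEXT NOT NULL,  -- e.g. 'TOI', 'HT'
               day       TEXT NOT NULL,  -- date of the RSS item, 'YYYY-MM-DD'
               entity    TEXT NOT NULL,  -- entity name as returned by the NLP tool
               etype     TEXT            -- entity type, e.g. 'Person'
           )"""
    )
    return conn


def weekly_counts(conn, newspaper, min_mentions=2):
    """Mentions per (year-week, entity) for one newspaper, filtering out rare entities."""
    return conn.execute(
        """SELECT strftime('%Y-%W', day) AS week, entity, COUNT(*) AS n
           FROM mention
           WHERE newspaper = ?
           GROUP BY week, entity
           HAVING COUNT(*) >= ?
           ORDER BY week, n DESC, entity""",
        (newspaper, min_mentions),
    ).fetchall()
```

The weekly counts returned here are the kind of per-period input an occurrence-based ranking such as Fame could consume; the filtering threshold is a tuning choice, not a value from the poster.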
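The Fame ranking under RANKING OF ENTITIES is described only as an exponential moving average updated on an occurrence basis. A minimal sketch, assuming the average is taken over mention counts per fixed time step, F <- alpha * m + (1 - alpha) * F, with entities that are not mentioned decaying toward zero. The smoothing factor `alpha = 0.3` and the function names are hypothetical.

```python
def update_fame(fame, mentions, alpha=0.3):
    """Apply one EMA step to every known entity: F <- alpha * m + (1 - alpha) * F.

    fame     -- dict entity -> current Fame (updated in place and returned)
    mentions -- dict entity -> number of mentions in the current time step
    alpha    -- smoothing factor in (0, 1]; 0.3 is an illustrative value
    """
    for entity in set(fame) | set(mentions):
        old = fame.get(entity, 0.0)
        # Unmentioned entities get m = 0, so their Fame decays by (1 - alpha).
        fame[entity] = alpha * mentions.get(entity, 0) + (1 - alpha) * old
    return fame


def rank_by_fame(fame):
    """Entities sorted from highest to lowest Fame."""
    return sorted(fame, key=fame.get, reverse=True)
```

For example, feeding the step {"EntityA": 5, "EntityB": 1} and then {"EntityB": 4} gives EntityA about 1.05 and EntityB about 1.41, so EntityB overtakes EntityA after a single step. This is the high sensitivity to changing news the poster describes; a larger `alpha` makes the reaction faster still.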
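The second ranking under RANKING OF ENTITIES is PageRank with Fame as the personalization vector. The sketch below is a plain power iteration under stated assumptions: relation edges are treated as undirected and unweighted, the damping factor is the common default of 0.85, and the probability mass of nodes without edges is redistributed according to the personalization vector. The function name and parameters are hypothetical, not the project's code.

```python
def personalized_pagerank(edges, personalization, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank whose teleport distribution is `personalization`.

    edges           -- iterable of (entity, entity) relation pairs (treated as undirected)
    personalization -- dict entity -> non-negative weight, e.g. current Fame values
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    nodes = set(personalization) | set(neighbours)
    if not nodes:
        return {}
    # Normalise the personalization vector; fall back to uniform if it is all zero.
    total = sum(max(personalization.get(n, 0.0), 0.0) for n in nodes)
    if total > 0:
        p = {n: max(personalization.get(n, 0.0), 0.0) / total for n in nodes}
    else:
        p = {n: 1.0 / len(nodes) for n in nodes}
    rank = dict(p)
    for _ in range(max_iter):
        new = {n: (1 - damping) * p[n] for n in nodes}
        # Mass sitting on nodes with no relations is spread like the teleport vector.
        dangling = sum(rank[n] for n in nodes if n not in neighbours)
        for n, nbrs in neighbours.items():
            share = damping * rank[n] / len(nbrs)
            for m in nbrs:
                new[m] += share
        for n in nodes:
            new[n] += damping * dangling * p[n]
        converged = sum(abs(new[n] - rank[n]) for n in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

Because the link structure dominates once the walk mixes, the scores move much less than Fame when a single week's news changes, which matches the lower sensitivity the poster reports for PageRank.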
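The CORRELATION BETWEEN NEWSPAPERS and MORE ON CORRELATION sections compare rankings with Spearman's rank correlation coefficient. A minimal sketch, computing Spearman's rho as the Pearson correlation of average ranks (which handles tied scores) and assuming that only entities present in both rankings are compared. `lagged_correlation` is a hypothetical reading of the inter-week and longevity plots: it averages rho between week w and week w + lag for each lag, so a curve that flattens out corresponds to the asymptotic values the poster reports.

```python
from statistics import mean


def average_ranks(values):
    """1-based ascending ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(scores_a, scores_b):
    """Spearman's rho over the entities present in both score dicts."""
    common = sorted(set(scores_a) & set(scores_b))
    if len(common) < 2:
        raise ValueError("need at least two common entities")
    ra = average_ranks([scores_a[e] for e in common])
    rb = average_ranks([scores_b[e] for e in common])
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("each ranking needs at least two distinct scores")
    return cov / (var_a * var_b) ** 0.5


def lagged_correlation(weekly_scores, max_lag):
    """Mean rho between week w and week w + lag, for lag = 1..max_lag."""
    out = {}
    for lag in range(1, max_lag + 1):
        rhos = [spearman(weekly_scores[w], weekly_scores[w + lag])
                for w in range(len(weekly_scores) - lag)]
        if rhos:
            out[lag] = mean(rhos)
    return out
```

Comparing one newspaper with the overall data would then be `spearman(newspaper_scores, overall_scores)`, where the scores are either Fame or PageRank values. With no ties, the result equals the textbook formula 1 - 6 * sum(d^2) / (n(n^2 - 1)).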
TIMELINES SHOWING SOME IMPORTANT ENTITIES
[Timeline figures: Hindustan Times; The Hindu; The Times of India; DNA (Political Section)]

TIMELINES SHOWING BIAS WITH POLITICAL PARTIES
[Timeline figures: Hindustan Times; The Times of India; DNA (Political Section); The Hindu]

CONCLUSIONS
All newspapers give almost equal coverage to the most important parts of the news.
Newspapers generally show bias in covering the less important components of the news.
Newspapers are generally biased in their coverage of regional parties:
Janata Dal (United) is given preference by TOI and DNA, while it is ignored by The Hindu.
Both the Samajwadi Party and Akhilesh Yadav are very clearly avoided by Hindustan Times.
CPI is closely followed by The Hindu, while Shiv Sena is avoided by it.

REFERENCES
www.visualdataweb.org/relfinder.php
www.mpi-inf.mpg.de/yago-naga/yago
www.dbpedia.org
www.opencalais.com
www.wikipedia.org
www.myneta.info
www.netapedia.in
www.semanticproxy.com
Kushal Dave, Rushi Bhatt, Vasudeva Varma, "Identifying Influencers in Social Networks".