Project Presentation

ANALYSIS OF POLITICS AND INDUSTRY NEXUS: INDIA
Project Supervisor: Prof. Aaditeshwar Seth
Himanshu Sharma (2010CS50284)
Mayank Srivastava (2010CS10224)
OBJECTIVES
- Extract information about the politics-industry nexus and the intra-political nexus from newspapers and from structured sources available on the web.
- Represent it as a graph, with nodes representing entities and edges representing the relations between entities.
- Analyze the resulting graph, rank the entities, and find the correlation between the news in different newspapers.
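A minimal sketch of the graph representation, assuming relations arrive as (subject, relation, object) triples. The entity names, relation labels and the undirected layout are hypothetical illustrations, not the project's actual schema; the edge weight here is simply the number of relations recorded between two entities.

```python
from collections import defaultdict

# Hypothetical relation triples (subject, relation, object). In the project these
# would come from the structured sources and the NLP-processed news.
TRIPLES = [
    ("Politician A", "director_of", "Company X"),
    ("Politician A", "member_of", "Party P"),
    ("Politician B", "member_of", "Party P"),
    ("Politician A", "director_of", "Company X"),  # the same relation seen again
]

def build_graph(triples):
    """Build an undirected graph: node -> {neighbour: [relation, ...]}."""
    graph = defaultdict(lambda: defaultdict(list))
    for subj, rel, obj in triples:
        graph[subj][obj].append(rel)
        graph[obj][subj].append(rel)
    return graph

def edge_weight(graph, a, b):
    """Number of relations recorded between two entities (0 if unconnected)."""
    # .get() does not trigger the defaultdict factory, so lookups never add nodes.
    return len(graph.get(a, {}).get(b, []))

g = build_graph(TRIPLES)
print(sorted(g["Party P"]))
print(edge_weight(g, "Company X", "Politician A"))
```

Keeping the list of relation labels on each edge, rather than only a count, lets later steps filter edges by relation type before ranking.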

IMPLEMENTATION
- Collected structured information from netapedia.in, myneta.info, PPPIndia.com and capitaline.info.
- Continuously collected RSS feeds from different newspapers.
- Processed the news through an NLP tool, OpenCalais.
- Stored the information in database tables, filtered it and ranked the entities.
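The collection and storage steps above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the feed is plain RSS 2.0, the two-table layout is hypothetical, and `extract_entities` is a naive substring matcher standing in for OpenCalais (it does not call the OpenCalais service).

```python
import sqlite3
import xml.etree.ElementTree as ET

# A tiny RSS 2.0 document standing in for one poll of a newspaper feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Party P backs Company X</title><link>http://example.com/1</link></item>
<item><title>Politician A visits Company X</title><link>http://example.com/2</link></item>
</channel></rss>"""

# Hypothetical stand-in for the OpenCalais step: substring matching against a
# known entity list (the real tool performs proper entity extraction).
KNOWN_ENTITIES = ["Party P", "Company X", "Politician A"]

def extract_entities(text):
    return [e for e in KNOWN_ENTITIES if e in text]

def parse_rss(xml_text):
    """Return (link, title) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("link"), item.findtext("title")) for item in root.iter("item")]

def store(conn, paper, items):
    conn.execute("CREATE TABLE IF NOT EXISTS article (link TEXT PRIMARY KEY, paper TEXT, title TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS mention (link TEXT, entity TEXT, UNIQUE(link, entity))")
    for link, title in items:
        # INSERT OR IGNORE skips articles already stored by an earlier poll of the feed.
        conn.execute("INSERT OR IGNORE INTO article VALUES (?, ?, ?)", (link, paper, title))
        for entity in extract_entities(title):
            conn.execute("INSERT OR IGNORE INTO mention VALUES (?, ?)", (link, entity))
    conn.commit()

def mention_counts(conn):
    """Entities ordered by how many stored articles mention them."""
    return conn.execute(
        "SELECT entity, COUNT(*) FROM mention GROUP BY entity ORDER BY COUNT(*) DESC, entity"
    ).fetchall()

conn = sqlite3.connect(":memory:")
items = parse_rss(SAMPLE_RSS)
store(conn, "Sample Paper", items)
store(conn, "Sample Paper", items)  # re-polling the same feed adds no duplicates
print(mention_counts(conn))
```

Deduplicating on the article link matters for continuous RSS collection, because each poll of a feed returns many items already seen.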

SYSTEM IN DETAIL
RANKING OF ENTITIES


Ranking entities using an exponential moving average (called Fame from here onwards), which is updated each time an entity occurs in the news: it is highly sensitive to changing news, so entities that are important in the current news rise while less important ones fall.
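A minimal sketch of a Fame-style exponential moving average. It assumes news is processed in batches (for example one day of articles), that the input for an entity is 1 if it is mentioned in the batch and 0 otherwise, and a smoothing factor of 0.3; the batch size, the 0/1 input and the factor are all assumptions, not values taken from the project.

```python
def update_fame(fame, mentioned, alpha=0.3):
    """One EMA step per batch: fame <- alpha * x + (1 - alpha) * fame,
    where x = 1 if the entity is mentioned in this batch, else 0.
    alpha = 0.3 is an assumed smoothing factor."""
    entities = set(fame) | set(mentioned)
    return {
        e: alpha * (1.0 if e in mentioned else 0.0) + (1 - alpha) * fame.get(e, 0.0)
        for e in entities
    }

def rank(scores):
    """Entities ordered by descending score (ties broken by name)."""
    return sorted(scores, key=lambda e: (-scores[e], e))

fame = {}
for batch in [{"A", "B"}, {"A"}, {"C"}, {"C"}]:
    fame = update_fame(fame, batch)
print(rank(fame))
```

A larger `alpha` makes the ranking react faster to new stories, which is the high sensitivity described above; a smaller one makes it smoother.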
Ranking using the PageRank algorithm with the exponential moving average (Fame) as the personalization vector: it has low sensitivity to changing news and shows the overall influence of an entity in the network.
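A minimal power-iteration sketch of PageRank personalized by Fame. It assumes the graph is given as out-link lists, that Fame scores are normalized to sum to 1 to form the personalization vector, and that rank mass on nodes without out-links is redistributed according to that same vector; the damping factor 0.85 is the conventional default, not a value reported by the project.

```python
def personalized_pagerank(graph, fame, damping=0.85, tol=1e-10, max_iter=200):
    """graph: node -> list of out-neighbours; fame: node -> Fame score."""
    nodes = sorted(set(graph) | {v for vs in graph.values() for v in vs} | set(fame))
    total = sum(fame.get(n, 0.0) for n in nodes)
    # Personalization vector: Fame normalised to sum to 1 (uniform if all zero).
    p = {n: (fame.get(n, 0.0) / total if total > 0 else 1.0 / len(nodes)) for n in nodes}
    rank = dict(p)
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is redistributed via p.
        dangling = sum(rank[n] for n in nodes if not graph.get(n))
        new = {n: (1 - damping) * p[n] + damping * dangling * p[n] for n in nodes}
        for u in nodes:
            out = graph.get(u, [])
            for v in out:
                new[v] += damping * rank[u] / len(out)
        if sum(abs(new[n] - rank[n]) for n in nodes) < tol:
            return new
        rank = new
    return rank

g = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
print(personalized_pagerank(g, {"A": 0.1, "B": 0.1, "C": 0.8}))
```

Because the link structure changes slowly and Fame only biases the random jumps, scores move much less than raw Fame when the news changes. Libraries such as networkx offer the same idea through the `personalization` argument of `networkx.pagerank`.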
CORRELATION BETWEEN NEWSPAPERS
- Used Spearman’s rank correlation coefficient.
- Correlation is high when entities are ranked by PageRank value.
- Correlation coefficients as of 1st March (with respect to the overall data):
  - DNA (Business Section): 0.99118
  - Hindustan Times: 0.99147
  - DNA (Political Section): 0.99290
  - The Times of India: 0.99305
  - The Hindu: 0.99336
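A minimal sketch of Spearman's rank correlation between one newspaper's entity scores and the overall scores. It assumes the coefficient is computed over the entities scored in both rankings (the slides do not say how missing entities are handled) and handles ties by giving tied values their average rank, then taking the Pearson correlation of the ranks.

```python
from math import sqrt

def average_ranks(values):
    """Rank values (1 = smallest), giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(scores_a, scores_b):
    """Spearman's rho over entities scored in both dicts (Pearson on ranks).
    Assumes at least two common entities and non-constant scores."""
    common = sorted(set(scores_a) & set(scores_b))
    ra = average_ranks([scores_a[e] for e in common])
    rb = average_ranks([scores_b[e] for e in common])
    n = len(common)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / sqrt(var_a * var_b)

paper = {"A": 1, "B": 3, "C": 2, "D": 4}    # hypothetical newspaper scores
overall = {"A": 1, "B": 2, "C": 3, "D": 4}  # hypothetical overall scores
print(spearman(paper, overall))
```

`scipy.stats.spearmanr` computes the same coefficient if SciPy is available.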
CORRELATION BETWEEN NEWSPAPERS
- Correlation is low when entities are ranked by Fame value.
- Correlation coefficients as of 1st March (with respect to the overall data):
  - DNA (Business Section): 0.33939
  - Hindustan Times: 0.41778
  - DNA (Political Section): 0.52837
  - The Times of India: 0.54673
  - The Hindu: 0.57951
- The low correlation suggests that the newspapers are biased in what they emphasize.
MORE ON CORRELATION
- Plotted week-to-week correlation between newspapers.
- Correlation is higher between DNA (Business Section) and DNA (Political Section).
- The Hindu shows slightly lower correlation with Hindustan Times and The Times of India, indicating that it carries some news different from the two Times papers.
- Plotted inter-week correlation coefficients for each newspaper: values mostly vary between 0.2 and 0.4.
- Increased the time gap to measure the longevity of news: correlation values approach an asymptotic value of around 0.15 for the political newspapers.
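A minimal sketch of the longevity measurement, under the assumption (not stated on the slides) that it means averaging the rank correlation between a newspaper's week-w scores and its week-(w + lag) scores over all w, for increasing lags. To stay short, this version uses the closed-form Spearman formula, which is exact only when there are no tied scores; the weekly data is hypothetical.

```python
def ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman_no_ties(a, b):
    """rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)) over entities scored in both weeks."""
    common = sorted(set(a) & set(b))
    n = len(common)
    ra = ranks([a[e] for e in common])
    rb = ranks([b[e] for e in common])
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

def lagged_correlation(weekly_scores, lag):
    """Mean correlation between week w and week w + lag (requires lag < number of weeks)."""
    pairs = [(weekly_scores[w], weekly_scores[w + lag])
             for w in range(len(weekly_scores) - lag)]
    return sum(spearman_no_ties(a, b) for a, b in pairs) / len(pairs)

weeks = [  # hypothetical weekly Fame snapshots for one newspaper
    {"A": 4, "B": 3, "C": 2, "D": 1},
    {"A": 3, "B": 4, "C": 1, "D": 2},
    {"A": 1, "B": 2, "C": 4, "D": 3},
]
for lag in (1, 2):
    print(lag, lagged_correlation(weeks, lag))
```

Under this reading, a newspaper that keeps repeating the same stories shows a slow decay of the lagged correlation, while one that switches stories quickly drops toward its asymptotic value within a few weeks.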

MORE ON CORRELATION
- For DNA (Business Section), the correlation drops to about 0.05.
- DNA (Business Section) has the lowest maximum longevity: it frequently switches news.
- Longevity is generally lower for The Hindu and Hindustan Times than for DNA (Political Section) and The Times of India.
- DNA (Political Section) and TOI cling to the same news and repeat it over a prolonged period, while HT and The Hindu prefer to switch news.

BIAS BY NEWSPAPERS: EXAMPLES


- In August 2012, TOI gives a lot of emphasis to Nitish Kumar, while The Hindu chooses to neglect him.
- During mid-March 2013, The Hindu, Hindustan Times and DNA (Political Section) give a lot of emphasis to Manmohan Singh, but The Times of India gives him less importance. Instead, it carries a number of news items about Xi Jinping, whom the rest ignore.
TIMELINES SHOWING SOME IMPORTANT ENTITIES
[Figures: timelines of important entities for Hindustan Times, The Hindu, The Times of India and DNA (Political Section)]
TIMELINES SHOWING BIAS TOWARDS POLITICAL PARTIES
[Figures: timelines of political parties for Hindustan Times, The Times of India, DNA (Political Section) and The Hindu]
CONCLUSIONS
- The most important parts of the news are covered almost equally by all newspapers.
- Newspapers generally show bias in covering the less important components of the news.
- Newspapers are generally biased in their coverage of regional parties:
  - Janata Dal (United) is given preference by TOI and DNA, while it is ignored by The Hindu.
  - Both the Samajwadi Party and Akhilesh Yadav are clearly avoided by Hindustan Times.
  - CPI is closely followed by The Hindu, while Shiv Sena is avoided by it.

REFERENCES
- www.visualdataweb.org/relfinder.php
- www.mpi-inf.mpg.de/yago-naga/yago
- www.dbpedia.org
- www.opencalais.com
- www.wikipedia.org
- www.myneta.info
- www.netapedia.in
- www.semanticproxy.com
- “Identifying Influencers in Social Networks” by Kushal Dave, Rushi Bhatt, Vasudeva Varma.

Thank You