Uploaded by Ramyaa

MOD 9 - Set wise operations - Mithesh And Yogeeswar

CS6029 SOCIAL NETWORK ANALYSIS
SET WISE OPERATIONS
WHY SET WISE OPERATIONS?
• Query limitations
• Operator types: standalone and conjunction-required
• Boolean operators and grouping
• Order of operations
• Punctuation, diacritics, and case sensitivity
• Specificity and efficiency
BUILDING QUERIES
FOR SEARCH TWEETS
QUERY
• A database query is a request for data from a database.
• The request is written in a query language and is answered from a
database table or a combination of tables.
QUERY LIMITATIONS IN TWITTER
• Your queries will be limited depending on which access level you are
using.
• If you have Essential or Elevated access, your query can be 512
characters long.
• If you have Academic Research access, your query can be 1024
characters long.
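The limits above can be checked before a query is sent. The following is a minimal sketch, not part of any Twitter library: the names QUERY_LENGTH_LIMITS and fits_limit are hypothetical, and the numbers are the ones stated on this slide.

```python
# Query length limits per access level, as stated on the slide above.
# The dictionary and function names are hypothetical, for illustration only.
QUERY_LENGTH_LIMITS = {
    "essential": 512,
    "elevated": 512,
    "academic research": 1024,
}


def fits_limit(query: str, access_level: str) -> bool:
    """Return True if the query is within the limit for the access level."""
    limit = QUERY_LENGTH_LIMITS[access_level.lower()]
    # The count covers the whole query string, including spaces and operators.
    return len(query) <= limit
```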
OPERATOR TYPES
• Standalone operators can be used alone or together with any other
operators.
• Ex: #samantha
• Conjunction-required operators can only be used when at least one
standalone operator is included in the query.
• Ex: "twitter data" has:mentions (has:media OR has:links)
BOOLEAN OPERATORS
AND GROUPING
AND LOGIC
• Successive operators with a space between them will result in boolean "AND"
logic, meaning that Tweets will match only if both conditions are met.
• Ex: rainyday #ilayaraja
OR LOGIC
• Successive operators with OR between them will result in OR logic,
meaning that Tweets will match if either condition is met.
• Ex: small boi OR #epuuraaa OR #meme
NOT LOGIC, NEGATION
• Prepend a dash (-) to a keyword (or any operator) to negate it (NOT). For
example, cat #meme -grumpy will match Tweets containing the hashtag
#meme and the term cat, but only if they do not contain the term
grumpy.
• One common query clause is -is:retweet, which will not match on
Retweets, thus matching only on original Tweets, Quote Tweets, and
replies.
• All operators can be negated, but negated operators cannot be used
alone.
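To make the AND and NOT rules concrete, here is a rough local sketch of how such a query could be matched against Tweet text. It is an illustration only, not how Twitter evaluates queries: the function name matches is hypothetical, it handles only plain keywords and hashtags joined by spaces, and it splits the Tweet on whitespace.

```python
def matches(query: str, tweet_text: str) -> bool:
    """Match a space-separated (AND) query with optional -negated terms.

    Simplifying assumptions: terms are plain keywords or hashtags, and the
    Tweet is tokenized by whitespace only (no punctuation handling).
    """
    words = set(tweet_text.lower().split())
    required, excluded = [], []
    for term in query.lower().split():
        if term.startswith("-"):
            excluded.append(term[1:])  # negated term: must be absent
        else:
            required.append(term)      # plain term: must be present
    if not required:
        # Mirrors the rule that negated operators cannot be used alone.
        raise ValueError("negated operators cannot be used alone")
    return (all(term in words for term in required)
            and not any(term in words for term in excluded))
```

With this sketch, matches("cat #meme -grumpy", "my cat is a #meme") is True, while the same query against "grumpy cat #meme" is False.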
GROUPING
• Use parentheses to group operators together. For example, (grumpy
cat) OR (#meme has:images) will return either Tweets containing the
terms grumpy and cat, or Tweets with images containing the
hashtag #meme.
• Note that ANDs are applied first, then ORs are applied.
ORDER OF OPERATIONS
• When combining AND and OR functionality, the following order of operations
will dictate how your query is evaluated.
• Operators connected by AND logic are combined first
• Then, operators connected with OR logic are applied
For example:
• poori OR pongal chutney would be evaluated as poori OR (pongal chutney)
• poori pongal OR chutney would be evaluated as (poori pongal) OR chutney
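The precedence rule can be sketched as a small helper that adds the implied parentheses. The name explicit_grouping is hypothetical, and the sketch assumes a flat query with no existing parentheses or quoted phrases.

```python
def explicit_grouping(query: str) -> str:
    """Rewrite a flat query with the implied AND-before-OR parentheses.

    Assumes the query contains no parentheses or quoted phrases already.
    """
    # Splitting on OR first leaves runs of operators joined by implicit AND.
    and_groups = [group.split() for group in query.split(" OR ")]
    parts = []
    for terms in and_groups:
        joined = " ".join(terms)
        # Only multi-term AND groups need parentheses.
        parts.append(f"({joined})" if len(terms) > 1 else joined)
    return " OR ".join(parts)


print(explicit_grouping("poori OR pongal chutney"))   # poori OR (pongal chutney)
print(explicit_grouping("poori pongal OR chutney"))   # (poori pongal) OR chutney
```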
PUNCTUATION, DIACRITICS, AND
CASE SENSITIVITY
• If you specify a keyword or hashtag query with character accents or diacritics, it will
match Tweet text that contains both the term with the accents and diacritics, as well
as those terms with normal characters.
• For example, queries with a keyword Diacrítica or hashtag #cumpleaños will match
Diacrítica or #cumpleaños, as well as with Diacritica or #cumpleanos without the
tilde í or eñe.
• Characters with accents or diacritics are treated the same as normal characters and
are not treated as word boundaries. For example, a query with the keyword
cumpleaños would only match activities containing the word cumpleaños and
would not match activities containing cumplea, cumplean, or os.
• All operators are evaluated in a case-insensitive manner. For example, the query
osma will match Tweets with all of the following: osma, OSMA, Osma.
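The matching behaviour described above can be approximated locally with Unicode normalization. This is a sketch under assumptions, not Twitter's implementation: fold and keyword_matches are hypothetical names, and folding both the keyword and the Tweet text (so that matching is symmetric) is a simplification of this example.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Strip accents/diacritics and ignore case."""
    # NFD splits characters such as "ñ" into "n" plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def keyword_matches(keyword: str, tweet_text: str) -> bool:
    """Match a keyword against whole words only, after folding both sides."""
    tokens = re.findall(r"\w+", fold(tweet_text))
    return fold(keyword) in tokens
```

For example, keyword_matches("cumpleaños", "Feliz cumpleanos!") is True, while keyword_matches("cumpleaños", "cumplea") is False because the accented character is not treated as a word boundary.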
SPECIFICITY
• Using broad, standalone operators for your query such as a single keyword or
#hashtag is generally not recommended.
• For example, if your query was just the keyword happy you will likely get
anywhere from 200,000 - 300,000 Tweets per day.
• Adding more conditional operators narrows your search results, for
example (happy OR happiness) place_country:GB -birthday -is:retweet
EFFICIENCY
• Writing efficient queries also helps you stay within the query length
restriction. The character count includes the entire query string,
including spaces and operators.
• For example, the following query is 67 characters long: (happy OR
happiness) place_country:Kailasa -nithyananda -is:retweet
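The count can be verified directly, since every character of the query string is included. A short sketch, using the 512-character limit quoted earlier for Essential or Elevated access (the variable names are arbitrary):

```python
query = "(happy OR happiness) place_country:Kailasa -nithyananda -is:retweet"

# Spaces, parentheses, colons, and dashes all count towards the length.
length = len(query)
remaining = 512 - length  # budget left under the Essential/Elevated limit

print(length)     # 67
print(remaining)  # 445
```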
ANALYZING A USER'S FRIENDS AND FOLLOWERS
Problem
You’d like to conduct a basic analysis that compares a user’s friends and
followers.
Solution
Use set wise operations such as intersection and difference to analyze the
user’s friends and followers.
DISCUSSION
Given two sets, the intersection of the sets returns the items that they have in common,
whereas the difference between the sets “subtracts” the items in one set from the other,
leaving behind the difference. Recall that intersection is a commutative operation, while
difference is not commutative.
In the context of analyzing friends and followers, the intersection of two sets can be
interpreted as “mutual friends” or people you are following who are also following you
back, while the difference of two sets can be interpreted as followers who you aren’t
following back or people you are following who aren’t following you back, depending on
the order of the operands.
Given a complete list of friend and follower IDs, computing these setwise operations is a
natural starting point and can be the springboard for subsequent analysis. For
example, it probably isn’t necessary to use the GET users/lookup API to fetch profiles for
millions of followers for a user as an immediate point of analysis.
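The setwise operations described above can be sketched with Python's built-in set type. The function name and the sample IDs are hypothetical; in practice the ID lists would come from the friends and followers endpoints.

```python
def compare_friends_and_followers(friend_ids, follower_ids):
    """Compare two collections of user IDs with set operations."""
    friends, followers = set(friend_ids), set(follower_ids)
    return {
        # Intersection is commutative: the order of operands does not matter.
        "mutual_friends": friends & followers,
        # Difference is not commutative: the order decides the meaning.
        "friends_not_following_back": friends - followers,
        "followers_not_followed_back": followers - friends,
    }


# Made-up IDs, for illustration only.
result = compare_friends_and_followers([1, 2, 3, 4], [3, 4, 5])
print(result["mutual_friends"])               # {3, 4}
print(result["friends_not_following_back"])   # {1, 2}
print(result["followers_not_followed_back"])  # {5}
```

Working on bare IDs keeps this step cheap, so profile lookups can be deferred to the subsets that turn out to be interesting.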
THANK
YOU
MITHESH A (2019503533)
YOGEESWAR S (2019503573)