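4 V’s of social media data
● Volume - the sheer amount of data generated
● Veracity - degree to which the data can be trusted
● Variety - the range of data types and formats (text, images, video, links, etc.)
● Velocity - the speed at which new data is generated and must be processed

Six degrees of separation - any two random people can be connected through a chain of about six acquaintances

Strength of weak ties - people you are only loosely connected to (acquaintances) matter because they bridge to groups and information your close ties don't reach (e.g. job leads)

Node centrality
● Degree centrality - assigns an importance score based on the number of links each node holds, i.e. how many direct connections a node has
● Closeness centrality - reciprocal of the average shortest distance from a vertex to every other vertex; nodes with high closeness can reach the rest of the network quickly, so it can be used to find influencers
● Betweenness centrality - for each node, the fraction of shortest paths between all other pairs of nodes that pass through it; shows which nodes act as bridges in the network

Homophily - people tend to connect with others similar to themselves

Exponential vs power law degree distributions
● Degrees in a random network follow a Poisson distribution (the tail decays exponentially) - most nodes have close to the average number of links
● In a network whose degrees follow a power law, most nodes have a few links and a few nodes (hubs) have a very large number of links

Local clustering coefficient - c_i = no. of triangles connected to node i / no. of triples centered on node i
Global clustering coefficient - no. of closed triplets / no. of all connected triplets = 3 × no. of triangles / no. of all connected triplets

Phishing
● Spear phishing - targets a specific individual or organization
● Whaling - targets high-profile individuals such as senior executives
● Vishing - phishing carried out over voice calls

Methods of data collection
● Simple search
● API calls
● Web scraping & crawling

Streaming API - collects live data; you override the callback methods in MyStreamListener (your subclass of the library's stream listener), and the stream runs until it is stopped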
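The three node centrality measures can be made concrete on a tiny graph. This is a minimal pure-Python sketch, assuming an undirected, connected graph stored as an adjacency dict; the five-node `GRAPH` and the function names are hypothetical, chosen only for illustration. Degree centrality is normalized by n - 1, closeness is the reciprocal of the average shortest distance, and betweenness is left unnormalized (summed over unordered pairs).

```python
from collections import deque
from itertools import combinations

# Hypothetical toy friendship graph: A, B, C form a triangle, D bridges to E.
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

def bfs(graph, source):
    """Return (dist, sigma): shortest distance from source and number of shortest paths to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u is a predecessor of v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma

def degree_centrality(graph):
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    # Reciprocal of the average shortest distance to every other node (assumes a connected graph).
    n = len(graph)
    return {v: (n - 1) / sum(bfs(graph, v)[0].values()) for v in graph}

def betweenness_centrality(graph):
    # For each pair (s, t), add the fraction of s-t shortest paths that pass through v.
    info = {v: bfs(graph, v) for v in graph}
    result = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        for v in graph:
            if v in (s, t):
                continue
            dist_v, sigma_v = info[v]
            if dist_s[v] + dist_v[t] == dist_s[t]:  # v lies on some shortest s-t path
                result[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return result

if __name__ == "__main__":
    print(degree_centrality(GRAPH))
    print(closeness_centrality(GRAPH))
    print(betweenness_centrality(GRAPH))
```

In this toy graph C has the highest score on all three measures, while D scores well on betweenness despite having only two links, because every path to E must pass through it. The brute-force pair loop is fine for a handful of nodes; larger networks usually call for Brandes' algorithm or a library such as NetworkX.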
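The local and global clustering coefficient formulas can be checked directly. This is a small sketch, assuming an undirected graph as an adjacency dict; the toy `GRAPH` is hypothetical. A node of degree k is the center of C(k, 2) triples, and each triangle is counted once from each of its three corners, which is where the factor of 3 in the global formula comes from.

```python
from itertools import combinations
from math import comb

# Hypothetical toy graph: A, B, C form the only triangle.
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

def triangles_at(graph, i):
    # Every edge between two neighbours of i closes one triangle through i.
    return sum(1 for u, v in combinations(sorted(graph[i]), 2) if v in graph[u])

def local_clustering(graph, i):
    triples = comb(len(graph[i]), 2)  # triples centered on node i
    return triangles_at(graph, i) / triples if triples else 0.0  # degree < 2 -> 0 by convention

def global_clustering(graph):
    # Summing triangles over all nodes counts each triangle 3 times (once per corner).
    three_times_triangles = sum(triangles_at(graph, i) for i in graph)
    triplets = sum(comb(len(nbrs), 2) for nbrs in graph.values())
    return three_times_triangles / triplets if triplets else 0.0

if __name__ == "__main__":
    for node in GRAPH:
        print(node, local_clustering(GRAPH, node))
    print("global:", global_clustering(GRAPH))
```

Here C has three neighbours but only one linked pair among them (A-B), so c_C = 1/3. The graph has 1 triangle and 6 connected triplets, giving a global coefficient of 3 × 1 / 6 = 0.5.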
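The contrast between Poisson and power-law degree distributions shows up clearly in synthetic graphs. The sketch below compares a G(n, p) random graph with a preferential-attachment graph (Barabási-Albert style), one standard way of producing power-law-like degrees. The sizes, the seed, and the function names are assumptions for illustration. Both graphs have an average degree of about 4, but only the preferential-attachment graph grows hubs.

```python
import random
from collections import Counter

def erdos_renyi_degrees(n, p, rng):
    # G(n, p): each pair is linked independently with probability p,
    # so degrees are Binomial(n - 1, p), close to Poisson for large n.
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

def preferential_attachment_degrees(n, m, rng):
    # Start from a complete graph on m + 1 nodes; each new node links to m distinct
    # existing nodes picked with probability proportional to their current degree.
    degree = [0] * n
    pool = []  # node k appears degree[k] times, so rng.choice(pool) is degree-proportional
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            degree[i] += 1
            degree[j] += 1
            pool += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            pool += [new, t]
    return degree

N = 1000
rng = random.Random(42)
er = erdos_renyi_degrees(N, 4 / (N - 1), rng)
ba = preferential_attachment_degrees(N, 2, rng)

if __name__ == "__main__":
    print("random graph: max degree", max(er), "most common", Counter(er).most_common(3))
    print("pref. attachment: max degree", max(ba), "most common", Counter(ba).most_common(3))
```

In the random graph the degrees cluster around the mean and the largest degree stays close to it. In the preferential-attachment graph most nodes keep the minimum of 2 links, while the earliest nodes collect far more.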
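The Streaming API note describes a callback pattern: you subclass a listener, override its methods, and the stream keeps pushing live items until something stops it. The sketch below imitates that pattern without any real library. `BaseListener`, `on_message`, `on_error`, and `run_stream` are hypothetical stand-ins, and an endless generator plays the role of the live feed. Actual streaming libraries use their own class and callback names, so check the documentation of the one you use.

```python
from itertools import count

class BaseListener:
    """Hypothetical stand-in for a streaming library's listener base class."""

    def on_message(self, message):
        return True  # default: keep streaming

    def on_error(self, error):
        return False  # default: stop on errors

def run_stream(source, listener):
    """Hypothetical driver: feed each live item to the listener until it returns False."""
    for item in source:
        if isinstance(item, Exception):
            keep_going = listener.on_error(item)
        else:
            keep_going = listener.on_message(item)
        if keep_going is False:
            break

class MyStreamListener(BaseListener):
    """User subclass: override the callbacks to decide what to do with each incoming post."""

    def __init__(self, limit):
        self.limit = limit
        self.collected = []

    def on_message(self, message):
        self.collected.append(message)
        return len(self.collected) < self.limit  # returning False stops the stream

# An endless generator stands in for live data: without a stop condition it would never end.
live_feed = (f"post {i}" for i in count(1))
listener = MyStreamListener(limit=3)
run_stream(live_feed, listener)

if __name__ == "__main__":
    print(listener.collected)
```

The stop condition lives in the overridden callback, not in the driver. This mirrors the note that a stream runs until it is stopped: the feed itself never ends, so the listener has to decide when collection is complete.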