E‐DISCOVERING FAKE IDENTITIES ON TWITTER

Master Thesis New Media
Academic year 2011‐2012
22 June 2012

Media Studies, MA New Media
Universiteit van Amsterdam, Faculty of Humanities

Annet Bos
Student ID: 5878039
Annet.bos@student.uva.nl

Thesis Supervisor HvA: Dr. Hans Henseler
Thesis Supervisor UvA: Prof. Dr. Richard Rogers
Second Reader: Dr. Geert Lovink

Abstract

Fake profiles and stolen identities are not unknown problems on Twitter. The consequences of impersonation are often harmless, but they can sometimes be severe, causing economic or personal harm. Due to legal procedure, charges can often be brought only after the damage has been done; moreover, it takes a long time for a victim to recover from an impersonation attack. It is therefore important to get fake profiles offline as soon as possible. Twitter uses a profile verification system, but this system does not always work efficiently and is currently only available to celebrities. It would therefore be helpful to have a way to (automatically) detect fake profiles. Research has been done and methods have been proposed in this field, but these methods are often developed either for platforms other than Twitter or for other forms of cybercrime, such as spam detection. Are there approaches suitable for detecting false or impersonated profiles on Twitter? This thesis aims to answer this question by analyzing the concept of identity (theft), the legal implications, the characteristics of fake profiles, and the structure of the network.

Table of Contents

Acknowledgements
Introduction
   Related work
   Approach
   Chapter organization
1. Identity
   1.1 What is defined as an online identity?
   1.2 Identity on social networking sites
   1.3 Juridical: General identity rules
2. Online identity theft
   2.1 Online identity theft (compared to offline identity theft)
   2.2 Identity theft on social network sites
   2.3 Automated identity theft (bots)
   2.4 Potential dangers and impacts
   2.5 Cases
   2.6 Juridical: Online identity theft and Internet Law
      2.6.1 American legislation
      2.6.2 Dutch legislation
3. Twitter
   3.1 Introducing Twitter
   3.2 Juridical: Twitter’s policy
      3.2.1 General terms and services
      3.2.2 Twitter’s identity policy
4. Network analysis
   4.1 Social graph
   4.2 Correlation
5. Characteristics of fake profiles
6. E‐discovery
   6.1 E‐discovery on social media platforms
   6.2 Juridical: ranges and restrictions of e‐discovery
7. Existing discovering methods
   7.1 Algorithms
   7.2 Linkability
   7.3 Language analysis
   7.4 Ranking social entities
   7.5 Social bots
8. Results
9. A tool proposed
   9.1 Case study
10. Conclusion
Bibliography
Appendix

ACKNOWLEDGEMENTS

This thesis would not have been possible without the help of:
Geert Lovink, who listened carefully to my interests at my first thesis meeting, informed me of e‐discovery (a term I had never heard of before that day), and introduced me to his colleague at the Amsterdam University of Applied Sciences (HvA), Hans Henseler 
My supervisor Hans Henseler, lector and associate professor of e‐discovery at the HvA, for initiating the idea of studying fake identities. I would like to thank him for his support throughout the process, for being critical, and for encouraging me to put more practical research and more of my own findings into this thesis
Willem Koops, who inspired me with his enthusiastic speech at the Symposium E‐Discovery 2012 on the tension between e‐discovery and digital legal evidence
Ewoud Sanders, for writing a very useful guide on doing electronic research and providing it free of charge at the E‐Discovery Symposium. The booklet gives many tips and tricks for smarter searching on the Internet and in documents, which made my research more productive
Richard Rogers, my supervisor from the University of Amsterdam, for encouraging and supporting me through my struggles in writing this thesis and giving me very useful and constructive feedback in the final stage of writing

Also, many thanks go out to Michael Dieter (my thesis coordinator, always very fast in answering my e‐mails with practical questions), Joe Mier (for proofreading my thesis on language and spelling), and Maarten Groen & Wouter Meys (both working with Twitter data and advising me on my research).

INTRODUCTION
In 2006, the 49‐year‐old American woman Lori Drew created a fake profile on MySpace to bully a 13‐year‐old girl named Megan Meier, the former friend and now ‘rival’ of Lori’s own daughter. Concerned that Megan was spreading rumors about her daughter, Lori pretended to be a 16‐year‐old boy with the alias ‘Josh Evans’ in order to befriend Megan and find out what the girl was saying online. Lori also used the fake profile to flirt, date, and finally break up with Megan. Pretending to be a boy, she continued to bully Megan until the girl committed suicide (Sterritt 2). This example is one of the most well‐known cases of cyber‐bullying, a type of cybercrime, involving the creation of a fake social networking profile.

Nowadays, there is increasing attention to the topic of cybercrime, currently one of the most omnipresent and challenging types of crime there is. The city of The Hague will soon host a European center for cybercrime, as announced on March 28, 2012 by Cecilia Malmström, the European Commissioner for Home Affairs. The institute will be part of Europol, the investigative bureau of the police organizations of the member states of the European Union. The goal of the center is to coordinate the fight against online crimes, including child pornography, credit card fraud, online identity theft, terrorism, and viruses. Malmström states that we cannot allow criminals to disrupt our digital lives and notes that the cybercrime market has become “more lucrative than the worldwide trade in marijuana, cocaine and heroin combined” (Nielsen).

Not only cybercrime in general but also imposter fraud is very much on the minds of politicians these days. At its October 2006 session, the Committee on Consumer Policy (CCP) agreed to provide input to the ‘Future of the Internet Economy Project’, presented by the Organization for Economic Co‐operation and Development (OECD) and approved by its thirty member countries, including the Netherlands. Cyber fraud was identified as a main theme, with a particular focus on identity theft. The Dutch House of Representatives also wants action against the use of someone else’s name on social media. Members of Parliament of the political parties SP, PVV and D66 want this to be prosecuted criminally. The three parties have each, separately from one another, submitted questions to Minister Ivo Opstelten of the Ministry of Security and Justice (Schoemaker). The immediate cause was an item on this issue in the Dutch television program De Wereld Draait Door on January 9, 2012. Guests were actor Jan Kooijman and lawyer Bénédicte Ficq, who both suffered from imposters tweeting under their names. Ficq advocated a new article in the Dutch Penal Code wherein the false adoption of someone else’s name on the Internet is made punishable. According to her, there is currently nothing that can be done against online identity fraud. Since 2005, all inhabitants of the Netherlands from the age of fourteen have been obliged to carry their identification card with them at all times; online, however, different rules apply. Nine Kooiman, Member of Parliament for the Socialist Party, asked whether the criminal law suffices, given that damage can be done without the ‘identity theft’ amounting to fraud, misrepresentation, defamation, or “any other article in the Penal Code”. If this cannot be addressed under current legislation, Kooiman wants a requirement for social media companies to identify the true person behind fake accounts (Schoemaker).
Related work

Many studies have been done on identity presentation and profiling on profile‐based websites and social network sites (Marwick & boyd 98), but the literature is lacking on the subject of fake online identities (impersonation). On the other hand, a lot of research has been done on identity fraud, but not much of it relates to social network sites, and to Twitter specifically. Moreover, the empirical scholarly literature in this area is limited to the closely related issue of online privacy (Milne, Rohm and Bahl 219). This is where the gap in the field lies.

Research that does approach the topic of identity theft on social networking sites is that of Bilge et al. They created fake identities in two ways. One way was cloning an already existing account and adding the same contacts; the other was creating a profile on a social network site where the person (present on other online social networks) did not yet have a profile, which is called ‘profile porting’ (Cutillo and Molva 96). The (automated) ‘attacks’ proved to be feasible in practice. Although this research approaches the topic of identity theft, it focuses on testing whether and how it is possible to create a fake profile. A lot has also been written about countering identity theft, both on the side of the user, through personal information management, data minimization, and care of the ‘virtual self’ (Whitson and Haggerty), and on the side of the social networks, for instance through improving the authentication and verification process (van Oorschot and Stubblebine).

Instead of looking at creating fake profiles, this research focuses specifically on detecting them. The objects of research are adversaries that use a fake identity for purposes that go beyond simple privacy protection. Accounts using a nickname, a pseudonym, or a fake name merely to obscure someone’s real identity (the so‐called ‘fakester’) (Marwick and boyd 105), valid accounts with impersonated screen‐names (Orita and Hada 17), and faux accounts or Phweeters (phony tweeters; pranksters with a clearly fake account) are therefore not treated. My research is about impersonation for the purpose of identity theft: using someone else’s name, and often a real picture of that person, with displayed user information.

Approach

The approach of this thesis is to study the fake profile phenomenon in the light of electronic discovery, since online identity theft is a form of cybercrime.
Electronic discovery, also called e‐discovery, is a computer forensics term referring to discovery in civil litigation dealing with the exchange of information in electronic format. Because e‐discovery and law are inextricably linked, much attention will also go to the latter, in the context of both the Netherlands and the United States, particularly California (where Twitter is based). Besides this theoretical framework, I will use theories on identity (theft) to clarify the phenomenon, and network culture and graph theories to go deeper into the structure of false and real profiles. My object of study is the micro‐blog service Twitter. One of the main questions I will try to answer in this thesis is whether it is possible to detect fake profiles on Twitter, and in what ways. I will summarize the existing tools and research in this field, analyze whether the (proposed) tools or methods are effective for detecting fake or impersonated profiles specifically, and, where possible, suggest improvements. I will also provide a scheme with the characteristics of fake profiles and a list of e‐discovery tools that can possibly help in testing profiles on these points. Finally, through a case study I will test whether these methods are actually successful by applying them to profiles that are known to be fake or real.

Chapter organization

Chapter one explains the concept of identity, focusing on online identity, in particular on social networking sites. The next chapter goes deeper into the phenomenon of impersonation, performed both by humans and by bots (automatically). In what forms and ways can identity theft on online social networks occur? What harm can be caused by identity fraud? This chapter is illustrated by examples and cases of both famous and non‐famous persons. In chapter three, Twitter, the object of this research, is studied. All of the previously mentioned chapters also include a juridical paragraph. What does the law say about identity (theft)? And what does Twitter mention about it in its policies? Chapter four is a network analysis, describing Twitter’s correlation and how its graph is structured. The characteristics of fake profiles are described in chapter five. Chapter six goes deeper into the concept of e‐discovery and its ranges and restrictions, followed in chapter seven by a summary of existing discovery methods, ordered by their type of approach. Their strong points and limitations are treated in chapter eight. Chapter nine involves a case study of a proposed detection method based on online e‐discovery tools. The thesis ends with a conclusion of my research.

1. IDENTITY

Sociologist Erving Goffman (1963) distinguished three categories of identity: individual identity, social identity, and legal identity (Finch 87). Individual identity is the sense of self that is based upon internalization of all that is known about oneself. It is not a static construction, but is constantly developing and rearranging in line with the life experience of the individual. Individual identity can be influenced by the way society receives an individual. Social identity, on the other hand, is contingent upon the way in which individuals present themselves. Goffman defines it as “the categorization of an individual to determine the acceptability of the membership of social groups”. Even though both individual and social identity may be affected by identity theft, neither of these two can be stolen. Only legal identity has the potential to be adopted and abused by others (Finch 88).
This research explores the grey area between Goffman’s idea of the social identity, as presented by individuals on online social networks, and the legal identity, which is what is legitimately at stake when talking about identity theft and the key factor when referring to identity in e‐discovery terms.

In order to fully understand the concept of identity, a few subsets and stakeholders must be explained. The individual that is subject to his or her identity is called the identity owner. This person possesses the legal right to own and use his or her identity and can apply for identity certificates for various purposes related to financial services and social activities throughout his or her life. These include the birth certificate, a driving license, a passport, a bank account, and so on, as issued by an identity issuer (Wang, Yuan & Archer 33). An identity issuer is a reliable government or private institution that issues these identity certificates for a finite time period, with the purpose of authorizing specific financial or social rights to the individual. The certificates usually contain six information constituents: a certificate identifier (for instance a passport number), the certificate receiver (the name of the owner), the certificate purpose (citizenship, for example), the certificate issuer (the government, for instance), the validation time period, and the issuer’s signature or certification. In addition, a certificate also contains a verification mark of the user, such as a photograph or a fingerprint, and the identity of the certificate authorizer, for example a stamp or watermark (ibid 33). The service provider verifying the holder’s authenticity and eligibility is called the identity checker. The identity checker (the traffic police or a customs officer, for instance) has to verify both the owner’s certificate and its identity. Identity is verified by comparing the identity certificates with other identifying information, for example physical appearance. The responsibility of the identity checker is to protect identity information against intrusions and to alert identity owners if an intrusion is discovered (ibid 34). The last player is the identity protector, an individual or organization working to protect identity owners, as well as issuers and checkers, from identity theft. Most protectors have the legal right to detain and penalize offenders. Identity protectors can be government policymakers, law enforcement agencies (such as the FBI in the United States), public and private security‐service providers, technical security‐solution providers, and the legal system itself (ibid 34).
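The constituents and marks described above map naturally onto a record structure. As a rough illustration, a minimal sketch in Python follows; the code and field names are my own hypothetical rendering of the structure Wang, Yuan and Archer describe, not taken from their work:

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class IdentityCertificate:
        """Illustrative record of the six constituents plus the two marks."""
        identifier: str         # certificate identifier, e.g. a passport number
        receiver: str           # certificate receiver: the name of the owner
        purpose: str            # certificate purpose, e.g. proof of citizenship
        issuer: str             # certificate issuer, e.g. a government agency
        valid_from: date        # start of the validation time period
        valid_until: date       # end of the validation time period
        issuer_signature: str   # the issuer's signature or certification
        owner_mark: str         # verification mark of the user: photo, fingerprint
        authorizer_mark: str    # mark of the certificate authorizer: stamp, watermark

        def is_valid_on(self, day: date) -> bool:
            # An identity checker's first test: is the certificate within
            # its validation time period?
            return self.valid_from <= day <= self.valid_until

The identity checker’s routine then reduces to comparing such a record against the other identifying information presented.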
As people go about their daily lives, they actively invoke or unintentionally draw upon a number of bureaucratic identity markers, thereby producing yet more information about their behaviors. From buying something with a debit card to making a phone call, an increasing series of activities leaves informational traces that add up to everyone’s identity profile, also referred to in the literature as the ‘data double’ (Haggerty & Ericson, 2000), ‘digital person’ (Solove, 2006), or ‘virtual self’ (Agger, 2003) (Whitson and Haggerty 574).

1.1 What is defined as an online identity?

“On the Internet individuals construct their identities, doing so in relation to ongoing dialogues not as acts of pure consciousness” (Poster)

An online identity is a social identity that an Internet user establishes in online communities and websites (Bowker 328). Online identities are emergent. Social identity is by definition a group project: something created by the context in which the identified operates (Crawford 211). Individuals are continuously watching and learning from the people around them. Everyone who makes up their ‘group’ has a hand in their identity, and individuals emerge over and over again, changed by the interactions they have with that group. Identity and reputation go hand in hand, as individuals acquire reputations that are tied to particular contexts and groups. What is new about virtual worlds is that they make this group‐shaping visible. As game researcher Richard Bartle puts it: “the celebration of identity is the fundamental, critical, absolutely core point of virtual worlds” (Crawford 213).

The Internet has opened up another world in which to investigate ideas regarding identity. Free from the physical constraints of the body, the virtual world provides an environment where anonymity can be easily acquired, and an online persona, similar to psychiatrist Carl G. Jung’s (1976) notion of a public ‘mask’, can be easily exploited (Huffaker 2). When logging onto the Internet, a user can, for instance, choose to use a different (screen‐)name. Research from 2012 shows that over 75 percent of social media users prefer to hide their real name online, while half of the users (50 percent) hold on to one unique pseudonym (Orita and Hada 17). “In a difference to physical identity, online identity, depending heavily on context, can be altered in a matter of seconds” (Edwards). Because there is no frequent face‐to‐face interaction and it is easy to say or be anyone online, identity is more fluid online than offline. In the online environment there is a different relationship between the self and the body (Saco 131). Because there are no direct physical means by which an individual can be judged, language becomes very important (Edwards). Online style is one marker of identity in the online environment (Wong 148). However, the physical body is not totally absent in the online environment; it has a big influence on how we judge and create who we are online.

Online identity has three main structural elements: unlinkability (it cannot be determined whether two attributes can be linked to each other, for example whether two messages were sent by one person), undetectability, and unobservability (did a certain occurrence really happen?) (Pfitzmann and Hansen 6). Additionally, the pseudonym is defined as a partial identity of an entity whose transactions are linkable. Especially the (un)linkability element is highly relevant to identity management. The higher the degree of unlinkability, the stronger the anonymity. The existence of linkability between transactions implies that an observer can clearly distinguish an existing or non‐existing relationship between those transactions: if messages posted to a social networking site seem to be written by the same user, then these posts are linkable (Orita and Hada 18).
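To make the linkability notion concrete, the sketch below scores how plausibly two posts were written by the same author, using character n-gram frequencies as a crude stylometric proxy. This is my own minimal illustration, not a method from Pfitzmann and Hansen or Orita and Hada, and the similarity threshold is an arbitrary assumption; chapter 7.2 returns to linkability as a detection approach.

    from collections import Counter
    from math import sqrt

    def ngrams(text: str, n: int = 3) -> Counter:
        """Character n-gram frequency profile of a post."""
        text = text.lower()
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    def cosine(a: Counter, b: Counter) -> float:
        """Cosine similarity between two frequency profiles (0..1)."""
        dot = sum(a[g] * b[g] for g in set(a) & set(b))
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def linkable(post_a: str, post_b: str, threshold: float = 0.5) -> bool:
        """Naive test: stylistically similar posts are treated as linkable."""
        return cosine(ngrams(post_a), ngrams(post_b)) >= threshold

    # Two posts with similar wording score as linkable; the stronger such
    # links are, the weaker the author's anonymity.
    print(linkable("Off to the gym, then coffee with @anna!",
                   "Off to the gym again, coffee after with @anna"))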
1.2 Identity on social networking sites

In today’s modern society, online social media and social network(ing) sites are ubiquitous. Social network sites have been defined as “web‐based services that allow individuals to (1) construct a public or semi‐public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system. The nature and nomenclature of these connections may vary from site to site” (boyd and Ellison 2). Social network sites and communities are vividly changing how identity information is shared online (Stutzman 2). One of the main goals of social network sites is to encourage the disclosure of personal information to others online (Nosko, Wood and Molema 407). boyd argues that online social networks are a type of networked public with four properties that are not typically present in face‐to‐face public life and that contribute to the online identity: persistence, searchability, exact copyability, and invisible audiences. “Because the digital world requires people to write themselves into being, profiles provide an opportunity to craft the intended expression through language, imagery, and media” (boyd 16). Since the rise in popularity of social media and social networking sites, people have been communicating, expressing themselves, and contributing to the development of culture and community with more ease and frequency than ever before. The rise of social media brought a shift in the relationship between the virtual world online and the real world offline, which increases the risk and danger related to Internet usage (Burrell 709). Emily Finch, a criminal lawyer who studies identity theft in cyberspace, states (96) that some people see their online life as a kind of game with rules and norms that do not apply to their daily life, referring to what Suler (322) calls the ‘disinhibition effect’ of the Internet. “Once they turn off the computer and return to their daily routine, they believe they can leave behind that game and their game identity. They relinquish their responsibility for what happens in a make‐believe play world that has nothing to do with reality” (Suler 323).

In the usage of social networking sites, the concept of identity is separated into three layers. The top layer represents the user’s perception; here, information for user validation may be concealed for privacy reasons, and one is allowed to choose any name as one’s screen‐name in a transaction. The middle layer represents the treatment by the service provider, and the bottom layer represents the identification of each user by online identifier or credential information (Orita and Hada 18). Identities are properties of the entities themselves. Online, the association of an identity with a particular resource or activity is determined by the presence of one or more identifiers normally linked to that identity. Identifiers have a range of ‘trustability’, dependent on their intended usage. Often this ‘trustability’ is a composite value, based on relationships between identifiers for several different online identities (Marshall and Tompsett 130).
The layer classification shows that registered accounts and screen‐names sit in different layers of identity representation, and that even an anonymous user accrues reputation attached to its identity on a service (Orita and Hada 18).

Figure 1: Layered Identity Representation (Orita and Hada 18)

1.3 Juridical: general identity rules

“Identity is not a matter of ‘rights’ that we can think of in the abstract or in advance. For this reason, having some centralized one‐size‐fits‐all ‘law of identity’ (and associated rights) does not make any sense” (Crawford 211)

Online intermediaries, such as games and social networking sites, now have ‘ownership’ of people’s online identities. Such an identity is in fact a database entry, and the intermediary can claim that someone’s identity is its intellectual property. “In other words, the ‘gods’ of the virtual worlds are making all the rules (or laws) about identity” (Crawford 212). But because there is no norm of transparency regarding these laws, and it is hard for an individual to understand or predict how his or her identity will be treated by the intermediary, responsibility is problematic (Crawford 213). The rules of the social networking sites may be effectively unreviewable by any terrestrial court or legislature (Crawford 219).

“Because identities belong to identity owners, it is their responsibility to safeguard them” (Wang, Yuan & Archer 33). For this reason, identity owners should be aware of several risks, including identity fraud and the financial and legal damage it can cause. At the same time, they should take responsibility and control for using their identity legally and ethically, and not abuse their rights, for instance by letting relatives or acquaintances borrow their health cards, which could be harmful to those who issue identity certificates. It is the responsibility of the identity issuer to verify the receiver’s true identity before issuing a valid certificate, which a validating institution can then use to authenticate this identity, and to provide a protection mechanism if identity information appears to have been stolen (Wang, Yuan & Archer 33). The concept of general identity is somewhat different from that of legal identity. Goffman describes the latter in terms of “a set of characteristics that are unique to the individual thus providing a way in which one person can be differentiated from another” (Finch 88).

2. ONLINE IDENTITY THEFT

“Identity theft represents the dramatic moment when the tensions, ironies and contradictions inherent in the relationship between the dividual and its human doppelgänger are most starkly revealed” (Whitson and Haggerty 575)

Identity theft and imposter fraud are among the fastest‐growing types of crime in the world, partly because of the rise of information technology. Identity theft can be committed in different ways: phishing, skimming, hacking, or, the subject of this research, publishing under someone else’s name (impersonation). What is online identity theft? How much does it differ from ‘offline’ identity theft, such as the creation of false passports? Online identity theft is not a new phenomenon caused by social networking sites; it has existed since the emergence of the Internet. In the ‘pre‐Twitter’ era it mainly concerned domain names that were unjustly claimed (Security.nl). Identity theft has also been the topic of many Hollywood movies. Moreover, identity fraud itself is not a new form of crime, but it does make use of new techniques (Pontell 263).
2.1 Online identity theft (compared to offline identity theft)

Identity theft is the criminal use of someone else’s personal identity and other relevant information in unauthorized ways (Wang, Yuan & Archer 30). According to another definition, from United States federal law, identity theft is “the knowing transfer, possession, or use of another’s means of identification without authority”. One’s ‘means of identification’ is defined as “any name or number that may be used, alone or in conjunction with any other information, to identify a specific individual” (Sterritt 10). The U.S. Federal Trade Commission broadly categorizes identity theft as misuse of existing accounts (85 percent of victims), opening of new accounts (17 percent), and misuse of personal information (17 percent); a single victim may experience multiple instances (Mercuri 17).

Identity theft has become a significantly growing problem. A key reason for this is the fierce growth of Internet applications, which makes identity information more widely available and provides an easy target for criminals. Not only is online information used to create a false offline identity; fake online identities are also made by copying or stealing online information. Traditional approaches to identity stealing include ‘dumpster diving’ (gleaning personal information from discarded utility bills and credit cards), the stealing of wallets and purses, and ‘shoulder surfing’ (copying personal identification information by watching over someone’s shoulder). Online identity theft is performed by phishing (luring targeted individuals by email to a website that mimics a trusted institution in order to access and reveal the individual’s private information), ‘spoofing’ (attacking databases by sending a computer messages whose source masquerades as the IP address of a trusted computer), and breaking into identity information storages and databases (Wang, Yuan & Archer 31). Stolen identities can be used to obtain loans or certain social benefits, to open credit card, telecommunication or utility accounts, to commit crimes while someone else runs the risk of being detected and arrested, or even to gain employment and start a new life. Creating a false profile on a social network site using someone else’s identity often has far more innocent purposes, mostly closely related to spam, such as viral marketing: spreading links under a false name or social networking profile to a set of seed users, exploiting the fact that people are more receptive to a service or product recommended by their friends, inspired by traditional ‘word‐of‐mouth’ marketing (Shrivastava, Majumder, and Rastogi 487). Offline identity theft is often detected long after the damage has occurred; the U.S. Federal Trade Commission reports that the average time between an identity theft event and the date of discovery is 12 months. The best way to prevent identity fraud is through restrictive authentication (Wang, Yuan & Archer 35).

2.2 Identity theft on social network sites

Online social networking sites such as Twitter have gone far beyond the traditional networking service of connecting people. They have attracted adversaries who exploit the sites as an effective medium to reach, and possibly influence, a large population of web users in diverse ways, varying from eliciting ‘online revolutions’ such as the Arab Spring to committing online crime like impersonation (Boshmaf et al. 1).
The three main security objectives of social networking sites are privacy, integrity (the identity and data of the user must be protected against unauthorized modification and interference), and availability (of user profiles and data access) (Zilpelwar, Bedi and Wadhai 24). Some identity theft attacks violate several of these objectives at once, but each primarily exploits a vulnerability in one of them, to which it is then attributed. Social networking sites are susceptible to attacks on all three levels by either insiders (legitimate parties) or outsiders (intruders) (Cutillo and Molva 97). Many social networking sites have a weak user authentication mechanism, mostly built on information such as the displayed screen‐name, photo, and a set of common social links. This results in easy‐to‐exploit identity cloning attacks that establish a fake social link (Bhumiratana 681).

Creating a user account on a social networking site typically involves three tasks: providing an active email address, creating a user profile, and sometimes solving a CAPTCHA (a type of challenge‐response test that attempts to ensure the response is generated by a person) (von Ahn et al. 294). Each user account maps to one profile, but many user accounts can be owned by the same person or organization using different email addresses (Boshmaf et al. 3). This account creation process can be fully automated when performed by bots. A fake profile can only seem real when the user has a decent number of friends or, in the case of Twitter, both accounts to follow and followers. Bilge et al. show that most social networking site users are not careful when accepting connection requests that are sent to them. The authors ran an experiment to test how willing users are to accept connection requests from fake and impersonated profiles of people who were already in their friend or follower lists as confirmed contacts. The results show that the acceptance rate for impersonated profiles was always more than 60 percent.

There are two popular ways in which identity theft takes place on social media. The first is when an imposter collects an individual’s personal information, such as name, address, and phone number, from an already existing social networking profile and then uses it to obtain credit or gain employment in the victim’s name, as well as to gain access to already established accounts. This method is called ‘profile porting’. The second is when an imposter uses someone else’s personal information to create a fake profile on a social network site, which can consequently do harm to one’s status and conceivably one’s financial and criminal records. In addition, this latter form of identity theft through social networking sites can lead to other crimes facilitated through the Internet (Stutzman 3). Most people, especially ‘professionals’, have enough personal information in the public domain that a third party can effortlessly create an accurate yet misleading ‘online persona’ on a social network site (Sterritt 10). Beyond anxieties concerning the protection of identity, the disclosure of personal information, even if limited, can be enough, when combined with other Internet‐based tools such as reverse directory checks, to expose phone numbers, addresses, age, gender and other information that could leave an individual vulnerable (Nosko, Wood and Molema 407).
Imposter fraud is an especially growing issue for brands and celebrities, who have often discovered fake profiles of themselves on social network sites (Stutzman 4). Given the enormous growth of social networking websites, and the high visibility that social media sites attain on search engine result pages for brand name and celebrity queries, social network user names have become very valuable. Consequently, cyber squatters, spammers, impersonators, and celebrity fans have ‘jacked’ many high‐profile usernames (Malachowski 224). In this light, it is critical for trademark holders and celebrities to possess their social network usernames. These are valuable, since social media have altered the way in which celebrities and brands market themselves and how customers decide on making purchases (Malachowski 225). Online social media also provide brands with the ability to communicate directly with consumers, providing information, answering their questions, or solving their problems. In case of jacking, the brand’s message and its whole identity lie in the hands of someone else. The registration of a company’s trademark or the name of a famous person on Twitter with the intention to profit or cause confusion is called ‘Twittersquatting’. Erik Heels, a legal commentator, found that at the start of 2009 ninety‐three of the top 100 global brands had been ‘Twitterjacked’ (Heels). Impersonation is possible because social networking sites give out usernames on a ‘first‐come, first‐served’ basis: a new user is free to pick any username as long as it has not yet been taken (Malachowski 226).

A similar concept to online identity theft is profile cloning, or the identity clone attack (ICA), which tries to create a fake online identity of a victim on a social networking site in an effort to trick the victim’s friends into believing the authenticity of the fake identity, to establish social links, and, in turn, to capture the private data of the victim’s friends or followers that is not shared in their public profiles (Bhumiratana 681). In this form of attack, the adversary first tries to find ways to obtain personal information about the victim from his or her public profile on online social networks or his or her personal homepage(s). As a next step, the adversary forges the victim’s identity and creates a similar or even identical profile on social networking sites. Then the victim’s contacts are sent friend requests. Once these requests are accepted, the adversary builds up the victim’s friend network and gains access to the profiles of the victim’s followers and friends. Another implementation is attacking through automated, cross‐platform profile cloning (Jin, Takabi, and Joshi 27). But there are even more forms of impersonation. They include image retrieval and face recognition (a more sophisticated form of data collection, probably based on automated face‐recognition algorithms for further profiling, where the impersonation attacks exploit a basic limitation of the platform), sybil attacks (the attacker aims to subvert the reputation system of an online social network by creating a large number of pseudonymous entities, profiting from the network’s inability to ensure that a profile is associated with a single real person), defamation and ballot stuffing (aiming at damaging the reputation of a person using the system, or disrupting digital reputation systems), and friend‐in‐the‐middle attacks (hijacking cookies or HTTP sessions on the network layer to interact with the social network without proper authorization, accessing communications between the social network and the user) (Cutillo and Molva 97; Zilpelwar, Bedi and Wadhai 25).
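Because an identity clone attack copies exactly the signals that weak authentication relies on (displayed name, photo, and shared social links), a detector can turn those same signals around. The sketch below is my own minimal illustration of that idea, not an implementation from the works cited above; the thresholds and dictionary keys are hypothetical:

    from difflib import SequenceMatcher

    def name_similarity(a: str, b: str) -> float:
        """Normalized similarity of two display names (0..1)."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def contact_overlap(a: set, b: set) -> float:
        """Jaccard overlap of two contact lists."""
        return len(a & b) / len(a | b) if a | b else 0.0

    def possible_clone(profile: dict, reference: dict,
                       name_t: float = 0.9, overlap_t: float = 0.5) -> bool:
        """Flag accounts that look like the reference but have a different ID.

        Both arguments are dicts with the (hypothetical) keys 'id',
        'name', and 'contacts' (a set of account IDs).
        """
        return (profile["id"] != reference["id"]
                and name_similarity(profile["name"], reference["name"]) >= name_t
                and contact_overlap(profile["contacts"], reference["contacts"]) >= overlap_t)

    victim = {"id": 1, "name": "Jane Doe", "contacts": {2, 3, 4, 5}}
    suspect = {"id": 99, "name": "Jane  Doe", "contacts": {2, 3, 4}}
    print(possible_clone(suspect, victim))  # True: same name, shared contacts, new ID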
2.3 Automated identity theft (bots)

Not all content on social networking sites is written by human beings. Social network identity infiltration can happen on a large scale through socialbots: computer programs that control social networking accounts and mimic real users (Boshmaf et al. 1). They have the ability to perform basic activities such as posting messages and sending connection requests. What makes these socialbots different from self‐declared bots (for example Twitter bots that post the latest weather forecasts) and spambots is that they are designed to be stealthy, meaning they are able to pass themselves off as human beings. This permits the socialbot to be more influential through infiltration. Such an influential position can then be exploited for traditional impersonation purposes, for instance spreading misinformation and propaganda in order to bias public opinion (Morozov). An example of such socialbot use is Realboy, an experimental project that mimics existing Twitter users. Instead of impersonating users, the Realboys imitate them (Coburn and Marra 1). As socialbots infiltrate a targeted social network profile, they can further harvest users’ personal data. On the Internet black market, socialbots are offered for sale at twenty‐nine dollars per bot (Boshmaf et al. 1).

Social networking sites use CAPTCHAs to prevent automated bots and socialbots from misusing the network. However, an adversary can often circumvent this countermeasure with various techniques, such as automated analysis using optical character recognition, manipulating botnets to trick infected victims into manually solving CAPTCHAs, reusing session IDs of known CAPTCHAs, cracking MD5 hashes of CAPTCHAs that are validated at the client side, and deploying cheap human labor, also known as the CAPTCHA‐breaking business (Boshmaf et al. 2). Attacks can be implemented by the prototype system iCloner, for instance, which consists of various components that are able to crawl popular online social networks, collect user information, automatically create profiles, and send friend requests and personal messages. Moreover, iCloner also has CAPTCHA‐breaking capabilities (Bilge et al. 552).

2.4 Potential dangers and impacts

“Identity theft in any form takes an incredible toll on its victims” (Sterritt 12)

Cases of identity theft have escalated as social network sites have become more popular (Ahearn). A 2009 Gallup Crime Survey indicated that identity theft provokes greater concern among Americans than any other crime, with two in three adults worried about falling victim to it (Sterritt 3). While the creation of fake social network profiles can be innocent, a maliciously created false profile can lead to personal and economic harm (Kay 2). John Douceur describes a sybil attack (in a peer‐to‐peer network), in which a malicious user creates multiple identities and uses them to bias the outcome of an online voting process. The hijacking of Twitter accounts could be used for various criminal activities, such as cyber stalking, online bullying, and identity theft. Twitter accounts are only removed when enough evidence has been collected, or after significant damage has taken place.
As the next chapter will illustrate, the consequences can be seriously harmful. “Identity theft is one of the worst frauds that can be perpetrated by one individual to another” (ShockWaveWriter). There is harassment and the danger of assault. But there are more consequences of identity misappropriation. Creating a fake profile can be trivially easy, for instance when pedophiles pretend to be a child to contact their victims (Hughes et al. 125). Identity theft attacks on Twitter can cause, besides emotional harm, serious financial problems. In one such attack, hackers made off with over three hundred personal and confidential documents, which went far beyond just the individual’s birthday and personal interests: some of them included credit card numbers, PayPal accounts, and even security codes for the office buildings of companies such as AOL, Dell, Ericsson and Nokia (Nelson, Simek and Foltin 28). In the case of successful imposter fraud, the traces always lead to the victim instead of the adversary. The victim has to prove that he or she did nothing wrong, which is in most cases a difficult task. Identity fraud is potentially damaging to a victim’s online and offline persona. Because many Internet applications now track an individual’s online identity to increase functionality and accuracy, dilution caused by misappropriation can harm the functionality of user‐driven applications (Burrell 711). In addition, just as with offline identity theft, victims of online imposter fraud devote large amounts of time to clearing their names (Sterritt 12). “It generally takes about 44 months to clear up their cases, and victims report that they spend on average 175 hours actively trying to restore their credit rating and to clear their good name” (Mihalko).

2.5 Cases

Sophos, an IT security company, states that 21 percent of Web users report having been the target of malicious programs that ‘hijack’ their profiles on social network sites (Stone). The hijacker can then act under the victim’s identity on the social network site. Security experts say social network profiles are key targets for profile hijackers, because a user’s friends and followers implicitly trust a message coming from a friend (Kay 4). In recent years, the media have reported on many cases of identity theft on social networking sites, mostly involving famous persons. The reason for this is that celebrities are commercially more appealing and more likely to attract the attention of media outlets than ‘regular’ persons. In this chapter, some known cases will be presented and explained briefly.

As mentioned earlier, it is especially celebrities that suffer from online identity theft. Among the celebrities who have fallen victim to impersonation are American president Barack Obama, former Secretary of State Condoleezza Rice, Microsoft founder Bill Gates, and rappers Notorious B.I.G. and Kanye West (Malachowski 228). The most publicized case is that of Tony La Russa. In 2009, Twitter users were shocked by the statements that La Russa, former manager of the American baseball team the St. Louis Cardinals, had apparently tweeted. The account said many rude, ‘derogatory’ and ‘demeaning’ things to his followers, for example: “Lost 2 out of 3, but we made it out of Chicago without one drunk driving incident or dead pitcher” (Matier and Ross). Many of the statements insulted his team or players. As it turned out, La Russa had not written the tweets and was as surprised as anyone else.
His identity had been stolen to create a false profile of him (Kay 2). The former manager sued Twitter for trademark infringement, false designation of origin, trademark dilution, cybersquatting, misappropriation of name, misappropriation of likeness, invasion of privacy, and intentional misrepresentation.
founder of Twitter, wrote on his blog that La Russa’s suit was “an unnecessary waste of judicial resources bordering on frivolous” (Malachowski 231). In 2007, Sarah Palin sent a message through her Twitter account “AKGovSarahPalin” making apologies for false information coming from an imposter behind the fake Twitter account “EXGovSarahPalin” (Sterritt 4). Penguins’ players Sidney Crosby and Evgeni Malkin became victim of online identity theft when imposters created fake MySpace and Facebook profiles to ask for money for the stated, but false, purpose of benefiting a Minneapolis park (Sterritt 4). The Moroccan Fouad Mourtada was sentenced by court to three years in prison for creating a fake profile of the king’s brother on Facebook (Williams). The identity theft problem, however, ranges beyond the world of the celebrities and also affects the average person, who probably does not have the resources or influence to resolve the problem before serious harm results (Sterritt 4). In the case of non‐celebrities, the fake profiles are often made for bullying, states Nine Ludwig, chief‐editor of the Dutch social networking site Hyves (Klooster). There is, for example, the case of the Dutch Middle‐East specialist Bertus Hendriks, whose fake account made him seem like an anti‐Semite (Klooster). In Oregon, the ex‐boyfriend of Cynthia Barnes made a fake Yahoo profile in her name, which contained her address, phone number, and nude photographs of her (Sterritt 4). An assistant principal from San Antonio Texas sued two students for creating a fake MySpace profile that deceptively portrayed her as an immoral lesbian with a sex problem. The students listed the assistant principal’s place of employment and phone number and uploaded explicit pictures and comments (Kay 3). One of the most extreme cases is United States v. Lori Drew, mentioned in the introduction of this thesis. 2.6 Juridical: Online identity theft and Internet law The ease of communicating with large numbers of people through social network sites has led to emerging legal problems that did not exist less than a decade ago (Kay 2). Online identity theft is not an offense per se. It is a crime in a few others. It is a cross‐cutting problem that violates not only consumer protection rules, but also security, privacy, and anti‐spam rules (OECD 3). Most of the victims of online identity theft do not know what legal remedies are there to address this problem (Kay 4). Many legal problems have developed because of acts committed over the Internet. Internet torts are significantly different from the “bricks and mortar world of traditional civil litigation in which family law and personal injury tort cases predominate” (Rustand and Koenig 87). The nature of 18 the plaintiff’s injuries, which is in most Internet cases financial loss, is one of the major differences between the two torts. Another difference is the given that 97 percent of Internet torts are intentional, while traditional torts are mainly inattentions (Kay 13). Additionally, Internet torts are almost always anonymous. To elude alarming expression, courts usually uphold anonymity for people’s actions and postings on the Web, as is the case when someone creates a fake profile on Twitter. Before suing a person, the accuser must sue the social network site to receive a declaratory judgment that the website is obligatory to provide the personal details of the impersonator, which are essential for a court procedure. This causes plaintiffs extra legal obstacles (Kay 14). 
Common defenses to publicity claims and misappropriation are social commentary, criticism, and parody, with the latter being the most commonly used defense (Kay 16). Some social networking sites, including Twitter, explicitly allow parody profiles to be created (Kay 22). Read more about this in chapter 3.2, concerning Twitter’s policy.

2.6.1 American legislation

Although the law is still struggling to catch up with the fairly recent development of identity theft on social network sites, courts in the United States can rely on tools that have been part of American jurisprudence for quite some time to provide legal remedies to victims of fake profiles (Kay 4). Both Congress and state legislatures have criminalized identity theft (Sterritt 35). In 1986, the United States Congress enacted the Computer Fraud and Abuse Act. The Texas State Legislature followed its lead by enacting an online harassment statute, which makes it a crime to “[use] the name or persona of another person to create websites or post messages on social networking sites” (Kay 23). However, victims who do not qualify for protection under these acts are often left without legal recourse.

In 1977 the American Law Institute published the second edition of its Restatement of Torts. Included in this was section 652A, which distinguished four categories of ‘invasion of privacy’, one of them being ‘misappropriation of name or likeness’. This is the cause of action that protects an individual from unauthorized use of his or her identity (Kay 6). The elements for establishing a misappropriation of name or likeness are: 1) the defendant’s use of the plaintiff’s identity (applied not only to the accuser’s name or picture, but also to allusions to him or her), 2) the appropriation of the plaintiff’s name or likeness to the defendant’s advantage, commercially or otherwise (the accuser must prove the defendant has gained in some way), 3) lack of consent (the plaintiff must prove that he did not consent to the defendant using the plaintiff’s identity), and 4) resulting injury (the defendant’s actions resulted in actual harm) (Riley 417). The property right that an individual has concerning his or her identity is the ‘right of publicity’: the inherent right of everyone to control the commercial use of his or her identity. The three elements that make up the prima facie case (denoting evidence sufficient to prove a particular proposition or fact) of a violation of a person’s right of publicity are 1) validity (the plaintiff must prove that he or she owns an enforceable right in the identity), 2) infringement (the plaintiff must prove that the defendant used his or her identity without consent), and 3) damage (protecting the plaintiff from losing the benefit of the work put into creating a marketable image) (Kay 10‐12). The causes of action for both violation of the right of publicity and misappropriation of name or likeness have been extended to acts committed over the Internet (Kay 17). However, victims of imposter fraud are often left without any protection or legal recourse. Generally, celebrities have a claim to the right of publicity, while ‘private persons’ only have a claim to misappropriation of name or likeness. While a private person’s typical injury will be mental distress and embarrassment, a celebrity is most likely concerned about the commercial loss that comes with impersonation.
Because celebrities are more commercially valuable, they are more likely to have a property right in their identity, and courts usually hold rigidly to this distinction (Kay 13). American courts have dealt with the issue of whether social networking sites can be held responsible for information posted on their websites; the answer was that they are not liable for information posted by third‐party users (Samson). Now, courts are starting to hold people responsible for the information they post online. Kay suggests three ways the courts should look at the problem of impersonation on social networking sites, depending on the situation (18). A claim to misappropriation of likeness or a claim to right of publicity should be dismissed when a user has created an obviously fake profile, that of a cartoon character for instance; it involves no harm because the entity is not a real person. Then there is the case of online identity theft of a private person. Traditionally, non‐famous people have not been allowed a viable claim to a violation of the right of publicity. In such a claim, the plaintiff has to prove a commercial loss caused by the impersonation of his or her identity, but this will not be very likely, since their identities normally do not carry enough protectable property interest. For this reason, in the case of a false profile, courts should apply the traditional tort of misappropriation of name or likeness, because the appropriation of a victim’s name or likeness will be to the defendant’s advantage, also in a non‐commercial way. In the case of celebrities, the right of publicity is the most logical cause of action. Their public image is very important and usually a lot of time and money is spent on cultivating it. Additionally, the infringement and validity elements will be easy for a celebrity to prove in court.

In California, the home base of Twitter, a new law was enacted in 2011 specifically aimed at preventing imposter fraud on social media accounts. Pursuant to the provision in Section 528.5 of the California Penal Code, it is “a misdemeanor for any person to knowingly and without consent credibly impersonate another actual person on the internet for the purposes of harming, intimidating, threatening, or defrauding” (Tsoutsanis). The penalties amount to a maximum of one year in custody, a fine of one thousand dollars, or both. The intention of the ‘credible impersonation’ requirement is to guarantee that parody accounts remain permissible, on the condition that the account is clearly a fake or parody (Tsoutsanis). This requirement is similar to the one Twitter itself imposes; read more about that in the following chapter. Recently, the problems caused by online identity theft resulted in another new American law, the ‘Identity Theft Penalty Enhancement Act’, boosting criminal penalties for phishing and other types of identity fraud (van Oorschot and Stubblebine 32).

2.6.2 Dutch legislation

The Handreiking politie Identiteitsfraude (3) (the police guide on identity fraud) defines the concept of identity fraud in short as the “obtaining, appropriating, possessing or creating of false identification means and thus committing or having the intention to commit unlawful acts”. This definition is very similar to that of US federal law. The introduction of this thesis already mentioned Bénédicte Ficq, the lawyer who was impersonated on Twitter and threatened in response to things that were said on the fake account.
Some of her 'followers' were fake profiles as well, mostly of other high-status persons like Piet Hein Donner, but others were real persons, often colleagues of hers, who did not have a clue the profile was fake. They in turn contributed to the legitimacy of the impersonated profile. On the Twitter account, unnuanced things were written about her clients and cases. Ficq explains she would never do this, because of her professional pledge of secrecy. She wrote and proposed article 310a (article 310 of the Dutch Penal Code covers 'theft' in general) concerning identity theft. "It was a bit of a playful action, but I am hoping that the legislator will do something with it. Article 310 is theft, but if I compare the theft of my car to the theft of my identity, the former is much more innocent" (Ficq in De Wereld Draait Door). The provision Ficq proposes reads as follows: "Anyone who unlawfully appropriates the unique identity of a person, abuses this identity digitally or otherwise by using, operating, and exploiting it as real and genuine, digitally or otherwise, for any purpose whatsoever, is guilty of identity theft, and punished by imprisonment not exceeding four years or a fine of the fourth category (€19,000)."

The proposal received quite some criticism. Opponents state that it is not really clear whether there is any added advantage in introducing new legislation on 'identity theft', because the use of someone else's personal name without approval (impersonation) is already prohibited in the Dutch Civil Code (Article 1:8) (Tsoutsanis). The registration of a Twitter account under someone else's name will in many cases be an 'unlawful processing of personal data', covered by the Wet Bescherming Persoonsgegevens (the Dutch personal data protection act). In addition, the use of someone else's photo on a fake Twitter account, for instance, is a clear violation of image rights. Both grounds enable a victim of such a fake social networking account to obtain an injunction and financial compensation (Tsoutsanis). However, by then, reputational and integrity damage and other non-material harm has already been done and is hard to undo.

3. TWITTER
What does Twitter do to prevent identity theft from happening, and what can be done when a fake profile is detected? Is it Twitter's 'responsibility'? What is stated in the terms and conditions? The Dutch social networking site Hyves, for example, has community managers (a kind of 'Hyves police') who are available 24/7 to respond to abuse on their SNS, including fake accounts, which are normally deleted within four hours (Klooster). Twitter has an account verification method for establishing authenticity, but this seems to be reserved for 'superstars'; it is closed to the public.

3.1 Introducing Twitter
Twitter, launched in 2006, is a microblogging service, originally developed for mobile phones. On the social network, people can post short text messages, limited to 140 characters, called 'tweets'. These tweets create a constantly updated timeline, or stream of updates and messages, ranging from humor and reflections on life to links and breaking news (Marwick and boyd 3). Unlike most other social networking sites, which require users to confirm friendship links with the users befriending them, Twitter works with a social networking model called 'following' (Weng et al. 1). This is a directed friendship model: users can choose Twitter accounts to 'follow' in their stream without having to ask permission, and they, in turn, each have their own group of 'followers'.
"There is no technical requirement of reciprocity, and often, no social expectation of such" (Marwick and boyd 3). Twitter employs a clear distinction between public and private messages. Most of Twitter's functions are directly accessible via an API (application programming interface) (Bhumiratana 684). Twitter enables people to claim usernames in the URLs of their profile pages; these usernames are called 'handles' (Malachowski 225). Twitter has gained vast popularity since the day it was launched, and it has also drawn growing attention from the academic research community (Weng et al. 1).

3.2 Juridical: Twitter's (identity) policy
Firstly, a brief summary of Twitter's Terms of Service will be given, as extracted from www.twitter.com/tos on May 16, 2012. Next, Twitter's specific policies concerning impersonation and identity theft will be discussed.

3.2.1 General terms of service
Twitter's Terms of Service govern the user's access to and use of the services on Twitter's website (the services) and any information, text, graphics, photos, or other materials uploaded, downloaded or appearing on the services (collectively referred to as content). The user's access to and use of the services is conditioned on their acceptance of and compliance with these terms. By accessing or using the services, the user agrees to be bound by these terms. The basic terms state that users are responsible for their use of the website, for any content posted and uploaded on it, and for any consequences thereof. The content a user submits, posts, or displays can be viewed by other users of the services and through third-party services and websites. On the 'account settings' page, the user can control who sees their content. "You may use the Services only if you can form a binding contract with Twitter and are not a person barred from receiving services under the laws of the United States or other applicable jurisdiction. You may use the Services only in compliance with these Terms and all applicable local, state, national, and international laws, rules and regulations." (Twitter Terms of Service) These terms and any action related thereto will be governed by the laws of the State of California, without regard to or application of its conflict of law provisions or the user's state or country of residence. Twitter's services are continuously evolving, and the form and nature of the services can change from time to time without prior notice to the users. Twitter also retains the right to create limits on use and storage at its "sole discretion" at any time. Any information provided to Twitter is subject to its Privacy Policy. The user is responsible for safeguarding his or her password and for any activity or action under this password. The terms of service state that Twitter cannot and will not be liable for any loss or damage arising from the user's failure to comply with the above requirements. Twitter reserves the right at all times to remove or refuse to distribute any content on the services and to terminate users or reclaim usernames.
Twitter also reserves the right to "access, read, preserve, and disclose any information as we reasonably believe is necessary to (i) satisfy any applicable law, regulation, legal process or governmental request, (ii) enforce the Terms, including investigation of potential violations hereof, (iii) detect, prevent, or otherwise address fraud, security or technical issues, (iv) respond to user support requests, or (v) protect the rights, property or safety of Twitter, its users and the public" (Twitter's Restrictions on Content and Use of the Services).

3.2.2 Twitter's identity policy
Part of Twitter's terms of service are the Twitter Rules. The first bullet point on the list of limitations on the type of content that can be published with Twitter is 'impersonation': "you may not impersonate others through the Twitter service in a manner that does or is intended to mislead, confuse, or deceive others." The second bullet point is 'trademark' ("we reserve the right to reclaim user names on behalf of businesses or individuals that hold legal claim or trademark on those user names. Accounts using business names and/or logos to mislead others will be permanently suspended"). Also on the list is 'misuse of Twitter Badges' ("you may not use a Verified Account badge or Promoted Products badge unless it is provided by Twitter. Accounts using these badges as part of profile pictures, background images, or in a way that falsely implies affiliation with Twitter will be suspended") (Twitter's Content Boundaries and Use of Twitter). The second list on the rules page is 'Spam and Abuse'. It prohibits, among other things, serial accounts, spam, malware and phishing, and username squatting: You may not engage in username squatting. Accounts that are inactive for more than 6 months may also be removed without further notice. Some of the factors that we take into account when determining what conduct is considered to be username squatting are:
the number of accounts created 
creating accounts for the purpose of preventing others from using those account names 
creating accounts for the purpose of selling those accounts 
using feeds of third-party content to update and maintain accounts under the names of those third parties

Twitter's extensive policy is not limited to its terms and rules. There is also a large section called 'policy information', consisting of several subsections, including an 'impersonation policy', a 'name squatting policy' and a 'parody, commentary, and fan accounts policy'. Fake accounts have cropped up so often that Twitter has adopted its very own impersonation policy. According to this policy, cases in which the policy is not applied are accounts in which the user shares another user's name but has no other commonalities, or profiles that clearly state that they are not affiliated with or connected to any similarly named individuals. "Accounts with similar usernames or that are similar in appearance (for example the same background or avatar image) are not automatically in violation of the impersonation policy. In order to be impersonation, they must also pretend to be another person in order to mislead or deceive". Upon receipt of an impersonation report from the individual being impersonated or a legally authorized representative, Twitter will investigate the reported accounts to determine whether they are in violation of the rules. After confirming the reporter's identity, accounts determined to be in violation of the impersonation policy, or not in compliance with the parody/commentary policy, will either be suspended or asked to make edits. The 'parody, commentary and fan accounts policy' allows parody accounts that a reasonable person would know to be a joke. Twitter will only edit or remove user content in cases of violations of its terms of service, such as a clear intent to deceive or confuse. In order to avoid impersonation, an account's profile information should make it clear that the creator of the account is not actually the same person or entity as the subject of the parody or commentary. In its guidelines for parody, commentary and fan accounts, Twitter lists the following suggestions:
Username: The username should not be the exact name of the subject of the parody, commentary, or fandom; to make it clearer, you should distinguish the account with a qualifier such as "not," "fake," or "fan." 
Name: The profile name should not list the exact name of the subject without some other distinguishing word, such as "not," "fake," or "fan." 
Bio: The bio should include a statement to distinguish it from the real identity, such as "This is a parody," "This is a fan page," "Parody Account," "Fan Account," "Role‐playing Account," or "This is not affiliated with…" 
Communication with other users: The account should not, through private or public communication with other users, try to deceive or mislead others about your identity. For example, if operating a fan account, do not direct message other users implying you are the actual subject (i.e., person, band, sports team, etc.) of the fan account.

Role-playing: Twitter allows role-playing accounts. If you are operating a role-playing account that may include inflammatory or controversial topics, we suggest that you add a clarification to your bio, such as "role-playing," in addition to complying with our best practices.

In order to report an impersonation violation, the user must "include a detailed description of the information on the account that shows a clear intent to deceive others by using your real identity. This could include @replies, links to reproduced content, or links to Tweets". Kay says about these policies: "they show that social network creators contemplated parody profiles and made decisions about whether or not the creator of a fake profile should be punished" (22). One of the points prohibited by Twitter is the misuse of Twitter Badges. In June 2009 Twitter announced a special 'seal' for verified accounts (Katz). One can contact Twitter via an online form in an attempt to gain verification. The form asks for some basic information, such as the user's name and official website. Putting a link to the Twitter user profile on the official web page speeds up the verification process (Wilcox 2010). Two of the immediate causes for combating fake accounts were the cases of Tony La Russa and the Dalai Lama, the latter of whom existed on Twitter without ever having created his own account (Orita and Hada 17). However, this solution is still closed to the public and restricted to famous users. Biz Stone, co-founder of Twitter, states that the 'experiment' in verifying accounts will begin with "public officials, public agencies, famous artists, athletes, and other well-known individuals at risk of impersonation". He added that the company hopes to verify more accounts afterwards, but verification will begin with a small set due to the resources required (Katz). Consequently, common users still run the risk of having their identity stolen. Moreover, this system does not always work perfectly: the verification of the fake account of Rupert Murdoch's wife Wendi Deng happened accidentally (O'Carroll and Halliday). Furthermore, even if the verified seal were adopted throughout the whole network, any screen name can still be used on the Internet. "As long as the usage of pseudonyms are allowed, the problem of impersonation still exists" (Orita and Hada 17).

4. NETWORK ANALYSIS
There are two types of social networks: undirected graph networks (examples include Facebook and LinkedIn) and directed graph networks with non-symmetric friend relationships (either reciprocated or one-way), such as Twitter and Flickr (Bhumiratana 682). In most of the literature, Twitter is modeled as a directed graph G = (V, A), which consists of a set V of nodes (vertices) representing user profiles and a set A of arcs (directed edges) that connect nodes. Unlike most other online social networks, which are modeled as undirected graphs, following on Twitter is not a mutual relationship. Any user can follow another user, and the latter does not have to approve (unless the profile is set to 'private') or follow back (Wang 3). There are four types of relationships on Twitter. Followers are the incoming links, or inlinks, of a node.
The second relationship is the friend relationship. Twitter defines friends as the people whose updates you are subscribed to; in other words, friends are the people you are following. Friends are the outgoing links, or outlinks, of a node. The third relationship on Twitter is the mutual friend relationship, when two users follow each other. Finally, two users are strangers if there is no connection between them (Wang 3).

4.1 Social graphs
A lot of research has been done in analyzing compound relations between individuals and organizations in the real world. These are easily observable in one of their possible representations: social relation graphs. In such graphs, the nodes typically represent individuals or whole organizations, while the links between the nodes represent different types of relations. Social relation graphs are characterized by numerous specific properties. An example of such a property is the 'small world' phenomenon, meaning that two nodes in the graph are connected to each other through a small number of intermediate nodes. The psychologist Stanley Milgram, for example, showed in a 1967 experiment that the chain of social acquaintances connecting one arbitrary person to another arbitrary person needed, on average, just five intermediaries (Brendel and Krawczyk 267). Social relation graph theory is also very suitable for analyzing digital networks, because they are already structured as a graph. "A social network is a social structure modeled as a graph, where nodes represent users or other entities (a group for instance) embedded in a social context, and edges represent specific types of relationships between entities. Such relationship may be based on real-world friendship, common values, shared visions and ideas, kinship, shared likes and dislike, etcetera" (Jin, Takabi, and Joshi 28). In a sybil attack, it is assumed that the edges between attackers and victims are based on mutual trust and are very small in number. This is very different from random link attacks, since a sybil group will have few edges with the victims (Shrivastava, Majumder, and Rastogi 489). The research of Watts and Strogatz showed that the neighborhood of a node in a social network has many more triangles than a node in a random graph (441). The neighborhood of a 'good user' typically contains groups of nodes that also have edges between them. The number of links between a randomly chosen subset of nodes (victim nodes), however, tends to be very small. Characteristically, social networks are modeled as power-law graphs, in which it can be shown that the probability of an edge between a pair of randomly selected nodes is O(1/n), with n being the number of nodes in the network (Watts, Strogatz, and Newman 2567).

4.2 Correlation
A social network's most distinctive property is the tendency to cluster: social networks usually contain many dense subgraphs. For example, if person A knows person B and person C in a social network, person B is considerably more likely to know person C than in a random network with a similar degree distribution (Shrivastava, Majumder, and Rastogi 489). The study of Java et al. showed Twitter to be a network with high degree correlation and high reciprocity, implying that although it is not uncommon to follow strangers, there are a large number of mutual acquaintances in the graph. In many cases, new Twitter users initially join the social network on invitation from their friends.
Further, new friends are added to the network by browsing through user profiles and adding other known acquaintances. A high degree correlation signifies that users who are followed by many people also have a large number of friends (Java et al. 59).

Figure 2: Scatter plot showing the degree correlation of Twitter (Java et al. 61)

5. CHARACTERISTICS OF FAKE PROFILES
Jin, Takabi, and Joshi have analyzed the characteristics of a faked identity based on its attributes and friend network (29). They assume that each identity on an online social network may have three different lists associated with it: a friend list that indicates other users in the user's friend network, a recommended friend list that social networking sites generate to recommend potential new friends to users based on activities or common interests, and an excluded friend list that indicates people whom users avoid having in their friend network, such as parents and bosses. It is assumed here that friendship relationships in the friend list are bidirectional, which may not be the case for the other two lists (Jin, Takabi, and Joshi 29). In the specific case of Twitter, the word 'friend' should be replaced by 'follower'. The appendix includes a diagram showing the structures of friend lists in the case of a faked identity. Profiles of celebrities that do not contain a link to the artist's official website are usually fake. Other characteristics are that the account only links to press photos instead of personal ones, and that the real celebrities sometimes respond online to their 'posers' (Nation). An adversary usually creates a faked profile that has the same or a similar name to that of the victim, such as "Martin L. King" and "Martin King" (Jin, Takabi, and Joshi 29). In imposter fraud, there are usually more values besides the name that are the same as or similar to the victim's, such as the birthday. In figure 3, the left part represents the victim's profile and the right part the faked identity created by an adversary. In the victim's profile, the graduate school and city are set to private (indicated by an *); the others are public.

Figure 3: Attribute characteristics of a faked identity (Jin, Takabi, and Joshi 29)

These differences in privacy settings for various details apply to information coming from other social network sites, because on Twitter the whole profile is either public or private, but they are still relevant when referring to cross-platform profile cloning. Because not all attributes are always available to the adversary, an impersonated profile is not always a hundred percent identical to the original. The recommended follower list system helps the adversary obtain followers, even when this list is set to private, because the recommended friends generated for a faked identity may be the same as those of the victim, since the adversary has successfully forged the associated attributes of the victim (Jin, Takabi, and Joshi 29). Also, a Twitter faker will often make a huge network of poser accounts and have them 'interact' with each other to make them appear real (Nation). When carefully done, the fake profile is barely (if at all) distinguishable from the real one, because of the above-mentioned attribute similarity and friend network similarity (Jin, Takabi, and Joshi 30). A study by Weng et al. has shown that reciprocity is widespread on Twitter: they observed that 72 percent of Twitter users follow more than 80 percent of their followers. If user A follows user B, then user B will typically follow user A back as a sign of courtesy.
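To make the four relationship types and this reciprocity measure concrete, the following minimal Python sketch models the follow structure as a directed graph. The account names, follow lists and printed values are hypothetical toy data, illustrating only the definitions given above, not any of the tools described in this thesis.

# Minimal sketch of Twitter's directed friendship model (hypothetical data).
# following[u] is the set of accounts that user u follows (u's outlinks);
# followers are derived as the inlinks of the graph.

following = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": set(),
}

def followers_of(user):
    # Inlinks: everyone whose following set contains `user`.
    return {u for u, outlinks in following.items() if user in outlinks}

def relationship(a, b):
    # Classify the (a, b) pair using the four relationship types.
    a_follows_b = b in following.get(a, set())
    b_follows_a = a in following.get(b, set())
    if a_follows_b and b_follows_a:
        return "mutual friends"
    if a_follows_b:
        return "friend (a follows b)"
    if b_follows_a:
        return "follower (b follows a)"
    return "strangers"

def reciprocity(user):
    # Share of `user`'s followers that `user` follows back (cf. Weng et al.).
    ins = followers_of(user)
    if not ins:
        return 0.0
    return len(ins & following.get(user, set())) / len(ins)

print(relationship("alice", "bob"))  # mutual friends
print(reciprocity("alice"))          # 1.0: alice follows back her one follower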
Most legitimate users typically follow no more than one thousand users (Lee, Eoff and Caverlee 3). In the case of spammers, there are some well-known features, used in both e-mail and microblogs, that can identify them. A popular spamming technique is the use of malformed words for the product in order to mislead filters. Deliberately misspelled words (Vaigra), the addition of symbols in words (Vi@gra), reducing the 'spam-iness' of a message by adding 'good' words along with spam words, and sending a larger than normal number of duplicate or remarkably similar tweets are other features often indicating spam (Shekar, Liszka and Chan 194). Spammers usually have Twitter handles that do not look like real names, and their profiles contain no photo or a stock photo. They might be new to Twitter but already following thousands of people; they usually never interact with anyone and just send tweet after tweet, often containing links. The tool Tweepi detects a user's followers who never uploaded a profile image or who have no bio description. On Twitter, every user can initially follow two thousand persons in total. Once they have reached this number, there are limits to the number of additional users that can be followed. This limit is different for every user and is based on their ratio of followers to following. Twitter users can be categorized into three types: 'listeners', 'talkers', and 'hubs'. Listeners have a low ratio of followers to following, talkers have a high ratio of followers to following, and hubs have a follower-to-following ratio of approximately 1. Celebrities are usually talkers. Users following more than 2000 accounts will have to wait for more people to follow them in order to follow more people themselves. The 'ideal' or 'common' ratio depends on how many followers a user has, but it lies in a range near 1.0, for example between 0.5 and 1.5 (Schaffer). Because it can be hard to see at first sight whether a profile is real or fake, many people tend to judge credibility by the number of followers someone has. However, the number of followers can easily be manipulated. One method for this is following as many people as possible (within Twitter's limits) and un-following any user who does not follow back; this keeps the follower-to-following ratio in check. To spot a possible 'manipulator', it can be useful to look at the historical growth of an account. Iain McDonald wrote a guide for discovering fake profiles by looking at their growth profiles. To view someone's Twitter growth profile, enter his or her username in Twitter Counter, a Twitter statistics tool, and watch for tell-tale signs in the graph. A 'normal' Twitter user shows the marked pattern of steady growth, while a pattern showing various unexplainable spikes (with no evidence of retweets, for instance) is likely to belong to a manipulative person, as indicated in figure 4. Such a person is most likely to have commercial intentions with the fake profile. In the latter graph, it becomes clear where the true line of growth has been manipulated through mass-following other accounts, which resulted in a significant jump in the number of accounts 'following back' (McDonald).

Figure 4: Growth profile containing a regular (top) and a manipulated (bottom) growth pattern (McDonald)
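As an illustration of these heuristics, the sketch below computes a follower-to-following ratio, assigns the listener/talker/hub label, and flags unexplained jumps in a daily follower-count history. The category boundaries and the spike factor are assumptions chosen for the example, loosely following Schaffer's ranges, not values prescribed by any of the cited tools.

# Heuristic checks on ratio and growth history (hypothetical thresholds).

def tff_ratio(followers, following):
    return followers / following if following else float("inf")

def categorize(ratio):
    # Rough listener/talker/hub split; the boundary values are assumptions.
    if 0.5 <= ratio <= 1.5:
        return "hub"  # ratio near 1.0
    return "talker" if ratio > 1.5 else "listener"

def growth_spikes(daily_counts, factor=5.0):
    # Flag days where follower growth jumps far above the running average;
    # such unexplained spikes may indicate mass-follow manipulation.
    spikes, deltas = [], []
    for day in range(1, len(daily_counts)):
        delta = daily_counts[day] - daily_counts[day - 1]
        avg = sum(deltas) / len(deltas) if deltas else 0.0
        if deltas and avg > 0 and delta > factor * avg:
            spikes.append((day, delta))
        deltas.append(delta)
    return spikes

print(categorize(tff_ratio(followers=900, following=1000)))  # hub
print(growth_spikes([100, 110, 120, 130, 1130, 1140]))       # [(4, 1000)]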
Network and graph analysis can also help in detecting spammers. The analysis of these ever-changing entity relations can point out specific social threats, such as spam and other forms of digital crime. By observing the tendency of changes in the social relation graphs, such threats can be detected early (Brendel and Krawczyk 267). Boykin and Roychowdhury showed that e-mail spammers reliably have many edges but few wedges in their social graphs (62). Wedges arise from shared communities and geography, a common feature that spammers lack. The authors utilize the notion of the clustering coefficient for finding spammers in a social network. According to them, e-mail spammers can be identified by the fact that there are no triangles in their social graphs (Boykin and Roychowdhury 63).
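This triangle-counting idea is easy to make concrete. The minimal sketch below computes a node's local clustering coefficient on a small undirected toy graph (all nodes and edges are hypothetical); a spammer-like node whose neighbors share no edges among themselves scores zero.

from itertools import combinations

# Undirected toy graph as adjacency sets (hypothetical data).
graph = {
    "good": {"a", "b", "c"},
    "a": {"good", "b"},
    "b": {"good", "a"},
    "c": {"good"},
    "spammer": {"x", "y", "z"},
    "x": {"spammer"}, "y": {"spammer"}, "z": {"spammer"},
}

def clustering_coefficient(node):
    # Fraction of possible edges among the node's neighbors that exist;
    # each existing edge closes a triangle through `node`.
    nbrs = graph[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

print(clustering_coefficient("good"))     # 0.33: one edge (a-b) of three possible
print(clustering_coefficient("spammer"))  # 0.0: no triangles in the neighborhood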
6. E-DISCOVERY
Electronic discovery refers to the process by which electronic data is retrieved, secured and searched with the intent to use it as evidence in any civil or criminal case (Socha 4). A much-used source for e-discovery is e-mail. However, e-mail is "yesterday's means of communication"; today it is all about social media like Twitter (Frankel). Douglas E. Winter, head of the e-discovery unit at the business and litigation firm Bryan Cave, explains why Twitter fits this definition: "Twitter posts are like any other electronically stored information. They are discoverable and can therefore be subject to e-discovery" (Stephenson 729). Storage is essential. Although social networking sites are not like e-mail or word processing documents when it comes to preservation, a business maintaining social media pages has the duty to preserve data relevant to anticipated or actual litigation. Parties who fail to preserve electronically stored information are subject to penalties. "Judges are increasingly likely to order litigants to provide access to their social media accounts and to preserve their posts" (Frankel). E-discovery of such posts, especially in the case of corporate use, is a certainty according to the Internet security company Symantec. Additionally, the technology consulting firm Gartner stated that by 2013, half of all companies will have faced e-discovery demands for material from social media sites (2009). Debra Logan, analyst at Gartner, stated that "in e-discovery, there is no difference between social media and electronic or even paper artifacts. If it exists, it is discoverable" (Frankel). Because electronically stored information is such a rich source of material in litigation proceedings, e-discovery has experienced rapid growth over the last few years (Kannan).
Searchability, however, brings some challenges. Search results on Twitter only go back a couple of days. Older tweets are saved, but are very hard to find again, especially since Google has discontinued its real-time search service (Veltman 26). To tackle this problem, and because social networking sites are owned and controlled by third parties, there are online services and technologies that can capture dynamic web pages for preservation and store and archive information in the cloud, sometimes specifically for the purpose of e-discovery. Examples include Google Postini, Pagefreezer, Iterasi and Nextpoint. A necessary feature for any workable, practical e-discovery suite is the auto-discovery of data sources: "An e-discovery suite must have the capability to auto-discover informational sources anywhere on the network, since critical data may reside in the enterprise file or storage server" (Kannan).

6.1 E-discovery on social media platforms
Once access to online social media information has been secured, either through a court order or simply due to public accessibility, which is the case for most Twitter profiles, evidence must be gathered in a way that is legal and useful. Collecting evidence from social networking sites can be a challenge for various reasons. Social media is continually changing, and users can easily update and delete material that could be evidence in a case. However, once a user is aware of an ongoing investigation, he or she is under an obligation to preserve social media evidence just as if it were any other type of evidence. So far there are relatively few standardized, widely accepted methods for e-discovery on social networking sites. Attorney Benjamin Wright, an expert in e-discovery and author and instructor at an internet security company, explains: "a common approach for gathering social media for someone is to just try to print what they see on their screen onto a piece of paper and show it to the judge or administrator." But such printouts do not always capture all of the information and the interactivity taking place on the social networking sites. A better alternative to the printout is a screencast (How to Gather 5). A screencast is a digital recording of computer screen output. Differing from a screen printout or screenshot, the screencast is more of a movie that shows the changes on the screen over time, usually including webcast narration. It captures the look, images, words, interactivity and inter-relationships from one page to the next. There are multiple tools for making a screencast, for example Camtasia and Screencast-O-Matic (How to Gather 5). Another challenge of e-discovery on social networking sites lies in proving that the gathered information is authentic, given that the profile might be impersonated.

6.2 Juridical: the legal sides of e-discovery
In the context of law, e-discovery is the pre-trial phase of a lawsuit in which each involved party can, through the law of civil procedure, request documents and other evidence from third parties (Socha 6). However, e-discovery is not exclusively a federal courts phenomenon. More than half of the American states have adopted e-discovery procedures for state court cases. California, for example, has adopted its Electronic Discovery Act, in which rules for e-discovery are specified.
Social media evidence can be a valuable addition to an investigation, revealing the type of information that, years ago, would have been hard or even impossible to find. A crucial point, however, is that it has to be gathered in a way that will hold up in court. Because social media evidence is such a relatively new source of evidence in investigations, case law is developing quickly (How to Gather 1). In the process of e-discovery on social network sites, there are a few (legal) concerns: whether the information is considered private, whether it is discoverable, and whether it is admissible as evidence. These concerns are especially an issue when proceeding against a Twitter user profile that is set to private. Many boards of ethics do not allow lawyers to send a follower request to gain access to private profile information; doing so would also violate the terms of service set out by the social networking site, and it has prompted some American states to address the practice in writing. The California Penal Code, section 528.5, for instance, states: "Any person who knowingly and without consent credibly impersonates another actual person through or on an Internet Web site or by other electronic means for purposes of harming, intimidating, threatening, or defrauding another person is guilty of a public offense". The Connecticut Rules of Evidence, section 52-184a, mention: "No evidence obtained illegally by the use of any electronic device is admissible in any court of this state" (How to Gather 4). Instead, the court can send discovery requests to the specific platform. In certain cases, courts have ordered that passwords be disclosed, but this is exceptional. Generally, courts allow discovery of personal information posted on a social networking site if it is relevant to the litigation. User postings on Twitter are legally binding and subject to the legal rules of e-discovery, meaning tweets can be summoned in a lawsuit (Stephenson 729). The question of whether e-discovery results are admissible as evidence is also a highly complicated one. Since information on social networking site profiles is not directly verifiable, printouts or screenshots are generally not acknowledged as evidence on their own. The American Federal Rules of Evidence dictate that "material taken from social media accounts generally requires additional corroboration to link the printouts to the account holder in order to consider the information as evidence" (How to Gather 5). A screencast is more likely to be acknowledged, especially if it comes with a script including a statement that acts as the plaintiff's signature, authenticating what there is to see and the statements made (How to Gather 6).

7. EXISTING DISCOVERING METHODS
With regard to defending against imposter fraud and identity clone attacks, most solutions focus on educating users to control the distribution of their sensitive private information and digital identities. However, arranging privacy settings on social network sites is usually complicated and time-consuming, and a task that many users feel confused about and often skip (Jin, Takabi, and Joshi 28). A study by Gross and Acquisti reported that 99 percent of the Twitter users they checked retained the default privacy settings. Additionally, having the optimal privacy settings does not even guarantee a user's protection against adversaries. Jin, Takabi, and Joshi state that detecting fake identities is challenging work (28).
One of the key challenges is that it is quite common for several people to have similar names in the real world, and therefore their identities on social networking sites may be similar. For this reason, it is impossible to arbitrarily consider all identities with similar names to be faked identities. Approaches similar to those for detecting imposter fraud are spam detection methods. Stringhini et al., for example, analyzed to what extent spam has entered social networking sites and how spammers who target these online networks operate. The authors developed techniques to detect spammers and show that it is possible to automatically identify the accounts they use. The method of Aaron Zinman and Judith Donath (2007) was also mainly developed for discovering spammers and is based on the structure of MySpace. They tried to detect fake profiles on social network sites using learning algorithms. As with traditional spam analysis, their work focuses on detecting deception, meaning finding profiles that mimic ordinary human users but which are actually commercial and usually unwelcome entities. The authors address this within the larger context of making the key cues presented by any unknown contact clearer. They developed a research prototype that classifies senders into broader categories than spam/not spam, using features exclusive to online social networks. They attempted automatic characterization of MySpace profiles using higher-level social categories, describing a profile's valence in two independent dimensions: sociability (the presence of information of a social nature) and promotion (the amount of information meant to influence others, whether of political beliefs or of a commercial nature). Typical spam would rate high in promotion but low in sociability. As examples of 'social activity' on MySpace, the authors give a large number of personal comments and graphical customization. On Twitter, however, these features are not really present, since one cannot customize the layout, for instance. The work of Zinman and Donath does not concretely address the problem of impersonation, because the spammers do not pretend to be another existing person, but it helps people who are open to meeting new persons on social networking sites recognize fake, though seemingly real, undesirable entities (Zinman and Donath 1). Many existing techniques against spam use various machine learning and Bayesian spam filtering algorithms. The Bayesian spam filtering technique makes use of a naive Bayes classifier, which correlates the use of tokens with a previously classified set of good and spam e-mails, to identify spam (Shrivastava, Majumder, and Rastogi 489).
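A minimal sketch of such token-based Bayesian filtering is given below. The training messages are hypothetical toy data and the smoothing is deliberately crude; the sketch only illustrates the principle of correlating tokens with previously classified spam and non-spam messages.

import math
from collections import Counter

# Tiny naive Bayes spam filter over word tokens (hypothetical training data).
spam = ["buy vi@gra now", "cheap pills buy now"]
ham = ["lunch tomorrow?", "see you at the meeting tomorrow"]

def counts(msgs):
    return Counter(w for m in msgs for w in m.lower().split())

spam_c, ham_c = counts(spam), counts(ham)
spam_n, ham_n = sum(spam_c.values()), sum(ham_c.values())

def spam_score(message, alpha=1.0):
    # Log-odds that the message is spam, assuming equal priors;
    # alpha is additive smoothing so unseen tokens do not break the score.
    score = 0.0
    for w in message.lower().split():
        p_spam = (spam_c[w] + alpha) / (spam_n + alpha * 2)
        p_ham = (ham_c[w] + alpha) / (ham_n + alpha * 2)
        score += math.log(p_spam / p_ham)
    return score  # positive: more spam-like, negative: more ham-like

print(spam_score("buy pills now"))      # clearly positive
print(spam_score("meeting tomorrow"))   # clearly negative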
Recently, some research has also been done on spam detection that leverages the structure of the social networking graph, as in the work of Boykin and Roychowdhury, for example. There are also some online tools available for anyone to detect spam. Both TwitBlock and Twerp Scan are applications that scan a user's profile and assign a 'spam rating' to each follower, based on how likely they are to be spammers. Both tools provide options for the user to report or flag the detected suspicious profiles, or to mark them as 'not spam'. Compared to spam detection techniques, far fewer methods are known for detecting online identity theft. Some of these tools and techniques are discussed below.

7.1 Algorithms
Jin, Takabi, and Joshi proposed a detection framework that focuses on discovering suspicious identities and then validating them. It is based on a profile similarity algorithm that measures and detects likely cloned identities in social networking sites through the use of social links and attributes. It includes three steps. The first step is searching and filtering identities in a profile set, where the input is a profile. The second is discovering a list of suspicious identities related to the input profile using profile similarity schemes, and the last one is verifying the identities in the suspicious identity list and removing the false ones (Jin, Takabi, and Joshi 28). For detecting suspicious profiles, they use two methods, based on attribute similarity and on similarity of friend networks. The first approach addresses the situation where mutual friends in friend networks are considered; the second captures the scenario where similar friend identities are involved (see chapter 5, 'Characteristics of fake profiles'). Although the detection method uses only a small number of parameters (friend list, recommended friend list and public attributes), the algorithm successfully detected numerous cloned identities (Bhumiratana 682). However, Jin et al. stated that the detection algorithm is still primitive and can be evaded by identity thieves who are aware of the detection mechanism (37). They propose a more secure (and also much more time-consuming) approach, in which the profile similarity of every profile with the victim must be calculated, after which the profiles whose similarity with the victim is above a threshold are selected, and they suggest this approach may be extended with an algorithm proposed by Bayardo et al. That algorithm, based on novel indexing and optimization strategies for all-pairs similarity search, solves the problem of Jin et al. without relying on approximation methods or extensive parameter tuning. The algorithm for detecting cross-platform identity clone attacks presented by Bilge et al. is also based on attribute similarity, but leaves out the friend network similarity (Jin et al. 38).
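The core idea of such a similarity scheme can be sketched in a few lines: compare shared profile attributes and the overlap of friend (follower) lists, and flag pairs whose combined score exceeds a threshold. The weights, threshold and example profiles below are hypothetical illustrations, not the actual parameters of Jin, Takabi, and Joshi's framework.

# Sketch of attribute and friend-network similarity (hypothetical weights).

def attribute_similarity(p1, p2):
    # Fraction of attributes present in both profiles with equal values.
    shared = set(p1) & set(p2)
    if not shared:
        return 0.0
    return sum(1 for k in shared if p1[k] == p2[k]) / len(shared)

def friend_similarity(f1, f2):
    # Jaccard overlap of the two friend (follower) sets.
    union = f1 | f2
    return len(f1 & f2) / len(union) if union else 0.0

def is_suspicious(p1, f1, p2, f2, w_attr=0.6, w_friend=0.4, threshold=0.5):
    score = w_attr * attribute_similarity(p1, p2) + w_friend * friend_similarity(f1, f2)
    return score >= threshold, score

victim = {"name": "Martin L. King", "city": "Atlanta", "school": "BU"}
clone = {"name": "Martin King", "city": "Atlanta", "school": "BU"}
flagged, score = is_suspicious(victim, {"a", "b", "c"}, clone, {"a", "b", "d"})
print(flagged, round(score, 2))  # True 0.6: matching attributes, overlapping friends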
Brendel and Krawczyk define an algorithm that comes down to identifying spammers as people who are not well connected within the social graph. In order to perform this detection, they define two categories of graphs. The first comprises directed social relation graphs that describe the current activity state of all the members of the network or community, representing the actual relations between network users after a specified period of time. The second category contains pattern graphs indicating typical and abnormal behavior of the users. The first category is dynamic, in the sense that the relation graphs are constantly changing over time to reflect the current state of relations established between the users (Brendel and Krawczyk 268). An example of abnormal behavior in the Internet and e-mail community could be the activity of one member (one e-mail user) who starts hundreds of relations with other members, spreading large amounts of advertising e-mails (Brendel and Krawczyk 269). On Twitter, however, this kind of behavior would not necessarily be seen as abnormal. Whereas e-mail users tend to form local communities whose members are tightly connected to each other, a typical Twitter social graph usually looks different. Many Twitter users not only follow their friends, colleagues and acquaintances, but also total strangers, companies, brands, politicians and celebrities. The latter especially are often followed by a huge number of people, while they themselves follow only a few accounts. Because of this inconsistency in characteristic social Twitter graphs, it is hard to define typical and abnormal behavior that generalizes over all kinds of Twitter users; specific features and typical behavior would have to be established for each kind of Twitter user. Shrivastava, Majumder, and Rastogi also make use of social graphs in their method: in their paper they suggest an algorithm for detecting random link attacks.

Figure 5: Example of a random link attack (Shrivastava, Majumder, and Rastogi 487)
They describe a random link attack as an attack involving the creation of a set of fake identities, which are then used to spam people randomly. A random link attack consists of two sets of nodes: the attackers and the victims. The malicious user creates a set of 'attack identities', which in the case of Twitter take the form of fake Twitter profile pages. The attacker then randomly chooses the set of victim nodes and uses the attack identities to send messages to them. In the case of a successful attack, the size of the victim set is usually very large compared to the size of the attack set (Shrivastava, Majumder, and Rastogi 486). This kind of attack can be a real-life scenario in viral marketing, for instance. In order to detect these random link attacks, the authors mine the social networking graph extracted from user interactions in the communication network. Unlike in the work of Brendel and Krawczyk, these networks can be of any type and are not limited to specific domains like e-mail spam filtering, since their techniques are oblivious to the content of the interaction among users. In this social networking graph, each user (or each identity in the network) is a node, and there is an edge between two nodes if the corresponding users communicate or are connected with each other. As figure 5 shows, the random nodes close to the attackers have a notably different structure than the nodes in the neighborhood of a good user. Pretending to be regular users, the adversaries may form a dense web of connections with each other, thereby increasing the number of triangles. In this way, the adversary makes sure that his or her neighborhood is structurally similar to that of a regular user (Shrivastava, Majumder, and Rastogi 487). In his paper, Bhumiratana develops a prototype system that exploits the weak trust model of online social networks and maintains the authenticity of a fake online identity established through an identity cloning attack, in order to harvest more private information. The technique is designed to take advantage of the cloned fake profiles and carry on authentic conversations with the exploited users. The experiment turned out to be successful. The author not only shows how relatively easy it is to commit imposter fraud on online social networks, he also suggests a detection model and says a behavior-based anomaly detection method can be employed to try to detect action replay bootstrapping (Bhumiratana 685). However, it is not really clear what the author defines as 'abnormal behavior'. To address the problem of sybil attacks on social networking sites, several researchers have developed algorithms to perform decentralized detection of sybils on social graphs. These systems (including, for example, SybilGuard, SybilInfer and SybilLimit) detect sybils by identifying tightly connected communities of sybil nodes. However, up till now no large-scale studies have been performed to characterize the behavior of sybils in online social networks 'in the wild'; therefore, the assumptions underlying these algorithms remain untested (Yang et al. 1).

7.2 Linkability
Orita and Hada (19) have proposed a way to identify the originality of an entity by collecting related information to establish someone's identity inductively, with the linkability of transactions as the key factor. Linkability is an element of online identity. The originality of a profile is proven by the extent of linkability and its direction.
It is assumed that if there are bidirectional links between two or more entities, they recognize each other and are possibly created by the same user (Orita and Hada 20). Their experiment on Twitter showed that identified pairs can clearly be separated from suspicious pairs based only on the direction of links. The experiment examined fake accounts named after 'Barack'. They collected datasets of suspicious accounts and checked for the existence of bidirectional links. "As the result shown in [figure 6], both the account (1) and (5) have bidirectional links between Barack Obama's official website. These accounts seem to have tight relationship with the original site. Compared to that, the account (2), which also has a link to the official website, has only uni-directional link.
Though 'ObamaNews' account seems to pretend to have tight relationship with the official website, there is no return link from the official website. The account (2) might be a suspicious one" (Orita and Hada 19).

Figure 6: "Obama accounts" (Orita and Hada 19)

The tool Valebrity (a tool to validate 'social celebrities') is an example of a tool that also works with this concept of bidirectional linkability. It tries to discover fake Twitter accounts by detecting similar accounts on major social networking sites and blogs (Orita and Hada 19).
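The bidirectional-link test itself is straightforward to sketch. Assuming the two page sources have already been fetched, the check below looks for a link in each direction; the URLs and HTML snippets are hypothetical, and a real implementation would download the pages and parse anchors properly.

import re

# Sketch of a bidirectional-link (linkability) test on pre-fetched page
# sources; the HTML below is hypothetical stand-in data.

profile_url = "https://twitter.com/BarackObama"
official_url = "https://www.barackobama.com"

profile_html = '... <a href="https://www.barackobama.com">my site</a> ...'
official_html = '... <a href="https://twitter.com/BarackObama">Twitter</a> ...'

def links_to(html, url):
    # True if the page source contains a hyperlink to `url`.
    return re.search(re.escape(url), html) is not None

outlink = links_to(profile_html, official_url)   # profile -> official site
backlink = links_to(official_html, profile_url)  # official site -> profile

if outlink and backlink:
    print("bidirectional link: account is plausibly authentic")
elif outlink:
    print("unidirectional link only: suspicious (no return link)")
else:
    print("no linkability evidence")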
7.3 Language analysis
Social network sites are characterized by the lack of routine feedback such as body language, tone of voice or facial expressions. This unavoidably puts greater emphasis on language use, which can be leveraged to support policing (Hughes et al. 125). Hughes et al. propose a criminal-identifying method based on language profiling, built on the idea that e-discovery problems can be tackled more efficiently through automated natural language analysis of traffic originating from digital communities such as social network sites (126). This method makes it possible to detect, for example, child predators pretending to be children and cyber-stalkers using fake personas. Their research focuses on peer-to-peer file sharing networks. Using a frequency profiling technique, they could detect popular words within the search query corpus, leaving out known high-frequency words such as the, and, of, and in. From the resulting list of terms, emergent unclassified terminology may be identified and, through association, classified, allowing for the discovery of criminal terminology (Hughes et al. 126). Although this research method is focused on media and files rather than on persons, it can also be applied to profiles, through comparative frequency analysis. "This technique compares the frequency profile generated from search terms to a frequency profile built from a reference corpus of general English. For each word in the profile, a log-likelihood statistical test is performed which estimates the significance of the difference in its frequency between the search term corpus and the reference corpus" (Hughes et al. 130). This way, the most overused words can be detected by sorting the resulting comparative profile on the log-likelihood value. If these top terms fit the domain-specific language corpus of pedophiles impersonating children (to gain access to a child), the language is most likely coming from those involved with pedophilia. The detection of this language on social networking sites may thus be used by law enforcement bodies to supplement their limited manual policing resources. The complicated part is not the method itself, but defining the corpus, which requires the establishment and extension of corpora of child and adult language in chat rooms and social network sites. Comparing samples of observed chat language with these corpora using natural language analysis techniques may offer some hints of adults impersonating children through their use of vocabulary (Hughes et al. 131).
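The log-likelihood comparison can be computed directly from two frequency profiles. The sketch below uses the standard corpus-linguistics form of the statistic (Dunning-style G2); the word counts for the 'target' and 'reference' corpora are hypothetical toy data, not corpora from Hughes et al.

import math
from collections import Counter

def log_likelihood(a, b, c, d):
    # a, b: frequency of the word in the target and reference corpora;
    # c, d: total number of tokens in each corpus.
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

# Hypothetical token counts: a small target corpus vs. a reference corpus.
target = Counter({"meet": 12, "age": 9, "school": 7, "the": 40})
reference = Counter({"meet": 3, "age": 2, "school": 5, "the": 5000})
c, d = sum(target.values()), sum(reference.values())

# Rank words by how strongly they are overused in the target corpus.
ranked = sorted(target, key=lambda w: log_likelihood(target[w], reference[w], c, d),
                reverse=True)
print(ranked)  # domain-specific terms surface first; 'the' ranks low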
7.4 Ranking social entities
You, Hwang, Nie and Wen introduce a solution to the problem of matching people's names to their corresponding social network identities, such as their Twitter accounts, in order to discover impersonation. The authors claim that existing tools in the entity ranking field have many shortcomings, "building upon naive textual matching and inevitably suffering from low precision due to false positives (fake impersonator accounts) and false negatives (accounts using nicknames)". To illustrate: the authors found that among the 42 percent of accounts using full names, which naive textual matching identifies as correct matches, only 17 percent are authentic; the rest are fake or confused Twitter identities. Another issue for naive textual matching is the use of nicknames, which naive name matching cannot capture (You et al. 1). To tackle these problems, they leveraged 'relational' evidence extracted from the Web corpus, such as Web document co-occurrences, which can be seen as an implicit counterpart of Twitter follower relationships. For a 'Bill Gates' query, for example, the system will retrieve and visualize related names, such as 'Steve Jobs', that often co-occur with 'Bill Gates' in online documents. They learned a ranking function called SocialSearch that aggregates these features for the precise ordering of candidate matches. It aims at matching a node from an entity-relationship graph (obtained from the search engine Microsoft EntityCube) to the Twitter graph. From the two graphs, they extract both textual and relational features and train two classifiers: one for ranking candidates by relevance for the given query and one for classifying whether there is a matching Twitter profile among the candidates. Given the example of the query name 'Bill Gates', a candidate account 'BillGates' is more likely to be a match in terms of relational relevance if its related nodes (based on Twitter follower relationships) match the names of the adjacent nodes in the entity-relationship graph (based on frequent co-occurrence in web documents). The difference between co-occurrences and follower relationships is that the former are bidirectional, while the latter are unidirectional. As illustrated by figure 7, using the query 'Barack Obama', the entity 'Bill Clinton', with a bidirectional follower relationship, may have a different strength compared to the entity 'John McCain', which has a unidirectional follower relationship.

Figure 7: Features on Relation (You et al. 3)

Thus, You et al. separated them into two different features. For each type, they counted the number of common neighbors, both in the EntityCube graph and in the Twitter graph, as features.
The authors used A-type textual matches (all containment, meaning the account name contains all the words of the query name) for matching, and S-type textual matches (some containment, meaning the account name contains some words of the query name) for counting common neighbors. They use the number of common neighbors as a feature to distinguish matches from non-matches.
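This common-neighbor feature can be sketched as a simple set intersection. The entity graph and Twitter neighbor lists below are hypothetical toy data standing in for the EntityCube and Twitter graphs.

# Sketch of the relational feature: overlap between an entity's neighbors
# in a web co-occurrence graph and a candidate account's Twitter neighbors.
# All names and edges below are hypothetical toy data.

entity_neighbors = {
    "Bill Gates": {"Steve Jobs", "Melinda Gates", "Paul Allen"},
}

# Twitter neighbors of each candidate account, already resolved to names.
twitter_neighbors = {
    "BillGates": {"Melinda Gates", "Paul Allen", "Some Fan"},
    "BillGates_fake": {"Some Fan", "Another Fan"},
}

def common_neighbor_feature(query, candidate):
    # Number of names adjacent to the query entity that also appear
    # among the candidate account's Twitter neighbors.
    return len(entity_neighbors[query] & twitter_neighbors[candidate])

for account in twitter_neighbors:
    print(account, common_neighbor_feature("Bill Gates", account))
# BillGates 2, BillGates_fake 0: the authentic account shares neighbors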
In addition to the relevance features, You et al. also take the popularity of the corresponding Twitter accounts into consideration, calculated by the number of follower in-links. As EntityCube entities correspond to people with a prominent Web presence, this popularity metric indicates how probable it is that the Twitter account has a prominent Web presence (You et al. 5). The authors tested their learning model and it appeared to work accurately. However, the work of You et al. did not address the issue of name disambiguation.

7.5 Socialbots
In the specific case of detecting automated socialbots on social networking sites, many techniques have been proposed that aim to automatically identify them based on their abnormal behavior (Stringhini, Kruegel, and Vigna). For Facebook, Stein et al. have presented the Facebook Immune System (FIS), an adversarial learning system performing real-time checks and classification on every read and write action on Facebook's database, in order to protect its users from socialbots (Boshmaf et al. 1). However, it is not well understood how such defenses stand up against socialbots that mimic real users. Boshmaf et al. tried to fill this knowledge gap by treating large-scale infiltration of social networking sites as an organized campaign that is "run by an army of socialbots to connect to either random or targeted online social networking site users on a large scale" (2). They adopted a traditional web-based botnet design, one that can exploit the identified properties of social networking sites and use them as heuristics to define commands, which increases the extent of the possible infiltration of the targeted website. The results of their testing showed that a successful infiltration of a social network site user can be expected within the first three days after the request has been sent by a socialbot. They measured that the FIS was able to block only 20 percent of the socialbot accounts, and even this was only the result of user feedback flagging these accounts as spam. They did not observe any evidence of detection of organized large-scale infiltration by the FIS (Boshmaf et al. 2).
Kontaxis et al. propose a tool that automatically seeks and identifies cloned profiles in social networking sites. It consists of three main components: the Information Distiller (responsible for extracting information from the authentic social network account), the Profile Hunter (processing user records and using the user-identifying terms to locate social network profiles that may possibly belong to the user) and the Profile Verifier (processing profile records and extracting the information available in the harvested social profiles) (Kontaxis et al. 2). The key concept of the tool is that it employs user-specific (or user-identifying) data, collected from the user's original online social network profile, to locate similar profiles across social networking sites. Any returned results are considered suspicious, depending on how rare the common profile data is considered to be, and these profiles are inspected further. In conclusion, the user receives a list of possible profile clones and a score indicating their degree of similarity with the user's own profile (Kontaxis et al. 3). The authors tested their tool on LinkedIn. In their controlled experiment, cloning ten existing profiles on the online social network that belonged to members of their lab, they were able to detect all profile clones without any false positives or negatives. Finally, the authors tried to detect existing profile duplicates. The Profile Hunter component returned at least one clone for 7.5 percent of the user profiles. After manual inspection, Kontaxis et al. verified that all detected profiles pointed to the actual person and that the score produced by the Profile Verifier was correct. However, the authors cannot be sure whether those clones are the product of a malicious act or can be attributed to misconfiguration. Furthermore, their prototype tool may have missed cloned profiles where the adversary intentionally injected mistakes with the purpose of escaping detection. Another limitation is that the prototype system relied on the precise matching of fields and did not employ image comparison techniques for detecting cloned accounts. The authors want to conduct a study to estimate the error threshold for image comparison (Kontaxis et al. 6).
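The three-stage structure of the tool can be summarized as a pipeline skeleton. The functions below are hypothetical reductions of the Information Distiller, Profile Hunter and Profile Verifier, using naive field matching in place of the real components' logic.

# Skeleton of a distiller -> hunter -> verifier pipeline in the spirit of
# Kontaxis et al.; the data and matching logic are simplified placeholders.

def distill(profile):
    # Information Distiller stand-in: use the profile's field values as
    # user-identifying terms (a real distiller would weigh their rarity).
    return set(profile.values())

def hunt(terms, candidate_profiles):
    # Profile Hunter stand-in: keep profiles sharing any identifying term.
    return [p for p in candidate_profiles if terms & set(p.values())]

def verify(original, candidates):
    # Profile Verifier stand-in: score candidates by share of matching fields.
    scored = []
    for p in candidates:
        common = set(original) & set(p)
        score = sum(1 for k in common if original[k] == p[k]) / len(common)
        scored.append((p["name"], round(score, 2)))
    return scored

me = {"name": "Jane Doe", "employer": "ACME", "city": "Amsterdam"}
others = [
    {"name": "Jane Doe", "employer": "ACME", "city": "Utrecht"},
    {"name": "John Roe", "employer": "Initech", "city": "Boston"},
]
print(verify(me, hunt(distill(me), others)))  # [('Jane Doe', 0.67)]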
8. RESULTS

Not all of the methods, tools, and techniques discussed in the previous chapter were developed for detecting imposter fraud; many of them focus on detecting spam. Take, for instance, the tool presented by Brendel and Krawczyk. Their algorithm tries to detect imminent threats to a certain network or community. The critical part of their approach is the process of building relation pattern graphs that describe typical and abnormal behavior within the network. In their tests on an e-mail community, role identification failed by missing 20 percent of the spammers, classifying them as regular users. The main reason for this was that spammers were able to reach local communities of e-mail users; although their method is very efficient in recognizing typical user activity, additional pattern graphs would have to be made for abnormal behavior. In the case of Twitter, it would be hard to distinguish 'typical imposter behavior' from 'regular user behavior' and to find the right pattern graph, especially since the behavior of well-performed identity theft on Twitter is identical to that of regular users. Moreover, pattern graphs can also change over time, which makes behavior classification even more difficult.

Many of the described methods, tools, and techniques have been demonstrated to be feasible and effective through their experiments. However, this was often the case for another platform, such as the detection schemes proposed by Jin et al. Their experiments were done on Facebook, while Twitter has a different network structure, based on followers and following profiles instead of a two-way friend system. The same applies to the profile-cloning detection tool of Kontaxis et al., which is currently only developed for use on LinkedIn. Another axis along which their prototype can be enhanced lies in the accuracy of comparing two profiles and assigning a similarity score. Their current Profile Verifier looks for precise string matches in information fields when comparing two profiles. Instead of looking for exact matches, the authors suggest using fuzzy string matching to overcome mistyped or intentionally injected information.

There may be other reasons why a presented tool or method is not suitable for detecting imposter fraud on Twitter. The evaluation of the SocialSearch ranking function presented by You et al. empirically validated the accuracy of their algorithm on the real-life datasets they tested. However, the question of how relational features can be used for name disambiguation is left open, and this remains a key issue in detecting impersonated accounts. The study of Shrivastava, Majumder, and Rastogi does not fulfill this purpose completely either. As the name 'random link attack' already suggests, the choice of victim sets is random: each node in the graph has an equal probability of being chosen and spammed, as opposed to the often targeted 'classic' identity theft attacks. In addition, the authors focused on static social network graphs, but in practice these are constantly changing, as both nodes and edges are added while the networks evolve. Another property making random link attacks hard to detect is their collaborative nature: attackers masquerade as real users by connecting to each other.

9. A TOOL PROPOSED

Because no existing tool or method can satisfactorily detect and identify fake profiles automatically, I considered a different option. Based on the characteristics of fake profiles as described in the literature (see chapter 5), I have made a scheme that includes examples of online tools that can be used for e-discovery purposes, to check false profile characteristics one by one (a sketch of how these checks could be combined programmatically follows the table):

| Characteristic | Real profile | Fake profile | E-discovering method/tool |
|---|---|---|---|
| Follower-to-following ratio | Around 1.0, usually between 0.5 and 1.5 (celebrities are exceptions, with very high ratios) | Extremely high or extremely low | TFF Ratio - www.tffratio.com |
| Growth of followers | Steady, consistent | Unexplainable spikes | Twittercounter - www.twittercounter.com |
| Spam rating (various characteristics associated with junk accounts) | Low spam rating | High spam rating | Twitblock - www.twitblock.org |
| Profile picture and biography | Customized profile picture, filled-in bio description | Default egg avatar, no bio description | Tweepi - www.tweepi.com |
| Interconnectivity | - | Connected to other fake profiles | Twiangulate - www.twiangulate.com |
| Use of words | No 'chatspeak' or other broken form of language | 'Chatspeak' or other broken form of language | Tagxedo - www.tagxedo.com |
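To make the scheme operational, the checks in the table could be combined into a single suspicion score. The sketch below is a hypothetical aggregation of my own; every field name and threshold is an assumption, and in practice the values would come from the listed web tools rather than from a ready-made data structure.

```python
# A hypothetical sketch of how the table's manual checks could be combined into
# one suspicion score. Thresholds and weights are assumptions for illustration;
# the thesis applies these checks by hand with the listed web tools.
def suspicion_score(profile: dict) -> float:
    """Fraction of 'fakester characteristics' a profile exhibits (0..1)."""
    checks = [
        # Extreme follower-to-following ratio (regular users sit near 1.0;
        # celebrities are a known exception for this check, see the table).
        profile["ff_ratio"] < 0.5 or profile["ff_ratio"] > 1.5,
        # Unexplained spikes in follower growth.
        profile["has_growth_spikes"],
        # High spam rating from a service such as Twitblock.
        profile["spam_rating"] > 0.5,
        # Default avatar and empty biography.
        profile["default_avatar"] and not profile["bio"],
        # Connected to other suspected fake profiles.
        profile["linked_to_fakes"],
        # Chatspeak or otherwise broken language in tweets.
        profile["uses_chatspeak"],
    ]
    return sum(checks) / len(checks)

candidate = {"ff_ratio": 340.0, "has_growth_spikes": False, "spam_rating": 0.1,
             "default_avatar": False, "bio": "Actor", "linked_to_fakes": True,
             "uses_chatspeak": False}
print(suspicion_score(candidate))  # 2 of 6 checks fire -> ~0.33
```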
In detecting fake profiles, it is important to use multiple tools alongside each other, because a single characteristic does not tell the whole story. For example, a high number of followers or an average follower-to-following ratio does not automatically mean the account is trustworthy, because this ratio can be manipulated. People can quickly gain followers by following an unselective number of people, waiting for them to follow back, and then un-following them in groups. Or they can participate in a number of 'services' that accumulate followers on their behalf (Dugan). Therefore, it is important to also look at the growth history of the followers. If this number goes up fairly steadily, chances are the account is legitimate. But if there are one or more irregular jumps in the graph, such as a thousand new followers in a single day, it is probably a scammer. Finally, there is the 'manual' checking step. An account can have a large number of followers, with large spikes in the follower growth graph, and still belong to a legitimate, existing user. This could be the case when the profile belongs to a celebrity who was big in the news at a certain point in time and for that reason gained a huge number of new followers at once.

This scheme is most suitable for detecting fake profiles. Detecting imposter fraud and impersonators is a lot harder, for example on the point of the profile picture. In cases of profile porting and profile cloning, the attacker usually copies the victim's profile picture from the same or another social networking site. To see whether another profile is using the same image, one can use a reverse image search tool. TinEye (www.tineye.com) is an example of such a service. You can submit an image to the search engine to find out where it came from and whether the image, or a modified version of it, exists in any other locations. However, the tool still falls short in the field of social media sites, especially when pictures appear in a private, friends-only, or password-protected area. Detecting a copied profile biography is a lot easier and can be performed through Google, for instance, but this too only works when the profiles are publicly accessible.
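The 'irregular jumps' check lends itself to a simple automated pass over a follower-growth history, the kind of curve one would otherwise eyeball in a Twittercounter graph. The sketch below is a minimal illustration under assumed data (a list of daily follower counts); the spike threshold is an arbitrary assumption, not a validated parameter.

```python
# Illustrative sketch, not part of the thesis' tooling: flag irregular jumps in
# a follower-growth history given as daily follower counts.
def growth_spikes(daily_counts: list[int], threshold: float = 5.0) -> list[int]:
    """Return day indices whose gain exceeds `threshold` times the median gain."""
    gains = [b - a for a, b in zip(daily_counts, daily_counts[1:])]
    positive = sorted(g for g in gains if g > 0) or [1]  # guard against flat series
    median_gain = positive[len(positive) // 2]
    return [i + 1 for i, g in enumerate(gains) if g > threshold * max(median_gain, 1)]

# A steady account gains roughly 100 followers per day; day 4 jumps by 5,000.
history = [10_000, 10_100, 10_220, 10_300, 15_300, 15_390]
print(growth_spikes(history))  # [4]
```

As the manual-checking caveat above makes clear, a flagged spike is only a lead for further inspection: a genuine celebrity can produce the same jump after a burst of news coverage.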
9.1 Case study

The American author and YouTube personality Kaleb Nation is known as the writer behind TwilightGuy, a blog dedicated to the Twilight novel series. On his official website, he posted a blog entry in 2009 expressing his irritation with fake Twitter accounts created for movie stars. With it he published a list of official and fake Twitter accounts of the actors starring in the Twilight movie. Many of the fake accounts have since been suspended. One of the profiles still existing at the moment of writing is @OfficialTL, pretending to be the actor Taylor Lautner.

Figure 8: Twitter header of the Twitter profile @OfficialTL (https://twitter.com/OfficialTL)

The profile is fake: there is no link to Taylor Lautner's official website [www.taylor-lautner.com], and the account only links to press photos, never including any personal ones; these are two of the most common features of fake celebrity profiles. Fake celebrity accounts often communicate online with other false celebrity profiles (Nation). @OfficialTL does this, among others, with @KStews20, a similar kind of fake profile created in the name of another Twilight star, Kristen Stewart, and with @MileyRcryus, pretending to be the actress and singer Miley Ray Cyrus, whose official account is @mileycyrus.

Figure 9: Screen capture of @OfficialTL's tweets

Following the theory that fake profiles are often interlinked with each other, it is interesting to look at the network structure of these profiles. A tool that visualizes this structure is Twiangulate [www.twiangulate.com]. It can graph Twitter's 'hidden networks', mapping the inner circle or mutual followers/friends of two different accounts. The mutual friend graph for the @OfficialTL and @MileyRcryus accounts shows a network of accounts that they follow in common, with, behind the account names, the percentage of mapped Twitter users each is linked to.

Figure 10: Mutual friends graph of @OfficialTL and @MileyRcryus

A closer look at some of the highest-scoring names in the graph shows that a lot of the common friends (figure 11) are actually verified accounts, such as [joejonas], [taylorswift13], and [Real_Liam_Payne]. Doing the same research but adding [KStews20] results in a much smaller graph. Only one extra account pops up: [VanessuHudgens], a fake account pretending to be the actress Vanessa Hudgens.

Figure 11: Fake (top) and real (bottom) Vanessa Hudgens account

Figure 12: Mutual friends of @OfficialTL, @MileyRcryus and @KStews20

Another possible point of distinction between a fake and a real profile is the use of language. In terms of language monitoring, as described in chapter 7.3, there are various techniques one could use, varying from computational linguistics and corpus-based natural language processing to keyword profiling and comparative frequency analysis. By computing a likelihood ratio that estimates differences in word frequency against a certain reference corpus, fake profiles can be detected based on the language used. Defining this corpus is still a challenging task. Hughes et al. identified domain-specific terminology for pedophiles pretending to be children (128). However, there is no general corpus that covers the language behavior of all kinds of imposters, since particular fakesters use particular, specific 'techniques'. But there are some indications. According to Kaleb Nation's 'Fake Twitter checklist', an account is probably fake if "the Twitter talks in Chatspeek or in some other broken form of English" (Nation). Based on this assumption, I performed a comparative word frequency analysis of a fake and the real account for both Miley Cyrus and Lady Gaga (one of the most followed and most faked celebrity Tweeters). One way to perform such an analysis is to generate word clouds with online tools such as Tagxedo, which can automatically crawl and scrape any Twitter account. The biggest advantage of using such a tool is that you do not have to read all of the tweets ever posted on the profile manually.
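The likelihood-ratio idea mentioned above can also be made concrete in code. One standard instantiation of such a likelihood ratio for keyword profiling is Dunning's log-likelihood statistic (G2); the sketch below uses it on invented word counts, since this thesis derives its frequencies from Tagxedo word clouds rather than from code.

```python
# Sketch of a likelihood-ratio comparison between a suspect account's word
# frequencies and a reference (verified) corpus, using Dunning's G2 statistic.
# All counts below are invented for illustration.
import math

def log_likelihood(k1: int, n1: int, k2: int, n2: int) -> float:
    """Dunning's G2 for one word with counts k1, k2 in corpora of sizes n1, n2."""
    p = (k1 + k2) / (n1 + n2)  # pooled rate; expected counts are n1*p and n2*p
    def term(k: int, n: int) -> float:
        # Observed * ln(observed / expected), with 0 * ln(0) taken as 0.
        return k * math.log(k / (n * p)) if k else 0.0
    return 2 * (term(k1, n1) + term(k2, n2))

# Hypothetical counts per 1,000 words of tweet text; the verified account's
# tweets play the role of the reference corpus, as in the case study.
suspect = {"plz": 14, "luv": 11, "follow": 25, "love": 8}
reference = {"plz": 0, "luv": 0, "follow": 3, "love": 12}
n_suspect, n_reference = 1000, 1000

for word in suspect:
    g2 = log_likelihood(suspect[word], n_suspect, reference.get(word, 0), n_reference)
    if g2 > 3.84:  # ~95% significance threshold for one degree of freedom
        print(word, round(g2, 2))  # flags 'plz', 'luv' and 'follow', not 'love'
```

The statistic singles out words that are markedly over-represented on the suspect account, which is exactly what the manual word-cloud comparison in the following tables approximates by eye.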
The tables below show the twenty-five most frequently used words on each Twitter profile (common words such as 'and', 'or', and 'the' excluded), derived from the generated word clouds. In this case, the real profiles serve as the reference corpora.

Figure 13: Screen capture of @Ladygaga26 tweets

For Miley Cyrus, there are quite a few words that appear frequently for both accounts: 'love', 'happy', 'day', 'thank', 'people', 'amazing', and 'awesome'. In the case of Lady Gaga, these common words are: 'love', 'think', 'Gaga', and 'know'.

| [mileycyrus] REAL | [mileyRcryus] FAKE | [LadyGaga] REAL | [LadyGaga26] FAKE |
|---|---|---|---|
| love - 227 | happy - 336 | tarasavelo - 336 | followers - 336 |
| amp - 189 | love - 318 | ladygaga - 318 | love - 318 |
| day - 155 | thank - 297 | love - 297 | hey - 297 |
| TeamMileyNY - 129 | think - 277 | monsters - 277 | come - 277 |
| JT - 110 | day - 259 | look - 266 | want - 259 |
| Novak - 94 | LiamHemsworths - 241 | feel - 259 | th - 241 |
| Na - 83 | new - 226 | New - 241 | Cutieface - 199 |
| happy - 73 | amp - 212 | Way - 212 | Gaga - 187 |
| CheyneThomas - 66 | time - 199 | Tea - 199 | update - 177 |
| thank - 59 | look - 187 | having - 199 | going - 167 |
| RealDenikaB - 54 | done - 177 | night - 187 | concert - 158 |
| god - 49 | feel - 167 | amazing - 177 | birthday - 150 |
| StylishCyrus - 45 | people - 158 | time - 167 | ur - 143 |
| today - 42 | know - 150 | ready - 158 | nice - 136 |
| think - 39 | make - 143 | Born - 150 | tell - 130 |
| make - 36 | officialtl - 136 | think - 143 | shopping - 124 |
| awesome - 34 | amazing - 130 | fans - 136 | tomorrow - 119 |
| amazing - 32 | demetialovato - 124 | excited - 130 | know - 114 |
| music - 30 | Ziggy - 119 | tonight - 124 | people - 109 |
| fight - 29 | close - 114 | Twitter - 114 | roll - 105 |
| judge - 27 | need - 109 | Gaga - 109 | think - 101 |
| people - 26 | realdenikab - 105 | know - 105 | good - 97 |
| FloydLilaZiggy - 24 | right - 101 | BTWBall - 101 | todays - 94 |
| twitter - 21 | awesome - 97 | unicorn - 97 | Russia - 91 |
| reading - 20 | VanessuHudgens - 91 | Thank - 91 | tweets - 85 |

In both cases, the table shows no big difference in the kind of words used by the real and fake accounts. Nor do the lists for the fake accounts show a significantly more frequent use of chatspeak. The follower-to-following ratios for the accounts @OfficialTL, @mileyRcryus, @VanessuHudgens, @KStews20, and @Ladygaga26 are, respectively, 1003.35, 339.12, 167.59, 167.91, and 2307.50 (derived from TFF Ratio on June 15th). These are all very high ratios for regular users, but not uncommon for celebrities. These numbers indicate that all these fake accounts did a good job of looking seemingly real; otherwise they would not have attracted such large numbers of followers. Looking at the follower growth history for all the accounts, there is likewise no striking observation to be made: all accounts show fairly steady and regular growth patterns.

Figure 14: Followers growth history for @OfficialTL

The same applies to the spam rating. Only the account [MileyRcryus] yields a deviating spam rating, but this number is too small to be significant.

Figure 15: Followers growth history for @OfficialTL

10. CONCLUSION

In today's modern society, social media is ubiquitous. Social media platforms provide possibilities for networking, but also for exploitation.
Since the rise of online social networks, which encourage the disclosure of personal data, access to others' personal information has become easier than ever, even in the case of anonymous profiles, as the layered identity representation scheme in figure 1 shows. One of the downsides is the creation of fake profiles by real people or social bots. The numerous cases and examples of imposter profiles on social network sites show the prevalence of the problem and the potential harm it can cause. Twitter's anti-impersonation measures do not solve all of the problems the service creates. Its verification method is still only open to celebrities, and even then it does not fully work as intended, since there are known cases of people who have undeservedly received the verification badge. Only when enough evidence has been collected, or after significant damage has been done, will Twitter remove accounts.

While identity theft has traditionally occurred through offline methods, online collection of stolen identity data can be easier and more efficient for adversaries, with new approaches and tricks being created and implemented under the cloak of electronic anonymity. Online identity theft can be performed through phishing, skimming, hacking, or impersonation. Impersonation is typical in cases of identity theft on social networking sites. But impersonation itself also takes many different forms, varying from profile faking and profile cloning (simply copying the whole targeted profile page and all its existing data, including disclosed information and properties, and in this way building trust relationships with followers) to profile porting attacks, in which the attacker creates a profile under the victim's identity on a social media platform where the victim is not present. The latter is more difficult to detect. Imposter fraud can be carried out by both humans and bots.

There are many different motivations for an adversary to impersonate a profile. A relatively innocent motivation is creating a fake profile of a celebrity to gain attention. In this case, it is most likely that the account will follow many other users, in the hope that they will notice the profile and follow it back. Twitter is somewhat different from other social networks in the sense that it is more common to add and follow people one does not know. For this reason, Twitter is an example of a directed graph network, as opposed to, for instance, Facebook and LinkedIn, which are examples of undirected graph networks. Users usually do not have bidirectional links with celebrities on Twitter, but one-directional ties.

The three structural elements of the online identity as defined by Pfitzmann and Hansen (unlinkability, undetectability, and unobservability) make it difficult to determine whether an identity presented on a profile is fake or real. Additionally, screen names can be altered and messages can be deleted in seconds. This is where e-discovery becomes important, especially in finding proof to present in court. Deleted information can, for example, be recovered from the caches of search engines. Also, there are various e-discovery tools available for storing and preserving archived information from online social networks. Many e-discovery techniques are aimed at detecting spammers; several third-party applications for social networking sites have been proposed and employed for detecting and protecting users against spam and content polluters.
As Jin, Takabi, and Joshi have stated, detecting fake identities is very challenging, especially because of the problem of multiple occurrences of the same name. Most existing tools are based on algorithms. Detection through algorithms is made even harder by the fact that profiles subjected to imposter fraud generally do not have typical features, except that most of the time they are very hard to distinguish from a real account; spam accounts, by contrast, do have many characteristic features. Most of the time, a fake profile has the same or a similar name to that of the victim, so a search method for detecting these duplicate names is an option, but there has to be a way of distinguishing same-name entities from impersonated entities. Research based on attribute similarity and friend-network similarity could help here, as demonstrated in the work of Jin, Takabi, and Joshi, which calculates profile similarities with the victim's profile.

Not only do the various forms of impersonation make it very hard to find one method that can detect all faked or impersonated profiles; there are also constraints specific to the Twitter sphere that hamper the e-discovery of fake identities. The first is Twitter's own identity and impersonation policy, under which fake accounts are deleted upon discovery; this requires real-time monitoring. Then there is the limitation in the level of access. Many detection tools and methods will not work for private profiles (one would have to befriend such accounts to get access to the data, an ethically problematic action in e-discovery). Additionally, due to Twitter's privacy policy, many third-party applications and online (statistics) tools only work when one has access to the account's login credentials.

For my case study, I only used tools that are openly accessible and can be applied to all kinds of public profiles, where befriending is not required. The case study tested five different fake celebrity profiles on various 'fakester characteristics'. The results of this case study show that the method only works one way: although a profile exhibiting one or more of the characteristics listed in the table in chapter 9 is most likely to be fake, a profile not exhibiting these typical behaviors is not automatically real. Therefore, these tools are not suitable as full-fledged discovery methods, but are only valid as additional check-ups for suspicious accounts, with the 'mutual friend graph' method having proven to give the most reliable fake-account indications of all the tested tools. This case study only involved false celebrity accounts. The reason for this is that it is very hard to use fake accounts of real, non-famous people as an object of study, because there is no verification system for these kinds of accounts yet. It is therefore not possible to do the research based on a reference profile, since the authenticity of such a real account cannot be established on legitimate grounds.

To conclude, the most suitable discovery approach depends on the sort of fake profile the detection is aiming at, on both the side of the adversary (bot, fan, spammer) and the side of the victim (famous/non-famous), and on the way the fraud or impersonation is performed (automated, the creation of a whole new fake identity, profile faking, cloning, or porting).
For instance, language profiling will work for a target group known to have a specific and definable language corpus, such as pedophiles; the spam rating tool is a good option for testing whether accounts are fake for the purpose of spamming; and an all-embracing user recognition tool combined with content-based and linkability analysis would be best suited to detect general imposters.
BIBLIOGRAPHY

Ahearn, Tom. "Users of Popular Social Networking Sites Facebook and Twitter Warned About Identity Theft." My Background Check. 2009. 4 April 2012. <http://www.mybackgroundcheck.com/blog/post/2009/10/21/Users-of-Popular-Social-Networking-Sites-Facebook-and-Twitter-Warned-About-Identity-Theft.aspx>.

Bayardo, Roberto J., Yiming Ma, and Ramakrishnan Srikant. Scaling up All Pairs Similarity Search. Proceedings of the 16th International Conference on World Wide Web. Banff, 2007: 131-140.

Bhumiratana, Bhume. A Model for Automating Persistent Identity Clone in Online Social Networks. Conference Proceedings of the 2011 International Joint Conference of IEEE TrustCom. 2011.

Bilge, Leyla et al. All Your Contacts Are Belong to Us: Automated Identity Theft Attacks on Social Networks. Conference Proceedings of the 18th International World Wide Web Conference. 2009.

Boshmaf, Yazan et al. The Socialbot Network: When Bots Socialize for Fame and Money. Conference Proceedings of the 27th Annual Computer Security Applications Conference. 2011.

Bowker, Natilene and Keith Tuffin. "Disability Discourses for Online Identities." Disability & Society 17.3 (2002): 327-344.

boyd, danah. "Why Youth (Heart) Social Network Sites: The Role of Networked Publics in Teenage Social Life." Youth, Identity and Digital Media. Ed. David Buckingham. Cambridge: The MIT Press, 2008. 16.

boyd, danah and Nicole B. Ellison. "Social Network Sites: Definition, History, and Scholarship." Journal of Computer-Mediated Communication 13.1 (2007).

Boykin, Oscar and Vwani Roychowdhury. "Leveraging Social Networks to Fight Spam." IEEE Computer Society 38.4 (2005): 61-68.

Brendel, Radoslaw and Henryk Krawczyk. "Application of Social Relation Graphs for Early Detection of Transient Spammers." WSEAS Transactions on Information Science and Applications 5.3 (2008): 267-276.

Burrell, Wesley. "I Am He as You Are He as You Are Me: Being Able to Be Yourself, Protecting the Integrity of Identity Online." Digital Commons at Loyola Marymount University and Loyola Law School 44.2 (2011): 707-750.

Coburn, Zack and Greg Marra. "Realboy: Believable Twitter Bots." Computer Architecture. 2011. 17 April 2012. <http://ca.olin.edu/2008/realboy>.

Crawford, Susan. "Who's in Charge of Who I Am?: Identity and Law Online." New York Law School Law Review 49 (2004): 211-230.

Cutillo, Leucio Antonio and Refik Molva. "Safebook: A Privacy-Preserving Online Social Network Leveraging on Real-Life Trust." IEEE Communications Magazine 47.12 (2009): 94-101.

Douceur, John. The Sybil Attack. Proceedings of the 1st International Workshop on Peer-to-Peer Systems. Cambridge, 2002.

Dugan, Lauren. "How to Tell if Someone Has a Fake Follower Count." Mediabistro. 2011. 4 June 2012. <http://www.mediabistro.com/alltwitter/how-to-tell-if-someone-has-a-fake-follower-count_b14898>.

Edwards, Lachy. "Online Identity Part II: The Notion of Identity as Fluid and the Influence of our Offline Selves." An Online Identity? 2006. Identity in the Online Environment. 18 April 2012. <http://onlineidentity.blogspot.com/2006/09/online-identity-part-ii-notion-of.html>.

Finch, Emily. "What a Tangled Web We Weave: Identity Theft and the Internet." Dot.cons: Crime, Deviance and Identity on the Internet. Ed. Yvonne Jewkes. Cullompton, England: Willan, 2003. 86-104.

Frankel, Alison. "Twitter, Facebook and the Peril of E-discovery." News & Insight. 2011. Thomson Reuters. 26 April 2012. <http://newsandinsight.thomsonreuters.com/Legal/News/ViewNews.aspx?id=23834&terms=@ReutersTopicCodes+CONTAINS+'ANV'>.

Gross, Ralph and Alessandro Acquisti. Information Revelation and Privacy in Online Social Networks. Proceedings of the ACM Workshop on Privacy in the Electronic Society. Alexandria: WPES, 2005.

"Handreiking Politie Identiteitsfraude." Programma Versterking Identiteitsketen Publieke Sector. 2010. Ministerie van Veiligheid en Justitie. 2 May 2012. <http://www.overheid.nl/media/downloads/Handreiking_politie_Identiteitsfraude.pdf>.

Heels, Erik. "How to Twittersquat the Top 100 Brands." Erik J. Heels Blog. 2009. 9 May 2012. <http://www.erikjheels.com/1298.html>.

Huffaker, David. "Gender Similarities and Differences in Online Identity and Language Use among Teenage Bloggers." Washington, D.C.: Georgetown University, 2004.

Hughes, Danny et al. Supporting Law Enforcement in Digital Communities through Natural Language Analysis. Eds. Sargur N. Srihari and Katrin Franke. Proceedings of the 2nd International Workshop on Computational Forensics (IWCF 2008). Berlin, Heidelberg: Springer-Verlag, 2008: 122-134.

Java, Akshay, Xiaodan Song, Tim Finin and Belle Tseng. Why We Twitter: Understanding Microblogging Usage and Communities. Proceedings of the 9th WebKDD and 1st SNA-KDD Workshop on Web Mining and Social Network Analysis. ACM, 2007: 56-65.

Jin, Lei, Hassan Takabi, and James B. Joshi. Towards Active Detection of Identity Clone Attacks on Online Social Networks. Proceedings of the First ACM Conference on Data and Application Security and Privacy, CODASPY '11. New York: ACM, 2011: 27-38. <http://doi.acm.org/10.1145/1943513.1943520>.

Kannan, Karthik. "The Convergence of eDiscovery and eCompliance." SC Magazine. 2009. 26 April 2012. <http://www.scmagazine.com/the-convergence-of-ediscovery-and-ecompliance/article/140563/>.

Katz, Leslie. "Twitter to Roll Out 'Verified Accounts' This Summer." CNET News. 2009. 2 May 2012. <http://news.cnet.com/8301-1023_3-10258816-93.html>.

Kay, Bradley. "Extending Tort Liability to Creators of Fake Profiles on Social Networking Websites." Chicago-Kent Journal of Intellectual Property 10.1 (2010): 2-23.

Klooster, Erik. "Identiteitsdiefstal op Twitter: niet alleen bij VIP's." Radio Nederland Wereldomroep. 2011. 23 January 2012. <http://www.rnw.nl/nederlands/article/identiteitsdiefstal-op-twitter-niet-alleen-bij-vip%E2%80%99s>.

Kontaxis, Georgios, Iasonas Polakis, Sotiris Ioannidis and Evangelos P. Markatos. Detecting Social Network Profile Cloning. Proceedings of the 3rd IEEE International Workshop on Security and Social Networking (SESOC). Seattle: 2011.

Lee, Kyumin, Brian David Eoff and James Caverlee. Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter. Conference Proceedings of the Fifth International Conference on Weblogs and Social Media. Barcelona: 2011.

Malachowski, Dan. "Username Jacking in Social Media: Should Celebrities and Brand Owners Recover from Social Networking Sites When Their Social Media Usernames Are Stolen?" DePaul Law Review 60.1 (2010): 223-241.

Marshall, Angus and Brian Tompsett. "Identity Theft in an Online World." Computer Law & Security Report 21 (2005): 128-137.

Marwick, Alice E. and danah boyd. "I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience." New Media and Society 13 (2011): 96-113.

Matier, Phillip and Andrew Ross. "Air Board Pays $75K for Columnist's Speech." San Francisco Chronicle. 2009. 4 April 2012. <http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/05/10/BA8517HE1E.DTL>.

McDonald, Iain. "How to Spot a Twitter User with a 'Fake' Follower Count." Amnesia Blog. 2009. 4 June 2012. <http://amnesiablog.wordpress.com/2009/03/22/how-to-spot-a-twitter-user-with-a-fake-follower-count>.

Mercuri, Rebecca. "Scoping Identity Theft." Communications of the ACM 49.5 (2006): 17-21.

Mihalko, Daniel L. "Fighting Identity Theft - The Role of FCRA." Identity Theft: The Aftermath 3. 2009. 4 April 2012. <http://www.idtheftcenter.org/artman2/uploads/1/Aftermath_2008_20090520.pdf>.

Milne, George, Andrew Rohm and Shalini Bahl. "Consumers' Protection of Online Privacy and Identity." The Journal of Consumer Affairs 38.2 (2004): 217-232.

Morozov, Evgeny. "Swine Flu: Twitter's Power to Misinform." Foreign Policy. 2009. 16 April 2012. <http://neteffect.foreignpolicy.com/posts/2009/04/25/swine_flu_twitters_power_to_misinform>.

Nation, Kaleb. "How to Spot a Fake Twitter." Kaleb Nation Official Website and Blog. 2009. 7 June 2012. <http://www.kalebnation.com/blog/2009/06/20/how-to-spot-a-fake-twitter/>.

Nelson, Sharon, John Simek and Jason Foltin. "The Legal Implications of Social Networking." The Journal of the Legal Profession 22.1 (2009).

"Nep Twitter-accounts." De Wereld Draait Door. VARA. 9 January 2012. <http://dewerelddraaitdoor.vara.nl/media/83973>.

Nielsen, Nikolaj. "EU to Set Up Anti-Cyber-Crime Centre." EU-Observer. 2012. Justice & Home Affairs. 1 April 2012. <http://www.euobserver.com/9/115735>.

Nosko, Amanda, Eileen Wood and Seija Molema. "All About Me: Disclosure in Online Social Networking Profiles: The Case of Facebook." Computers in Human Behavior 26 (2010): 406-418.

O'Carroll, Lisa and Josh Halliday. "Wendi Deng Twitter Account is a Fake." The Guardian. 2012. Guardian News and Media Limited. 23 January 2012. <http://www.guardian.co.uk/media/2012/jan/03/wendi-deng-twitter-account-fake>.

"Online Identity Theft." OECD report. 2009. OECD Publishing. 3 May 2012. <http://www.oecd.org/document/44/0,3746,en_2649_34267_42420716_1_1_1_1,00.html>.

Orita, Akiko and Hisakazu Hada. "Is That Really You? An Approach to Assure Identity Without Revealing Real-name Online." Digital Identity Management (2009): 17-20. 23 January 2012. <http://www.cs.jhu.edu/~sdoshi/jhuisi650/papers/spimacs/SPIMACS_CD/dim/p17.pdf>.

Pfitzmann, Andreas and Marit Hansen. "Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management, a Consolidated Proposal for Terminology." 2008. 2 May 2012. <http://dud.inf.tu-dresden.de/literatur/Anon_Terminology_v0.31.pdf>.

Pontell, H.N. "Policy Essay: Identity Theft: Bounded Rationality, Research and Policy." Criminology & Public Policy 8.2 (2009): 263-279.

Poster, Mark. "CyberDemocracy: Internet and the Public Sphere." University of California, School for Humanities. 1995. 18 April 2012. <http://www.hnet.uci.edu/mposter/writings/democ.html>.

Riley, Kathryn. "Misappropriation of Name or Likeness Versus Invasion of Right of Publicity." Contemporary Legal Issues (2001): 587-599.

Rustad, Michael L. and Thomas H. Koenig. "Cybertorts and Legal Lag: An Empirical Analysis." Southern California Interdisciplinary Law Journal 13 (2003).

Saco, Diana. Cybering Democracy: Public Space and the Internet. Minneapolis: University of Minnesota Press, 2002.

Samson, Martin. "Jane Doe v. Friendfinder Network Inc." Internet Library of Law and Court Decisions. 2008. 8 April 2012. <http://www.internetlibrary.com/cases/lib_case600.cfm>.

Schaffer, Neal. "Twitter Followers vs Following: What is the Ideal Ratio?" Windmill Networking. 2009. 4 June 2012. <http://windmillnetworking.com/2009/08/12/twitter-followers-following-quality-or-quantity/>.

Schoemaker, René. "Kamer wil nep Twitteraccounts strafbaar maken." Webwereld. 2012. IDG Nederland. 23 January 2012. <http://webwereld.nl/nieuws/109199/kamer-wil-nep-twitteraccounts-strafbaar-maken---update.html>.

Shekar, Chandra, Kathy J. Liszka and Chien-Chung Chan. Twitter on Drugs: Pharmaceutical Spam in Tweets. Conference Proceedings of the 2011 International Conference on Security and Management. Las Vegas, 2011: 193-198.

ShockWaveWriter. "If It's Too Good to Be True, It Usually Is!" Computer Fraud & Security 10 (2000): 18-19.

Shrivastava, Nisheeth, Anirban Majumder and Rajeev Rastogi. "Mining (Social) Network Graphs to Detect Random Link Attacks." Proceedings of the 24th International Conference on Data Engineering (ICDE 2008): 486-495.

Stein, Tao, Erdong Chen and Karan Mangla. Facebook Immune System. Proceedings of the 4th Workshop on Social Network Systems, SNS '11. New York: ACM, 2011.

Stephenson, Correy E. "E-Discovery Implications of Twitter." Lawyers USA 8.101 (2008): 729. 26 April 2012. <http://www.lawyersusaonline.com/index.cfm/archive/view/id/432466>.

Sterritt, Shannon N. "Applying the Common-Law Cause of Action Negligent Enablement of Imposter Fraud to Social Networking Sites." National Security Policy and the Role of Lawyering: Guantanamo and Beyond 4.41 (2012).

Stone, Brad. "Viruses That Leave Victims Red in the Facebook." The New York Times. 2009. 4 April 2012. <http://www.nytimes.com/2009/12/14/technology/internet/14virus.html>.

Stringhini, Gianluca, Christopher Kruegel, and Giovanni Vigna. Detecting Spammers on Social Networks. Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC '10. New York: ACM, 2010.

Stutzman, Frederic. "An Evaluation of Identity-Sharing Behavior in Social Network Communities." International Digital and Media Arts Journal 3.1 (2006): 10-18.

Suler, John. "The Online Disinhibition Effect." CyberPsychology & Behavior 7.3 (2004): 321-325.

Tsoutsanis, Alexander. "Tackling Twitter and Facebook Fakes: Identity Theft in Social Media." World Data Protection Report. 2012. Bloomberg BNA. 2 May 2012. <http://www.bnai.com/TacklingIdentityTheft/default.aspx>.

Tsoutsanis, Alexander. "Wet Draait Door: 'Wetsvoorstel' Ficq tegen Neptweets geen Goed Idee." Mediaforum 2 (2012): 37. <http://ssrn.com/abstract=2004720>.

Van Oorschot, Paul and Stuart Stubblebine. Countering Identity Theft through Digital Uniqueness, Location Cross-Checking, and Funneling. Conference Proceedings of the 9th International Conference on Financial Cryptography and Data Security. Roseau, 2005: 31-43.

Veltman, Paulus. Aan de slag met Twitter. 2011. 17 May 2012. <http://paulusveltman.nl/files/Aan_de_slag_met_Twitter_versie_1.0.pdf>.

Von Ahn, Luis, Manuel Blum, Nicholas Hopper, and John Langford. "CAPTCHA: Using Hard AI Problems for Security." EUROCRYPT 2656. Ed. E. Biham. New York: Springer, 2003.

Wang, Alex Hai. Don't Follow Me: Twitter Spam Detection. Proceedings of the 5th International Conference on Security and Cryptography (SECRYPT). Athens, 2010.

Wang, Wen Jie, Yufei Yuan and Norm Archer. "A Contextual Framework for Combating Identity Theft." IEEE Security & Privacy (2006): 30-37.

Watts, Duncan, Peter S. Dodds and M.E.J. Newman. "Identity and Search in Social Networks." Science 296 (2002): 1302-1305.

Watts, Duncan, Steven Strogatz and M.E.J. Newman. "Random Graph Models of Social Networks." PNAS 99.1 (2002): 2566-2572.

Watts, Duncan and Steven Strogatz. "Collective Dynamics of 'Small-World' Networks." Nature 393 (1998): 440-442.

Weng, Jianshu, Ee-Peng Lim, Jing Jiang and Qi He. TwitterRank: Finding Topic-sensitive Influential Twitterers. Proceedings of the Third ACM International Conference on Web Search and Data Mining. New York: ACM, 2010.

"Wet False Twitter-accounts Overbodig." Security.nl. 2012. The Security Council. 23 January 2012. <http://www.security.nl/artikel/39844/1/%22Wet_tegen_valse_Twitteraccounts_overbodig%22.html>.

Whitson, Jennifer R. and Kevin D. Haggerty. "Identity Theft and the Care of the Virtual Self." Economy and Society 37.4 (2008): 572-594.

Williams, Christopher. "Morocco Jails Facebook Faker." The Register. 2008. 1 May 2012. <http://www.theregister.co.uk/2008/02/25/morocco_prince_facebook_sentence/>.

Wong, Annette. "Cyberself: Identity, Language and Stylisation on the Internet." Cyberlines: Languages and Cultures of the Internet. Ed. Donna Gibbs and Kerri-Lee Krause. Melbourne: James Nicholas, 2000.

You, Gae-Won, Seung-won Hwang, Zaiqing Nie and Ji-Rong Wen. SocialSearch: Enhancing Entity Search with Social Network Matching. Conference Proceedings of the 14th International Conference on Extending Database Technology, March 2011, Uppsala. New York: ACM, 2011.

Zilpelwar, Rashmi, Rajneeshkaur Bedi and Vijay Wadhai. "An Overview of Privacy and Security in SNS." International Journal of P2P Network Trends and Technology 2.1 (2012): 23-26.

Zinman, Aron and Judith S. Donath. Is Britney Spears Spam? Conference Proceedings of the Fourth Conference on Email and Anti-Spam. Mountain View, CA, 2007.

APPENDIX

Friend networks of a faked identity (Jin, Takabi, and Joshi 29)