Not by the Book: Facebook as a Sampling Frame

Christine Brickman-Bhutta
October 27, 2009

Social networking sites and online questionnaires make it possible to do survey research faster, cheaper, and with less assistance than ever before. The methods are especially well suited for snowball sampling of elusive subpopulations. This note describes my experience surveying thousands of Catholics via Facebook in less than a month, at little expense, and without hired help. Although the respondents were disproportionately female, young, educated, and religiously active, their responses preserved key correlations found in standard surveys conducted by Gallup and the GSS. I relate my methods to existing web-based methods and offer concrete suggestions for future work.

Keywords: social networking websites, Facebook, Internet sampling, snowball sampling, chain-referral sampling, coverage error, inference

Online social networking sites (SNS’s) offer new ways for researchers to run surveys quickly, cheaply, and single-handedly – especially when seeking to construct “snowball” samples of small or stigmatized subsets of the general population. Facebook is currently the SNS best suited for survey research, thanks to its size (currently exceeding 300 million users worldwide), intensive use, and continuing growth. Each Facebook user is directly linked to his or her personal “friends” and can also join one or more of the 35 million Facebook groups that link millions of other users throughout the world. Facebook groups are virtual communities linking people with some shared interest, attribute, or cause. Researchers can readily sample populations of interest by working through existing groups or creating new ones.

Although researchers and journalists devote much attention to social networking, I have yet to locate any work that exploits SNS’s as a tool of research.
Existing SNS research focuses on questions related specifically to the phenomenon of online social networking: What functions do SNS’s serve for those who use them, and what benefits do users derive (Joinson 2008)? Is the accumulation of social capital one of those benefits (Ellison, Steinfield, and Lampe 2007)? Do SNS users behave or look different from non-users (Acquisti and Gross 2006; Ellison et al. 2007)? What privacy concerns does the rise of SNS’s raise (Jones and Soltren 2005; Dwyer, Hiltz, and Passerini 2007), and to what extent do these concerns influence online behavior (Acquisti and Gross 2006)? Can online social interactions predict tie strength (Gilbert and Karahalios 2007)? My work shifts the emphasis from research about SNS’s to research through SNS’s.

Within five days of releasing a 12-minute online survey to a Facebook group of potential volunteers, I harvested 2,788 completed questionnaires. Within a month, the total number of respondents increased to about 4,000. Total monetary costs averaged less than one cent per survey – vastly less than the cost of surveys obtained through mail, phone, or even email.1 Moreover, the responses became available for review the moment they were entered. Hence, if the survey turned out to contain any substantial errors or omissions, I could repair the damage within minutes. By working through social networks, I was also able to reach a population that is difficult to reach through conventional survey methods. Although the respondents by no means constituted a random sample of the relevant (Catholic) population, their responses preserved many of the statistical relationships obtained by traditional means. These and other

1 Raziano et al. (2001) report a cost of $10.50 per mail survey and $7.70 per E-mail survey. Harewood et al.’s (2001) study reported a cost per completed survey of $0.71 for E-mail, $2.54 for mail, and $2.08 for telephone.
Cobanoglu, Warde, and Moreo (2001) report a unit cost of $2.61 for mail surveys and $1.20 for faxed surveys, excluding the cost of coding the resulting information into machine-readable form.

advantages described below make Facebook a useful tool both for researchers with limited means and for rapid pre-testing of surveys destined for dissemination via traditional methods.

The paper proceeds as follows. Section one reviews the relevant literature on chain-referral sampling and electronic survey methods, highlighting the strengths and limitations of both methods. Section two follows with a description of the Facebook features that make it an effective tool for snowball sampling. Section three discusses attempts to recruit study volunteers, and section four details the results of those efforts. Sections five and six address the nature of the bias in the data, and section seven concludes the paper with suggestions for others interested in replicating this method.

1. Related Research

Chain-referral sampling first emerged in response to the neglect of social structure and interpersonal relationships in survey research methods. As Coleman (1958) notes, most early analyses overlooked the role of relationships, “never including (except by accident) two persons who were friends” (28). Snowball sampling is a chain-referral technique that accumulates data through existing social structures. The researcher begins with a small sample from the target subpopulation and then extends the sample by asking those individuals to recommend others for the study.

Chain-referral techniques have the added benefit of providing relatively easy access to “hidden” subpopulations that are almost impossible to sample by standard (phone, mail, or door-to-door) methods, due to their small size or distrust of outsiders.
Examples include studies of prostitutes (Faugier 1996), the homeless (Anderson and Calhoun 1992), AIDS victims (Martin and Dean 1990), drug users (Biernacki and Waldorf 1981; Griffiths et al. 2006), and religious “cults” (Lewis 1986).

Sample bias is the principal downside of the chain-referral approach. On the one hand, study volunteers may try to protect their friends by not referring them, a problem known as “masking.” On the other hand, “referrals occur through network links, so subjects with larger personal networks will be oversampled, and relative isolates will be excluded” (Heckathorn 1996). Thus Faugier’s (1996) study of prostitutes undersampled women who were new to the business or who had been ostracized by their peers. Participants may also recruit inappropriate volunteers, especially if they misinterpret the study’s design or purpose (Biernacki and Waldorf 1981). And response rates are difficult to define, much less estimate, when participation spreads through forwarded surveys and undocumented invitations.

Despite these limitations, no one disputes the value of chain-referral methods for studies of elusive subpopulations and exploratory work (Penrod et al. 2003; Faugier 1996). Moreover, new techniques can help to overcome some of the problems discussed above.2

Facebook and other social network sites allow us to carry chain-referral methods into the age of the Internet, while also exploiting the strengths of online questionnaires. A single scholar can complete projects that previously required large teams. Costs of printing, postage, and data entry virtually disappear. Feedback is instantaneous. Turnaround times shrink from weeks to days. It becomes much easier to reach remote, diffuse, and alienated subpopulations. (For recent work on the costs and benefits of web-based research, see Bachmann et al.
1996; Berge and Collins 1996; Goree and Marszalek 1995; Kiesler and Sproull 1986; Parker 1992; Schmidt 1997; Sproull 1986; Weible and Wallace 1998; Roselle and Neufeld 1998; Coomber 1997; and Evans and Mathur 1995).

2 See Heckathorn’s (1996) discussion of respondent-driven sampling (RDS), which attempts to overcome problems of non-representativeness.

SNS sampling shares most of the limitations associated with other forms of web-based research. We cannot reach those who lack the requisite computer skills and equipment. Nor are we likely to reach many people with serious concerns about Internet privacy. The layout and readability of surveys can vary across hardware and software. Electronic surveys can easily reach unintended recipients and are more readily taken multiple times. And response rates tend to be lower than those associated with phone, mail, and interviews. (For more on these difficulties, see Berge and Collins 1996; Kiesler and Sproull 1986; Parker 1992; Sproull 1986; Best and Krueger 2002; O’Lear 1996; Sell 1997; Evans and Mathur 1995; Kittleson 1995; Greene, Speizer, and Wiitala 2008; McDonald and Adam 2003; Converse et al. 2008; Cole 2005; Swoboda et al. 1997; Griffis, Goldsby, and Cooper 2003; Smith and Leigh 1997; Goree and Marszalek 1995). Bearing in mind all these considerations, let us turn to a specific SNS-based project.

2. Facebook as a Sampling Frame

Facebook is currently the world’s largest and fastest-growing SNS. Each user creates a personal profile (basically a personal webpage) with information about his or her interests, hobbies, education, occupation, contact information, and the like. Most users also post a personal profile picture. Users can invite people to become their Facebook “friends,” thereby creating networks for public postings and private messages. Users can also create specific Facebook “groups” based on shared interests, workplaces, regions, schools, or anything else.
Each group or network has its own Facebook page where members can post messages or chat, and many users list the networks and groups that they belong to on their profile pages.

Table 1. Proportion of the U.S. Population(a) with Facebook Profiles

Age      Oct-08   Feb-09   Jun-09   Sep-09
15-19    57.8%    62.9%    65.1%    81.3%
20-24    63.7%    73.2%    72.1%    87.4%
25-29    32.6%    47.1%    56.7%    73.7%
30-34    19.9%    36.6%    47.7%    62.3%
35-39    12.2%    27.4%    39.3%    52.4%
40-44     5.7%    15.7%    27.8%    38.6%
45-49     3.4%     9.3%    19.4%    30.9%
>49       1.2%     3.4%     9.1%    16.3%
All      17.0%    24.4%    31.5%    42.8%

(a) Based on 2007 U.S. Census population estimates

Starting with one or more groups or networks, researchers can create snowball samples by gathering respondents via links to additional friends, groups, and networks. To illustrate the potential of this simple approach, consider the result achieved by one enterprising Facebook user who created a group called “Six Degrees of Separation: The Experiment.” In order to maximize the number of group members, he invited all his friends to join and encouraged all of them to do likewise ad infinitum. The group currently numbers more than five million!

As Facebook has grown, it has also become increasingly representative of the U.S. population. Though created in 2004 for college students alone, Facebook soon launched a high school version and in September 2006 provided free memberships to anyone. Table 1 documents the spread of Facebook from October 2008 through September 2009. Over those twelve months, the proportion of Americans with Facebook profiles increased from 17.0 percent to 42.8 percent, and the greatest growth (in percentage points) occurred among adults aged 25 to 44. Significantly, whereas just over one percent of Americans aged 50 and older had Facebook profiles in October 2008, a year later this figure had increased to 16.3 percent. No longer is Facebook solely a tool for young adults.

3.
Recruiting Survey Participants

As noted above, snowball sampling is especially effective when targeting hard-to-reach populations. This was the case in the present investigation, which included inactive and ex-Catholics, who are by definition underrepresented in pews and parish rosters.

I began my search in December 2008 by creating a new group named “Please Help Me Find Baptized Catholics!” The group’s description explained the purpose of the group, outlined eligibility requirements, and provided instructions on how to get involved. Though I wished only to survey baptized Roman Catholics, the text of my group page invited any viewer to join the group and likewise encouraged them to forward invitations to all their Facebook friends and groups. This strategy was designed both to maximize sample size and to avoid the biases associated with sampling down social chains composed entirely of Catholics. After about a month, all group members would receive the link to the survey, which explored Catholic identity.

I then sought to identify the administrators of Facebook groups who could help me recruit members for my study group. Because there are tens of thousands of Catholic groups on Facebook, I excluded groups with large proportions of foreign members and groups with narrow membership criteria, such as those created for specific Facebook networks, college alumni groups, or ethnic groups. This process yielded fifty Catholic groups, which I then tried to categorize further by level of religious participation and identification. From group names, descriptions, and postings, I concluded that inactive or ex-Catholics predominated in twenty-five of the groups; active Catholics predominated in seventeen groups; and the remaining eight groups were of mixed or uncertain composition.

I then set out to contact the administrators of all 50 Facebook groups, soliciting their help in recruiting volunteers for the study.
Each administrator received a personal message that explained the purpose of the research and asked them to send a message to the members of their groups with an invitation to join the research group. I also encouraged them to contact me with any questions or concerns.

Over the course of three days, I sent personal messages to 43 of the 50 administrators of the Catholic groups. On the first day of solicitations, I contacted nine administrators, and six responded positively. As a result, my group initially grew quite quickly. As time went on, however, my requests for help yielded fewer responses. Of the 25 administrators contacted on the second day, 18 failed to follow up, and no one responded to messages sent on the third day. I suspect that this rapid decline in responses was a consequence of my own initial success. As my group grew, the administrators of other groups perhaps concluded I no longer needed their help. In any case, in light of the rapid growth of my own group and the rapid decline in responses from other group administrators, I decided not to contact the last seven of the fifty administrators.

3 Administrators can only send mass mailings to groups with fewer than 5,000 members.

Table 2 summarizes attempts to recruit study volunteers. Although I asked administrators to send messages to their members, some elected to post the information to their group’s pages instead, and they did so for one of two reasons: either they considered messaging their members obtrusive, or the size of their groups inhibited their ability to send mass messages.3 Because viewing the posting required users to visit the group page, and because Facebook users often join groups that they rarely if ever return to, this approach left users less likely to learn about the study. Nevertheless, I preferred some assistance to no assistance. Moreover, one person who posted the link administers a group with over 30,000 members; even
if the rate of awareness in his group was low, the potential for recruitment from his group in absolute numbers was substantial.

Table 2. Administrator Response by Group Classification and Type of Assistance Provided

ADMINISTRATOR SUMMARY
Number contacted:   42(a)
Number who helped:  15
Response rate:      35.7%

RESPONSE BY TYPE

Sent a Message to All Group Members
Group Type        # Groups   # Members   % Members
Former/Inactive       5        2,329       50.5%
Mixed                 3          743       16.1%
Active                2        1,542       33.4%
Total                10        4,614      100.0%

Posted a Link to the Group
Group Type        # Groups   # Members   % Members
Former/Inactive       3          976        3.0%
Mixed                 0            0        0.0%
Active                2       31,873       97.0%
Total                 5       32,849      100.0%

(a) 43 administrators were contacted, but one group was misclassified: it appeared to be composed of Catholics but was not.

After working through existing Catholic groups, I also contacted over 200 of my own Facebook friends, seeking their help in recruiting volunteers. Though I failed to collect systematic information on the results of this process, the feedback that I received suggests that many of them passed along my group link to other people, and a few also posted information about the study on their personal Facebook pages.

On Facebook, group administrators lose the capacity to send mass messages to their members once the group size exceeds 5,000. Because I planned to use mass mailings to direct people to my online survey, I closed my initial group when membership approached 4,500 and opened a second group for additional volunteers. After 2,500 people joined the second group,4 I closed it and opened a third and final group. Altogether, roughly 7,500 people joined one of the three groups over the course of one month and received the link to the survey.

4.
Response Rate

Although I cannot know how many people ultimately received the survey link, the web-based survey software (QuestionPro) collects basic statistics that indicate the completion rate among those who start the survey. The survey was started 4,709 times, yielding 4,016 completed surveys for a completion rate of 85.3 percent. Only 18 responses came from ineligible participants, leaving a total of 3,998 usable surveys. On average, respondents completed the survey in 12 minutes, and the high completion rate suggests this was a reasonable request of one’s time.

4 After a group is closed, new members can continue to join if they are responding to an existing group invitation. I became aware of this after closing the first group, when I was alarmed to see it continue to grow and approach 5,000 members. In fact, in order to keep the first group below the threshold, I sent the group members a message and asked some of them to leave the first group and join the second group. Wanting to avoid this problem with the second group, I closed it with just 2,500 members and over 2,000 outstanding group invitations.

5 Those who use mixed modes to gather data typically report faster speed of response with electronic surveys than with traditional modes. In Cobanoglu et al.’s (2001) study, the average response speeds for mail and web-based surveys were 16.46 days and 5.97 days, respectively. Schaefer and Dillman (1998) reported average response times of 14.39 days and 9.16 days for mail and E-mail modes, respectively.

Speed of response represents a significant advantage of web-based surveys,5 one that this research captures quite well. Figure 1 plots the number of completed surveys within the first month of its release. Although I kept the survey link active for 100 days, the vast majority of
completed surveys arrived within days.6 Members of groups two and three received the survey link at time zero; members of group one received the survey link on day three. Just five hours after releasing the survey, I tallied 426 completed surveys, a little more than 10 percent of the total number of completed surveys. Within five days, the number grew to 2,788, about 70 percent of the total. Within ten days, the figure increased to 80 percent. Participation continued to taper off, and 90 percent of the data arrived by the end of the first month.

Figure 1. Number of Completed Surveys during First Month of Survey Release
[Figure: number of completed surveys (0 to 4,000) plotted against number of days elapsed (0 to 30), with annotations “Released survey to groups two and three (2661 total members)” and “Released survey to group one (4892 members)”.]

6 Pan (2009) reports similar response patterns. He administered four separate web-based surveys and found that “the response percentages of the first days are generally larger than 70 percent” (8).

5. Sample Characteristics

To assess the representativeness of the Facebook sample, I compared those aged 18 and older to adult respondents of the 2008 GSS who identified Catholicism as the religion of their upbringing. Compared to the general population of U.S. adult Catholics and ex-Catholics as captured by the GSS, the Facebook sample is younger on average (30.7 years versus 46.7 years), more female (69.9 percent versus 54.6 percent), and less likely to identify as Latino/Hispanic (5.6 percent versus 29.1 percent). The Facebook sample is also much better educated and more religious. Two-thirds of Facebook respondents have at least a bachelor’s degree (versus one-fourth of GSS respondents), and 66.8 percent claim to attend Mass at least once per week (versus 29.2 percent in the GSS). Table 3 summarizes these comparisons.

Table 3. Sample Characteristics: GSS 2008(a) vs.
Facebook(a)

                       Facebook      GSS
                       (N=3767)    (N=672)
Demographics
  Mean Age               30.7        46.7
  % Female               69.9        54.6
  % Latino                5.6        29.1
  % College Grad         66.8        26.6
Mass Attendance (%)
  Never or seldom        13.3        41.7
  Sometimes              19.9        29.2
  Once/week              43.1        23.8
  >Once/week             23.7         5.4
  Total                 100.0       100.1

(a) Adults (age 18+) raised Catholic and converts to Catholicism
Column totals may exceed 100 percent due to rounding

Religious activity was especially striking among the Facebook males – 27.9 percent report attending Mass more than once per week, versus 21.9 percent for females (Table 4). This result directly contradicts not only the GSS data, but also a huge and varied body of research demonstrating greater religiosity among women than men for all aspects of religious behavior, all regions of the world, and all known eras (Miller and Stark 2002; Beit-Hallahmi 1997).

Table 4. Attendance at Religious Services by Gender: GSS 2008(a) vs. Facebook(a) Samples

                        Facebook                  GSS
                   Females     Males      Females     Males
                   (N=2631)   (N=1136)    (N=367)    (N=305)
Never or seldom     12.4%      15.4%       38.4%      45.6%
Sometimes           21.1%      17.1%       28.6%      29.8%
Once/week           44.6%      39.6%       25.6%      21.6%
>Once/week          21.9%      27.9%        7.4%       3.0%
Total              100.0%     100.0%      100.0%     100.0%

(a) Adults (age 18+) raised Catholic and converts to Catholicism

In short, the Facebook respondents cannot possibly serve as a representative sample of the general Catholic population. (Pollsters should view Facebook findings with extreme caution.) Yet all is not lost, for the data may yet preserve many of the empirical relationships that are of primary interest to me and other researchers.

6. Form-Resistant Correlations

A proper probability sample may not be needed when one seeks primarily to understand relationships between two or more population characteristics rather than the actual prevalence of individual characteristics (Witte, Amoroso, and Howard 2000; Bainbridge 2002; Bainbridge 1999).
The “form-resistant correlation hypothesis” contends that biased samples with very different means and variances will often preserve measures of statistical relationships. However, form resistance is far from certain, and other researchers warn of the many potential problems likely to arise when testing causal hypotheses with non-probability samples (Best and Krueger 2002; Best et al. 2001). With Internet surveys, for example, researchers must assume that the relationships of interest are not influenced by patterns of Internet use – an assumption that obviously fails when studying something like attitudes toward technology or Internet privacy. There is, however, little a priori evidence suggesting that patterns of contemporary Catholic commitment correlate strongly with patterns of Internet usage. Hence, there is no reason to reject the form-resistant correlation hypothesis out of hand.

Table 5. Correlations with Current Religious Attendance

Catholics should…(b) Facebook(a) 0.02 0.04* -0.06*** -0.04* 0.33*** 0.45*** 0.20*** 0.27*** 0.37*** Age Sex Education Latino Have a Catholic wedding Attend Mass each week Help the poor Believe in Transubstantiation Registered in parish Believe Catholicism is more true Income 2008 GSS(a) 0.26*** -0.08*** 0.10*** -0.10*** 0.51*** -0.02 Childhood Salience(c) Catholic parents Catholic spouse Presence of children 0.07*** -0.03+ 0.34*** 0.04* 1999 Gallup 0.25*** -0.12*** -0.01 -0.06+ 0.20*** 0.29*** 0.10** 0.23*** 0.38*** 0.29*** -0.04 0.22*** 0.02 0.17* 0.14***

+ p<0.10, * p<0.05, ** p<0.01, *** p<0.001
(a) To make the data comparable to the Gallup study, analysis restricted to those who currently identify as Catholic
(b) The Facebook survey posed these questions somewhat differently than did Gallup. For each item, Gallup asked "Please tell me if you think a person can be a good Catholic without performing these actions or affirming these beliefs."
Response categories are "yes," "no," "don't know," or "refuse." The Facebook survey asked respondents to indicate the extent to which they believe Catholics are obliged to engage in these acts or hold these beliefs.
(c) In the GSS, childhood salience is measured by frequency of Mass attendance at age 12; in the Facebook sample, it is measured by response to the question, "How important was religion in your family when you were growing up?" Studies show these measures of childhood religious salience correlate very highly.

Table 5 reports the Facebook correlations between Mass attendance and a variety of explanatory variables and compares them to corresponding correlations derived from two high-quality probability-sample surveys: the 2008 General Social Survey and the 1999 Gallup Poll of Catholics. The outcome measure is the same throughout – a dichotomous variable indicating whether the respondent claims to attend Mass at least weekly. The table correlates attendance with standard demographic controls and standard measures of religious background, beliefs, and social networks.

Generally speaking, the form-resistant correlation hypothesis fails with respect to the demographic control variables but holds with respect to the religion variables. For example, although age correlates strongly with attendance in both probability samples, the corresponding Facebook correlation is essentially zero. Similarly, both the GSS data and Gallup data capture the known tendency for women to be more religious than men, whereas the Facebook correlation is reversed. But these discrepancies are just part of the story.
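A comparison of correlations across samples can be made concrete with a significance test. The sketch below is not part of the original analysis: it applies Fisher's z-transformation to test whether two independent correlations differ, using the age correlations as the flattened Table 5 appears to list them (0.02 for Facebook, 0.26 for the 2008 GSS). The sample sizes are assumptions. Table 5 is restricted to current Catholics and does not report its Ns, so the Table 3 totals (3,767 and 672) stand in as upper bounds. The function name `compare_correlations` is hypothetical.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher z-test for the difference between two independent correlations."""
    z1 = math.atanh(r1)  # Fisher transformation of the first correlation
    z2 = math.atanh(r2)  # Fisher transformation of the second correlation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under the standard normal
    return z, p


# Age correlations as read from Table 5; Ns are ASSUMED upper bounds from Table 3.
r_facebook, n_facebook = 0.02, 3767
r_gss, n_gss = 0.26, 672

z_stat, p_value = compare_correlations(r_facebook, n_facebook, r_gss, n_gss)
print(f"z = {z_stat:.2f}, p = {p_value:.2g}")
```

A large absolute z flags a correlation that does not survive the change of sample. Because the true Ns are smaller than the assumed ones, the standard error here is understated, so the output should be read as indicative only.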
In both Facebook and Gallup samples, respondents are more likely to attend Mass weekly if they expect Catholics to adhere to traditional church teachings, if they are registered in a Catholic parish, if they believe Catholicism contains a greater share of the truth than other religions do, if religion was important to them during their childhood, if they have a Catholic spouse, and if they have children. The match-up is equally good for Facebook versus GSS.

Despite its non-representative character and its non-standard demographic effects, the Facebook sample still manages to replicate correlations between Mass attendance and many measures of religious background, beliefs, and social networks. Over this (admittedly limited) domain, the Facebook data function as a remarkably good substitute for data obtained through vastly more expensive procedures.

7. Discussion

Because this note seeks only to describe a new research method, I omit all discussion of substantive research results. I must emphasize, however, that those results constitute the payoff to the activities and outcomes described above. As each additional Facebook user completed my web-based questionnaire, his or her responses were instantly added to a well-structured and fully labeled online database that was in turn instantly readable by packages like SAS, SPSS, and Stata. In minutes, I could run my tests and review my key results – ranging from summary statistics and scale reliability to multinomial logistic regressions. And as noted above, I had my first 400 surveys back and fully coded less than five hours after posting the questionnaire link.
I was thus able to compare the predictive power of my own theory of Catholic commitment (which happens to stress the power of affective bonds) against existing theories, across a huge (albeit non-representative) new sample of young Catholics, using a new survey designed precisely for my purposes, incorporating a carefully selected mix of new and old questions, and pre-tested on a sizeable group of faculty, students, and staff at the Catholic University of America. For details concerning my results, see XXXX (forthcoming).

The benefits of my web-based approach did not end with the survey data. By including in my survey an invitation for follow-up interviews, I was able to obtain (and pre-screen) many respondents willing to participate in one-on-one interviews via Internet chat. Thus I supplemented my large collection of questionnaire results with qualitative findings from in-depth conversations, the text of which was likewise saved to online files and immediately readable by text analysis software.

Because my method of data collection appears to be unprecedented, I could not avail myself of any established procedures or “Best Strategies.” I relied instead on standard insights, personal intuition, and pure luck. I have summarized my experience in the hope of helping others exploit the unique strengths of sampling through social networking sites while avoiding or at least understanding their limitations.

Snowball methods almost always involve sample bias. SNS-samplers must consider this problem at every stage of their work. As a final example, consider the seemingly simple process of naming target groups. My study concerned all baptized American Catholics – active, inactive, and former.
Hence, it might seem natural to recruit people to a group for “Catholics” or “baptized Catholics.” But inactive and former Catholics might ignore or even avoid such groups (and one respondent did indeed note that the term “baptized Catholics” made her “cringe”). Different problems arise when working with more narrowly defined groups – such as separate groups for active, inactive, and ex-Catholics. As illustrated above in Table 2, the Facebook groups catering to active Catholics have many more members and much more activity than do those that cater to inactive or former Catholics. To the extent that a group exists to bind together those who share a common identity, the notion of a Facebook group of indifferent Catholics is an oxymoron: they have little to rally around. However, by recruiting all baptized Catholics together, I could enlist the ardent Catholics as recruiters of inactive and former Catholics.

I ultimately chose the all-Catholic approach and in retrospect believe it was critical to the success of the study. I named my initial group “Please Help Me Find Baptized Catholics!” and included a note on the group page that explicitly encouraged group members to recruit friends and family members who are no longer active in the Church. Although active Catholics were still over-represented in my study (Table 3), the relative size of the marginal and former Catholic groups compared to other Catholic groups I located on Facebook leads me to believe the ardent Catholics served an indispensable role as recruiters. Of the groups that agreed to help, the inactive groups had just 413 members on average, compared to over 8,300 in the active Catholic groups – roughly twenty times as many.

I also believe my efforts to lend legitimacy to the study increased the response rate. Conducting scholarly research via Facebook is unconventional, and I knew many might not take me seriously.
To make my study more credible, I included my institutional affiliation and my university E-mail address on the website, along with the names, E-mail addresses, and university websites of the members of my dissertation committee. Additionally, I posted to the group’s discussion board a scholarly review of the research that justifies non-probability sampling.

Keeping the size of my groups under 5,000 members was critical to the success of the study. As noted above, group administrators lose the ability to send mass messages to members when the group size exceeds this number. Had this occurred, I would have needed to rely on my volunteers to return to the group page to access the survey link when it became available. Like many Facebook groups, however, “Please Help Me Find Baptized Catholics” had little activity. The groups that keep members coming back are those that keep the content fresh – frequently updating the group's “Recent News,” posting new photos and videos, adding current events and links – so members have a reason to return. Even had I attempted this, there is little chance I would have captured the interest of all 7,500 members, prompting them to return to the page on a regular basis. Therefore, closing the group well before its size reaches the 5,000-member threshold is a vital strategy for a successful study.

Researchers planning to use Facebook as a sampling frame should undertake measures that enable them to assess – and perhaps mitigate – the bias that will necessarily arise. I included several items in my survey that I pulled directly from two probability-based surveys. Not only did this enable me to assess the nature of the bias in my data, but I also used the information to weight the data and correct for the gender bias in my sample (not shown).
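As an illustration only, a gender correction of this kind can be sketched as simple cell weighting, in which each respondent is weighted by the ratio of the group's share in a benchmark survey to its share in the sample. The sketch below is not the procedure used in the study, which is not reported here; the function names, the 70/30 sample split, and the 50/50 benchmark are all hypothetical.

```python
from collections import Counter

def cell_weights(sample_groups, benchmark_shares):
    """Return one weight per respondent: benchmark share / sample share of the group.

    sample_groups: list of group labels, one per respondent (hypothetical data).
    benchmark_shares: dict mapping each label to its share in a probability-based
    benchmark survey (hypothetical values).
    """
    n = len(sample_groups)
    counts = Counter(sample_groups)
    by_group = {g: benchmark_shares[g] / (c / n) for g, c in counts.items()}
    return [by_group[g] for g in sample_groups]

def weighted_share(sample_groups, weights, label):
    """Share of the weighted sample that belongs to the given group."""
    total = sum(weights)
    in_group = sum(w for g, w in zip(sample_groups, weights) if g == label)
    return in_group / total

# Hypothetical sample: 70 women and 30 men; hypothetical benchmark: 50/50.
sample = ["F"] * 70 + ["M"] * 30
weights = cell_weights(sample, {"F": 0.5, "M": 0.5})

# After weighting, women's share matches the benchmark (up to rounding error).
print(round(weighted_share(sample, weights, "F"), 6))
```

In this sketch women are weighted down and men are weighted up, and the weights sum to the original sample size, so weighted totals remain on the scale of the unweighted data.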
Although one must take care when using this method,7 studies have shown this to be an effective way of improving the representativeness of biased samples (Witte et al. 2000; Bainbridge 2002; Bainbridge 1999).

Releasing the survey in multiple waves can mitigate traffic-related problems and offers an opportunity to correct unforeseen problems. My own experience was an imperfect attempt to avoid these pitfalls. As displayed in Figure 1, when I first released the survey, I sent the link to approximately one-third of all group members; the remaining volunteers received the survey after three days. Within minutes of disseminating the link in the first wave, I received an E-mail from a volunteer identifying a problem with one of the questions.8 I corrected the problem immediately, but several minutes passed before the change took effect, presumably due to the large volume of individuals on the website at the time. Future researchers can learn from this mistake by releasing the survey only to a few dozen respondents initially. This experience offers another lesson as well – researchers conducting surveys via the web must be mindful of the limitations of the web-based software they choose.

The diversity of Facebook groups lends itself to a stratified sampling approach that may increase the representativeness of the sample. Earlier I mentioned that I only sampled groups that appeared representative of the target population: all individuals baptized in the Catholic Church in the United States. In retrospect, this was an ineffectual approach because the sampling frame itself (i.e., Facebook users) is biased in several ways, and my sample reflected that bias. Had I employed a stratified approach, I might have mitigated that bias by targeting specific groups, such as Latino Catholics and groups composed of older respondents.

7 In particular, one must ensure that the variables of interest are not correlated with the weight variable.
8 The item asked respondents to “select all of the ranges that reflect the ages of your children.” Although this was a multiple response item, I inadvertently set up the software to only accept a single response.

Conclusion

Over thirty-five years ago, Mark Granovetter (1973) illustrated how weak ties transmit information more quickly and more diffusely than do strong ties. Those to “whom we are weakly tied are more likely to move in different circles from our own and thus will have access to information different from that which we receive” (1371). Weak ties are the bridges between small clusters of close friends, linking us together to form an elaborate web of social relationships.

Facebook offers researchers a way to capitalize on the strength of these weak ties. According to Facebook’s internal statistics, the average user has 120 friends (Facebook 2009). While some of these relationships constitute “strong ties,” the vast majority are acquaintances, old friends from high school or college we rarely see, and co-workers we may or may not like. Thirty-five million Facebook groups also exist on the site, linking users to millions of other people they do not even know.

The present study offers a modest attempt to exploit these features, which make Facebook an ideal tool for snowball sampling. By navigating through existing groups of Catholics and by tapping into personal friends, I recruited over 4,000 individuals to participate in my research study. Data collection was extremely fast and incredibly cheap. Within five days, more than 2,700 individuals completed the survey, and a modest license fee for the web-based software was my only monetary expense.
The data also preserved the statistical relationships between the variables of interest, despite being biased in several ways. While snowball sampling via Facebook is no substitute for probability-based techniques, the fact that the relevant correlations between variables hold justifies undertaking a study that is representative of the entire population.

Finally, although the present study restricted the sampling frame to the United States, Facebook is an increasingly attractive option for international research as well. Among today’s active Facebook users, seventy percent live outside the United States (Facebook 2009). Whereas international research greatly increases the cost of conducting survey research via traditional methods like mail and telephone, the Facebook method poses no additional costs.

Facebook had 150 million active users in January 2009. Just nine months later, this number doubled to 300 million (Facebook 2009). Whether or not Facebook continues along its current trajectory, social networking is here to stay and certain to grow – and so too are the opportunities it affords for faster, cheaper, and better research.

References

Acquisti, Alessandro and Ralph Gross. 2006. “Imagined Communities: Awareness, Information Sharing, and Privacy on the Facebook.” Paper presented at the 6th Workshop on Privacy Enhancing Technologies, Cambridge, U.K., June 28 – June 30.
Anderson, L. and P. Calhoun. 1992. “Facilitative Aspects of Field Research.” Sociological Inquiry 62(4): 490-498.
Bachmann, D. et al. 1996. “Tracking the Progress of E-mail vs. Snail-mail.” Marketing Research 8(2): 30-35.
Bainbridge, William S. 2002. “Validity of Web-Based Surveys: Explorations with Data from 2,382 Teenagers.” In O.V. Burton (Ed.), Computing in the Social Sciences and Humanities (pp. 51-66). Chicago: University of Illinois Press.
Bainbridge, William S. 1999. “Cyberspace: Sociology's Natural Domain.” Contemporary Sociology 28(6): 664-667.
Beit-Hallahmi, Benjamin. 1997. “Biology, Density and Change: Women’s Religiosity and Economic Development.” Journal of Institutional and Theoretical Economics 153: 166-178.
Berge, Z. L. and M.P. Collins. 1996. “‘IPCT Journal’ Readership Survey.” Journal of the American Society for Information Science 47(9): 701-710.
Best, Samuel J. and Brian Krueger. 2002. “New Approaches to Assessing Opinion: The Prospect for Electronic Mail Surveys.” International Journal of Public Opinion Research 14(1): 73-92.
Best, Samuel J. et al. 2001. “An Assessment of the Generalizability of Internet Surveys.” Social Science Computer Review 19(2): 131-145.
Biernacki, Patrick and Dan Waldorf. 1981. “Snowball Sampling: Problems and Techniques of Chain Referral Sampling.” Sociological Methods and Research 10(2): 141-163.
Browne, Kath. 2005. “Snowball Sampling: Using Social Networks to Research Nonheterosexual Women.” International Journal of Social Research Methodology 8(1): 47-60.
Cobanoglu, C., Warde, B. and Moreo, P. 2001. “A Comparison of Mail, Fax and Web-Based Survey Methods.” International Journal of Market Research 43(4): 441-452.
Cole, S. T. 2005. “Comparing Mail and Web-Based Survey Distribution Methods: Results of Surveys to Leisure Travel Retailers.” Journal of Travel Research 43(May): 422-430.
Coleman, James. 1958. “Relational Analysis: The Study of Social Organizations with Survey Methods.” Human Organization 17(4): 28-36.
Converse, P. D. et al. 2008. “Response Rates for Mixed-Mode Surveys Using Mail and Email/Web.” American Journal of Evaluation 29(1): 99-107.
Coomber, R. 1997. “Using the Internet for Survey Research.” Sociological Research Online 2(2).
Dwyer, C., Hiltz, S. and Passerini, K. 2007. “Trust and Privacy Concern within Social Networking Sites: A Comparison of Facebook and MySpace.” Paper presented at the Thirteenth Americas Conference on Information Systems, Keystone, Colorado, August 9 – 12.
Ellison, N., Steinfield, C. and Lampe, C. 2007. “The Benefits of Facebook ‘Friends’: Social Capital and College Students' Use of Online Social Network Sites.” Journal of Computer-Mediated Communication 12(4): 1143-1168.
Evans, J. R. and Anil Mathur. 2005. “The Value of Online Surveys.” Internet Research 15(2): 195-219.
Facebook. 2009. Facebook Statistics [Online]. Available: http://www.facebook.com/press/info.php?statistics
Faugier, J. and Mary Sargeant. 1997. “Sampling Hard to Reach Populations.” Journal of Advanced Nursing 26(4): 790-797.
Gilbert, Eric and Karrie Karahalios. 2009. “Predicting Tie Strength with Social Media.” Paper presented at the 27th International Conference on Human Factors in Computing Systems, Boston, MA, April 4 – April 9.
Goree, C. T. and J.F. Marszalek III. 1995. “Electronic Surveys: Ethical Issues for Researchers.” College Student Affairs Journal 15(1): 75-79.
Granovetter, Mark S. 1973. “The Strength of Weak Ties.” American Journal of Sociology 78(6): 1360-1380.
Greene, J., Speizer, H. and Wiitala, W. 2008. “Telephone and Web: Mixed-Mode Challenge.” Health Services Research 43(1): 230-248.
Griffis, S. E., Goldsby, T.J. and Cooper, M. 2003. “Web-based Surveys: A Comparison of Response, Data, and Cost.” Journal of Business Logistics 24(2): 237-258.
Griffiths, P. et al. 2006. “Reaching Hidden Populations of Drug Users by Privileged Access Interviewers: Methodological and Practical Issues.” Addiction 88(12): 1617-1626.
Harewood, G. C. et al. 2001. “Prospective Comparison of Endoscopy Patient Satisfaction Surveys: E-mail versus Standard Mail versus Telephone.” The Journal of Gastroenterology 96(12): 3312-3317.
Heckathorn, Douglas D. 1997. “Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations.” Social Problems 44(2): 174-199.
Hewitt, Anne and Andrea Forte. 2006. “Crossing Boundaries: Identity Management and Student/Faculty Relationships on the Facebook.” Georgia Institute of Technology, http://www.cc.gatech.edu/grads/f/Andrea.Forte/HewittForteCSCWPoster2006.pdf [accessed August 12, 2009].
Joinson, Adam. 2008. “Looking at, Looking up or Keeping up with People?: Motives and Use of Facebook.” Paper presented at the 26th International Conference on Human Factors in Computing Systems, Florence, Italy, April 5 – April 10.
Jones, Harvey and Jose Hiram Soltren. 2005. “Facebook: Threats to Privacy.” Project MAC: MIT Project on Mathematics and Computing, http://www-swiss.ai.mit.edu/6.805/studentpapers/fall05-papers/facebook.pdf [accessed August 12, 2009].
Kiesler, S. and L. S. Sproull. 1986. “Response Effects in the Electronic Survey.” The Public Opinion Quarterly 50(3): 402-413.
Kittleson, M. J. 1995. “An Assessment of the Response Rate via the Postal Service and Email.” Health Values 18(2): 27-29.
Lewis, James R. 1986. “Reconstructing the ‘Cult’ Experience: Post-Involvement Attitudes as a Function of Mode of Exit and Post-Involvement Socialization.” Sociological Analysis 47(2): 151-159.
Martin, John L. and Laura Dean. 1990. “Developing a Community Sample of Gay Men for an Epidemiologic Study of AIDS.” American Behavioral Scientist 33(5): 546-561.
McDonald, H. and S. Adam. 2003. “A Comparison of Online and Postal Data Collection Methods in Marketing Research.” Marketing Intelligence and Planning 21(2): 85-95.
Miller, Alan S. and Rodney Stark. 2002. “Gender and Religiousness: Can Socialization Explanations Be Saved?” American Journal of Sociology 107: 1399-1423.
O’Lear, S. R. M. 1996. “Using Electronic Mail (E-mail) Surveys for Geographic Research: Lesson from a Survey of Russian Environmentalists.” Professional Geographer 49: 209-217.
Pan, Bing. 2009. “Online Travel Surveys and Response Patterns.” Journal of Travel Research 47(4): 1-16.
Parker, L. 1992. “Collecting Data the E-mail Way.” Training and Development 46(7): 52-54.
Penrod, Janice et al. 2003. “A Discussion of Chain Referral as a Method of Sampling Hard-to-Reach Populations.” Journal of Transcultural Nursing 14(2): 100-107.
Raziano, Donna Brady et al. 2001. “E-mail versus Conventional Postal Mail Survey of Geriatric Chiefs.” Gerontologist 41: 799-804.
Roselle, A. and Steven Neufeld. 1998. “The Utility of Electronic Mail Follow-Ups for Library Research.” Library and Information Science Research 20(2): 153-161.
Schaefer, David R. and Don A. Dillman. 1998. “Development of a Standard E-Mail Methodology: Results of an Experiment.” Public Opinion Quarterly 62(3): 378-397.
Schmidt, W. C. 1997. “World-Wide Web Survey Research: Benefits, Potential Problems, and Solutions.” Behavior Research Methods, Instruments, and Computers 29(2): 274-279.
Sell, R. L. 1997. “Research and the Internet: An E-mail Survey of Sexual Orientation.” American Journal of Public Health 87(2): 297.
Smith, M. A. and B. Leigh. 1997. “Virtual Subjects: Using the Internet as an Alternative Source of Subjects and Research Environment.” Behavior Research Methods, Instruments, and Computers 29: 496-505.
Sproull, L. S. 1986. “Using Electronic Mail for Data Collection in Organizational Research.” Academy of Management Journal 29: 159-69.
Swoboda, W. J. et al. 1997. “Internet Surveys by Direct Mailing: An Innovative Way of Collecting Data.” Social Science Computer Review 15(3): 242-255.
Weible, R. and J. Wallace. 1998. “Cyber Research: The Impact of the Internet on Data Collection.” Marketing Research 10(3): 19-23.
Witte, J. C., Amorosa, L. and Howard, P. 2000. “Method and Representation in Internet-Based Survey Tools: Mobility, Community, and Cultural Identity in Survey2000.” Social Science Computer Review 18(2): 179-195.