Not by the Book: Facebook as Sampling Frame

Christine Brickman-Bhutta
October 27, 2009
Social networking sites and online questionnaires make it possible to do survey research faster, cheaper, and with less assistance than ever before. The methods are especially well-suited for snowball sampling of elusive subpopulations. This note describes my experience surveying thousands of Catholics via Facebook in less than a month, at little expense, and without hired help. Although the respondents were disproportionately female, young, educated, and religiously active, their responses preserved key correlations found in standard surveys conducted by Gallup and the GSS. I relate my methods to existing web-based methods and offer concrete suggestions for future work.
Keywords: social networking websites, Facebook, Internet sampling, snowball sampling, chain-referral sampling, coverage error, inference
Online social networking sites (SNS’s) offer new ways for researchers to run surveys quickly, cheaply, and single-handedly – especially when seeking to construct “snowball” samples of small or stigmatized subsets of the general population. Facebook is currently the SNS best suited for survey research, thanks to its size (currently exceeding 300 million users worldwide), intensive use, and continuing growth. Each Facebook user is directly linked to his or her personal “friends,” while also having access to membership in one or more of the 35 million Facebook groups that link millions of other users throughout the world. Facebook groups are virtual communities linking people with some shared interest, attribute, or cause. Researchers can readily sample populations of interest by working through existing groups or creating new ones.

Although researchers and journalists devote much attention to social networking, I have yet to locate any work that exploits SNS’s as a tool of research. Existing SNS research focuses on questions related specifically to the phenomenon of online social networking: what functions do SNS’s serve for those who use them, and what benefits do users derive (Joinson 2008)? Is the accumulation of social capital one of those benefits (Ellison, Steinfield, and Lampe 2007)? Do SNS users behave differently or look different from non-users (Acquisti and Gross 2006; Ellison et al. 2007)? What privacy concerns does the rise of SNS’s raise (Jones and Soltren 2005; Dwyer, Hiltz, and Passerini 2007), and to what extent do these concerns influence online behavior (Acquisti and Gross 2006)? Can online social interactions predict tie strength (Gilbert and Karahalios 2007)?

My work shifts the emphasis from research about SNS’s to research through SNS’s. Within five days of releasing a 12-minute online survey to a Facebook group of potential volunteers, I harvested 2,788 completed questionnaires. Within a month, the total number of respondents increased to about 4,000. Total monetary costs averaged less than one cent per survey – vastly less than the cost of surveys obtained through mail, phone, or even email.1 Moreover, the responses became available for review the moment they were entered. Hence, if the survey turned out to contain any substantial errors or omissions, I could repair the damage within minutes. By working through social networks, I was also able to reach a population that is difficult to reach through conventional survey methods. Although the respondents by no means constituted a random sample of the relevant (Catholic) population, their responses preserved many of the statistical relationships obtained by traditional means. These and other advantages described below make Facebook a useful tool both for researchers with limited means and for rapid pre-testing of surveys destined for dissemination via traditional methods.

The paper proceeds as follows. Section one reviews the relevant literature on chain-referral sampling and electronic survey methods, highlighting strengths and limitations of both of these methods. Section two follows with a description of the Facebook features that make it an effective tool for snowball sampling. Section three discusses attempts to recruit study volunteers, and section four details the results of those efforts. Sections five and six address the nature of the bias in the data, and section seven concludes the paper with suggestions for others interested in replicating this method.

1. Related Research
Chain-referral sampling first emerged in response to the neglect of social structure and interpersonal relationships in survey research methods. As Coleman (1958) notes, most early analyses overlooked the role of relationships, “never including (except by accident) two persons who were friends” (28). Snowball sampling is a chain-referral technique that accumulates data through existing social structures. The researcher begins with a small sample from the target subpopulation and then extends the sample by asking those individuals to recommend others for the study. Chain-referral techniques have the added benefit of providing relatively easy access to “hidden” subpopulations that are almost impossible to sample by standard (phone, mail, or door-to-door) methods, due to their small size or distrust of outsiders. Examples include studies of prostitutes (Faugier 1996), the homeless (Anderson and Calhoun 1992), AIDS victims (Martin and Dean 1990), drug users (Biernacki and Waldorf 1981; Griffiths et al. 2006), and religious “cults” (Lewis 1986).
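The snowball procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the referral network, the names in it, and the function `snowball_sample` are hypothetical and are not drawn from any of the studies cited. The sketch starts from a set of seed respondents and adds, wave by wave, everyone whom the newest respondents refer.

```python
def snowball_sample(referrals, seeds, max_waves):
    """Return respondents in the order recruited, following referrals wave by wave."""
    sampled = list(seeds)      # wave 0: the initial seed respondents
    seen = set(seeds)          # guards against recruiting anyone twice
    frontier = list(seeds)     # respondents whose referrals have not yet been followed
    for _ in range(max_waves):
        next_frontier = []
        for person in frontier:
            for referred in referrals.get(person, []):
                if referred not in seen:
                    seen.add(referred)
                    sampled.append(referred)
                    next_frontier.append(referred)
        if not next_frontier:  # the chain has died out
            break
        frontier = next_frontier
    return sampled


# Hypothetical referral network: each person lists the people he or she would recommend.
referrals = {
    "ana": ["ben", "cal"],
    "ben": ["cal", "dee"],
    "cal": ["eli"],
    "dee": [],
    "eli": ["ana"],
    "fay": ["ana"],  # fay would refer ana, but nobody refers fay
}

sample = snowball_sample(referrals, seeds=["ana"], max_waves=2)
print(sample)  # ['ana', 'ben', 'cal', 'dee', 'eli']
```

In this toy network "fay" is never recruited, because no chain of referrals starting from the seed leads to her. The composition of the final sample therefore depends on the choice of seeds and on who is willing to refer whom.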

Sample bias is the principal downside of the chain-referral approach. On the one hand, study volunteers may try to protect their friends by not referring them, a problem known as “masking.” On the other hand, “referrals occur through network links, so subjects with larger personal networks will be oversampled, and relative isolates will be excluded” (Heckathorn 1996). Thus Faugier’s (1996) study of prostitutes undersampled women who were new to the business or who had been ostracized by their peers. Participants may also recruit inappropriate volunteers, especially if they misinterpret the study’s design or purpose (Biernacki and Waldorf 1981). And response rates are difficult to define, much less estimate, when participation spreads through forwarded surveys and undocumented invitations.
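The oversampling of well-connected subjects can be made concrete with a small worked example. The sketch below rests on a strong simplifying assumption that is mine, not Heckathorn's: each referral travels along one network link chosen uniformly at random, so a person's chance of being referred is proportional to his or her number of links. The list of network sizes is hypothetical.

```python
def referral_probabilities(degrees):
    """Chance that a single referral lands on each person, assuming the referral
    follows one uniformly chosen link end (a simplifying assumption)."""
    total = sum(degrees)
    return [d / total for d in degrees]


def mean_degree(degrees):
    """Average personal network size in the population."""
    return sum(degrees) / len(degrees)


def expected_degree_of_referred(degrees):
    """Average personal network size among referred persons under the same assumption."""
    return sum(d * d for d in degrees) / sum(degrees)


# Hypothetical personal network sizes; the first person is a complete isolate.
degrees = [0, 1, 1, 2, 2, 4, 10]

probs = referral_probabilities(degrees)
print(probs[0])                              # 0.0: the isolate is never referred
print(probs[-1])                             # 0.5: the best-connected person gets half of all referrals
print(round(mean_degree(degrees), 2))        # 2.86
print(round(expected_degree_of_referred(degrees), 2))  # 6.3
```

Under this assumption the typical referred person has more than twice as many ties as the typical member of the population, and anyone without ties cannot be reached at all.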

Despite these limitations, no one disputes the value of chain referral methods for studies of elusive subpopulations and exploratory work (Penrod et al. 2003; Faugier 1996). Moreover, new techniques can help to overcome some of the problems discussed above.2

Facebook and other social network sites allow us to carry chain-referral methods into the age of the Internet, while also exploiting the strengths of online questionnaires. A single scholar can complete projects that previously required large teams. Costs of printing, postage, and data entry virtually disappear. Feedback is instantaneous. Turnaround times shrink from weeks to days. It becomes much easier to reach remote, diffuse, and alienated subpopulations. (For recent work on the costs and benefits of web-based research, see Bachmann et al. 1996; Berge and Collins 1996; Goree and Marszalek 1995; Kiesler and Sproull 1986; Parker 1992; Schmidt 1997; Sproull 1986; Weible and Wallace 1998; Roselle and Neufeld 1998; Coomber 1997; and Evans and Mathur 1995).

SNS sampling shares most of the limitations associated with other forms of web-based research. We cannot reach those who lack the requisite computer skills and equipment. Nor are we likely to reach many people with serious concerns about Internet privacy. The layout and readability of surveys can vary across hardware and software. Electronic surveys can easily reach unintended recipients and are more readily taken multiple times. And response rates tend to be lower than those associated with phone, mail, and interviews. (For more on these difficulties, see Berge and Collins 1996; Kiesler and Sproull 1986; Parker 1992; Sproull 1986; Best and Krueger 2002; O’Lear 1996; Sell 1997; Evans and Mathur 1995; Kittleson 1995; Greene, Speizer, and Wiitala 2008; McDonald and Adam 2003; Converse et al. 2008; Cole 2005; Swoboda et al. 1997; Griffis, Goldsby, and Cooper 2003; Smith and Leigh 1997; Goree and Marszalek 1995).

Bearing in mind all these considerations, let us turn to a specific SNS-based project.
2. Facebook as a Sampling Frame

Facebook is currently the world’s largest and fastest-growing SNS. Each user creates a personal profile (basically a personal webpage) with information about his or her interests, hobbies, education, occupation, contact information, and the like. Most users also post a personal profile picture. Users can invite people to become their Facebook “friends,” thereby creating networks for public postings and private messages. Users can also create specific Facebook “groups” based on shared interests, workplaces, regions, schools, or anything else. Each group or network has its own Facebook page where members can post messages or chat, and many users list the networks and groups that they belong to on their profile pages.

Table 1. Proportion of the U.S. Population (a) with Facebook Profiles

[Table body not recoverable from the source.]

(a) Based on 2007 U.S. Census population estimates

Starting with one or more groups or networks, researchers can create snowball samples by gathering respondents via links to additional friends, groups, and networks. To illustrate the potential of this simple approach, consider the result achieved by one enterprising Facebook user who created a group called “Six Degrees of Separation: The Experiment.” In order to maximize the number of group members, he invited all his friends to join and encouraged all of them to do likewise ad infinitum. The group currently numbers more than five million!

As Facebook has grown it also has become increasingly representative of the U.S. population. Though created in 2004 for college students alone, Facebook soon launched a high school version and in September 2006 provided free memberships to anyone. Table 1 documents the spread of Facebook from October 2008 through September 2009. Over those twelve months, the proportion of Americans with Facebook profiles increased from 17.0 percent to 42.8 percent, and the greatest growth (in absolute percentages) occurred among adults aged 25 to 44. Significantly, whereas just over one percent of Americans aged 50 and older had Facebook profiles in October 2008, a year later this figure had increased to 16.3 percent. No longer is Facebook solely a tool for young adults.
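Because the paragraph above measures growth in absolute percentages, it may help to set that figure beside the relative increase. The short calculation below uses only the two overall proportions quoted in the text (17.0 percent and 42.8 percent); the variable names are illustrative.

```python
# Share of Americans with Facebook profiles, as quoted in the text.
share_oct_2008 = 17.0   # percent
share_sep_2009 = 42.8   # percent

# Absolute growth, in percentage points.
point_change = share_sep_2009 - share_oct_2008

# Relative growth, as a percentage of the starting share.
relative_change = point_change / share_oct_2008 * 100

print(f"{point_change:.1f} percentage points")      # 25.8 percentage points
print(f"{relative_change:.1f}% relative increase")  # 151.8% relative increase
```

The two measures can rank age groups differently: a group that starts from a very low base can show a modest gain in percentage points and still have the largest relative increase.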
3. Recruiting Survey Participants

As noted above, snowball sampling is especially effective when targeting hard-to-reach populations. This was the case in the present investigation, which included inactive- and ex-Catholics who are by definition underrepresented in pews and parish rosters.

I began my search in December 2008 by creating a new group named “Please Help Me Find Baptized Catholics!” The group’s description explained the purpose of the group, outlined eligibility requirements, and provided instructions on how to be involved. Though I wished only to survey baptized Roman Catholics, the text of my group page invited any viewer to join the group and likewise encouraged them to forward invitations to all their Facebook friends and groups. This strategy was designed both to maximize sample size and to avoid the biases associated with sampling down social chains composed entirely of Catholics. After about a month, all group members would receive the link to the survey, which explored Catholic identity.

I then sought to identify the administrators of Facebook groups who could help me recruit members for my study group. Because there are literally tens of thousands of Catholic groups on Facebook, I excluded groups with large proportions of foreign members and groups with narrow membership criteria, such as those created for specific Facebook networks, college alumni groups, or ethnic groups. This process yielded fifty Catholic groups which I then tried to further categorize by level of religious participation and identification. From group names, descriptions, and postings, I concluded that inactive- or ex-Catholics predominated in twenty-five of the groups; active Catholics predominated in seventeen groups; and the remaining eight groups were of mixed or uncertain composition.

I then contacted the administrators of all 50 Facebook groups, soliciting their help in recruiting volunteers for the study. Each administrator received a personal message that explained the purpose of the research and asked them to send a message to the members of their groups with an invitation to join the research group. I also encouraged them to contact me with any questions or concerns.

Over the course of three days, I sent personal messages to 43 of the 50 administrators of the Catholic groups. On the first day of solicitations, I contacted nine administrators, and six responded positively. As a result, my group initially grew quite quickly. As time went on, however, my requests for help yielded fewer responses. Of the 25 administrators contacted on the second day, 18 failed to follow up, and no one responded to messages sent on the third day. I suspect that this rapid decline in responses was a consequence of my own initial success. As my group grew, the administrators of other groups perhaps concluded I no longer needed their help. In any case, in light of the rapid growth of my own group and the rapid decline in responses from other group administrators, I decided not to contact the last seven of fifty administrators.

Table 2 summarizes attempts to recruit study volunteers. Although I asked administrators to send messages to their members, some elected to post the information to their group’s pages instead, and they did so for one of two reasons: either they considered messaging their members obtrusive, or the size of their groups inhibited their ability to send mass messages.3 Because viewing the posting required users to visit the group page, and because Facebook users often join groups that they rarely if ever return to, this approach left users less likely to learn about the study. Nevertheless, I preferred some assistance to no assistance. Moreover, one person who posted the link administers a group with over 30,000 members; even if the rate of awareness in his group was low, the potential for recruitment from his group in absolute numbers was substantial.

Table 2. Administrator Response by Group Classification and Type of Assistance Provided

Number contacted: 42 (a)
Number who helped: 15
Response rate: 35.7%

Sent a Message to All Group Members
Group Type | # Members | % Members | # Groups
[Rows not recoverable from the source.]

Posted a Link to the Group
Group Type | # Members | % Members | # Groups
[Rows not recoverable from the source.]

(a) 43 administrators were contacted, but one group was misclassified: it appeared to be composed of Catholics but was not.
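The response rate reported in Table 2 can be reproduced from the counts given in the table and its note. The calculation below is a simple check that uses only those figures; the variable names are illustrative.

```python
# Counts taken from Table 2 and its note.
administrators_messaged = 43   # personal messages actually sent
misclassified_groups = 1       # group that turned out not to be Catholic
administrators_helped = 15

# The misclassified group is dropped from the denominator.
eligible_contacts = administrators_messaged - misclassified_groups
response_rate = administrators_helped / eligible_contacts

print(eligible_contacts)        # 42
print(f"{response_rate:.1%}")   # 35.7%
```

Had the misclassified group been left in the denominator, the rate would have been 15 of 43, or about 34.9 percent.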
