Guidelines for Informing Policy via Data
CHAPTER 6 - COLLECTING DATA: FINDING THE PEOPLE TO INTERVIEW
6.1 INTRODUCTION
The process by which a researcher determines who will be interviewed is more complex than one might realise. If the researcher wishes to understand an entire population or sub-population through a statistic such as median income, he or she has two choices: either collect data about everybody (a census), or draw a random sample from the population of interest and collect data from that sample. In either case, the first major obstacle is obtaining a sample frame, that is, a list of the members of the population of interest. If that list is a list of addresses within city blocks, the researcher may end up sampling households rather than individuals.
When a sample frame is not available, the researcher has two choices: either create a sample frame, or refine his or her research goals so that a random sample of the population of interest is not required and a purposive sample can be used instead.
The differences between simple random samples and purposive samples will be discussed in more detail in Section 6.2 below.
6.2 RANDOM VERSUS PURPOSIVE SAMPLES [1]
In statistics, a sample is a subset of a population. Typically, the population is very large, making a complete enumeration of all the individuals in the population impractical or impossible. The sample represents a subset of manageable size; the sample size is the number of units in the sample. Samples are collected and statistics are calculated from the samples so that one can make inferences or extrapolations from the sample to the population. This process of collecting information from a sample is referred to as sampling.
Samples should be selected in such a way as to avoid presenting a biased view of the population. A sample will be unrepresentative of the population if certain members of the population are excluded from every possible sample. For example, if a researcher is interested in drug-use patterns among teenagers but draws the sample from schools, the sample is biased because it excludes teenagers who are not in school for a variety of reasons (e.g., lack of funds to attend, or being schooled at home). Bias may also occur if some members of the population are more or less likely than others to be included in the sample for a reason other than the sample design. The school-based sample is therefore biased in a second way: students who miss many school days because of a chronic illness are less likely to be selected than students who attend regularly.
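The effect of unequal inclusion probabilities can be made concrete with a small calculation. The sketch below uses entirely hypothetical numbers (90 regular attenders present on 95% of days, 10 chronically ill students present on 50% of days) and assumes that a student's chance of being included equals his or her attendance rate, as it would if interviewers surveyed the students present on a single, randomly chosen school day. The function name expected_share_in_sample is illustrative only.

```python
# A minimal sketch with hypothetical numbers: how unequal inclusion
# probabilities distort a school-based sample.

def expected_share_in_sample(groups):
    """Return each group's expected share of the sample, assuming a member's
    chance of being included equals the group's attendance rate.

    groups maps a group name to (number_of_students, attendance_rate).
    """
    # Expected number of students from each group present on the survey day.
    expected_present = {
        name: count * rate for name, (count, rate) in groups.items()
    }
    total_present = sum(expected_present.values())
    return {name: present / total_present for name, present in expected_present.items()}


# Hypothetical school of 100 students.
groups = {
    "regular": (90, 0.95),          # present on 95% of school days
    "chronically_ill": (10, 0.50),  # present on 50% of school days
}

population_share = 10 / 100
sample_share = expected_share_in_sample(groups)["chronically_ill"]
print(f"population share: {population_share:.3f}")   # prints 0.100
print(f"expected sample share: {sample_share:.3f}")  # prints 0.055
```

Under these assumptions, chronically ill students make up 10% of the school but only about 5.5% of the expected sample, so any answers that differ between the two groups will be weighted towards the regular attenders.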
6.2.1 Simple Random Samples
The best way to avoid a biased sample and obtain one that is representative of the population is to select a random sample, also known as a probability sample. A random sample is one in which every individual member of the population has a known, non-zero probability of being selected. In a simple random sample, every possible sample of a given size is equally likely to be drawn, so every individual member of the population has the same probability of being selected as every other member. Other types of random samples fall under the category of complex sample design, which will be discussed in Section 6.3 below.
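As an illustration, the sketch below draws a simple random sample without replacement from a sample frame and uses it to estimate median income. The frame is hypothetical: 10,000 invented (person_id, income) records generated from a log-normal distribution purely for demonstration, and the helper simple_random_sample is an illustrative name, not part of any survey package. Python's random.sample selects k distinct items so that every subset of size k is equally likely, which matches the definition of a simple random sample.

```python
import random
import statistics


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame without replacement; every subset of
    size n is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sample frame: (person_id, annual_income) pairs.
frame_rng = random.Random(0)
frame = [(i, round(frame_rng.lognormvariate(10, 0.6))) for i in range(10_000)]

sample = simple_random_sample(frame, 500, seed=42)

# Estimate the population median from the sample and compare it with the
# true median of the (hypothetical) frame.
estimate = statistics.median(income for _, income in sample)
true_median = statistics.median(income for _, income in frame)
print(f"sample median: {estimate}, population median: {true_median}")
```

In practice the true median is unknown; it is computed here only to show that the sample estimate lands close to it when every member of the frame has the same chance of selection.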
6.2.2 Purposive Samples
A sample that is not random is called a non-random sample or a non-probability sample. Some categories and examples of non-random samples are described below.
6.2.2.1 Types of Purposive Samples
Convenience Samples
Convenience samples are simply samples of whoever is available to survey. Such samples can be extremely biased in relation to the population of interest to the researcher, and should be avoided whenever possible.
Judgment Samples
A judgment sample is a convenience sample in which a particular subgroup of the population is targeted for interviewing. For example, a poll of people who have an affiliation with a particular political party can be considered a judgment sample.
Snowball Samples
A snowball sample is a type of judgment sample used when the subgroup of interest is difficult to locate. The researcher starts by locating as many members of the subgroup as he or she can, and then asks those members to identify additional members of the subgroup. This technique can be very effective when the characteristic that defines the subgroup causes its members to socialise with each other. For example, expatriates from the same state tend to form a community in their new location, individuals who are outcast from society for the same reason tend to socialise with each other, and individuals with the same genetic condition tend to band together for mutual support.
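The referral process can be sketched as a breadth-first walk over the people each respondent names. The referral network below is hypothetical, and the function snowball_sample and its max_size stopping rule are illustrative assumptions rather than a prescribed procedure.

```python
from collections import deque


def snowball_sample(seeds, referrals, max_size):
    """Recruit by referral: interview the seed members first, then the people
    they name, breadth-first, until max_size interviews are reached.

    referrals maps a respondent to the subgroup members he or she names.
    """
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)              # interview this person
        for named in referrals.get(person, []):
            if named not in seen:          # do not recruit anyone twice
                seen.add(named)
                queue.append(named)
    return sample


# Hypothetical referral network within a hard-to-locate subgroup.
referrals = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["E"],
    "D": [],
    "E": ["F"],
    "G": ["H"],  # G and H have no ties to anyone reachable from A
}

result = snowball_sample(["A"], referrals, max_size=10)
print(result)  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Members G and H are never reached, because nobody connected to the starting member names them; a snowball sample only covers the part of the subgroup that is socially linked to the people the researcher finds first.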
Quota Samples
Quota samples are an attempt to obtain a representative sample when random sampling is not possible. Quota sampling is similar to convenience sampling, except that the interviewers are required to interview a set number of individuals from each of several demographic categories, for example, equal numbers of women and men, or specified numbers of individuals from different ethnic groups.
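A quota rule of this kind can be sketched as follows: interviewers take passersby in the order they arrive and accept each one only while the quota for that person's category is still open. The stream of passersby and the quotas are hypothetical, and quota_sample is an illustrative name.

```python
def quota_sample(passersby, quotas):
    """Interview passersby in arrival order, accepting each person only while
    the quota for his or her category is still open.

    passersby: iterable of (person_id, category) pairs.
    quotas: dict mapping category -> number of interviews required.
    """
    remaining = dict(quotas)
    sample = []
    for person_id, category in passersby:
        if remaining.get(category, 0) > 0:
            sample.append((person_id, category))
            remaining[category] -= 1
        if not any(remaining.values()):  # every quota has been filled
            break
    return sample


# Hypothetical stream of passersby at one downtown location.
passersby = [(1, "woman"), (2, "woman"), (3, "woman"), (4, "man"),
             (5, "woman"), (6, "man"), (7, "man"), (8, "woman")]

result = quota_sample(passersby, {"woman": 2, "man": 2})
print(result)  # [(1, 'woman'), (2, 'woman'), (4, 'man'), (6, 'man')]
```

The quotas fix how many people are interviewed in each category, but within a category the sample is simply whoever happens to pass first, so the rule does nothing to control who those people are.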
While quota samples may be more representative than convenience samples, using a quota sample does not guarantee that the sample represents the underlying population. Even if the researcher uses every available demographic characteristic to set quotas, individuals may have hidden qualities that are related both to their likelihood of participating in the interview and to their answers.
For example, suppose a researcher is interested in the public's perception of the city government's efficacy in developing employment programmes, and she develops a questionnaire. She decides to use quota sampling because she does not have access to a good sample frame for the population of interest, which in this case consists of all adult residents of the city who can be considered part of the workforce. To find her sample, the researcher hires interviewers to stand at several locations in the downtown area and interview passersby according to a quota system based on race, gender, ethnicity, age, education level, and employment status. The researcher is convinced that her sample will be representative, because she will ensure that the ratio of employed to unemployed respondents matches that of the population.
The problem is that the unemployed who can be found downtown differ from the unemployed who are not found downtown. The unemployed people whom the interviewers are most likely to encounter are those who are actively seeking jobs or taking part in training opportunities, since those activities bring them downtown. The unemployed people whom the interviewers will miss are likely to be the more dissatisfied members of the population of interest: those who have not been offered job interviews, have not taken advantage of training programmes, and are therefore not in the downtown area.
[1] Based on the definition of a statistical sample at Wikipedia (accessed 28 December 2006).