Guidelines for Informing Policy via Data CHAPTER 6 - COLLECTING DATA: FINDING THE PEOPLE TO INTERVIEW (page 2)
6.2.2.2 Examples of Purposive Samples
In order to assume that statistics or indicators are representative of the entire population in question, they must be created from data gathered from the entire population or a random sample of the entire population. It would seem that random samples are therefore always preferable; but sometimes a random sample is just not possible. Below are some examples of cases where a random sample cannot be obtained:
Example 1
Refugees are streaming in from a country undergoing a terrible internal conflict. It is neither advisable nor safe to enter that country. A human rights organisation wants to determine whether human rights abuses are occurring in the country and, if so, what percentage of the population is being affected.
The population of interest is the entire population that has been in the country sometime during the internal conflict. The population available to the human rights organisation, however, is the population in refugee camps in neighbouring countries. As such, any conclusions made will apply only to the population in the refugee camps. To represent the human rights organisation's findings as indicative of the entire situation within the country would be unethical.
It is not difficult to see the logic behind those statements. It may be that the refugees consist only of people who live in the outlying parts of the country, i.e., those who could leave the country on foot. Perhaps, therefore, the population within the refugee camps does not represent people living in the internal parts of the country, and perhaps their experiences are different from those from the internal areas.
Example 2
A non-governmental organisation operates in a country that has a registry system and is therefore able to obtain a sample frame of the entire population. That organisation wishes to research women's health, but is unable to interview women due to cultural constraints that do not allow women to talk to strangers. The organisation must therefore substitute the male heads-of-households for the women in the household.
When that organisation presents the results of its random sample survey, it must indicate that while it was able to obtain a random sample of families, women were not directly interviewed. As a result, it is most likely that particular health issues, such as domestic violence against women, were underreported due to cultural constraints.
6.2.2.3 Summary
In Examples 1 and 2, it is impossible for the researcher to access the entire population of interest. That does not mean, however, that the researcher should just give up on his/her project. Useful information can be gathered in both of these examples; the important point is that the researcher is ethically required to be honest about the nature of his/her sample and the limitations imposed by it.
In addition, knowing whether research will be based on a true random sample of the whole population or of only a sub-section of it will help the researcher in designing the most appropriate questionnaire. In Example 2, knowing that the interviewers will only be interviewing male heads-of-households can inform the researcher as to the types of questions to ask in order to elicit the best information about the experiences of women.
6.3 COMPLEX RANDOM SAMPLES
In Section 6.1 above, a simple random sample was defined as a sample in which every member of the population of interest has the same probability of being included. In practice, a simple random sample is rarely an option, given cost constraints. Instead, some type of complex random sample is drawn.
There are two main concepts related to complex sampling: stratification and clustering.
6.3.1 Stratified Sampling [2]
Stratification is the process of grouping members of a population into relatively homogeneous subgroups before sampling. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded.
Stratified sampling is the process of taking a simple random sample from each and every stratum.
Stratification often improves the representativeness of the sample by reducing sampling error or, in other words, by increasing the precision of the estimates derived from the survey data. For example, stratification can produce a weighted mean that has a smaller standard error than the analogous mean of a simple random sample of the population.
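The effect described above can be illustrated with a minimal Python sketch. It is not part of the original guidelines: the two strata ("urban" and "rural"), their sizes, the simulated values and the 10 per cent sampling fraction are all hypothetical, and the standard error formula ignores the finite population correction. The sketch draws a simple random sample from each stratum, weights each stratum mean by the stratum's share of the population, and compares the resulting standard error with that of a simple random sample of the same total size.

```python
import math
import random
import statistics


def stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, values in strata.items():
        # Keep at least 2 units per stratum so a variance can be computed.
        n = max(2, round(len(values) * fraction))
        sample[name] = rng.sample(values, n)
    return sample


def stratified_mean_and_se(strata, sample):
    """Weighted mean and its standard error (no finite population correction)."""
    total = sum(len(values) for values in strata.values())
    mean = 0.0
    variance = 0.0
    for name, values in sample.items():
        weight = len(strata[name]) / total  # stratum share of the population
        mean += weight * statistics.mean(values)
        variance += weight ** 2 * statistics.variance(values) / len(values)
    return mean, math.sqrt(variance)


# Hypothetical population: two internally homogeneous strata with different levels.
rng = random.Random(1)
population = {
    "urban": [rng.gauss(100, 5) for _ in range(600)],
    "rural": [rng.gauss(50, 5) for _ in range(400)],
}

sample = stratified_sample(population, fraction=0.1)
strat_mean, strat_se = stratified_mean_and_se(population, sample)

# Simple random sample of the same total size, for comparison.
pooled = population["urban"] + population["rural"]
srs = random.Random(2).sample(pooled, 100)
srs_mean = statistics.mean(srs)
srs_se = statistics.stdev(srs) / math.sqrt(len(srs))

print(f"stratified:    mean={strat_mean:.1f}, se={strat_se:.2f}")
print(f"simple random: mean={srs_mean:.1f}, se={srs_se:.2f}")
```

The gain in precision in this sketch comes from the strata being much more homogeneous internally than the population as a whole. If the strata did not differ from one another, the two standard errors would be about the same.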
6.3.2 Cluster Sampling
Cluster sampling is a sampling technique in which the entire population of interest is divided into groups, or clusters, and a random sample of those clusters is selected. The clusters must be mutually exclusive, and together they must include the entire population. The main difference between a stratified sample and a clustered sample is how the sampling occurs once the population has been partitioned. In stratified sampling, some units are selected from every partitioned group; in cluster sampling, only some of the partitioned groups are selected, and each selected group is taken as a complete unit.
The main reason for using cluster sampling is that it is usually much cheaper and more convenient to sample in clusters than to draw a simple random sample. In some cases, constructing a sampling frame that identifies each and every population element is too expensive or impossible. Cluster sampling can also reduce cost when the population elements are scattered over a wide area.
However, whereas stratification often increases precision of the estimation compared with simple random sampling, cluster sampling often decreases it. That is because units in a cluster tend to be more similar than elements selected at random from the whole population. When using cluster sampling, it is usually necessary to increase the total sample size to achieve the same precision as in simple random sampling. Nevertheless, the cost savings of cluster sampling are often significant.
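How much the sample size must grow can be approximated with the design effect, a standard survey-sampling quantity that these guidelines do not themselves define. The sketch below assumes equal-sized clusters and uses the common approximation 1 + (m - 1) * rho, where m is the number of interviews per cluster and rho is the intracluster correlation. The function names and all input numbers are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(srs_size, cluster_size, icc):
    """Total sample size needed to match the precision of a simple random sample."""
    # Round before taking the ceiling to absorb floating-point noise.
    return math.ceil(round(srs_size * design_effect(cluster_size, icc), 9))


# Hypothetical inputs: 400 interviews would suffice under simple random sampling,
# 20 interviews are planned per cluster, and the intracluster correlation is 0.05.
deff = design_effect(cluster_size=20, icc=0.05)
needed = clustered_sample_size(srs_size=400, cluster_size=20, icc=0.05)
print(f"design effect = {deff:.2f}, required sample size = {needed}")
```

Under these assumed inputs the clustered design needs nearly twice as many interviews as a simple random sample. The more alike the units within a cluster are (the larger rho is), the larger the required increase.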
Suppose a researcher wants to survey school children in a specific area. If the researcher draws a simple random sample of school children, that researcher might have to visit all schools in the area to interview the children in the sample. In cluster sampling, the researcher randomly selects schools to be included in the sample and interviews all, or a simple random sample of, children in each school, hence reducing the number of schools to visit and therefore reducing the cost of data collection.
In this example, the schools are what are sometimes referred to as natural clusters. In other cases, the population may be widely distributed geographically, and cluster sampling, where the clusters consist of geographical areas, could then reduce the number of areas that need to be visited. A smaller number of areas to be visited could reduce travel expenses and also make possible more efficient supervision of the fieldwork.
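The school example can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `cluster_sample`, the ten schools of thirty children each and the number of schools selected are all hypothetical. The sketch selects schools at random and then either interviews every child in a selected school or draws a simple random sample of children within it.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=0):
    """Randomly select clusters, then take all members or a random subsample of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None or per_cluster >= len(members):
            sample[name] = list(members)  # interview every child in the school
        else:
            sample[name] = rng.sample(members, per_cluster)  # subsample within the school
    return sample


# Hypothetical area with 10 schools of 30 children each.
schools = {
    f"school_{i}": [f"child_{i}_{j}" for j in range(30)]
    for i in range(10)
}

all_children = cluster_sample(schools, n_clusters=3)
subsampled = cluster_sample(schools, n_clusters=3, per_cluster=10)

print("schools to visit:", sorted(all_children))
print("children interviewed (all in each school):", sum(len(v) for v in all_children.values()))
print("children interviewed (10 per school):", sum(len(v) for v in subsampled.values()))
```

In either variant only three of the ten schools need to be visited, which is where the saving in fieldwork cost comes from.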
2. Based on the Wikipedia definition of stratified sampling (accessed 29 December 2006).