
Guidelines for Informing Policy via Data

CHAPTER 6 - COLLECTING DATA: FINDING THE PEOPLE TO INTERVIEW (page 3)


6.3.3 Complex Sample Design [3]

For large survey projects, there is usually a need to balance the cost savings of cluster sampling against the error reduction of stratified sampling. Doing so requires a complex sample design.

The basic building blocks of a complex sample design are stratification and clustering. The first stage involves grouping the primary sampling units, that is, the geographic clusters that form the first level of units to be sampled (e.g., enumeration areas), into strata. Cluster sampling is then performed within each stratum. In some designs, once the primary sampling units have been selected, clusters of households within them, called secondary sampling units, are selected in turn.

When all the units within each selected primary sampling unit are included in the sample, the technique is referred to as one-stage cluster sampling. If a random subset of units is selected from each selected primary sampling unit, it is called two-stage cluster sampling. Cluster sampling can also be carried out in three or more stages, in which case it is referred to as multistage cluster sampling.
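The stages described above can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: it assumes a hypothetical in-memory frame laid out as `{stratum: {psu_id: [household_id, ...]}}` and uses simple random sampling at both stages. Passing `hh_per_psu=None` keeps every household in each selected PSU, which turns the same routine into one-stage cluster sampling.

```python
import random

def two_stage_sample(frame, psus_per_stratum, hh_per_psu, seed=None):
    """Draw a stratified two-stage cluster sample (illustrative sketch).

    frame            -- hypothetical layout: {stratum: {psu_id: [household_id, ...]}}
    psus_per_stratum -- number of PSUs (e.g. enumeration areas) drawn per stratum
    hh_per_psu       -- households (SSUs) drawn per selected PSU; None keeps every
                        household, i.e. one-stage cluster sampling
    """
    rng = random.Random(seed)
    sample = []
    for stratum, psus in frame.items():
        # Stage 1: simple random sample of PSUs, drawn independently in each stratum
        for psu in rng.sample(sorted(psus), psus_per_stratum):
            households = psus[psu]
            if hh_per_psu is None or hh_per_psu >= len(households):
                picked = list(households)  # the whole cluster is enumerated
            else:
                # Stage 2: simple random sample of households within the PSU
                picked = rng.sample(households, hh_per_psu)
            sample.extend((stratum, psu, hh) for hh in picked)
    return sample


# Toy frame: two strata, ten enumeration areas each, 20 households per area
frame = {
    s: {f"{s}-EA{i}": [f"{s}-EA{i}-HH{j}" for j in range(20)] for i in range(10)}
    for s in ("urban", "rural")
}
two_stage = two_stage_sample(frame, psus_per_stratum=3, hh_per_psu=5, seed=1)
one_stage = two_stage_sample(frame, psus_per_stratum=3, hh_per_psu=None, seed=1)
print(len(two_stage), len(one_stage))  # 30 120
```

Drawing PSUs separately inside every stratum is what makes the design stratified; the second `rng.sample` call is the extra stage that distinguishes two-stage from one-stage cluster sampling.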

It is important to understand how many levels of cluster sampling occur in order to calculate appropriate standard errors for statistics formed from complex samples. Units within the same cluster tend to resemble one another, so each additional unit taken from a cluster adds less information than an independently drawn unit would. If a complex sample is analysed as if it were a simple random sample, the reported standard errors are therefore likely to be smaller than they should be, giving the impression that the survey results are more precise than they really are.
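The understatement can be seen with a small made-up example. The sketch below assumes the simplest case: a one-stage sample of equal-sized clusters drawn by simple random sampling, with the finite population correction ignored. Under those assumptions the cluster means are the independent observations, so the standard error is computed from their spread. Real analyses would normally use survey software that takes the strata and PSU identifiers into account.

```python
import math
import statistics

def se_mean_as_if_srs(clusters):
    """SE of the mean when clustered data are (wrongly) pooled as an SRS."""
    values = [v for cluster in clusters for v in cluster]
    return statistics.stdev(values) / math.sqrt(len(values))

def se_mean_one_stage_cluster(clusters):
    """SE of the mean for a one-stage SRS of equal-sized clusters, ignoring the
    finite population correction: the cluster means, not the individual
    values, are treated as the independent observations."""
    cluster_means = [statistics.fmean(cluster) for cluster in clusters]
    return statistics.stdev(cluster_means) / math.sqrt(len(cluster_means))


# Made-up data: values are similar within clusters and differ between them
clusters = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
naive = se_mean_as_if_srs(clusters)
correct = se_mean_one_stage_cluster(clusters)
design_effect = (correct / naive) ** 2  # variance inflation due to clustering
print(f"naive SE {naive:.2f}, cluster SE {correct:.2f}, design effect {design_effect:.2f}")
```

With these numbers the cluster-based standard error is nearly twice the naive one, because the 12 observations carry little more information than the four cluster means.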

A few specialised techniques used in complex samples should be mentioned here:

Proportional allocation occurs when the number of units chosen in each stratum is proportional to that stratum's share of the total population of units.

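Probability-proportional-to-size (PPS) sampling is a type of multistage cluster sampling. In this method, clusters are selected with probability proportional to their size, and a fixed number of elements is then selected from each selected cluster, so the probability of selecting an element within any given cluster varies inversely with the size of that cluster. As a result, every element in the population has the same overall probability of selection.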

Systematic sampling is typically used at the last stage of sampling, when a list of the units to be sampled is available. A starting point is chosen at random among the first n units on the list, and every nth unit from that point onwards is included in the sample.
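The three techniques above can be sketched as small Python functions. These are illustrations under stated assumptions, not reference implementations: the largest-remainder rounding in `proportional_allocation`, the integer sampling interval in `systematic_sample`, and the systematic method of drawing PPS clusters in `pps_systematic` are choices made for this example, and the stratum sizes and enumeration-area sizes are invented.

```python
import math
import random

def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.
    Largest-remainder rounding keeps the allocations summing to n."""
    total = sum(stratum_sizes.values())
    exact = {h: n * size / total for h, size in stratum_sizes.items()}
    alloc = {h: math.floor(x) for h, x in exact.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

def systematic_sample(units, n, rng):
    """Random start among the first k units, then every k-th unit (k = N // n).
    Requires n <= N. If N is not a multiple of n, the last N - n*k units can
    never be chosen; fractional-interval variants avoid that."""
    k = len(units) // n
    start = rng.randrange(k)
    return [units[start + i * k] for i in range(n)]

def pps_systematic(cluster_sizes, m, rng):
    """Select m clusters with probability proportional to size: lay the
    cumulative sizes on a line and take m equally spaced points from a random
    start. Positions are scaled by m so every comparison uses exact integers.
    A cluster larger than the interval total/m can be hit more than once."""
    total = sum(size for _, size in cluster_sizes)
    r = rng.randrange(total)                    # random start (scaled by m)
    points = [r + i * total for i in range(m)]  # scaled selection points
    selected, cumulative, j = [], 0, 0
    for cluster_id, size in cluster_sizes:
        cumulative += size
        while j < m and points[j] < m * cumulative:
            selected.append(cluster_id)
            j += 1
    return selected

def overall_inclusion_prob(cluster_size, total_size, m, b):
    """P(cluster) = m * size / total; a fixed take of b elements gives
    P(element | cluster) = b / size; the product is m * b / total for all."""
    return (m * cluster_size / total_size) * (b / cluster_size)


rng = random.Random(7)
print(proportional_allocation({"urban": 6000, "rural": 3000, "remote": 1000}, 200))
# {'urban': 120, 'rural': 60, 'remote': 20}
print(systematic_sample(list(range(1, 101)), 10, rng))
eas = [("EA1", 120), ("EA2", 450), ("EA3", 80), ("EA4", 300), ("EA5", 50)]
print(pps_systematic(eas, 2, rng))
print(overall_inclusion_prob(120, 1000, 2, 10), overall_inclusion_prob(450, 1000, 2, 10))
```

The last line illustrates the point made about PPS sampling: a household in the small EA1 and one in the large EA2 both end up with the same overall selection probability (0.02, up to floating-point rounding).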

6.4 RANDOM SAMPLING PITFALLS

From the above discussion, it may appear that random sampling guarantees a representative sample and, as a result, unbiased estimates. That is true, but only in theory. In practice, several issues can undermine the ability of random sampling to produce unbiased estimates. Those issues are reviewed below.

6.4.1 Incomplete Sample Frames

In order to choose a random sample of a population, a researcher must have some type of list, or sample frame, for that population. In countries with population registers, for example, individuals can be selected directly; in most countries, however, a sample frame made up of maps of small geographic areas, called enumeration areas, is used instead.

Most sample frames are not perfect. In a sample frame of enumeration areas, changes on the ground keep the frame from being completely up to date: housing units burn down, households move into and out of an area, and new construction creates new housing units. For that reason, a sample frame of enumeration areas, especially one covering a large area, is likely to be incomplete; that is, it is unlikely to cover every member of the household-based population. Other types of sample frames may be even less complete. For example, a sample frame of HIV/AIDS patients may be incomplete because some individuals are unwilling to disclose their status, and a sample frame of professionals in a particular field may omit recent graduates or trainees in that field.

In an incomplete sample frame, the missing population units are almost never a random subset of the population. In the case of a sample frame of enumeration areas, the kinds of people most likely to live in new homes (perhaps young families) are excluded, so characteristics associated with the respondent's age (e.g., income level, medical conditions) cannot be measured without bias.

6.4.2 Homeless and Institutionalised Populations

As described in Section 6.4.1, a set of maps of small geographic areas, called enumeration areas, is commonly used as the sample frame for a survey. Within each enumeration area, a random sample of housing units is selected for the survey. This approach has a significant problem, however. Although the population of interest is usually the entire population, the sample frame omits a sizeable subset of it: people who are homeless or who live in institutions such as school dormitories, military barracks, prisons, and hospitals.

Capturing homeless and institutionalised populations often requires special sampling procedures. Homeless people might be sampled through homeless shelters, local knowledge of the areas where they congregate, and similar sources. In some cases, spatial sampling, that is, selecting specific geographic points and searching for homeless individuals within a certain radius of those points, might be used. Reaching the institutionalised population may require other strategies: special arrangements might be made with an institution's administration, or a list of all local institutions with resident populations might need to be compiled.
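The spatial sampling idea can be simulated as below. This is only a sketch under assumptions that are not in the guidelines: the study area is a rectangle in a projected coordinate system measured in metres, points are placed uniformly at random, and the hypothetical `locations` dictionary stands in for the people a field team would actually find when searching around each point.

```python
import math
import random

def spatial_sample(area, n_points, radius, locations, rng):
    """Spatial sampling sketch.

    area      -- (xmin, ymin, xmax, ymax) in projected metres (assumed)
    radius    -- search radius around each point, same units
    locations -- {person_id: (x, y)}; a hypothetical stand-in for whoever a
                 field team would find on the ground
    Returns the random points and, for each point, the people within the radius.
    """
    xmin, ymin, xmax, ymax = area
    points = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n_points)]
    found = []
    for px, py in points:
        found.append([pid for pid, (x, y) in locations.items()
                      if math.hypot(x - px, y - py) <= radius])
    return points, found


locations = {"p1": (120.0, 80.0), "p2": (900.0, 950.0), "p3": (130.0, 95.0)}
points, found = spatial_sample((0, 0, 1000, 1000), 5, 200, locations, random.Random(42))
for (x, y), people in zip(points, found):
    print(f"point ({x:.0f}, {y:.0f}): {people}")
```

When search circles overlap, the same person can be found from more than one point; recording how many points each person falls near matters when weights are later computed.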

6.4.3 Under-coverage

Under-coverage occurs when some groups in the population are left out of the process of choosing the sample. This may be due to an incomplete sampling frame (see Sections 6.4.1 and 6.4.2 above), refusals to respond (see Section 6.4.4 below), or some other issue related to the sampling process. Under-coverage is especially problematic when the characteristics measured by the survey differ between the uncovered groups and the covered groups. For example, the homeless population is likely to have significantly different incomes from the household-based population. Therefore, if a survey on income leaves out the homeless population because it uses a household-based sample frame, the resulting estimate of mean income will be biased.
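The size of that bias follows from writing the population mean as a weighted average of the covered and uncovered groups. The sketch below assumes the covered group itself is sampled without error, so the survey estimate settles on the covered-group mean; the income figures are invented for illustration and are not survey results.

```python
def coverage_bias(mean_covered, mean_uncovered, share_uncovered):
    """Bias of a mean that can only describe the covered population.

    The population mean is  (1 - w) * mean_covered + w * mean_uncovered,
    where w is the share of the population outside the sample frame, so
    mean_covered - population_mean = w * (mean_covered - mean_uncovered).
    """
    if not 0.0 <= share_uncovered <= 1.0:
        raise ValueError("share_uncovered must lie between 0 and 1")
    return share_uncovered * (mean_covered - mean_uncovered)


# Invented figures: frame-covered households average 300 units of income,
# the uncovered homeless population averages 50, and 2% are homeless.
bias = coverage_bias(mean_covered=300.0, mean_uncovered=50.0, share_uncovered=0.02)
print(f"estimate overstates mean income by about {bias:.1f} units")
```

The bias depends only on the uncovered share and on how different that group is; increasing the sample size does not reduce it.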


3. This definition is based on the Wikipedia definition of multi-stage sampling and on www.nustats.com/Glossary.htm (accessed 29 December 2006).