Guidelines for Informing Policy via Data
CHAPTER 5 - COLLECTING DATA: GETTING STARTED
Despite the myriad possibilities for data collection, there is a common set of best practices that applies to all methods. Once a clear set of research goals has been established, collecting data then involves two main sets of tasks: deciding from whom or what the data will be collected, and creating the set of questions that will be asked. For example, survey and census data may involve finding a sample/population frame and some type of complex sample design. Focus-group data or expert-interview data may also be based on a random sample, or may be based on the selection of particular informants or experts on the subject to be studied. Questionnaire design will involve determining the format and order of questions and rigorously testing those questions to ensure that they are interpreted correctly by respondents. The set of questions to be asked might be small, designed to be followed up by expert probing by trained interviewers, or might serve as an exact script for the interviewer to follow. In either case, the same rigorous testing of the scripted questions must occur. One general guiding principle is relevant to all of the above possibilities: keep the data-collection process as simple as possible to get the job done. The more complex a process, the more likely some small step in that process will not go smoothly. The process by which data are collected from a respondent in a face-to-face interview is a case in point. There are many cognitive processes that can affect the accuracy of the information gathered:
A single misinterpretation at any of the steps might bias the data and the subsequent statistics. Every member of the data-collection team must therefore be aware of the importance of his/her role in the data-collection process, and the communication between the researcher and the respondent should be as direct as possible. Creating a conceptual framework and research goals, and deciding on the mode of data collection are discussed below. Chapter 6 will review procedures for determining from whom or what the data are collected, and Chapter 7 will discuss questionnaire design.
5.1 DEVELOPMENT OF A CONCEPTUAL FRAMEWORK
In cases where the data to be collected are conceptually simple or universally understood, the development of a conceptual framework might not be required. In many cases, however, especially when a new concept is to be measured, developing a conceptual framework prior to the formulation of research goals will make the entire data-collection project more feasible.
For example, the Philippines Metagora project team engaged in a detailed development of a conceptual framework for their survey on access to land by indigenous peoples (IPs). In the process of creating that conceptual framework, the team explored what was meant by an "indigenous person," what was meant by "ancestral domain," and what the current law was pertaining to the right of IPs to ancestral lands. These three concepts had to be clarified in order to develop the sampling frame and the research topics for the survey. The team chose to use a rights-based approach for understanding the relationship between IPs and their traditional domains. They researched international human rights law, and designed the survey questionnaire to answer questions related to the IPs’ realisation of their human rights concerning their ancestral domains.
The conceptual framework for the Philippines Metagora project thus addressed perceptions and awareness of land rights, realisation of land rights and violations of those rights, and methods for measuring the rights and mechanisms for realising their rights to ancestral domains and lands. Further information is given in the Case Study associated with this manual.
5.2 DEVELOPMENT OF RESEARCH GOALS
Prior to making any decision as to who will be interviewed and in what manner, a clear set of research goals should be outlined. This step is often skipped in favour of diving right into developing a questionnaire. That is a mistake for several reasons. First, without a clear set of research goals it is difficult to ensure that the questions on the questionnaire will elicit the information required, and it will be difficult to determine if any of the questions on the questionnaire are extraneous. In order to increase the probability that any one respondent will complete the data-collection process, it is important to keep that process as short as possible. Only the data needed should be collected, and in as concise a way as possible. Second, in framing research goals, a researcher may discover that a particular data-collection technique will work better than others. For example, if a researcher wants to know about the experiences of the homeless in an urban area, specialised sampling techniques will be required, and interviews will need to be kept very short if that population is transient and reluctant to be interviewed. Certain research questions lend themselves to the use of focus groups, others to individual interviews. Some research questions will require detailed qualitative data be collected from all respondents, while others are addressed readily via quantitative data. If the researcher starts with a clear set of research questions, then determining the type of data collection that is most appropriate will be easier.
5.2.1 Ethical Considerations
In all data-collection activities involving human populations, there are ethical concerns. Considering those concerns during the earliest stages of planning, and during all other stages, makes it much more likely that both researchers and their staff will not inadvertently engage in unethical practices.
There are two main issues to consider:
5.2.1.1 Confidentiality
When individual data are collected from a human population, it is the ethical responsibility of the data collectors to keep those data confidential - in other words, to ensure that the information collected is not disclosed to individuals not involved in the project, and that all individuals involved in the data-collection project have sworn to keep the data secret. That does not mean that none of the data can be released; it just means that any data that are released cannot be traced back to the original person who provided the information.
Ensuring confidentiality is more difficult than simply removing identifying information from data before they are released and having staff sign confidentiality statements. If data are collected from a very few individuals in a village, and those data are released with names removed but villages still given, local villagers may be able to easily identify the answers of individuals based on the data provided. An informal rule is that if data are released, they must be identical for at least 10 individual records on every variable that could make a person identifiable; in other words, any cross-classification table created from the data should have at least 10 records in each cell. In the village example, if fewer than 10 villagers are selected in a sample, then "village" is not a variable that can be released as part of the data. Other variables that could identify individuals include age, gender, name, address, ethnicity, race, income - basically, any variable that describes a publicly known aspect of a person. In addition, ethical practice demands that variables that could lead to the potential harm of an individual, such as variables relating to spousal abuse, should be considered variables that could make a person identifiable. This issue is further discussed in 5.2.1.3 below.
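The informal minimum-cell-size rule described in Section 5.2.1.1 can be checked mechanically before any data are released. The following is a minimal sketch, assuming the records are held as a list of dictionaries; the function names, the record layout, and the sample data are hypothetical and are not part of the manual. It counts the records in each cell of the full cross-classification of the identifying variables and reports any cell that falls below the threshold.

```python
from collections import Counter

MIN_CELL_SIZE = 10  # informal threshold quoted in the text


def small_cells(records, identifying_vars, min_size=MIN_CELL_SIZE):
    """Return the cross-classification cells holding fewer than min_size records.

    records: list of dicts, one per respondent (hypothetical layout).
    identifying_vars: names of the variables that could identify a person.
    """
    counts = Counter(
        tuple(record[var] for var in identifying_vars) for record in records
    )
    return {cell: n for cell, n in counts.items() if n < min_size}


def safe_to_release(records, identifying_vars, min_size=MIN_CELL_SIZE):
    """True when every non-empty cell holds at least min_size records."""
    return not small_cells(records, identifying_vars, min_size)


# Invented sample: 12 respondents in village A and 3 in village B.
sample = (
    [{"village": "A", "gender": "F"}] * 12
    + [{"village": "B", "gender": "F"}] * 3
)
print(small_cells(sample, ["village"]))     # {('B',): 3}
print(safe_to_release(sample, ["gender"]))  # True
```

In this invented sample, "village" could not be released because village B contributes only three records, whereas "gender" alone passes the check. A check of this kind covers only the cell-size rule; it does not address variables that are sensitive for other reasons, such as those relating to potential harm.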
5.2.1.2 Informed Consent
It is also the ethical responsibility of a researcher to ensure that informed consent is gained from every respondent. For consent to be informed, the respondent must understand the goals of the research project, what data will be collected from them, in what format that data will be made available to the public, and what parts of the data will be kept confidential. Informed consent is usually achieved in one of two ways: either the respondent is given an informed-consent statement to read and sign, or an informed-consent statement is read to the respondent and he/she is asked to give their consent to the survey verbally. If there is the possibility that the respondent could come to harm from participating in the survey, and the researcher is aware of that possibility, then he/she must disclose that information to the respondent.
5.2.1.3 An Important Question
Even if a data-collection project has ensured confidentiality of the data and has obtained appropriate informed consent from each respondent, it may not be operating ethically. An important question that every researcher must ask before beginning a data-collection process is: Will the collection of this data ultimately provide more benefit or more harm to the population being interviewed?
Many researchers work on the assumption that any data collected must be of greater benefit to the population of interest than any harm that the data-collection procedure could produce. But that is not always the case. There are two obvious examples of when data collection might cause more harm than benefit: when data are collected about traumatic information, and when data are collected about sensitive information or illegal activities. Data collected about traumatic experiences, for example, sexual assault, human rights violations, or miscarriages, can be used to benefit a population.
Such information might be used to enact protective legislation, develop reparations programmes, or locate new health clinics. However, many research projects do not have such tangible goals in mind. If information is to be collected about traumatic experiences with no guarantee of a tangible benefit to the traumatised population, then that population needs to be approached with an attitude of "do no harm." For example, counselors might be sent into the field with interviewers to provide services to traumatised respondents, or interviewers might come to interviews with detailed information about services available for traumatised individuals. The respondent must not be misled into thinking that the survey will be used for a direct tangible benefit if that is not the case. If reporting on the traumatic event might endanger the respondent - if, for example, the respondent would be a target of additional violence or harm for having reported on the initial event - then the ethical choice might be not to collect the data on the traumatic event at all. If the entire traumatised population might be put at risk of harm by reporting on the traumatic events, then the most ethical choice might be not to engage in the data-collection project at all.
When sensitive information is collected, some of the same considerations apply, for example, the need to compare the benefit of collecting the data with the potential harm to the respondents. In some cases, such as in the case of illegal activity, the collection of sensitive information requires additional confidentiality protections in order to protect that data from being seized by a government or police force. Potential data safeguards are discussed in Section 10.6.2 of this manual.
5.2.2 An Example
The Metagora pilot projects all started by developing research goals. The pilot project in the Philippines serves as an exceptional example of that process.
A great deal of time was spent by the project staff on considering their research interest and what local partners would be required for that focus to work. Because the project staff was initially composed of members of the staff of the Human Rights Commission of the Philippines, with the National Statistical Coordination Board (NSCB) as a partner, the staff were interested in focusing on a human rights issue, and the issue of land rights of indigenous peoples of the Philippines was their priority. They quickly realised that such a research project would only be successful and ethical if it was of direct interest to the indigenous peoples themselves. They contacted the National Commission on Indigenous Peoples (NCIP) to determine its interest in the project, and the NCIP soon became an integral partner. Other partners were needed for the project to be successful. The NSCB had no mandate to undertake data collection, but they knew that the National Statistical Office of the Philippines did. They approached that office, which, although initially reluctant, embraced the project as they became more involved in its planning. They also brought in the National Statistical Coordination Board as a partner. Together, these four partners, representing the human rights community, the community of indigenous peoples, and the statistical community, determined the specific objectives of the research process. They decided on the following:
From a clear understanding of their objectives, the Philippines Metagora Pilot staff were able to determine that they would need both quantitative data in order to judge the awareness level among indigenous peoples, and also qualitative data in order to obtain detailed information on the problems they face. They therefore designed a project that began with a focus group discussion that would be used both to collect qualitative information and to gather information for use in designing a quantitative questionnaire form, and then progressed to a random sample survey.
The pilot project in the Philippines is an excellent example of the development of research goals. The staff were aware that it would only be ethical to collect data on the situation of indigenous peoples if that data were of interest to the indigenous peoples themselves and the indigenous peoples were involved in designing the project. The staff went through several iterations of developing research objectives before deciding on the specific method of data collection they would use. When they determined that they did not have the necessary expertise to implement the project themselves, they involved organisations that did. The final objectives were a joint product of all interested parties, and the data-collection methods employed were tailored towards meeting those objectives, not the other way around. More details on the Metagora pilot project in the Philippines can be found in the Case Study associated with this manual.
5.3 COLLECTION MODE
Data-collection mode is the method by which the questions of the survey will be delivered to the respondent pool, for example, via face-to-face interviews, or by telephone, or by another method. Each potential collection mode has both advantages and disadvantages, and the type of questions on the survey might indicate the use of one mode over another.
For example, for more open-ended questions designed to collect qualitative data, a mail-out/mail-back mode will not work well as respondents are unlikely to take the time to write down good qualitative answers.
5.3.1 Mail-Out/Mail-Back
In countries with high literacy rates, and especially in situations where the whole population is to be interviewed (i.e., censuses), the use of mail-out/mail-back survey forms is common. For this technique, the sample frame is a list of addresses for the population of interest, and the questionnaire is designed to be self-administered. The survey forms are mailed to the entire population, or a sample of that population, and an addressed, stamped envelope is provided for the respondents to use to return their questionnaires. The advantage of the mail-out/mail-back method is cost: it is a relatively inexpensive method for implementing a survey. The disadvantage, however, is that research has shown the response rate for mail-out/mail-back surveys is lower than for other modes of data collection, such as telephone interviews and face-to-face interviews.
5.3.2 Telephone Interviews
In countries with extensive telephone coverage, surveys are often conducted by telephone. Telephone interviews are significantly less expensive than face-to-face interviews. While telephone surveys are typically more expensive than mail-out/mail-back surveys, they have been shown to yield higher response rates. Telephone surveys may also provide higher quality responses than mail-out/mail-back surveys, as trained interviewers ask the questions. If the interviewers are all in the same building during telephone calls, then supervision is significantly easier than when interviewers are speaking with respondents in their homes. If the population of interest is a national or other large population, then telephone service must be mostly universal for a telephone survey to be feasible. One interesting aspect of telephone surveys is that a sample frame, or list of telephone numbers of the population, is not necessarily required. Due to a technique called random-digit dialing, a simple random sample of the population can be obtained relatively easily for a telephone survey.
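The idea behind random-digit dialing can be illustrated with a short sketch. It assumes that the researcher knows which area or exchange prefixes are in service and how many digits follow them; the function name, the prefixes, and the number format below are hypothetical and are not taken from the manual. The sketch draws distinct numbers by attaching randomly generated digits to a randomly chosen prefix, so no list of subscribers is needed.

```python
import random


def random_digit_numbers(prefixes, suffix_length, n, seed=None):
    """Draw n distinct telephone numbers by random-digit dialing.

    prefixes: area/exchange codes assumed to be in service (hypothetical).
    suffix_length: how many digits to generate at random after the prefix.
    """
    prefixes = sorted(set(prefixes))  # drop duplicates so prefixes are equally likely
    if n > len(prefixes) * 10 ** suffix_length:
        raise ValueError("more numbers requested than can exist")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    drawn = set()
    while len(drawn) < n:
        prefix = rng.choice(prefixes)
        suffix = "".join(rng.choice("0123456789") for _ in range(suffix_length))
        drawn.add(prefix + suffix)
    return sorted(drawn)


# Invented prefixes and a four-digit suffix, for illustration only.
numbers = random_digit_numbers(["02555", "02556"], suffix_length=4, n=50, seed=1)
print(len(numbers), numbers[:3])
```

In practice many generated numbers will be unassigned or will belong to businesses, so the numbers are usually screened before interviewing begins. The draw is also a random sample of telephone numbers rather than of persons: households with several lines, or with none, are not equally likely to be reached.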
5.3.2.1 Computer Assisted Telephone Interviewing (CATI)
When telephone surveys first began, the questionnaire itself was still a paper questionnaire that would be filled out by the interviewer during the telephone call. With the advent of affordable desktop computers, CATI has become a viable alternative. During CATI, the interviewer accesses the survey questionnaire in electronic format, on a desktop or laptop computer. As the respondent answers questions, the CATI interviewer enters those answers into the electronic survey questionnaire form. CATI has two distinct advantages over traditional telephone interviewing. First, because data are entered directly into a computer, the data-entry step required of a pencil-and-paper survey is skipped, saving on time and cost. Second, for a complicated questionnaire with multiple skip patterns (places where the answer to a question determines which question is asked next), the CATI instrument can be programmed so that skips happen automatically, depending on the answer entered, greatly reducing interviewer error in following skip patterns and the time needed to complete the questionnaire. A CATI instrument, however, might take more time to prepare than a paper-and-pencil questionnaire, given the need to programme those skip patterns.
5.3.3 E-mail Interviews
Interviews via e-mail are similar to mail-out/mail-back interviews, except that the questionnaire is in electronic format. An e-mail questionnaire is sent to potential respondents and that questionnaire is e-mailed back to the researcher. Although a very low-cost option, e-mail interviews are not viable in most countries, since the majority of the population does not have an electronic mail account, and there is no registry or list of those that do have an electronic mail account. This option is included here for purposes of completeness.
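The automatic skip patterns described for CATI in Section 5.3.2.1 can be sketched as a small table of questions, each carrying a rule that names the next question to ask. This is only an illustration under stated assumptions: the question identifiers, wording, and the `run_interview` function are invented here and do not describe any particular CATI package.

```python
# Hypothetical three-question instrument; ids and wording are invented.
# Each "next" rule maps the entered answer to the id of the next question,
# or to None when the interview is finished.
QUESTIONS = {
    "q1": {"text": "Do you own any land? (yes/no)",
           "next": lambda answer: "q2" if answer == "yes" else "q3"},
    "q2": {"text": "How many hectares do you own?",
           "next": lambda answer: "q3"},
    "q3": {"text": "How many people live in your household?",
           "next": lambda answer: None},
}


def run_interview(questions, first_id, get_answer):
    """Ask questions in order, following the skip rules automatically.

    get_answer: callable that receives the question text and returns the
    answer; in a live instrument this would be the interviewer's entry.
    """
    answers = {}
    current = first_id
    while current is not None:
        question = questions[current]
        answer = get_answer(question["text"])
        answers[current] = answer
        current = question["next"](answer)  # the skip happens here
    return answers


# Scripted answers stand in for keyboard entry: a "no" at q1 skips q2.
scripted = iter(["no", "5"])
result = run_interview(QUESTIONS, "q1", lambda text: next(scripted))
print(result)  # {'q1': 'no', 'q3': '5'}
```

Because the routing lives in the instrument rather than in the interviewer's head, a respondent who answers "no" to the first question is never shown the follow-up about hectares. Writing and testing such rules is the extra preparation time the text refers to.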
5.3.4 Web-based Interviews
Web-based interviews are growing in popularity and share some of the characteristics of CATI and e-mail interviewing. Like the survey forms for CATI, web-based survey forms can be programmed to automatically skip unnecessary questions, making completion of the survey easier for respondents than is the case for mail-out/mail-back forms. Like e-mail interviewing, web-based interviewing is low-cost. Again, like e-mail interviewing, this mode is not an option in most countries. In order for web-based interviews to work, the vast majority of the population of interest to the researcher must be literate and must have reasonably easy access to the Internet. However, one advantage of web-based interviews over e-mail-based interviews is that an e-mail address is not required for web-based interviews; potential respondents can be sent conventional mail or called to be informed about the survey.
5.3.5 Face-to-face Interviews
The standard mode for interviews in most countries is the face-to-face interview, where the interviewer goes to the house of the potential respondent. This mode has been shown to yield the highest response rates, but it is also the most costly. Several variations of the face-to-face interview exist, including the focus group discussion, where multiple respondents come to a joint location and answer the questions of the interviewer as a group. Some additional variations are discussed below.
5.3.5.1 Computer Assisted Personal Interviewing (CAPI)
In this case, instead of the interviewer carrying traditional pencil-and-paper questionnaire forms, he/she carries a laptop computer to the home of a potential respondent. Using the computer, the interviewer asks the survey questions, much like the interviewer in CATI. This technique has many of the same advantages and disadvantages of CATI: data entry is significantly reduced or eliminated, and skip patterns are significantly easier to negotiate for the interviewer.
One disadvantage of CAPI that does not exist for CATI is that interviewers carrying expensive computer equipment might be a target for theft and/or harassment in impoverished areas.
5.3.5.2 Computer Assisted Self Interviewing (CASI)
In some surveys, while an interviewer goes to the home of the potential respondent, the respondent enters their answers directly into the computer provided by the interviewer. This mode might be useful for sensitive questions, where respondents might be reluctant to give their answers directly to the interviewer due to embarrassment, but are willing to enter those answers into the computer if given the privacy to do so.
5.3.5.3 Audio Computer Assisted Self Interviewing (ACASI)
The difference between ACASI and CASI is that in the former the interviewer does not hear the questions or see the responses to the questions. In ACASI, the respondent listens to questions via headphones and enters answers into a laptop computer out of sight of the interviewer. ACASI provides even more privacy to the respondent than CASI, and there is, therefore, a greater likelihood that sensitive questions will be answered truthfully.
5.3.6 Summary
In most countries, surveys rely on face-to-face interviews for obtaining information. The other options presented here - especially CATI, CAPI, CASI, and ACASI - have been developed in the Global North as the technology to implement them has developed. While those methods are useful for lowering cost, reducing interviewer effects on survey data, and increasing the reporting of sensitive information, there are techniques that can be used during face-to-face interviews to help overcome some of those problems.
Chapters 6 and 7 of this manual will focus on good techniques for selecting respondents and developing questionnaires in the context of face-to-face interviews, and will address some of the same problems that computer-assisted methods were designed to overcome.
5.4 A FINAL NOTE
There is one option for the collection of data that has not yet been discussed in this chapter. If the researcher has a good relationship with an organisation that is already planning or conducting a survey, the researcher could add his/her survey questions to the existing instrument in use by that organisation. For example, the Metagora pilot project in the geographic area of the Andean Community involved adding a module on governance, democracy, and subjective poverty to existing household surveys. The distinct advantage of "piggy-backing" onto an existing survey is the reduction in cost and labour. The disadvantages are that the survey mode and basic survey format are predetermined, so the researcher has little control over those aspects, and the addition of questions to an existing survey increases the response burden of the survey overall, which may decrease the response rate in comparison to a stand-alone survey. Even so, if the decision to add a module to a pre-existing survey is made carefully and intelligently, a high-quality data product can be produced at a fraction of the stand-alone cost.
5.5 RECOMMENDED READING
Council of American Survey Research Organisations, Code of Standards and Ethics for Survey Research (accessed 31 March 2007).
Dillman, D.A., Mail and Internet Surveys: The Tailored Design Method 2007 Update with New Internet, Visual, and Mixed-Mode Guide, Wiley, Hoboken, NJ, 2006.
Ellsberg, M.C., and Heise, L., Researching Violence against Women: A Practical Guide for Researchers and Activists, World Health Organisation, Geneva, Switzerland, 2005.
Hewett, P.C., Erulkar, A.S., and Mensch, B.S., "The Feasibility of Computer-Assisted Survey Interviewing in Africa: Experience from Two Rural Districts in Kenya," Social Science Computer Review, 22, 2004, p 319.
Human Sciences Research Council, Consent Form for Land Reform Survey, Human Sciences Research Council, Johannesburg, South Africa, 2005.
Lavrakas, P.J., Telephone Survey Methods: Sampling, Selection, and Supervision, Sage Publications, Inc., Thousand Oaks, CA, 1993.
Metagora, Case Study, 2007.