Bias
When we define "bias" in general terms, we usually think of it as a lack of objectivity. In data collection and analysis, bias takes several forms and has several definitions, but in each case it represents some kind of systematic deviation from the truth.

The basic definition of statistical bias is as follows: the bias of an estimator (statistic) is the difference between the average value of the statistic and the parameter it is estimating. In other words, imagine we could repeat a survey over and over again, using the same method on each sample to compute the same statistic. With an unbiased method, we expect the resulting values to be scattered randomly around the parameter we are attempting to estimate. Bias occurs if those values are systematically lower or systematically higher than the parameter value.

As an example, let's say we wish to estimate the average number of times a police officer requests a bribe in the course of a working day in a particular city. If we take a random sample of police officers from that city and ask them directly how many bribes they request per working day, the officers are unlikely to tell us the true number. The estimate created from the data we collect will therefore be an underestimate; in other words, it will be biased downward, or lower than the true average number of bribes requested by police officers in the course of a workday.

Let's say, instead, we pick a random sample of police officers and follow them around for a day, keeping track of the number of bribes we observe. It is likely that the officers' behaviour will change because they are being observed; perhaps they ask for fewer bribes when watched. Again, the estimate created from the data we collect will be an underestimate.

Another way in which an estimate can end up biased is via data collected from a non-random subset of the population of interest.
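Before turning to non-random samples, the definition above can be checked numerically. The following is a minimal sketch, not a model of any real police force: the population of 1,000 officers, their daily bribe counts (0 to 8), and the assumption that officers admit to exactly half of their bribes are all hypothetical, as are the function names `estimate_bias`, `honest`, and `underreport`. It repeats the same survey many times and compares the average estimate with the true population mean.

```python
import random
import statistics

def estimate_bias(population, report, sample_size=50, n_surveys=2000, seed=0):
    """Approximate bias = (average of the survey estimates) - (true population mean)."""
    rng = random.Random(seed)
    true_mean = statistics.mean(population)
    estimates = []
    for _ in range(n_surveys):
        # Simple random sample of officers, drawn without replacement
        sample = rng.sample(population, sample_size)
        # What each sampled officer tells the interviewer
        reported = [report(count) for count in sample]
        estimates.append(statistics.mean(reported))
    return statistics.mean(estimates) - true_mean

# Hypothetical population: 1,000 officers, each with a true daily bribe count of 0 to 8
pop_rng = random.Random(42)
population = [pop_rng.randint(0, 8) for _ in range(1000)]

def honest(count):
    # Officers report their true count
    return count

def underreport(count):
    # Assumed behaviour: officers admit to only half of their bribes
    return 0.5 * count

honest_bias = estimate_bias(population, honest)
underreport_bias = estimate_bias(population, underreport)
print(f"bias with honest answers:  {honest_bias:+.3f}")
print(f"bias with under-reporting: {underreport_bias:+.3f}")
```

Under these assumptions the honest survey should show a bias close to zero, because its estimates scatter around the true mean, while the under-reporting survey should show a clearly negative bias, which is the downward bias described above.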
Returning to the example of the police officers, let's say we decide to follow a sample of police officers for a day, but pick those officers by asking for volunteers. It is likely that the volunteer officers are fundamentally different from the non-volunteer officers; the volunteers may be less likely to request bribes or behave inappropriately during the course of their duties. For that reason, the estimates we create from the data collected via the volunteer officers will most likely be biased downward, as the officers who request the most bribes have little probability of entering our sample.

Finally, bias can occur because of bad questionnaire design, meaning bad question wording or some other systemic issue with a data-collection effort. For example, if we interview the police officers of our city to determine their bribing propensity but word our question in a negative, condemning tone, they are less likely to be honest about their bribing than if we word the question in a neutral or supportive tone.

In summary, bias can be introduced into an estimate at any stage: developing a research project, picking a sample to survey, developing a questionnaire, or later in the process. Researchers must therefore be careful to consider sources of bias and do their best to mitigate them.

For more information about statistical bias, please see the following resources:
1. Wikipedia
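The volunteer-sampling scenario described above can also be sketched as a simulation. Everything here is an illustrative assumption rather than real data: the population of 1,000 officers, the bribe counts (0 to 8), and the rule in the hypothetical `volunteers` function that the probability of volunteering falls from 0.9 for an officer with no bribes to 0.1 for an officer with eight. Every officer reports honestly in this sketch, so any bias comes only from who ends up in the sample.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 1,000 officers with true daily bribe counts of 0 to 8
population = [rng.randint(0, 8) for _ in range(1000)]
true_mean = statistics.mean(population)

def volunteers(count):
    # Assumed behaviour: the more bribes an officer requests,
    # the less likely that officer is to volunteer
    return rng.random() < 0.9 - 0.1 * count

def random_sample(size=50):
    # Every officer has the same chance of selection
    return rng.sample(population, size)

def volunteer_sample(size=50):
    # Only officers who volunteer can be selected
    pool = [count for count in population if volunteers(count)]
    return rng.sample(pool, size)

def average_estimate(draw_sample, n_surveys=1000):
    # Repeat the survey many times and average the sample means
    return statistics.mean(statistics.mean(draw_sample()) for _ in range(n_surveys))

random_bias = average_estimate(random_sample) - true_mean
volunteer_bias = average_estimate(volunteer_sample) - true_mean
print(f"bias with random sampling:    {random_bias:+.3f}")
print(f"bias with volunteer sampling: {volunteer_bias:+.3f}")
```

Under these assumptions the random sample should show almost no bias, while the volunteer sample should be biased downward, because officers with high bribe counts rarely enter the volunteer pool.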