Biased sample

A biased sample is a statistical sample of a population in which some members of the population are less likely to be included than others. If the bias makes estimation of population parameters impossible, the sample is a non-probability sample.

An extreme form of biased sampling occurs when certain members of the population are totally excluded from the sample (that is, they have zero probability of being selected). For example, a survey of high school students to measure teenage use of illegal drugs will be a biased sample because it does not include home-schooled students or dropouts. A sample is also biased if certain members are underrepresented or overrepresented relative to others in the population. For example, a "man on the street" interview that selects people who walk by a certain location will over-represent healthy individuals, who are more likely to be out of the home than individuals with a chronic illness.

Problems caused by a biased sample

A biased sample causes problems because any statistic computed from that sample has the potential to be consistently erroneous. The bias can lead to an over- or underestimation of the corresponding parameter in the population. Almost every sample in practice is biased because it is practically impossible to ensure a perfectly random sample. If the degree of under-representation is small, the sample can be treated as a reasonable approximation to a random sample. Also, if the group that is under-represented does not differ markedly from the other groups in the quantity being measured, then the sample can still be treated as a reasonable approximation to a random sample.
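Both conditions can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a population split into two groups, and the population shares, group means and relative inclusion rates are hypothetical numbers, not taken from any survey. It computes the value that the sample mean approaches in large samples when one group is under-represented, and compares it with the true population mean.

```python
def expected_sample_mean(shares, means, inclusion_rates):
    """Large-sample value of the sample mean when each group has the given
    population share, group mean and relative chance of being included."""
    weights = [s * r for s, r in zip(shares, inclusion_rates)]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


def population_mean(shares, means):
    """True mean of the whole population."""
    return sum(s * m for s, m in zip(shares, means))


def bias(shares, means, inclusion_rates):
    """Difference between what the sample tends to show and the truth."""
    return expected_sample_mean(shares, means, inclusion_rates) - population_mean(shares, means)


# Hypothetical population: 80% in group A, 20% in group B.
shares = [0.8, 0.2]

# Group B differs markedly and is strongly under-represented.
severe = bias(shares, means=[10.0, 20.0], inclusion_rates=[1.0, 0.25])
# Group B differs markedly but is only slightly under-represented.
slight = bias(shares, means=[10.0, 20.0], inclusion_rates=[1.0, 0.95])
# Group B is strongly under-represented but hardly differs from group A.
similar = bias(shares, means=[10.0, 10.5], inclusion_rates=[1.0, 0.25])

for name, value in [("severe", severe), ("slight", slight), ("similar", similar)]:
    print(f"{name}: bias = {value:+.3f}")
```

Under these assumed numbers the bias is large only in the first case, where the under-represented group is both much less likely to be sampled and markedly different in the quantity being measured.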

The word bias in common usage has a strong negative connotation, and implies a deliberate intent to mislead. In statistical usage, bias represents a mathematical property. While some individuals might deliberately use a biased sample to produce misleading results, more often, a biased sample is just a reflection of the difficulty in obtaining a truly representative sample.

Some samples use a biased statistical design which nevertheless allows the estimation of parameters. The U.S. National Center for Health Statistics, for example, deliberately oversamples from minority populations in many of its nationwide surveys in order to gain sufficient precision for estimates within these groups (NCHS 2007). These surveys require the use of sample weights (see below) to produce proper estimates across all racial and ethnic groups. Provided that certain conditions are met (chiefly that the sample is drawn randomly from the entire population), these samples permit accurate estimation of population parameters.

Examples of biased samples

Figure: browser statistics. The statistics are from visitors to one website comprising mostly web developers. [Browser Statistics, Refsnes Data, June 2008, http://www.w3schools.com/browsers/browsers_stats.asp retrieved on 2008-07-05]

Online and phone-in polls are biased samples because the respondents are self-selected. Those individuals who are highly motivated to respond, typically individuals who have strong opinions, are overrepresented, and individuals that are indifferent or apathetic are less likely to respond. This often leads to a polarization of responses with extreme perspectives being given a disproportionate weight in the summary. As a result, these types of polls are regarded as unscientific.
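The polarizing effect of self-selection can be illustrated with a small calculation. The following sketch is not based on any real poll: the five-point opinion scale, the population shares and the response rates are all hypothetical, chosen only so that people with stronger opinions are more likely to respond.

```python
# Hypothetical opinion scale: -2 (strongly against) to +2 (strongly in favour).
population_share = {-2: 0.10, -1: 0.20, 0: 0.40, 1: 0.20, 2: 0.10}

# Assumed probability that a person holding each opinion chooses to respond.
response_rate = {-2: 0.50, -1: 0.10, 0: 0.02, 1: 0.10, 2: 0.50}


def respondent_share(population_share, response_rate):
    """Distribution of opinions among those who actually respond."""
    responding = {k: population_share[k] * response_rate[k] for k in population_share}
    total = sum(responding.values())
    return {k: v / total for k, v in responding.items()}


def extreme_share(share):
    """Fraction holding one of the two strongest opinions."""
    return share[-2] + share[2]


poll = respondent_share(population_share, response_rate)

print(f"extreme opinions in the population: {extreme_share(population_share):.0%}")
print(f"extreme opinions among respondents: {extreme_share(poll):.0%}")
```

With these assumed rates, extreme opinions make up one fifth of the population but roughly two thirds of the responses, which is the kind of distortion described above.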

A classic example of a biased sample and the misleading results it produced occurred in 1936. In the early days of opinion polling, the American "Literary Digest" magazine collected over two million postal surveys and predicted that the Republican candidate in the U.S. presidential election, Alf Landon, would beat the incumbent president, Franklin Roosevelt, by a large margin. The result was the exact opposite. The Literary Digest survey represented a sample collected from readers of the magazine, supplemented by records of registered automobile owners and telephone users. This sample over-represented wealthy individuals, who, as a group, were more likely to vote for the Republican candidate. In contrast, a poll of only 50,000 citizens selected by George Gallup's organization successfully predicted the result, leading to the popularity of the Gallup poll.

Another classic example occurred in the 1948 presidential election. On election night, the Chicago Tribune printed the headline "DEWEY DEFEATS TRUMAN", which turned out to be mistaken. In the morning the grinning President-Elect, Harry S. Truman, was photographed holding a newspaper bearing this headline. The Tribune was mistaken because its editor trusted the results of a phone survey. Survey research was then in its infancy, and few academics realized that a sample of telephone users was not representative of the general population. Telephones were not yet widespread, and those who had them tended to be prosperous and have stable addresses. (In many cities, the Bell System telephone directory contained the same names as the Social Register.) In addition, the Gallup poll that the Tribune based its headline on was over two weeks old at the time of the printing. [based on http://www.uh.edu/engines/epi1199.htm retrieved on September 29, 2007]

Statistical corrections for a biased sample

If entire segments of the population are excluded from a sample, then there are no adjustments that can produce estimates that are representative of the entire population. But if some groups are underrepresented and you can quantify the degree of underrepresentation, then sample weights can correct the bias.

For example, a hypothetical population might include 10 million men and 10 million women. Suppose that a biased sample of 100 patients included 20 men and 80 women. A researcher could correct for this imbalance by attaching a weight of 2.5 for each male and 0.625 for each female. This would adjust any estimates to achieve the same expected value as a sample that included exactly 50 men and 50 women, unless men and women differed in their likelihood of taking part in the survey.
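The weights in this example follow from dividing each group's share of the population by its share of the sample. A minimal sketch of that calculation is given below. The function names are hypothetical, and so are the outcome values (every man scoring 4.0 and every woman 6.0), which are a deliberate simplification used only to show the effect of weighting.

```python
def sample_weights(population_counts, sample_counts):
    """Weight per group = population share / sample share."""
    pop_total = sum(population_counts.values())
    sample_total = sum(sample_counts.values())
    weights = {}
    for group, pop_count in population_counts.items():
        in_sample = sample_counts.get(group, 0)
        if in_sample == 0:
            # A group that was excluded entirely cannot be weighted back in.
            raise ValueError(f"group {group!r} is absent from the sample")
        weights[group] = (pop_count / pop_total) / (in_sample / sample_total)
    return weights


def weighted_mean(values, weights):
    """Mean of values where each observation carries its own weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


population = {"men": 10_000_000, "women": 10_000_000}
sample = {"men": 20, "women": 80}
weights = sample_weights(population, sample)  # men: 2.5, women: 0.625

# Hypothetical outcome: every man scores 4.0 and every woman 6.0,
# so the true population mean is 5.0.
groups = ["men"] * sample["men"] + ["women"] * sample["women"]
scores = [4.0 if g == "men" else 6.0 for g in groups]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, [weights[g] for g in groups])

print(f"weights: {weights}")
print(f"unweighted mean: {unweighted:.2f}, weighted mean: {weighted:.2f}")
```

Under these assumptions the unweighted mean is pulled toward the women's value (5.6), while the weighted mean recovers the population value of 5.0. The check for an absent group reflects the point made earlier: no weight can compensate for a segment that was excluded entirely.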

Spotlight fallacy

The Spotlight fallacy is committed when a person uncritically assumes that all members or cases of a certain class or type are like those that receive the most attention or coverage in the media. This line of “reasoning” has the following form:

1. Xs with quality Q receive a great deal of attention or coverage in the media.
2. Therefore all Xs have quality Q.

This line of reasoning is fallacious since the mere fact that someone or something attracts the most attention or coverage in the media does not mean that it automatically represents the whole population. For example, suppose a mass murderer from Old Town, Maine, received a great deal of attention in the media. It would hardly follow that everyone from the town is a mass murderer.

The Spotlight fallacy derives its name from the fact that receiving a great deal of attention or coverage is often referred to as being in the spotlight. It is similar to Hasty Generalization, Biased Sample and Misleading Vividness because the error being made involves generalizing about a population based on an inadequate or flawed sample.

The Spotlight fallacy is very common. It most often occurs when people assume that those who receive the most media attention actually represent the groups they belong to. For example, some people began to believe that all those who oppose abortion are willing to gun down doctors in cold blood simply because those incidents received a great deal of media attention. Since the news media typically cover people or events that are unusual or exceptional, it is somewhat odd for people to believe that such people or events are representative.

Examples

1. I wouldn't like to go to America because of all the gun crime; we see it on the news all the time. People are always in the news blowing other people up; so all (or most) people are criminals.
2. Doctor: Why don't patients make some effort to look after themselves? My surgery is full of people who eat, drink, smoke and don't get any exercise. (Of course, the doctor may have many more patients who do look after themselves; being healthier than those who don't, they are less likely to turn up in his surgery.)
3. Why do young people all take drugs and go around mugging old ladies? You read about it in the paper all the time!
4. Child: When I grow up I want to be a singer. Have you seen how much money those pop-stars make?!

References

* National Center for Health Statistics (2007). Minority Health. http://www.cdc.gov/nchs/about/otheract/minority/minority.htm

See also

*Cherry picking
*File drawer problem


Wikimedia Foundation. 2010.
