Berkson's paradox

Berkson's paradox, or Berkson's fallacy, is a result in conditional probability and statistics that is often found counter-intuitive, and is hence a veridical paradox. It is a complicating factor arising in statistical tests of proportions. Specifically, it arises when there is an ascertainment bias inherent in a study design.

It is often described in the fields of medical statistics or biostatistics, as in the original description of the problem by J. Berkson.

Statement

The result is that two independent events become conditionally dependent (negatively dependent) given that at least one of them occurs. Symbolically:

    if 0 < P(A) < 1 and 0 < P(B) < 1,
    and P(A | B) = P(A), i.e. they are independent,
    then P(A | B, C) < P(A | C), where C = A ∪ B (i.e. A or B).

In words, given two independent events, if you only consider outcomes where at least one occurs, then they become negatively dependent.

Explanation

The cause is that the conditional probability of event A occurring, given that it or B occurs, is inflated: it is higher than the unconditional probability, because we have excluded cases where neither occurs.

    P(A | A ∪ B) > P(A)    (conditional probability inflated relative to unconditional)
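The inflation can be checked directly from the definition of conditional probability. The short derivation below is a sketch that assumes only the conditions given in the statement (A and B independent, with both probabilities strictly between 0 and 1):

```latex
\begin{align*}
P(A \mid A \cup B) &= \frac{P\bigl(A \cap (A \cup B)\bigr)}{P(A \cup B)} = \frac{P(A)}{P(A \cup B)}, \\
P(A \cup B) &= 1 - P(\lnot A \cap \lnot B) = 1 - \bigl(1 - P(A)\bigr)\bigl(1 - P(B)\bigr) < 1, \\
\text{hence}\quad P(A \mid A \cup B) &> P(A).
\end{align*}
```

The strict inequality relies on P(A) > 0 and on P(A ∪ B) < 1, which holds because neither event is certain.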

One can see this in tabular form as follows, where ~A means "not A". Every cell except "~A & ~B" is an outcome where at least one event occurs.

              A           ~A
    B       A & B       ~A & B
    ~B      A & ~B      ~A & ~B

For instance, if one has a sample of 100, and both A and B occur independently half the time (so P(A) = P(B) = 1/2), one obtains:

              A     ~A
    B        25     25
    ~B       25     25

So in 75 outcomes, either A or B occurs, of which 50 have A occurring, so

    P(A | A ∪ B) = 50/75 = 2/3 > 1/2 = 50/100 = P(A)

Thus the probability of A is higher in the subset (of outcomes where it or B occurs), 2/3, than in the overall population, 1/2.
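The arithmetic for the sample of 100 can be reproduced with exact fractions. This is a minimal sketch: the `counts` table and the helper `prob` are illustrative names chosen here, and the four counts of 25 follow from the assumption that A and B each occur independently half the time.

```python
from fractions import Fraction

# Counts for a sample of 100 in which A and B each occur half the time,
# independently. Keys are (A occurs, B occurs).
counts = {
    (True, True): 25,
    (True, False): 25,
    (False, True): 25,
    (False, False): 25,
}

def prob(event, given=lambda a, b: True):
    """Exact P(event | given) computed from the counts table."""
    kept = [(a, b, n) for (a, b), n in counts.items() if given(a, b)]
    numerator = sum(n for a, b, n in kept if event(a, b))
    denominator = sum(n for a, b, n in kept)
    return Fraction(numerator, denominator)

p_a = prob(lambda a, b: a)                                    # 50/100
p_a_given_union = prob(lambda a, b: a, lambda a, b: a or b)   # 50/75

print(p_a, p_a_given_union)  # 1/2 2/3
```

Using `Fraction` rather than floats keeps the comparison 2/3 > 1/2 exact.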

Berkson's paradox arises because the conditional probability of A given B within this subset equals the conditional probability in the overall population, whereas the unconditional probability of A within the subset is inflated relative to the unconditional probability in the overall population. Hence, within the subset, the presence of B decreases the conditional probability of A, bringing it back down to its overall unconditional probability:

    P(A | B, A ∪ B) = P(A | B) = P(A)
    P(A | A ∪ B) > P(A)
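Both relations can also be observed empirically. The seeded simulation below is an illustrative sketch only: the function name `simulate_selection`, the sample size, and the seed are assumptions made for this example, not part of Berkson's description.

```python
import random

def simulate_selection(p_a=0.5, p_b=0.5, n=200_000, seed=0):
    """Draw independent events A and B, then condition on 'A or B'.

    Returns estimates of P(A), P(A | A or B) and P(A | B, A or B).
    """
    rng = random.Random(seed)
    draws = [(rng.random() < p_a, rng.random() < p_b) for _ in range(n)]

    # Keep only outcomes where at least one event occurred (the selected subset).
    selected = [(a, b) for a, b in draws if a or b]
    # Within the subset, look at the outcomes where B occurred.
    selected_with_b = [a for a, b in selected if b]

    p_a_overall = sum(a for a, _ in draws) / n
    p_a_selected = sum(a for a, _ in selected) / len(selected)
    p_a_selected_given_b = sum(selected_with_b) / len(selected_with_b)
    return p_a_overall, p_a_selected, p_a_selected_given_b

overall, selected, selected_given_b = simulate_selection()
print(f"P(A)             ~ {overall:.3f}")
print(f"P(A | A or B)    ~ {selected:.3f}")
print(f"P(A | B, A or B) ~ {selected_given_b:.3f}")
```

With the default parameters the first and third estimates should land near 1/2 and the second near 2/3, so within the selected subset, learning that B occurred lowers the estimated probability of A.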

Examples

A classic illustration involves a retrospective study examining a risk factor for a disease in a statistical sample from a hospital in-patient population. If a control group is also ascertained from the in-patient population, a difference in hospital admission rates for the case sample and control sample can result in a spurious association between the disease and the risk factor.

As another example, suppose one has 1000 postage stamps, of which 300 are pretty and 100 are rare, with 30 being both pretty and rare. 10% of all the stamps are rare and 10% of the pretty stamps are rare, so prettiness tells one nothing about rarity. One puts the 370 stamps which are pretty or rare on display. Just over 27% of the stamps on display are rare (100 of 370), but still only 10% of the pretty stamps on display are rare, whereas every one of the 70 displayed stamps that is not pretty is rare. If one only considers stamps on display, one will observe a spurious negative relationship between prettiness and rarity as a result of one's selection bias.
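The stamp figures can be verified with a few lines of arithmetic. This is a small sketch using only the numbers given in the example; the variable names are illustrative.

```python
from fractions import Fraction

# Figures from the stamp example.
total, pretty, rare, both = 1000, 300, 100, 30

# Stamps on display are those that are pretty or rare (inclusion-exclusion).
displayed = pretty + rare - both

rare_rate_overall = Fraction(rare, total)        # rare among all stamps
rare_rate_pretty = Fraction(both, pretty)        # rare among pretty stamps
rare_rate_displayed = Fraction(rare, displayed)  # rare among displayed stamps

# Displayed stamps that are not pretty must all be rare.
plain_displayed = displayed - pretty
rare_rate_plain_displayed = Fraction(rare - both, plain_displayed)

print(displayed)                                  # 370
print(rare_rate_overall, rare_rate_pretty)        # 1/10 1/10
print(f"{float(rare_rate_displayed):.1%}")        # 27.0%
print(rare_rate_plain_displayed)                  # 1
```

On display, the rarity rate is 10% among pretty stamps and 100% among the rest, which is the spurious negative relationship created by the selection.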

References

Berkson, J. (1946). "Limitations of the application of fourfold tables to hospital data". Biometrics Bulletin, 2(3), 47-53.

Note on References

The reference Berkson (1946) cited above is frequently cited incorrectly in the literature as Berkson, J. (1949) Biological Bulletin 2, 47-53.

Biological Bulletin, established in the 19th century, does not publish statistical papers. The correct reference is to the biostatistical journal Biometrics Bulletin, established in 1945, which became Biometrics in 1947.


Wikimedia Foundation. 2010.
