Fisher's exact test

Fisher's exact test is a statistical significance test used in the analysis of categorical data where sample sizes are small. It is named after its inventor, R. A. Fisher, and is one of a class of exact tests. Fisher devised the test following a comment from Muriel Bristol, who claimed to be able to detect whether the tea or the milk was added first to her cup.

The test is used to examine the significance of the association between two variables in a 2 x 2 contingency table. The p-value from the test is computed as if the margins of a 2 by 2 table are fixed, e.g. if, in the tea-tasting example, Ms. Bristol knows the number of cups with each treatment (milk or tea first) and will therefore provide guesses with the correct number in each category. As pointed out by Fisher, this leads under a null hypothesis of independence to use of the hypergeometric distribution for a given count in the table.

With large samples, a chi-squared test can be used in this situation. However, that test is not suitable when the expected value in any of the cells of the table, given the margins, is below 10: the sampling distribution of the test statistic is only approximately equal to the theoretical chi-squared distribution, and the approximation is inadequate in these conditions (which arise when sample sizes are small, or the data are very unequally distributed among the cells of the table). The Fisher test is, as its name states, exact, and it can therefore be used regardless of the sample characteristics. It becomes difficult to calculate with large samples or well-balanced tables, but fortunately these are exactly the conditions where the chi-squared test is appropriate.

Example

Exact tests allow one to obtain a more accurate analysis of small samples or sparse data. For non-parametric analyses of unbalanced data, an exact test is the appropriate choice: unbalanced data analyzed with asymptotic methods tend to produce unreliable results. For large and well-balanced data sets, the exact and asymptotic p-values are very similar, but for small, sparse, or unbalanced data the exact and asymptotic p-values can be quite different and may lead to opposite conclusions concerning the hypothesis of interest (Mehta, Patel, & Tsiatis, 1984; Mehta, 1995; Mehta & Patel, 1997).

The need for the Fisher test arises when we have data that are divided into two categories in two separate ways. For example, a sample of teenagers might be divided into male and female on the one hand, and those that are and are not currently dieting on the other. We hypothesize, perhaps, that the proportion of dieting individuals is higher among the women than among the men, and we want to test whether any difference of proportions that we observe is significant. The data might look like this:


              men   women   total
dieting         1       9      10
not dieting    11       3      14
totals         12      12      24

These data would not be suitable for analysis by a chi-squared test, because the expected values in the table are all below 10; moreover, a 2 × 2 contingency table has only 1 degree of freedom, which is the situation in which the chi-squared approximation is least reliable.
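The expected count in each cell, under the hypothesis of independence, is the product of its row total and column total divided by the grand total. A minimal sketch in Python, using only the standard library and the numbers from the table above:

```python
# Expected cell counts under independence for the dieting example:
# E[i, j] = (row i total) * (column j total) / n
row_totals = {"dieting": 10, "not dieting": 14}
col_totals = {"men": 12, "women": 12}
n = 24

for row, r in row_totals.items():
    for col, c in col_totals.items():
        print(f"{row:12s} {col:6s} expected = {r * c / n}")
```

Every expected count here is 5 or 7, all below 10, which is why the chi-squared approximation is considered unsafe for this table.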

The question we ask about these data is: knowing that 10 of these 24 teenagers are dieters, and that 12 of the 24 are female, what is the probability that these 10 dieters would be so unevenly distributed between the girls and the boys? If we were to choose 10 of the teenagers at random, what is the probability that 9 of them would be among the 12 girls, and only 1 from among the 12 boys?
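This probability can be evaluated directly as a hypergeometric probability. A short sketch in Python, using only the standard library:

```python
from math import comb

# P(exactly 9 of the 10 dieters fall among the 12 girls) when the 10
# dieters are regarded as a random draw from the 24 teenagers:
# a hypergeometric probability.
p = comb(12, 9) * comb(12, 1) / comb(24, 10)
print(p)  # ≈ 0.001346
```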

Before we proceed with the Fisher test, we first introduce some notation. We represent the cells by the letters a, b, c and d, call the totals across rows and columns marginal totals, and represent the grand total by n. So the table now looks like this:


              men       women     total
dieting       a         b         a + b
not dieting   c         d         c + d
totals        a + c     b + d     n

Fisher showed that the probability of obtaining any such set of values was given by the hypergeometric distribution:


p = \binom{a+b}{a} \binom{c+d}{c} \bigg/ \binom{n}{a+c} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}

where the symbol ! indicates the factorial operator.
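The two forms of the formula can be checked against each other numerically. A small Python sketch (standard library only), using the observed table from the example, a = 1, b = 9, c = 11, d = 3:

```python
from math import comb, factorial

# Observed table from the dieting example.
a, b, c, d = 1, 9, 11, 3
n = a + b + c + d

# Hypergeometric probability via binomial coefficients ...
p_choose = comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

# ... and via the equivalent factorial form.
p_fact = (factorial(a + b) * factorial(c + d)
          * factorial(a + c) * factorial(b + d)) / (
    factorial(n) * factorial(a) * factorial(b)
    * factorial(c) * factorial(d))

assert abs(p_choose - p_fact) < 1e-15
print(p_choose)  # ≈ 0.001346
```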

This formula gives the exact probability of observing this particular arrangement of the data, assuming the given marginal totals, on the null hypothesis that the odds ratio between dieting and not dieting among men and women is 1 in the population from which our sample was drawn. Fisher showed that we need consider only cases where the marginal totals are the same as in the observed table. In the example there are 11 such cases. Of these, only one is more extreme in the same direction as our data; it looks like this:


              men   women   total
dieting         0      10      10
not dieting    12       2      14
totals         12      12      24

In order to calculate the significance of the observed data, i.e. the total probability of observing data as extreme or more extreme if the null hypothesis is true, we have to calculate the values of "p" for both these tables, and add them together. This gives a one-tailed test; for a two-tailed test we must also consider tables that are equally extreme but in the opposite direction. Unfortunately, classification of the tables according to whether or not they are 'as extreme' is problematic. An approach used by the R programming language is to compute the p-value by summing the probabilities for all tables with probabilities less than or equal to that of the observed table. For tables with small counts, the 2-sided p-value can differ substantially from twice the 1-sided value, unlike the case with test statistics that have a symmetric sampling distribution.
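The summation described above can be sketched in Python with the standard library alone. This is an illustrative re-implementation of the rule the text attributes to R, not the code any particular package uses:

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """One- and two-sided Fisher exact p-values for a 2x2 table with
    fixed margins.  An illustrative sketch, not any package's internals."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def pmf(x):
        # Probability of the table whose top-left cell is x.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible top-left values
    p_obs = pmf(a)
    # One-sided: the observed table plus those more extreme in the same
    # direction (here assumed to be smaller values of the top-left cell).
    one_sided = sum(pmf(x) for x in range(lo, a + 1))
    # Two-sided: all tables whose probability does not exceed that of the
    # observed table (small tolerance guards against float ties).
    two_sided = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs + 1e-12)
    return one_sided, two_sided

print(fisher_exact_p(1, 9, 11, 3))  # roughly (0.00138, 0.00276)
```

Because the margins of the example table are symmetric, the two-sided value here happens to be exactly twice the one-sided value; for less balanced tables it generally is not.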

Most modern statistical packages will calculate the significance of Fisher tests, in some cases even where the chi-squared approximation would also be acceptable. The actual computations as performed by statistical software packages will as a rule differ from those described. In particular, numerical difficulties may result from large values of the factorials. A simple, somewhat better computational approach relies on a gamma function or log-gamma function, but in fact accurate computation of hypergeometric and binomial probabilities is an area of recent research.
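One way to sidestep the large factorials is to work with log-factorials via the log-gamma function and exponentiate only at the end. A hypothetical sketch in Python:

```python
from math import lgamma, exp

def log_fact(k):
    # log(k!) computed as log Gamma(k + 1)
    return lgamma(k + 1)

def hypergeom_p(a, b, c, d):
    """Probability of a 2x2 table, computed in log space so that the
    large factorials never appear explicitly (an illustrative sketch)."""
    n = a + b + c + d
    log_p = (log_fact(a + b) + log_fact(c + d)
             + log_fact(a + c) + log_fact(b + d)
             - log_fact(n) - log_fact(a) - log_fact(b)
             - log_fact(c) - log_fact(d))
    return exp(log_p)

print(hypergeom_p(1, 9, 11, 3))  # ≈ 0.001346
```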

Extension to "m × n" tables

Fisher's exact test can be applied to tables of any size. A detailed discussion of the m × n version of the test can be found at MathWorld [http://mathworld.wolfram.com/FishersExactTest.html].

References

* Fisher, R. A. 1922. "On the interpretation of χ² from contingency tables, and the calculation of P". Journal of the Royal Statistical Society 85(1): 87–94.
* Fisher, R. A. 1954. Statistical Methods for Research Workers. Oliver and Boyd.
* Mehta, C. R. 1995. SPSS 6.1 Exact Test for Windows. Englewood Cliffs, NJ: Prentice Hall.
* Mehta, C. R., Patel, N. R., & Tsiatis, A. A. 1984. "Exact significance testing to establish treatment equivalence with ordered categorical data". Biometrics 40(3): 819–825.
* Mehta, C. R., & Patel, N. R. 1997. "Exact inference in categorical data". Biometrics 53(1): 112–117.

External links

* [http://www.socr.ucla.edu/htmls/ana/FishersExactTest_Analysis.html Fisher's Exact Test Applet Calculator]
* [http://www.physics.csbsju.edu/stats/exact2.html On-line exact test calculator with examples]
* [http://www.matforsk.no/ola/fisher.htm On-line exact test calculator that accepts larger cell counts]
* [http://mathworld.wolfram.com/FishersExactTest.html MathWorld page detailing the m × n extension of Fisher's exact test]


Wikimedia Foundation. 2010.
