Point-biserial correlation coefficient

The point-biserial correlation coefficient ("rpb") is a correlation coefficient used when one variable (e.g. "Y") is dichotomous. "Y" can be either 'naturally' dichotomous, like gender, or an artificially dichotomized variable. In most situations it is not advisable to dichotomize variables artificially. When a variable is artificially dichotomized, the new dichotomous variable may be conceptualized as having an underlying continuity. If this is the case, a biserial correlation is the more appropriate calculation.

The point-biserial correlation is mathematically equivalent to the Pearson (product-moment) correlation: if we have one continuously measured variable "X" and a dichotomous variable "Y", then "rXY" = "rpb". This can be shown by coding the dichotomous variable with two distinct numerical values (for example 0 and 1) and applying the usual Pearson formula.
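The equivalence can be checked numerically. The following Python sketch uses a hand-written Pearson function (pearson_r, a hypothetical helper, not a library call) on made-up data, coding "Y" first as 0/1 and then as 3/7. Any coding that gives group 1 the larger value is an increasing linear transformation of the 0/1 coding, so Pearson's r is unchanged; reversing the coding only flips the sign.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up example data: X is continuous, Y is dichotomous.
x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9]
y01 = [0, 1, 0, 1, 1, 0, 0, 1]

# Recode Y as 3 (for 0) and 7 (for 1), i.e. y37 = 3 + 4 * y01.
y37 = [3 + 4 * v for v in y01]

r_01 = pearson_r(x, y01)
r_37 = pearson_r(x, y37)
print(r_01, r_37)  # the two values agree up to floating-point rounding
```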

To calculate "rpb", assume that the dichotomous variable "Y" has the two values 0 and 1. If we divide the data set into two groups, group 1, which received the value "1" on "Y", and group 2, which received the value "0" on "Y", then the point-biserial correlation coefficient is calculated as follows:
:r_{pb} = \frac{M_1 - M_0}{s_n} \sqrt{\frac{n_1 n_0}{n^2}},
where "sn" is the standard deviation used when you have data for every member of the population:
:s_n = \sqrt{\frac{1}{n} \sum_{i=1}^n (x_i - \overline{x})^2}.
It is easy to show algebraically that there is an equivalent formula that uses "sn-1":
:r_{pb} = \frac{M_1 - M_0}{s_{n-1}} \sqrt{\frac{n_1 n_0}{n(n-1)}},
where "sn-1" is the standard deviation used when you only have data for a sample of the population:
:s_{n-1} = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \overline{x})^2}.
Because s_{n-1} = s_n \sqrt{n/(n-1)}, the two forms always give the same value:

:r_{pb} = \frac{M_1 - M_0}{s_n} \sqrt{\frac{n_1 n_0}{n^2}} = \frac{M_1 - M_0}{s_{n-1}} \sqrt{\frac{n_1 n_0}{n(n-1)}}.
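A minimal Python sketch of the two forms, assuming "Y" is coded 0/1 and both groups are non-empty. In the standard library, statistics.pstdev divides by "n" (giving "sn") and statistics.stdev divides by "n" - 1 (giving "sn-1"); the function names and data are illustrative only.

```python
import math
import statistics


def split_groups(x, y):
    """Split X into group 1 (Y == 1) and group 2 (Y == 0)."""
    g1 = [xi for xi, yi in zip(x, y) if yi == 1]
    g0 = [xi for xi, yi in zip(x, y) if yi == 0]
    return g1, g0


def r_pb_population_sd(x, y):
    """Point-biserial r using s_n (divisor n)."""
    g1, g0 = split_groups(x, y)
    n1, n0, n = len(g1), len(g0), len(x)
    diff = statistics.fmean(g1) - statistics.fmean(g0)  # M_1 - M_0
    return diff / statistics.pstdev(x) * math.sqrt(n1 * n0 / n**2)


def r_pb_sample_sd(x, y):
    """Point-biserial r using s_{n-1} (divisor n - 1)."""
    g1, g0 = split_groups(x, y)
    n1, n0, n = len(g1), len(g0), len(x)
    diff = statistics.fmean(g1) - statistics.fmean(g0)  # M_1 - M_0
    return diff / statistics.stdev(x) * math.sqrt(n1 * n0 / (n * (n - 1)))


# Made-up example data.
x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9]
y = [0, 1, 0, 1, 1, 0, 0, 1]
print(r_pb_population_sd(x, y), r_pb_sample_sd(x, y))
```

Either function can be used; the choice of divisor cancels out once the matching square-root factor is applied.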

Glass and Hopkins' book "Statistical Methods in Education and Psychology" (3rd edition, Allyn & Bacon, 1995, ISBN 0205142125) contains a correct version of the point-biserial formula.

The square of the point-biserial correlation coefficient is equal to
:r_{pb}^2 = \frac{(M_1 - M_0)^2}{\sum_{i=1}^n (x_i - \overline{x})^2} \left( \frac{n_1 n_0}{n} \right),
where "M1" is the mean value on the continuous variable "X" for all data points in group 1, and "M0" is the mean value on "X" for all data points in group 2. Further, "n1" is the number of data points in group 1, "n0" is the number of data points in group 2, and "n" is the total sample size. This computational formula is derived from the formula for "rXY" to reduce the number of steps: it needs only the two group means, the group sizes and the total sum of squares, with no cross-products or square root, so it is easier to compute than "rXY".
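As a sketch of the computational formula (made-up data, "Y" assumed coded 0/1, both groups non-empty; the function name is hypothetical), the squared coefficient can be computed from the group means and the total sum of squares alone. Squaring discards the sign, which is recovered from the sign of "M1" - "M0".

```python
import statistics


def r_pb_squared(x, y):
    """Squared point-biserial r from group means and the total sum of squares."""
    g1 = [xi for xi, yi in zip(x, y) if yi == 1]
    g0 = [xi for xi, yi in zip(x, y) if yi == 0]
    n1, n0, n = len(g1), len(g0), len(x)
    mean_x = statistics.fmean(x)
    ss_total = sum((xi - mean_x) ** 2 for xi in x)  # sum of (x_i - x-bar)^2
    diff = statistics.fmean(g1) - statistics.fmean(g0)  # M_1 - M_0
    return diff**2 / ss_total * (n1 * n0 / n)


x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9]
y = [0, 1, 0, 1, 1, 0, 0, 1]
print(r_pb_squared(x, y))
```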

We can test the null hypothesis that the correlation is zero in the population. A little algebra shows that the usual formula for assessing the significance of a correlation coefficient, when applied to "rpb", is the same as the formula for an unpaired (pooled-variance) "t"-test, and so
:r_{pb} \sqrt{\frac{n_1 + n_0 - 2}{1 - r_{pb}^2}}
follows Student's t-distribution with "n1 + n0 - 2" degrees of freedom when the null hypothesis is true.
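The following Python sketch (made-up data; function names are hypothetical) checks that the statistic above coincides with the equal-variance two-sample "t" statistic. Turning "t" into a p-value needs the Student t distribution function, which the Python standard library does not provide, so that step is left to a statistics package.

```python
import math
import statistics


def t_from_r_pb(r_pb, n1, n0):
    """t statistic for H0: rho = 0, with n1 + n0 - 2 degrees of freedom."""
    return r_pb * math.sqrt((n1 + n0 - 2) / (1 - r_pb**2))


def pooled_t(g1, g0):
    """Unpaired Student t statistic using the pooled (equal-variance) estimate."""
    n1, n0 = len(g1), len(g0)
    pooled_var = ((n1 - 1) * statistics.variance(g1)
                  + (n0 - 1) * statistics.variance(g0)) / (n1 + n0 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    return (statistics.fmean(g1) - statistics.fmean(g0)) / se


x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9]
y = [0, 1, 0, 1, 1, 0, 0, 1]
g1 = [xi for xi, yi in zip(x, y) if yi == 1]
g0 = [xi for xi, yi in zip(x, y) if yi == 0]
n1, n0, n = len(g1), len(g0), len(x)

# r_pb via the s_n form of the point-biserial formula
r_pb = ((statistics.fmean(g1) - statistics.fmean(g0))
        / statistics.pstdev(x) * math.sqrt(n1 * n0) / n)

print(t_from_r_pb(r_pb, n1, n0), pooled_t(g1, g0), "df =", n1 + n0 - 2)
```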

One disadvantage of the point-biserial coefficient is that the further the distribution of "Y" is from 50/50, the narrower the range of values the coefficient can take. If "Y" can be assumed to be a dichotomized version of an underlying normally distributed variable, a better descriptive index is given by the biserial coefficient
:r_{b} = \frac{M_1 - M_0}{s_n} \frac{n_1 n_0}{n^2 u},
where "u" is the ordinate (height) of the normal distribution with zero mean and unit variance at the point that divides the distribution into proportions "n0/n" and "n1/n". Because "u" requires a normal quantile and density rather than simple sums, the biserial coefficient is harder to calculate by hand, and it is not widely used in practice.
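With a normal quantile and density function available, "u" is straightforward to obtain. This Python sketch uses the standard library's statistics.NormalDist; the function name and data are illustrative, "Y" is assumed coded 0/1 with both groups non-empty, and "rpb" is returned alongside "rb" for comparison.

```python
import math
import statistics
from statistics import NormalDist


def biserial_and_point_biserial(x, y):
    """Return (r_b, r_pb) for continuous x and 0/1 y."""
    g1 = [xi for xi, yi in zip(x, y) if yi == 1]
    g0 = [xi for xi, yi in zip(x, y) if yi == 0]
    n1, n0, n = len(g1), len(g0), len(x)
    diff = statistics.fmean(g1) - statistics.fmean(g0)  # M_1 - M_0
    s_n = statistics.pstdev(x)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Cut point with proportion n0/n below it and n1/n above it;
    # the density is symmetric, so u is the same for either split.
    z = std_normal.inv_cdf(n0 / n)
    u = std_normal.pdf(z)
    r_b = diff / s_n * (n1 * n0) / (n**2 * u)
    r_pb = diff / s_n * math.sqrt(n1 * n0) / n
    return r_b, r_pb


x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9]
y = [0, 1, 0, 1, 1, 0, 0, 1]
print(biserial_and_point_biserial(x, y))
```

For a 50/50 split, "rb" exceeds "rpb" by the factor 0.5/u(0), roughly 1.25.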

A specific case of biserial correlation occurs where "X" is the sum of a number of dichotomous variables of which "Y" is one. An example of this is where "X" is a person's total score on a test composed of "n" dichotomously scored items. A statistic of interest (the discrimination index) is the correlation between a given item and the total test score. Because the total score includes the item itself, part of the correlation is built in and the statistic is biased upward. In this case the usual formula for the point-biserial coefficient is replaced by the following, which is equivalent to correlating the item with the total score minus that item (the means and "sn" are those of the total score "X"):

:r_{upb} = \frac{M_1 - M_0 - 1}{\sqrt{\frac{n^2 s_n^2}{n_1 n_0} - 2(M_1 - M_0) + 1}}.
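A Python sketch of the corrected item-total coefficient on a made-up response matrix (function names are hypothetical). It computes the formula from the total score and checks it against a direct Pearson correlation between the item and the "rest" score (total minus that item); the uncorrected item-total correlation is printed for comparison.

```python
import math
import statistics


def corrected_item_total(total, item):
    """Point-biserial r between a 0/1 item and the total score with that item removed."""
    g1 = [t for t, i in zip(total, item) if i == 1]
    g0 = [t for t, i in zip(total, item) if i == 0]
    n1, n0, n = len(g1), len(g0), len(total)
    d = statistics.fmean(g1) - statistics.fmean(g0)  # M_1 - M_0 on the total score
    s_n_sq = statistics.pvariance(total)  # s_n squared (divisor n)
    return (d - 1) / math.sqrt(n**2 * s_n_sq / (n1 * n0) - 2 * d + 1)


def pearson_r(a, b):
    """Plain Pearson correlation, used here only as a cross-check."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    s_ab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    s_aa = sum((p - ma) ** 2 for p in a)
    s_bb = sum((q - mb) ** 2 for q in b)
    return s_ab / math.sqrt(s_aa * s_bb)


# Made-up 0/1 responses: 6 examinees (rows) by 4 items (columns).
responses = [
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
total = [sum(row) for row in responses]
item = [row[0] for row in responses]          # first item
rest = [t - i for t, i in zip(total, item)]   # total score excluding that item

print(corrected_item_total(total, item), pearson_r(rest, item), pearson_r(total, item))
```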

A slightly different version of the point-biserial coefficient is the rank-biserial, which is used when the variable "X" consists of ranks while "Y" is dichotomous. We could calculate the coefficient in the same way as when "X" is continuous, but it would have the same disadvantage: the range of values it can take becomes more constrained as the distribution of "Y" becomes more unequal. To get around this, note that the difference between the mean ranks, "M1 - M0", is largest when the smallest ranks are all opposite the 0s and the largest ranks are all opposite the 1s, and smallest when the reverse is the case. These extreme values are plus and minus "(n1 + n0)/2" respectively. Dividing the observed difference between the mean ranks by this value therefore rescales it onto the interval from minus one to plus one. The result is

:r_{rb} = 2 \frac{M_1 - M_0}{n_1 + n_0},

where "M1" and "M0" are respectively the means of the ranks corresponding to the 1 and 0 scores of the dichotomous variable.
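A minimal Python sketch of the rank-biserial coefficient, assuming the ranks run from 1 to "n" with no ties (the bounds argument above relies on this); the function name and data are illustrative. The two extreme orderings reach plus and minus one.

```python
import statistics


def rank_biserial(ranks, y):
    """r_rb = 2 (M_1 - M_0) / (n_1 + n_0), with M_1, M_0 the mean ranks of the 1s and 0s."""
    r1 = [r for r, yi in zip(ranks, y) if yi == 1]
    r0 = [r for r, yi in zip(ranks, y) if yi == 0]
    return 2 * (statistics.fmean(r1) - statistics.fmean(r0)) / (len(r1) + len(r0))


print(rank_biserial([1, 2, 3, 4, 5], [0, 0, 1, 1, 1]))  # 1.0 (smallest ranks opposite the 0s)
print(rank_biserial([1, 2, 3, 4, 5], [1, 1, 1, 0, 0]))  # -1.0 (the reverse)
print(rank_biserial([3, 8, 1, 6, 5, 2, 7, 4], [0, 1, 0, 1, 1, 0, 0, 1]))  # 0.625 (made-up data)
```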

It is possible to use this to test the null hypothesis of zero correlation in the population from which the sample was drawn. If "rrb" is calculated as above then the smaller of

:(1 + r_{rb}) \frac{n_1 n_0}{2} \quad \text{and} \quad (1 - r_{rb}) \frac{n_1 n_0}{2}
is distributed as the Mann-Whitney "U" statistic with sample sizes "n1" and "n0" when the null hypothesis is true.
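The correspondence with "U" can be checked by counting pairs directly. This Python sketch (hypothetical function names, made-up untied ranks) computes "U" for group 1 as the number of (group 1, group 0) pairs in which the group-1 rank is higher, and compares the smaller "U" with the value obtained from "rrb". Critical values or p-values still have to come from Mann-Whitney tables or a statistics package.

```python
import statistics


def rank_biserial(ranks, y):
    """r_rb = 2 (M_1 - M_0) / (n_1 + n_0) for untied ranks and 0/1 y."""
    r1 = [r for r, yi in zip(ranks, y) if yi == 1]
    r0 = [r for r, yi in zip(ranks, y) if yi == 0]
    return 2 * (statistics.fmean(r1) - statistics.fmean(r0)) / (len(r1) + len(r0))


def u_statistics(ranks, y):
    """Pair-counting Mann-Whitney U for each group (assumes no ties)."""
    r1 = [r for r, yi in zip(ranks, y) if yi == 1]
    r0 = [r for r, yi in zip(ranks, y) if yi == 0]
    u1 = sum(1 for a in r1 for b in r0 if a > b)  # pairs won by group 1
    return u1, len(r1) * len(r0) - u1             # (U for group 1, U for group 0)


ranks = [3, 8, 1, 6, 5, 2, 7, 4]
y = [0, 1, 0, 1, 1, 0, 0, 1]
n1, n0 = y.count(1), y.count(0)
r_rb = rank_biserial(ranks, y)

u_from_r = min((1 + r_rb) * n1 * n0 / 2, (1 - r_rb) * n1 * n0 / 2)
u_direct = min(u_statistics(ranks, y))
print(u_from_r, u_direct)  # 3.0 3
```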

External links

* [http://www.andrews.edu/~calkins/math/edrm611/edrm13.htm More Information]


Wikimedia Foundation. 2010.
