- Pearson distribution
The Pearson distribution is a family of continuous probability distributions. It was first published by Karl Pearson in 1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostatistics.
History
The Pearson system was originally devised in an effort to model visibly skewed observations. It was well known at the time how to adjust a theoretical model to fit the first two cumulants or moments of observed data: any probability distribution can be extended straightforwardly to form a location-scale family. Except in pathological cases, a location-scale family can be made to fit the observed mean (first cumulant) and variance (second cumulant) arbitrarily well. However, it was not known how to construct probability distributions in which the skewness (standardized third cumulant) and kurtosis (standardized fourth cumulant) could be adjusted equally freely. This need became apparent when trying to fit known theoretical models to observed data that exhibited skewness. Pearson's examples include survival data, which are usually asymmetric.
In his original paper, Pearson (1895, p. 360) identified four types of distributions (numbered I through IV) in addition to the
normal distribution (which was originally known as type V). The classification depended on whether the distributions were supported on a bounded interval, on a half-line, or on the whole real line; and whether they were potentially skewed or necessarily symmetric. A second paper (Pearson 1901) fixed two omissions: it redefined the type V distribution (originally just the normal distribution, but now the inverse-gamma distribution) and introduced the type VI distribution. Together the first two papers cover the five main types of the Pearson system (I, III, VI, V, and IV). In a third paper, Pearson (1916) introduced further special cases and subtypes (VII through XII).
Rhind (1909, pp. 430–432) devised a simple way of visualizing the parameter space of the Pearson system, which was subsequently adopted by Pearson (1916, plate 1 and pp. 430ff., 448ff.). The Pearson types are characterized by two quantities, commonly referred to as $\beta_1$ and $\beta_2$. The first is the square of the skewness: $\beta_1 = \gamma_1^2$, where $\gamma_1$ is the skewness, or third standardized moment. The second is the traditional kurtosis, or fourth standardized moment: $\beta_2 = \gamma_2 + 3$. (Modern treatments define kurtosis $\gamma_2$ in terms of cumulants instead of moments, so that for a normal distribution we have $\gamma_2 = 0$ and $\beta_2 = 3$. Here we follow the historical precedent and use $\beta_2$.) The diagram on the right shows which Pearson type a given concrete distribution (identified by a point $(\beta_1, \beta_2)$) belongs to.
Many of the skewed and/or non-mesokurtic distributions familiar to us today were still unknown in the early 1890s. What is now known as the beta distribution had been used by Thomas Bayes as a posterior distribution of the parameter of a Bernoulli distribution in his 1763 work on inverse probability. The beta distribution gained prominence due to its membership in Pearson's system and was known until the 1940s as the Pearson type I distribution. [Miller, Jeff, et al. (2006-07-09). "Beta distribution". Earliest Known Uses of Some of the Words of Mathematics (http://members.aol.com/jeff570/mathword.html). http://members.aol.com/jeff570/b.html. Accessed December 9, 2006.] (Pearson's type II distribution is a special case of type I, but is usually no longer singled out.) The gamma distribution originated from Pearson's work (Pearson 1893, p. 331; Pearson 1895, pp. 357, 360, 373–376) and was known as the Pearson type III distribution, before acquiring its modern name in the 1930s and 1940s. [Miller, Jeff, et al. (2006-12-07). "Gamma distribution". Earliest Known Uses of Some of the Words of Mathematics (http://members.aol.com/jeff570/mathword.html). http://members.aol.com/jeff570/g.html. Accessed December 9, 2006.] Pearson's 1895 paper introduced the type IV distribution, which contains Student's t-distribution as a special case, predating William Gosset's subsequent use by several years. His 1901 paper introduced the inverse-gamma distribution (type V) and the beta prime distribution (type VI).
Definition
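A Pearson density $p$ is defined to be any valid solution to the differential equation (cf. Pearson 1895, p. 381):
: $\frac{p'(x)}{p(x)} + \frac{a + (x-\lambda)}{b_2 (x-\lambda)^2 + b_1 (x-\lambda) + b_0} = 0, \qquad (1)$
with:
: $b_0 = \frac{4\beta_2 - 3\beta_1}{10\beta_2 - 12\beta_1 - 18}\,\mu_2,$
: $a = b_1 = \sqrt{\mu_2}\,\sqrt{\beta_1}\,\frac{\beta_2 + 3}{10\beta_2 - 12\beta_1 - 18},$
: $b_2 = \frac{2\beta_2 - 3\beta_1 - 6}{10\beta_2 - 12\beta_1 - 18}.$
Here $\lambda$ is the mean and $\mu_2$ the variance of the distribution.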
According to Ord [Ord J.K. (1972) p2], Pearson devised the underlying form of Equation (1) on the basis of, firstly, the formula for the derivative of the logarithm of the density function of the normal distribution (which gives a linear function) and, secondly, a recurrence relation for values in the probability mass function of the hypergeometric distribution (which yields the linear-divided-by-quadratic structure).
In Equation (1), the parameter $a$ determines a stationary point, and hence under some conditions a mode of the distribution, since
: $p'(\lambda - a) = 0$
follows directly from the differential equation.
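The relation between the coefficients and the moments can be sketched in a few lines of Python. This is a minimal illustration, assuming Equation (1) is written as $p'(x)/p(x) = -(a + (x-\lambda))/(b_2 (x-\lambda)^2 + b_1 (x-\lambda) + b_0)$ with $\lambda$ the mean, and assuming the standard moment relations of the Pearson system. The function names and the `skew_sign` argument are hypothetical and introduced only for this example.

```python
import math


def pearson_coefficients(mu2, beta1, beta2, skew_sign=1.0):
    """Coefficients (a, b0, b1, b2) of Equation (1), x measured from the mean.

    Assumes the standard moment relations of the Pearson system:
    mu2 is the variance, beta1 the squared skewness and beta2 the
    traditional (non-excess) kurtosis. skew_sign (+1 or -1) is a
    hypothetical helper argument carrying the sign of the skewness,
    which beta1 alone does not record.
    """
    denom = 10.0 * beta2 - 12.0 * beta1 - 18.0
    if denom == 0.0:
        raise ValueError("10*beta2 - 12*beta1 - 18 must be non-zero")
    b0 = (4.0 * beta2 - 3.0 * beta1) / denom * mu2
    b1 = skew_sign * math.sqrt(mu2 * beta1) * (beta2 + 3.0) / denom
    b2 = (2.0 * beta2 - 3.0 * beta1 - 6.0) / denom
    return b1, b0, b1, b2  # a equals b1 in this parameterization


def stationary_point(lam, a):
    """Location where p'(x) = 0 according to Equation (1): x = lam - a."""
    return lam - a


# Normal distribution with variance 4: beta1 = 0, beta2 = 3.
print(pearson_coefficients(4.0, 0.0, 3.0))

# Gamma distribution with shape 4 and scale 1: mean 4, variance 4,
# skewness 1 (so beta1 = 1) and kurtosis beta2 = 4.5.
a, b0, b1, b2 = pearson_coefficients(4.0, 1.0, 4.5)
print(stationary_point(4.0, a))  # should agree with the gamma mode, shape - 1
```

For the normal case the quadratic collapses to the constant $b_0 = \mu_2$, which recovers the linear log-derivative mentioned above. For the gamma case $b_2$ vanishes and the stationary point coincides with the mode.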
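Since we are confronted with a first-order linear differential equation with variable coefficients, its solution is straightforward. In the derivations below the origin is shifted so that $\lambda = 0$ and the numerator is written as $x - a$ (the sign of $a$ being absorbed into the parameter), which puts the stationary point at $x = a$:
: $p(x) \propto \exp\left(-\int \frac{x-a}{b_2 x^2 + b_1 x + b_0}\,dx\right).$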
The integral in this solution simplifies considerably when certain special cases of the integrand are considered. Pearson (1895, p. 367) distinguished two main cases, determined by the sign of the discriminant (and hence the number of real roots) of the quadratic function
: $f(x) = b_2 x^2 + b_1 x + b_0. \qquad (2)$
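The case distinction can be sketched as a small helper, assuming the quadratic function is $f(x) = b_2 x^2 + b_1 x + b_0$. The function name and the integer labels 1 and 2 are illustrative choices, not part of Pearson's notation.

```python
def discriminant_case(b0, b1, b2):
    """Return (discriminant, case) for f(x) = b2*x**2 + b1*x + b0.

    case is 1 when the discriminant is negative (no real roots) and
    2 when it is non-negative (real roots, possibly coincident).
    """
    disc = b1 * b1 - 4.0 * b2 * b0
    return disc, (1 if disc < 0.0 else 2)


print(discriminant_case(1.0, 0.0, 1.0))   # f(x) = x**2 + 1, no real roots
print(discriminant_case(-1.0, 0.0, 1.0))  # f(x) = x**2 - 1, roots at -1 and 1
```

Note that $b_2 = 0$ always lands in case 2 under this rule, although the function is then no longer a genuine quadratic and would need separate handling.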
Particular types of distribution
Case 1, negative discriminant: The Pearson type IV distribution
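If the discriminant of the quadratic function (2) is negative ($b_1^2 - 4 b_2 b_0 < 0$), it has no real roots. Then define
: $y = x + \frac{b_1}{2 b_2}$ and
: $\alpha = \frac{\sqrt{4 b_2 b_0 - b_1^2}}{2 b_2}.$
Observe that $\alpha$ is a well-defined real number and $\alpha \neq 0$, because by assumption $4 b_2 b_0 - b_1^2 > 0$ and therefore $b_2 \neq 0$. Applying these substitutions, the quadratic function (2) is transformed into
: $f(x) = b_2 (y^2 + \alpha^2).$
The absence of real roots is obvious from this formulation, because $\alpha^2$ is necessarily positive.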
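We now express the solution to the differential equation (1) as a function of $y$:
: $p(y) \propto \exp\left(-\frac{1}{b_2}\int \frac{y - \frac{b_1}{2 b_2} - a}{y^2 + \alpha^2}\,dy\right).$
Pearson (1895, p. 362) called this the "trigonometrical case", because the integral
: $\int \frac{y - \frac{2 b_2 a + b_1}{2 b_2}}{y^2 + \alpha^2}\,dy = \frac{1}{2}\ln(y^2 + \alpha^2) - \frac{2 b_2 a + b_1}{2 b_2 \alpha}\arctan\left(\frac{y}{\alpha}\right) + C_0$
involves the inverse trigonometric arctan function. Then
: $p(y) \propto \exp\left[-\frac{1}{2 b_2}\ln\left(1 + \frac{y^2}{\alpha^2}\right) - \frac{\ln\alpha}{b_2} + \frac{2 b_2 a + b_1}{2 b_2^2 \alpha}\arctan\left(\frac{y}{\alpha}\right) + C_1\right].$
Finally, let
: $m = \frac{1}{2 b_2}$ and
: $\nu = -\frac{2 b_2 a + b_1}{2 b_2^2 \alpha}.$
Applying these substitutions, we obtain the parametric function:
: $p(y) \propto \left[1 + \frac{y^2}{\alpha^2}\right]^{-m}\exp\left[-\nu\arctan\left(\frac{y}{\alpha}\right)\right].$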
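The trigonometrical case can be checked numerically. The sketch below assumes the convention $p'(x)/p(x) = -(x-a)/(b_2 x^2 + b_1 x + b_0)$ and the substitutions $y = x + b_1/(2 b_2)$, $\alpha = \sqrt{4 b_2 b_0 - b_1^2}/(2 b_2)$, $m = 1/(2 b_2)$ and $\nu = -(2 b_2 a + b_1)/(2 b_2^2 \alpha)$. It compares a central-difference estimate of the log-derivative of the unnormalized density with the right-hand side of the differential equation. All function names and the sample coefficients are illustrative.

```python
import math


def type4_parameters(a, b0, b1, b2):
    """Map (a, b0, b1, b2) to (shift, alpha, m, nu), where y = x + shift."""
    disc = 4.0 * b2 * b0 - b1 * b1
    if b2 <= 0.0 or disc <= 0.0:
        raise ValueError("need b2 > 0 and a negative discriminant")
    shift = b1 / (2.0 * b2)
    alpha = math.sqrt(disc) / (2.0 * b2)
    m = 1.0 / (2.0 * b2)
    nu = -(2.0 * b2 * a + b1) / (2.0 * b2 * b2 * alpha)
    return shift, alpha, m, nu


def log_kernel(y, alpha, m, nu):
    """Logarithm of the unnormalized density p(y)."""
    return -m * math.log1p((y / alpha) ** 2) - nu * math.atan(y / alpha)


def ode_residual(x, a, b0, b1, b2, h=1e-5):
    """Central-difference estimate of p'(x)/p(x) + (x - a)/f(x)."""
    shift, alpha, m, nu = type4_parameters(a, b0, b1, b2)
    slope = (log_kernel(x + h + shift, alpha, m, nu)
             - log_kernel(x - h + shift, alpha, m, nu)) / (2.0 * h)
    f = b2 * x * x + b1 * x + b0
    return slope + (x - a) / f


# Illustrative coefficients (not taken from the text).
for x in (-3.0, 0.0, 1.0, 4.0):
    print(x, ode_residual(x, a=0.5, b0=2.0, b1=1.0, b2=0.25))
```

The printed residuals should be zero up to finite-difference error, which indicates that the parametric function solves the differential equation under the stated convention.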
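The unnormalized density $p(y)$ has support on the entire real line. It depends on a scale parameter $\alpha > 0$ and shape parameters $m > 1/2$ and $\nu$. One parameter was lost when we chose to find the solution to the differential equation (1) as a function of $y$ rather than $x$. We therefore reintroduce a fourth parameter, namely the location parameter $\lambda$. We have thus derived the density of the Pearson type IV distribution:
: $p(x) = \frac{\left|\frac{\Gamma\left(m + \frac{\nu}{2} i\right)}{\Gamma(m)}\right|^2}{\alpha\,\mathrm{B}\left(m - \frac{1}{2}, \frac{1}{2}\right)}\left[1 + \left(\frac{x-\lambda}{\alpha}\right)^2\right]^{-m}\exp\left[-\nu\arctan\left(\frac{x-\lambda}{\alpha}\right)\right].$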
The normalizing constant involves the complex Gamma function (Γ) and the Beta function (B).
The Pearson type VII distribution
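The shape parameter $\nu$ of the Pearson type IV distribution controls its skewness. If we fix its value at zero, we obtain a symmetric three-parameter family. This special case is known as the Pearson type VII distribution (cf. Pearson 1916, p. 450). Its density is
: $p(x) = \frac{1}{\alpha\,\mathrm{B}\left(m - \frac{1}{2}, \frac{1}{2}\right)}\left[1 + \left(\frac{x-\lambda}{\alpha}\right)^2\right]^{-m},$
where B is the Beta function.
An alternative parameterization (and slight specialization) of the type VII distribution is obtained by letting
: $\alpha = \sigma\sqrt{2m - 3},$
which requires $m > 3/2$. This entails a minor loss of generality but ensures that the variance of the distribution exists and is equal to $\sigma^2$. Now the parameter $m$ only controls the kurtosis of the distribution. If $m$ approaches infinity as $\lambda$ and $\sigma$ are held constant, the normal distribution arises as a special case:
: $\lim_{m\to\infty}\frac{1}{\sigma\sqrt{2m-3}\,\mathrm{B}\left(m - \frac{1}{2}, \frac{1}{2}\right)}\left[1 + \left(\frac{x-\lambda}{\sigma\sqrt{2m-3}}\right)^2\right]^{-m}$
: $= \frac{1}{\sigma\sqrt{2}\,\Gamma\left(\frac{1}{2}\right)} \times \lim_{m\to\infty}\frac{\Gamma(m)}{\Gamma\left(m - \frac{1}{2}\right)\sqrt{m - \frac{3}{2}}} \times \lim_{m\to\infty}\left[1 + \frac{\left(\frac{x-\lambda}{\sigma}\right)^2}{2m-3}\right]^{-m}$
: $= \frac{1}{\sigma\sqrt{2\pi}} \times 1 \times \exp\left[-\frac{1}{2}\left(\frac{x-\lambda}{\sigma}\right)^2\right].$
This is the density of a normal distribution with mean $\lambda$ and standard deviation $\sigma$.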
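The convergence to the normal density can be illustrated numerically. The sketch below assumes the type VII density $\left[1 + ((x-\lambda)/\alpha)^2\right]^{-m} / (\alpha\,\mathrm{B}(m - 1/2, 1/2))$ with $\alpha = \sigma\sqrt{2m-3}$, and evaluates the Beta function through log-gamma values to avoid overflow for large $m$. Function names and sample points are illustrative.

```python
import math


def log_beta(p, q):
    """Logarithm of the Beta function B(p, q) for positive p and q."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def type7_pdf_sigma(x, lam, sigma, m):
    """Type VII density with alpha = sigma * sqrt(2m - 3); needs m > 3/2."""
    if sigma <= 0.0 or m <= 1.5:
        raise ValueError("need sigma > 0 and m > 3/2")
    alpha = sigma * math.sqrt(2.0 * m - 3.0)
    z = (x - lam) / alpha
    log_pdf = -math.log(alpha) - log_beta(m - 0.5, 0.5) - m * math.log1p(z * z)
    return math.exp(log_pdf)


def normal_pdf(x, lam, sigma):
    """Density of the normal distribution with mean lam and s.d. sigma."""
    z = (x - lam) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def max_gap(m, lam=1.0, sigma=2.0, points=(-3.0, 0.0, 1.0, 2.5, 6.0)):
    """Largest absolute difference between the two densities on a few points."""
    return max(abs(type7_pdf_sigma(x, lam, sigma, m) - normal_pdf(x, lam, sigma))
               for x in points)


for m in (2.0, 10.0, 1e3, 1e6):
    print(m, max_gap(m))
```

The gap shrinks as $m$ grows, consistent with the limit above. The check covers only a handful of points and is not a proof of uniform convergence.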
It is convenient to require that $m > 5/2$ and to let
: $m = \frac{5}{2} + \frac{3}{\gamma_2}.$
This is another specialization, and it guarantees that the first four moments of the distribution exist. More specifically, the Pearson type VII distribution parameterized in terms of $(\lambda, \sigma, \gamma_2)$ has a mean of $\lambda$, standard deviation of $\sigma$, skewness of zero, and excess kurtosis of $\gamma_2$.
Student's t-distribution
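The Pearson type VII distribution subsumes Student's t-distribution, and hence also the Cauchy distribution. Student's t-distribution arises as the result of applying the following substitutions to the original parameterization of the type VII distribution:
: $\lambda = 0,$
: $\alpha = \sqrt{\nu}$ and
: $m = \frac{\nu + 1}{2},$
where $\nu > 0$ now denotes the number of degrees of freedom rather than the type IV shape parameter. Observe that the constraint $m > 1/2$ is satisfied. The density of this restricted one-parameter family is
: $p(x) = \frac{1}{\sqrt{\nu}\,\mathrm{B}\left(\frac{\nu}{2}, \frac{1}{2}\right)}\left[1 + \frac{x^2}{\nu}\right]^{-\frac{\nu+1}{2}},$
which is easily recognized as the density of Student's t-distribution.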
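The substitution can be verified numerically. The sketch below assumes the type VII density $\left[1 + ((x-\lambda)/\alpha)^2\right]^{-m} / (\alpha\,\mathrm{B}(m - 1/2, 1/2))$ and compares it, for $\lambda = 0$, $\alpha = \sqrt{\nu}$ and $m = (\nu+1)/2$, with the textbook form of the Student's t density and, for $\nu = 1$, with the Cauchy density. Function names are illustrative.

```python
import math


def type7_pdf(x, lam, alpha, m):
    """Pearson type VII density; needs alpha > 0 and m > 1/2."""
    if alpha <= 0.0 or m <= 0.5:
        raise ValueError("need alpha > 0 and m > 1/2")
    z = (x - lam) / alpha
    # log B(m - 1/2, 1/2) expressed through log-gamma values
    log_b = math.lgamma(m - 0.5) + math.lgamma(0.5) - math.lgamma(m)
    return math.exp(-math.log(alpha) - log_b - m * math.log1p(z * z))


def student_t_pdf(x, nu):
    """Textbook density of Student's t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1.0) / 2.0 * math.log1p(x * x / nu))


def t_via_type7(x, nu):
    """Student's t density obtained from type VII by the substitutions above."""
    return type7_pdf(x, 0.0, math.sqrt(nu), (nu + 1.0) / 2.0)


def cauchy_pdf(x):
    """Standard Cauchy density, the nu = 1 member of the family."""
    return 1.0 / (math.pi * (1.0 + x * x))


for nu in (1.0, 2.5, 30.0):
    for x in (-2.0, 0.0, 0.7, 5.0):
        print(nu, x, t_via_type7(x, nu), student_t_pdf(x, nu))
```

The two columns should agree to floating-point accuracy, and the $\nu = 1$ rows should also match `cauchy_pdf`.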
Case 2, non-negative discriminant
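If the quadratic function (2) has a non-negative discriminant ($b_1^2 - 4 b_2 b_0 \geq 0$), it has real roots $a_1$ and $a_2$ (not necessarily distinct):
: $a_1 = \frac{-b_1 - \sqrt{b_1^2 - 4 b_2 b_0}}{2 b_2},$
: $a_2 = \frac{-b_1 + \sqrt{b_1^2 - 4 b_2 b_0}}{2 b_2}.$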
One have to define ::::
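In the presence of real roots the quadratic function (2) can be written as
: $f(x) = b_2 (x - a_1)(x - a_2),$
and the solution to the differential equation is therefore
: $p(x) \propto \exp\left(-\frac{1}{b_2}\int \frac{x - a}{(x - a_1)(x - a_2)}\,dx\right).$
Pearson (1895, p. 362) called this the "logarithmic case", because the integral
: $\int \frac{x - a}{(x - a_1)(x - a_2)}\,dx = \frac{(a_1 - a)\ln(x - a_1) - (a_2 - a)\ln(x - a_2)}{a_1 - a_2} + C$
involves only the logarithm function, and not the arctan function as in the previous case.
Using the substitution
: $\nu = \frac{1}{b_2 (a_1 - a_2)},$
we obtain the following solution to the differential equation (1):
: $p(x) \propto (x - a_1)^{-\nu (a_1 - a)} (x - a_2)^{\nu (a_2 - a)}.$
Since this density is only known up to a hidden constant of proportionality, that constant can be changed and the density written as follows:
: $p(x) \propto \left(1 - \frac{x}{a_1}\right)^{-\nu (a_1 - a)} \left(1 - \frac{x}{a_2}\right)^{\nu (a_2 - a)}.$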
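The logarithmic case admits the same kind of numerical check. The sketch below assumes the convention $p'(x)/p(x) = -(x-a)/f(x)$ with $f(x) = b_2 (x - a_1)(x - a_2)$, the substitution $\nu = 1/(b_2 (a_1 - a_2))$, and the density $(1 - x/a_1)^{-\nu(a_1 - a)} (1 - x/a_2)^{\nu(a_2 - a)}$ evaluated between two roots of opposite sign. Function names and the sample values are illustrative.

```python
import math


def case2_exponents(a, b2, a1, a2):
    """Exponents of (1 - x/a1) and (1 - x/a2) in the unnormalized density."""
    nu = 1.0 / (b2 * (a1 - a2))
    return -nu * (a1 - a), nu * (a2 - a)


def case2_log_kernel(x, a, b2, a1, a2):
    """Logarithm of the unnormalized density, valid for a1 < x < a2, a1 < 0 < a2."""
    e1, e2 = case2_exponents(a, b2, a1, a2)
    return e1 * math.log(1.0 - x / a1) + e2 * math.log(1.0 - x / a2)


def case2_ode_residual(x, a, b2, a1, a2, h=1e-5):
    """Central-difference estimate of p'(x)/p(x) + (x - a)/f(x)."""
    slope = (case2_log_kernel(x + h, a, b2, a1, a2)
             - case2_log_kernel(x - h, a, b2, a1, a2)) / (2.0 * h)
    f = b2 * (x - a1) * (x - a2)
    return slope + (x - a) / f


# Illustrative values: roots -1 and 2, stationary point 0.5, b2 = -0.25.
print(case2_exponents(0.5, -0.25, -1.0, 2.0))
for x in (-0.5, 0.0, 1.0, 1.5):
    print(x, case2_ode_residual(x, 0.5, -0.25, -1.0, 2.0))
```

With these sample values both exponents are positive, so the density vanishes at both roots and has a single interior mode, as for a beta distribution rescaled to the interval between the roots.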
The Pearson type I and type II distributions
The Pearson type I distribution (a generalization of the beta distribution) arises when the roots of the quadratic equation (2) are of opposite sign, that is, $a_1 < 0 < a_2$. Then the solution $p$ is supported on the interval $(a_1, a_2)$. Apply the substitution:
Wikimedia Foundation. 2010.