Empirical Bayes method

In statistics, empirical Bayes methods are a class of methods which use empirical data to estimate or approximate the conditional probability distributions that arise from Bayes' theorem. These methods allow one to estimate quantities (probabilities, averages, etc.) about an individual member of a population by combining information from empirical measurements on the individual and on the entire population.

Empirical Bayes methods involve:

*An "underlying" probability distribution of some unobservable quantity assigned to each member of a statistical population. This quantity is a random variable if a member of the population is chosen at random. The probability distribution of this random variable is not known, and is thought of as a property of the population.

*An observable quantity assigned to each member of the population. When a random sample is taken from the population, it is desired first to estimate the "underlying" probability distribution, and then to estimate the value of the unobservable quantity assigned to each member of the sample.

Introduction

In the Bayesian approach to statistics, we consider the problem of estimating some probability (such as a future outcome or a noisy measurement), based on measurements of our data, a model for these measurements, and some model for our prior beliefs about the system. Let us consider a standard two-stage model, where we write our data measurements as a vector y = (y_1, y_2, ..., y_n), and our prior beliefs as some vector of random unknowns θ. We assume we can model our measurements with a conditional probability distribution (or likelihood) ρ(y|θ), and also the prior as ρ(θ|η), where η is some hyperparameter. For example, we might choose ρ(y|θ) to be a binomial distribution, and ρ(θ|η) a beta distribution (its conjugate prior). Empirical Bayes then employs the complete set of empirical data to make inferences about the prior for θ, and then plugs this into the likelihood ρ(y|θ) to make estimates for future outcomes of individual measurements.

To see this in action, use Bayes' theorem to write the posterior distribution for θ as

: \rho(\theta\mid y) = \frac{\rho(y\mid\theta)\,\rho(\theta\mid\eta)}{\int \rho(y\mid\theta)\,\rho(\theta\mid\eta)\,d\theta}

and let us define the denominator of this fraction, known as the marginal distribution (or normalizing constant), as

: m(y\mid\eta) = \int \rho(y\mid\theta)\,\rho(\theta\mid\eta)\,d\theta
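As a concrete illustration of these two formulas, the following sketch discretizes θ on a grid and approximates the marginal and the posterior numerically for the binomial-likelihood/beta-prior pair mentioned above. The specific numbers (a Beta(2, 2) prior, 7 successes in 10 trials) are assumptions chosen purely for demonstration.

import numpy as np
from scipy import stats

# Grid approximation of rho(theta|y) and m(y|eta) for a binomial likelihood
# with a Beta(2, 2) prior (hypothetical numbers, for illustration only).
theta = np.linspace(1e-6, 1 - 1e-6, 2001)        # grid over theta in (0, 1)
dtheta = theta[1] - theta[0]
prior = stats.beta.pdf(theta, a=2, b=2)           # rho(theta | eta), with eta = (2, 2)

y, n = 7, 10                                      # observed successes out of n trials
likelihood = stats.binom.pmf(y, n, theta)         # rho(y | theta)

marginal = np.sum(likelihood * prior) * dtheta    # m(y | eta), the normalizing constant
posterior = likelihood * prior / marginal         # rho(theta | y)

post_mean = np.sum(theta * posterior) * dtheta    # posterior mean E(theta | y)
print(marginal, post_mean)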

Empirical Bayes (EB) combines Bayesian and frequentist approaches to estimation. EB approximates the marginal (and/or full posterior) distribution with point estimation (maximum likelihood estimation (MLE), squared error loss (SEL), Monte Carlo/numerical integration, etc.) and then estimates the approximate marginal using the empirical data (frequency probability). EB takes several forms, including non-parametric and parametric forms. We describe a few common examples below.

Point estimation

Robbins method (1956): non-parametric empirical Bayes (NPEB)

We consider a case of compound sampling, where the likelihood for each observation ρ(y_i|θ_i) is specified by a Poisson distribution,

: \rho(y_i\mid\theta_i) = \frac{\theta_i^{y_i}\, e^{-\theta_i}}{y_i!}

while the prior on θ is unspecified except that the θ_i are i.i.d. draws from some unknown distribution G(θ). Compound sampling arises in a variety of statistical estimation problems, such as accident rates and clinical trials. We simply seek a point estimate of θ_i. Because the prior is unspecified, we seek a non-parametric estimate of the posterior. Under squared error loss (SEL), the Bayes point estimate can be written as (see Carlin and Louis, Sec. 3.2 and Appendix B):

: E(\theta_i \mid Y = y_i) = \frac{(y_i + 1)\, m_G(Y = y_i + 1)}{m_G(Y = y_i)}.

Proof:

First, we show that under squared error loss (SEL) the point estimate is the mean of the posterior. Write the posterior risk of a point estimate a for θ as

: \rho(G, a) = \int (\theta - a)^2\, g(\theta\mid y)\,d\theta.

To find the minimum error, set the derivative with respect to a equal to zero

: \frac{\partial}{\partial a}\,[\rho(G, a)] = \int -2(\theta - a)\, g(\theta\mid y)\,d\theta = 0.

Solving for a yields

: a = \int \theta\, g(\theta\mid y)\, d\theta = E(\theta\mid y).

A quick check shows that the second derivative is greater than zero, indicating a true minimum.

: \frac{\partial^2}{\partial a^2}\,[\rho(G, a)] = 2 \int g(\theta\mid y)\, d\theta = 2.
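Readers who want to verify this derivation mechanically can do so symbolically. The sketch below assumes, purely for illustration, a Gamma(3, 1) density in the role of g(θ|y): the risk-minimizing value of a comes out equal to the posterior mean (3), and the second derivative equals 2.

import sympy as sp

# Symbolic check: for an assumed Gamma(3, 1) "posterior" g(theta|y),
# the minimizer of the squared-error risk equals the posterior mean.
theta, a = sp.symbols('theta a', positive=True)
g = theta**2 * sp.exp(-theta) / sp.gamma(3)              # Gamma(shape=3, scale=1) density

risk = sp.integrate((theta - a)**2 * g, (theta, 0, sp.oo))
a_star = sp.solve(sp.diff(risk, a), a)[0]

print(a_star)                                            # -> 3, the mean of Gamma(3, 1)
print(sp.diff(risk, a, 2))                               # -> 2 > 0, a true minimum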

Note that had we chosen absolute error loss instead, the point estimate would be the posterior median, and with 0-1 loss it would be the posterior mode.

Second, we show that for the specific case of the Poisson likelihood this point estimate is a simple ratio of marginals. Write the posterior mean as:

: E(\theta_i\mid y_i) = \frac{\int (\theta^{y_i + 1}\, e^{-\theta} / y_i!)\, dG(\theta)}{\int (\theta^{y_i}\, e^{-\theta} / y_i!)\, dG(\theta)}.

Multiply and divide the numerator by (y_i + 1), and use the definition of the marginal m_G(y) = ∫ ρ(y|θ) dG(θ) to obtain

: E(\theta_i\mid y_i) = \frac{(y_i + 1)\, m_G(y_i + 1)}{m_G(y_i)}.

To take advantage of this, Robbins (1956) suggested estimating the marginals with their empirical frequencies among the observed data Y_d, yielding the fully non-parametric estimate:

: E(\theta_i\mid y_i) = (y_i + 1)\, \frac{\#\{Y_d = y_i + 1\}}{\#\{Y_d = y_i\}}

(see also Good-Turing frequency estimation).
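A minimal sketch of Robbins's estimator in Python is shown below. The function name robbins_estimate and the simulated Gamma-Poisson data are assumptions chosen only to illustrate the counting formula above.

import numpy as np

def robbins_estimate(y, counts):
    """Robbins' non-parametric empirical Bayes estimate of theta_i for an
    observed count y, given `counts`: the full array of observed counts Y_d.
    Implements E(theta_i | y_i) = (y_i + 1) * #(Y_d = y_i + 1) / #(Y_d = y_i)."""
    counts = np.asarray(counts)
    n_y = np.sum(counts == y)          # #(Y_d = y_i)
    n_y1 = np.sum(counts == y + 1)     # #(Y_d = y_i + 1)
    return (y + 1) * n_y1 / n_y

# Hypothetical usage: simulated compound-Poisson data with Gamma-distributed rates.
rng = np.random.default_rng(0)
true_theta = rng.gamma(shape=2.0, scale=1.0, size=10_000)   # unknown prior G
observed = rng.poisson(true_theta)                           # observable counts
print(robbins_estimate(3, observed))                         # estimate for a member with y_i = 3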

Example: Accident rates

Suppose each customer of an insurance company has an "accident rate" Θ and is insured against "accidents"; the probability distribution of Θ is the "underlying" distribution, and is unknown. The number of "accidents" suffered by each customer in a specified baseline time period has a Poisson distribution whose expected value is that particular customer's "accident rate". That number of "accidents" is the observable quantity. A crude way to estimate the underlying probability distribution of the "accident rate" Θ is to estimate the proportion of members of the whole population suffering 0, 1, 2, 3, ... accidents during the specified time period to be equal to the corresponding proportion in the observed random sample. Having done so, one then wishes to predict the "accident rate" of each customer in the sample. One may use the conditional expected value of the "accident rate" Θ given the observed number X of "accidents" during the baseline period.

Thus, if a customer suffers six "accidents" during the baseline period, that customer's estimated "accident rate" is 7 × [the proportion of the sample who suffered 7 "accidents"] / [the proportion of the sample who suffered 6 "accidents"] .
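For instance (with hypothetical numbers), if 120 of 10,000 sampled customers suffered 6 "accidents" and 35 suffered 7, the estimate for a customer observed with 6 "accidents" would be 7 × (35/10,000) / (120/10,000) = 7 × 35/120 ≈ 2.04 expected "accidents" per period, well below the raw count of 6; this shrinkage toward the population reflects the weighted-average character of empirical Bayes estimates discussed below.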

Parametric empirical Bayes

If the likelihood and its prior take on simple parametric forms (such as 1- or 2-dimensional likelihood functions with simple conjugate priors), then the empirical Bayes problem reduces to estimating the marginal m(y|η) and the hyperparameters η using the complete set of empirical measurements. For example, one common approach, called parametric empirical Bayes point estimation, is to approximate the marginal using the maximum likelihood estimate (MLE) or the method of moments, which allows one to express the hyperparameters η in terms of the empirical mean and variance. This simplified marginal allows one to plug the empirical averages into a point estimate for the prior θ. The resulting equation for θ is greatly simplified, as shown below.

There are several common parametric empirical Bayes models, including the Poisson-Gamma model (below), the Beta-binomial model, the Gaussian-Gaussian model, the multinomial-Dirichlet model, as well as specific models for Bayesian linear regression and Bayesian multivariate linear regression. More advanced approaches include hierarchical Bayesian models and Bayesian mixture models.

Poisson-Gamma model

For example, in the example above, let the likelihood be a Poisson distribution, and let the prior now be specified by the conjugate prior, which is a Gamma distribution G(α, β) (where the hyperparameter is η = (α, β)):

: \rho(\theta\mid\alpha, \beta) = \frac{\theta^{\alpha - 1}\, e^{-\theta/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)} \quad \text{for } \theta > 0,\ \alpha > 0,\ \beta > 0

It is straightforward to show the posterior is also a Gamma distribution. Write

: \rho(\theta\mid y) \propto \rho(y\mid\theta)\, \rho(\theta\mid\alpha, \beta)

where we have omitted the marginal since it does not depend explicitly on θ. Expanding the terms which do depend on θ gives the posterior as:

: \rho(\theta\mid y) \propto (\theta^{y}\, e^{-\theta})\, (\theta^{\alpha - 1}\, e^{-\theta/\beta}) = \theta^{y + \alpha - 1}\, e^{-\theta (1 + 1/\beta)}

So we see that the posterior density is also a Gamma distribution G(α′, β′), where α′ = y + α and β′ = (1 + 1/β)⁻¹. Also notice that the marginal, the integral of the likelihood times the prior over all θ, turns out to be a negative binomial distribution.
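Carrying out that integral explicitly is a routine Gamma-function integration; in one common parameterization it gives

: m(y\mid\alpha, \beta) = \int_0^\infty \rho(y\mid\theta)\, \rho(\theta\mid\alpha, \beta)\, d\theta = \frac{\Gamma(y + \alpha)}{y!\, \Gamma(\alpha)} \left(\frac{1}{1 + \beta}\right)^{\alpha} \left(\frac{\beta}{1 + \beta}\right)^{y},

a negative binomial distribution in y with size parameter α and success probability 1/(1 + β).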

To apply empirical Bayes, we will approximate the marginal using the maximum likelihood estimate (MLE). But since the posterior is a Gamma distribution, the MLE of the marginal turns out to be just the mean of the posterior, which is the point estimate E(θ|y) we need. Recalling that the mean μ of a Gamma distribution G(α′, β′) is simply α′β′, we have

: E(\theta\mid y) = \alpha' \beta' = \frac{\bar{y} + \alpha}{1 + 1/\beta} = \frac{\beta}{1 + \beta}\, \bar{y} + \frac{1}{1 + \beta}\, (\alpha\beta)

To obtain the values of α and β, empirical Bayes prescribes estimating the prior mean αβ and the prior variance αβ² using the complete set of empirical data.

The resulting point estimate E(θ|y) is therefore like a weighted average of the sample mean ȳ and the prior mean μ = αβ. This turns out to be a general feature of empirical Bayes: the point estimates for the prior (i.e., the mean) look like weighted averages of the sample estimate and the prior estimate (likewise for estimates of the variance).
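A minimal sketch of this recipe in Python, under assumptions chosen here for illustration: simulated data, and hyperparameters obtained by matching the first two moments of the negative binomial marginal (E[Y] = αβ, Var[Y] = αβ + αβ²), which recovers the prior mean and variance from the data; maximizing the marginal likelihood directly is the other common choice.

import numpy as np

# Parametric empirical Bayes for the Poisson-Gamma model (illustrative sketch).
rng = np.random.default_rng(1)
theta_true = rng.gamma(shape=3.0, scale=0.5, size=5_000)   # unknown Gamma prior
y = rng.poisson(theta_true)                                 # observed counts

# Method-of-moments estimates of the hyperparameters from the marginal moments:
# E[Y] = alpha*beta and Var[Y] = alpha*beta + alpha*beta**2.
mean, var = y.mean(), y.var()
beta_hat = (var - mean) / mean              # estimate of beta (requires var > mean)
alpha_hat = mean / beta_hat                 # estimate of alpha

# Empirical Bayes point estimate for an individual with observed count y_i:
# a weighted average of y_i and the estimated prior mean alpha*beta.
y_i = 4
shrink = beta_hat / (1.0 + beta_hat)
print(shrink * y_i + (1.0 - shrink) * alpha_hat * beta_hat)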

See also

* Bayes estimator
* Bayes' theorem
* Bayesian probability
* Best linear unbiased prediction
* Conditional probability
* Monty Hall problem
* Posterior probability
* Bayesian coding hypothesis

References

* Herbert Robbins, "An Empirical Bayes Approach to Statistics", Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, pages 157-163, University of California Press, Berkeley, 1956.

* Bradley P. Carlin and Thomas A. Louis, "Bayes and Empirical Bayes Methods for Data Analysis", Chapman & Hall/CRC, Second edition, 2000.

* Peter E. Rossi, Greg M. Allenby, and Robert McCulloch, "Bayesian Statistics and Marketing", John Wiley & Sons, Ltd, 2006.

* George Casella, "An Introduction to Empirical Bayes Data Analysis", American Statistician, Vol. 39, No. 2 (May 1985), pp. 83-87.

External links

* [http://ca.geocities.com/hauer@rogers.com/Pubs/TRBpaper.pdf Use of Empirical Bayes Method in estimating road safety (North America)]
* [http://www.math.uu.se/research/pub/Brandel.pdf Empirical Bayes Methods for missing data analysis]
* [http://it.stlawu.edu/~msch/biometrics/papers.htm Using the Beta-Binomial distribution to assess performance of a biometric identification device]
* [http://www.biomedcentral.com/1471-2105/7/514/abstract/ Hierarchical Naive Bayes Classifiers] (for continuous and [http://labmedinfo.org/download/lmi339.pdf discrete] variables).

