Logistic regression

In statistics, logistic regression is a model used to predict the probability that an event occurs by fitting data to a logistic curve. It makes use of several predictor variables that may be either numerical or categorical. For example, the probability that a person has a heart attack within a specified time period might be predicted from knowledge of the person's age, sex and body mass index. Logistic regression is used extensively in the medical and social sciences, as well as in marketing applications such as predicting a customer's propensity to purchase a product or cease a subscription.

Other names for logistic regression used in various other application areas include logistic model, logit model, and maximum-entropy classifier.

Logistic regression is one of a class of models known as generalized linear models.

As an illustration, consider a model of the risk of death from heart disease in which the quantity z is a linear combination of a patient's age, sex and cholesterol level. In this model, increasing age is associated with an increasing risk of death from heart disease (z goes up by 2.0 for every 10 years over the age of 50), female sex is associated with a decreased risk of death from heart disease (z goes down by 1.0 if the patient is female), and increasing cholesterol is associated with an increasing risk of death (z goes up by 1.2 for each 1 mmol/L increase in cholesterol above 5 mmol/L).

We wish to use this model to predict Mr Petrelli's risk of death from heart disease: he is 50 years old and his cholesterol level is 7.0 mmol/L. Mr Petrelli's risk of death is therefore

:\frac{1}{1+e^{-z}} \text{, where } z = -5.0 + (+2.0)(5.0-5.0) + (-1.0)(0) + (+1.2)(7.0-5.0).

This means that by this model, Mr Petrelli's risk of dying from heart disease in the next 10 years is 0.07 (or 7%).
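The arithmetic of this example can be checked with a short script. This is only a sketch: the variable names are illustrative, the coefficients are the ones that appear in the expression for z above, and age is measured in decades as that expression implies.

```python
import math

def logistic(z):
    """Standard logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Coefficients taken from the expression for z in the example.
intercept = -5.0
beta_age = 2.0       # per decade over the age of 50
beta_female = -1.0   # applied only when the patient is female
beta_chol = 1.2      # per mmol/L of cholesterol above 5.0

# Mr Petrelli's values.
age_decades = 5.0    # 50 years old
is_female = 0        # male
cholesterol = 7.0    # mmol/L

z = (intercept
     + beta_age * (age_decades - 5.0)
     + beta_female * is_female
     + beta_chol * (cholesterol - 5.0))
risk = logistic(z)
print(f"z = {z:.1f}, risk = {risk:.3f}")  # z = -2.6, risk = 0.069
```

Rounded to two decimal places, the result agrees with the 0.07 quoted in the text.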

Formal mathematical specification

Logistic regression analyzes binomially distributed data of the form

:Y_i \sim B(n_i, p_i), \text{ for } i = 1, \dots, m,

where the numbers of Bernoulli trials n_i are known and the probabilities of success p_i are unknown. An example of this distribution is the fraction of seeds (p_i) that germinate after n_i are planted.

The model proposes that for each trial (value of i) there is a set of explanatory variables that might inform the final probability. These explanatory variables can be thought of as being in a k-dimensional vector X_i, and the model then takes the form

:p_i = \operatorname{E}\left(\left.\frac{Y_i}{n_i}\right| X_i\right).

The logits of the unknown binomial probabilities (i.e., the logarithms of the odds) are modelled as a linear function of the X_i:

:\operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}.

Note that a particular element of X_i can be set to 1 for all i to yield an intercept in the model. The unknown parameters β_j are usually estimated by maximum likelihood.
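A minimal sketch of maximum likelihood estimation for this model follows, assuming a single explanatory variable plus an intercept and using Newton-Raphson iteration, which is one common choice and not the only one. The function name `fit_logistic` and the germination counts are made up for illustration.

```python
import math

def fit_logistic(x, y, n, iterations=25):
    """Newton-Raphson maximum likelihood for logit(p_i) = b0 + b1 * x_i,
    where y_i successes are observed out of n_i trials."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score (gradient of the log-likelihood) and the
        # information matrix (negative Hessian) at the current estimates.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            residual = yi - ni * p
            weight = ni * p * (1.0 - p)
            g0 += residual
            g1 += residual * xi
            h00 += weight
            h01 += weight * xi
            h11 += weight * xi * xi
        # Solve the 2x2 system (information) * step = score.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: seeds germinating at five levels of some treatment.
levels = [0, 1, 2, 3, 4]
planted = [20, 20, 20, 20, 20]
germinated = [2, 5, 10, 15, 18]

b0, b1 = fit_logistic(levels, germinated, planted)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

The counts were chosen so that the observed log odds are exactly linear in the treatment level, so the fit should return a slope of ln 3 and an intercept of -2 ln 3. Real data will not fit this neatly, and separable data would make the estimates diverge.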

The β_j parameter estimates are interpreted as the additive effect on the log odds of a unit change in the jth explanatory variable. In the case of a dichotomous explanatory variable, for instance gender, e^{β_j} is the estimate of the odds ratio of having the outcome for, say, males compared with females.
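The odds-ratio interpretation can be verified numerically. The sketch below assumes a single dichotomous explanatory variable coded 0 or 1; the coefficient values are hypothetical.

```python
import math

def probability(b0, b1, x):
    """Probability of the outcome under logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

b0, b1 = -1.5, 0.7  # hypothetical intercept and coefficient

odds_group_1 = odds(probability(b0, b1, 1))
odds_group_0 = odds(probability(b0, b1, 0))
odds_ratio = odds_group_1 / odds_group_0

# The ratio of the two odds matches exp(b1) up to rounding error.
print(odds_ratio, math.exp(b1))
```

The intercept cancels in the ratio, which is why the odds ratio depends only on the coefficient of the dichotomous variable.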

The model has an equivalent formulation

:p_i = \frac{1}{1+e^{-(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i})}}.

This functional form is commonly called a single-layer perceptron or single-layer artificial neural network. A single-layer neural network computes a continuous output instead of a step function. The derivative of p_i with respect to X = (x_1, ..., x_k) is computed from the general form:

: y = \frac{1}{1+e^{-f(X)}}

where f(X) is an analytic function in X. With this choice, the single-layer network is identical to the logistic regression model. This function has a continuous derivative, which allows it to be used in backpropagation. This function is also preferred because its derivative is easily calculated:

: y' = y(1-y)\frac{\mathrm{d}f}{\mathrm{d}X}
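The derivative formula can be checked against a finite-difference approximation. The sketch below assumes the simplest case, a scalar input x with f(x) = b0 + b1 x, so that df/dx = b1; the numerical values are arbitrary.

```python
import math

def y_of(x, b0, b1):
    """Logistic output y = 1 / (1 + e^(-f(x))) with f(x) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

b0, b1, x = 0.3, -1.2, 0.8  # arbitrary illustrative values

# Analytic derivative: y' = y (1 - y) df/dx, and df/dx = b1 here.
y = y_of(x, b0, b1)
analytic = y * (1.0 - y) * b1

# Central finite difference as an independent check.
h = 1e-6
numeric = (y_of(x + h, b0, b1) - y_of(x - h, b0, b1)) / (2.0 * h)

print(analytic, numeric)
```

Because the derivative is expressed in terms of the output y itself, it can be computed without evaluating the exponential again.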

Extensions

Extensions of the model cope with multi-category dependent variables and ordinal dependent variables, such as polytomous regression. Multi-class classification by logistic regression is known as multinomial logit modeling. An extension of the logistic model to sets of interdependent variables is the conditional random field.
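As a rough sketch of the multinomial case, class probabilities can be obtained by applying the softmax function to one linear score per class. The coefficient values and function names below are hypothetical, and fixing one class's coefficients at zero is one common way to make the parameters identifiable.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(coefs, x):
    """coefs holds one (intercept, slope) pair per class; x is one predictor."""
    return softmax([b0 + b1 * x for b0, b1 in coefs])

# Three classes; the first is the reference class with coefficients zero.
coefs = [(0.0, 0.0), (0.5, 1.0), (-1.0, 2.0)]
probs = class_probabilities(coefs, 0.7)
print(probs, sum(probs))

# With only two classes the model reduces to ordinary logistic regression.
two_class = class_probabilities([(0.0, 0.0), (0.5, 1.0)], 0.7)
binary = 1.0 / (1.0 + math.exp(-(0.5 + 1.0 * 0.7)))
print(two_class[1], binary)
```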

See also

* Logistic function
* Sigmoid function
* Artificial neural network
* Data mining
* Linear discriminant analysis
* Perceptron
* Probit model
* Variable rules analysis
* Jarrow-Turnbull model

External links

* [http://statpages.org/logistic.html Web-based logistic regression calculator]
* [http://www.cs.utah.edu/~hal/megam A highly optimized Maximum Entropy modeling package]
* [http://mallet.cs.umass.edu/index.php/Main_Page MALLET Java library, includes a trainer for logistic models]

Wikimedia Foundation. 2010.
