Omitted-variable bias

In statistics, omitted-variable bias (OVB) occurs when a statistical model leaves out one or more relevant variables. The bias arises because the model attributes the effect of the missing variables to the included ones, and so over- or under-estimates their effects.

More specifically, OVB is the bias that appears in the parameter estimates of a regression analysis when the assumed specification is incorrect in that it omits an independent variable (possibly one that has not been identified or measured) that should be in the model.

Omitted-variable bias in linear regression

Two conditions must hold true for omitted-variable bias to exist in linear regression:

  • the omitted variable must be a determinant of the dependent variable (i.e., its true regression coefficient is not zero); and
  • the omitted variable must be correlated with one or more of the included independent variables.
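
The two conditions can be illustrated with a small simulation. This is a minimal sketch, not part of the original article: the function name short_regression_slope, the coefficient values, and the normal data-generating process are all assumptions chosen for illustration. It regresses y on x alone and reports the slope when both conditions hold, when the omitted variable has no effect on y, and when it is uncorrelated with x.

```python
import numpy as np

def short_regression_slope(delta, rho, n=200_000, beta=2.0, seed=0):
    """Simulate y = beta*x + delta*z + u and regress y on x only.

    rho is the population correlation between x and z (both unit variance).
    Returns the OLS slope on x when z is left out of the regression.
    (Hypothetical helper, for illustration only.)
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    # z = rho*x + sqrt(1 - rho^2)*noise has unit variance and corr(x, z) = rho
    z = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    u = rng.standard_normal(n)
    y = beta * x + delta * z + u
    # No-intercept OLS slope (all variables have mean zero): sum(x*y) / sum(x*x)
    return float(x @ y / (x @ x))

both = short_regression_slope(delta=3.0, rho=0.5)          # both conditions hold
no_effect = short_regression_slope(delta=0.0, rho=0.5)     # z does not determine y
uncorrelated = short_regression_slope(delta=3.0, rho=0.0)  # z unrelated to x

print(both, no_effect, uncorrelated)
```

Under these assumptions the first slope should be close to 2 + 3 × 0.5 = 3.5, while the other two should be close to the true value 2, since in each of them one of the two conditions fails.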

As an example, consider a linear model of the form

y_i = x_i \beta + z_i \delta + u_i,\qquad i = 1,\dots,n

where

  • x_i is a 1 × p row vector, and is part of the observed data;
  • β is a p × 1 column vector of unobservable parameters to be estimated;
  • z_i is a scalar and is part of the observed data;
  • δ is a scalar and is an unobservable parameter to be estimated;
  • the error terms u_i are unobservable random variables having expected value 0 (conditionally on x_i and z_i);
  • the dependent variables y_i are part of the observed data.

We let

 X = \left[ \begin{array}{c} x_1 \\  \vdots \\ x_n \end{array} \right] \in \mathbb{R}^{n\times p},

and

 Y = \left[ \begin{array}{c} y_1 \\  \vdots \\ y_n \end{array} \right],\quad  Z = \left[ \begin{array}{c} z_1 \\  \vdots \\ z_n \end{array} \right],\quad  U = \left[ \begin{array}{c} u_1 \\  \vdots \\ u_n \end{array} \right] \in \mathbb{R}^{n\times 1}.

Then, through the usual least squares calculation, the estimated parameter vector \hat{\beta}, based only on the observed x-values and omitting the observed z-values, is given by:

\hat{\beta} = (X'X)^{-1}X'Y\,

(where the "prime" notation means the transpose of a matrix).

Substituting for Y based on the assumed linear model,


\begin{align}
\hat{\beta} & = (X'X)^{-1}X'(X\beta+Z\delta+U) \\
& =(X'X)^{-1}X'X\beta + (X'X)^{-1}X'Z\delta + (X'X)^{-1}X'U \\
& =\beta + (X'X)^{-1}X'Z\delta + (X'X)^{-1}X'U.
\end{align}
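
The three-term decomposition in the last line is an algebraic identity, so it holds exactly in any sample and can be checked numerically. The sketch below does so on simulated data; the sample size, the parameter values, and the helper name project are assumptions made for illustration, and in real data β, δ and U are of course not observed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.standard_normal((n, p))              # included regressors
Z = X[:, [0]] + rng.standard_normal((n, 1))  # omitted regressor, correlated with column 0 of X
beta = np.array([[1.0], [-2.0], [0.5]])      # assumed true coefficients on X
delta = 4.0                                  # assumed true coefficient on Z
U = rng.standard_normal((n, 1))              # error term
Y = X @ beta + Z * delta + U

def project(V):
    """Return (X'X)^{-1} X'V by solving the normal equations (hypothetical helper)."""
    return np.linalg.solve(X.T @ X, X.T @ V)

beta_hat = project(Y)            # estimate from the regression that omits Z
bias_term = project(Z) * delta   # (X'X)^{-1} X'Z delta
noise_term = project(U)          # (X'X)^{-1} X'U

# beta_hat = beta + bias_term + noise_term, up to floating-point error
print(np.allclose(beta_hat, beta + bias_term + noise_term))
```

Solving the normal equations with np.linalg.solve avoids forming the inverse of X'X explicitly, which is a common numerical choice rather than anything required by the derivation.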

On taking expectations conditional on X and Z, the contribution of the final term, (X'X)^{-1}X'U, is zero; this follows from the assumption that U has zero expectation conditional on X and Z. The remaining terms give:

\begin{align}
E[ \hat{\beta} | X, Z ] & = \beta + (X'X)^{-1}X'Z\delta \\
& = \beta + \text{bias}.
\end{align}

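The second term above is the omitted-variable bias in this case. The vector (X'X)^{-1}X'Z contains the coefficients from a least squares regression of z_i on x_i, so the bias is equal to the portion of z_i which is "explained" by x_i, weighted by δ. It vanishes if δ = 0 or if X'Z = 0, which corresponds to the two conditions stated above.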
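
The expectation result can be checked by Monte Carlo: hold X and Z fixed, redraw the errors many times, and compare the average of the estimates with β plus the bias term. This is a sketch under assumed values (sample size, coefficients, number of replications, standard normal errors), not a definitive procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept and one regressor
Z = 0.8 * X[:, 1] + 0.6 * rng.standard_normal(n)           # omitted variable, correlated with X
beta = np.array([1.0, 2.0])   # assumed true coefficients on X
delta = 3.0                   # assumed true coefficient on Z

# (X'X)^{-1} X', shape (2, n); X and Z stay fixed across replications
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)
predicted_bias = (XtX_inv_Xt @ Z) * delta

reps = 5000
U = rng.standard_normal((reps, n))   # one row of errors per replication
Y = X @ beta + Z * delta + U         # shape (reps, n) by broadcasting
beta_hats = Y @ XtX_inv_Xt.T         # each row is the estimate that omits Z
mean_beta_hat = beta_hats.mean(axis=0)

print("average estimate minus beta:", mean_beta_hat - beta)
print("bias from the formula:      ", predicted_bias)
```

The two printed vectors should agree up to simulation noise, which shrinks as the number of replications grows.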

Effects on ordinary least squares

The Gauss–Markov theorem states that, in regression models which fulfill the classical linear regression model assumptions, the ordinary least squares (OLS) estimator is the best linear unbiased estimator. The relevant assumption of the classical linear regression model here is that the error term is uncorrelated with the regressors.

The presence of omitted-variable bias violates this particular assumption: when the variable is left out, its contribution is absorbed into the error term, which is then correlated with the included regressors. The violation causes the OLS estimator to be biased and inconsistent. The direction of the bias depends on the sign of the omitted variable's coefficient as well as on the covariance between the regressors and the omitted variable. With a single included regressor, a positive coefficient on the omitted variable together with a positive covariance leads the OLS estimator to overestimate the true value of the parameter. This effect can be seen by taking the expectation of the estimator, as shown in the previous section.
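
The direction of the bias is easiest to see in the special case of a single included regressor. The following is a worked special case of the bias term from the previous section, written under the assumption that p = 1, so that x_i is a scalar and the sum of x_i^2 is positive.

```latex
% Special case p = 1: x_i is a scalar and X is an n-by-1 column.
\begin{align}
(X'X)^{-1}X'Z\delta & = \frac{\sum_{i=1}^{n} x_i z_i}{\sum_{i=1}^{n} x_i^2}\,\delta, \\
\operatorname{sign}(\text{bias}) & = \operatorname{sign}(\delta)\cdot\operatorname{sign}\Bigl(\sum_{i=1}^{n} x_i z_i\Bigr).
\end{align}
```

When the variables are centered, the sum of x_i z_i is n times their sample covariance. The bias is therefore upward when δ and the covariance have the same sign, and downward when their signs differ.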

Wikimedia Foundation. 2010.
