General linear model
The general linear model or multivariate regression model is a statistical linear model. It may be written as^{[1]}

Y = XB + U,
where Y is a matrix with series of multivariate measurements (each column being a set of measurements on one of the dependent variables), X is a matrix of observations on independent variables that might be a design matrix (each column being a set of observations on one of the independent variables), B is a matrix containing parameters that are usually to be estimated, and U is a matrix containing errors (noise). The errors are usually assumed to be uncorrelated across measurements and to follow a multivariate normal distribution. If the errors do not follow a multivariate normal distribution, generalized linear models may be used to relax the assumptions about Y and U.
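Under these assumptions, B can be estimated by ordinary least squares. The following is a minimal sketch with NumPy on simulated data; the dimensions and variable names (such as B_true) are illustrative, not part of the model's definition:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100                                                      # observations
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # n x 3 design matrix

# True parameter matrix B: 3 predictors (incl. intercept) x 2 dependent variables.
B_true = np.array([[1.0, -2.0],
                   [0.5,  3.0],
                   [-1.5, 0.0]])

U = rng.normal(scale=0.1, size=(n, 2))   # error matrix (noise)
Y = X @ B_true + U                       # the model Y = XB + U

# Least-squares estimate of B: minimizes the Frobenius norm ||Y - XB||.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.round(B_hat, 2))                # close to B_true for small noise
```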
The general linear model incorporates a number of different statistical models: ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, t-test and F-test. The general linear model is a generalization of multiple linear regression to the case of more than one dependent variable. If Y, B, and U were column vectors, the matrix equation above would represent multiple linear regression.
Hypothesis tests with the general linear model can be made in two ways: multivariate or as several independent univariate tests. In multivariate tests the columns of Y are tested together, whereas in univariate tests the columns of Y are tested independently, i.e., as multiple univariate tests with the same design matrix.
Comparison to multiple linear regression
Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is

Y_{i} = β_{0} + β_{1}X_{i1} + β_{2}X_{i2} + … + β_{p}X_{ip} + ε_{i}

for each observation i = 1, ..., n.
In the formula above we consider n observations of one dependent variable and p independent variables. Thus, Y_{i} is the i^{th} observation of the dependent variable, and X_{ij} is the i^{th} observation of the j^{th} independent variable, j = 1, 2, ..., p. The values β_{j} represent parameters to be estimated, and ε_{i} is the i^{th} independent, identically distributed normal error.
In the more general multivariate linear regression, there is one equation of the above form for each of m > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:

Y_{ij} = β_{0j} + β_{1j}X_{i1} + β_{2j}X_{i2} + … + β_{pj}X_{ip} + ε_{ij}

for all observations indexed as i = 1, ..., n and for all dependent variables indexed as j = 1, ..., m.
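Because every equation shares the same design matrix, the simultaneous multivariate fit yields exactly the same coefficient estimates as m separate univariate regressions, one per dependent variable. A NumPy sketch on simulated data (all dimensions illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, m = 50, 2, 3                        # observations, predictors, dependent variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
B_true = rng.normal(size=(p + 1, m))
Y = X @ B_true + rng.normal(scale=0.2, size=(n, m))

# One simultaneous multivariate fit ...
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# ... versus m separate univariate regressions, one per column of Y.
B_sep = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)

# Least squares is column-separable, so the two estimates coincide.
print(np.allclose(B_joint, B_sep))
```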
Comparison to generalized linear model
The general linear model (GLM)^{[2]}^{[3]} and the generalized linear model (GLiM)^{[4]}^{[5]} are two commonly used families of statistical methods to relate some number of continuous and/or categorical predictors to a single outcome variable.
The main difference between the two approaches is that the GLM strictly assumes that the residuals will follow a conditionally normal distribution,^{[3]} while the GLiM loosens this assumption and allows for a variety of other distributions from the exponential family for the residuals.^{[4]} Of note, the GLM is a special case of the GLiM in which the distribution of the residuals follows a conditionally normal distribution.
The distribution of the residuals largely depends on the type and distribution of the outcome variable; different types of outcome variables lead to the variety of models within the GLiM family. Commonly used models in the GLiM family include binary logistic regression^{[6]} for binary or dichotomous outcomes, Poisson regression^{[7]} for count outcomes, and linear regression for continuous, normally distributed outcomes. This means that GLiM may be spoken of as a general family of statistical models or as specific models for specific outcome types.
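As an illustration of how a GLiM is fitted, here is a sketch of binary logistic regression estimated by iteratively reweighted least squares (IRLS), the standard Newton-type algorithm for this model family. The data are simulated and all names (such as beta_true) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated binary outcome: Pr(y = 1) = inverse-logit(X @ beta_true).
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 2.0])
p_true = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, p_true)

# IRLS: each Newton step is a weighted least-squares solve.
beta = np.zeros(2)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted mean (inverse logit link)
    w = mu * (1.0 - mu)                       # IRLS weights (binomial variance)
    z = X @ beta + (y - mu) / w               # working response
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))

print(np.round(beta, 2))                      # approximates beta_true
```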
| | General linear model | Generalized linear model |
|---|---|---|
| Typical estimation method | Least squares, best linear unbiased prediction | Maximum likelihood or Bayesian |
| Examples | ANOVA, ANCOVA, linear regression | linear regression, logistic regression, Poisson regression, gamma regression,^{[8]} general linear model |
| Extensions and related methods | MANOVA, MANCOVA, linear mixed model | generalized linear mixed model (GLMM), generalized estimating equations (GEE) |
| R package and function | lm() in stats package (base R) | glm() in stats package (base R) |
| Matlab function | mvregress() | glmfit() |
| SAS procedures | PROC GLM, PROC REG | PROC GENMOD, PROC LOGISTIC (for binary & ordered or unordered categorical outcomes) |
| Stata command | regress | glm |
| SPSS command | regression, glm | genlin, logistic |
| Wolfram Language & Mathematica function | LinearModelFit[]^{[9]} | GeneralizedLinearModelFit[]^{[10]} |
| EViews command | ls^{[11]} | glm^{[12]} |
Applications
An application of the general linear model appears in the analysis of multiple brain scans in scientific experiments, where Y contains data from brain scanners and X contains experimental design variables and confounds. It is usually tested in a univariate way (usually referred to as mass-univariate in this setting) and is often referred to as statistical parametric mapping.^{[13]}
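The mass-univariate approach amounts to fitting one univariate model per voxel, all sharing the same design matrix, which a single least-squares solve accomplishes at once. A NumPy sketch on simulated data (the dimensions, the toy task regressor, and the per-voxel t statistics are illustrative, not a full statistical parametric mapping pipeline):

```python
import numpy as np

rng = np.random.default_rng(3)

# One design matrix X shared across many voxels; Y holds one column per voxel.
n_scans, n_voxels = 120, 10_000
X = np.column_stack([np.ones(n_scans),               # intercept
                     rng.normal(size=n_scans)])      # toy task regressor

B_true = np.vstack([rng.normal(size=n_voxels),       # per-voxel baseline
                    0.2 * rng.normal(size=n_voxels)])  # per-voxel task effect
Y = X @ B_true + rng.normal(size=(n_scans, n_voxels))

# A single least-squares solve fits all 10,000 univariate models at once.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Per-voxel t statistics for the task regressor (second row of B_hat).
resid = Y - X @ B_hat
sigma2 = (resid ** 2).sum(axis=0) / (n_scans - 2)
XtX_inv = np.linalg.inv(X.T @ X)
t_task = B_hat[1] / np.sqrt(sigma2 * XtX_inv[1, 1])

print(B_hat.shape, t_task.shape)
```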
Notes
- ^ K. V. Mardia, J. T. Kent and J. M. Bibby (1979). Multivariate Analysis. Academic Press. ISBN 0-12-471252-5.
- ^ Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996). Applied linear statistical models (Vol. 4, p. 318). Chicago: Irwin.
- ^ ^{a} ^{b} Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences.
- ^ ^{a} ^{b} McCullagh, P.; Nelder, J. A. (1989), "An outline of generalized linear models", Generalized Linear Models, Springer US, pp. 21–47, doi:10.1007/978-1-4899-3242-6_2, ISBN 9780412317606
- ^ Fox, J. (2015). Applied regression analysis and generalized linear models. Sage Publications.
- ^ Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (Vol. 398). John Wiley & Sons.
- ^ Gardner, W., Mulvey, E. P., & Shaw, E. C. (1995). Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological bulletin, 118(3), 392.
- ^ McCullagh, Peter; Nelder, John (1989). Generalized Linear Models, Second Edition. Boca Raton: Chapman and Hall/CRC. ISBN 978-0-412-31760-6.
- ^ LinearModelFit, Wolfram Language Documentation Center.
- ^ GeneralizedLinearModelFit, Wolfram Language Documentation Center.
- ^ ls, EViews Help.
- ^ glm, EViews Help.
- ^ K.J. Friston; A.P. Holmes; K.J. Worsley; J.-B. Poline; C.D. Frith; R.S.J. Frackowiak (1995). "Statistical Parametric Maps in functional imaging: A general linear approach". Human Brain Mapping. 2 (4): 189–210. doi:10.1002/hbm.460020402.
References
- Christensen, Ronald (2002). Plane Answers to Complex Questions: The Theory of Linear Models (Third ed.). New York: Springer. ISBN 0-387-95361-2.
- Wichura, Michael J. (2006). The coordinate-free approach to linear models. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press. pp. xiv+199. ISBN 978-0-521-86842-6. MR 2283455.
- Rawlings, John O.; Pantula, Sastry G.; Dickey, David A., eds. (1998). Applied Regression Analysis. Springer Texts in Statistics. doi:10.1007/b98890. ISBN 0-387-98454-2.