scikits.statsmodels.glm.GLM

class scikits.statsmodels.glm.GLM(endog, exog, family=<scikits.statsmodels.family.family.Gaussian object at 0x01A6DB30>)

Generalized Linear Models class

GLM inherits from statsmodels.LikelihoodModel

Parameters:

endog : array-like

1d array of the endogenous response variable. For Binomial family models, this array can be 1d or 2d.

exog : array-like

n x p design / exogenous data array

family : family class instance

The default is Gaussian. To specify the binomial distribution, use family = sm.family.Binomial(). Each family can take a link instance as an argument. See statsmodels.family.family for more information.

See also

statsmodels.family.

Notes

Only the following combinations make sense for family and link

             + ident  log  logit  probit  cloglog  pow  opow  nbinom  loglog  logc
Gaussian     |   x     x                            x
inv Gaussian |   x     x                            x
binomial     |   x     x     x      x        x      x    x              x      x
Poisson      |   x     x                            x
neg binomial |   x     x                            x           x
gamma        |   x     x                            x

Not all of these link functions are currently available.
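
As a rough illustration of what a link function is, the sketch below defines a few of the links named in the table together with their inverses, in plain Python. The LINKS dictionary and its layout are hypothetical and are not part of statsmodels.family; the formulas are the standard textbook definitions.

```python
import math

# Each entry maps a link name from the table to (link, inverse link).
# These are illustrative stand-ins, not the statsmodels link classes.
LINKS = {
    "ident": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
}

mu = 0.3
for name, (link, inverse) in LINKS.items():
    eta = link(mu)
    # Applying the inverse link to eta recovers the mean.
    print(name, round(eta, 4), round(inverse(eta), 4))
```

The link maps the mean mu to the linear predictor eta, and the inverse link maps eta back to mu, which is how the mu attribute described below is obtained.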

endog and exog are stored as references: if the data passed in are already arrays and those arrays are later changed, endog and exog will change as well.
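
The reference behavior can be reproduced with NumPy alone. The sketch below assumes the model stores its data through something like np.asarray, which is an assumption about the implementation rather than something stated on this page; the variable names are made up for the illustration.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])

# np.asarray returns the same object when given an existing ndarray,
# so no copy is made (assumed to mirror how endog/exog are stored).
stored = np.asarray(data)
data[0] = 99.0            # in-place change to the caller's array
print(stored[0])          # the stored reference sees the change

# An explicit copy is independent of later changes to the original.
safe = np.array(data, copy=True)
data[1] = -1.0
print(safe[1])
```

Passing a copy of the data is one way to keep a model unaffected by later in-place changes to the original arrays.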

Attributes

df_model : float
Model degrees of freedom is equal to p - 1, where p is the number of regressors. Note that the intercept is not reported as a degree of freedom.
df_resid : float
Residual degrees of freedom is equal to the number of observations n minus the number of regressors p.
endog : array
See above. Note that endog is a reference to the data so that if data is already an array and it is changed, then endog changes as well.
exog : array
See above. Note that exog is a reference to the data so that if data is already an array and it is changed, then exog changes as well.
history : dict
Contains information about the iterations. Its keys are fittedvalues, deviance, and params.
iteration : int
The number of iterations that fit has run. Initialized at 0.
family : family class instance
A pointer to the distribution family of the model.
mu : array
The mean response of the transformed variable. mu is the value of the inverse of the link function at eta, where eta is the linear predicted value of the WLS fit of the transformed variable. mu is only available after fit is called. See statsmodels.family.family.fitted of the distribution family for more information.
normalized_cov_params : array
The p x p normalized covariance of the design / exogenous data. This is approximately equal to (X.T X)^(-1)
pinv_wexog : array
The pseudoinverse of the design / exogenous data array. Note that GLM has no whiten method, so this is just the pseudo inverse of the design. The pseudoinverse is approximately equal to (X.T X)^(-1)X.T
scale : float
The estimate of the scale / dispersion of the model fit. Only available after fit is called. See GLM.fit and GLM.estimate_scale for more information.
scaletype : str
The scaling used for fitting the model. This is only available after fit is called. The default is None. See GLM.fit for more information.
weights : array
The value of the weights after the last iteration of fit. Only available after fit is called. See statsmodels.family.family for the specific distribution weighting functions.
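
The relationships stated for df_model, df_resid, pinv_wexog and normalized_cov_params can be checked numerically. This is a minimal NumPy sketch on a made-up design matrix, not output from GLM itself; here p counts the intercept column, and the identities hold exactly only when the design has full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
# Made-up design: an intercept column plus two random regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Pseudoinverse of the design; equals (X.T X)^(-1) X.T for full column rank.
pinv_wexog = np.linalg.pinv(X)

# pinv(X) pinv(X).T reduces to (X.T X)^(-1).
normalized_cov_params = pinv_wexog @ pinv_wexog.T

df_model = p - 1   # the intercept is not counted
df_resid = n - p   # observations minus regressors

print(df_model, df_resid)
print(np.allclose(normalized_cov_params, np.linalg.inv(X.T @ X)))
```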

Examples

>>> import scikits.statsmodels as sm
>>> data = sm.datasets.scotland.Load()
>>> data.exog = sm.add_constant(data.exog)

Instantiate a gamma family model with the default link function.

>>> gamma_model = sm.GLM(data.endog, data.exog,
...                      family=sm.family.Gamma())
>>> gamma_results = gamma_model.fit()
>>> gamma_results.params
array([  4.96176830e-05,   2.03442259e-03,  -7.18142874e-05,
         1.11852013e-04,  -1.46751504e-07,  -5.18683112e-04,
        -2.42717498e-06,  -1.77652703e-02])
>>> gamma_results.scale
0.0035842831734919055
>>> gamma_results.deviance
0.087388516416999198
>>> gamma_results.pearsonX2
0.086022796163805704
>>> gamma_results.llf
-83.017202161073527
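
The reported scale appears consistent with the Pearson chi-squared statistic divided by the residual degrees of freedom. The check below is a small arithmetic sketch that relies on one assumption not stated in the example: that the Scotland data has 32 observations. The parameter count of 8 is read off the params array above.

```python
# Values copied from the example output above.
pearson_x2 = 0.086022796163805704
reported_scale = 0.0035842831734919055

# Assumption (not stated in the example): 32 observations.
# The 8 columns of exog match the 8 fitted parameters shown.
n_obs = 32
n_params = 8
df_resid = n_obs - n_params

scale = pearson_x2 / df_resid
print(df_resid, scale)
print(abs(scale - reported_scale) < 1e-12)
```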

Attributes

df_model               float                  p - 1, where p is the number of regressors including the intercept.
df_resid               float                  The number of observations n minus the number of regressors p.
endog                  array                  See Parameters.
exog                   array                  See Parameters.
history                dict                   Contains information about the iterations.
iteration              int                    The number of iterations that fit has run. Initialized at 0.
family                 family class instance  A pointer to the distribution family of the model.
mu                     array                  The estimated mean response of the transformed variable.
normalized_cov_params  array                  p x p normalized covariance of the design / exogenous data.
pinv_wexog             array                  For GLM this is just the pseudo inverse of the original design.
scale                  float                  The estimate of the scale / dispersion. Available after fit is called.
scaletype              str                    The scaling used for fitting the model. Available after fit is called.
weights                array                  The value of the weights after the last iteration of fit.

Methods

estimate_scale(mu)                                 Estimates the dispersion/scale.
fit([maxiter, method, tol, data_weights, scale])   Fits a generalized linear model for a given family.
information(params)                                Fisher information matrix.
initialize()                                       Initialize a generalized linear model.
loglike(*args)                                     Loglikelihood function.
newton(params)
predict(exog[, params])                            Return linear predicted values for a design matrix.
score(params)                                      Score matrix.
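
The iteration, history and weights attributes, together with the description of mu as coming from a WLS fit of the transformed variable, suggest an iteratively reweighted least squares (IRLS) style of fitting. The following is a minimal NumPy sketch of IRLS for a Poisson model with a log link. It is not the code used by GLM.fit; the function name irls_poisson, its arguments, the starting values, the stopping rule and the toy data are all assumptions made for illustration.

```python
import numpy as np

def irls_poisson(endog, exog, maxiter=100, tol=1e-8):
    """Illustrative IRLS for a Poisson GLM with log link (not GLM.fit)."""
    y = np.asarray(endog, dtype=float)
    X = np.asarray(exog, dtype=float)
    mu = (y + y.mean()) / 2.0            # strictly positive starting means
    eta = np.log(mu)                     # linear predictor under the log link
    history = {"params": [], "deviance": []}
    prev_dev = np.inf
    for iteration in range(1, maxiter + 1):
        weights = mu                     # IRLS weights for Poisson with log link
        z = eta + (y - mu) / mu          # working (transformed) response
        sw = np.sqrt(weights)
        # Weighted least squares of z on X via ordinary lstsq on scaled data.
        params = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)[0]
        eta = X @ params
        mu = np.exp(eta)                 # inverse link applied to eta
        # Poisson deviance, taking y * log(y / mu) as 0 when y == 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            term = np.where(y > 0, y * np.log(y / mu), 0.0)
        dev = 2.0 * np.sum(term - (y - mu))
        history["params"].append(params)
        history["deviance"].append(dev)
        if abs(dev - prev_dev) < tol:    # stop when the deviance settles
            break
        prev_dev = dev
    return params, mu, weights, iteration, history

# Toy data, made up for this sketch.
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = np.array([1, 0, 2, 3, 2, 5, 4, 8, 9, 12], dtype=float)
params, mu, weights, n_iter, history = irls_poisson(y, X)
print(n_iter, params)
```

At convergence the score X.T (y - mu) is close to zero, which is the first-order condition for the maximum likelihood estimate under the log link.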
