
Poisson regression


In trying to explain generalized linear models, I often say something like: GLMs are very similar to linear models but with different domains for the target y, e.g. positive numbers, outcomes in {0,1}, non-negative integers, etc. This explanation bypasses the more interesting point, though: the optimization problem for fitting the coefficients is entirely different once the link function is applied.
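Writing it out for the Poisson case with a log link makes the contrast explicit. A linear model on log counts minimizes squared error on the log scale, while the Poisson GLM maximizes the Poisson log-likelihood:

\hat{\beta}_{lm} = \arg\min_{\beta} \sum_i \left( \log y_i - \beta_0 - \beta_1 x_i \right)^2

\hat{\beta}_{glm} = \arg\max_{\beta} \sum_i \left( y_i (\beta_0 + \beta_1 x_i) - e^{\beta_0 + \beta_1 x_i} \right)

(the log y_i! term is dropped since it does not involve beta). Both put a line through x versus the log of the expected count, but the first measures misfit symmetrically on the log scale, while the second weights each point's pull by the raw counts.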

[Figure: lm_vs_glm — the three panels produced by the code below, comparing the lm fit (red) and the glm fit (blue).]

This can be seen by comparing the coefficients from a linear regression of log counts to those from a Poisson regression. In some cases the fitted lines are quite similar; however, they diverge once you introduce outliers. A casual explanation is that the Poisson likelihood is thrown off more by high counts than by low counts: the high count pulls up the expected value for x=2 in the second plot, but the low count does not substantially pull down the expected value for x=3 in the third plot. A quick numeric check after the code makes this concrete.

# simulate counts at two values of x, then compare a linear model
# on log counts to a Poisson GLM with a log link
set.seed(1) # fix the seed so the simulation is reproducible
n <- 20
x <- rep(c(2,3),each=n/2)
y <- rpois(n,lambda=exp(x))

# first panel: no outliers, the two fits nearly coincide
lmfit <- lm(log(y) ~ x)
glmfit <- glm(y ~ x, family="poisson")
par(mfrow=c(1,3))
xlim <- c(1.5,3.5)
plot(x,log(y),xlim=xlim)
abline(coef(lmfit),col="red")
abline(coef(glmfit),col="blue") # glm coefficients are already on the log scale
legend("topleft",c("lm","glm"),col=c("red","blue"),lty=1)

# second panel: a high-count outlier at x=2 pulls the glm fit up
y[1] <- 50
lmfit <- lm(log(y) ~ x)
glmfit <- glm(y ~ x, family="poisson")
plot(x,log(y),xlim=xlim)
abline(coef(lmfit),col="red")
abline(coef(glmfit),col="blue")

# third panel: fresh simulation, then a low-count outlier at x=3,
# which barely moves the glm fit
y <- rpois(n,lambda=exp(x))
y[n] <- 2
lmfit <- lm(log(y) ~ x)
glmfit <- glm(y ~ x, family="poisson")
plot(x,log(y),xlim=xlim)
abline(coef(lmfit),col="red")
abline(coef(glmfit),col="blue")
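
And here is the quick numeric check of the asymmetry claim, a small sketch using the true means from the simulation rather than the fitted ones: compare how expensive each outlier is for the Poisson likelihood versus how extreme it looks on the log scale that lm sees.

mu2 <- exp(2) # true mean at x=2, about 7.4
mu3 <- exp(3) # true mean at x=3, about 20.1

# log-likelihood cost of each outlier under the true means
dpois(50, mu2, log=TRUE) # about -56: very expensive for the likelihood
dpois(2, mu3, log=TRUE)  # about -15: much cheaper

# whereas on the log scale (what lm sees) the two are comparably extreme
log(50) - 2 # about  1.9
log(2) - 3  # about -2.3

The high count costs the Poisson likelihood roughly four times what the low count does, while on the log scale the two outliers are about equally extreme. That is why lm moves similarly in both panels, but glm only moves appreciably for the high count.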

