Extract interaction terms from regression estimates


This is a simple question, but I couldn't find a clear and compelling answer anywhere. If I have a regression model with one or more interaction terms, like:

```r
mod1 <- lm(mpg ~ factor(cyl) * factor(am), data = mtcars)
coef(summary(mod1))
##                           Estimate Std. Error   t value              Pr(>|t|)
## (Intercept)              22.900000   1.750674 13.080673 0.0000000000006057324
## factor(cyl)6             -3.775000   2.315925 -1.630018 0.1151545663620229670
## factor(cyl)8             -7.850000   1.957314 -4.010599 0.0004547582690011110
## factor(am)1               5.175000   2.052848  2.520888 0.0181760532676256310
## factor(cyl)6:factor(am)1 -3.733333   3.094784 -1.206331 0.2385525615801434851
## factor(cyl)8:factor(am)1 -4.825000   3.094784 -1.559075 0.1310692573417492068
```

What is a surefire way of identifying which coefficient estimates belong to interaction terms? The obvious way is to grep() for the colon in the term names. But suppose for a moment that isn't possible, for example because the factor labels themselves contain colons:

```r
# caution: unique(mtcars$cyl) is c(6, 4, 8), so e.g. level 6 gets the label "Cyl: 4";
# harmless here, since the point is only that the labels contain ":"
mtcars$cyl2 <- factor(mtcars$cyl, levels = c(4, 6, 8),
                      labels = paste("Cyl:", unique(mtcars$cyl)))
mod2 <- lm(mpg ~ cyl2 * factor(am), data = mtcars)
coef(summary(mod2))
##                         Estimate Std. Error   t value              Pr(>|t|)
## (Intercept)            22.900000   1.750674 13.080673 0.0000000000006057324
## cyl2Cyl: 4             -3.775000   2.315925 -1.630018 0.1151545663620229670
## cyl2Cyl: 8             -7.850000   1.957314 -4.010599 0.0004547582690011110
## factor(am)1             5.175000   2.052848  2.520888 0.0181760532676256310
## cyl2Cyl: 4:factor(am)1 -3.733333   3.094784 -1.206331 0.2385525615801434851
## cyl2Cyl: 8:factor(am)1 -4.825000   3.094784 -1.559075 0.1310692573417492068
```
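To make the problem concrete, here is a small sketch that applies the colon-grep approach to both models. The helper `flag_by_colon()` is hypothetical, written only for illustration. For `mod1` it returns just the two interaction rows, but for `mod2` it also flags the main effects `cyl2Cyl: 4` and `cyl2Cyl: 8`, because their level labels contain a colon.

```r
# Same setup as above: level labels that themselves contain ":"
mtcars$cyl2 <- factor(mtcars$cyl, levels = c(4, 6, 8),
                      labels = paste("Cyl:", unique(mtcars$cyl)))
mod1 <- lm(mpg ~ factor(cyl) * factor(am), data = mtcars)
mod2 <- lm(mpg ~ cyl2 * factor(am), data = mtcars)

# Hypothetical helper: naive approach, flag any coefficient whose name contains ":"
flag_by_colon <- function(mod) {
  est <- coef(summary(mod))
  rownames(est)[grepl(":", rownames(est), fixed = TRUE)]
}

flag_by_colon(mod1)  # only the two interaction terms
flag_by_colon(mod2)  # also (wrongly) includes the main effects "cyl2Cyl: 4" and "cyl2Cyl: 8"
```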

I thought the terms() object might be useful, but it isn't. I could probably also make some assumption about the ordering of the terms to get the intended result:

```r
coef(summary(mod2))[5:6, ]
##                         Estimate Std. Error   t value  Pr(>|t|)
## cyl2Cyl: 4:factor(am)1 -3.733333   3.094784 -1.206331 0.2385526
## cyl2Cyl: 8:factor(am)1 -4.825000   3.094784 -1.559075 0.1310693
```

but I don't know how to do that in a general way.

What can be done?


This seems a little convoluted, but could we just enumerate all the main-effect columns and then take the set difference with the full set of model-matrix columns?

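```r
mod2 <- lm(mpg ~ cyl2 * factor(am) + wt * disp, data = mtcars)
# labels of all main-effect (order 1) terms, taken from the terms object itself
variables <- attr(terms(mod2), "term.labels")[attr(terms(mod2), "order") == 1]
# column names for each factor's non-reference levels ([-1] assumes default treatment contrasts)
factors <- sapply(names(mod2$xlevels), function(x) paste0(x, mod2$xlevels[[x]])[-1])
# whatever columns are left over belong to interaction terms
setdiff(colnames(model.matrix(mod2)), c("(Intercept)", variables, unlist(factors)))
# [1] "cyl2Cyl: 4:factor(am)1" "cyl2Cyl: 8:factor(am)1" "wt:disp"
```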
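A different technique (not the set-difference one above) avoids column names altogether: `model.matrix()` attaches an `"assign"` attribute that maps each column to a term index (0 for the intercept), and `attr(terms(mod), "order")` gives the order of each term, so any column whose term has order greater than 1 is an interaction column. Below is a minimal sketch; `interaction_coefs()` is a hypothetical helper, and the last lines use its result to pull the matching rows out of the coefficient table.

```r
mtcars$cyl2 <- factor(mtcars$cyl, levels = c(4, 6, 8),
                      labels = paste("Cyl:", unique(mtcars$cyl)))
mod2 <- lm(mpg ~ cyl2 * factor(am) + wt * disp, data = mtcars)

# Hypothetical helper: find interaction columns via the "assign" attribute
interaction_coefs <- function(mod) {
  mm <- model.matrix(mod)
  # "assign" maps each model-matrix column to a term index (0 = intercept)
  asgn <- attr(mm, "assign")
  # "order" is each term's interaction order; prepend 0 for the intercept
  ord <- c(0L, attr(terms(mod), "order"))
  colnames(mm)[ord[asgn + 1L] > 1L]
}

int_names <- interaction_coefs(mod2)
int_names

# summary() drops aliased (NA) coefficients from its table, so intersect() first
est <- coef(summary(mod2))
est[intersect(int_names, rownames(est)), , drop = FALSE]
```

Because this relies on term membership rather than on how the columns are named, it should also cope with non-default contrasts such as `contr.sum`, whose column names do not end in the level labels and so would defeat the `paste0(...)[-1]` step.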
