
Given a data frame with factor columns, this function extracts the contrasts from each factor column and places them in new numeric columns. This is useful when you want to work with the numeric values of the contrasts directly. As a pedagogical use, it lets you show explicitly how factor variables are transformed into numeric values. As a practical use, a factor with n levels normally gets n-1 contrasts; if you don't want to use all of them, you can extract the ones you want and use only those in your model. This is sometimes done with polynomial contrasts when you don't want the higher-order polynomial terms.
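
To make the polynomial case concrete, here is a minimal base R sketch of the idea. It is not the package's implementation: the data frame `d`, the factor `dose`, and the response `y` are made up for illustration, and `model.matrix()` stands in for the extraction step. Only the linear and quadratic columns are kept.

```r
# Hypothetical data: a 4-level dose factor with 3 observations per level
d <- data.frame(
  dose = factor(rep(c("low", "mid", "high", "max"), each = 3),
                levels = c("low", "mid", "high", "max")),
  y = c(1.1, 0.9, 1.0, 2.1, 1.8, 2.0, 2.7, 3.1, 2.9, 3.2, 3.0, 3.3)
)

# Polynomial contrasts give n - 1 = 3 columns: linear, quadratic, cubic
contrasts(d$dose) <- contr.poly(4)
mm <- model.matrix(~ dose, data = d)

# Copy only the linear and quadratic columns into the data frame
d$dose_L <- mm[, "dose.L"]
d$dose_Q <- mm[, "dose.Q"]

# Fit with the two extracted columns, leaving out the cubic term
fit <- lm(y ~ dose_L + dose_Q, data = d)
names(coef(fit))
```

Because the extracted columns are plain numeric variables, the model treats them like any other predictor and no longer needs the original factor.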

Usage

decompose_contrasts(
  model_data,
  extract,
  remove_intercept = TRUE,
  remove_original = FALSE
)

Arguments

model_data

Dataframe with factor columns

extract

A one-sided formula denoting the factors to extract. Note this should ideally be what you would pass to your model fitting function, sans any non-factors.

remove_intercept

Logical, whether to remove the column corresponding to the intercept. Default TRUE, since that column is always just a column of 1s.

remove_original

Logical, whether to remove the original columns in the data frame after decomposing into separate columns. Default FALSE.
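
Why the intercept column carries no information can be checked with base R. This sketch makes no claim about how decompose_contrasts builds its columns; it only inspects the design matrix that `model.matrix()` produces for a made-up factor `g`.

```r
# Hypothetical three-level factor
g <- factor(c("a", "b", "c", "a", "b", "c"))
mm <- model.matrix(~ g)

# The intercept column is 1 for every row
all(mm[, "(Intercept)"] == 1)

# Dropping it leaves only the n - 1 contrast columns
contrast_cols <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
ncol(contrast_cols)
```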

Value

model_data but with new columns corresponding to the numeric coding of the given factor's contrasts

Details

An additional use for this function is to compute the contrasts for interaction terms in a model. In lm(y ~ A * B), where A and B are factors, the expanded form is lm(y ~ A + B + A:B) with an equation of \(y = \beta_Ax_A + \beta_Bx_B + \beta_{A:B}x_Ax_B\). The thing to note is that the coefficient for the interaction is multiplied by the product of \(x_A\) and \(x_B\). Let's call this product \(x_C\). For example, if one value of \(x_A\) is -1/3 and one value of \(x_B\) is 2/3, then the product \(x_C\) is -2/9. But if A and B each have 3 levels, then we get 4 columns for the main effects (2 per factor) and 4 more columns for the interaction terms (2 x 2). Precomputing the products manually can be a lot of tedious work, so we can instead include the interaction in the formula passed to extract (for example, ~ A * B) to compute everything at once.
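
The product relationship can be verified with a short base R sketch. Everything here is illustrative: the data frame `d` and its level names are made up, and the coding `contr.treatment(3) - 1/3` is used only because it yields the values 2/3 and -1/3 from the example above. It is not meant to reproduce what decompose_contrasts does internally.

```r
# Hypothetical data: all 9 combinations of two 3-level factors
d <- expand.grid(
  A = factor(c("a1", "a2", "a3")),
  B = factor(c("b1", "b2", "b3"))
)

# Coding with entries 2/3 and -1/3 (assumption made for this sketch)
contrasts(d$A) <- contr.treatment(3) - 1/3
contrasts(d$B) <- contr.treatment(3) - 1/3

# Intercept + 4 main effect columns + 4 interaction columns
mm <- model.matrix(~ A * B, data = d)
colnames(mm)

# Pick the row where A is a1 and B is b2
i <- which(d$A == "a1" & d$B == "b2")
x_A <- unname(mm[i, "A2"])     # -1/3
x_B <- unname(mm[i, "B2"])     #  2/3
x_C <- unname(mm[i, "A2:B2"])  # interaction column for the same row

# The interaction value equals the product of the two main effect values
all.equal(x_C, x_A * x_B)
```

Each interaction column is the elementwise product of one column from A and one column from B, which is why 2 columns per factor give 2 x 2 = 4 interaction columns.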

Examples


# Decompose contrasts for carb and gear columns into new columns, using
# the contrast labels used when setting the contrasts
mtcars |>
  set_contrasts(
    carb ~ scaled_sum_code,
    gear ~ contr.sum | c("4-mean", "5-mean")
  ) |>
  decompose_contrasts(~ carb + gear) |>
  str()
#> Converting to factors: carb gear
#> 'data.frame':	32 obs. of  18 variables:
#>  $ mpg       : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>  $ cyl       : num  6 6 4 6 8 6 8 4 4 6 ...
#>  $ disp      : num  160 160 108 258 360 ...
#>  $ hp        : num  110 110 93 110 175 105 245 62 95 123 ...
#>  $ drat      : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ wt        : num  2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec      : num  16.5 17 18.6 19.4 17 ...
#>  $ vs        : num  0 0 1 1 0 1 0 1 1 1 ...
#>  $ am        : num  1 1 1 0 0 0 0 0 0 0 ...
#>  $ gear      : Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
#>   ..- attr(*, "contrasts")= num [1:3, 1:2] -1 1 0 -1 0 1
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:3] "3" "4" "5"
#>   .. .. ..$ : chr [1:2] "4-mean" "5-mean"
#>  $ carb      : Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
#>   ..- attr(*, "contrasts")= num [1:6, 1:5] -0.167 0.833 -0.167 -0.167 -0.167 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:6] "1" "2" "3" "4" ...
#>   .. .. ..$ : chr [1:5] "2" "3" "4" "6" ...
#>  $ carb2     : num  -0.167 -0.167 -0.167 -0.167 0.833 ...
#>  $ carb3     : num  -0.167 -0.167 -0.167 -0.167 -0.167 ...
#>  $ carb4     : num  0.833 0.833 -0.167 -0.167 -0.167 ...
#>  $ carb6     : num  -0.167 -0.167 -0.167 -0.167 -0.167 ...
#>  $ carb8     : num  -0.167 -0.167 -0.167 -0.167 -0.167 ...
#>  $ gear4-mean: num  1 1 1 -1 -1 -1 -1 1 1 1 ...
#>  $ gear5-mean: num  0 0 0 -1 -1 -1 -1 0 0 0 ...

# Decompose an interaction term between the two factors
mtcars |>
  set_contrasts(
    carb ~ scaled_sum_code,
    gear ~ contr.sum | c("4-mean", "5-mean")
  ) |>
  decompose_contrasts(~ carb * gear) |>
  str()
#> Converting to factors: carb gear
#> 'data.frame':	32 obs. of  28 variables:
#>  $ mpg             : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>  $ cyl             : num  6 6 4 6 8 6 8 4 4 6 ...
#>  $ disp            : num  160 160 108 258 360 ...
#>  $ hp              : num  110 110 93 110 175 105 245 62 95 123 ...
#>  $ drat            : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ wt              : num  2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec            : num  16.5 17 18.6 19.4 17 ...
#>  $ vs              : num  0 0 1 1 0 1 0 1 1 1 ...
#>  $ am              : num  1 1 1 0 0 0 0 0 0 0 ...
#>  $ gear            : Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
#>   ..- attr(*, "contrasts")= num [1:3, 1:2] -1 1 0 -1 0 1
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:3] "3" "4" "5"
#>   .. .. ..$ : chr [1:2] "4-mean" "5-mean"
#>  $ carb            : Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
#>   ..- attr(*, "contrasts")= num [1:6, 1:5] -0.167 0.833 -0.167 -0.167 -0.167 ...
#>   .. ..- attr(*, "dimnames")=List of 2
#>   .. .. ..$ : chr [1:6] "1" "2" "3" "4" ...
#>   .. .. ..$ : chr [1:5] "2" "3" "4" "6" ...
#>  $ carb2           : num  -0.167 -0.167 -0.167 -0.167 0.833 ...
#>  $ carb3           : num  -0.167 -0.167 -0.167 -0.167 -0.167 ...
#>  $ carb4           : num  0.833 0.833 -0.167 -0.167 -0.167 ...
#>  $ carb6           : num  -0.167 -0.167 -0.167 -0.167 -0.167 ...
#>  $ carb8           : num  -0.167 -0.167 -0.167 -0.167 -0.167 ...
#>  $ gear4-mean      : num  1 1 1 -1 -1 -1 -1 1 1 1 ...
#>  $ gear5-mean      : num  0 0 0 -1 -1 -1 -1 0 0 0 ...
#>  $ carb2:gear4-mean: num  -0.167 -0.167 -0.167 0.167 -0.833 ...
#>  $ carb3:gear4-mean: num  -0.167 -0.167 -0.167 0.167 0.167 ...
#>  $ carb4:gear4-mean: num  0.833 0.833 -0.167 0.167 0.167 ...
#>  $ carb6:gear4-mean: num  -0.167 -0.167 -0.167 0.167 0.167 ...
#>  $ carb8:gear4-mean: num  -0.167 -0.167 -0.167 0.167 0.167 ...
#>  $ carb2:gear5-mean: num  0 0 0 0.167 -0.833 ...
#>  $ carb3:gear5-mean: num  0 0 0 0.167 0.167 ...
#>  $ carb4:gear5-mean: num  0 0 0 0.167 0.167 ...
#>  $ carb6:gear5-mean: num  0 0 0 0.167 0.167 ...
#>  $ carb8:gear5-mean: num  0 0 0 0.167 0.167 ...