Returns a named list of contrast matrices to use with modeling functions directly. See set_contrasts() for a function that sets contrasts directly on the data frame. See Details for syntax information.
Usage
enlist_contrasts(model_data, ..., verbose = getOption("contrastable.verbose"))
Arguments
- model_data
Data frame you intend on passing to your model
- ...
A series of two-sided formulas with the factor name on the left hand side and the desired contrast scheme on the right hand side. The reference level can be set with +, the intercept can be overwritten with *, comparison labels can be set using |, and trends for polynomial coding can be removed using -.
- verbose
Logical, whether messages should be printed. Defaults to getOption("contrastable.verbose").
Value
List of named contrast matrices. Internally, if called within set_contrasts(), will return a named list with contrasts equal to the list of named contrast matrices and data equal to the passed model_data with any factor coercions applied (so that set_contrasts() doesn't need to do it a second time).
Details
enlist_contrasts(), set_contrasts(), and glimpse_contrasts() use special syntax to set contrasts for multiple factors. The syntax consists of two-sided formulas with the desired factor column on the left hand side and the contrast specification on the right hand side. For example, varname ~ scaled_sum_code. Many contrasts support additional kinds of contrast manipulations using overloaded operators:
- + X: Set the reference level to the level named X. Only supported for schemes that have a single reference level, such as sum_code(), scaled_sum_code(), treatment_code(), stats::contr.treatment(), stats::contr.sum(), and stats::contr.SAS(). Ignored for schemes like helmert_code().
- * X: Overwrite the intercept to the mean of the level named X.
- - A:B: For polynomial coding schemes only, drop comparisons A through B.
- | c(...): Change the comparison labels for the contrast matrix to the character vector c(...) of length n-1. These labels will appear in the output/summary of a statistical model. Note that for brms::brm, instances of - (a minus sign) are replaced with M.
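As a brief sketch of the operator syntax described above (assuming the contrastable package is installed and attached; the label strings are arbitrary choices made for illustration), the following sets the reference level with + and relabels the comparisons with |:

```r
library(contrastable)

my_df <- mtcars
my_df$gear <- factor(my_df$gear)

# + 5 sets level "5" as the reference level;
# | c(...) replaces the default comparison labels
# (gear has 3 levels, so n - 1 = 2 labels are needed)
labelled <- enlist_contrasts(
  my_df,
  gear ~ scaled_sum_code + 5 | c("3vs5", "4vs5"),
  verbose = FALSE
)

# The supplied labels become the column names of the contrast matrix
colnames(labelled$gear)
```

The labels are written without a minus sign here so that they would carry over unchanged to brms::brm, per the note above.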
You can also specify multiple variables on the left hand side of a formula using tidyselect helpers. See examples for more information.
Typically, model functions like lm() have a contrasts argument where you can set the contrasts at model run time, rather than having to manually change the contrasts on the underlying factor columns in your data. This function returns such a named list of contrast matrices to pass to these functions. Note that this function should not be used within a modeling function call, e.g., lm(y ~ x, data = model_data, contrasts = enlist_contrasts(model_data, x ~ sum_code)). Often, this will call enlist_contrasts() twice, rather than just once.
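One way to follow this advice, sketched under the assumption that contrastable is attached and using mtcars purely as stand-in data, is to build the contrast list first and then hand the stored object to lm():

```r
library(contrastable)

my_df <- mtcars
my_df$gear <- factor(my_df$gear)

# Build the contrast list once, outside of the model call
gear_contrasts <- enlist_contrasts(my_df, gear ~ scaled_sum_code, verbose = FALSE)

# Pass the stored list to the contrasts argument of lm()
fit <- lm(mpg ~ gear, data = my_df, contrasts = gear_contrasts)

# The columns of scaled_sum_code sum to zero, so the intercept is the
# mean of the three per-gear group means rather than one level's mean
coef(fit)
```

Keeping the list in its own variable also makes it easy to print or reuse the same matrices across several models.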
For some model fitting functions, like brms::brm, there is no contrasts argument. For such cases, use set_contrasts() to set contrasts directly on the factors in a data frame.
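A minimal sketch of that alternative, again assuming contrastable is attached; lm() is used here only as a lightweight stand-in for a function such as brms::brm that lacks a contrasts argument:

```r
library(contrastable)

# set_contrasts() returns the data frame with the contrasts attached to the
# factor column (gear is coerced to a factor along the way)
my_df <- set_contrasts(mtcars, gear ~ scaled_sum_code, verbose = FALSE)

# Inspect the contrast matrix now stored on the column itself
contrasts(my_df$gear)

# No contrasts argument is needed; the model picks up the stored contrasts
fit <- lm(mpg ~ gear, data = my_df)
```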
One good way to use enlist_contrasts() is in conjunction with MASS::fractions() to create a list of matrices that can be printed to explicitly show the entire contrast matrices you're using for your models. This can be especially helpful for supplementary materials in an academic paper.
Sometimes when using orthogonal polynomial contrasts from stats::contr.poly(), people will drop higher order polynomials for parsimony. Note, however, that these do capture some amount of variation, so even though they're orthogonal contrasts, the lower order polynomials may have their estimates changed. Moreover, you cannot reduce a contrast matrix to a matrix smaller than size n x (n-1) in the data frame you pass to a model fitting function itself, as R will try to fill in the gaps with something else. If you want to drop contrasts, you'll need to use something like enlist_contrasts(df, x ~ contr.poly - 3:5) and pass this to the contrasts argument in the model fitting function.
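The workflow in the last sentence might look like the following sketch, which assumes contrastable is attached and treats carb from mtcars as a six-level factor purely for illustration:

```r
library(contrastable)

my_df <- mtcars
my_df$carb <- factor(my_df$carb)

# carb has 6 levels, so contr.poly yields 5 trends; - 3:5 drops trends
# 3 through 5, leaving only the first two (linear and quadratic) columns
poly_contrasts <- enlist_contrasts(my_df, carb ~ contr.poly - 3:5, verbose = FALSE)
dim(poly_contrasts$carb)

# Pass the reduced matrix through the contrasts argument rather than
# storing it on the factor column in the data frame
fit <- lm(mpg ~ carb, data = my_df, contrasts = poly_contrasts)
coef(fit)
```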
Examples
my_df <- mtcars
my_df$gear <- factor(my_df$gear)
my_df$carb <- factor(my_df$carb)
# Use formulas where left hand side is the factor column name
# and the right hand side is the contrast scheme you want to use
enlist_contrasts(
  my_df,
  gear ~ scaled_sum_code,
  carb ~ helmert_code,
  verbose = FALSE
)
#> $gear
#> 4 5
#> 3 -0.3333333 -0.3333333
#> 4 0.6666667 -0.3333333
#> 5 -0.3333333 0.6666667
#>
#> $carb
#> <2 <3 <4 <6 <8
#> 1 -0.5 -0.3333333 -0.25 -0.2 -0.1666667
#> 2 0.5 -0.3333333 -0.25 -0.2 -0.1666667
#> 3 0.0 0.6666667 -0.25 -0.2 -0.1666667
#> 4 0.0 0.0000000 0.75 -0.2 -0.1666667
#> 6 0.0 0.0000000 0.00 0.8 -0.1666667
#> 8 0.0 0.0000000 0.00 0.0 0.8333333
#>
# Add reference levels with +
enlist_contrasts(
  my_df,
  gear ~ scaled_sum_code + 5,
  carb ~ contr.sum + 6,
  verbose = FALSE
)
#> $gear
#> 3 4
#> 3 0.6666667 -0.3333333
#> 4 -0.3333333 0.6666667
#> 5 -0.3333333 -0.3333333
#>
#> $carb
#> 1 2 3 4 8
#> 1 1 0 0 0 0
#> 2 0 1 0 0 0
#> 3 0 0 1 0 0
#> 4 0 0 0 1 0
#> 6 -1 -1 -1 -1 -1
#> 8 0 0 0 0 1
#>
# Manually specifying matrix also works
enlist_contrasts(
  my_df,
  gear ~ matrix(c(1, -1, 0, 0, -1, 1), nrow = 3),
  carb ~ forward_difference_code,
  verbose = FALSE
)
#> $gear
#> 4 5
#> 3 -1 -1
#> 4 1 0
#> 5 0 1
#>
#> $carb
#> 1-2 2-3 3-4 4-6 6-8
#> 1 0.8333333 0.6666667 0.5 0.3333333 0.1666667
#> 2 -0.1666667 0.6666667 0.5 0.3333333 0.1666667
#> 3 -0.1666667 -0.3333333 0.5 0.3333333 0.1666667
#> 4 -0.1666667 -0.3333333 -0.5 0.3333333 0.1666667
#> 6 -0.1666667 -0.3333333 -0.5 -0.6666667 0.1666667
#> 8 -0.1666667 -0.3333333 -0.5 -0.6666667 -0.8333333
#>
# User matrices can be assigned to a variable first, but this may make the
# comparison labels confusing. You should rename them manually to something
# that makes sense. This will invoke use_contrast_matrix, so reference levels
# specified with + will be ignored.
my_gear_contrasts <- matrix(c(1, -1, 0, 0, -1, 1), nrow = 3)
colnames(my_gear_contrasts) <- c("CMP1", "CMP2")
enlist_contrasts(
  my_df,
  gear ~ my_gear_contrasts,
  carb ~ forward_difference_code,
  verbose = FALSE
)
#> $gear
#> 4 5
#> 3 -1 -1
#> 4 1 0
#> 5 0 1
#>
#> $carb
#> 1-2 2-3 3-4 4-6 6-8
#> 1 0.8333333 0.6666667 0.5 0.3333333 0.1666667
#> 2 -0.1666667 0.6666667 0.5 0.3333333 0.1666667
#> 3 -0.1666667 -0.3333333 0.5 0.3333333 0.1666667
#> 4 -0.1666667 -0.3333333 -0.5 0.3333333 0.1666667
#> 6 -0.1666667 -0.3333333 -0.5 -0.6666667 0.1666667
#> 8 -0.1666667 -0.3333333 -0.5 -0.6666667 -0.8333333
#>
# Will inform you if there are factors you didn't set
enlist_contrasts(my_df, gear ~ scaled_sum_code)
#> Expect contr.treatment or contr.poly for unset factors: carb
#> $gear
#> 4 5
#> 3 -0.3333333 -0.3333333
#> 4 0.6666667 -0.3333333
#> 5 -0.3333333 0.6666667
#>
# Use MASS::fractions to pretty print matrices for academic papers:
lapply(enlist_contrasts(my_df, gear ~ scaled_sum_code, carb ~ helmert_code),
       MASS::fractions)
#> $gear
#> 4 5
#> 3 -1/3 -1/3
#> 4 2/3 -1/3
#> 5 -1/3 2/3
#>
#> $carb
#> <2 <3 <4 <6 <8
#> 1 -1/2 -1/3 -1/4 -1/5 -1/6
#> 2 1/2 -1/3 -1/4 -1/5 -1/6
#> 3 0 2/3 -1/4 -1/5 -1/6
#> 4 0 0 3/4 -1/5 -1/6
#> 6 0 0 0 4/5 -1/6
#> 8 0 0 0 0 5/6
#>
# Use a list of formulas to use the same contrasts with different datasets
my_contrasts <- list(gear ~ scaled_sum_code, carb ~ helmert_code)
enlist_contrasts(my_df, my_contrasts)
#> $gear
#> 4 5
#> 3 -0.3333333 -0.3333333
#> 4 0.6666667 -0.3333333
#> 5 -0.3333333 0.6666667
#>
#> $carb
#> <2 <3 <4 <6 <8
#> 1 -0.5 -0.3333333 -0.25 -0.2 -0.1666667
#> 2 0.5 -0.3333333 -0.25 -0.2 -0.1666667
#> 3 0.0 0.6666667 -0.25 -0.2 -0.1666667
#> 4 0.0 0.0000000 0.75 -0.2 -0.1666667
#> 6 0.0 0.0000000 0.00 0.8 -0.1666667
#> 8 0.0 0.0000000 0.00 0.0 0.8333333
#>
enlist_contrasts(mtcars, my_contrasts)
#> Converting to factors: gear carb
#> $gear
#> 4 5
#> 3 -0.3333333 -0.3333333
#> 4 0.6666667 -0.3333333
#> 5 -0.3333333 0.6666667
#>
#> $carb
#> <2 <3 <4 <6 <8
#> 1 -0.5 -0.3333333 -0.25 -0.2 -0.1666667
#> 2 0.5 -0.3333333 -0.25 -0.2 -0.1666667
#> 3 0.0 0.6666667 -0.25 -0.2 -0.1666667
#> 4 0.0 0.0000000 0.75 -0.2 -0.1666667
#> 6 0.0 0.0000000 0.00 0.8 -0.1666667
#> 8 0.0 0.0000000 0.00 0.0 0.8333333
#>
# Use tidyselect helpers to set multiple variables at once
# These are all equivalent
contr_list1 <- enlist_contrasts(mtcars,
                                cyl ~ sum_code, gear ~ sum_code,
                                verbose = FALSE)
contr_list2 <- enlist_contrasts(mtcars,
                                cyl + gear ~ sum_code,
                                verbose = FALSE)
contr_list3 <- enlist_contrasts(mtcars,
                                c(cyl, gear) ~ sum_code,
                                verbose = FALSE)
contr_list4 <- enlist_contrasts(mtcars,
                                all_of(c('cyl', 'gear')) ~ sum_code,
                                verbose = FALSE)
these_vars <- c("cyl", "gear")
contr_list5 <- enlist_contrasts(mtcars,
                                all_of(these_vars) ~ sum_code,
                                verbose = FALSE)
all.equal(contr_list1, contr_list2)
#> [1] TRUE
all.equal(contr_list2, contr_list3)
#> [1] TRUE
all.equal(contr_list3, contr_list4)
#> [1] TRUE
all.equal(contr_list4, contr_list5)
#> [1] TRUE
# You can also use tidyselect::where() with class checking helpers:
contr_list6 <- enlist_contrasts(mtcars,
                                where(is.numeric) ~ sum_code,
                                verbose = FALSE)
# Each variable name must only be set ONCE, e.g. these will fail:
try(enlist_contrasts(mtcars,
                     cyl ~ sum_code,
                     cyl ~ scaled_sum_code,
                     verbose = FALSE))
#> Error : In `pkgdown::build_site_github_pages(new_process = FALSE, install = ...`:
#> Names must be unique.
#> ✖ These names are duplicated:
#> * "cyl" at locations 1 and 2.
#> ℹ Left hand side of multiple formulas evaluated to the same column name
try(enlist_contrasts(mtcars,
                     cyl ~ sum_code,
                     all_of(these_vars) ~ scaled_sum_code,
                     verbose = FALSE))
#> Error : In `pkgdown::build_site_github_pages(new_process = FALSE, install = ...`:
#> Names must be unique.
#> ✖ These names are duplicated:
#> * "cyl" at locations 1 and 2.
#> ℹ Left hand side of multiple formulas evaluated to the same column name
try(enlist_contrasts(mtcars,
                     cyl ~ sum_code,
                     where(is.numeric) ~ scaled_sum_code,
                     verbose = FALSE))
#> Error : In `pkgdown::build_site_github_pages(new_process = FALSE, install = ...`:
#> Names must be unique.
#> ✖ These names are duplicated:
#> * "cyl" at locations 1 and 3.
#> ℹ Left hand side of multiple formulas evaluated to the same column name