R's stats::contr.helmert() function is unscaled, meaning that you need to
rescale the coefficients of a fitted model to get the actual comparisons of
interest. This version automatically scales the contrast matrix so that the
coefficients directly give those comparisons.
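The difference between the two scalings can be seen with base R alone. The sketch below reuses the toy data from this page's example and assumes that the scaling divides column k of the stats::contr.helmert() matrix by k + 1 (the number of levels involved in that comparison). That is an illustrative assumption, not a copy of this function's internals.

```r
# Toy data: four groups of five observations each
mydf <- data.frame(
  grp = gl(4, 5),
  resp = c(seq(1, 5), seq(5, 9), seq(10, 14), seq(15, 19))
)

# Unscaled: base R's Helmert contrasts
contrasts(mydf$grp) <- stats::contr.helmert(4)
unscaled <- coef(lm(resp ~ grp, data = mydf))

# Scaled by hand (assumed scaling): divide column k by k + 1
scaled_matrix <- sweep(stats::contr.helmert(4), 2, 2:4, "/")
contrasts(mydf$grp) <- scaled_matrix
scaled <- coef(lm(resp ~ grp, data = mydf))

# Unscaled slopes need to be multiplied by 2, 3, 4 to recover the comparisons
rescaled <- unscaled[-1] * 2:4

print(unscaled)
print(scaled)
```

With the unscaled matrix the slopes are the comparisons divided by 2, 3, and 4; with the hand-scaled matrix they are the comparisons themselves.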
Details
Helmert coding compares each level to the mean of all levels that come before it. It differs from backward difference coding, which compares only adjacent pairs of levels rather than a level to the cumulative mean of the preceding levels.
Example interpretation for a 4-level factor:
Intercept = Grand mean (mean of the means of each level)
grp2 = mean(grp2) - mean(grp1)
grp3 = mean(grp3) - mean(grp1, grp2)
grp4 = mean(grp4) - mean(grp1, grp2, grp3)
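As a quick check of this interpretation, the same quantities can be computed directly from the level means without fitting a model. This is a minimal sketch using the toy data from this page's example; the variable names are illustrative only.

```r
grp <- gl(4, 5)
resp <- c(seq(1, 5), seq(5, 9), seq(10, 14), seq(15, 19))

# Mean of each level: 3, 7, 12, 17
level_means <- as.numeric(tapply(resp, grp, mean))

# Intercept: mean of the level means
intercept <- mean(level_means)

# Running mean of the level means, dropping the last entry so that
# element k is the mean of levels 1..k (the levels preceding level k + 1)
preceding_means <- (cumsum(level_means) / seq_along(level_means))[-length(level_means)]

# Each level's mean minus the mean of all preceding level means
comparisons <- level_means[-1] - preceding_means

print(intercept)    # 9.75
print(comparisons)  # 4, 7, 9.667 (rounded)
```

Note that mean(grp1, grp2) here is the mean of the level means, which coincides with the pooled mean of the observations only when the groups are the same size.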
Examples
mydf <- data.frame(
  grp = gl(4, 5),
  resp = c(seq(1, 5), seq(5, 9), seq(10, 14), seq(15, 19))
)
mydf <- set_contrasts(mydf, grp ~ helmert_code)
lm(resp ~ grp, data = mydf)
#>
#> Call:
#> lm(formula = resp ~ grp, data = mydf)
#>
#> Coefficients:
#> (Intercept)        grp<2        grp<3        grp<4
#>       9.750        4.000        7.000        9.667
#>