This post is to illustrate the differences in model parameter estimates for dummy-coded factors when the model includes an intercept versus when it does not.
Data
Dataset is 10k rows from Lending Club, years 2007-2014.
loan_status
The target values are 0 = Fully Paid and 1 = Charged off (default). The goal is to predict whether a
loan will default.
dti
A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s selfreported monthly income.
grade
Lending Club-assigned loan grade. “A” is good, “G” is bad, etc.
Reference level with an intercept
If we fit a logistic regression model, by default, an intercept will be estimated. In R, by default,
the first level is used as the reference level.
The logOdds model predictions for any given instance starts with a value equal to the intercept that the model estimated – in this case, -2.75. Then, if the instance has a grade anywhere between B..G, the logOdds is adjusted up or down (depending on the value of the estimate). If the instance has a grade
equal to the reference level (grade A), then no modification is made. This is because the intercept
is the estimated prediction for the reference level, Grade A, and all other estimates are relative to
Grade A.
Using a reference level has the benefit of adding interpretabilty to the z-scores for each of the
parameter estimates. Because each of the estimates is relative to the reference level, we can answer
the question “Are the average values for Grade B statisictally significantly different from those of Grade A?” The answer to that is found by looking at the z-score for the gradeB estimate – 5.954, which has
a very small associated p-value, therefore we can reject the hypothesis that the means of gradeA and gradeB are the same. All other grade parameter estimates can likewise be interpreted as pairwise comparisons between each level and gradeA. Comparisons between other pairs can be obtained by contrasting
least-square means for each grade level.
Least-square mean estimates:
And pairwise contrasts:
The z-score for the estimate of the intercept can be interpreted as “is the intercept significantly different from 0?”
No intercept, no reference level
Now examine the parameter estimates for when we instruct our model to not estimate an intercept.
Take note that the estimate for grade A is equivalent to the previously-estimated intercept. However,
the estimates for all other grade levels are no longer interpreted as being relative to any other level – but rather they are each interpreted as testing whether the weight is different from 0. So, while the estimates in isolation are much different from what they were previously, they are exactly equal to the previous relative estimates added to the previous intercept. The two models will lead
to exactly the same predictions.
Least-square means in this case are equivalent to what they were previously.
More than one predictor
Note that least-square mean estimates are not reliable when interaction terms are present in the model. Also, when more than one predictor is used, least-square mean estimates are the mean effect for each level
of the requested factor holding all other predictors constant at their averages. In other words, they
are mean estimates “controlling for” the effect of other predictors.
Tags: R, analytics
Dave Eargle is a Senior Consultant in Cybersecurity Assessment at Carve Systems. More about the author →