Hierarchical Linear Modeling (HLM) is a type of regression model used frequently with education datasets. Education data typically sample students from a set of schools, so students from the same school tend to be correlated with one another (which causes the problem I describe below). You can say this in a couple of different ways. I will try two:

- Students' outcomes are related to one another if they are in the same school (John and Mary have similar outcome scores because they sit right next to each other in the same school)
- Residuals from the regression model are not independent from one another (John's residual and Mary's residual are close)

With this type of data, classical methods such as OLS regression do not produce correct standard errors (they will be underestimated). This is because the classical approach relies on the assumption that residuals are independent of each other. When students in the same school resemble one another, the data carry less independent information than the number of students suggests.

Here "residuals" very roughly means the same thing as outcome values (though outcome values sit on the left side of the equation and residuals on the right side). In the intercept-only model (a model with no predictors), each residual is just the outcome value minus the overall mean, so if you plot outcome values against residuals, they form a straight line.
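A small Python sketch can make the intercept-only case concrete. The six scores are made-up numbers chosen for illustration, and the variable names are hypothetical; the point is only that each residual is the outcome minus the overall mean, so the slope between any two (outcome, residual) points is 1.

```python
from statistics import mean

# Hypothetical outcome scores for six students (made-up numbers).
scores = [540.0, 562.0, 555.0, 571.0, 549.0, 553.0]

# An intercept-only model predicts the overall mean for everyone,
# so each residual is the outcome minus that mean.
intercept = mean(scores)
residuals = [y - intercept for y in scores]

for y, e in zip(scores, residuals):
    print(f"outcome={y:6.1f}  residual={e:+6.1f}")

# Slope of the line from the first point to each other point.
# A constant slope means the points fall on one straight line.
slopes = [
    (e - residuals[0]) / (y - scores[0])
    for y, e in zip(scores[1:], residuals[1:])
]
print("slopes between points:", slopes)
```

Once predictors enter the model, residuals and outcomes no longer line up this neatly, which is why the equivalence is only a rough one.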

If you apply the classical approach (e.g., OLS) to hierarchically structured data, you will underestimate the standard errors of the coefficients. This problem is also referred to as the clustering problem.
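One way to see the underestimation is a simulation. The sketch below is in Python rather than SAS, and every setting in it (30 schools, 20 students per school, a school-level standard deviation of 10, a student-level standard deviation of 5, 300 repetitions) is an assumption chosen for illustration. It draws many clustered samples, records how much the sample mean actually varies from sample to sample, and compares that with the standard error the textbook formula reports when it treats students as independent.

```python
import random
from statistics import mean, stdev


def simulate_school_sample(rng, n_schools=30, n_students=20,
                           school_sd=10.0, student_sd=5.0, grand_mean=555.0):
    """Return a flat list of scores in which students share a school effect."""
    scores = []
    for _ in range(n_schools):
        school_effect = rng.gauss(0.0, school_sd)
        for _ in range(n_students):
            scores.append(grand_mean + school_effect + rng.gauss(0.0, student_sd))
    return scores


rng = random.Random(2016)
sample_means = []
naive_ses = []
for _ in range(300):
    scores = simulate_school_sample(rng)
    sample_means.append(mean(scores))
    # Textbook SE of a mean, which assumes independent observations.
    naive_ses.append(stdev(scores) / len(scores) ** 0.5)

# How much the mean really moves across repeated samples.
actual_sd_of_mean = stdev(sample_means)
# What the independence-based formula claims, on average.
average_naive_se = mean(naive_ses)

print(f"actual spread of the sample mean:      {actual_sd_of_mean:.2f}")
print(f"average SE assuming independence:      {average_naive_se:.2f}")
```

Under these assumed settings the first number should come out several times larger than the second, which is the clustering problem in miniature.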

HLM is one approach to adjusting for the clustering problem, so that statistical tests are more realistic and more conservative. Personally, I understood HLM better when I learned geospatial statistics, which is also an approach to fixing the data dependency problem (an observation from Washington DC and an observation from Arlington VA are similar due to geographical proximity).

HLM is one of many approaches that deal with the data dependency problem. It is only one approach and fixes one type of problem, leaving many other problems unaddressed.

Parameter estimates (AKA coefficients or effects), however, are usually not drastically different between classical methods and HLM. If OLS tells you that US junior high school students scored 555 points on average, HLM would give you almost the same information. However, standard errors would typically be larger for HLM than for OLS, because HLM accounts for the between-school source of error that OLS ignores.
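The sketch below illustrates this with simulated data in Python. It is not an HLM fit: it stands in for one by treating schools, rather than students, as the independent units, which for equal school sizes and a model with no predictors should closely match what a random-intercept model reports. The grand mean of 555, the 30 schools of 20 students, and the two standard deviations are assumptions made up for the example.

```python
import random
from statistics import mean, stdev

rng = random.Random(555)
n_schools, n_students = 30, 20

# Simulated scores: each school gets its own shift around a grand mean of 555.
schools = []
for _ in range(n_schools):
    school_effect = rng.gauss(0.0, 10.0)
    schools.append([555.0 + school_effect + rng.gauss(0.0, 5.0)
                    for _ in range(n_students)])

all_scores = [y for school in schools for y in school]
school_means = [mean(school) for school in schools]

# Point estimates: pooled mean of students versus mean of school means.
ols_style_mean = mean(all_scores)
cluster_aware_mean = mean(school_means)

# Standard errors: students as independent units versus schools as units.
ols_style_se = stdev(all_scores) / len(all_scores) ** 0.5
cluster_aware_se = stdev(school_means) / n_schools ** 0.5

print(f"OLS-style estimate:     {ols_style_mean:.2f} (SE {ols_style_se:.2f})")
print(f"cluster-aware estimate: {cluster_aware_mean:.2f} (SE {cluster_aware_se:.2f})")
```

With equal school sizes the two point estimates coincide, while the cluster-aware standard error is noticeably larger. With unequal school sizes or with predictors in the model, the estimates can differ somewhat, so treat this as an illustration of the pattern rather than a replacement for fitting the model.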

My manual for SAS PROC MIXED for doing HLM

- Syntax Example PROC MIXED
- The manual: download an MS Word version
- Download a dataset for the exercise mentioned in the manual
- Memo: how to get R-squared from SAS PROC MIXED

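proc glimmix data=asdf;  /* asdf stands in for your dataset name */

class School_ID;  /* grouping (level-2) variable */

model Y = X1 X2 / dist=normal link=identity s ddfm=kr;  /* s prints fixed-effect estimates; ddfm=kr requests Kenward-Roger degrees of freedom */

random int / subject=School_ID;  /* random intercept for each school */

output out=gmxout residual=resid;  /* save residuals to the dataset gmxout */

run;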

Special Topics

- Random effects vs. fixed effects
- When to use HLM, when not to use HLM http://www.kuekawa.com/when-to-use-hlm-hierarchical-linear-modeling/ 05 24 2016
- What does HLM solve? http://www.kuekawa.com/cluster-effect-what-does-hlm-solve/
- How between group variance can increase in HLM: http://www.kuekawa.com/if-between-group-variance-increased-instead-of-decreased-in-hlm/
- SAS PROC GLIMMIX: how to request a covariance stat test http://www.kuekawa.com/proc-glimmix-how-to-request-a-group-level-variance-size-stat-test/
- SAS PROC GLIMMIX: Automatically runs a fixed effect model when a random effect model fails to converge http://www.kuekawa.com/automate-the-choice-between-hlm-and-non-hlm/
- SAS PROC GLIMMIX and PROC MIXED produce identical results for linear modeling: http://www.kuekawa.com/proc-mixed-and-proc-glimmix-produce-identical-results/
- Kenward-Roger Degrees of Freedom Option: http://www.kuekawa.com/kenward-roger-degrees-of-freedom-sas-glimmix/
- Experiment: what happens if we enter a level-2 variable at level 1? http://www.kuekawa.com/hlm-what-happens-if-i-enter-level2-variables-at-level-1/