After years of studying and applying quantitative social science research methods, it has become easier to catch glimpses of larger connections between seemingly independent methodological approaches. I have learned, for example, that survey development has a lot in common with test development. I have also learned that longitudinal data analysis is like spatial/geographic data analysis in that both attend to reference-based dependencies (i.e., one-dimensional time and two-or-more-dimensional space).

Many methods seem independent because they are taught in isolation as topics courses (e.g., survey methods) or only within disciplines where they are traditionally applied (e.g., social network analysis within sociology). I credit my professors with teaching me to see the larger connections. I also credit the University of Minnesota with offering interdisciplinary courses, such as Latent Variable Measurement Models and Path Analysis taught by Melanie Wall in Biostatistics--a course traditionally taught by psychologists.

Item response theory (IRT) scaling and multilevel modeling are two other quantitative methods that have more in common than they seem at first glance. The Rasch model is usually expressed as

P(X_ij = 1) = exp(θ_j − b_i) / (1 + exp(θ_j − b_i)),

where θ_j represents test taker j's latent ability, and b_i represents the *difficulty* of item i. The Rasch model can also be expressed as a generalized linear model (GLM) with a logit link:

logit[P(X_ij = 1)] = θ_j + β_i,

where X_ij is a dummy variable (1 if test taker j answers item i correctly, 0 otherwise) and β_i = −b_i is the *easiness* of item i.
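To make the two parameterizations concrete, here is a minimal Python sketch (the ability and difficulty values are made up for illustration) showing that the difficulty form and the easiness/GLM form give the same response probability:

```python
import math

def rasch_difficulty(theta, b):
    """P(correct) under the difficulty parameterization: exp(theta - b) / (1 + exp(theta - b))."""
    return math.exp(theta - b) / (1 + math.exp(theta - b))

def rasch_easiness(theta, beta):
    """P(correct) under the GLM parameterization, inverting the logit link (beta = -b)."""
    return 1 / (1 + math.exp(-(theta + beta)))

theta = 1.0   # test taker's latent ability
b = 0.5       # item difficulty
p1 = rasch_difficulty(theta, b)
p2 = rasch_easiness(theta, -b)   # easiness is the negative of difficulty
print(round(p1, 4), round(p2, 4))   # identical probabilities
```

The two forms are just reparameterizations of the same model, which is what lets the Rasch model be estimated with ordinary GLM machinery.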

Test development designs can be thought of as multistage sampling designs in which test takers represent primary sampling units and item responses represent secondary sampling units nested within test takers. Items can also be thought of as primary sampling units within which responses are nested. If item responses exhibit sizable intraclass correlation, then mixed-effects modeling may be appropriate to account for the loss of statistical power relative to simple random sampling. Mixed-effects models may also help with vertical scaling and with identifying differential item functioning (DIF) by including fixed effects for test taker age and group membership. Some authors treat items as fixed effects, while others treat them as random effects. In the latter case, the Rasch model can be expressed as a two-level mixed model with crossed random effects:

logit[P(X_ij = 1)] = θ_j + β_i,  with θ_j ~ N(0, σ²_person) and β_i ~ N(μ, σ²_item)

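As a sketch of that data-generating process, the following Python snippet simulates crossed person and item effects (the sample sizes and variance components here are assumptions for illustration, not estimates from any real test):

```python
import math
import random

random.seed(1)

n_people, n_items = 200, 20
sigma_theta, sigma_beta = 1.0, 0.8   # assumed SDs of person and item effects

# random effects: one ability per person, one easiness per item
theta = [random.gauss(0, sigma_theta) for _ in range(n_people)]
beta = [random.gauss(0, sigma_beta) for _ in range(n_items)]

def p_correct(j, i):
    """Rasch probability for person j on item i (inverse logit of theta_j + beta_i)."""
    return 1 / (1 + math.exp(-(theta[j] + beta[i])))

# every person answers every item, so people and items are fully crossed
responses = [[1 if random.random() < p_correct(j, i) else 0
              for i in range(n_items)] for j in range(n_people)]
print(len(responses), len(responses[0]))
```

Fitting this model to recover the variance components would typically be done with crossed-random-effects software such as lmer(), as discussed below.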
Chris - Thanks for this post, and the next one. Very helpful. I think all IRT model estimation will move in this direction, with time.

Regarding the nesting/crossing of responses, a confusing concept: authors use the terms inconsistently and, I think, incorrectly. Nesting only occurs when there are no duplicate levels of one variable as grouped by another. People can be nested within classrooms, but only if the people vector is unique, with every person in only one classroom.

In the same way, repeated measures designs would have timepoints crossed with people, unless all the people in each timepoint appeared only in that timepoint, in which case they'd be nested.

So, when modeling item-level responses, the items and people would be crossed, right?

Hi Tony, Sorry it has taken me so long to respond. I agree that the terms nesting and crossing can be confusing. In the above example, it seems to me that item-level responses are nested within people and within items, but people and items are crossed. When every person sees every item, then they're fully crossed. In multi-form or computerized adaptive testing situations, people and items partially cross. I think of nesting as the consequence of multi-stage sampling and nesting factors as facets under G-theory. Bates also acknowledges that the terms are confusing, and he devotes considerable attention to how lmer() handles crossing and nesting:

lme4.r-forge.r-project.org/book/Ch2.pdf
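A quick way to see the definition Tony gives in code: a small Python helper (the function name is my own invention) that checks whether one factor is nested within another, i.e., whether each level of the inner factor appears with exactly one level of the outer factor:

```python
def is_nested(inner, outer):
    """True if each level of `inner` co-occurs with exactly one level of `outer`."""
    seen = {}
    for a, b in zip(inner, outer):
        if a in seen and seen[a] != b:
            return False   # same inner level appears under two outer levels
        seen[a] = b
    return True

# item-level responses: each person answers several items, and vice versa
person = [1, 1, 2, 2, 3, 3]
item = ['a', 'b', 'a', 'b', 'a', 'b']
print(is_nested(person, item), is_nested(item, person))   # False False: crossed

# pupils in classrooms: each pupil belongs to exactly one classroom
pupil = [1, 2, 3, 4]
classroom = ['x', 'x', 'y', 'y']
print(is_nested(pupil, classroom))   # True: nested
```

When neither factor is nested in the other, as with people and items above, the factors are crossed, which matches the crossed-random-effects formulation in the post.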