# Multilevel Rasch

After years of studying and applying quantitative social science research methods, it has become easier to catch glimpses of larger connections between seemingly independent methodological approaches. I have learned, for example, that survey development has a lot in common with test development. I have also learned that longitudinal data analysis is like spatial/geographic data analysis in that both attend to reference-based dependencies (i.e., one-dimensional time and two- or more-dimensional space).

Many methods seem independent because they are taught in isolation as topics courses (e.g., survey methods) or only within disciplines where they are traditionally applied (e.g., social network analysis within sociology). I credit my professors with teaching me to see the larger connections. I also credit the University of Minnesota with offering interdisciplinary courses, such as Latent Variable Measurement Models and Path Analysis taught by Melanie Wall in Biostatistics--a course traditionally taught by psychologists.

Item response theory (IRT) scaling and multilevel modeling are two other quantitative methods that have more in common than they seem at first glance. The Rasch model is usually expressed as

$\text{Pr}(Y_{is}=1 \mid \theta_s) = \tfrac{1}{1+e^{-(\theta_s-b_i)}}$,

where $\theta_s$ represents a test taker's latent ability, and $b_i$ represents the difficulty of an item. The Rasch model can also be expressed as a generalized linear model (GLM) with a logit link:

$\text{ln}\left(\tfrac{\text{Pr}(Y_{is}=1)}{\text{Pr}(Y_{is}=0)}\right)=\beta_i+\theta_s$,

where $Y_{is}$ is a dummy variable (1 for a correct response, 0 otherwise) and $\beta_i$ is the easiness of an item, i.e., the negative of the difficulty ($\beta_i = -b_i$).
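To make the equivalence concrete, here is a minimal Python sketch (the function name `rasch_prob` and the specific values of $\theta_s$ and $b_i$ are mine, chosen for illustration): it evaluates the Rasch probability and checks that the log-odds of a correct response equal $\beta_i + \theta_s$ with $\beta_i = -b_i$.

```python
import math

def rasch_prob(theta, b):
    """Rasch probability of a correct response: 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

theta, b = 1.2, 0.4  # arbitrary ability and difficulty values
p = rasch_prob(theta, b)

# The GLM form: log-odds are linear in easiness (beta = -b) and ability.
log_odds = math.log(p / (1.0 - p))
print(abs(log_odds - (-b + theta)) < 1e-12)  # True
```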

Test development designs can be thought of as multistage sampling designs in which test takers represent primary sampling units and item responses represent secondary sampling units nested within test takers. Items can also be thought of as primary sampling units in which responses are nested. If item responses exhibit sizable intraclass correlation, then mixed-effects modeling may be appropriate to account for the loss of statistical power relative to simple random sampling. Mixed-effects models may also help with vertical scaling and with identifying differential item functioning (DIF) by including fixed effects for test taker age and group membership. Some authors treat items as fixed effects, while others treat them as random effects. In the latter case, the Rasch model can be expressed as a two-level mixed model with crossed random effects:

\begin{align*}
\small{\text{Level 1: Item responses}}\\
\text{ln}\left(\tfrac{\text{Pr}(Y_{mis}=1)}{\text{Pr}(Y_{mis}=0)}\right) &= \beta_{0is} + e_{mis}\\
\small{\text{Level 2: Items and test takers}}\\
\beta_{0is} &= \beta_{00} + r_{0i} + r_{0s}
\end{align*}
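As a sketch of what data from this crossed model look like, the following simulates item responses with NumPy. The variance of 1.0 for both random effects is my assumption, as is the latent-scale intraclass correlation formula that treats the level-1 logistic residual variance as $\pi^2/3$; neither value comes from the post.

```python
import numpy as np

rng = np.random.default_rng(42)

n_items, n_takers = 20, 500
beta_00 = 0.0                              # grand-mean easiness
r_item = rng.normal(0.0, 1.0, n_items)     # r_0i: item random effects (sd = 1 assumed)
r_taker = rng.normal(0.0, 1.0, n_takers)   # r_0s: test-taker random effects (sd = 1 assumed)

# Crossed design: every test taker responds to every item.
logits = beta_00 + r_item[None, :] + r_taker[:, None]  # shape (takers, items)
p = 1.0 / (1.0 + np.exp(-logits))
y = rng.binomial(1, p)                     # simulated 0/1 item responses

# Latent-scale ICC for test takers: their variance share on the logit scale,
# using pi^2 / 3 as the level-1 logistic residual variance.
icc_taker = 1.0 / (1.0 + 1.0 + np.pi**2 / 3)
print(round(icc_taker, 3))  # 0.189
```

A sizable ICC like this is the situation where treating responses as independent draws would overstate the effective sample size.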

Chris - Thanks for this post, and the next one. Very helpful. I think all IRT model estimation will move in this direction with time.

Regarding the nesting/crossing of responses, a confusing concept: authors use the terms inconsistently and, I think, incorrectly. Nesting only occurs when there are no duplicate levels of one variable as grouped by another. People can be nested within classrooms, but only if the people vector is unique, with every person in only one classroom.
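That definition can be checked mechanically. Here is a small Python sketch (the example data and the helper `is_nested` are hypothetical): a grouping is nested when each lower-level unit maps to exactly one higher-level unit.

```python
# Hypothetical (person, classroom) pairs, one row per observation.
person_classroom = [
    ("ann", "room1"), ("ann", "room1"),
    ("bob", "room2"), ("cat", "room2"),
]

def is_nested(pairs):
    """True if each unit (first element) appears with only one group."""
    groups = {}
    for unit, group in pairs:
        groups.setdefault(unit, set()).add(group)
    return all(len(g) == 1 for g in groups.values())

print(is_nested(person_classroom))  # True: every person is in one classroom

# If "ann" also appeared in room2, people and classrooms would be crossed:
print(is_nested(person_classroom + [("ann", "room2")]))  # False
```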

In the same way, repeated measures designs would have timepoints crossed with people, unless all the people in each timepoint appeared only in that timepoint, in which case they'd be nested.

So, when modeling item-level responses, the items and people would be crossed, right?