Multilevel modeling: From lme4 to Mplus

My dissertation research involves multivariate multilevel modeling. I would prefer to use lme4, but it can only regress one dependent variable at a time. I am using Mplus because it allows the necessary constraints for multivariate multilevel modeling.

In this entry I demonstrate some univariate multilevel modeling constraints, following lmer's sleepstudy example. Unlike other Mplus multilevel modeling demonstrations, I avoid Mplus' syntactic shortcuts (i.e., "TYPE = TWOLEVEL") in order to highlight the constraints and the underlying measurement assumptions made in conventional multilevel modeling.

I replicate the lmer example first (see below). One of the first differences to note is that Mplus does not apply restricted maximum likelihood, so the REML argument must be set to FALSE. Additionally, I recast the sleepstudy data from long to wide for Mplus. The Mplus estimates match the lmer results. How? As annotated below, Mplus achieves the same results as conventional multilevel models by:

1. treating the fixed effects (at level two) as latent variables (indicated in this example by level-one observations at points in time)
2. assuming and imposing configural, scalar and uniqueness invariance.

Configural invariance is imposed by constraining loadings on the latent intercept to 1. Scalar invariance is imposed by constraining indicator intercepts to 0. Uniquenesses are estimated but constrained to be equal. This makes sense from a Generalizability Theory perspective when you consider that the measurement error variance component is for a single observation drawn from a universe of admissible observations.

lmer's sleepstudy example with random slopes for time

> summary(lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = F))
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Reaction ~ Days + (Days | Subject)
Data: sleepstudy

AIC       BIC    logLik  deviance
1763.9393 1783.0971 -875.9697 1751.9393

Random effects:
Groups   Name        Variance Std.Dev. Corr
Subject  (Intercept) 565.52   23.781
Days         32.68    5.717   0.08
Residual             654.94   25.592
Number of obs: 180, groups: Subject, 18

Fixed effects:
Estimate Std. Error t value
(Intercept)  251.405      6.632   37.91
Days          10.467      1.502    6.97

Correlation of Fixed Effects:
(Intr)
Days -0.138


Mplus random slopes example with sleepstudy data

INPUT INSTRUCTIONS

TITLE: Replication of lmer sleepstudy random slopes example;
DATA: FILE = "Sleepstudy.dat";
VARIABLE:
NAMES = Subject React_0 React_1 React_2 React_3 React_4 React_5 React_6 React_7
React_8 React_9;
MISSING=.;
IDVARIABLE IS Subject;
USEVARIABLE = Subject React_0 React_1 React_2 React_3 React_4 React_5 React_6 React_7
React_8 React_9;
ANALYSIS:
ESTIMATOR = ML;
MODEL:

!Specify intercept fixed effect (latent intercept factor).
FI by React_0-React_9@1;
[FI]; !intercept fixed effect (mean)

!Specify fixed effect for days (latent slope factor).
FS by React_0@0 React_1@1 React_2@2 React_3@3 React_4@4 React_5@5
React_6@6 React_7@7 React_8@8 React_9@9;
[FS]; !slope fixed effect (mean)

!Suppress indicator intercepts; assume scalar invariance.
[React_0 - React_9 @0];

!Specify residual (error) variance component; assume invariant uniquenesses.
React_0 - React_9 (1);

!Specify factor (facet) variance and covariance components.
FI FS;
FI with FS;

!Suppress slope variance if specifying a random intercept model.
!FS@0;
!FI with FS @0;

Replication of lmer sleepstudy random slopes example;

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                          18

Number of dependent variables                                   10
Number of independent variables                                  0
Number of continuous latent variables                            2

Observed dependent variables

Continuous
REACT_0     REACT_1     REACT_2     REACT_3     REACT_4     REACT_5
REACT_6     REACT_7     REACT_8     REACT_9

Continuous latent variables
FI          FS

Variables with special functions

ID variable           SUBJECT

Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03

Input data file(s)
C:/Sleepstudy.dat

Input data format  FREE

SUMMARY OF DATA

Number of missing data patterns             1

COVARIANCE COVERAGE OF DATA

[omitted: 100% coverage]

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

Value                            141.141
Degrees of Freedom                    59
P-Value                           0.0000

Chi-Square Test of Model Fit for the Baseline Model

Value                            260.682
Degrees of Freedom                    45
P-Value                           0.0000

CFI/TLI

CFI                                0.619
TLI                                0.710

Loglikelihood

H0 Value                        -875.970
H1 Value                        -805.399

Information Criteria

Number of Free Parameters              6
Akaike (AIC)                    1763.939
Bayesian (BIC)                  1769.282
(n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

Estimate                           0.278
90 Percent C.I.                    0.220  0.337
Probability RMSEA <= .05           0.000

SRMR (Standardized Root Mean Square Residual)

Value                              0.379

MODEL RESULTS

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

FI       BY
REACT_0            1.000      0.000    999.000    999.000
REACT_1            1.000      0.000    999.000    999.000
REACT_2            1.000      0.000    999.000    999.000
REACT_3            1.000      0.000    999.000    999.000
REACT_4            1.000      0.000    999.000    999.000
REACT_5            1.000      0.000    999.000    999.000
REACT_6            1.000      0.000    999.000    999.000
REACT_7            1.000      0.000    999.000    999.000
REACT_8            1.000      0.000    999.000    999.000
REACT_9            1.000      0.000    999.000    999.000

FS       BY
REACT_0            0.000      0.000    999.000    999.000
REACT_1            1.000      0.000    999.000    999.000
REACT_2            2.000      0.000    999.000    999.000
REACT_3            3.000      0.000    999.000    999.000
REACT_4            4.000      0.000    999.000    999.000
REACT_5            5.000      0.000    999.000    999.000
REACT_6            6.000      0.000    999.000    999.000
REACT_7            7.000      0.000    999.000    999.000
REACT_8            8.000      0.000    999.000    999.000
REACT_9            9.000      0.000    999.000    999.000

FI       WITH
FS                11.055     42.876      0.258      0.797

Means
FI               251.405      6.632     37.906      0.000
FS                10.467      1.502      6.968      0.000

Intercepts
REACT_0            0.000      0.000    999.000    999.000
REACT_1            0.000      0.000    999.000    999.000
REACT_2            0.000      0.000    999.000    999.000
REACT_3            0.000      0.000    999.000    999.000
REACT_4            0.000      0.000    999.000    999.000
REACT_5            0.000      0.000    999.000    999.000
REACT_6            0.000      0.000    999.000    999.000
REACT_7            0.000      0.000    999.000    999.000
REACT_8            0.000      0.000    999.000    999.000
REACT_9            0.000      0.000    999.000    999.000

Variances
FI               565.516    265.266      2.132      0.033
FS                32.682     13.573      2.408      0.016

Residual Variances
REACT_0          654.941     77.186      8.485      0.000
REACT_1          654.941     77.186      8.485      0.000
REACT_2          654.941     77.186      8.485      0.000
REACT_3          654.941     77.186      8.485      0.000
REACT_4          654.941     77.186      8.485      0.000
REACT_5          654.941     77.186      8.485      0.000
REACT_6          654.941     77.186      8.485      0.000
REACT_7          654.941     77.186      8.485      0.000
REACT_8          654.941     77.186      8.485      0.000
REACT_9          654.941     77.186      8.485      0.000

QUALITY OF NUMERICAL RESULTS

Condition Number for the Information Matrix              0.453E-04
(ratio of smallest to largest eigenvalue)


Measures of Effective Teaching weighting schemes: An improved plot

I came across the following plot while reading "Ensuring Fair and Reliable Measures of Effective Teaching".

Maybe I've spent too much time reading Andrew Gelman's blog because I noticed two issues with the plot that could be improved. The most serious issue is that the first two sets of bars display correlation coefficients, but the last set of bars displays squared correlation coefficients. It doesn't matter that the first two sets represent a type of validity and the last represents reliability. What matters is that the bars incorrectly imply a common scale for the entire figure and could mislead readers. Perhaps the authors made the somewhat common mistake of assuming that a reliability coefficient is a correlation coefficient because both are denoted r? It's probably just a graphic design mistake that the authors didn't catch. The second issue I noticed is far less serious (and is debatable). Since the bars do not represent counts, the coefficients should be displayed as dots.

I used to create an improved (but less colorful) version of their plot that displays only correlation coefficients with dots instead of bars. The new version shows more clearly that each variety of teacher effectiveness composite score "predicts" itself (from a test-retest perspective) better than any of them predict the criteria. In other words, the improved plot illustrates how score reliability is a necessary but insufficient condition that can act as a ceiling on validity.

Applying generalizability theory with R: A package

I have contributed my generalizability functions to the QME: Quantitative Methods in Education package. I described my efforts in an earlier post. This project continues to be a great learning experience for me, especially in terms of seeing how classical test theory, generalizability theory, and multilevel modeling overlap. I will probably submit a proposal to demonstrate the package at either NCME or AERA in 2014. You can use the following commands to install the package:

library(devtools)
install_github("QME", "zief0002")


Which schools are closing achievement gaps in Minnesota?

Minnesota has large and persistent achievement gaps. It's an empirically supported proposition no matter how you look at it. But if we are going to tackle the problem, it does matter how we look at it! Well-intentioned advocates and researchers sometimes talk about "the achievement gap" as if there is one definition. Are they referring to test score differences? Or graduation rates? Or gaps in educational opportunities? Are they referring to math? Or reading? Or science? Are they referring to students of color? Or students in poverty? Or English learners? Or special education students? Or all four groups? They don't often say. Referring to "the achievement gap" is good rhetoric, but I contend that a lack of specificity is impeding progress. I also contend that we can learn from schools that are closing achievement gaps.

Advocates and researchers would be wise to use the Minnesota Department of Education's (MDE) achievement gap reduction measure when possible. MDE quantifies gaps as the difference between the growth of historically under-achieving groups relative to more advantaged groups. The calculation yields a negative number if a school helped reduce achievement gaps in Minnesota and a positive number if a school added to existing gaps. (See page 78 of Minnesota's approved No Child Left Behind (NCLB) waiver for the methodology.) Here's why I think advocates and researchers should consistently publicize and track the achievement gap reduction measure:

• It's official. Minnesota's accountability system holds schools accountable for achievement gap reduction (i.e., it's one of the Multiple Measures). It's recognized by the U.S. Department of Education, it's reported annually by MDE, and it's increasingly familiar to teachers, administrators, and parents.
• It's fair. Student growth is a better measure of the contribution of a school or intervention than proficiency rates, which reflect the economic affluence and selectivity. When we over-emphasize proficiency it prevents us from finding schools in lower-income areas that are closing gaps. Growth measures only compare students and schools to those with similar achievement at the beginning of the school year.
• It's embodies a theory of action. As long as some groups are starting the school year behind, then the only way to close gaps is to accelerate their learning relative to their more advantaged peers. It's a race, and traditional achievement gap measures paint a picture of the starting line. The achievement gap reduction measure tells the pace of learning over a school year.

I used to create an interactive map to help advocates and researchers become familiar with the achievement gap reduction measure and encourage recognition of schools that are helping close achievement gaps in Minnesota. The data underlying the map are publicly available from MDE's Data Center. MDE does not publish achievement gap reduction for each student group--it only publishes overall achievement gap reduction measures (i.e., student-weighted averages across all historically lower-achieving student groups). I averaged the annual values over three years (2010-2012) to improve precision. You can zoom in and click on a school to learn more about it. Now that we know which schools are helping close achievement gaps in Minnesota, we should recognize their successes and learn from them. What are the successful schools doing? How can we encourage other schools to adopt their strategies?

Schools helping reduce achievement gaps in Minnesota: Top schools by type
Type School District Direction of gaps Achievement gap reduction
ALP Bemidji High ALP BEMIDJI PUBLIC SCHOOL DISTRICT Closing -0.13
CHARTER Twin Cities International Elem TWIN CITIES INTERNATIONAL ELEM SCH. Closing -0.33
CHARTER Rochester Math and Science Academy ROCHESTER MATH AND SCIENCE ACADEMY Closing -0.18
CHARTER Seed Academy and Harvest Prep school HARVEST PREP SCHOOL-SEED ACADEMY Closing -0.18
CHARTER Schoolcraft Learning Community SCHOOLCRAFT LEARNING COMMUNITY CHTR Closing -0.15
ELEM Atheneum Elem INVER GROVE HEIGHTS SCHOOLS Closing -0.71
ELEM Winsted Elem HOWARD LAKE-WAVERLY-WINSTED Closing -0.39
ELEM Kimberly Lane Elem WAYZATA PUBLIC SCHOOL DISTRICT Closing -0.36
ELEM Withrow Elem STILLWATER AREA PUBLIC SCHOOL DIST. Closing -0.32
ELEM Lake Marion Elem LAKEVILLE PUBLIC SCHOOL DISTRICT Closing -0.31
ELEM Goodview Elem WINONA AREA PUBLIC SCHOOL DISTRICT Closing -0.31
ELEM Forest Hills Elem EDEN PRAIRIE PUBLIC SCHOOL DISTRICT Closing -0.28
ELEM Washington Elem MANKATO PUBLIC SCHOOL DISTRICT Closing -0.27
ELEM Frazee-Vergas Elem FRAZEE-VERGAS PUBLIC SCHOOL DIST. Closing -0.26
HIGH Arlington High SIBLEY EAST SCHOOL DISTRICT Closing -0.31
HIGH St. Anthony Village High ST. ANTHONY-NEW BRIGHTON SCHOOLS Closing -0.21
HIGH Hawley High HAWLEY PUBLIC SCHOOL DISTRICT Closing -0.12
HIGH Mounds View High MOUNDS VIEW PUBLIC SCHOOL DISTRICT Closing -0.08
MIDD Minnetonka East MINNETONKA PUBLIC SCHOOL DISTRICT Closing -0.21
MIDD Valley View Middle EDINA PUBLIC SCHOOL DISTRICT Closing -0.18
MIDD Minnetonka West Middle MINNETONKA PUBLIC SCHOOL DISTRICT Closing -0.14
MIDD New Prague Middle NEW PRAGUE AREA SCHOOLS Closing -0.12
MIDD Wayzata Central Middle WAYZATA PUBLIC SCHOOL DISTRICT Closing -0.12
MIDD Sauk Rapids-Rice Middle SAUK RAPIDS-RICE PUBLIC SCHOOLS Closing -0.12
MIDD Mahtomedi Middle MAHTOMEDI PUBLIC SCHOOL DISTRICT Closing -0.10
MIDD Sartell Middle SARTELL-ST. STEPHEN SCHOOL DISTRICT Closing -0.10
OTHER Park Spanish Immersion El. ST. LOUIS PARK PUBLIC SCHOOL DIST. Closing -0.20
OTHER School of Environmental Studies ROSEMOUNT-APPLE VALLEY-EAGAN Closing -0.19
OTHER Friedell Middle ROCHESTER PUBLIC SCHOOL DISTRICT Closing -0.09

Remembering Hap Casmey inspired me to write this entry. I enjoyed getting to know Hap later in his life and hearing about his influence on education policy in Minnesota.

Evaluation 2012 presentation

The annual conference of the American Evaluation Association came to Minneapolis last week. I enjoyed getting re-acquainted with many of my evaluation colleagues and learning about evaluation studies from around the world without leaving the city I call home. I presented a poster titled "Teacher evaluation and the Standards of Effective Instruction rating instrument: Psychometric considerations".

Applying generalizability theory with R

| 1 Comment

One of the things I like about my current job is getting to apply generalizability theory. I have to confess that I didn't appreciate G theory at first, but now that I'm analyzing data from performance assessments (i.e., ratings of instructional effectiveness) I think it's great for a number of reasons. I like the concept of universe scores, treating items as a random facet, estimating multiple sources of error, the ability to focus on different objects of measurement (e.g., ratees and raters), and G theory's strong emphasis on finding ways to increase generalizability (i.e., decision studies). And as someone who has far more experience with item response theory (IRT) where conditional standard errors of measurement (CSEMs) are the norm, I'm especially intrigued by the prospect of using G theory to estimate CSEMs.

It's pretty obvious from this blog that I like statistical programming with because it helps me learn methods far better than simply reading about them or using a pre-existing software package. In that vein, I decided to write some functions to conduct G and D studies with . I've been using synthetic data from Brennan's book and Shavelson and Webb's book to ensure that my results match theirs. The nice thing about using for G theory is that you can choose functions for many different packages to fully apply G theory. For example, by using lme4 to estimate variance components, arm to extract standard errors of random intercepts, and ggplot2 to create faceted plots, I am able to create "vertical ruler" plots, which are prevalent among practitioners of many-facets Rasch. Such plots make it easy to identify the (statistically significantly) highest and lowest performers as well as the most lenient and stringent raters.

The plots below illustrate how I've been implementing G theory with . The plots were produced from Brennan's synthetic data set number 4, and the dependability coefficients match those from GENOVA. I've never developed an package, but given that no one else has so far, I'd be interested in doing so with the code I'm developing.

Estimating congeneric reliability with R

| 1 Comment

I was recently tasked with estimating the reliability of scores composed of items on different scales. The lack of a common scale could be seen in widely varying item score means and variances. As discussed by Graham, it's not widely known in applied educational research that Cronbach's alpha assumes essential tau-equivalence and underestimates reliability when the assumption isn't met. A friend made a good suggestion to consider stratified alpha, which is commonly used to estimate internal consistency when scores are composed of multiple choice items (scored 0-1) and constructed response items (e.g., scored 0-4). However, the assessment with which I was concerned does not have clear item-type strata. I decided to estimate congeneric reliability because it makes few assumptions (e.g., unidimensionality) and doesn't require grouping items into essentially tau-equivalent strata.

With the help of Graham's paper and a LISREL tutorial by Raykov I wrote an program that estimates congeneric reliability. The program uses Fox's sem() library to conduct the confirmatory factor analysis with minimal constraints (variance of common true score fixed at 1 for identifiability). The estimated loadings and error variances are then summed to calculate reliability (i.e., the ratio of true score variance to observed score variance) as:

$\hat{\rho}_{XX'} = \frac { \left( \sum \hat{\lambda}_i \right) ^ 2} {\left( \sum \hat{\lambda}_i \right) ^ 2 + \sum \hat{\delta}_{i} }$.

I obtained a congeneric reliability estimate of 0.80 and an internal consistency estimate of 0.58 for the scores I was analyzing. If the items had been essentially tau-equivalent, then the reliability estimates would have been the same. If I had assumed tau-equivalence, than I would have underestimated the reliability of the total scores (and overestimated standard errors of measurement). The example below replicates Graham's heuristic example.

> ############################################
> #Replicate results from Graham (2006) to check congeneric reliability code/calculations.
>
> library(sem)
> library(psych)
> library(stringr)
>
> #Variance/covariance matrix
> S.graham <- readMoments(diag = T, names = paste("x", 1:7, sep = ""))
1:  4.98
2:  4.60  5.59
4:  4.45  4.42  6.30
7:  3.84  3.81  3.66  6.44
11:  5.71  5.67  5.52  4.91 11.86
16: 23.85 23.68 22.92 19.87 34.28 127.65
22: 46.53 46.20 44.67 38.57 62.30 244.36 471.95
29:
>
> ############################################
> #A function to estimate and compare congeneric reliability and internal consistency
> funk.congeneric <- function(cfa.out) {
+   names.loadings <- str_detect(names(cfa.out$coeff), "loading") + names.errors <- str_detect(names(cfa.out$coeff), "error")
+   r.congeneric <- sum(cfa.out$coeff[names.loadings]) ^ 2 / + (sum(cfa.out$coeff[names.loadings]) ^ 2 + sum(cfa.out$coeff[names.errors])) + round(c("Congeneric" = r.congeneric, "Alpha" = alpha(cfa.out$S)$total$raw_alpha), 2)
+ }
>
> ############################################
> #Congeneric model; tau-equivalent items
> model.graham <- specifyModel()
6: x1 <-> x1, error1
7: x2 <-> x2, error2
8: x3 <-> x3, error3
9: x4 <-> x4, error4
10: x5 <-> x5, error5
11: T <-> T, NA, 1
12:
> cfa.out <- sem(model = model.graham, S = S.graham, N = 60)
> summary(cfa.out)

Model Chisquare =  0.11781   Df =  5 Pr(>Chisq) = 0.99976
Chisquare (null model) =  232.13   Df =  10
Goodness-of-fit index =  0.9992
RMSEA index =  0   90% CI: (NA, NA)
Bentler-Bonnett NFI =  0.99949
Tucker-Lewis NNFI =  1.044
Bentler CFI =  1
SRMR =  0.0049092
AIC =  20.118
AICc =  4.6076
BIC =  41.061
CAIC =  -25.354

Normalized Residuals
Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.038700 -0.008860 -0.000002  0.004700  0.002450  0.119000

R-square for Endogenous Variables
x1     x2     x3     x4     x5
0.9286 0.8175 0.6790 0.4962 0.5968

Parameter Estimates
Estimate Std Error z value Pr(>|z|)
error1   0.35559  0.17469   2.0356  4.1793e-02 x1 <--> x1
error2   1.02000  0.25339   4.0255  5.6861e-05 x2 <--> x2
error3   2.02222  0.41688   4.8509  1.2293e-06 x3 <--> x3
error4   3.24471  0.62679   5.1767  2.2583e-07 x4 <--> x4
error5   4.78227  0.94911   5.0387  4.6871e-07 x5 <--> x5

Iterations =  21
> pathDiagram(cfa.out, edge.labels = "values", ignore.double = F, rank.direction = "TB")


> funk.congeneric(cfa.out)
Congeneric      Alpha
0.91       0.91
>
> ############################################
> #Congeneric model; tau-inequivalent items
> model.graham <- specifyModel()
6: x1 <-> x1, error1
7: x2 <-> x2, error2
8: x3 <-> x3, error3
9: x4 <-> x4, error4
10: x7 <-> x7, error7
11: T <-> T, NA, 1
12:
> cfa.out <- sem(model = model.graham, S = S.graham, N = 60)
> summary(cfa.out)

Model Chisquare =  0.0072298   Df =  5 Pr(>Chisq) = 1
Chisquare (null model) =  353.42   Df =  10
Goodness-of-fit index =  0.99995
RMSEA index =  0   90% CI: (NA, NA)
Bentler-Bonnett NFI =  0.99998
Tucker-Lewis NNFI =  1.0291
Bentler CFI =  1
SRMR =  0.0010915
AIC =  20.007
AICc =  4.497
BIC =  40.951
CAIC =  -25.464

Normalized Residuals
Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-2.76e-02 -1.27e-04 -1.00e-07 -1.92e-03  1.96e-04  3.70e-03

R-square for Endogenous Variables
x1     x2     x3     x4     x7
0.9303 0.8171 0.6778 0.4942 0.9902

Parameter Estimates
Estimate Std Error z value  Pr(>|z|)
error1    0.34688 0.090227   3.84451 1.2080e-04 x1 <--> x1
error2    1.02240 0.200635   5.09584 3.4720e-07 x2 <--> x2
error3    2.02973 0.382466   5.30694 1.1148e-07 x3 <--> x3
error4    3.25764 0.605376   5.38118 7.3998e-08 x4 <--> x4
error7    4.64661 6.387765   0.72742 4.6697e-01 x7 <--> x7

Iterations =  38
> pathDiagram(cfa.out, edge.labels = "values", ignore.double = F, rank.direction = "TB")


> funk.congeneric(cfa.out)
Congeneric      Alpha
0.99       0.56


Converting Rasch units (RITs) to thetas

How do you place Measures of Academic Progress (MAP) Rash unit scores (RITs) on the normal theta metric (and vice versa)? I searched the Web but couldn't find a straightforward answer. What I've gathered from various sources is that RIT scores are just logits multiplied by 10 and centered on 200. From there, logits can be transformed to thetas based on the probabilities of the respective distributions, as shown in the code below.

RIT scores are new to me because I've worked mostly with the 3-parameter model until recently. I've noticed that a lot psychometricians who use item response theory (IRT) fall into one of two camps: Rasch or 3PL. My personal preference is somewhere in the middle. I think 1) it's unrealistic to assume items discriminate equally and 2) estimates of a are useful for test construction and adaptive selection. The c parameter, on the other hand, is costly to estimate and not entirely essential. So I guess you could say I'm a dispassionate 2PL guy.

> #Simulate some thetas (z-scores).
> thetas <- sort(rnorm(20))
> thetas
[1] -1.34485345 -1.08206871 -1.04673807 -0.98990444
[5] -0.96802821 -0.81766629 -0.56671883 -0.18795885
[9] -0.17779446 -0.08314090 -0.04615256  0.13611619
[13]  0.13796440  0.43172163  0.47472683  0.57965345
[17]  0.64315758  0.71571489  1.10205842  1.16067809
>
> #Convert thetas to probabilities then logits.
> p <- pnorm(thetas)
> logits <- log(p / (1 - p))
>
> #Convert logits to RITs.
> RITs <- logits * 10 + 200
> RITs
[1] 176.7823 181.8148 182.4653 183.5001 183.8947 186.5552
[7] 190.8243 196.9958 197.1587 198.6728 199.2634 202.1739
[13] 202.2035 206.9477 207.6533 209.3914 210.4565 211.6874
[19] 218.5559 219.6538
>
> #Place RITs on normal theta metric.
> funk.RITs.thetas <- function(x) {
+   qnorm(plogis((x - 200) / 10))
+ }
> funk.RITs.thetas(RITs)
[1] -1.34485345 -1.08206871 -1.04673807 -0.98990444
[5] -0.96802821 -0.81766629 -0.56671883 -0.18795885
[9] -0.17779446 -0.08314090 -0.04615256  0.13611619
[13]  0.13796440  0.43172163  0.47472683  0.57965345
[17]  0.64315758  0.71571489  1.10205842  1.16067809


Urban canoeing: Dining at The Sample Room

I finally found some time to upgrade my Wike canoe trailer. As I mentioned in an earlier post, the trailer had a hard time handling Scrappy, my 17' Alumacraft canoe. I added a U-bolt to adequately secure the bow, and I reinforced the aluminum axle with an iron rod to keep it from bending. It's working well enough to resume canoeing around Minneapolis without a car shuttle.

Amy and I took the trailer down Saint Anthony Parkway to the Mississippi River for some paddling at dusk. We put in at the Camden Railroad Bridge. The moon was huge, so we had no trouble navigating after the sun went down. You can see the moon above Psycho Suzi's Motor Lounge in the picture below.

We got hungry on the river, so we stopped at one of our favorite spots in Northeast: The Sample Room. The restaurant conveniently has its own dock and stairway up the steep bank of the river. Our server even comped us a drink for canoeing in with our life jackets! After dining on the patio, we continued down the Mississippi to Boom Island Park and rode our bikes home from there. It was a fun way to spend an evening in Minneapolis!

Urban canoeing: North/Northeast Minneapolis

Back in July, my friend, Joe, and I were laid off during the state government shutdown. It was a stressful time, not knowing if or how the shutdown would end and having to cut back on expenses to make up for lost income. We decided to put our worries behind us for a day and make the most of our time off by taking a canoe trip down the Mississippi River.

We started under a rainy sky in North Minneapolis, at the Camden Avenue bridge boat launch. I had never paddled the northern section of the Mississippi as it passes through Minneapolis. It's an industrial stretch of river, but plans have been developed to transform it into a more attractive and accessible destination, much like the river's character further downstream.

A highlight of our trip occurred just above the uppermost lock on the Mississippi River, near Saint Anthony Falls. As we explored the northern channel around Nicollet Island, we paddled past a ceremony honoring the ratifications of an Urban Conservation Treaty for Migratory Birds attended by Mayor R.T. Rybak and several other environmental enthusiasts. A local news station filmed us paddling by to highlight the outdoor recreation potential of urban rivers and the importance of conserving them. Check out my sweet J-strokes!

The City of Minneapolis recently held a design competition for the stretch "Above the Falls." The video accompanying the winning proposal is below. As a resident of Northeast Minneapolis (river left), it's thrilling to envision an inviting riverfront right in my neighborhood.

Creating accessible tables and graphs with R

I was recently given the task of improving the accessibility of a statistical report, with the goal of meeting level AA of the Web Content Accessibility Guidelines 2.0. The report contains about 600 pages of tables and graphs in a DOC file. The rule of thumb for tables is to ensure that a visually impaired person can use the tab key to navigate easily across cells in a row from the top left of the table to the bottom right. Additionally, tables and graphs should contain alternative titles and descriptions.

Last year, I produced the report with the packages xtable and R2HTML. James Falkofske notes that hypertext markup language (HTML) files are inherently fluid, allowing readers with a wide range of abilities and assistive technologies to access the content. HTML tables with little or no nesting generally meet the navigation guidelines, even when opened with Word and saved as a word processor document. However, alternative titles and descriptions are not applied to tables and graphs automatically.

It would be very time consuming to retroactively add alternative text to 600 pages of tables and graphs in Word, so I decided to look for ways to modify the defaults in the xtable and R2HTML packages. The example below shows that it's possible to add alternative text while writing tables and graphs to a HTML file. For tables, the goal is to add title and summary attributes to the <TABLE> tag. Those attributes will show up as alternative text when the HTML file is opened in Word. For graphs, the goal is to add title and "alt" attributes to the <IMG> tag. The graph's "alt" attribute will carry over to the alternative description field when the HTML file is opened in Word, but the title won't. However, adding title attributes will allow web browsers to display a float-over title for graphs as well as tables. Try it out by floating your cursor over the table and graph below.

########################################
library(xtable) #for tables
library(R2HTML) #for graphs
library(datasets) #for data

########################################
#Initiate a HTML file to export results.

#The file can be named with a DOC extension to facilitate opening it with Word.
name.file.htm <- "Accessible Tables and Graphs.doc"
HTML.title(
"Add alternative titles and descriptions to tables and graphs in a HTML-based DOC file",
file = name.file.htm, HR = 1, append = F)

########################################
#Accessible table example

#Assign alternative title and description to a single object.
html.table.attributes <- paste('border=1',
'title="Table of days of frost and years of life expectancy"',
'summary="Table of days of frost and years of life expectancy"')

#The xtable(caption) argument places the title in the first row of the table, making it
#more difficult to designate table headers to repeat across multiple pages in a document.
#Produce the table title with HTML.title() instead.
HTML.title("Days of frost and years of life expectancy", file = name.file.htm, HR = 3)

#Create table and write to file.
data.frost.exp <- as.data.frame(state.x77[, c("Frost", "Life Exp")])
names(data.frost.exp)[2] <- "Life"
table.xtable <- xtable(data.frost.exp, digits = c(NA, 0, 2))
print(table.xtable, type = "html", html.table.attributes = html.table.attributes,
file = name.file.htm, border = 1, append = T)
HTML("<br>", file = name.file.htm)

########################################
#Accessible graph example

#Modify HTMLInsertGraph() by adding Title and Alt arguments.
HTMLInsertGraph <- function (GraphFileName = "", Caption = "",
Title = "untitled graph", Alt = "a graph",
GraphBorder = 1, Align = "center",
WidthHTML = 500, HeightHTML = NULL,
file = get(".HTML.file"), append = TRUE, ...) {
cat("\n", file = file, append = append, ...)
cat(paste("<p align=", Align, "><img src='", GraphFileName,
"' title='", Title, "' alt='", Alt, "' border=", GraphBorder,
if (!is.null(WidthHTML))
paste(" width=", WidthHTML, sep = "")
else "",
if (!is.null(HeightHTML))
paste(" height=", HeightHTML, sep = "")
else "", ">", sep = "", collapse = ""),
file = file, append = TRUE, sep = "")
if (Caption != "")
cat(paste("<br><i class=caption>", Caption, "</i>"),
file = file, append = TRUE, sep = "")
invisible(return(TRUE))
}

#Assign alternative title and description to objects.
Title <- "Graph of days of frost and years of life expectancy"
Alt <- "Graph of days of frost and years of life expectancy"

name.file.plot <- "Days of frost and years of life expectancy.png"
png(name.file.plot, width = 350, height = 350)
plot(data.frost.exp, main = "Days of frost and years of life expectancy", family = "serif")
abline(lm(Life ~ Frost, data.frost.exp), col = "red")
dev.off()
HTMLInsertGraph(GraphFileName = name.file.plot, file = name.file.htm, Align = "left",
WidthHTML = 350, HeightHTML = 350, append = T, Title = Title, Alt = Alt)
HTML("<br>", file = name.file.htm)


Days of frost and years of life expectancy

Frost Life
Alabama 20 69.05
Arizona 15 70.55
Arkansas 65 70.66
California 20 71.71
Connecticut 139 72.48
Delaware 103 70.06
Florida 11 70.66
Georgia 60 68.54
Hawaii 0 73.60
Idaho 126 71.87
Illinois 127 70.14
Indiana 122 70.88
Iowa 140 72.56
Kansas 114 72.58
Kentucky 95 70.10
Louisiana 12 68.76
Maine 161 70.39
Maryland 101 70.22
Massachusetts 103 71.83
Michigan 125 70.63
Minnesota 160 72.96
Mississippi 50 68.09
Missouri 108 70.69
Montana 155 70.56
New Hampshire 174 71.23
New Jersey 115 70.93
New Mexico 120 70.32
New York 82 70.55
North Carolina 80 69.21
North Dakota 186 72.78
Ohio 124 70.82
Oklahoma 82 71.42
Oregon 44 72.13
Pennsylvania 126 70.43
Rhode Island 127 71.90
South Carolina 65 67.96
South Dakota 172 72.08
Tennessee 70 70.11
Texas 35 70.90
Utah 137 72.90
Vermont 168 71.64
Virginia 85 70.08
Washington 32 71.72
West Virginia 100 69.48
Wisconsin 149 72.48
Wyoming 173 70.29

Best breweries in Minnesota: Value-added ratings

| 1 Comment

A good friend of mine told me to check out Beer Advocate online. Their rating system does a good job of balancing simplicity (A-F grades) with thoroughness (measures of central tendency and dispersion). I also appreciate the way it encourages raters and readers to consider multiple dimensions of a beer's quality--its look, smell, taste, and feel--in addition to overall quality.

As a practitioner of multilevel and latent variable modeling, one aspect of Beer Advocate's rating system has given me pause: a lack of adjustment for consistently high- or low-raters. What's concerning about a simple average of ratings? Let's imagine a plausible scenario: a motivated group of beer aficionados with high standards get their hands on a limited-release beer and publish their reviews on Beer Advocate. The beer could be world class, but if only tough raters have tried it, then the average rating will appear lower. Conversely, a large number of uncritical raters could review a mediocre beer, resulting in an excellent rating on average.

Is my concern warranted? If so, which beers and breweries are rated highest after adjusting for raters and other factors? To answer those questions, I wrote some code to gather Beer Advocate ratings of beers produced by Minnesota breweries. The code relies heavily on the XML and stringr packages. After gathering the ratings, I decided to check my assumption of within-rater consistency. The intraclass correlation coefficient of r = 0.13 indicates a small degree of consistency within raters, but enough to justify my concern.

I specified a multilevel model to obtain value-added ratings of beers and breweries while adjusting for rater and beer characteristics. For simplicity, I decided to focus on the overall rating, although I would like to consider multiple dimensions of beer quality (look, smell, taste, and feel) in future analyses. The model specifies random effects for raters, cross-classified with beers nested in breweries. The fixed effects were the serving type experienced by the rater for a given beer (e.g., can), the beer's alcohol by volume (ABV), and its style (e.g., porter).

\begin{align*} \small{\text{Level 1: Ratings}}\\ \text{ln}\left(\tfrac{(Overall_{ijkl}-1)\div 4}{1-(Overall_{ijkl}-1)\div 4}\right) &= \beta_{0jkl} + \mathbf{Served}_{ij} \boldsymbol{\beta}_1 + e_{ijkl}\\ \small{\text{Level 2: Raters and beers}}\\ \beta_{0jkl} &= \beta_{00l} + \beta_{01} ABV_k + \mathbf{Style}_k \boldsymbol{\beta}_{02} + r_{0j} + r_{0kl}\\ \small{\text{Level 3: Breweries}}\\ \beta_{00l} &= \beta_{000} + r_{00l} \end{align*}.

Note that I transformed the overall rating from a 1-5 scale to logits. I did so to avoid predictions below the floor of 1 point and above the ceiling of 5 points on the original scale (i.e., to enforce lower and upper asymptotes) and to spread the ratings out at the tails (e.g., make it "harder" to go from 4.5 to 5 than going from 3.5 to 4). The chart below shows the resulting ogival relationship between the original point system and the logits.

Plot showing transformation from points to logits

I estimated the model parameters with the lme4 package. Given the large number of fixed effects (and the need for future blog content), I'll describe the influence of serving type, ABV, and style at a later date. The value added by breweries after adjusting for rater and beer characteristics is shown in the table below. Value-added logits were transformed back to the original scale, with values in the table representing point deviations from the brewery average, but the logit and point ratings are virtually the same. August Schell Brewing Company recently won national accolades, but it came in second to a relative newcomer, Fulton Beer. Summit Brewing and Surly Brewing followed closely in the third and fourth spots. The results show some discrepancies between the value-added ratings and Beer Advocate's grades, which are based on the simple average of all beer ratings. For example, Harriet Brewing received an A-, compared to Schell's B grade, but Harriet's value-added rating is below average.

(average beer rating)
Fulton Beer 0.85 0.80 A-
August Schell Brewing Co., Inc. 0.83 0.79 B
Summit Brewing Company 0.80 0.76 B+
Surly Brewing Company 0.71 0.68 A-
Minneapolis Town Hall Brewery 0.27 0.27 A-
Lake Superior Brewing Company 0.18 0.18 B
Fitger's Brewhouse 0.10 0.10 B+
Olvalde Farm & Brewing Company 0.08 0.08 A-
Granite City Food & Brewery 0.04 0.04 B-
Flat Earth Brewing Company 0.03 0.03 B+
Brau Brothers Brewing Co., LLC 0.02 0.02 B
Pig's Eye Brewing Company 0.00 0.00 D+
Harriet Brewing -0.09 -0.09 A-
Lift Bridge Brewery -0.11 -0.11 B+
Leech Lake Brewing Company -0.14 -0.14 B+
Barley John's Brew Pub -0.26 -0.26 B+
McCann's Food & Brew -0.27 -0.26 B+
St. Croix Brewing Company, LLC -0.69 -0.66 C
Cold Spring Brewing Co. -0.99 -0.91 D+
Great Waters Brewing -1.37 -1.19 B+

A different picture emerges when one considers that value-added ratings are point estimates within a range of certainty. The prediction-interval plot below shows that we can't confidently conclude that Fulton Beer scored above average, let alone out performed Shell's, Summit, or Surly. We can confidently say, though, that Shell's, Summit, and Surly scored above average, and along with Minneapolis Town Hall Brewery, they significantly out-performed Cold Spring Brewing Co. and Great Waters Brewing.

Plot of breweries' value-added ratings with 95% prediction intervals

The results highlight some of the benefits of multilevel modeling and the limits of value-added ratings. By controlling for beer characteristics and separately estimating variance components for raters and beers nested in breweries, I have reduced the degree to which raters' tastes and brewers' preferences for certain types of beers are confounding brewery ratings. That is, the value-added ratings reflect the brewery's overall quality with greater fidelity. Nevertheless, we should keep in mind that value-added ratings are based on error unexplained by theory/fixed effects, that they are point estimates within a range of certainty, and that one should interpret value-added ratings with caution because personal preferences still count. This analysis has motivated me to try beers by the breweries with high value-added ratings, but I know from experience that I like beers brewed by Brau Brothers, Lift Bridge, and Barley John's, even though their value-added point estimates place them in the bottom half of Minnesota breweries.

Using irtoys, plyr, and ggplot2 for test development

My job as a Quantitative Analyst at the Minnesota Department of Education requires me to "provide evaluation, statistical, and psychometric support to the Division of Research and Assessment through the entire process of design, development, and reporting to the Minnesota assessment system." That's an accurate job description, but a little vague. What I really do is write a lot of code to apply item response theory (IRT) and compare item and test characteristics across multiple years and/or forms to maintain high quality.

is great for plotting item and test characteristics across multiple years and/or forms, but it requires using a few packages in concert (i.e., to my knowledge there is no single package designed for this purpose). The irtoys package provides many useful functions for test development under IRT; the plyr package allows one to apply irtoys' functions to subsets of item parameters in a data set (e.g., by strand); and the ggplot2 package allows one to overlay and facet item and test curves.

In the following reproducible example, I graphically compare item and test characteristics of the 2009 and 2010 grade 3 Minnesota Comprehensive Assessments of reading ability. Note that for simplicity, I have excluded two polytomous items from 2009 and a few dichotomous items from 2010 from the published item parameter estimates. Also note that items have been assigned to strands arbitrarily in this example. Graphs such as these allow us to answer questions like, "Did the 2010 form yield as much information (i.e., measure with equal or less error) at the cut scores as the 2009 form?" The graphs also provide guidance for replacing undesirable items.


########################################
#Combine two years' worth of item parameter estimates from grade 3.

data.pars.2009 <- matrix(ncol = 3, byrow = T, c(0.57, -1.24, 0.17, 0.93, -0.74, 0.25,
1.45, -0.45, 0.19, 0.97, -0.5, 0.13, 0.95, -0.8, 0.62, 1.07, -0.38, 0.24, 0.87,
0.37, 0.22, 0.61, -1.45, 0.02, 0.84, -1.03, 0.19, 0.72, -0.23, 0.12, 0.93, -1.6,
0.25, 0.68, -2.02, 0.06, 1.1, -1.5, 0.36, 0.74, -1.6, 0.51, 1.01, -1.01, 0.21,
0.81, -1.52, 0.16, 0.93, -0.5, 0.11, 0.34, -0.75, 0.02, 0.92, -0.93, 0.19, 1.14,
0.21, 0.26, 0.54, -0.59, 0.1, 0.86, -0.52, 0.28, 1.04, -1.77, 0.05, 0.84, -1.68,
0.02, 1.46, -1.7, 0.17, 1.3, -1.16, 0.21, 0.51, -0.62, 0.1, 1.09, -1.04, 0.17,
0.66, -2.06, 0.09, 1.11, -0.61, 0.11, 1.34, -0.9, 0.26, 1.3, -1.24, 0.19, 1.37,
-1.72, 0.24, 1.09, -1.37, 0.16, 0.89, -0.94, 0.08, 1.24, -1.38, 0.14, 0.79, -1.32,
0.11, 1.09, -1.69, 0.13))
data.pars.2009 <- data.frame(2009, data.pars.2009)
names(data.pars.2009) <- c("year", "a", "b", "c")
data.pars.2009$strand <- factor(rep(1:4, length.out = 38), labels = paste("Strand", 1:4)) data.pars.2010 <- matrix(ncol = 3, byrow = T, c(1, -2.02, 0.05, 0.61, -0.88, 0.24, 0.74, -1.36, 0.08, 0.93, 0.63, 0.25, 0.94, 0.38, 0.23, 0.56, -0.14, 0.32, 0.7, -0.73, 0.08, 1.1, -2.54, 0.02, 1.17, -0.36, 0.22, 0.46, 0.76, 0.09, 0.77, -1.53, 0.09, 1.5, -1.27, 0.18, 1.01, -1.41, 0.1, 1.63, -1.61, 0.21, 1.33, -1.63, 0.13, 1.11, -1.01, 0.19, 1.05, -1.17, 0.07, 0.88, -0.33, 0.23, 0.56, -0.65, 0.28, 1.03, 0.45, 0.13, 1.29, -1.73, 0.12, 0.96, -1.54, 0.15, 0.62, -1.33, 0.09, 0.67, -1.41, 0.06, 0.74, 0.16, 0.3, 1.33, -0.94, 0.26, 1.31, -1.71, 0.23, 1.14, -1.42, 0.16, 0.96, -0.87, 0.09, 1.22, -1.36, 0.14, 0.86, -1.19, 0.17, 0.94, -2.48, 0.02, 0.81, -1.96, 0.02, 0.98, -0.86, 0.13, 0.8, -0.68, 0.11, 0.72, -1.67, 0.04, 0.63, 0.32, 0.16, 1.24, -1.64, 0.33)) data.pars.2010 <- data.frame(2010, data.pars.2010) names(data.pars.2010) <- c("year", "a", "b", "c") data.pars.2010$strand <- factor(rep(1:4, length.out = 38), labels = paste("Strand", 1:4))

data.pars <- rbind(data.pars.2009, data.pars.2010)
data.pars$year <- factor(data.pars$year)
data.pars$id <- factor(1:nrow(data.pars)) #Apply D scaling constant. data.pars$a <- data.pars$a * 1.7 #Theta values for plotting thetas <- seq(-3, 3, by = 0.1) #Theta cut scores cuts.grade3 <- c(-1.40, -0.84, -0.01) ######################################## #Plot overlapping test characteristic curves. funk <- function(z) with(trf(z[, c("a", "b", "c")], x = thetas), data.frame(Ability = x, f)) data.plot <- ddply(data.pars, "year", funk) g1 <- ggplot(data.plot, aes(x = Ability, y = f)) + geom_vline(xintercept = cuts.grade3, linetype = 2, lwd = 1, color = "darkgrey") + geom_line(aes(color = year, linetype = year), lwd = 1) + scale_colour_manual(name = "Year", values = c("darkgoldenrod1", "darkblue")) + scale_linetype_manual(name = "Year", values = 2:1) + scale_y_continuous(name = "Expected score") + opts(title = "Test characteristic curves") g1  The test characteristic curves appear comparable up to the "exceeds proficiency standard" cut score, above which the test became more difficult compared to 2009.  ######################################## #Plot overlapping conditional standard errors of measurement (from test information). funk <- function(z) with(tif(z[, c("a", "b", "c")], x = thetas), data.frame(Ability = x, f)) data.plot <- ddply(data.pars, "year", funk) data.plot$f <- 1/sqrt(data.plot$f) g1 <- ggplot(data.plot, aes(x = Ability, y = f)) + geom_vline(xintercept = cuts.grade3, linetype = 2, lwd = 1, color = "darkgrey") + geom_line(aes(color = year, linetype = year), lwd = 1) + scale_colour_manual(name = "Year", values = c("darkgoldenrod1", "darkblue")) + scale_linetype_manual(name = "Year", values = 2:1) + scale_y_continuous(name = "CSEM") + opts(title = "Conditional standard errors of measurement") g1  The conditional standard errors of measurement appear comparable, overall, except for the 2010 standard error at the exceeds-proficiency cut, which appears to slightly exceed the 0.25 rule of thumb.  ######################################## #Plot overlapping CSEMs faceted by strand. funk <- function(z) with(tif(z[, c("a", "b", "c")], x = thetas), data.frame(Ability = x, f)) data.plot <- ddply(data.pars, c("year", "strand"), funk) data.plot$f <- 1/sqrt(data.plot$f) g1 <- ggplot(data.plot, aes(x = Ability, y = f)) + geom_vline(xintercept = cuts.grade3, linetype = 2, lwd = 1, color = "darkgrey") + geom_line(aes(color = year, linetype = year), lwd = 1) + scale_colour_manual(name = "Year", values = c("darkgoldenrod1", "darkblue")) + scale_linetype_manual(name = "Year", values = 2:1) + facet_wrap( ~ strand) + scale_y_continuous(name = "CSEM") + opts(title = "Conditional standard errors of measurement") g1  It looks like strand 3 could use a highly discriminating difficult item to bring its CSEM in line with the corresponding 2009 CSEM at the exceeds-proficiency cut.  ######################################## #Plot item characteristic curves faceted by strand. funk <- function(z) with(irf(z[, c("a", "b", "c")], x = thetas), data.frame(Ability = x, f)) data.plot <- ddply(data.pars, c("id", "year", "strand"), funk) g1 <- ggplot(data.plot, aes(x = Ability, y = f)) + geom_vline(xintercept = cuts.grade3, linetype = 2, lwd = 1, color = "darkgrey") + geom_line(aes(group = id:year, color = year, linetype = year)) + scale_colour_manual(name = "Year", values = c("darkgoldenrod1", "darkblue")) + scale_linetype_manual(name = "Year", values = 2:1) + facet_wrap( ~ strand) + scale_y_continuous(name = "Probability", breaks = seq(0, 1, by = 0.25), limits = c(0, 1)) + opts(title="Item characteristic curves") g1  Compared to 2009, there are fewer items from strands 1 and 2 that students are able to guess easily (i.e., high c parameters/lower asymptotes). As noted above, strand 3 lacks an item that discriminates at the exceeds-proficiency cut. Replacing the poorly discriminating difficult item in strand 2 with a higher discriminating one would also reduce measurement error of high-ability students. My new nephews are way above average | No Comments My new nephews, Jacob and Jadon, are way above average. And though they don't even live in Lake Wobegon! See, it says so right on their onesies that I got from NausicaaDistribution. Eastside Food Co-op goes solar | No Comments I love Eastside Food Co-op. They're in my neighborhood and stock all my grocery needs. They show informative movies like "Troubled Waters: A Mississippi River Story," which the University of Minnesota helped create but balked at showing. They hold community forums to discuss issues like pollution at Shoreham Yards. And they've welcomed Recovery Bike Shop into the building. A lot of great things are happening at Eastside Food Co-op. To top it all off, Eastside Food Co-op has gone solar! Their solar panels began operating on December 8 and have generated over 160 killowat hours so far (see chart below). At at time when the Minnesota Legislature is considering nuclear power expansion, I am proud to belong to an organization that is investing in clean solar energy. Anniversary trip: Lake Pepin and Zumbro River | No Comments Amy and I celebrated our eighth anniversary with a trip to southeastern Minnesota. The forecast called for warm and sunny weather, so we decided to camp and canoe. We wanted to camp at Frontenac State Park to get some more mileage out of our state park pass, but the fall colors and warm weather helped fill up Frontenac's campground. One of the park rangers recommended Hok-Si-La Campground in Lake City. It was a fantastic campground with well-kept facilities and campsites right on Lake Pepin's shore. During breakfast in our campsite we saw several species of birds, including a bald eagle and a downy woodpecker. For our canoe trip, we chose a section of the Zumbro River from Theillman to river mile 13 near Dumfries. We chose that route because much of it is protected as part of the Richard J. Dorer Memorial Hardwood State Forest. Riverland Outfitters shuttled us to the put-in. A month before our trip, flooding devastated towns along the Zumbro River. Although the waters had receded well below flood stage, the river was still running at a brisk pace. It took us about six hours to cover 13 river miles on the Buffalo and Nemadji Rivers, but it only took about three hours on the Zumbro. We enjoyed the views of the hillsides and farms, the sand bars, and dodging all the submerged trees in the river. According to Paddling Minnesota, the Zumbro derives its name from the French pronunciation of "embarrassment" caused by the river's obstacles. After getting off the water, we walked around downtown Wabasha before heading across the river to Pepin, Wisconsin, for an anniversary dinner at the Harbor View Cafe. The cafe was hopping. It took about two hours to get a table. We didn't mind because it gave us a chance to walk around the harbor and listen to some live jazz and folk music outside the pub next door. Harbor View Cafe was overpriced for the quality of the food and service, but the overall experience of visiting and dining in Pepin made it worthwhile. The great company and occasion made for a memorable experience. Urban canoeing: Lake Phalen to Gervais Lake | No Comments Here are some pictures from a nice little canoeing trip we took with our friend, Jim. He owns a beautiful wood canvas canoe. The three of us became friends at YMCA Camp Widjiwagan in Ely, where wood canvas canoes are used as part of the curriculum to teach young people about teamwork and environmental stewardship. In addition to being easy on the eye, wood canvas canoes are very quiet on the water--much quieter than Scrappy, our aluminum canoe. It was nice to spend some time in the Payne-Phalen neighborhood of Saint Paul. We hardly make it over there. We put in at Lake Phalen, padded through Phalen Park, portaged around a bridge repair project to Round Lake, paddled along a scenic creek leading to Keller Lake, and then paddled under Highway 36 into Gervais Lake. It was getting dark, so we didn't spend much time on Gervais Lake before heading back to the put-in. People gush about the Minneapolis chain of lakes, but the Phalen-Gervais chain of lakes is quite nice, too. Some highlights: water fowl, families enjoying the park, ice cream truck, many bridges, kayakers practicing rolling, fall leaves, quiet canoe, spending time with a good friend. Plotting survey responses representing three or more dimensions | No Comments Bob Williams recently posted a question to EVALTALK about open source software for plotting survey responses in three dimensions: I'm wondering how to display Likert scale data in three dimensions (ie the Likert scores for each subject on 3 scale dimensions). My graphic software is not that sophisticated and I haven't been able to find an open source equivalent.... that can plot something along three dimensions (X,Y,Z) like this:  X, Y, Z Subject A 2, 5, 8 Subject B 3, 5, 9 Subject C 3, 7, 3 Another evaluator, Bill Harris, recommended and ggplot2 in particular. Bob's question and the mention of inspired me to write up a brief demonstration of plotting multi-dimensional data. As I have mentioned in the past (see here, here, and here), plotting or mapping three or more dimensions can be difficult and potentially misleading because plots and maps are inherently two-dimensional. I would say that plotting three survey response dimensions actually requires plotting four dimensions: X, Y, Z, and the frequency of each X, Y, and Z combination. That's because evaluators commonly have large data sets with repeated response patterns. I simulated a larger data set from Bob's example in order to illustrate plotting four dimensions. #Open a recording window (if using Windows). windows(record=T) #Manually read in the data. data.3D <- data.frame(Subject = c("A", "B", "C"), X = c(2, 3, 3), Y = c(5, 5, 7), Z = c(8, 9, 3)) data.3D #Larger data sets will contain frequently repeated combinations of responses. #Simulate a larger data set with repeated responses combinations. data.4D <- data.3D for(i in 1:200) { temp <- data.frame(Subject=factor(paste(as.character(data.3D[, 1]), i, sep=".")), X=round(jitter(data.3D[, 2], 7), 0), Y=round(jitter(data.3D[, 2], 7), 0), Z=round(jitter(data.3D[, 4], 7), 0)) data.4D <- rbind(data.4D, temp) } library(plyr) data.4D <- ddply(data.4D, c("X", "Y", "Z"), "nrow") names(data.4D)[which(names(data.4D)=="nrow")] <- "Frequency" data.4D <- data.4D[order(data.4D$Frequency, decreasing=T), ]

#Scatterplot matrices
pairs(data.3D[2:4], xlim=c(0, 10), ylim=c(0, 10),
main="Responses to three Likert scale items")
library(psych)
pairs.panels(sapply(data.4D[1:3], jitter), xlim=c(0, 10), ylim=c(0, 10), ellipses=F,
hist.col="blue", pch=1, scale=T, main="Responses to three Likert scale items")


#Load ggplot2 library; set theme without gray background.
library(ggplot2)
theme_set(theme_bw())

#Create a scatterplot with size and color varying by the third dimension.
plot1 <- ggplot(data=data.3D, aes(x=X, y=Y, color=Z, size=Z)) +
geom_point() +
xlim(0, 10) +
ylim(0, 10) +
scale_colour_gradient(breaks=0:10, limits=c(0, 10), low="lightblue", high="darkblue") +
scale_size_continuous(breaks=0:10, limits=c(0, 10)) +
opts(title="Responses to three Likert scale items")
plot1

#Create a jittered scatterplot with color varying by the third dimension
#and size by response combination frequency.
plot2 <- ggplot(data=data.4D, aes(x=X, y=Y, color=Z, size=Frequency)) +
geom_jitter() +
xlim(0, 10) +
ylim(0, 10) +
scale_colour_gradient(breaks=0:10, limits=c(0, 10), low="lightblue", high="darkblue") +
scale_size_continuous() +
opts(title="Responses to three Likert scale items")
plot2


#Reshape data from wide to longer.
data.3D.longer <- melt(data.3D, id.vars=c("Subject", "Y"), variable_name="Item")
names(data.3D.longer)[which(names(data.3D.longer)=="value")] <- "Response"
data.3D.longer

#Create a scatterplot with faceted second and third dimensions.
plot3 <- ggplot(data=data.3D.longer, aes(x=Response, y=Y)) +
geom_point(color="blue") +
xlim(0, 10) +
ylim(0, 10) +
facet_grid(. ~ Item) +
opts(title="Responses to three Likert scale items")
plot3

#Reshape simulated data from wide to longer.
data.4D.longer <- melt(data.4D, id.vars=c("Y", "Frequency"), variable_name="Item")
names(data.4D.longer)[which(names(data.4D.longer)=="value")] <- "Response"

#Create a scatterplot with faceted second and third dimensions
#and size varying by response combination frequency.
plot4 <- ggplot(data=data.4D.longer, aes(x=Response, y=Y, size=Frequency)) +
geom_jitter(color="blue") +
xlim(0, 10) +
ylim(0, 10) +
facet_grid(. ~ Item) +
opts(title="Responses to three Likert scale items")
plot4


KFAI rocks

I want to give a shout-out to KFAI, Fresh Air Community Radio. I listen to KFAI just about every day. And every day I am impressed by the station's diverse programming and skilled volunteer DJs. I urge everyone to tune into 90.3 in Minneapolis, 106.7 in Saint Paul, or on the Web at http://www.kfai.org. Then support them with a contribution. Here's a list of my favorite shows to get you started:

Linear mixed-effects regression p-values in R: A likelihood ratio test function

Following Douglas Bates' advice for those required to produce p-values for fixed-effects terms in a mixed-effects model, I wrote a function to perform a likelihood ratio test for each term in an lmer() object. Bates has championed the notion that calculating the p-values for fixed effects is not trivial. That's because with unbalanced, multilevel data, the denominator degrees of freedom used to penalize certainty are unknown (i.e., we're uncertain about how uncertain we should be). As the author of lme4, the foremost mixed-effects modeling package in , he has practiced what he preaches by declining to approximate denominator degrees of freedom as SAS does. Bates contends that alternative inferential approaches make p-values unnecessary. I agree with that position, focusing instead on information criteria and effect sizes. However, as a program evaluator, I recognize that some stakeholders find p-values useful for understanding findings.

A likelihood ratio test can be used to test $\inline H_0:\beta=0$ if the sample size is large. According to Fitzmaurice, Laird, and Ware, twice the difference between the maximized log-likelihoods of two nested models, $\inline G^2 = 2(\hat l_\text{full}-\hat l_\text{reduced})$, represents the degree to which the reduced model is inadequate. When the sample size is large, $\inline G^2 \sim \chi^2$ with degrees of freedom equal to the difference in parameters between the full and reduced models, $\inline m_\text{full} - m_\text{reduced}$. The p-value is $\inline \text{Pr} ( \chi^2 > G^2 | H_0)$.

What if the sample size is not large? A p-value based on $\inline \chi^2$ will be too liberal (i.e., the type I error rate will exceed the nominal p-value). More conservatively, we might say that $\inline G^2 \sim F$ with $\inline {m_\text{full} - m_\text{reduced}$ numerator and $\inline N_\text{effective}-m_\text{full}-1$ denominator degrees of freedom. According to Snijders and Bosker, the effective sample size lies somewhere between $\inline M$ total micro-observations (i.e., at level one) and $\inline N$ clusters randomly sampled in earlier stages (i.e., at higher levels). Formally, the effective sample size is $\inline N_\text{effective}=\tfrac{Nn}{1+(n-1)\rho}$, where $\inline n$ observations are nested within each cluster and intraclass correlation is $\inline \rho=\tfrac{\tau^2}{\tau^2+\sigma^2}$. Even if $\inline M=Nn$ is large, $\inline N_\text{effective}$ (and statistical power) could be quite small if $\inline N$ is small and $\inline \rho}$ is large: $\inline \tfrac{5*100}{1+(100-1)0.8}=6.2$. Unbalanced designs, modeling three or more levels, and cross-level interactions add to our uncertainty about the denominator degrees of freedom.

The function I wrote chews up the lmer() model call and concatenates the frame and model matrix slots, after which it iteratively fits (via maximum likelihood instead of restricted ML) models reduced by each fixed effect and compares them to the full model, yielding a vector of p-values based on $\inline \chi^2 (1)$. As the example shows, the function can handle shortcut formulas whereby lower order terms are implied by an interaction term. The function doesn't currently handle weights, glmer() objects, or on-the-fly transformations of the dependent variable [e.g., log(dep.var) ~ ...]. The accuracy of resulting p-values depends on large sample properties, as discussed above, so I don't recommend using the function with small samples. I'm working on another function that will calculate p-values based on the effective sample size estimated from intraclass correlation. I will post that function in a future entry. I'm sure the following function could be improved, but I wanted to go ahead share it with other applied researchers whose audience likes p-values. Please let me know if you see ways to make it better.

p.values.lmer <- function(x) {
summary.model <- summary(x)
data.lmer <- data.frame(model.matrix(x))
names(data.lmer) <- names(fixef(x))
names(data.lmer) <- gsub(pattern=":", x=names(data.lmer), replacement=".", fixed=T)
names(data.lmer) <- ifelse(names(data.lmer)=="(Intercept)", "Intercept", names(data.lmer))
string.call <- strsplit(x=as.character(x@call), split=" + (", fixed=T)
var.dep <- unlist(strsplit(x=unlist(string.call)[2], " ~ ", fixed=T))[1]
vars.fixef <- names(data.lmer)
formula.ranef <- paste("+ (", string.call[[2]][-1], sep="")
formula.ranef <- paste(formula.ranef, collapse=" ")
formula.full <- as.formula(paste(var.dep, "~ -1 +", paste(vars.fixef, collapse=" + "),
formula.ranef))
data.ranef <- data.frame(x@frame[,
which(names(x@frame) %in% names(ranef(x)))])
names(data.ranef) <- names(ranef(x))
data.lmer <- data.frame(x@frame[, 1], data.lmer, data.ranef)
names(data.lmer)[1] <- var.dep
out.full <- lmer(formula.full, data=data.lmer, REML=F)
p.value.LRT <- vector(length=length(vars.fixef))
for(i in 1:length(vars.fixef)) {
formula.reduced <- as.formula(paste(var.dep, "~ -1 +", paste(vars.fixef[-i],
collapse=" + "), formula.ranef))
out.reduced <- lmer(formula.reduced, data=data.lmer, REML=F)
print(paste("Reduced by:", vars.fixef[i]))
print(out.LRT <- data.frame(anova(out.full, out.reduced)))
p.value.LRT[i] <- round(out.LRT[2, 7], 3)
}
summary.model@coefs <- cbind(summary.model@coefs, p.value.LRT)
summary.model@methTitle <- c("\n", summary.model@methTitle,
"\n(p-values from comparing nested models fit by maximum likelihood)")
print(summary.model)
}

library(lme4)
library(SASmixed)
lmer.out <- lmer(strength ~ Program * Time + (Time|Subj), data=Weights)
p.values.lmer(lmer.out)


Yields:

Linear mixed model fit by REML
(p-values from comparing nested models fit by maximum likelihood)
Formula: strength ~ Program * Time + (Time | Subj)
Data: Weights
AIC  BIC logLik deviance REMLdev
1343 1383 -661.7     1313    1323
Random effects:
Groups   Name        Variance Std.Dev. Corr
Subj     (Intercept) 9.038486 3.00641
Time        0.031086 0.17631  -0.118
Residual             0.632957 0.79559
Number of obs: 399, groups: Subj, 57

Fixed effects:
Estimate Std. Error   t value p.value.LRT
(Intercept)     79.99018    0.68578 116.64000       0.000
ProgramRI        0.07009    1.02867   0.07000       0.944
ProgramWI        1.11526    0.95822   1.16000       0.235
Time            -0.02411    0.04286  -0.56000       0.564
ProgramRI:Time   0.12902    0.06429   2.01000       0.043
ProgramWI:Time   0.18397    0.05989   3.07000       0.002

Correlation of Fixed Effects:
(Intr) PrgrRI PrgrWI Time   PrRI:T
ProgramRI   -0.667
ProgramWI   -0.716  0.477
Time        -0.174  0.116  0.125
ProgrmRI:Tm  0.116 -0.174 -0.083 -0.667
ProgrmWI:Tm  0.125 -0.083 -0.174 -0.716  0.477