Urban canoeing: Dining at The Sample Room

| No Comments

I finally found some time to upgrade my Wike canoe trailer. As I mentioned in an earlier post, the trailer had a hard time handling Scrappy, my 17' Alumacraft canoe. I added a U-bolt to adequately secure the bow, and I reinforced the aluminum axle with an iron rod to keep it from bending. It's working well enough to resume canoeing around Minneapolis without a car shuttle.

Amy and I took the trailer down Saint Anthony Parkway to the Mississippi River for some paddling at dusk. We put in at the Camden Railroad Bridge. The moon was huge, so we had no trouble navigating after the sun went down. You can see the moon above Psycho Suzi's Motor Lounge in the picture below.

We got hungry on the river, so we stopped at one of our favorite spots in Northeast: The Sample Room. The restaurant conveniently has its own dock and stairway up the steep bank of the river. Our server even comped us a drink for canoeing in with our life jackets! After dining on the patio, we continued down the Mississippi to Boom Island Park and rode our bikes home from there. It was a fun way to spend an evening in Minneapolis!

DSCI1039.JPG NE_Mpls_Canoe_Tour_08.JPG NE_Mpls_Canoe_Tour_09.JPG NE_Mpls_Canoe_Tour_07.JPG

Urban canoeing: North/Northeast Minneapolis

| No Comments

Back in July, my friend, Joe, and I were laid off during the state government shutdown. It was a stressful time, not knowing if or how the shutdown would end and having to cut back on expenses to make up for lost income. We decided to put our worries behind us for a day and make the most of our time off by taking a canoe trip down the Mississippi River.

We started under a rainy sky in North Minneapolis, at the Camden Avenue bridge boat launch. I had never paddled the northern section of the Mississippi as it passes through Minneapolis. It's an industrial stretch of river, but plans have been developed to transform it into a more attractive and accessible destination, much like the river's character further downstream.

DSCI0726.JPG DSCI0729.JPG
DSCI0737.JPG DSCI0734.JPG

A highlight of our trip occurred just above the uppermost lock on the Mississippi River, near Saint Anthony Falls. As we explored the northern channel around Nicollet Island, we paddled past a ceremony honoring the ratifications of an Urban Conservation Treaty for Migratory Birds attended by Mayor R.T. Rybak and several other environmental enthusiasts. A local news station filmed us paddling by to highlight the outdoor recreation potential of urban rivers and the importance of conserving them. Check out my sweet J-strokes!

The City of Minneapolis recently held a design competition for the stretch "Above the Falls." The video accompanying the winning proposal is below. As a resident of Northeast Minneapolis (river left), it's thrilling to envision an inviting riverfront right in my neighborhood.

Creating accessible tables and graphs with R

| No Comments

I was recently given the task of improving the accessibility of a statistical report, with the goal of meeting level AA of the Web Content Accessibility Guidelines 2.0. The report contains about 600 pages of tables and graphs in a DOC file. The rule of thumb for tables is to ensure that a visually impaired person can use the tab key to navigate easily across cells in a row from the top left of the table to the bottom right. Additionally, tables and graphs should contain alternative titles and descriptions.

Last year, I produced the report with the Rlogo.jpg packages xtable and R2HTML. James Falkofske notes that hypertext markup language (HTML) files are inherently fluid, allowing readers with a wide range of abilities and assistive technologies to access the content. HTML tables with little or no nesting generally meet the navigation guidelines, even when opened with Word and saved as a word processor document. However, alternative titles and descriptions are not applied to tables and graphs automatically.

It would be very time consuming to retroactively add alternative text to 600 pages of tables and graphs in Word, so I decided to look for ways to modify the defaults in the xtable and R2HTML packages. The example below shows that it's possible to add alternative text while writing tables and graphs to a HTML file. For tables, the goal is to add title and summary attributes to the <TABLE> tag. Those attributes will show up as alternative text when the HTML file is opened in Word. For graphs, the goal is to add title and "alt" attributes to the <IMG> tag. The graph's "alt" attribute will carry over to the alternative description field when the HTML file is opened in Word, but the title won't. However, adding title attributes will allow web browsers to display a float-over title for graphs as well as tables. Try it out by floating your cursor over the table and graph below.

########################################
#Load libraries.
library(xtable) #for tables
library(R2HTML) #for graphs
library(datasets) #for data

########################################
#Initiate a HTML file to export results.

#The file can be named with a DOC extension to facilitate opening it with Word.
name.file.htm <- "Accessible Tables and Graphs.doc"
HTML.title(
  "Add alternative titles and descriptions to tables and graphs in a HTML-based DOC file", 
  file = name.file.htm, HR = 1, append = F)

########################################
#Accessible table example

#Assign alternative title and description to a single object.
html.table.attributes <- paste('border=1', 
  'title="Table of days of frost and years of life expectancy"', 
  'summary="Table of days of frost and years of life expectancy"')

#The xtable(caption) argument places the title in the first row of the table, making it 
#more difficult to designate table headers to repeat across multiple pages in a document.  
#Produce the table title with HTML.title() instead.
HTML.title("Days of frost and years of life expectancy", file = name.file.htm, HR = 3)

#Create table and write to file.
data.frost.exp <- as.data.frame(state.x77[, c("Frost", "Life Exp")])
names(data.frost.exp)[2] <- "Life"
table.xtable <- xtable(data.frost.exp, digits = c(NA, 0, 2))
print(table.xtable, type = "html", html.table.attributes = html.table.attributes, 
  file = name.file.htm, border = 1, append = T)
HTML("<br>", file = name.file.htm)

########################################
#Accessible graph example

#Modify HTMLInsertGraph() by adding Title and Alt arguments.
HTMLInsertGraph <- function (GraphFileName = "", Caption = "", 
  Title = "untitled graph", Alt = "a graph", 
  GraphBorder = 1, Align = "center", 
  WidthHTML = 500, HeightHTML = NULL, 
  file = get(".HTML.file"), append = TRUE, ...) {
    cat("\n", file = file, append = append, ...)
    cat(paste("<p align=", Align, "><img src='", GraphFileName, 
      "' title='", Title, "' alt='", Alt, "' border=", GraphBorder, 
      if (!is.null(WidthHTML)) 
        paste(" width=", WidthHTML, sep = "") 
        else "", 
      if (!is.null(HeightHTML)) 
        paste(" height=", HeightHTML, sep = "")
        else "", ">", sep = "", collapse = ""), 
      file = file, append = TRUE, sep = "")
    if (Caption != "") 
      cat(paste("<br><i class=caption>", Caption, "</i>"), 
        file = file, append = TRUE, sep = "")
    invisible(return(TRUE))
}

#Assign alternative title and description to objects.
Title <- "Graph of days of frost and years of life expectancy"
Alt <- "Graph of days of frost and years of life expectancy"

name.file.plot <- "Days of frost and years of life expectancy.png"
png(name.file.plot, width = 350, height = 350)
plot(data.frost.exp, main = "Days of frost and years of life expectancy", family = "serif")
abline(lm(Life ~ Frost, data.frost.exp), col = "red")
dev.off()
HTMLInsertGraph(GraphFileName = name.file.plot, file = name.file.htm, Align = "left", 
  WidthHTML = 350, HeightHTML = 350, append = T, Title = Title, Alt = Alt)
HTML("<br>", file = name.file.htm)

Days of frost and years of life expectancy

Frost Life
Alabama 20 69.05
Alaska 152 69.31
Arizona 15 70.55
Arkansas 65 70.66
California 20 71.71
Colorado 166 72.06
Connecticut 139 72.48
Delaware 103 70.06
Florida 11 70.66
Georgia 60 68.54
Hawaii 0 73.60
Idaho 126 71.87
Illinois 127 70.14
Indiana 122 70.88
Iowa 140 72.56
Kansas 114 72.58
Kentucky 95 70.10
Louisiana 12 68.76
Maine 161 70.39
Maryland 101 70.22
Massachusetts 103 71.83
Michigan 125 70.63
Minnesota 160 72.96
Mississippi 50 68.09
Missouri 108 70.69
Montana 155 70.56
Nebraska 139 72.60
Nevada 188 69.03
New Hampshire 174 71.23
New Jersey 115 70.93
New Mexico 120 70.32
New York 82 70.55
North Carolina 80 69.21
North Dakota 186 72.78
Ohio 124 70.82
Oklahoma 82 71.42
Oregon 44 72.13
Pennsylvania 126 70.43
Rhode Island 127 71.90
South Carolina 65 67.96
South Dakota 172 72.08
Tennessee 70 70.11
Texas 35 70.90
Utah 137 72.90
Vermont 168 71.64
Virginia 85 70.08
Washington 32 71.72
West Virginia 100 69.48
Wisconsin 149 72.48
Wyoming 173 70.29


Graph of days of frost and years of life expectancy

Best breweries in Minnesota: Value-added ratings

| 1 Comment

A good friend of mine told me to check out Beer Advocate online. Their rating system does a good job of balancing simplicity (A-F grades) with thoroughness (measures of central tendency and dispersion). I also appreciate the way it encourages raters and readers to consider multiple dimensions of a beer's quality--its look, smell, taste, and feel--in addition to overall quality.

As a practitioner of multilevel and latent variable modeling, one aspect of Beer Advocate's rating system has given me pause: a lack of adjustment for consistently high- or low-raters. What's concerning about a simple average of ratings? Let's imagine a plausible scenario: a motivated group of beer aficionados with high standards get their hands on a limited-release beer and publish their reviews on Beer Advocate. The beer could be world class, but if only tough raters have tried it, then the average rating will appear lower. Conversely, a large number of uncritical raters could review a mediocre beer, resulting in an excellent rating on average.

Is my concern warranted? If so, which beers and breweries are rated highest after adjusting for raters and other factors? To answer those questions, I wrote some Rlogo.jpg code to gather Beer Advocate ratings of beers produced by Minnesota breweries. The code relies heavily on the XML and stringr packages. After gathering the ratings, I decided to check my assumption of within-rater consistency. The intraclass correlation coefficient of r = 0.13 indicates a small degree of consistency within raters, but enough to justify my concern.

I specified a multilevel model to obtain value-added ratings of beers and breweries while adjusting for rater and beer characteristics. For simplicity, I decided to focus on the overall rating, although I would like to consider multiple dimensions of beer quality (look, smell, taste, and feel) in future analyses. The model specifies random effects for raters, cross-classified with beers nested in breweries. The fixed effects were the serving type experienced by the rater for a given beer (e.g., can), the beer's alcohol by volume (ABV), and its style (e.g., porter).

.

Note that I transformed the overall rating from a 1-5 scale to logits. I did so to avoid predictions below the floor of 1 point and above the ceiling of 5 points on the original scale (i.e., to enforce lower and upper asymptotes) and to spread the ratings out at the tails (e.g., make it "harder" to go from 4.5 to 5 than going from 3.5 to 4). The chart below shows the resulting ogival relationship between the original point system and the logits.

Plot showing transformation from points to logits
Points_to_Logits.png

I estimated the model parameters with the lme4 package. Given the large number of fixed effects (and the need for future blog content), I'll describe the influence of serving type, ABV, and style at a later date. The value added by breweries after adjusting for rater and beer characteristics is shown in the table below. Value-added logits were transformed back to the original scale, with values in the table representing point deviations from the brewery average, but the logit and point ratings are virtually the same. August Schell Brewing Company recently won national accolades, but it came in second to a relative newcomer, Fulton Beer. Summit Brewing and Surly Brewing followed closely in the third and fourth spots. The results show some discrepancies between the value-added ratings and Beer Advocate's grades, which are based on the simple average of all beer ratings. For example, Harriet Brewing received an A-, compared to Schell's B grade, but Harriet's value-added rating is below average.

Breweries sorted by value-added ratings
Brewery Value-added (logits) Value-added (points) Beer Advocate grade
(average beer rating)
Fulton Beer 0.85 0.80 A-
August Schell Brewing Co., Inc. 0.83 0.79 B
Summit Brewing Company 0.80 0.76 B+
Surly Brewing Company 0.71 0.68 A-
Minneapolis Town Hall Brewery 0.27 0.27 A-
Lake Superior Brewing Company 0.18 0.18 B
Fitger's Brewhouse 0.10 0.10 B+
Olvalde Farm & Brewing Company 0.08 0.08 A-
Granite City Food & Brewery 0.04 0.04 B-
Flat Earth Brewing Company 0.03 0.03 B+
Brau Brothers Brewing Co., LLC 0.02 0.02 B
Pig's Eye Brewing Company 0.00 0.00 D+
Harriet Brewing -0.09 -0.09 A-
Lift Bridge Brewery -0.11 -0.11 B+
Leech Lake Brewing Company -0.14 -0.14 B+
Barley John's Brew Pub -0.26 -0.26 B+
McCann's Food & Brew -0.27 -0.26 B+
St. Croix Brewing Company, LLC -0.69 -0.66 C
Cold Spring Brewing Co. -0.99 -0.91 D+
Great Waters Brewing -1.37 -1.19 B+


A different picture emerges when one considers that value-added ratings are point estimates within a range of certainty. The prediction-interval plot below shows that we can't confidently conclude that Fulton Beer scored above average, let alone out performed Shell's, Summit, or Surly. We can confidently say, though, that Shell's, Summit, and Surly scored above average, and along with Minneapolis Town Hall Brewery, they significantly out-performed Cold Spring Brewing Co. and Great Waters Brewing.

Plot of breweries' value-added ratings with 95% prediction intervals
Breweries_Value-Added.png

The results highlight some of the benefits of multilevel modeling and the limits of value-added ratings. By controlling for beer characteristics and separately estimating variance components for raters and beers nested in breweries, I have reduced the degree to which raters' tastes and brewers' preferences for certain types of beers are confounding brewery ratings. That is, the value-added ratings reflect the brewery's overall quality with greater fidelity. Nevertheless, we should keep in mind that value-added ratings are based on error unexplained by theory/fixed effects, that they are point estimates within a range of certainty, and that one should interpret value-added ratings with caution because personal preferences still count. This analysis has motivated me to try beers by the breweries with high value-added ratings, but I know from experience that I like beers brewed by Brau Brothers, Lift Bridge, and Barley John's, even though their value-added point estimates place them in the bottom half of Minnesota breweries.

Using irtoys, plyr, and ggplot2 for test development

| No Comments

My job as a Quantitative Analyst at the Minnesota Department of Education requires me to "provide evaluation, statistical, and psychometric support to the Division of Research and Assessment through the entire process of design, development, and reporting to the Minnesota assessment system." That's an accurate job description, but a little vague. What I really do is write a lot of Rlogo.jpg code to apply item response theory (IRT) and compare item and test characteristics across multiple years and/or forms to maintain high quality.

Rlogo.jpg is great for plotting item and test characteristics across multiple years and/or forms, but it requires using a few packages in concert (i.e., to my knowledge there is no single package designed for this purpose). The irtoys package provides many useful functions for test development under IRT; the plyr package allows one to apply irtoys' functions to subsets of item parameters in a data set (e.g., by strand); and the ggplot2 package allows one to overlay and facet item and test curves.

In the following reproducible example, I graphically compare item and test characteristics of the 2009 and 2010 grade 3 Minnesota Comprehensive Assessments of reading ability. Note that for simplicity, I have excluded two polytomous items from 2009 and a few dichotomous items from 2010 from the published item parameter estimates. Also note that items have been assigned to strands arbitrarily in this example. Graphs such as these allow us to answer questions like, "Did the 2010 form yield as much information (i.e., measure with equal or less error) at the cut scores as the 2009 form?" The graphs also provide guidance for replacing undesirable items.


########################################
#Combine two years' worth of item parameter estimates from grade 3.

data.pars.2009 <- matrix(ncol = 3, byrow = T, c(0.57, -1.24, 0.17, 0.93, -0.74, 0.25, 
  1.45, -0.45, 0.19, 0.97, -0.5, 0.13, 0.95, -0.8, 0.62, 1.07, -0.38, 0.24, 0.87, 
  0.37, 0.22, 0.61, -1.45, 0.02, 0.84, -1.03, 0.19, 0.72, -0.23, 0.12, 0.93, -1.6, 
  0.25, 0.68, -2.02, 0.06, 1.1, -1.5, 0.36, 0.74, -1.6, 0.51, 1.01, -1.01, 0.21, 
  0.81, -1.52, 0.16, 0.93, -0.5, 0.11, 0.34, -0.75, 0.02, 0.92, -0.93, 0.19, 1.14, 
  0.21, 0.26, 0.54, -0.59, 0.1, 0.86, -0.52, 0.28, 1.04, -1.77, 0.05, 0.84, -1.68, 
  0.02, 1.46, -1.7, 0.17, 1.3, -1.16, 0.21, 0.51, -0.62, 0.1, 1.09, -1.04, 0.17, 
  0.66, -2.06, 0.09, 1.11, -0.61, 0.11, 1.34, -0.9, 0.26, 1.3, -1.24, 0.19, 1.37, 
  -1.72, 0.24, 1.09, -1.37, 0.16, 0.89, -0.94, 0.08, 1.24, -1.38, 0.14, 0.79, -1.32, 
  0.11, 1.09, -1.69, 0.13))
data.pars.2009 <- data.frame(2009, data.pars.2009)
names(data.pars.2009) <- c("year", "a", "b", "c")
data.pars.2009$strand <- factor(rep(1:4, length.out = 38), labels = paste("Strand", 1:4))

data.pars.2010 <- matrix(ncol = 3, byrow = T, c(1, -2.02, 0.05, 0.61, -0.88, 0.24, 
  0.74, -1.36, 0.08, 0.93, 0.63, 0.25, 0.94, 0.38, 0.23, 0.56, -0.14, 0.32, 0.7, 
  -0.73, 0.08, 1.1, -2.54, 0.02, 1.17, -0.36, 0.22, 0.46, 0.76, 0.09, 0.77, -1.53, 
  0.09, 1.5, -1.27, 0.18, 1.01, -1.41, 0.1, 1.63, -1.61, 0.21, 1.33, -1.63, 0.13, 
  1.11, -1.01, 0.19, 1.05, -1.17, 0.07, 0.88, -0.33, 0.23, 0.56, -0.65, 0.28, 1.03, 
  0.45, 0.13, 1.29, -1.73, 0.12, 0.96, -1.54, 0.15, 0.62, -1.33, 0.09, 0.67, -1.41, 
  0.06, 0.74, 0.16, 0.3, 1.33, -0.94, 0.26, 1.31, -1.71, 0.23, 1.14, -1.42, 0.16, 
  0.96, -0.87, 0.09, 1.22, -1.36, 0.14, 0.86, -1.19, 0.17, 0.94, -2.48, 0.02, 0.81, 
  -1.96, 0.02, 0.98, -0.86, 0.13, 0.8, -0.68, 0.11, 0.72, -1.67, 0.04, 0.63, 0.32, 
  0.16, 1.24, -1.64, 0.33))
data.pars.2010 <- data.frame(2010, data.pars.2010)
names(data.pars.2010) <- c("year", "a", "b", "c")
data.pars.2010$strand <- factor(rep(1:4, length.out = 38), labels = paste("Strand", 1:4))

data.pars <- rbind(data.pars.2009, data.pars.2010)
data.pars$year <- factor(data.pars$year)
data.pars$id <- factor(1:nrow(data.pars))

#Apply D scaling constant.
data.pars$a <- data.pars$a * 1.7

#Theta values for plotting
thetas <- seq(-3, 3, by = 0.1)

#Theta cut scores
cuts.grade3 <- c(-1.40, -0.84, -0.01)


########################################
#Plot overlapping test characteristic curves.

funk <- function(z) with(trf(z[, c("a", "b", "c")], x = thetas), data.frame(Ability = x, f))
data.plot <- ddply(data.pars, "year", funk)

g1 <- ggplot(data.plot, aes(x = Ability, y = f)) + 
  geom_vline(xintercept = cuts.grade3, linetype = 2, lwd = 1, color = "darkgrey") +
  geom_line(aes(color = year, linetype = year), lwd = 1) + 
  scale_colour_manual(name = "Year", values = c("darkgoldenrod1", "darkblue")) + 
  scale_linetype_manual(name = "Year", values = 2:1) + 
  scale_y_continuous(name = "Expected score") + 
  opts(title = "Test characteristic curves")
g1

TCCs.png

The test characteristic curves appear comparable up to the "exceeds proficiency standard" cut score, above which the test became more difficult compared to 2009.


########################################
#Plot overlapping conditional standard errors of measurement (from test information).

funk <- function(z) with(tif(z[, c("a", "b", "c")], x = thetas), data.frame(Ability = x, f))
data.plot <- ddply(data.pars, "year", funk)
data.plot$f <- 1/sqrt(data.plot$f)

g1 <- ggplot(data.plot, aes(x = Ability, y = f)) + 
  geom_vline(xintercept = cuts.grade3, linetype = 2, lwd = 1, color = "darkgrey") +
  geom_line(aes(color = year, linetype = year), lwd = 1) + 
  scale_colour_manual(name = "Year", values = c("darkgoldenrod1", "darkblue")) + 
  scale_linetype_manual(name = "Year", values = 2:1) + 
  scale_y_continuous(name = "CSEM") + 
  opts(title = "Conditional standard errors of measurement")
g1

CSEMs.png

The conditional standard errors of measurement appear comparable, overall, except for the 2010 standard error at the exceeds-proficiency cut, which appears to slightly exceed the 0.25 rule of thumb.


########################################
#Plot overlapping CSEMs faceted by strand.

funk <- function(z) with(tif(z[, c("a", "b", "c")], x = thetas), data.frame(Ability = x, f))
data.plot <- ddply(data.pars, c("year", "strand"), funk)
data.plot$f <- 1/sqrt(data.plot$f)

g1 <- ggplot(data.plot, aes(x = Ability, y = f)) + 
  geom_vline(xintercept = cuts.grade3, linetype = 2, lwd = 1, color = "darkgrey") +
  geom_line(aes(color = year, linetype = year), lwd = 1) + 
  scale_colour_manual(name = "Year", values = c("darkgoldenrod1", "darkblue")) + 
  scale_linetype_manual(name = "Year", values = 2:1) + 
  facet_wrap( ~ strand) + 
  scale_y_continuous(name = "CSEM") + 
  opts(title = "Conditional standard errors of measurement")
g1

CSEMs_Strand.png

It looks like strand 3 could use a highly discriminating difficult item to bring its CSEM in line with the corresponding 2009 CSEM at the exceeds-proficiency cut.


########################################
#Plot item characteristic curves faceted by strand.

funk <- function(z) with(irf(z[, c("a", "b", "c")], x = thetas), data.frame(Ability = x, f))
data.plot <- ddply(data.pars, c("id", "year", "strand"), funk)

g1 <- ggplot(data.plot, aes(x = Ability, y = f)) + 
  geom_vline(xintercept = cuts.grade3, linetype = 2, lwd = 1, color = "darkgrey") +
  geom_line(aes(group = id:year, color = year, linetype = year)) + 
  scale_colour_manual(name = "Year", values = c("darkgoldenrod1", "darkblue")) + 
  scale_linetype_manual(name = "Year", values = 2:1) + 
  facet_wrap( ~ strand) + 
  scale_y_continuous(name = "Probability", breaks = seq(0, 1, by = 0.25), limits = c(0, 1)) + 
  opts(title="Item characteristic curves")
g1

ICCs_Strand.png

Compared to 2009, there are fewer items from strands 1 and 2 that students are able to guess easily (i.e., high c parameters/lower asymptotes). As noted above, strand 3 lacks an item that discriminates at the exceeds-proficiency cut. Replacing the poorly discriminating difficult item in strand 2 with a higher discriminating one would also reduce measurement error of high-ability students.

My new nephews are way above average

| No Comments

Jacob_Outlier.jpg My new nephews, Jacob and Jadon, are way above average. And though they don't even live in Lake Wobegon! See, it says so right on their onesies that I got from NausicaaDistribution.

Jadon_Outlier.jpg

Eastside Food Co-op goes solar

| No Comments

I love Eastside Food Co-op. They're in my neighborhood and stock all my grocery needs. They show informative movies like "Troubled Waters: A Mississippi River Story," which the University of Minnesota helped create but balked at showing. They hold community forums to discuss issues like pollution at Shoreham Yards. And they've welcomed Recovery Bike Shop into the building. A lot of great things are happening at Eastside Food Co-op.

To top it all off, Eastside Food Co-op has gone solar! Their solar panels began operating on December 8 and have generated over 160 killowat hours so far (see chart below). At at time when the Minnesota Legislature is considering nuclear power expansion, I am proud to belong to an organization that is investing in clean solar energy.

Eastside_Solar_Log.png

Anniversary trip: Lake Pepin and Zumbro River

| No Comments
00000026.JPG

Amy and I celebrated our eighth anniversary with a trip to southeastern Minnesota. The forecast called for warm and sunny weather, so we decided to camp and canoe. We wanted to camp at Frontenac State Park to get some more mileage out of our state park pass, but the fall colors and warm weather helped fill up Frontenac's campground. One of the park rangers recommended Hok-Si-La Campground in Lake City. It was a fantastic campground with well-kept facilities and campsites right on Lake Pepin's shore. During breakfast in our campsite we saw several species of birds, including a bald eagle and a downy woodpecker.

00000014.JPG

For our canoe trip, we chose a section of the Zumbro River from Theillman to river mile 13 near Dumfries. We chose that route because much of it is protected as part of the Richard J. Dorer Memorial Hardwood State Forest. Riverland Outfitters shuttled us to the put-in. A month before our trip, flooding devastated towns along the Zumbro River. Although the waters had receded well below flood stage, the river was still running at a brisk pace. It took us about six hours to cover 13 river miles on the Buffalo and Nemadji Rivers, but it only took about three hours on the Zumbro. We enjoyed the views of the hillsides and farms, the sand bars, and dodging all the submerged trees in the river. According to Paddling Minnesota, the Zumbro derives its name from the French pronunciation of "embarrassment" caused by the river's obstacles.

00000006.JPG

After getting off the water, we walked around downtown Wabasha before heading across the river to Pepin, Wisconsin, for an anniversary dinner at the Harbor View Cafe. The cafe was hopping. It took about two hours to get a table. We didn't mind because it gave us a chance to walk around the harbor and listen to some live jazz and folk music outside the pub next door. Harbor View Cafe was overpriced for the quality of the food and service, but the overall experience of visiting and dining in Pepin made it worthwhile. The great company and occasion made for a memorable experience.

Urban canoeing: Lake Phalen to Gervais Lake

| No Comments
Phalen_Gervais (2).jpg Phalen_Gervais (6).jpg

Here are some pictures from a nice little canoeing trip we took with our friend, Jim. He owns a beautiful wood canvas canoe. The three of us became friends at YMCA Camp Widjiwagan in Ely, where wood canvas canoes are used as part of the curriculum to teach young people about teamwork and environmental stewardship. In addition to being easy on the eye, wood canvas canoes are very quiet on the water--much quieter than Scrappy, our aluminum canoe.

Phalen_Gervais (0).jpg

It was nice to spend some time in the Payne-Phalen neighborhood of Saint Paul. We hardly make it over there. We put in at Lake Phalen, padded through Phalen Park, portaged around a bridge repair project to Round Lake, paddled along a scenic creek leading to Keller Lake, and then paddled under Highway 36 into Gervais Lake. It was getting dark, so we didn't spend much time on Gervais Lake before heading back to the put-in. People gush about the Minneapolis chain of lakes, but the Phalen-Gervais chain of lakes is quite nice, too. Some highlights: water fowl, families enjoying the park, ice cream truck, many bridges, kayakers practicing rolling, fall leaves, quiet canoe, spending time with a good friend.

Plotting survey responses representing three or more dimensions

| No Comments

Bob Williams recently posted a question to EVALTALK about open source software for plotting survey responses in three dimensions:

I'm wondering how to display Likert scale data in three dimensions (ie the Likert scores for each subject on 3 scale dimensions). My graphic software is not that sophisticated and I haven't been able to find an open source equivalent.... that can plot something along three dimensions (X,Y,Z) like this:
          X, Y, Z
Subject A 2, 5, 8
Subject B 3, 5, 9
Subject C 3, 7, 3

Another evaluator, Bill Harris, recommended Rlogo.jpg and ggplot2 in particular.

Bob's question and the mention of Rlogo.jpg inspired me to write up a brief demonstration of plotting multi-dimensional data. As I have mentioned in the past (see here, here, and here), plotting or mapping three or more dimensions can be difficult and potentially misleading because plots and maps are inherently two-dimensional. I would say that plotting three survey response dimensions actually requires plotting four dimensions: X, Y, Z, and the frequency of each X, Y, and Z combination. That's because evaluators commonly have large data sets with repeated response patterns. I simulated a larger data set from Bob's example in order to illustrate plotting four dimensions.

#Open a recording window (if using Windows).
windows(record=T)

#Manually read in the data.
data.3D <- data.frame(Subject = c("A", "B", "C"),
  X = c(2, 3, 3),
  Y = c(5, 5, 7),
  Z = c(8, 9, 3))
data.3D

#Larger data sets will contain frequently repeated combinations of responses.
#Simulate a larger data set with repeated responses combinations.
data.4D <- data.3D
for(i in 1:200) {
  temp <- data.frame(Subject=factor(paste(as.character(data.3D[, 1]), i, sep=".")), 
  X=round(jitter(data.3D[, 2], 7), 0), 
  Y=round(jitter(data.3D[, 2], 7), 0), 
  Z=round(jitter(data.3D[, 4], 7), 0))
  data.4D <- rbind(data.4D, temp)
}
library(plyr)
data.4D <- ddply(data.4D, c("X", "Y", "Z"), "nrow")
names(data.4D)[which(names(data.4D)=="nrow")] <- "Frequency"
data.4D <- data.4D[order(data.4D$Frequency, decreasing=T), ]

#Scatterplot matrices
pairs(data.3D[2:4], xlim=c(0, 10), ylim=c(0, 10), 
  main="Responses to three Likert scale items")
library(psych)
pairs.panels(sapply(data.4D[1:3], jitter), xlim=c(0, 10), ylim=c(0, 10), ellipses=F, 
  hist.col="blue", pch=1, scale=T, main="Responses to three Likert scale items")

Scatterplot_Matrix1_EVALTALK.png Scatterplot_Matrix2_EVALTALK.png

#Load ggplot2 library; set theme without gray background.
library(ggplot2)
theme_set(theme_bw())

#Create a scatterplot with size and color varying by the third dimension.
plot1 <- ggplot(data=data.3D, aes(x=X, y=Y, color=Z, size=Z)) + 
  geom_point() + 
  xlim(0, 10) + 
  ylim(0, 10) + 
  scale_colour_gradient(breaks=0:10, limits=c(0, 10), low="lightblue", high="darkblue") + 
  scale_size_continuous(breaks=0:10, limits=c(0, 10)) + 
  opts(title="Responses to three Likert scale items")
plot1

#Create a jittered scatterplot with color varying by the third dimension 
#and size by response combination frequency.
plot2 <- ggplot(data=data.4D, aes(x=X, y=Y, color=Z, size=Frequency)) + 
  geom_jitter() + 
  xlim(0, 10) + 
  ylim(0, 10) + 
  scale_colour_gradient(breaks=0:10, limits=c(0, 10), low="lightblue", high="darkblue") + 
  scale_size_continuous() + 
  opts(title="Responses to three Likert scale items")
plot2

ggplot1_EVALTALK.png ggplot2_EVALTALK.png

#Reshape data from wide to longer.
data.3D.longer <- melt(data.3D, id.vars=c("Subject", "Y"), variable_name="Item")
names(data.3D.longer)[which(names(data.3D.longer)=="value")] <- "Response"
data.3D.longer

#Create a scatterplot with faceted second and third dimensions.
plot3 <- ggplot(data=data.3D.longer, aes(x=Response, y=Y)) + 
  geom_point(color="blue") + 
  xlim(0, 10) + 
  ylim(0, 10) + 
  facet_grid(. ~ Item) + 
  opts(title="Responses to three Likert scale items")
plot3

#Reshape simulated data from wide to longer.
data.4D.longer <- melt(data.4D, id.vars=c("Y", "Frequency"), variable_name="Item")
names(data.4D.longer)[which(names(data.4D.longer)=="value")] <- "Response"
head(data.4D.longer)

#Create a scatterplot with faceted second and third dimensions 
#and size varying by response combination frequency.
plot4 <- ggplot(data=data.4D.longer, aes(x=Response, y=Y, size=Frequency)) + 
  geom_jitter(color="blue") + 
  xlim(0, 10) + 
  ylim(0, 10) + 
  facet_grid(. ~ Item) + 
  opts(title="Responses to three Likert scale items")
plot4

ggplot3_EVALTALK.png ggplot4_EVALTALK.png

KFAI rocks

| No Comments

I want to give a shout-out to KFAI, Fresh Air Community Radio. I listen to KFAI just about every day. And every day I am impressed by the station's diverse programming and skilled volunteer DJs. I urge everyone to tune into 90.3 in Minneapolis, 106.7 in Saint Paul, or on the Web at http://www.kfai.org. Then support them with a contribution. Here's a list of my favorite shows to get you started:

  1. African Rhythms
  2. Sugar Shop
  3. Sangam
  4. Caribbean Jam
  5. Mostly Jazz
  6. Blueslady's Time Machine
  7. Louisiana Rhythms

Following Douglas Bates' advice for those required to produce p-values for fixed-effects terms in a mixed-effects model, I wrote a function to perform a likelihood ratio test for each term in an lmer() object. Bates has championed the notion that calculating the p-values for fixed effects is not trivial. That's because with unbalanced, multilevel data, the denominator degrees of freedom used to penalize certainty are unknown (i.e., we're uncertain about how uncertain we should be). As the author of lme4, the foremost mixed-effects modeling package in Rlogo.jpg, he has practiced what he preaches by declining to approximate denominator degrees of freedom as SAS does. Bates contends that alternative inferential approaches make p-values unnecessary. I agree with that position, focusing instead on information criteria and effect sizes. However, as a program evaluator, I recognize that some stakeholders find p-values useful for understanding findings.

A likelihood ratio test can be used to test if the sample size is large. According to Fitzmaurice, Laird, and Ware, twice the difference between the maximized log-likelihoods of two nested models, , represents the degree to which the reduced model is inadequate. When the sample size is large, with degrees of freedom equal to the difference in parameters between the full and reduced models, . The p-value is .

What if the sample size is not large? A p-value based on will be too liberal (i.e., the type I error rate will exceed the nominal p-value). More conservatively, we might say that with numerator and denominator degrees of freedom. According to Snijders and Bosker, the effective sample size lies somewhere between total micro-observations (i.e., at level one) and clusters randomly sampled in earlier stages (i.e., at higher levels). Formally, the effective sample size is , where observations are nested within each cluster and intraclass correlation is . Even if is large, (and statistical power) could be quite small if is small and is large: . Unbalanced designs, modeling three or more levels, and cross-level interactions add to our uncertainty about the denominator degrees of freedom.

The function I wrote chews up the lmer() model call and concatenates the frame and model matrix slots, after which it iteratively fits (via maximum likelihood instead of restricted ML) models reduced by each fixed effect and compares them to the full model, yielding a vector of p-values based on . As the example shows, the function can handle shortcut formulas whereby lower order terms are implied by an interaction term. The function doesn't currently handle weights, glmer() objects, or on-the-fly transformations of the dependent variable [e.g., log(dep.var) ~ ...]. The accuracy of resulting p-values depends on large sample properties, as discussed above, so I don't recommend using the function with small samples. I'm working on another function that will calculate p-values based on the effective sample size estimated from intraclass correlation. I will post that function in a future entry. I'm sure the following function could be improved, but I wanted to go ahead share it with other applied researchers whose audience likes p-values. Please let me know if you see ways to make it better.

p.values.lmer <- function(x) {
  summary.model <- summary(x)
  data.lmer <- data.frame(model.matrix(x))
  names(data.lmer) <- names(fixef(x))
  names(data.lmer) <- gsub(pattern=":", x=names(data.lmer), replacement=".", fixed=T)
  names(data.lmer) <- ifelse(names(data.lmer)=="(Intercept)", "Intercept", names(data.lmer))
  string.call <- strsplit(x=as.character(x@call), split=" + (", fixed=T)
  var.dep <- unlist(strsplit(x=unlist(string.call)[2], " ~ ", fixed=T))[1]
  vars.fixef <- names(data.lmer)
  formula.ranef <- paste("+ (", string.call[[2]][-1], sep="")
  formula.ranef <- paste(formula.ranef, collapse=" ")
  formula.full <- as.formula(paste(var.dep, "~ -1 +", paste(vars.fixef, collapse=" + "), 
                  formula.ranef))
  data.ranef <- data.frame(x@frame[, 
                which(names(x@frame) %in% names(ranef(x)))])
  names(data.ranef) <- names(ranef(x))
  data.lmer <- data.frame(x@frame[, 1], data.lmer, data.ranef)
  names(data.lmer)[1] <- var.dep
  out.full <- lmer(formula.full, data=data.lmer, REML=F)
  p.value.LRT <- vector(length=length(vars.fixef))
  for(i in 1:length(vars.fixef)) {
    formula.reduced <- as.formula(paste(var.dep, "~ -1 +", paste(vars.fixef[-i], 
                       collapse=" + "), formula.ranef))
    out.reduced <- lmer(formula.reduced, data=data.lmer, REML=F)
    print(paste("Reduced by:", vars.fixef[i]))
    print(out.LRT <- data.frame(anova(out.full, out.reduced)))
    p.value.LRT[i] <- round(out.LRT[2, 7], 3)
  }
  summary.model@coefs <- cbind(summary.model@coefs, p.value.LRT)
  summary.model@methTitle <- c("\n", summary.model@methTitle, 
                           "\n(p-values from comparing nested models fit by maximum likelihood)")
  print(summary.model)
}

library(lme4)
library(SASmixed)
lmer.out <- lmer(strength ~ Program * Time + (Time|Subj), data=Weights)
p.values.lmer(lmer.out)

Yields:

Linear mixed model fit by REML 
(p-values from comparing nested models fit by maximum likelihood)
Formula: strength ~ Program * Time + (Time | Subj) 
   Data: Weights 
  AIC  BIC logLik deviance REMLdev
 1343 1383 -661.7     1313    1323
Random effects:
 Groups   Name        Variance Std.Dev. Corr   
 Subj     (Intercept) 9.038486 3.00641         
          Time        0.031086 0.17631  -0.118 
 Residual             0.632957 0.79559         
Number of obs: 399, groups: Subj, 57

Fixed effects:
                Estimate Std. Error   t value p.value.LRT
(Intercept)     79.99018    0.68578 116.64000       0.000
ProgramRI        0.07009    1.02867   0.07000       0.944
ProgramWI        1.11526    0.95822   1.16000       0.235
Time            -0.02411    0.04286  -0.56000       0.564
ProgramRI:Time   0.12902    0.06429   2.01000       0.043
ProgramWI:Time   0.18397    0.05989   3.07000       0.002

Correlation of Fixed Effects:
            (Intr) PrgrRI PrgrWI Time   PrRI:T
ProgramRI   -0.667                            
ProgramWI   -0.716  0.477                     
Time        -0.174  0.116  0.125              
ProgrmRI:Tm  0.116 -0.174 -0.083 -0.667       
ProgrmWI:Tm  0.125 -0.083 -0.174 -0.716  0.477

Urban canoeing: Cedar Lake to Minnehaha Falls and back

| No Comments
IMG_5887_sm.gif

Chris Desjardins and I took another urban canoe trip yesterday. We originally planned to pick up where we left off last time, at Hidden Falls in Saint Paul, but we chose a different route to take advantage of recent rains that had swollen Minnehaha Creek and to try out my Wike Woody Wagon Canoe Trailer.

We left from Cedar Lake where my canoe, Scrappy, resides on a rack rented from the City of Minneapolis. After loading my bike, we paddled through Lake of the Isles to Lake Calhoun, portaged to Lake Harriet, and then portaged over to Minnehaha Creek. The creek was so high that we had to portage around or duck under several bridges. We used the bike trailer for the longer portages before the creek, and after reaching Minnehaha Falls, I transported Scrappy all the way back to Cedar Lake by bike. Click here for a map of our canoe route.

The entire trip took about seven hours, not counting a stop for dinner at Sea Salt Eatery. My bike took a beating from trees and a wall along the creek, the Wike trailer didn't perform as well as I had hoped, and I'm still exhausted the next day, but I thoroughly enjoyed the Cedar-Minnehaha-Cedar loop. I especially got a kick out of people's reactions to Scrappy hitched to my bike.

IMG_4091.jpg IMG_4097.jpg IMG_4109.jpg
IMG_4094.jpg IMG_4104.jpg
IMG_4106.jpg IMG_4114.jpg

Structural equation modeling for theory-driven evaluation

| No Comments

Just to show that I'm not the biggest slacker-blogger on the Web, I want to direct you to my guest-post on AEA365: A tip-a-day by and for evaluators. I chose my "tip" because I think structural equation modeling (SEM) and logic modeling would complement each other very well, but very few researchers have combined the two approaches. Those of us who use SEM know how important it is to have strong prior theory for model fit and valid conclusions. We could learn a lot from evaluators who are skilled at developing logic models. Conversely, theory-driven evaluators could improve their practice by carefully attending to statistical power, construct validity, attenuation due to measurement error, and the decomposition of total effects. I am very interested in hearing others' opinions on this issue, so please leave a comment here or at AEA365.

A logic model (left) operationalized as a partial mediation growth model (right)
AEA_Tip-a-Day_Logic_Model.png AEA_Tip-a-Day_Path_Diagram.png

I think I have settled on a dissertation topic: spatiotemporal piecewise regression evaluation. Spatiotemporal piecewise regression (SPR) refers to the analysis of longitudinal data from a spatial regression discontinuity (SRD) design with multiple pre-test observations. SRD offers a way to quasi-experimentally estimate local average treatment effects (LATEs) of geographically implemented programs or policies; SPR is a way to estimate change in LATEs over time. My dissertation will describe SPR methodology in detail, including validity threats, and demonstrate an SPR evaluation of an educational program. As discussed by Shadish, Cook, and Campbell, multiple pre-test observations will allow me to address validity questions.

I decided to conduct a preliminary SPR analysis of data from the educational program to determine if the topic would be feasible and to include some results in a conference paper proposal. As discussed in earlier posts (here, here, and here), Rlogo.jpg can be used to plot regression discontinuity fitted lines, showing the LATE at the treatment assignment cutoff point. The ggplot2 package can be used to plot SPR fitted lines and change in LATEs over time, but it's not easy. One reason is that ggplot2 has a steep learning curve and limits control over the legend's appearance, although its curve is not as steep as lattice's. Another reason SPR plots are difficult to produce is that they represent several dimensions: north/south/east/west (reduced to one-dimensional distance), time, and the outcome. The SPR plots below show that participation in the educational program is associated with an initially positive, small LATE that diminishes over time.

Spatiotemporal piecewise regression fitted line plots
spatiotemporal_piecewise_regression_LATEs.png spatiotemporal_piecewise_regression.png

Compressing KML files with R

| No Comments

Compressing Google Earth files with Rlogo.jpg is not well documented on the Web, so I thought I should share my approach. I am working on my presentation for the upcoming Minnesota Assessment Conference. I will be presenting geographic maps of 2010 Minnesota assessment results. To minimize clutter in the maps and enhance the experience of stakeholders, I am creating two versions of each map: a static PDF map with no school district labels and a Web-based, interactive map. Each PDF will link to a keyhole markup language (KML) file through Google Maps, much like this proficiency map described here.

The KML file sizes exceed Google Maps' interface limit, so I had to find a way to compress them into KMZ files. I could compress each KML file using Google Earth or 7-Zip, but since I am already creating the maps with Rlogo.jpg, I decided to find a way to compress the KML files via code. Gunzipping is easy in Rlogo.jpg (e.g., see gzfile), but Google Maps will not accept gunzipped KML files. Thanks to Duncan Temple Lang, the zip() function in library(Rcompression) will create KMZ files with standard ZIP compression. Here's a reproducible example:

library(maptools)
data(wrld_simpl)
sw <- slot(wrld_simpl[wrld_simpl$NAME=="South Africa",], "polygons")[[1]]
tf <- tempfile()
kmlPolygon(sw, kmlfile=paste(tf, ".kml", sep=""), name="South Africa", col="#df0000aa", lwd=5, border=4, kmlname="KMZ test", kmldescription="<i>KMZ</i> file created with <a href='http://www.r-project.org'>R</a>.")
install.packages("Rcompression", repos="http://www.omegahat.org/R")
library(Rcompression)
zip(zipfile=paste(tf, ".kmz", sep=""), files=paste(tf, ".kml", sep=""))
paste(tf, ".kmz", sep="") #path and filename

Florida paddling excursions

| No Comments
IMG_4036.jpg

Amy and I had a wonderful time vacationing in Florida with my family. We visited Cedar Key along Florida's hidden coast and stayed the rest of the time in Redington Beach. The BP oil spill had not yet reached the coast there, but almost every Floridian we talked to was bracing for environmental and economic disaster.

IMG_1182.jpg

We experienced the Gulf of Mexico ecosystem by boat on three different occasions. Amy and I paddled a tandem kayak from Cedar Key out to Atsena Otie Key and Snake Key, two islands within the Cedar Keys National Wildlife Refuge. We encountered great egrets, cormorants, and a pod of dolphins that swam within 10 yards of our kayak. The kayak was provided at no extra charge by Faraway Inn, a nice place to stay in Cedar Key. Vsiting the Shell Mound, built over 3,000 years by Native Americans, was another highlight from our time in Cedar Key.

IMG_3816.jpg

For our second excursion, we headed to Manatee Springs State Park where we rented a canoe to explore the springs and the Suwannee River. We were amazed by the clear blue water of the springs, the cypress trees covered with Spanish moss, the sturgeon leaping out of the river, and the amount of sunbathing turtles and alligators.

We were sad that we didn't see any manatees at Manatee Springs or in the Suwannee River, but my family and I saw a couple manatees on the third paddling excursion at Fort De Soto Park. They were eating near Soldier's Hole in Mullet Key Bayou, and we caught glimpses of them when they surfaced to breathe.

The Early Childhood Environment Rating Scale (Revised; ECERS-R) was designed to measure process quality in child care centers, but the data set that I have been analyzing contains ratings of center classrooms and family child care (FCC) homes. A sizable portion of the program's budget was spent on training ECERS-R raters. Training them to reliably use both the ECERS-R and the Family Day Care Rating Scale (FDCRS) was cost prohibitive.

Was it unfair to rate FCC homes with the ECERS-R instrument? I conducted a differential item functioning (DIF) analysis to answer that question and remedy any apparent biases. Borrowing a quasi-experimental matching technique (not for the purpose of inferring causality), I employed propensity score weighting to match center classrooms with FCC homes of similar quality. For each item j, classroom/home i's propensity score (i.e., predicted probability of being an FCC home) was estimated using the following logistic regression model:

,

where FCC is a focal group dummy variable (1 if home; 0 if center) and Totalc stands for corrected total score (i.e., mean of item scores excluding item j). Classroom weights were calculated from the inverse of a site's predicted probability of its actual type and normalized so the weights sum to the original sample size:

.

Such weights can induce overlapping distributions (i.e., balanced groups). In this case, the weights were were applied via weighted least squares (WLS) regression of item scores on corrected total scores and the focal group dummy, as well as an interaction term to consider the possibility of nonuniform/crossing DIF:

.

By minimizing , WLS gave classrooms/homes with weights greater than 1 more influence over parameter estimates and decreased the influence of those with weights less than 1.

The results suggest that three provisions for learning items function differentially and in a non-uniform manner, as indicated by statistically significant interactions (see the table below). None of the language/interaction items exhibited DIF. The regression coefficients suggest that differentially functioning ECERS items did not consistently favor one type of care over the other. After dropping the three differentially functioning items from the provisions for learning scale and re-running the DIF analysis, none of the remaining items exhibited differential functioning.

Summary of WLS estimates: Differentially functioning ECERS items

Estimate Std. Error t value Pr(> |t|)
Item 8: Gross motor equipment
Intercept 3.014 2.044 1.475 0.149
Corrected total 0.523 0.445 1.175 0.248
FCC home -5.234 2.278 -2.297 0.028
Corrected*FCC 1.161 0.498 2.332 0.026
Item 22: Blocks
Intercept -3.990 1.919 -2.079 0.045
Corrected total 1.608 0.403 3.994 0.000
FCC home 3.929 2.076 1.892 0.067
Corrected*FCC -0.893 0.437 -2.045 0.049
Item 25: Nature/science
Intercept -4.802 2.566 -1.872 0.070
Corrected total 1.833 0.540 3.393 0.002
FCC home 5.996 2.771 2.164 0.038
Corrected*FCC -1.443 0.584 -2.471 0.019

Plot of weighted observations and fitted lines: Three crossing DIF items and one fair item (Item 19: Fine motor activities)
ECERS_DIF.png

Scores from the final provisions for learning scale exhibited good reliability of α = 0.87. Scores from the language/interaction scale also exhibited good reliability of α = 0.86. None of the items detracted problematically from overall reliability (i.e., the most that α increased after dropping any item was 0.01 in one instance).

I want to caution readers against generalizing the factor structure and DIF findings beyond this study because the sample is small in size (n = 38 classrooms/homes) and was drawn from a specific population (urban child care businesses subject to local market conditions and state-specific regulations). I largely avoided making statistical inferences in the preceding analyses by exploring the data and comparing relative fit, but in this case I used a significance level of 0.05 to infer DIF. The data do not provide a large amount of statistical power, so there could be more items that truly function differentially. For the final paper, I may apply a standardized effect size criterion. Standardized effect sizes are used to identify DIF with very large samples because the p-values would lead one to infer DIF for almost every item. De-emphasizing p-values in favor of effect sizes may be appropriate with small samples, too.

I followed up the exploratory factor analysis (EFA) of the Early Childhood Environment Rating Scale (ECERS-R) with a confirmatory model comparison of the six-, two-, and one-factor solutions. The six-factor specification adhered to the original subscales in the ECERS-R instrument (after dropping highly skewed items); the two-factor specification simply restricted the minor loadings from the EFA to zero; and the one-factor specification restricted items to zero if they loaded less than |0.3| in the one-factor EFA solution.

The six-factor model did not converge, and the two-factor model exhibited better fit than the one-factor model. These results support earlier findings by Sakai and colleagues (2003) and Cassidy and colleagues (2005) that the ECERS-R measures two latent factors: provisions for learning and language/interaction experienced by children.

The goal of this analysis was to find a factorially simple solution. Higher order or hierarchical models of greater complexity would be worth considering in light of evolving theories and needs surrounding measures of child care quality and given that:

  • several ECERS-R items cross-loaded in the EFAs and were subsequently excluded
  • the overall fit of the two-factor model was poor
  • the estimated correlation between the latent factors was somewhat large (0.68).

Comparison of one- and two-factor models

CFA fit measureOne factorTwo factors
Model χ2750, df = 377, p < 0.01306, df = 208, p < 0.01
χ2 (null model)1139, df = 406653, df = 231
Goodness-of-fit index0.5090.645
Adjusted goodness-of-fit index0.4330.569
RMSEA index0.1640.113
Bentler-Bonnett NFI0.3410.532
Tucker-Lewis NNFI0.4520.742
Bentler CFI0.4910.768
SRMR0.1150.107
BIC-621-451

Measurement model (click image)
ECERS_Path_Diagram.png

The Early Childhood Environment Rating Scale, revised edition (ECERS-R; Harms, Clifford, & Cryer, 1998), is an instrument used widely to observe and rate levels of process quality in child care centers. The authors have divided the instrument into six subscales to help ensure broad and flexible coverage of various aspects of child care quality. (The ECERS-R actually contains seven subscales, but the last one pertains to parents and staff instead of process quality experienced by children.) Psychometric analyses suggest the ECERS-R actually measures two latent dimensions of quality at most. Perlman, Zellman, and Le (2004) concluded that process quality, as measured by the ECERS-R, is unidimensional. Sakai and colleagues (2003) and Cassidy and colleagues (2005) concluded that the ECERS-R measures two latent dimensions, although their factor loadings and interpretations differ across the two studies.

I am writing a paper to present at the upcoming American Educational Research Association (AERA) conference in Denver. The paper will report results from a structural equation mediation model of the influence of on-site child care professional development on school readiness through child care quality. Given the lack of agreement among the earlier psychometric analyses of the ECERS-R, I conducted an exploratory factor analysis to see if I could replicate findings from one of the earlier studies. As shown in the table below, my preliminary results and interpretations do not align perfectly with either of the two-factor solutions from the earlier studies, but the similarities helped me decide which items to drop and which of my interpretations to keep. I plan to use a confirmatory factor analysis to formally compare model fit between the one- and two-factor solution before estimating the mediation model. I hope the summary below will help others who are wrestling with the possibility of multiple dimensions of child care quality as measured by the ECERS-R.

Summary of ECERS-R two-factor solutions from three studies

Item numberItemSubscaleSakai and colleagues (2003)Cassidy and colleagues (2005)Moore (2010)Decision
1Indoor spaceSpace and furnishingsProvisions for learning  Drop
2Furniture for care, play, and learningSpace and furnishingsTeaching and interactions Provisions for learningDiscrepancy
3Furnishings for relaxationSpace and furnishingsTeaching and interactionsMaterials/activities Discrepancy
4Room arrangementSpace and furnishingsProvisions for learning  Drop
5Space for privacySpace and furnishings Materials/activitiesProvisions for learningKeep
6Space for gross motorSpace and furnishings  Provisions for learningDiscrepancy
7Child-related displaySpace and furnishingsTeaching and interactions  Drop
8Gross motor equipmentSpace and furnishingsProvisions for learning Provisions for learningKeep
9Greeting/departingPersonal care routinesTeaching and interactions  Drop
10Meals/ snacksPersonal care routinesProvisions for learning  Drop
11Nap/restPersonal care routines  Provisions for learningDiscrepancy
12Toileting/diaperingPersonal care routinesProvisions for learning  Drop
13Health practicesPersonal care routinesProvisions for learning  Drop
14Safety practicesPersonal care routinesProvisions for learning  Drop
15Books and picturesLanguage-reasoningProvisions for learningMaterials/activitiesProvisions for learningKeep
16Encouraging children to communicateLanguage-reasoning  Provisions for learningDiscrepancy
17Using language to develop reasoning skillsLanguage-reasoningTeaching and interactionsLanguage/interactionLanguage/interactionKeep
18Informal use of languageLanguage-reasoningTeaching and interactionsLanguage/interactionLanguage/interactionKeep
19Fine motorActivitiesTeaching and interactionsMaterials/activitiesProvisions for learningKeep
20ArtActivities Materials/activities Drop
21Music/movementActivitiesTeaching and interactions Language/interactionKeep
22BlocksActivitiesProvisions for learningMaterials/activitiesProvisions for learningKeep
23Sand/waterActivitiesProvisions for learning  Drop
24Dramatic playActivitiesProvisions for learningMaterials/activitiesProvisions for learningKeep
25Nature/scienceActivities Materials/activitiesProvisions for learningKeep
26Math/numbersActivitiesTeaching and interactionsMaterials/activitiesProvisions for learningKeep
27Use of TV, video, and/or computersActivities  Language/interactionDiscrepancy
28Promoting acceptance of diversityActivitiesProvisions for learning Provisions for learningKeep
29Supervision of gross motor activitiesInteraction  Language/interactionDiscrepancy
30General supervision of childrenInteractionProvisions for learningLanguage/interaction Discrepancy
31DisciplineInteraction Language/interactionLanguage/interactionKeep
32Staff-child interactionsInteractionProvisions for learningLanguage/interactionLanguage/interactionKeep
33Interactions among childrenInteractionTeaching and interactionsLanguage/interactionLanguage/interactionKeep
34ScheduleProgram structureProvisions for learning  Drop
35Free playProgram structureTeaching and interactions  Drop
36Group timeProgram structureProvisions for learningLanguage/interactionLanguage/interactionKeep
Note: Empty cells indicate loadings less than |0.3|, cross-loadings greater than |0.3|, or items skewed greater than |2|. Item 37, which asks about provisions for children with disabilities was excluded due to high missingness.