Merging panel data in R
It is easy to merge two data frames with a common column in R (see R: Merge Two Data Frames), but I am not sure whether the same command “merge” can be used easily to combine two panel data together through the intersection of two common columns, for instance, “NAME” and “YEAR.” So I define the following function “pmerge” to do that for me:
1. Codes for the function
-------------------------------------------------------
pmerge <- function(X,Y){
n <- max(X$YEAR)-min(X$YEAR)+1
n0 <- min(X$YEAR)
Z <- list(NULL)
for (i in 1:n) {
X1 <- subset(X,YEAR==n0+i-1)
Y1 <- subset(Y,YEAR==n0+i-1)
Z[i] <- list(merge(X1,Y1,by.x="NAME",
by.y="NAME",all=TRUE))
}
ZZ <- data.frame(Z[1]);dim(ZZ)
YEAR <- rep(min(X$YEAR),dim(data.frame(Z[1]))[1])
for (i in 2:n) {
ZZ <- rbind(ZZ,data.frame(Z[i]))
YEAR <- c(YEAR, rep(min(X$YEAR)+i-1,
dim(data.frame(Z[i]))[1]))
}
ZZ$YEAR <- YEAR; ZZ
}
-------------------------------------------------------
2. An example
-------------------------------------------------------
#. Assuming that you will merge two data frames, “A” and “B,” through two common columns of “NAME” and “YEAR.
> dim(A); names(A)
[1] 12820 4
[1] "NAME" "YEAR" "SERVICE" "POP"
> dim(B); names(B)
[1] 1071 4
[1] "NAME" "YEAR" "MSA" "HGWAY"
> D <- pmerge(A,B)
> dim(D); names(D)
[1] 12910 8
[1] "NAME" "YEAR.x" "SERVICE" "POP" "YEAR.y" "MSA" "HGWAY"
[8] "YEAR"
> D[sample(dim(D)[1],8),]
......
-------------------------------------------------------
Note that that columns “YEAR.x” and “YEAR.y” may have missing values, indicating missing observations in either data A or data B, respectively.