Recently in Statistics and R Category

Hi all!
I've been back at Grinnell for about a month now, and so far my semester's off to a great start! Very busy, but not busy enough to prevent me from finishing up my data analysis.

It looks like Echinacea purpurea's breeding system is similar to Echinacea angustifolia's. Styles that receive compatible pollen shrivel up within a few days, whereas styles that don't receive compatible pollen persist longer, often for more than a week. In addition, E. purpurea is self-incompatible. (This means that an individual plant can't pollinate itself; it needs pollen from another plant of the same species.)

To complicate things a little:
Not all of the flower heads I was studying had finished flowering by the time I left Minnesota, so I do not have much data on the top several rows of styles on many of them. Since styles in higher-up rows persist for a somewhat shorter time than styles in lower rows, I cannot be sure that the trends I saw in my data hold for the top rows of all flower heads.
Also, some of the statistical tests I ran showed that style persistence in the self-pollination treatment differs significantly from the control treatment, while others showed no significant difference. I'm not sure why that would be.

Here are my csv file and my analysis in R:
epurpurea.csv
epurpureaAnalysis.R

If anyone has any suggestions for improvement or other things I could look at with this data, please let me know!

And I posted much of my data analysis on H. helianthoides already, but here are the "final" versions. (But, again, I'm open to suggestions for further improvement!)
hhelianthoides.csv
FinalAnalysisLeeRodman.R

| No Comments

H. helianthoides appears to be self-incompatible. Here's the data I collected:
hhelianthoides.csv
I've also started my data analysis in R. It's not done yet, but here's what I have so far:
Data Analysis.txt
Basically, styles persist significantly longer when self-pollinated or not pollinated than when cross-pollinated. However, in the top rows of florets on each flower head, style persistence does not differ as much between the treatments (because all the styles in these rows shrivel rather quickly). Therefore, when using style persistence to study other aspects of this species' breeding system (e.g., pollinator efficiency, compatibility of specific individuals), one should use the bottom several rows of florets. In these rows, cross-pollinated styles always shrivel within three days of pollen application, whereas styles that do not receive compatible pollen never shrivel so quickly.

As I mentioned before, I was unable to collect much quantitative data about C. palmata style persistence. But I did notice some things that might be helpful to anyone interested in studying this species further. The following document gives a brief summary:
C. palmata Summary.doc

| No Comments

Here's the final dataset for my compatibility experiment. The experiment officially ended today (I collected the last bit of data). The dataset contains GPS data (column name distBetween). I missed one plant while GPS-ing, so I used the hand-measured data (for flag #6 at Nessman's). I also corrected several errors in the datasheet.

Data for Analysis -- cswitzer -- 31 July 2011.csv


We spent some time GPS-ing the plants, so we could get the exact distances between them. Here is a csv file with the gps data.
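As a rough sketch of how between-plant distances could be computed from GPS points: this assumes the coordinates are in UTM (easting and northing in metres), which the post does not confirm, and the plant ids and coordinates below are made up for illustration.

```r
# Hypothetical UTM coordinates (metres) for three plants; not real survey data.
plants <- data.frame(id = c("a", "b", "c"),
                     E  = c(286100, 286103, 286100),
                     N  = c(5077080, 5077084, 5077090))

# dist() gives pairwise Euclidean distances; with UTM input these are in metres.
d <- as.matrix(dist(plants[, c("E", "N")]))
dimnames(d) <- list(plants$id, plants$id)
d
```

Straight-line distance is a reasonable approximation at the scale of a field site; it would not be appropriate for raw latitude and longitude.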


I have been working on analyzing all my data. I looked at plots of each of my individual sites, as well as all the data combined. The data are almost exactly the opposite of what I expected.
Here's the script I've been exploring:
callinCompatRScript31july2011.R


Here's a picture of Josh, Amber Z, and me out in the field (having a lot of fun).
IMG_0340.JPG

| No Comments

Edited by cswitzer. 25 July 2011


Characteristics of a good CSV file:
1. Use database format in Excel (see this example: http://blog.lib.umn.edu/wage0005/echinacea/2011/07/preliminary-analysis-for-calli.html)
2. Don't mix text and numeric values in the same column (you may enter NA in a numeric field to signify missing data)
3. Remove spaces from Excel cells
4. No punctuation in column names
5. Don't start a column name with a number
6. Column names should be easy to type -- capitalize the start of each new word and use no spaces (called camelback format)
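One way to check a file against this list from within R is sketched below. The file contents and column names are hypothetical, written to a temporary file only so the example runs on its own.

```r
# Hypothetical example file; column names and values are made up for illustration.
csvText <- "plantId,siteName,styleCount,daysPersisted
1,north,12,3
2,north,NA,5
3,south,9,NA"
tmp <- tempfile(fileext = ".csv")
writeLines(csvText, tmp)

# check.names = FALSE keeps the names exactly as typed, so problems are visible.
dat <- read.csv(tmp, check.names = FALSE)
str(dat)   # numeric columns stay numeric; NA marks missing data

# Names that R would have to alter (spaces, punctuation, leading digits).
badNames <- names(dat)[names(dat) != make.names(names(dat))]
badNames

# A column read as character may be hiding mixed text and numbers.
sapply(dat, class)
```

If badNames is empty and every measurement column shows up as integer or numeric, the file is probably in good shape for analysis.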

| No Comments

Here's a link to a useful online statistics textbook.

http://www.statsoft.com/textbook/

| No Comments

I've been working with the Stipa germination data we collected from the common garden over the summer for Stuart's R class and, among other things, have come up with a little plot of the common garden. Filled-in blue circles are where we found Stipa alive, empty circles had no seedlings. A neat thing would be some kind of heat map for longest leaf or number of leaves, but I'll try that later.


| No Comments

This R script, hillaryLookAtAphids.r, allows one to view graphs of growth of aphid clones in Lauren and Hillary's experiment.

| No Comments

XML is packaged for R via CRAN and is based on RSXML. Perhaps this will make it easier to parse the XML that the Topcon software puts out, or easier than trying to parse it all yourself.

| No Comments

Ian wants to quantify the overlap of flowering time between all pairs of plants in his experiment. The following R script reads his file with the flowering schedule of all 31 plants in his experiment and writes a file called ianPhenPairs.csv that has the flowering schedule of every possible pair combination (one per line). Note that there is a separate record for each plant as a sire and as a dam.

# script ian.phenology.r

pp <- read.csv("http://blog.lib.umn.edu/wage0005/echinacea/ian.phenology.final.csv")

str(pp)

p <- merge(pp,pp, by = NULL)

str(p)

31^2 == dim(p)[1]

write.csv(p, "ianPhenPairs.csv", row.names = FALSE)
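Once the pairs are built, the overlap itself could be computed along the following lines. This is only a sketch: the column names startDay and endDay are assumptions (the real file's columns are not shown here), and the three-plant schedule is invented for illustration.

```r
# Hypothetical flowering schedule; the column names startDay and endDay are assumed.
pp <- data.frame(plant    = 1:3,
                 startDay = c(1, 5, 10),
                 endDay   = c(6, 12, 15))

# Cartesian product, as in the script above; the suffixes label the two roles.
p <- merge(pp, pp, by = NULL, suffixes = c(".sire", ".dam"))

# Days on which both plants flower: latest start to earliest end,
# counted inclusively and floored at zero when the intervals do not meet.
p$overlapDays <- pmax(0, pmin(p$endDay.sire, p$endDay.dam) -
                         pmax(p$startDay.sire, p$startDay.dam) + 1)

p[, c("plant.sire", "plant.dam", "overlapDays")]
```

Because every pair appears twice (once in each role), the overlap is symmetric across the two records for a pair.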

| No Comments

I dumped the Topcon's GPS data into a csv. This data hasn't been cleaned and contains a couple of errors that have yet to be fixed, but it should be enough to flesh out some R magic to help parse the output. The point number, lat, long, and {all the entered data for the points} are comma separated, the last being one big glob of stuff.

The first couple of values are proper points but use the wrong dictionary, so those will have to be done separately.

stipa.csv

| No Comments

I only need 680 positions/site, because the seeds will be in between the plug points. So attached is a .doc and an R file w/ the script to create 3 sites with ~680 positions in each. I have also attached the resulting .csv file, 3 columns "site", "row", and "pos".

KG_row&pos_03 July.doc

KG_row&pos_03 July.R

KG_positions_03July.csv
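For readers without the attachments, a layout of this kind could be generated roughly as follows. The 34 rows by 20 positions split is an assumption chosen only because it gives 680 per site; the actual row and position counts are in the attached script.

```r
# Assumed layout: 34 rows x 20 positions = 680 positions per site.
# The real row and position counts are in the attached script, not here.
nRows <- 34
nPos  <- 20

positions <- expand.grid(pos = 1:nPos, row = 1:nRows, site = 1:3)
positions <- positions[, c("site", "row", "pos")]   # match the csv column order

table(positions$site)   # number of positions in each site

# write.csv(positions, "positions.csv", row.names = FALSE)   # hypothetical file name
```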

Here's the breakdown:
site breakdown.xls

Next steps:


  1. Assign each new.env id to a row and position. See file: sane3blocks.csv

  2. Create labels.

  3. Put labels on envelopes.

  4. Assign each plug to a row and position (keeping in mind that they're already randomized in the trays.)

  5. Develop planting protocol.

  6. Organize materials for planting.

  7. Mow sites.

  8. Plant.


| No Comments

Here's a list of plants that are available for Katie to use in inb1:
plantsForKatie.csv
I made this list with this script: plantsForKatieKoch.r

| No Comments

Here is my dataset that I am working on analyzing in R as a .csv file.

halverson.data.091.csv

Stuart, here is my R script so far:

halverson.data.analysis1.R

I made new columns in the .csv spreadsheet for the factors and levels we discussed. I will work on a list of hypotheses to test. I think I changed the definition of "y" when I did my 24 hour analysis. Can I give "y" a different name for each analysis? Or does the code need to read a defined "y" each time?

Thanks for the help and check out the graph of 24 hours and the summary m2.

Allegra

| No Comments
Here's a snippet of R code showing how to extract info from the shrivel character data (a file is below):
df <- data.frame(shrivel.txt =c("x", "xoxx", "xxxx", "oooo", "xoooo"))
df      # start off with this data frame
str(df)

df$shrivel.count <- nchar(as.character(df$shrivel.txt)) #add column

vx <- gsub("o", "", df$shrivel.txt)  # replace o with ""
vx
df$shrivel.xs <- nchar(vx)           # make a new column in df

vo <- gsub("x", "", df$shrivel.txt)  # replace x with ""
vo
df$shrivel.os <- nchar(vo)           # make a new column in df

str(df)
df      # final data frame
codeForAllegra.r
| No Comments

Here's a snippet of code I used to generate files to upload to visors.

makeRandFileForVisor <- function(size = 50, fname = "xyz") {
  # write a random ordering of 1..size to a text file named for the visor
  write.table(sample(1:size),
              file = paste("E:\\shared\\rand", size, fname, ".txt", sep = ""),
              quote = FALSE,
              row.names = FALSE,
              col.names = paste("rand", size, fname, sep = ""))
}
visors <- c("ag", "dr", "kg", "ad", "cr", "gk",
            "mmj", "mj", "ah", "gd", "sw", "rs")
for (i in visors) {
  makeRandFileForVisor(20, i)
  makeRandFileForVisor(50, i)
  makeRandFileForVisor(100, i)
  makeRandFileForVisor(200, i)
}
| No Comments

This file lists flags in random orders suitable for pollinator observation tomorrow.

Here's the R code used:

flagOrder <- function() {
  # print five random orderings of flags A-H, one per line
  for (j in 1:5) cat(sample(LETTERS[1:8]), "\n")
  cat("\n")   # blank line between blocks
}
for (i in 1:20) flagOrder()
| No Comments

I generated a list of 40 random UTM coordinates for SPP and posted them here: sppRandCoords.csv.

Here's the R code I used to generate random coordinates...

df <-  data.frame(order= 1:40,
                  E= round(runif(40,  286100,  286900),2),
                  N= round(runif(40, 5077080, 5077500),2))
write.csv(df, file= "sppRandCoords.csv", row.names= FALSE) 

I gleaned the rough SPP corner coordinates from Google Earth--UTM 15T:
NE 286900 E 5077500 N
SE 286900 E 5077080 N
NW 286100 E 5077500 N
SW 286100 E 5077080 N

Here's a snippet of R code to make a plot of the points and to make a file with latitudes & longitudes:

df <- read.csv(
"http://blog.lib.umn.edu/wage0005/echinacea/sppRandCoords.csv")
plot(df$E, df$N, asp = 1, type = "n")
text(df$E, df$N, labels= df$order)
require(PBSmapping)
names(df) <- c("EID", "X", "Y") 
df <- as.EventData(df)
attr(df, "projection") <- "UTM" 
attr(df, "zone") <- 15
fred <- convUL(df, km=FALSE)
write.csv(fred, file= "sppRandLL.csv", row.names= FALSE)
Here's a link to those 40 random points in a lat long projection: sppRandLL.csv.
| No Comments