Stats Week 9 (R) Lab FAQ

As I get questions about the Stats Final, I will post them here, each prefixed with the date and time that I post it.  FAQs are in reverse chronological order (most recent first).

If you are doing Part I of the Lab (the Cyclismo R Tutorial), you might wonder what to turn in as a lab report!  Please hand in the conclusions of the Case Study that you do.  If you don’t get to the Case Study due to lack of time, hand in a one-paragraph summary of which tutorial lessons you finished and what you learned from the tutorial.

If you are doing Part II of the Lab, we have some suggestions for how to do the ‘split’ if you want to try it!  (I haven’t had time to try these yet, but will over the weekend):

  • from Isaac Goodfellow (who wins the CPAT prize – thank you Isaac!): His data is in a data frame ‘tkcs’.  From there he ran
    split_dat = split(tkcs, tkcs$Species_ID)
    which creates a named list holding one data frame per species (a named list in R behaves much like a hash map).  The species name becomes the key by which the data is indexed, so to get at the data for a given species, e.g., ABAM:
    mean(split_dat$ABAM$Height)
    Substitute whatever tree species is appropriate.
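  • from Kara:  She created an age class variable in Excel so that she could group by age instead of site (thus the Age_Class variable).  Her R statement follows, with the general pattern on the line beneath it:
    with(wk9_stats_lab_data, tapply(Height, list(Age_Class, Species_ID), mean))
    with(data, tapply(dependent variable, list(independent variable, independent variable), summary statistic))
    She says this only works with some functions, and is not yet sure why.  She gets it to work with mean, median, sd (standard deviation), min, and max.  One likely explanation is that these all return a single value per group; functions that return several values (such as summary or range) make tapply return a list of results rather than a simple table.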
  • From Greg Stewart (GIS and R expert ecologist!): He uses the plyr package for doing pivot tables. There are other ways, but plyr is a very fast, flexible way to break up data, apply functions to the pieces, and then put the data back together. Here’s an example for our data (load the package first with library(plyr)).
    ddply(tkcs, .(Site_ID, Species_ID), summarize, mean_DBH=mean(DBH), stdev_DBH=sd(DBH))
    After ‘summarize’, give each output column a name (e.g., mean_DBH) and define the function you want to apply to a column of the data (e.g., mean(DBH)).
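
If you would like to see the suggestions above run before trying them on the real lab data, here is a minimal sketch on a small made-up data frame. The column names follow the ones used above, but the numbers are invented for illustration only and are not the lab data. The third step uses base R's aggregate() as a stand-in for the ddply call, so that nothing needs to be installed; it is not the plyr code itself.

```r
# Toy stand-in for the lab data; the values are invented for illustration.
tkcs <- data.frame(
  Site_ID    = c("S1", "S1", "S1", "S2", "S2", "S2"),
  Species_ID = c("ABAM", "ABAM", "TSHE", "ABAM", "TSHE", "TSHE"),
  Height     = c(10, 14, 20, 12, 22, 26),
  DBH        = c(15, 25, 30, 20, 34, 38)
)

# 1. split(): a named list with one data frame per species,
#    indexed by the species name.
split_dat <- split(tkcs, tkcs$Species_ID)
abam_mean_height <- mean(split_dat$ABAM$Height)

# 2. tapply(): mean Height laid out as a Site-by-Species table.
height_table <- with(tkcs, tapply(Height, list(Site_ID, Species_ID), mean))

# 3. aggregate(): base-R analogue of the ddply summary,
#    giving one row per Site/Species combination with the mean DBH.
dbh_summary <- aggregate(DBH ~ Site_ID + Species_ID, data = tkcs, FUN = mean)

print(abam_mean_height)
print(height_table)
print(dbh_summary)
```

To use this on your own data, replace the toy ‘tkcs’ with your data frame and swap mean for whichever summary statistic you need.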