NSF Workshop on Canopy Structure Data - Workshop Notes

Thursday, April 25, 2002

Introductions
Provost- Emphasized the importance of providing support to develop interest and collaboration with other institutions. Also mentioned TESC's commitment to supporting the work of its faculty beyond its traditional liberal arts focus (particularly in the sciences).

Review Agenda
Presentation by Nalini
Access used to be the main obstacle for canopy researchers, but there have been significant improvements in canopy access tools. However, there is still a need for computer tools to enable greater success in answering specific ecological questions. Most of the focus has been on canopy structure. An initial survey by Nadkarni, Cushing, and Parker identified this need for better database tools, and ultimately led to the NSF grant proposal currently underway.

In the early stages of the NSF grant, Lyons' dataset was used as a case study during the first year. The emphasis during that year was on learning each other's language - computer scientists and canopy ecologists. Early research and the case study led to the "Lost Loggage" concept. Just as airlines use a 'lost luggage' chart to classify possible luggage types into a finite number of possibilities, canopy researchers can use a "Lost Loggage" framework to understand canopy structure. The goal is to identify a finite number of ways you can build trees and forests from individual components.

Future vision

Understand structure and function of individual trees; whole forest structure and function (intact and disturbed); other disciplines that use tree-like objects (e.g. marine structures; rivers; nervous systems; blood vessels, etc.). Canopy researchers can hopefully derive and extend ideas from these other disciplines. Overlaying datasets in a maximally efficient manner can provide a framework for data (e.g. nervous system overlaid by organs by muscles). Through such an overlay, in a larger-scale, holistic way, canopy researchers will gain a better understanding of how such systems work. (Overlay of structure and function studies.)

Conclusion
- The nature of canopy studies requires collaboration and a common culture. Researchers have worked to build one and need to continue.
- Other fields have done this - astronomy; plant genome; human genome; these have had to overcome similar obstacles.
- NSF represents official sanction for good science and has been very positive about these efforts - it has continued to support the work via grant supplements.
- The work feeds into larger-scale programs - the global canopy program, which will pull together people, sites, projects, and data worldwide; the canopy database, a repository and exchange for global canopy efforts. Will canopy researchers use this, and will it be useful for canopy scientists?
- This is their first effort to try it out with other scientists. Judy and Nalini have "prepared a meal that's not quite done" and need involvement from other researchers so they can incorporate ideas before "the meal" is complete.

Introduction to Database (Judy Cushing)
The computer science community is different from other disciplines and needs interaction with other groups (e.g. ecologists) to help frame and focus its work. Computer scientists are increasingly recognized by scientists and funding agencies as a mechanism for advancing the state of the art and helping solve research questions.

NSF, USGS, and NBII published a workshop report last year on providing database support and data access for biodiversity and ecosystem projects. Among computer scientists, the important issue being examined is: "what questions would people ask if they had access to other people's data?"

Two Major Issues
1. How to get data in a form that is accessible
2. Assuming data is in hand, what kind of things could scientists do with it, and what barriers are there to overcome?

There is a need for better data that is validated; documented (metadata); publicly available; and published. However, there are sociological and practical barriers to publishing that can only be approached tangentially. Judy is interested in working on technical tools to make publishing data easier. Better tools that are needed include: 1) visualization; 2) analysis (is it statistically valid to combine different data collected for different purposes?); and 3) modeling. In addition to better data, there is a need for a different mindset on the side of researchers. Collaboration takes time and effort. Two years ago they had difficulty identifying ideas and questions that needed to be answered; questions didn't have enough depth. Computer scientists involved in the project are hoping to start digging into real questions that will enable them to launch some crosscutting efforts.


Summary from individual canopy researchers on projects and needs

Nalini
A previous workshop with about 15 database and canopy researchers examined questions researchers have and explored various issues. Looking back, she realized many of the issues have been addressed in her work, although she is still struggling with scaling.

Akira Sumida
Research is focused on leaf layer structure - how foliage is distributed in three dimensions. In order to understand how structure (the distribution of foliage) develops over time, it is necessary to first understand branching structure. One method used is stereo glasses and 3D diagrams. A more complicated issue is understanding the interaction between trees and competition between neighboring trees. How is branch structure influenced by neighboring trees?

Robert Mutzelfeldt
University of Edinburgh - ecological metamodeling. Interested in how others make models; designs tools that others use for modeling. In terms of science, he mentioned the use of system models for integration, which must include spatial interactions. This is done now via computer programs, and there is a need for common language notation. Visualization and modeling are important and of interest. One idea generating interest is an Internet framework ("canopy grid") to bring together data, models, projects, etc. Resources are linked on the Internet so any scientist can get data on the net. He is hoping to get the project funded.

Barbara Bond
Refers to herself as a forest physiologist or physiological ecologist. She is still trying to figure out her role in the group, but thought "guinea pig user" might be appropriate. She doesn't generate data or worry about spatial data, but is more driven by questions about forest function. Her work relates to the consequences of tree/forest aging on ecological processes. For example, if you harvest a basin (up to 100 acres) and a new forest grows, what are the consequences for atmospheric changes, or rain, over 500 years? Also, do woody plants age in a process of senescence? Is there an endogenous clock? This is the other end of the spectrum of what she is working on. (So far, it seems they age based partly on genetic makeup, but mostly experience.)

At the interim scale, she's interested in size effects - growth and hydrologic processes - the exchange of material within the leaf, tree, and whole forest. What are the consequences of changes in forest structure for the interception of water and light, and the effect on carbon cycling? How are exchanges of matter and energy affected by growth?

Bob Van Pelt
Collecting data for Nalini and Judy's project. Looking at several spatial scales and 3-4 dimensions.
Need to go 3D to work between datasets; he is trying to do that at all the sites. A goal is to have functional datasets to go along with structural data, done with the idea that other data could be collected along with these (e.g. Betsy's dataset could go along with these).
Judy: Bob designed the study. He provided ideas on what data people need and what data they collect. His work has been seminal for the computer scientists.

Question: Why need to collect data? Nalini's response: it was hard to get raw datasets from other people and it was difficult to get people to document their data, so they had to collect their own data to use for the study. They wanted to have a variety of structural data collected in a specific manner, and it was difficult to secondarily bring in other data they knew they could use.

Michael Ficker
Senior TESC student working on big canopy database.

Nalini Nadkarni
Most of her field research has been in Monteverde. For the first 10-15 years, she used a very descriptive approach, with a special interest in nutrient cycling. More recently, her work has focused on the long-term dynamics of recolonization following disturbance, and on seed germination. She is now more interested in experimental approaches. For instance, canopy transplants can be used to study climate change by exposing transplants to drier or wetter conditions.

She also is interested in being a canopy communicator. As such she co-founded ICAN, which is a precursor to the canopy database. During the past year she has been communicating to non-scientists - spreading the word and generating interest in forest canopies. Her latest efforts have taken her into marketing and design of a "Canopy Barbie" and a skateboard logo, and to talks with worship groups to raise awareness of trees and forests. All of it ties back to the canopy database and getting the public more aware of forest canopies.

Bram Svoboda
Joined in August after Steve Rentmeester's departure. He is a liaison between Nalini's canopy lab and the database group.

Alex Mikitik
Computer science student working in the science database lab on the Databank.

Judy Cushing
Received her Ph.D. 10 years ago at OGI. Involved in scientific databases - wants to make tools directly available to scientists. Interested in spatial data and in characterizations of models to identify inputs and outputs - computations and identification of what models do - and in how to help users do their own programming.

Eric Ordway
Working on the databank project; maintains the BCD. Graduated from Evergreen 2 years ago and just started working on the project.

Jess Parker
Forest Ecologist STRI
Wants to work at large scales - ignores individual trees. Questions he's interested in: how to measure structure - flux, access, and structure? He hopes to connect structures and functions. Many of the old notions are outmoded or inappropriate at the scale he works at.

Michele Berger
Working with Jess at STRI as data manager, digitizing data; specifically working on the LIDAR project. Previously she was database and metadata manager at SERC, where some data was made available - so she has some experience with centralized databases.

Hiroaki Ishii
Recently completed a post-doc with Dr. Sumida and is now seeking work. Talked about his Ph.D. work at UW (development of tree crowns and analysis of forest dynamics). Interested in how old trees maintain themselves once they have reached maximum crown size. Combined data with Barbara Bond at WR - data on crown structure - and developed a conceptual model of crown development for various sizes and ages of Douglas-fir trees. He sees himself as an end-user of data, integrating data at different scales to answer ecological questions.

Lois Delcambre
Last July OGI became a unit of Oregon Health and Science University. She commented that it is good to be part of a larger group with lots of interest in collaboration. She has considerable experience with collaboration, including working with doctors to examine how physicians access and use information. She is a PI for an NSF grant in the digital government program. One project included economic researchers working with government agencies (USFS), focused on adaptive management areas of the NW. In that case, science data was not the primary focus; rather, documents were the main focus - e.g. developing metadata to find documents.

With respect to the "Lost Loggage" concepts, she wants to take relative data and make it concrete.

Jessica Archer
Senior at TESC - Research Experience for Undergraduates. Helped create databases with all the data. Working on Guggenheim outreach projects using Nalini's data.

Jessica Berry
Student at TESC.

Genevieve Becker
Beginning canopy student.

Dave Shaw
His father-in-law passed away, so he is not coming. Director of Research, WRCCRF. Interested in mistletoe. Drew on an existing dataset for the 12 ha plot and Hawksworth dwarf mistletoe. Ecological questions about dispersal and spatial distribution.

Steve Rentmeester
Currently at UW Fisheries.

MIX-N-MATCH SCENARIOS

Sumida
Surveyed tree architecture with a theodolite with a visible laser finder: a red dot is seen on the branch (similar to a laser pointer). Using this data he can reconstruct tree structure. Turning points and branching points, tips or leaves, the base of the branch, and branch diameter at the base were recorded for live and dead branches (only primary branches are measured; if a branch forks, only the thicker fork is measured). XYZ coordinates were recorded for every branch point. Using this method, 3D stick graphics were developed, with different colored lines automatically drawn to denote different status or type of structure. The results were recently published in Annals of Botany. This data could connect with other datasets containing XYZ coordinates.
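Sumida's XYZ survey lends itself to a simple segment-based representation. The sketch below is a minimal illustration, not the published schema: the field names, status labels, and coordinate values are all hypothetical. It shows how branch lengths can be recovered directly from pairs of surveyed points, which is the kind of connection to other XYZ datasets mentioned above.

```python
from dataclasses import dataclass
from math import dist

@dataclass
class BranchSegment:
    """One surveyed segment between two recorded branch points (hypothetical schema)."""
    tree_id: str
    branch_id: int
    status: str      # e.g. "live" or "dead", as colored in the stick graphics
    start: tuple     # (x, y, z) coordinates in metres
    end: tuple

def segment_length(seg: BranchSegment) -> float:
    """Euclidean length of a segment from its XYZ endpoints."""
    return dist(seg.start, seg.end)

# Invented example: two consecutive segments of one primary branch
segs = [
    BranchSegment("T1", 1, "live", (0.0, 0.0, 5.0), (0.5, 0.0, 5.2)),
    BranchSegment("T1", 1, "live", (0.5, 0.0, 5.2), (1.1, 0.3, 5.3)),
]
total = sum(segment_length(s) for s in segs)  # total surveyed branch length
```

Any dataset that records comparable XYZ branch points could, in principle, be loaded into the same structure and compared segment by segment.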

Jess Parker
Uses ground-based Lidar measurements of forest canopy structure. The data has its own "flavor," which can lead to misunderstandings if not understood. He wants this to be clear so others can know how the data could be used for mix-n-match.

Using a Lidar laser carried through the woods in a backpack, overhead transect data is collected. The laser records the first overhead physical structure it hits. One can get 2000 points of raw data in a few seconds.

Data is grouped by bins and height. The data is graphed as bubbles proportional to distance - a relative area profile. You see data on what is physically closest to you. Assumptions about the distribution in the canopy can be used to adjust the data, and one can estimate a Canopy Area Index. This visualization is not always useful because you are looking at the relative proportion of surfaces per height column; in a totally open spot, this could be zero. There is a lot of variation in the canopy, and sometimes you are looking through blank spaces.
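A minimal sketch of the binning step described above, using made-up first-return heights. The 5 m bin width and the zero-means-open-sky convention are assumptions for illustration, not Parker's actual protocol:

```python
# Hypothetical first-return heights (m) along a transect; 0.0 = open sky, no return
heights = [0.0, 12.3, 18.7, 0.0, 25.1, 30.4, 28.9, 0.0, 15.2, 22.8]

BIN_M = 5.0
hits = [h for h in heights if h > 0]

# Count returns per 5 m height bin - a crude relative area profile
counts = {}
for h in hits:
    lo = int(h // BIN_M) * BIN_M      # lower edge of the bin containing h
    counts[lo] = counts.get(lo, 0) + 1
profile = {lo: n / len(hits) for lo, n in sorted(counts.items())}

cover = len(hits) / len(heights)      # fraction of sky obscured
```

The `profile` dictionary corresponds to the per-height-column proportions discussed above; `cover` is the simple obscured-sky fraction.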

Contour graphs or blobs in space are another way of visualizing data and looking at the continuity between areas of high and low density. These are like slices through tree crowns, although we aren't sure what to make of this. There are other fields where we know a lot about interpreting slices (e.g. medical cat scans) but canopy scientists do not have much experience interpreting similar slices of data. Canopy researchers need to figure out how to interpret their data with reference to other data. Contour maps can show the density of hits. Jess wants to understand the 'reasonableness' of looking at data in a specific manner- does it connect to real data? Is it worth pursuing?
Comment by Barbara Bond- lower in the canopy, there are smaller crowns so not surprising that there is more turnover from contour map to contour map, but at higher parts of the canopy, you see continuity between images- same crown showing up in several transects.

Question by Nalini- if data taken from top, would maps look the same? The answer is no-structural features are not distributed similarly from top to bottom. They are more tightly structured in upper canopy, which is an important distinction when taking spatial data.

Comment - appreciated the showing of data that was not so useful (the bubble diagrams). It is important to share what doesn't work for a particular purpose; such results often don't get discussed because they are not publishable.

Question by Barbara- when analyzing above, are you looking at just first return? Answer yes.

Contrast with above-forest measurements by SLICER - footprint 10 meters; satellite 50-100 m; elvis-?. Ecologically interesting things are going on at the scale he is working at.

What about the idea of a 3D printer? Companies will already do this and put data on plastic sheets. The University of San Diego offered to make some models, but the data needs to be converted.

Statistics and Questions- some ways the canopy can be quantified
Max height
Mean Outer height
Rugosity
Porosity
Mean CAI (canopy area index- leaf or other surfaces)
Mean cover (fraction of sky obscured)
Mean height of surfaces throughout
Mean distribution of heights above ground.
Average height profile across CAI
Skewness
Chronosequence (mean standard error - height profiles between young, intermediate, and old forests)
One can make a chart comparing statistics among age classes and look at the development sequence from young to old.
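Several of the statistics listed above fall out directly from per-column outer-canopy heights. The sketch below uses invented numbers, and it takes rugosity as the standard deviation of outer heights and porosity as the fraction of open columns - common operationalizations, though the definitions the group actually uses may differ:

```python
from statistics import mean, pstdev

# Hypothetical outer-canopy heights (m), one per transect column; 0.0 = gap
outer = [42.1, 38.5, 0.0, 45.2, 31.7, 40.8, 0.0, 36.4]

canopy = [h for h in outer if h > 0]

max_height = max(outer)
mean_outer = mean(canopy)                   # gaps excluded from the mean
rugosity = pstdev(canopy)                   # SD of outer heights as surface roughness
porosity = outer.count(0.0) / len(outer)    # fraction of columns with no canopy
```

Computing the same summary per stand age class would give exactly the young/intermediate/old comparison chart described above.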

Other possible statistics: gap size distribution; patch size. Examples: mixed-deciduous; undisturbed vs. selectively cut. One can look at the distribution of holes in the canopy (e.g. a recently cut forest has more variety of gap sizes) or at patch size between disturbed and undisturbed forest. The raw data has already been manipulated in the process. The forest is a continuous medium - thinking of data this way (as space) is different from thinking of it the way people doing branching patterns and connect-the-dots analysis do. They are not thinking about the space, but about the structures in space (e.g. tree stick diagrams or cones).

Barbara Bond
She tried to do the assignment, but was a bit perplexed about how to do it. "The other datasets compiled by this group may be useful to her research questions in many ways although there is no direct match between any of these datasets and the one she submitted."

Her interests are in how changes in canopy structure/organization, as a function of tree and forest age, affect fluxes of matter and energy. From her perspective, she wants a gene-to-landscape perspective - different scales of time and space.

Ideas that occurred to her:
1. How do changes in crown organization affect light absorption patterns by foliage? Do changes in 3D patterns of light absorption (independent of the total amount of light) affect net C assimilation of canopies?
Useful datasets: Van Pelt; Parker; Sumida. The assumption that net absorption has uniform light use efficiency implies it doesn't matter whether more is absorbed at the top or bottom - we don't know whether structure has that much effect on the total amount of absorption. Clumps from Parker's data could be used to model absorption. The hard part then is figuring out how to model photosynthesis; the model is now constantly being adapted to do this. A vertically explicit model of the canopy, divided into layers, models photosynthesis by light layer. She is interested in how water coming up through trees interacts with light going down to affect photosynthesis.

Process models vs. spatial models. Process ones don't consider vertical structures.

2. Is the age-related decline in productivity of young forests caused by decreased resource use efficiency brought on by competition (Binkley hypothesis)? The remaining hypothesis is hard to prove; a spatially explicit model is needed to answer it.

3. Does leaf-specific hydraulic conductance vary as a function of branch age, length, and height above ground? How does this affect transpiration and net carbon assimilation in forests of different ages?
Useful datasets: Van Pelt; Parker; Ishii. Need to develop hydraulic maps using structure data. Take examples from the electronics world to examine architecture. Root architecture is also needed.

4. How do age-related changes in crown structure (including populations and distribution of epiphytes) affect moisture interception (liquid and vapor phase) by canopies? How does this interception in turn affect the canopy microclimate and soil moisture? How do these changes in the ambient environment affect carbon assimilation?
Useful datasets: Van Pelt, Parker, Lyons. Epiphytes also capture moisture evaporating from the forest floor. Lags are known to exist and are unaccounted for, but the thinking is that low-hanging epiphytes may be capturing a lot of that water.

5. Do epiphytes function as significant moisture "capacitors" in old-growth canopies?
Useful datasets: Van Pelt, Parker, Lyons

6. How do species composition, density, and distribution affect stand-level water use?
Useful datasets: Shaw.

7. Is annual transpiration from plant canopies affected by the structural organization of species mixtures, independently of the overall composition?
Useful datasets: Shaw.

Question by Judy
How can we go about getting these questions? What else can we do after looking at the data descriptions? Example: a group at NASA looking at light reflectance coming back from trees and what their structure might be.

What kind of queries would researchers want to do and how long would it take to do scaling? Time is extremely variable. Some datasets are very easy to plug into, while others are more difficult or may not be so comparable. Some may need additional data collection to join data sets.

Jess- will likely be classes of projects
1. Adding detail to something simple
2. Extrapolation to a larger region (use tool and known info)

Bob van Pelt
Became involved at the suggestion of Steve Sillett, who has lots of canopy data. Bob has worked with him on structural datasets in the past on redwoods. Many of his ideas stem from that early work, where they mapped (measured all vertical stems of) redwood trees. The products were crown maps from the ground, by species. Crown distribution map - the EPIC forest is very complex. Fifteen trees - some have dozens to 100 trunks, called reiterations: repeats of the original tree architecture. The most complicated tree had over 200 'stems' with fusions everywhere, leading to 'jungle gym' networks. Up to six orders of trunks. Mapped in three dimensions. Graphed the distribution of stems by height. Other ecological processes were studied on some of the same trees. Also looking at a chronosequence. Commented on the evidently huge difference between tree farms and cathedral forest, yet there has been little study.

Interested in how the epicormic phenomenon occurs in old forests and how this affects structure and function. Described Canopy Structural Development in Douglas-fir forests - picked eight sites in WA/OR, the youngest stand fifty years old. Complexity develops in mid age. Characterized over 2500 trees. In OG forests a simple model can't be used - there is little correlation between branch parameters and foliage because most foliage is on epicormic shoots, not main branches. Branches are too complex in OG forests to use ground-based techniques, so they have climbed some.

Foliage Distribution - used a cross-calibration technique to estimate foliage and other parameters. Also examined patterns of epicormic branches. In Australia they developed new protocols for eucalyptus. The understory is very dense - you can't see anything from the ground. The mid-zone has almost no foliage, but does have photosynthetic bark. The upper canopy is much higher. Sub-sampling yielded foliage volume and surface area; they also have branch mass. They were physically unable to get to the very end - access dependent. Data was used to develop a canopy profile - almost the inverse of Douglas fir.

Rather than generate specific mix and match ideas, Bob wanted to emphasize that he has structural data available and that the trees he studied are still rigged for others to use.
Question by Jess - You chose certain variables to collect; did you know why you were collecting that specific data? Response - wanted the most robust structural data. For instance, if Betsy used Van Pelt's data she would be able to do much more with their data on forest structure because it is spatially explicit. He wanted to collect as much data as possible in case it was needed later. His project wants to do a 1000-year chronosequence to look at Douglas fir processes over time and how the relationship of structure and function develops over time. Need to collect structure and function data over time at different stand ages. Jess cautioned that the motivation for collecting data affects the utility of the data.


Hiroaki
His project is on age-related development of crown structure in coastal Douglas fir trees. It was done at the Tomakomai Research Station, Hokkaido Univ.
1. How does old-growth crown form develop? Compared 40-yr.-old Douglas fir and 450-yr.-old Douglas fir. Structurally, how do they go from a pointed crown top to the more rounded crown top typical of older trees?

Background
Ecological functions of old trees for epiphyte establishment
-Create diverse canopies and understory environments
-Create diverse habitats for animals
-Provide substrate for epiphytes

Important processes in crown development
-Branch growth, death, and epicormic branching

Crown development concept
-Layering of branch cohorts. Normally the top is youngest, but many OG trees have epicormics, which can occur at any location, so vertical branch height may not be the best indicator of age. A young tree has a similar cohort in the mid canopy, at a lower level than in an old tree, where this structure would be found in the upper canopy (?)

Data from WRCCRF
Accessed trees using a tower or single-rope technique. Measured branch height above ground, length, and diameter. Crown profile diagrams with branch diameter on the y-axis show how branches are distributed over time.

Young trees have lots of variability and lots of branches. Older trees typically have a shell of older branches, but epicormics fill in the inner crown.

Progression from upper to lower crown in different age classes is similar to size distribution during stand development. Epicormic branches fill in smaller size class for OG trees.

The model developed is a statistical approach to stand development, using the relationship of mean branch size to vertical distribution along crown depth. As a result of vertical change in mean branch size and branch number, branch biomass shifts upward with increasing tree age. Self-pruning curve: for 20-yr.-old trees there is a progression through density and size - in the upper crown there is no branch pruning, and mean size increases. Once pruning starts there is a progression toward the lower crown. It is similar in 40-yr.-old trees. In OG trees there is self-pruning throughout the crown. In old trees, the biggest old branches create a top-heavy tree.

Barbara's comment - the way Hiroaki looks at data is completely different than Parker - almost the flip side, because it is a relative curve, not absolute. Older stands have more area exposed to less light overall than young stands. Although the top part still has full exposure, old stands have a lot more trees with less light in the understory. In young stands, there isn't as much understory and there isn't as much diversity in the number of trees with limited light exposure.


Branch density decreases exponentially; you get a self-thinning curve, not a line.

Interested in crown development and interactions at the stand level. Also interested in how old trees are maintained at smaller scales. Identified clusters of shoots and units of renewal that are repeated throughout the crown to continue development. Individual needle and shoot changes throughout the crown are also examined - with B. Bond. Across scales, defining units depends on the processes being looked at.
Within-branch data mix-n-match should be driven by scientific questions.

Within Branch
- Branch growth rate
- Shoot number
- Age structure
- Shoot dynamics

Crown Development
- Branch growth
- Branch death
- Epicormic branching

Stand level data
- Local light environment
- Local density
- Crown spacing
- Neighboring tree species

Could get help from computer scientists to get data that is at the right scale. Researchers often don't want ALL the data. Need to get the right data for the right scale, without having to spend a huge amount of time picking through the data. Tools, queries, or a better interface to do the data manipulation would be great.

Nalini - the idea is that you could query all the data to get the pieces you want, based on parameters you select. The hard part is identifying the parameters. However, queries will be question-driven and therefore specific. Hiroaki thinks there is a need for a human interface. Some problems are metadata problems rather than database problems - a reference to data conversions. Need someone who knows both databases and ecology. Lois - crossing scales requires aggregation; this one involves scaling both up and down.
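The scaling-up query described here is essentially an aggregation. A toy sketch of rolling fine-scale branch records up to coarser height classes, with entirely hypothetical data and an assumed 5 m class width - one plausible shape for the "right data at the right scale" tooling being discussed:

```python
from collections import defaultdict

# Hypothetical branch records: (height_above_ground_m, branch_diameter_cm)
branches = [(4.2, 3.1), (7.8, 2.4), (8.3, 5.0), (12.1, 1.8), (13.7, 2.2)]

def aggregate_by_height(records, bin_m=5.0):
    """Roll branch-level data up to coarse height classes."""
    bins = defaultdict(list)
    for h, d in records:
        bins[int(h // bin_m) * bin_m].append(d)
    # One summary row (count, mean diameter) per class instead of every raw branch
    return {lo: (len(ds), sum(ds) / len(ds)) for lo, ds in sorted(bins.items())}

summary = aggregate_by_height(branches)
```

A researcher working at the stand scale would query only `summary`, while someone working within branches would query the raw records; the aggregation function is the bridge between the two scales.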

Betsy presented- Nalini took notes


Discussion

Datagaps- what are missing tools?

Jess - data does not generate hypotheses, but it will inform you. One should collect a dataset to answer a question. There was lots of discussion about whether "extra" data should be collected without being tied to a specific question (see below for more discussion).

Nalini - if you had an opportunity to use others' data, you could do a refinement, addition, or expansion. You would still need to know the motivation behind the data that was collected.

Judy- from profiles, was it clear why certain data was collected? Hiroaki's profile was very clear; others weren't.

Barbara - by training and intuition she agrees that data should be collected in response to a hypothesis, but it can sometimes be useful to collect data anyway. For example, in one project she was able to make a hypothesis about stream flow and forest aging using someone else's long-term baseline data. She had never done anything with stream flow before, but found it a great resource for making data-free predictions (hypothesizing and generating predictions). They went to the LTER and gave them their curve; the LTER looked at their data, and it matched the curve. If this data hadn't been available, this whole new branch of research would not have been possible.

Nalini- we need to examine usefulness of the database and data collection.

Bob Van Pelt
In defense of his project, he did bring in his own hypotheses. (From the first study with Malcom, structure data was collected, but something was missing - part of the reason for collecting so much x,y,z data.)

Judy
Wanted a range or representation of canopy structure for a range of forest types.

Jess
How things work - this works independently of data. He thinks just putting together unlike data does not necessarily tell you anything about how the data are related or work together.

Hiroaki-
If data is collected using a specific protocol, it can be more useful for archiving and combining unrelated data. But if it wasn't collected with a common purpose, it is more un-alike and can be influenced by the purpose for which it was collected. Because data has a signature, how could you erase the signature so the datasets combine appropriately? After this has been done, it will be a lot easier.

Nalini
At Wind River and Monteverde there is so much data, but people still are not putting it together - they still can't overlay data. Obviously more tools are needed to do this.

Robert
Ontology - we are struggling to put our finger on it - how to describe the world around you. In informatics, it is a formal description of data. For example, someone talking about trees may not have the same ontology for discussing branches; someone else comes along who looks at a stand not as branches, but as layers with spaces. We need to formalize what we want to talk about. Lost Loggage is one way to go about it - defining common terms. Maybe we need to move down a level: not the stand level, but a lower level of points with xy coordinates, building up to compound structures. Need someone who has formalized ontologies in another sector.

Lois- started with Lyons' data and identified which was structure data and which was paint (e.g. epiphytes). Led to idea of lost loggage and defining components.

There seem to be the lines-and-points methods along with the raster-based structures. Perhaps one solution would be to crystallize these structural types and formalize an ontology.

Nalini
Example based on Jess's comment: different organisms or researchers look at the same branch and measure distance differently. For example, the distance a bird has to go from branch to branch to get a bug is very different from the distance an ant would travel to get between those two branches.

Judy
There is something in the data that is more abstract that would allow you to transform the datasets if you really understood your data, but getting this information out of a study is difficult. The idea behind templates is to break up the ontology into discrete components that others could understand, so they could figure out how they could use the data. It is not always a question of just data, but a question of the studies - really understanding someone else's study and how it could relate to your own; then would be the point to look at the data.

There is a difference between fishing for data and using data to enhance your project or answer a new question. We don't want the way the data is stored to be a hindrance. There is a need for scientific integrity. Some are facilitating this question; some want to ask questions. Protocols could establish basic measurements that would make data more useful both to individual researchers and to others using the data. Connectivity data is important- xyz.

BREAK FOR LUNCH

Meeting at 1:45 at CAL

Big Canopy Database
Nalini
Introduction to the BCD, leading into the usability testing orientation.

Mission Statement
Listing of useful websites
Test procedures- to see how well the database works, and whether it is easy to use.


Questions/Comments
Bob- Our study (KCTS) is not listed under research projects- you have to go to the databank to see it.

Robert- lots of links that don't go anywhere- suggest not allowing the link to be active until the site is filled. Others are dead ends (e.g. going to another website and can't get back).

With research contacts, there was a blank screen and you had to type in a name. This assumes you know names. It would be useful to have a list of all the contacts, even though it is long; the user needs to have some clue.

Databases embedded in some of the pages are very reasonable and seem to work ok.

Would be helpful to link info about people to sites. E.g. under site, have a list of all names of people who are working at that site.

No connection yet to link authors and subject categories.

No 'browse' capacity. Often this kind of info is found just by accidentally looking around, so some browse capacity may be good to have. The literature is always behind, so there need to be more active links to the Internet. This is being addressed now through students who are googling the web and putting results on the BCD. Secondly, once people are using the BCD, there will be a mechanism for them to enter their own information on articles, meeting events, etc. This relies on the canopy community plus web-meisters to keep it updated; it could be moderated. It would be hard to have an automatic web-crawler, because keyword searches sometimes return too much- e.g. an automatic daily search for links to 'canopies' might bring in all sorts of information.

It may be more efficient to harvest new info from individuals' existing websites, for researchers who would never spend the time to update their own information. Was having a problem with Netscape; think it is fixed. There was a suggestion to put a note on the page that says 'if you are having trouble, it may be due to….

Damage and access section
Right now it seems to have a 'chat room' atmosphere- is that the way it will continue? It may set the tone for other activities and seems a little inconsistent with the scientific nature of the study. Could use a moderator to make sure incoming info is more consistent and scholarly.


JUDY Canopy databank introduction
Vision
Querying Research Study Collection
Browsing those datasets (data and metadata)
Downloading data sets
Data access policy

Browse to look at sample data, metadata (at the study level), and detailed information about the variables themselves. There will be options to download the data. Ultimately we need data access policies. E.g. if you download data, what kind of policy would make you comfortable? You should be able to protect your data from other people. If a dataset is not yet complete, how much would you be willing to let people see?

Objective to provide ecologists with a lot of integrated data.
But access is only the first step; also need:
- Query and analysis- ecologists as programmers?
- Metadata- why is it so hard to get?
- Database (with dataset) design- ecologists as database programmers?
Can we get:
-Metadata
-Common data elements for linking
-Tools

How to get metadata as a by-product of increased research productivity?
Canopy Pathfinder projects (ecologists do multi-site field research with database support and study and publish data sets)
The BCD (research reference)
Databank (the project is to help people design datasets; publish them; and query data)
Within-lab research (working w/Barbara- how to collaborate within a lab? Students have a local shared repository where data is stored AND documented)
Spatial Database research


BCD- domain specific research reference overhead

One part of the sequence- locate where the data and metadata bottlenecks are:
Study design - fieldwork - data entry/verification - analysis - sharing within group - journal publication - data archiving - data mining.

Judy walk through of canopy databank homepage
Distinction between "project" and "study" as used in databank and defined by LTER

Study -one piece of work associated with a single body of data (e.g. masters thesis)
Project- a looser collection of studies organized to answer related questions, or where researchers are going to share metadata (e.g. WESTGEC at WRCCRF)

Click on search or recent study- 1000-year chronosequence (listed as a study, but Bob thinks it is more of a project with about half a dozen studies in it). Judy walked through the study information provided in the database.

Would be useful to be able to click on the name of an individual field and get the metadata or definition of that field (in the same way you can click on "data" or "table information" to get detailed information). Data is in Excel or Access format; SQL can provide it in a particular format.

Can download metadata as a report-like form in Word that allows you to read through it. Right now, metadata is whatever the user provides; it would be nice to have a common format that could be used for a standard metadata report. If someone is interested in or willing to use a database design someone else made (e.g. a common variable such as 'photosynthesis'), it would be easier to design metadata. Metadata is available in 2 forms (1. machine-processable metadata; 2. narrative).

Are working closely with LTER in Corvallis and others to try to standardize metadata (NBII, USGS, and other standards committees). This would allow us to take their data and they could take ours. Working on it.

Judy reiterated that this is in draft form and they want input on what would be useful. There was considerable comment and discussion about the word "metadata" and the fact that the info that follows it is really narrative, not metadata. Question- could we use 2 different words for the two types of metadata? There is metadata that is more a description of the project and how it was done. Then there is more technical metadata (field descriptions; characteristics; spaces; integers vs. real numbers; accuracy, etc.). Some think of metadata as something that can actually be manipulated.

Working on cross-queries. E.g. want all stem data for 3 different projects- would be able to do a cross-study query without having to get all the data from all 3 projects. Possible on-line graphics. How does data get in? A metadata form will be used- can see it tomorrow. Can also do it on-line.
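A minimal sketch of the cross-study query idea, using an in-memory SQLite database. The table and column names here are invented for illustration; the notes do not specify the databank's actual schema.

```python
import sqlite3

# Three hypothetical per-project stem tables with a shared column layout.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE project_a_stems (stem_id TEXT, dbh_cm REAL);
CREATE TABLE project_b_stems (stem_id TEXT, dbh_cm REAL);
CREATE TABLE project_c_stems (stem_id TEXT, dbh_cm REAL);
INSERT INTO project_a_stems VALUES ('a1', 45.2), ('a2', 12.7);
INSERT INTO project_b_stems VALUES ('b1', 88.0);
INSERT INTO project_c_stems VALUES ('c1', 30.5), ('c2', 61.3);
""")

# One query spans all three studies, tagging each row with its source project,
# without the user downloading each dataset whole.
rows = con.execute("""
SELECT 'A' AS project, stem_id, dbh_cm FROM project_a_stems
UNION ALL
SELECT 'B', stem_id, dbh_cm FROM project_b_stems
UNION ALL
SELECT 'C', stem_id, dbh_cm FROM project_c_stems
""").fetchall()
print(len(rows))  # all stems across the three projects
```

Common data elements across studies (here, a shared stem_id/dbh layout) are what make this kind of query possible, which is why the protocol discussion above matters.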

Are researchers encouraged to follow standards of FGDC? Not many people knew what FGDC was, but after explanation most thought they were compliant with those standards, but there were a few layers in between.

Can download the dataset, e.g. as a zipped Access file. If submitted as Excel, it will be downloadable as an Excel file. Judy showed the Access database design view with all the tables and connections. Many scientists want to look at data in one big table. If scientists knew how the data was connected, they could export it that way. Or some people may only be interested in a portion of the data- need to create a data form view where only some variables are taken out and put into a form.

Judy overhead on canopy databank cross study queries-diagram of data integration to get data from databank into format researchers want.

Question
Data access policy and technology overhead
LTER passes on 'dead' datasets. If people publish datasets that aren't completely milked, they want assurances of access control or security. You may want to have the data on the web for your partners to look at, but not for everyone.

Through 2 systems of log-ons (user or guest), the viewing and access of data would be customized. Among users, there are multiple roles that would enable different permissions (e.g. PI vs. researcher vs. collaborator). E.g. may get permission to view, but not download.

Barbara- thought this was making more work than necessary. Her comments were intended not to be critical, but to try to save us time. Collaboration is more often done scientist to scientist. The real value is more between a researcher and someone they don't know (all layers from one level). The assumption is that by the time someone is willing to release their data to the BCD, they aren't going to be as concerned about access- it is already in the public domain.

Institutional question- do you want grid where everything is available or do you want to work on a peer level where people just call for it?

Nalini thinks it really useful at an intermediate level. Not dead data, but not raw data.

Some might be concerned about releasing data mid-process because there are so many iterations and changes, and errors are made, etc. Barbara cautioned she would not release raw data and does not want to pre-announce the existence of her data- too many people would try to contact her early on asking about it, and she doesn't want to talk with them.

XML web services- Bill Gates mentioned- reality in a few years (???)

If a researcher wants to set up a project, it is set up with a short identifier. The PI can add info.

Last item of the day:
Nalini- started exercise that will be continued tomorrow…

Formulate a specific question that requires more data or a transformation of data from another study identified today. You can make assumptions about new data you would need to collect.

Friday, April 26, 2002

Nalini -outlines activities for the day:

Presentations: Visualizations; Statistical Analyses; Discussion Modeling Studies
Break for Lunch
Usability session
----------

Review yesterday's work:
Where have interesting questions arisen?
Barbara
How does the interaction of downwelling light from above and upwelling water from below determine crown development? Use: BB, JP, BL, RJ data. Description of hydraulic architecture: start with Bob's tree structure; model hydraulic architecture (Simile); need sapflow; aggregate trees with a stem map into a community (xy coordinates); pull in Jess's light distribution; use the model in a spatially explicit way. Rely on Jess's light measurements; D. Shaw's spatial data at the stand level; xyz within-tree data from Van Pelt. The key would be to develop the right algorithm saying when a branch should grow under x light conditions and with so much water. (In Hiroaki's study, growth is based on light; it doesn't include water.) Can't have big branches at the top because there isn't enough- light and water are limiting factors. The interaction between light and water availability controls the demography of branches across trees and stands- e.g. crown structure. A huge factor not included is wind and other mechanical constraints. The assumption is that nutrients aren't a major factor, or nutrient info could be added to the data as needed. Lifespan- 1000-yr timeframe. No doubt nutrients determine forest productivity, but in a chronosequence they behave very differently from light and water. Nutrients are more of a chronic noise and constraint- not an age-related phenomenon. Light and water are correlated with age.

How to use Bob's xyz-based tree: put in assumptions about water and light; run the model; see if it's the same as what Bob gets in the chronosequence. Barbara's question with Rocky's model. Would have to get statistics and an aggregate measure in order to compare Bob's data with the model. If we are happy to do the comparison with an aggregate measure, why model at all? After about 200 years trees are very individualistic and the model doesn't work so well. Is the pattern at the stand level, but not within individual trees? If so, why model to that level? Model with points and segments because stochastic variation is at the branch level. Have allometric growth at the beginning, but then it falls off in a random way. Perhaps add noise to the model (e.g. lightning). It wouldn't recreate Bob's trees, but could model equally unique trees. Could we model and then compare to Bob's actual trees to see if they are statistically similar enough for the study? Need to figure out how to develop the model first, and then worry about available data.

Next question- if we didn't have the workshop, how would Barbara or an outside researcher know she could link up with this data? Could we broadcast that this data is available? SERC data table of contents- metadata strategy at SERC- maybe could advertise at a clearinghouse website. The NBII metadata standard is used and some of SERC's data is advertised: just metadata, with links back to their own site. More data is available within SERC- 2 levels of sharing, public and internal. If officially cataloged, the data can be found on the web- DTOC (LTER's data table of contents). Other examples of well-known data exchanges are the tree ring website and the CO2 site.

Jess, Robert and Lois
(Should get overheads, because I couldn't write it all)

Jess- Fundamentally different ways of thinking about structure.
1. Complex Environmental Gradient (CEG) (light; wet; hot/cold)
2. Medium of Radiation/Material Exchange (MRE)
3. Mosaic of Climate Structure (TYP)
4. Reticulum of Voids (VOD)- interconnected spaces (tunnels- cave systems)
5. Topographic surface xyz (OCT)
6. Continuous medium in 3d (C3D)
7. Community of Leaves (COL) (surfaces affected by processes at locations)
8. Network of Surfaces (NOS) (dendritic structure; location xyz; connectivity)

What's important is to figure out how these different structures are related. Easier to make connections between data whose underlying structure is more similar.

Diagram of how close they are; also a pairwise matrix (yes = can be related; no = cannot; ? = unknown):

      MRE   TYP   VOD   C3D   OCT   NOS   COL
CEG   yes   yes   no    yes   ?     no    ?
MRE         yes   ?     yes   yes   ?     ?
TYP               yes   yes   no    ?     ?
VOD                     yes   no    ?     ?
C3D                           no    ?     ?
OCT                                 no    no
NOS                                       ?
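One machine-readable form the matrix could take- a hypothetical encoding, not part of the databank: store the 'yes' cells as a set of unordered pairs, omitting the '?' cells as undecided.

```python
# 'yes' links from the pairwise matrix above, as unordered pairs of the
# eight structure-type codes. '?' cells are deliberately left out.
COMPATIBLE = {
    frozenset(p) for p in [
        ("CEG", "MRE"), ("CEG", "TYP"), ("CEG", "C3D"),
        ("MRE", "TYP"), ("MRE", "C3D"), ("MRE", "OCT"),
        ("TYP", "VOD"), ("TYP", "C3D"),
        ("VOD", "C3D"),
    ]
}

def compatible(a, b):
    """True if the two structure types were marked 'yes' in the matrix."""
    return frozenset((a, b)) in COMPATIBLE

print(compatible("CEG", "MRE"), compatible("OCT", "NOS"))
```

An encoding like this would let database tools check, before attempting a join, whether two datasets' underlying structure types are known to be relatable.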

3d structure can be linked to a lot of them

It is hard to think through how to make these connections useful. Maybe we can't combine everything- maybe there is no "white rat" of canopy structure. Recognizing the diversity of measurements is important. The logical next step is to see how they are connected.

In terms of database technology, we would like to find connections between these and ways of bringing the data together. This would provide the MECHANISMS for combining unlike data. This does not mean researchers would combine unrelated data and just go mining for interesting things. They could still be hypothesis driven, but there would be tools available for whatever hypothesis-driven question they want to answer.

Lay out geometry for things to be located in 3d space. Defining possibility for measurements to be located or co-located.

List of 8 terms with pictorial diagrams- what else do we have, or what do we need to collapse or expand, to make a structure? There is a disconnect between communities of researchers because they have different views of structure. Need to ensure that they are equally 'right' and that one way is not better or worse. There was also a reminder that this issue is not unique to canopy science and that others have successfully dealt with it before.

What mechanisms for doing this? Ontology? Conceptual design? Common protocols? Vocabulary? How would a proposal be written? The idea is central to the proposal, but the work should be driven by specific scientific questions. Take data in one of those different forms (a particular project) and use it as case studies- it would benefit two or three studies.

1st step- realize there is a problem.

What about things that aren't spatially explicit?

Bram
Visualization Overview- Slides/overheads
All these will be available on BCD

Mike
Demo on IDL
Used Bob's data and simplest protocols
Tree diameters; branches; branch height, length, angle, and azimuth. Foliage was considered as a proportion of foliage on a branch. Didn't have to manipulate the data much to get it into format for the model.

Currently you can't query the model or click on a branch, but you can rotate it. It can't grow, but you could make a new tree based on assumptions and new data about how it might grow.
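The branch protocol above (attachment height, length, angle, azimuth) is enough to place branch tips in 3-D for rendering. A rough sketch of that conversion- the function name, trunk-at-origin assumption, and sample measurements are all invented, not the actual IDL code:

```python
import math

def branch_tip(height_m, length_m, angle_deg, azimuth_deg):
    """xyz of a branch tip; trunk assumed vertical at the origin."""
    a = math.radians(angle_deg)     # elevation of branch above horizontal
    az = math.radians(azimuth_deg)  # compass direction, 0 = north (+y)
    horiz = length_m * math.cos(a)  # horizontal reach of the branch
    return (horiz * math.sin(az),                 # x (east)
            horiz * math.cos(az),                 # y (north)
            height_m + length_m * math.sin(a))    # z (height of tip)

# Invented example: branch attached at 20 m, 4 m long, 30 deg up, due east.
tip = branch_tip(height_m=20.0, length_m=4.0, angle_deg=30.0, azimuth_deg=90.0)
print(tuple(round(v, 2) for v in tip))
```

Repeating this over every measured branch yields the point set a visualization tool can rotate and render.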

Example of program with pig heart-dendritic type model in 3D space. Process involved filling the heart with dye; freezing it; sectioning it; and taking digital images (green dots are where arteries were).

Discussion of visualization tools
Buying a visualization package is a big commitment. Researchers need someone with technical experience to be very good at it. NASA visualization tools- should make connections, as they have well-developed tools. The NASA tools deal more with surfaces and rendering- hard shells, etc.; looking at continuous densities and iso-volumes is in development, but not quite there yet. Los Altos?- visualization tools that are more related to what we need. The problem is that technology transfer may be an issue- not everyone has access to big computers. Nalini wants to make sure the average researcher could use, afford, and/or have access to the tools. Or they could work to support a national center (e.g. NCES?) that could work with people on visualization- a support or technology center.

BREAK

Meeting at 1:45 at CAL

Intro Travis
TESC graduate- now OGI computer scientist grad student. Working with B. Bond and Judy.

Analyses

Description (e.g. # home runs)
Exploration (e.g. multivariate analysis)
Hypothesis testing
Prediction (modeling too)

Classes appropriate to canopy studies:
Sampling- how can you design schemes so that all units have an equal chance of being sampled? E.g. how can you do random sampling of a tree crown? Do we break trees down into parts, lay them in a line, randomly select points, and then put them back- vs. treating the area around the tree as a cube, randomly selecting points, and then disregarding ones that don't fall on the tree?
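The second scheme above is rejection sampling, and it can be sketched in a few lines. The crown here is an invented cone; a real study would substitute measured crown geometry.

```python
import random

random.seed(0)  # deterministic for the example

def in_crown(x, y, z, height=30.0, base_radius=5.0):
    """Invented crown shape: a cone, apex at the treetop, widening downward."""
    if not 0.0 <= z <= height:
        return False
    radius_at_z = base_radius * (1.0 - z / height)
    return x * x + y * y <= radius_at_z ** 2

# Draw uniform points in the bounding box; keep only those inside the crown.
samples, tries = [], 0
while len(samples) < 100:
    tries += 1
    p = (random.uniform(-5, 5), random.uniform(-5, 5), random.uniform(0, 30))
    if in_crown(*p):
        samples.append(p)

print(len(samples), tries)
```

Because rejected points are simply discarded, every accepted point has equal inclusion probability- the property the question above is after- at the cost of wasted draws when the crown fills little of the box.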

Billy Ellison (Humboldt University) tried to address this issue in a complete inventory of epiphytes on Sitka spruce. He measured everything and then sub-sampled.

Sample sizes are often too small- e.g. WRCCRF is a small part of the universe, with more sub-replication within a site rather than multiple samples. Do canopy biologists need to redefine what N is for a particular study to replicate appropriately? Access always precludes true randomness; there is always a trade-off between access and ideal sampling. If variability is not at the tree level, do we really need to measure lots of trees? There is a need to work on strategies for choosing samples. How can we ascertain the statistically accurate level? Define the experimental unit based on what you know you can do. Given that we have datasets like Bob's, we can figure out what level of sub-sampling needs to be done. Standard statistics were developed for simpler systems- it is difficult to force an ecological study into the agriculture box. Need to look at different approaches to analysis. There are lots of exclusions- typically they are random and therefore not so biased. Some exclusions are biased (e.g. inner branches, or branches in dense areas where the gondola can't go).

Link between results and conclusions- be honest about exactly what you have and what it means. This can be dealt with in the text by explaining how you are drawing inferences. Ideal sampling should be thought of first: the statistical design should be done without thinking of access as an issue; it can still be adjusted accordingly. (Nalini- "ideal" sampling is taking the least amount of measurement to get the information you need.)

Techniques for stratified random sampling are based on knowledge of conditions. E.g. if you know there is a light effect, you can do random sampling within light zones. Different sampling schemes, with rigorous statistical standards, would be discipline-dependent. Stratification of randomness will vary depending on what you are studying; so if Bob's data is used to figure out the minimum sample size for structure studies, this minimum size might not be accurate for something Barbara Bond is doing- she is looking at structure a different way.

Betsy comment- perhaps could use the 8 structure types and get minimum sample size for different types of structural analyses.

Can't remember who was talking here- could get their overheads and compare with notes below.
Representativeness
What does my data apply to?
Encourage research at a level that tries to differentiate among spatial scales.

Meta analysis
Process based modeling
Uncertainty analysis

Get around representativeness with meta-analyses- look at a variety of studies as variable units, almost case studies. Were results repeated? Results lend credibility when combined with others. Process-based modeling is another approach- try to quantify theory to see how your approach compares to other studies- NEEDS TO BECOME MORE STATISTICALLY RIGOROUS. Uncertainty analysis- e.g. carbon cycling the way Mark Harmon did it.

Need to examine specificity of model based on how looking at statistics.
Different ontologies may require different approaches including statistical approaches.

Tools not exploited yet
Spatial statistics
-Geostatistics (semivariogram- a variety of types; average difference between things you measure as ?. Can tell you how to sample if statistical independence of units is required. Ex. from Bob- 4 m was determined to be the appropriate scale, so measuring more closely doesn't provide more info)
- Spatial series analysis (not often used in 3d, but no reason it can't be extended) Relevant to tree precipitation/sunlight
- Time series analysis
- Spatial point pattern analysis
o Ripley's
o Nearest neighbor

Hiroaki- spatial analysis overhead
2d approach- he would like a 3d version

Vertical distribution of canopy trees
Foliage in a "cube" as measured at the Japanese crane site. He's not sure how to analyze it- the figure "throws away" foliage position by crunching it into a height diagram (e.g. total leaf area vs. height). They do 2d slices or 1-d stratification because they don't know how to do point pattern analysis in 3d.
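The collapse from 3-D foliage points to a 1-D height diagram can be sketched directly. The point list, leaf areas, and 5 m bin width below are all invented for illustration:

```python
def height_profile(points, bin_m=5.0, top_m=30.0):
    """points: (x, y, z, leaf_area) tuples -> {bin floor (m): summed leaf area}.
    x and y are discarded, which is exactly the information the 2-D/1-D
    reduction throws away."""
    bins = {h: 0.0 for h in range(0, int(top_m), int(bin_m))}
    for x, y, z, area in points:
        floor = int(z // bin_m) * int(bin_m)
        if floor in bins:          # points above top_m are silently dropped
            bins[floor] += area
    return bins

foliage = [(1, 2, 3.0, 0.4), (0, 1, 12.5, 1.1), (2, 2, 13.9, 0.7),
           (1, 0, 27.2, 0.2)]
print(height_profile(foliage))
```

Two very different horizontal arrangements of foliage can produce the identical profile, which is why the notes argue that 3-D analysis is needed to understand the process behind the pattern.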

How do we distinguish whether the modes observed are real? Need 3d to understand the PROCESS that creates the pattern observed in 1 or 2d.

Non-parametric
Jess makes a plea to consider non-parametric statistics as descriptors (e.g. light in a forest is either abundant or scarce, with not much at intermediate intensity).

Distribution free methods
-Jackknife and bootstrap
-Bayesian analysis

Simulations- Has anyone made 'fake' forests and tried different sampling methods to test effectiveness or optimize sampling?
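A toy version of this idea, with all numbers invented: generate a 'fake' forest with a known mean stem diameter, then measure how estimate error changes with sampling effort.

```python
import random

random.seed(1)
# Fake forest: 5000 stems with invented dbh values (cm).
forest = [random.gauss(40.0, 10.0) for _ in range(5000)]
true_mean = sum(forest) / len(forest)   # known, since we built the forest

def mean_abs_error(n, trials=200):
    """Average absolute error of the sample mean over many random samples."""
    errs = [abs(sum(random.sample(forest, n)) / n - true_mean)
            for _ in range(trials)]
    return sum(errs) / trials

e_small, e_large = mean_abs_error(5), mean_abs_error(500)
print(round(e_small, 2), round(e_large, 2))
```

Because the truth is known by construction, competing sampling schemes (random, stratified, access-constrained) could be scored against it- which is what makes simulated forests useful for sampling design.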

MODELING STUDIES
Robert - Presentation
1. Methodology of ecological modeling- support declarative modeling with appropriate CAD tools: Simile

Simile- a tool for designing simulation models. It is similar to an electrical diagram. Think quasi-mathematical structure. Has a notion of objects.

Ex. boxes = compartments; circles/arrows = flows. Each flow can be modeled differently. There are variables, which can be linked with 'influence' arrows (defined by calculation); variables can 'influence' processes. Each box, variable, or flow is defined by equations. So far, similar to other models.

What is unique is that it can all be wrapped up in an envelope- the whole thing can be seen as an object, like a sub-model. Then you can define properties for the whole sub-model and say you have X number of these. Choice of language to run the program: Tcl or C. Select the time parameters over which the model runs. The result is a program, with a variety of display tools- for example, tree growth over time plotted in xy coordinate space.

More interesting visualizations are of whole populations of trees with sub-models. Specify the number of trees to start; new ones arise from integration of outside forces or from internal processes (self-reproduction); can make it so each tree produces 2 trees/yr, or so that the stand generates 2 trees/yr; can have mortality; can add attributes based on location in space. Can have a sub-model to show how trees are affected by other trees- e.g. tree shading.
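Simile itself is a graphical tool, but the compartment/flow/influence idea it implements can be illustrated with a hand-rolled sketch- this is not Simile output, and every parameter value here is invented:

```python
def run_model(years=50, dt=1.0, light=1.0, growth_rate=0.1, capacity=100.0):
    """One 'biomass' compartment with a single growth flow, stepped in time.
    'light' plays the role of an influence variable acting on the flow."""
    biomass = 1.0                      # compartment initial state
    for _ in range(int(years / dt)):
        # flow equation: logistic growth, scaled by the light influence
        flow = growth_rate * light * biomass * (1.0 - biomass / capacity)
        biomass += flow * dt           # simple Euler integration step
    return biomass

low, high = run_model(light=0.3), run_model(light=1.0)
print(round(low, 1), round(high, 1))
```

In Simile this box-flow-influence diagram would be drawn rather than coded, and the whole thing could be wrapped as a sub-model and replicated per tree- which is the feature the notes highlight.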

A more complicated model was developed from 3-PG- you can bring in other models and recreate them in Simile.

Simile is free on the web. A commercial version is in the works, but the free one will be kept too. You can view the Tcl code for a model once it is made, but you have to go into the Windows temp file before exiting. He wouldn't recommend looking at it, though, because it is not designed for people to read. Robert wrote separate code to take the text info generated for each Simile file and translate it into html- metadata-type information and a description of the model and variables, hyperlinked.

There was a lot of interest among canopy researchers in Simile.


2 Information flow in ecological research

Data Destiny Diagram- a diagrammatic view (similar to an Access database design) that shows what data are collected. A conceptualized view of methods and study design. A great way to visually show what a study included and what data were examined. Color-coded to show what data have already been collected and what remains to be done; as data are collected and added to the database, circles are filled in with red.

Judy
There is a good possibility that computer scientists can work with researchers to design databases and studies before the initiation of research. This would be useful as tracking system for data.


3 e-science and ecology specific Grids

Foster and Kesselman, 1999- the book that started the idea. "The word Grid is chosen by analogy with the electric power grid, which provides pervasive access to power and, like the computer and a small number of other advances, has had a dramatic impact on human capabilities and society. We believe that by providing pervasive, dependable, consistent and inexpensive access to advanced computational capabilities, databases, sensors and people, computational grids will have a similar transforming effect…"

Refers to computing in a distributed network. (Couldn't get the text of the overheads from Robert- too quick.)

NASA info power grid. The overall motivation for grids is to facilitate the routine interactions of these resources in order to support large-scale science and engineering. The grid part comes in at the software level- e.g. you will belong to a certain group and have certification for a range of tools. You can get other tools to act on your behalf through log-on processes. (Canopy science is a little different from other grid users in that we typically don't have huge databases requiring supercomputing.)

Example-coupled ocean/atmospheres

Grid is more about processing and tools, whereas the Web supports more data retrieval.
Currently Grid tools provide a low level of operational control. Judy and Robert are trying to get a proposal together for a canopy grid.

What is the sense of the state of the art in canopy studies- are we good candidates for grids?
Robert's response: We have raw material and analysis, which is good, but a paradigm shift is needed to go forward. We tend to think of databases being at a pooled site- pre-grid thinking. Need to work on grid connections to control data rather than having a shared site.
Is it better to jump directly to a grid? Robert's response: We could set ourselves up as a node on a grid and deliver services; eventually a grid could grow from this node. A grid would avoid the need for everyone to send data to a central location.