Section 1: Database Design with Templates

Introduction

The DataBank Database Designer is intended to facilitate the creation of databases for storing ecological field data. Rather than constructing the various tables of a relational database by hand, column by column, the user of the Database Designer application creates a database by combining one or more pre-existing database components that have been built to represent objects encountered in the user's domain of research. At the lowest level, each of these components consists simply of a number of attributes, where each attribute corresponds to some property of the object represented by the database component, such as its diameter (for the branch of a tree, say) or its species (for the tree itself).

This approach offers several advantages over constructing a database by hand:

1. Frequently occurring database pieces can be defined once, then used in multiple separate databases, with very little additional effort required from the user after the original act of definition.
2. A set of attributes that are regularly observed and recorded for more than one type of object under study can be isolated and stored as a free-standing unit. Database components representing objects on which those attributes are present can then indicate the presence of the attributes by a single reference to the stored attribute set; the individual attributes needn't be repeated by hand for each type of object that needs them.
3. The lower-level details of the final database needn't be kept constantly in mind while the design is being laid out. In particular, the user doesn't need to concern him- or herself with how the several tables in the generated database will be related to each other through keys, since the necessary relationships are determined by the application, based upon the particular combination of components that the user has used in the design.
4. More than just a database can be produced from a user's design. For example, the Designer is able to produce Ecological Metadata Language (EML) documents that describe the database designed by the user.

In the world of the Database Designer, the "pre-existing database components" we have just introduced are called templates. The Designer is distributed with a number of templates that may be useful to forest ecologists, and, through the included template editor, any user can create additional templates that are valuable in his or her particular situation.

Note: Although we have said that the Database Designer is intended for use by those needing to create databases for storing ecological data, the fact that users are free to create templates of whatever sort they please means that the application need not, in fact, be limited to use in designing and creating only one kind of database. Since our audience consists primarily of ecologists, however, and since this version of the Designer is distributed with a catalog of ecology-related templates, we will proceed in this guide without any further consideration of what usefulness the application might have to those working in other fields.

Entities and Observations

Having introduced the idea of a DataBank template in general, we'll now move on to discuss the two primary categories of template present in the Database Designer. Each template that is distributed with the Database Designer, and any template that is created by a user of the application, falls into one or the other of these two categories: entity templates and observation templates.

Entity templates are database components that are intended to represent particular kinds of objects from a research domain (here, forest canopy ecology). For example, among the templates that are distributed with the Designer are a number of entity templates representing tree branches. Someone putting together a database for a study that entails measurements being taken of a tree's branches might browse through this group of templates in search of a branch template that meets his or her needs. Each of the various branch templates in the group is composed of a slightly different set of attributes, reflective of the fact that the particular properties of a branch that are important to a researcher change with the intent of the study.

What happens, however, if no template in the catalog is entirely satisfactory? What if none of the provided branch templates possesses all of the attributes that a researcher intends to measure on the branches encountered in the field, or possesses some of the attributes and not others? For example, perhaps the branch template that comes closest to satisfying the researcher's needs does not provide for a place to record measurements taken of a branch's taper. In that case, a separate observation template may exist that encapsulates those taper-related attributes that are lacking in the branch template. An instance of this observation template could then be added to the design and connected to an instance of the best-fit branch entity template selected earlier. Effectively, the attributes of the taper observation would then be added to the set of attributes of the branch entity and the desired result would be achieved: the part of the database created to store branch data would contain space for storing branch taper information, in addition to whatever else.

Of course, if the desired observation template did not already exist in the catalog, the researcher might elect to create it with the template editor. An advantage to simply creating the needed observation, rather than creating another complete branch template comprising the additional attributes, is that in the former case the new attributes remain available for reuse on other types of entity. For example, the researcher may at some time wish to add the taper observation to an entity template representing the stem of a tree. Separating common sets of attributes from specific entity templates may thus enable savings of time and effort to be had in the future. Experience and experimentation with the Database Designer will refine one's sense for how to use templates well.
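The way an observation extends an entity, and can be reused on more than one kind of entity, can be sketched in a few lines of Python. This is only an illustration of the idea, not the Designer's actual implementation: the class names, the attach method, and the attribute names and types are all hypothetical. For brevity the sketch attaches the observation template directly rather than modelling separate instances of it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Attribute:
    """One recorded property, such as a diameter or a species name."""
    name: str
    datatype: str


@dataclass
class ObservationTemplate:
    """A free-standing set of attributes (hypothetical model)."""
    name: str
    attributes: List[Attribute]


@dataclass
class EntityTemplate:
    """A self-contained object from the research domain (hypothetical model)."""
    name: str
    attributes: List[Attribute]
    observations: List[ObservationTemplate] = field(default_factory=list)

    def attach(self, observation: ObservationTemplate) -> None:
        """Connect an observation so that its attributes extend this entity."""
        self.observations.append(observation)

    def effective_attributes(self) -> List[Attribute]:
        """Return the entity's own attributes followed by those of its observations."""
        combined = list(self.attributes)
        for observation in self.observations:
            combined.extend(observation.attributes)
        return combined


# Assumed example data: the taper attributes are defined once...
taper = ObservationTemplate(
    "taper",
    [Attribute("base_diameter", "real"), Attribute("tip_diameter", "real")],
)
branch = EntityTemplate("branch", [Attribute("length", "real")])
stem = EntityTemplate("stem", [Attribute("species", "text")])

# ...and then reused on two different kinds of entity.
branch.attach(taper)
stem.attach(taper)

branch_columns = [a.name for a in branch.effective_attributes()]
stem_columns = [a.name for a in stem.effective_attributes()]
print(branch_columns)  # ['length', 'base_diameter', 'tip_diameter']
print(stem_columns)    # ['species', 'base_diameter', 'tip_diameter']
```

Because the taper attributes live in one place, adding them to a further kind of entity costs a single call rather than a new, nearly duplicate entity template.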

To summarize, the fundamental difference between entity templates and observation templates is this: an entity template represents a complete, self-contained object that is encountered in a research domain, whereas an observation template exists as a "disembodied" set of attributes. In practical terms, the consequence of this distinction is that an instance of an observation template can never occur in a design disconnected from all other database components present there, since, being only a collection of attributes, it has no substance of its own. An observation therefore always appears like a satellite in orbit around some particular entity. Entity templates form the base structure of a design; observation templates augment that structure with additional attributes.
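The rule that an observation may never stand alone can be expressed as a simple check over a design. The following is a minimal sketch under assumed names (the Design class, its fields, and the orphaned_observations method are hypothetical and are not taken from the Designer itself).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Design:
    """A hypothetical, much simplified model of a database design."""
    entities: Set[str] = field(default_factory=set)
    observations: Set[str] = field(default_factory=set)
    # Maps each observation instance to the entity instance it is connected to.
    connections: Dict[str, str] = field(default_factory=dict)

    def orphaned_observations(self) -> List[str]:
        """Return observations that are not connected to an entity in the design."""
        return sorted(
            obs
            for obs in self.observations
            if self.connections.get(obs) not in self.entities
        )


design = Design(
    entities={"branch"},
    observations={"taper", "foliage"},
    connections={"taper": "branch"},  # "foliage" is left unconnected
)
orphans = design.orphaned_observations()
print(orphans)  # ['foliage']
```

A design would be considered well formed in this sketch only when the returned list is empty.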

The next subsection of the guide presents the feature of entity templates that enables one to safely forget about table relationships while designing a database, and thus to realize the third advantage listed in the introduction.

Relationships between Entities

We have explained that an observation template is always related to an entity template, and that the effect of this relationship is that the attributes contained within the observation template are added to the set of attributes possessed by the entity template. We will see now that it is also possible for two entity templates to be related to each other, but that the meaning of this relationship is somewhat different from the meaning of an entity-observation relationship. Consider again our branch templates. Perhaps it is the case that every branch about which data are recorded is connected to some tree, and that certain data on the stem of this tree must also be recorded in the database. Furthermore, the database must maintain an association between each branch and the tree it is a part of. These conditions could be expressed by stating that the existence of a branch template in a database design means that there must also exist a stem template in that design, or else the design is not valid, and that some way to keep track of the relationship between specific stems and specific branches must be at hand in the generated database.

We can state this in the world of the Database Designer by giving the branch template a dependency on the stem template. The application will then ensure that an instance of the depended-on stem template occurs in a design whenever an instance of the dependent branch template occurs there. When the final database is produced according to the design, the table generated to store data collected on the object represented by the dependent branch template will have a foreign key pointing to the table generated for whichever stem template in the design satisfies the dependency.
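To make the effect of a dependency concrete, the sketch below checks a small design for unsatisfied dependencies and then generates tables in which the dependent table carries a foreign key. It is an illustration only: the dictionary layout, the function names, the column naming scheme, and the use of SQLite are assumptions made for this example, and the SQL that the Designer actually generates may differ.

```python
import sqlite3
from typing import Dict, List

# Hypothetical design: template name -> names of the templates it depends on.
# Depended-on templates are listed first so their tables are created first.
design: Dict[str, List[str]] = {"stem": [], "branch": ["stem"]}


def missing_dependencies(templates: Dict[str, List[str]]) -> List[str]:
    """List every dependency whose target template is absent from the design."""
    return sorted(
        f"{name} -> {dep}"
        for name, deps in templates.items()
        for dep in deps
        if dep not in templates
    )


def create_table_sql(name: str, deps: List[str]) -> str:
    """Build a CREATE TABLE statement with one foreign key per dependency."""
    columns = [f"{name}_id INTEGER PRIMARY KEY"]
    for dep in deps:
        columns.append(f"{dep}_id INTEGER NOT NULL REFERENCES {dep}({dep}_id)")
    return f"CREATE TABLE {name} ({', '.join(columns)})"


problems = missing_dependencies(design)
if problems:
    raise ValueError(f"invalid design, unsatisfied dependencies: {problems}")

connection = sqlite3.connect(":memory:")
connection.execute("PRAGMA foreign_keys = ON")  # SQLite enforces keys only when enabled
for name, deps in design.items():
    connection.execute(create_table_sql(name, deps))

# A branch that belongs to an existing stem is accepted.
connection.execute("INSERT INTO stem (stem_id) VALUES (1)")
connection.execute("INSERT INTO branch (branch_id, stem_id) VALUES (1, 1)")

# A branch that points at a nonexistent stem is rejected by the foreign key.
try:
    connection.execute("INSERT INTO branch (branch_id, stem_id) VALUES (2, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

The user of the design never writes the key columns; in this sketch they follow mechanically from the declared dependency, which is the point of the feature described above.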

Note: It should be mentioned here that there are two kinds of dependencies that an entity template can possess; it can have a specific dependency on a particular entity template, or a general dependency on a particular group of entity templates. Discussion of the details of these two types of dependency is deferred until the section on the designer's workspace (section 6).
