Friday, August 10, 2012

The Blog of an Ecologist Dog

By Snickers

This summer, my mom takes me to work with her. She is a "research mentor," whatever that means. We go to Harvard Forest several times a week. I am very excited about going there because I am never alone. I usually stay by the table where my mom works and people come to pat me from time to time. When 12:00 pm comes, I start wagging my tail with excitement because I know it’s time to go on a walk. I love walking through the forest. When I come back from the walk, I want to make friends with the cows in the pasture. However, Mom always gets upset when I get too close to them.

After lunch, my friends come to play with me. I love their company and they are always so excited to see me. I am pretty sad that the summer is almost over because I will miss my friends. I wish I could talk just so I could thank all of them for this amazing summer!

RESEARCH PROFILE: Visualization Tools for Digital Dataset Derivation Graphs

By Miruna Oprescu

If you were a scientist working with more than 10,000 new data points every week, how well would you be able to keep track of all the changes you made to the data to obtain the final results? Moreover, if you were to look at your research 5 to 10 years from now, how well would you or any other scientist be able to reproduce your results from the original data? This summer, I am working with Emery Boose, a researcher at Harvard Forest; Barbara Lerner, a professor of computer science at Mt. Holyoke College; and Yujia Zhou, a rising senior at Dickinson College, to develop effective tools for creating, keeping, and accessing a record of all the processes the data undergoes throughout a research project.

The history of data from its collection to its output as a final result is known as provenance data, and it is essential for the reproducibility and validation of results. Although it is possible and quite common to manually capture provenance data through narrative description, this practice is not adequate for large datasets that undergo complex processing. In this case, we need powerful software tools for automatically capturing and storing provenance data in a digital format.
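To make the contrast with narrative description concrete, here is a minimal sketch of what automatic provenance capture means: every processing step logs itself as the data passes through. This is an illustration in Python with hypothetical step names, not the project's actual tooling.

```python
# Minimal sketch of automatic provenance capture (hypothetical helper
# functions, not the project's actual tools): each processing step is
# wrapped so its name and input/output sizes are logged automatically.
provenance_log = []

def record_step(func):
    def wrapper(data):
        result = func(data)
        provenance_log.append({
            "step": func.__name__,
            "n_in": len(data),
            "n_out": len(result),
        })
        return result
    return wrapper

@record_step
def remove_negative_values(readings):
    # Sentinel values like -999 often mark missing data in sensor files.
    return [r for r in readings if r >= 0]

@record_step
def convert_to_celsius(readings):
    return [(r - 32) * 5 / 9 for r in readings]

raw = [68.0, -999.0, 72.5]
clean = convert_to_celsius(remove_negative_values(raw))
for entry in provenance_log:
    print(entry)
```

The log that accumulates is a crude, linear form of provenance; the tools described in this post capture the same kind of information as a full graph.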

To get a better understanding of some of the issues scientists face on a daily basis when it comes to recording provenance data, we took a closer look at two of the research projects going on here at Harvard Forest.

First Stop: The ecology of forest watersheds on the Prospect Hill Tract

Forest watersheds play a crucial role in forest ecosystems. Any change in the quantity of water available to an ecosystem has an immediate consequence for its dynamics. For 10 weeks, we ventured into the woods and gathered about 100,000 data points from six stream and wetland gauges on the Prospect Hill Tract.

We found that every week scientists have to check for equipment malfunction and sensor drift, mark or model corrupted data and keep track of all these processes … for each and every one of 10,000 data points!

Next stop:
Climate change with data from the Fisher Meteorological Station

Climate change has become a serious concern for scientists around the world. However, despite all the interest shown in this matter, many questions still remain unanswered. One of the ways in which scientists study climate dynamics at Harvard Forest is by tracking atmospheric changes using data from the Fisher Meteorological Station. This means that another 10,000 data points need to be processed every week using an approach similar to the one we used for the hydrological data.

This summer we also had to change the sensors for the meteorological station and send the old ones back for recalibration, which means that data collection was perturbed on that day. All of this needs to be properly documented for future reference.

To respond to these issues, we used Little-JIL, a graphical programming language for defining processes developed at the University of Massachusetts, Amherst, to capture and store provenance data digitally. Little-JIL allows users to define and execute a process through a Process Definition Graph (PDG), a graphical representation of all the possible ways a point in a dataset can be processed. This is an example of a PDG used to compute the stream discharge from the hydrological data collected:

Little-JIL also produces a Dataset Derivation Graph (DDG), an abstract mathematical object which shows how every point in a dataset was processed. However, the DDG comes in the form of numbers and names describing how the different pieces of the process are connected for every data point processed, which can be very difficult to interpret even for small datasets. Therefore, to make provenance data easier to understand, we developed a way to display DDGs as interactive visualizations, and we included several features to make DDGs of large datasets manageable. To do this, we extended Prefuse, a graphical platform written in Java which supports visualizations of data structures such as graphs and trees.

One of the most important features of this program is that it can collapse or expand parts of the graph, allowing the user to focus on certain parts of the process and manage large DDGs. The light blue labels in the graph above each represent a step in the PDG shown before. Once the processes contained in a step are finished, the program automatically collapses the step, thus reducing the size of the visualization and enhancing its clarity.
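The visualization itself is built on Prefuse in Java, but the collapse logic can be illustrated with a short sketch. This is Python for brevity, and the step and node names are hypothetical:

```python
# Sketch of the collapse idea: once every child operation of a PDG step
# has finished, the step can be drawn as a single collapsed node in the
# DDG visualization. Step and node names here are made up.
ddg = {
    "compute_discharge": ["read_stage", "apply_rating_curve", "write_result"],
}
finished = {"read_stage", "apply_rating_curve", "write_result"}

def visible_nodes(ddg, finished):
    """Return the nodes to draw, collapsing fully finished steps."""
    nodes = []
    for step, children in ddg.items():
        if all(c in finished for c in children):
            nodes.append(step + " [collapsed]")   # one node for the step
        else:
            nodes.extend(children)                # show the detail
    return nodes

print(visible_nodes(ddg, finished))
```

Collapsing finished steps is what keeps the drawing readable as the DDG grows to cover thousands of processed points.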

Future research will focus on building queries to retrieve only the relevant information from the database storing the provenance data and will seek to display the results of these queries visually through the implementation of the software tools we developed this summer.

RESEARCH PROFILE: Trees and Bugs in Computers

By Yujia Zhou

Scientists often rely on sensors to collect data. However, sensors can fail for various surprising reasons. Have you ever thought about what you would do if you lost a couple of hours' worth of data because a lightning strike destroyed the sensor? Your sensor may also freeze during the winter due to low temperatures. Moreover, certain sensors require calibration every year because of inexorable sensor drift. As a result, raw data is usually not very reliable before some special processing, or "quality control." This summer, I worked with Dr. Emery Boose and Prof. Barbara Lerner on quality control of sensor data and data provenance tracking. In this research, we considered several possible quality control techniques, including calibration, detection of irregular values, and gap filling of missing data. Different quality control methods generate different versions of the processed data. As datasets grow larger and time passes, it is very easy to forget which set of quality control actions has been applied to a particular version of the processed data. As a result, recording the data provenance, or the history of the data, is essential.

A Data Derivation Graph (DDG) stores detailed information about how the processed data is derived from the raw data in a special data structure called a "tree." To accomplish this goal, we built a process simulating both the initial processing of the raw data and the reprocessing of the data when some quality control techniques improve. More specifically, we developed R programs for detecting and fixing quality control problems. Then we implemented these R programs in both Kepler and Little-JIL, two pieces of software that can record data provenance, to build identical processes, and we compared the DDGs they generated. We found that although Kepler is easier to use for scientists with little programming background, Little-JIL stores more information in the DDG and produces a more legible and comprehensive graph.
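The R programs themselves are not shown here, but a rough sketch of the three quality-control techniques named above might look like the following. This is Python for illustration only; the offset, thresholds, and sample values are all invented:

```python
# Sketch of the three quality-control steps named above (the project
# used R; the calibration offset, plausibility bounds, and sample
# readings below are made up for illustration).
def apply_calibration(values, offset=0.3):
    """Correct a known sensor offset."""
    return [v + offset if v is not None else None for v in values]

def flag_irregular(values, low=-5.0, high=50.0):
    """Replace physically implausible readings with None (missing)."""
    return [v if v is not None and low <= v <= high else None for v in values]

def fill_gaps(values):
    """Fill isolated missing points by averaging the two neighbors."""
    out = list(values)
    for i in range(1, len(out) - 1):
        if out[i] is None and out[i-1] is not None and out[i+1] is not None:
            out[i] = (out[i-1] + out[i+1]) / 2
    return out

raw = [10.2, 999.0, 10.8, 11.1]   # 999.0: a sensor glitch
qc = fill_gaps(flag_irregular(apply_calibration(raw)))
print(qc)
```

Each function produces a new version of the data, which is exactly why recording which steps ran, and in what order, matters so much.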

Besides programming, we usually went to the six stream gauges in the Prospect Hill area once a week to collect data. We downloaded the data from the data logger at each stream gauge to a palm device. In addition, we took measurements in the stream manually to check against the sensor data, detecting potential sensor failures. Certainly, there are a lot of mosquitoes in the stream area, but we appreciate this experience because it allows us to understand how sensors work in the field.

Besides the hydrological data, we also worked with the data from the Met Station. The sensors at the Met Station need to be calibrated about every two years. This summer, Mark VanScoy took us to the Met Station to remove the sensors and replace them with calibrated ones. We tilted down the tower and opened the white box to unplug and re-plug the colorful wires connected to the sensors. Luckily, the cows did not bother us because they had been put into the lower part of the pasture on purpose.

We spend most of our time sitting in the common room programming. It becomes very stressful when it comes to debugging. Bugs in computer programs are as annoying as, or maybe more annoying than, the bugs in the field. Computer bugs refer to any problem or error in a program. However, life becomes cheerful when we have Snickers, Prof. Lerner's super cute Bernese mountain dog, by our table. She brings so much joy and deserves all the love in the world.

Wednesday, August 8, 2012


By Tiffany Carey and Courtney Maloney

One of the many signs of spring is the United States' report on pollen counts across the country. These pollen counts are essential to the 35 million Americans who get hay fever from pollen every year. In our project, we investigated whether three ecotypes of common ragweed (Ambrosia artemisiifolia) produce more pollen in response to rising CO2 concentrations. Our objective was to test for differences in pollen production among ecotypes from these climatically distinct parts of New England. In order to predict when and where pollen allergies are most likely to increase in response to climate change, we have to determine climate change's impact in different places.

We investigated two factors of growth and production: the amount of pollen produced by each ecotype, and in each of the three CO2 concentrations. To do this, we created a stratified random subset of approximately 90 plants out of the full experimental design. Pollen was collected from three to five flowering spikes per plant by covering the spikes with polyethylene bags at the time of flowering; once the pollen was completely released, the samples were frozen in a -80°C freezer. These plants were kept in a lab at the University of Massachusetts Amherst (UMass Amherst), where we spent most of our summer. Including our mentor, Kristina Stinson, the team for this project consisted of 9 people from both Harvard Forest and UMass's aerobiology lab.

To process the plants that were in the freezer, we extracted pollen grains from each plant with an extensive, methodical procedure. For future processing, we developed a detailed protocol to remove pollen grains from the ragweed spikes and from the polyethylene bag each spike was in when harvested. We prepared a 12 mL solution of distilled water and the pollen from an individual ragweed spike. We then used a hemocytometer to count the number of pollen grains present in 0.5 mm³ of the solution to assess the amount of pollen produced by each ragweed plant. After harvesting the pollen, we measured the length and weighed the dry biomass of each ragweed spike. These measurements allowed us to assess the production and growth of ragweed plants with respect to CO2 and ecotype.
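The hemocytometer count scales up to a per-spike estimate by simple proportion. A short worked example shows the arithmetic (the grain count below is invented, not a measured value):

```python
# Scaling a hemocytometer count up to the whole sample: grains counted
# in 0.5 mm^3 of a 12 mL suspension (1 mL = 1000 mm^3). The count is a
# made-up example, not a measurement from the project.
counted_grains = 42          # grains seen in the counting chamber
counted_volume_mm3 = 0.5
total_volume_mm3 = 12 * 1000

grains_per_mm3 = counted_grains / counted_volume_mm3
total_grains = grains_per_mm3 * total_volume_mm3
print(int(total_grains))     # estimated grains released by the spike
```

Because the whole-sample volume is 24,000 times the counted volume, even small counting errors scale up, which is why several chamber squares are typically averaged.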

We also helped with the fieldwork component of the project. Even though this portion of the project was not part of our summer's analysis, we helped gather field data to determine the phenology of ragweed plants across 3 temperature gradients in Massachusetts. We went to various demography plots in cool, warm, and hot gradients, counted the number of individual ragweed plants, and identified whether they were flowering and/or releasing pollen.

Predicting how increased CO2 affects ragweed growth and pollen output, and achieving a greater understanding of how different local ecotypes respond to such changes, will better inform decisions regarding ragweed and allergenic plant policy and management. The importance of these results will only increase over the next several decades, as climate change increases the quantity and allergenicity of pollen in certain areas via rises in CO2 concentrations and temperatures.

Monday, August 6, 2012


This year, the Richardson Lab of Harvard University and the Friedl Lab of Boston University set out to study climate change using two different methods: remote sensing and near remote sensing. This summer, the two teams predominantly focused on honing methods already established by other scientists to study the changing climate, as well as widening the subset of biomes and localities studied.

Near Remote Sensing to Track Changes in Phenology in Forests, Team Harvard 
By Dmitri Ilushin, Sascha Perry, and Hannah Skolnik

Team Harvard is composed of Dmitri Ilushin of Harvard University, Sascha Perry of Lincoln University in Missouri, and Hannah Skolnik of Columbia University. We are under the direction of the Richardson Lab within the Department of Organismic & Evolutionary Biology at Harvard University. Phenology is the study of life cycles in various organisms. Specifically, we've been looking at tree phenology, tracking spring leaf-out (when the trees start growing leaves after lying dormant all winter) and autumn senescence (when the leaves begin to change colors and drop from the trees). This summer, we have had the opportunity to tackle the question "Can webcam imagery track phenological changes?" Believe it or not, it can! By looking at over 1800 different webcams with about 60 million images from all over the world, we were able to assess the cameras' stability and views of vegetation. With this information, we've been able to track the Green Chromatic Coordinate (GCC), a measure of the proportion of green in an area of an image. The GCC graphs yield quantitative information for tracking the changes of the seasons.
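For readers curious about the math, GCC is conventionally computed as the green digital number divided by the sum of the red, green, and blue digital numbers. A tiny sketch, with made-up pixel values:

```python
# GCC = G / (R + G + B), computed from the red, green, and blue digital
# numbers of a pixel (or from ROI averages). Values below are made up.
def gcc(r, g, b):
    return g / (r + g + b)

print(round(gcc(60, 120, 40), 3))   # a leafy summer pixel: green dominates
print(round(gcc(120, 90, 60), 3))   # a brown autumn pixel: lower GCC
```

Because GCC is a ratio, it is fairly robust to overall brightness changes between images, which is what makes it useful for multi-year webcam archives.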

Below is a map of all the sites that are on the AMOS archive.

Great sites are green, good sites are yellow, and unusable sites are red.

In order to parse such a large data set, we had to come up with a protocol. First, we conducted a preliminary visual inspection of the 1879 sites with known locations. We looked through a subset of each site's images and noted FOV (field of view) shifts, when the camera moved noticeably. Based on the number of FOV shifts, we marked the sites as stable (no movement), constant (consistently changing), or poor (too many shifts to use the site). For the stable locations, we looked through each photo and found the exact times when any FOV shifts occurred. We then made ROIs (Regions of Interest), or selections of certain pixels of the image, that covered trees, shrubs, grass, or crops. Finally, we ran a program that takes all of this information, calculates the GCC from the ROIs we made, and makes a time series (GCC graph) for each year. We decided to focus on creating GCC graphs for a vertical gradient along the Atlantic coast of the United States as well as two horizontal gradients in the US, one in the north and one in the south.

A sample vegetated site: 

The ROIs made for this field of view: 

The resulting GCC graph calculated for one year: 

Studying phenology has been a blast. We've been able to study the immediate effects of recent climate phenomena. By calculating Green Chromatic Coordinate (GCC) curves and studying them across the United States, we have been able to see how warming patterns have been affecting growing seasons across the country. What we have been doing is on a smaller scale than satellite imagery, so we can compare patterns even between species. However, by using the Archive of Many Outdoor Scenes (AMOS), we have huge spatial coverage, which allows us to compare how a certain species of tree is growing on the East Coast versus the West Coast, for example. Thus, webcam imagery represents a happy medium of scale: we can study the small scale and compare it to almost all other biomes and to the same biome in different parts of the world.

We got to see some incredible places, including the North Pole, Kenya, and Japan, in addition to the continental United States. Here is a representative photo from Kenya of a water buffalo at a watering hole, one of our more interesting sites:

MODIS Satellite Imagery as Applied to Phenological Assessment, Team BU 
By Erin Frick and Jose Luis Rugelio 

Observations of vegetation phenology can be collected not only from ground-level field studies but also from spaceborne remote sensing instruments. In particular, satellite images may be used to assess vegetative phenophase transition dates, such as spring onset, maximum vegetation cover, and senescence, across regional scales. One approach to such assessment entails analysis of data from the MODIS (Moderate Resolution Imaging Spectroradiometer) instrument. MODIS provides measurements of light reflectance that can be analyzed to estimate phenophase transition dates with respect to variation in land cover type.

Despite the utility of the MODIS data, atmospheric factors, including cloud cover and the presence of aerosols, as well as land cover features such as snow and ice, can distort the data received by satellites, rendering this information unreliable. Differences between the outputs of the various MODIS products also result from factors including the timing of the satellites' orbits, the intervals between image collection, and the way in which the reflectance values are processed. These anomalies result in each product yielding a varying quantity of valid data. A goal in remote sensing research is determining which MODIS product contains the greatest amount of high-quality information to be used to accurately calculate phenophase transition dates.

In order to assess the differences between, and overall utility of, these MODIS products, we have created a variety of plots and maps which address: NDVI (normalized difference vegetation index), EVI (enhanced vegetation index), date of spring onset, land cover variation, differing levels of quality control, comparisons across time series within and between years, and the quantity and location of "good" data, all of which illuminate the unique strengths and weaknesses of the MODIS products.
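NDVI and EVI are computed from band reflectances with standard formulas: NDVI contrasts the near-infrared and red bands, while EVI adds a blue-band aerosol correction and a canopy background term. A small sketch (the reflectance values are invented for illustration):

```python
# Standard vegetation-index formulas used with surface reflectance on a
# 0-1 scale. The coefficients in evi() are the standard MODIS EVI
# constants; the sample reflectances below are made up.
def ndvi(nir, red):
    return (nir - red) / (nir + red)

def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)

nir, red, blue = 0.45, 0.08, 0.04   # a vegetated pixel in summer
print(round(ndvi(nir, red), 3))
print(round(evi(nir, red, blue), 3))
```

Tracking either index through the year produces the seasonal curve from which phenophase transition dates, like spring onset, are estimated.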

In addition, Team BU has been analyzing data from satellite imagery and the resulting picture tiles. Pixel values can tell us the date of green onset and yearly trends for entire tiles, corridors, and even single pixels. We opened MODIS 12Q2 tiles in ArcMap, which is a great tool for working with these images. We then created six buffer zones around the city limits of the metropolitan area of Boston, Massachusetts (1-3km, 2-2km, 3-3km, 4-2km, 5-5km, 6-5km). MATLAB was then used to produce graphs for 10 years (2001-2010), which we compared and analyzed to see what trends were present. We found that the standard deviation declines as you leave the city and enter more rural, stable forests, and this is apparent in all the buffer zones: the forest is more stable and so deviates less. In contrast, as you get closer to Boston, the deviation is higher due to increased temperature. Because of the urban heat island effect, Boston greens up sooner and loses its leaves later than vegetation in rural areas. By looking at Boston, we can relate what could happen with a temperature increase of a few degrees Fahrenheit at a small scale to the global scale. Ultimately, we can use this data to better understand global warming and inform the population.
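The buffer-zone comparison boils down to computing the spread of green-onset dates within each ring around the city. A toy sketch of that computation, with all day-of-year values invented:

```python
# Sketch of the buffer-zone comparison: standard deviation of green-onset
# day-of-year for pixels in each ring around a city. Zone names and
# values are hypothetical, chosen only to mirror the trend described.
import statistics

onset_by_buffer = {
    "urban (inner)": [95, 102, 88, 110, 99],
    "suburban":      [104, 108, 101, 106, 103],
    "rural (outer)": [112, 113, 111, 112, 114],
}
for zone, days in onset_by_buffer.items():
    print(zone, round(statistics.stdev(days), 1))
```

In this toy example the standard deviation shrinks with distance from the city, the same pattern the buffer-zone analysis found around Boston.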

K-12 Phenology Lessons for the Phenocam Project 

By Katherine Bennett

In the fall of 2011, the Ashburnham-Westminster Regional School District became the first of five schools to join Dr. Andrew Richardson's Phenocam Network with the installation of a digital phenocam on the roof of Overlook Middle School in Ashburnham, Massachusetts. As part of the Phenocam project, students at the K-12 level have expanded the scope of phenological monitoring that is part of the Harvard Forest Schoolyard Ecology Program protocol, Buds, Leaves, and Global Warming. In this protocol, students work with Dr. John O'Keefe to monitor buds and leaves on schoolyard trees to determine the length of the growing season. Lessons are being developed for comparing student data on budburst, color change, and leaf drop to phenocam images in Ashburnham and other locations throughout North America, to GCC (Green Chromatic Coordinate) graphs extracted from the images, and to satellite data. Lessons addressing map scale and the urban heat island effect will also be available for teachers.

Wednesday, August 1, 2012

RESEARCH PROFILE: Part Three of Biotic Change in Hemlock Forests - Ants and Spiders

By Yvan Delgado de la Flor

Eastern hemlock is a foundation species in eastern North America and plays a critical role in the local biota. This tree deeply shades the soil, creating a unique microclimate for some species. Currently, hemlocks are dying rapidly due to the invasive hemlock woolly adelgid, a nonnative phloem-feeding insect, causing alterations to understory microclimates. Hemlocks are slowly being replaced by hardwood forests. All of these changes affect the entire ecosystem and result in the local extinction of some arthropods; for example, some ants and spiders are very sensitive to changes in temperature. In my study this summer, I measured the impact of hemlock loss on ant and spider communities in hemlock stands and contrasted their assemblages in hemlock and hardwood stands.

I hypothesized that the loss of eastern hemlock and the associated increase in forest-floor temperature would result in the extirpation of some ant and spider genera. The effect of the adelgid was mimicked with four canopy-manipulation treatments: hemlock (control), girdled hemlock, logged hemlock, and hardwood (control). Pitfall traps were sampled throughout the summer in all treatments; the ants and spiders collected were identified to genus. Initial results suggest few differences among the treatments, but the sample size remains small because most of the pitfall traps will not be collected until late July. Eastern hemlocks occupy large areas of late-successional forests in eastern North America, and the effects of their loss will be better observed in 20+ years, when hemlocks will be locally extinct, potentially leading to the extirpation of local species and the alteration of food webs and ecosystems.

This summer I am working with Aaron Ellison, Relena Ribbons, and Clarisse Hart. My job is to go to the forest approximately every two weeks to set up and collect pitfalls. Pitfalls are small plastic cups that are placed at ground level and filled with an inch of soapy water. I pick up the pitfalls two days after I have set them up and bring them to the microscope room in the Torrey lab to proceed with the identification. I then sort the specimens into three small tubes: ants, spiders, and beetles. Ants have to be identified to the species level, while spiders are identified to genus, and beetles are archived for future investigations.

RESEARCH PROFILE: Part Two of Biotic Change in Hemlock Forests - Rodents

By Elizabeth Kennett

3:40 am: my alarm goes off. I don my headlamp, throw on some field clothes, tuck my pants into my socks, and climb into my mentor Ally Degrassi's truck. We're going trapping.

The afternoon before, we had been out to the Ridge block, one of our two blocks. Each block consists of four hemlock forest treatments. The first is a plot that was logged five years ago and is now full of young vegetation; the second is a plot in which the hemlocks have been girdled, killing the trees but leaving them standing, to mimic the effect of the woolly adelgid, a parasite that is killing hemlocks. The third, known as the hemlock control plot, contains at least 70% hemlock among the trees present, and the final plot consists of various hardwoods, including some hemlocks. The plots were set up this way so that we may observe how the forest changes as the hemlocks are lost. Rodents and insectivores are very important in these changes because they have a large effect on seed dispersal.

So back to 4 am: Ally and I start in the logged plot, move on to the girdled plot, and so forth. We use live Sherman traps to capture our specimens. Once a specimen is captured, we usher it out of the trap and into a plastic bag, where we weigh it and mark it with a nontoxic Sharpie, and then release it. The next morning we do the same thing and see if we have any recaptures, which allows us to estimate the overall species populations in the plots.
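The recapture idea rests on the classic Lincoln-Petersen estimator: if marked animals mix back into the population, the fraction of marked animals in the second catch estimates the fraction of the whole population that was marked. A quick worked example, with invented counts:

```python
# Lincoln-Petersen mark-recapture estimate:
#   population ~ (marked on day 1 * caught on day 2) / recaptures
# The counts below are made up for illustration, not field data.
marked_day1 = 40        # animals marked and released on day 1
caught_day2 = 35        # animals caught on day 2
recaptured = 7          # day-2 animals already carrying a mark

estimate = marked_day1 * caught_day2 / recaptured
print(round(estimate))  # estimated animals in the plot
```

The estimator assumes a closed population and equal catchability of marked and unmarked animals, which is why trapping sessions are kept short and repeated.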

So far we have caught a variety of mammals, including the white-footed mouse, Gapper's red-backed vole, the northern short-tailed shrew, the eastern chipmunk, the smoky shrew, the southern flying squirrel, and the woodland jumping mouse.

We are determined to avoid biasing our data by looking for trends before our trapping session has been completed, but so far we have caught and released over 400 specimens.

RESEARCH PROFILE: Part One of Biotic Change in Hemlock Forests - Moose, Deer, and Porcupines

By Andrew Moe

This summer, along with my mentor Ed Faison, a research associate at Harvard Forest and ecologist at Highstead Arboretum in Connecticut, I have been working on a project investigating the impacts of herbivory by moose, deer, and porcupine on regenerating forests. 

More specifically, we are interested in looking at regeneration within stands of eastern hemlock (Tsuga canadensis). Here in New England, hemlock forests are under attack. The hemlock woolly adelgid (Adelges tsugae), an exotic insect already responsible for widespread mortality of hemlock throughout the eastern U.S., has arrived in Massachusetts and portends the demise of the tree here as well. In order to better understand the implications of this loss, Harvard Forest researchers have designed an experiment examining the effects of this disturbance on forest ecosystem dynamics. Two hemlock removal treatments, a logged treatment designed to simulate salvage logging and a girdled treatment simulating mortality caused by the adelgid, are coupled with two control treatments: a hemlock control (now infested with the adelgid) and a hardwood control, which represents the future condition of most hemlock stands. With two replicates of each of the four treatments, there are eight research plots in the experiment.

In order to assess the impacts of browsing by moose and deer, 15x30m exclosures were erected at each of the eight plots in the fall of 2011. At each plot, a grid of 4m² subplots has been established. In each subplot, I identify and measure all of the stems >30cm, assess the percentage of browse on each stem, and record any pellet groups belonging to moose, deer, or porcupine.

Glamorous work it is not. There are thousands of stems to record, and it would rate high on a monotony scale. Companionship in the field consists of dozens of ticks and many more deer flies. Luckily, it suits me just fine. As the data begin to accumulate, patterns begin to surface. Sometimes expected results are met, other times not. More importantly, it is science in action, and for me, it has been a hugely rewarding experience to participate in. Working alongside all of the other REUs and hearing about all of the nifty, fascinating work they are doing only compounds the enjoyment of the experience. Highly recommended!