Elsevier

Ecological Modelling

Volume 222, Issue 8, 24 April 2011, Pages 1471-1478
Application of a Random Forest algorithm to predict spatial distribution of the potential yield of Ruditapes philippinarum in the Venice lagoon, Italy

https://doi.org/10.1016/j.ecolmodel.2011.02.007

Abstract

We present a modelling framework that combines machine learning techniques and Geographic Information Systems to support the management of an important aquaculture species, the Manila clam (Ruditapes philippinarum). We use the Venice lagoon (Italy), the first site in Europe for the production of R. philippinarum, to illustrate the potential of this modelling approach. To investigate the relationship between the yield of R. philippinarum and a set of environmental factors, we used a Random Forest (RF) algorithm. The RF model was tuned with a large data set (n = 1698) and validated with an independent data set (n = 841). Overall, the model provided good predictions of site-specific yields, and the analysis of the marginal effects of predictors showed substantial agreement between the modelled responses and available ecological knowledge for R. philippinarum. The most influential environmental factors for yield estimation were percentage of sand in the sediment, salinity, and water depth. Our results agree with findings from other North Adriatic lagoons. The application of the fitted RF model to continuous maps of all the environmental variables allowed estimates of the potential yield for the whole basin. Such a spatial representation enabled site-specific estimates of yield in different farming areas within the lagoon. We present a possible management application of our model by estimating the potential yield under the current farming distribution and comparing it to a proposed re-organization of the farming areas. Our analysis suggests that a reduction of total yield is likely to result from the proposed re-organization.

Research highlights

► We model the yield of R. philippinarum with a Random Forest (RF) algorithm.
► The RF model provides good predictions of site-specific yields.
► The most influential environmental factors are sediment type, salinity and water depth.
► The RF model may be integrated with mechanistic models to help organize farming.

Introduction

The Manila clam Ruditapes philippinarum (Adams and Reeve, 1850), which is of Indo-Pacific origin, was introduced into the Venice lagoon (Fig. 1) in the 1980s as a culture species (Cesari and Pellizzato, 1985) and radically changed the exploitation of living resources in the lagoon. Within a few years, R. philippinarum became the most important exploited species in the lagoon, with production peaking at over 40,000 t y⁻¹ at the end of the 1990s, as estimated by Pellizzato and Da Ros (2005) from various sources and expert knowledge. No official fishery landings data for the whole lagoon are available, and the yield potential is largely unknown, despite the considerable social, economic and environmental consequences of the exploitation activities.

Since its introduction, R. philippinarum has been exploited under a regime of free access. In 1999, the Province of Venice began a gradual shift to a concession regime, i.e., to a system where harvesting areas are divided by the regulatory agency among a number of concessions, each managed by local clam fishermen under a strict set of rules on access limitation and exploitation effort. Technically, concessions are divided into farming areas (where clams are seeded) and fishing areas (where clams are naturally recruited). In the following, we use the word concession without further differentiating between farming and fishing areas. In 2007, about 42 km² of the Venice lagoon were given in concession to fishermen for harvesting of R. philippinarum (Fig. 2a and Table 1).

However, the transition from uncontrolled fishing to a “culture-based fishery” based on correct and sustainable rearing procedures, while successful in reducing production of R. philippinarum, has proved more complex than expected and cannot be considered successfully completed (Pellizzato and Da Ros, 2005). The Province of Venice aims to reduce the number of fishermen operating in the lagoon (from about 900 to 600) and to remodel and reduce the areas given in concession to clam fishermen (G.R.A.L., 2006, G.R.A.L., 2009, Province of Venice, 2009) in order to: reduce health risks linked to industrial pollutants or urban waste; minimize the environmental impacts of bottom dredging, such as the loss of sediments (e.g., Molinaroli et al., 2007), increased water turbidity and the movement of nutrients and pollutants (Pranovi et al., 2004, Sfriso et al., 2005); protect habitats of conservation concern, such as seagrass meadows (Fig. 2c); and maximize production in order to minimize fishing effort, both in space and time.

In this context, identifying suitable harvestable grounds and reliably estimating site-specific commercial yield potential are necessary to guarantee a sustainable fishery, to improve the economic efficiency of clam farming, to ensure an equitable share of exploitable areas among competing parties interested in the exploitation of R. philippinarum, and to foster transparency in the decision-making process for planning future exploitation activities.

Habitat suitability (HS) models, or models predicting species distribution (the two terms are used interchangeably here), are useful tools for supporting decision-making in applied biology. HS models have often been used to improve our understanding of species–habitat relationships in space and time and to predict the likelihood of occurrence and the abundance of a species from habitat attributes affecting its survival, growth and reproduction (e.g., Guisan and Thuiller, 2005, Hirzel et al., 2006, Santos et al., 2006). Habitat suitability approaches have also been used to identify appropriate sites for mollusk farming in North America and Mexico (e.g., Kapetsky et al., 1988, Aguilar-Manjarrez and Ross, 1995). Vincenzi et al. (2006a, 2006b, 2007) developed simple HS models for estimating the yield potential of R. philippinarum in the Sacca di Goro lagoon (North Adriatic, Italy) using semi-empirical and zero-inflated regression models. In recent years, machine learning methods such as classification and regression trees (Džeroski and Drumm, 2003, Seoane et al., 2005), artificial neural networks (ANN; Pearson et al., 2002, Dedecker et al., 2004) and Random Forests (Benito Garzón et al., 2007, Benito Garzón et al., 2008) have been proposed for the development of spatial distribution models. Machine learning methods can detect complex relationships among model variables without a priori assumptions about the type of relationship, such as a linear dependence on predictors, and can process complex and noisy data (Recknagel, 2001).

In this work, we used a Random Forest algorithm (Breiman, 2001) to explore the relationship between the yield potential of R. philippinarum and several environmental factors deemed important for the occurrence and abundance of the species. Several studies have shown that Random Forest (RF) models, based on an automatic combination of tree predictors, often reach top predictive performance compared to other methodologies (e.g., Prasad et al., 2006, Cutler et al., 2007). Our paper is organized as follows: after a brief description of the study area, of the environmental factors linked to the occurrence and abundance of R. philippinarum and of the available data, we briefly illustrate the main features of the Random Forest model and proceed with the calibration and validation of the model using two independent data sets for the year 2007. Then, we apply the Random Forest model to the Venice lagoon to obtain estimates of potential yield inside and outside the areas given in concession in 2007. In addition, we predict the yield potential of R. philippinarum in areas that, according to the remodelling plan proposed by local authorities, will be given in concession in 2013. Finally, we discuss the relevant features, limitations and further development of the Random Forest approach.

Materials and methods

The resolution (i.e., operational scale) chosen for the study was 100 × 100 m cells (sites), for a total of 45,443 cells.
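As a quick check of what this grid implies, the short sketch below converts cell counts to area and back. The cell size and cell count come from the text, and the 42 km² figure is the 2007 concession area quoted in the Introduction; the constant and function names are illustrative only.

```python
# Grid geometry stated in the text: 100 m x 100 m cells, 45,443 cells in total.
CELL_SIDE_M = 100
N_CELLS = 45_443

# Area of one cell in km^2 (1 km^2 = 1,000,000 m^2).
CELL_AREA_KM2 = CELL_SIDE_M * CELL_SIDE_M / 1_000_000


def cells_to_km2(n_cells: int) -> float:
    """Area covered by n_cells grid cells, in km^2."""
    return n_cells * CELL_AREA_KM2


def km2_to_cells(area_km2: float) -> int:
    """Approximate number of grid cells needed to cover area_km2."""
    return round(area_km2 / CELL_AREA_KM2)


total_area_km2 = cells_to_km2(N_CELLS)  # area covered by the whole grid
concession_cells = km2_to_cells(42)     # about 42 km^2 in concession in 2007
```

Under these figures the grid covers roughly 454 km², and the 2007 concessions correspond to about 4,200 cells, i.e. a little under a tenth of the grid.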

Results

A strong correlation was found between chlorophyll a and turbidity (Fig. 3).

The out-of-bag estimates of the error rate ($\mathrm{ERR}_{\mathrm{OOB}}$) were used to select the optimum Random Forest parameters (mtry = 3, ntree = 700, nodesize = 5). For the calibration data set (CD), the Random Forest explained a large proportion of the variance in the yield of R. philippinarum ($r^2_{\mathrm{CD}} = 0.99$). The out-of-bag predictions gave $r^2_{\mathrm{OOB}} = 0.93$.
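The gap between the calibration fit and the out-of-bag fit can be illustrated with a small sketch. The code below is not the authors' model: it is a minimal, standard-library illustration of out-of-bag estimation for a bagged ensemble, assuming one-split regression stumps on a single synthetic predictor instead of full trees, so mtry and nodesize have no counterpart here. The data, the function names and the use of 1 − SSE/SST as r² are all assumptions made for illustration.

```python
import random
from statistics import fmean


def fit_stump(xs, ys):
    """Least-squares regression stump: one threshold, one mean per side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep the right side non-empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = fmean(left), fmean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # degenerate sample: every x identical
        overall = fmean(ys)
        return xs[0], overall, overall
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def r_squared(observed, predicted):
    """Assumed definition: 1 - SSE/SST."""
    obs_mean = fmean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def bagged_oob(xs, ys, n_trees, rng):
    """Return (r2 using all stumps, r2 using only out-of-bag stumps)."""
    n = len(xs)
    all_preds = [[] for _ in range(n)]
    oob_preds = [[] for _ in range(n)]
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        stump = fit_stump([xs[i] for i in sample], [ys[i] for i in sample])
        in_bag = set(sample)
        for i in range(n):
            pred = predict_stump(stump, xs[i])
            all_preds[i].append(pred)
            if i not in in_bag:  # this stump never saw observation i
                oob_preds[i].append(pred)
    # Score only observations that were out of bag at least once.
    scored = [i for i in range(n) if oob_preds[i]]
    r2_oob = r_squared([ys[i] for i in scored],
                       [fmean(oob_preds[i]) for i in scored])
    r2_fit = r_squared(ys, [fmean(p) for p in all_preds])
    return r2_fit, r2_oob


# Synthetic data (hypothetical): a step response plus Gaussian noise.
rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [(10.0 if x > 0.5 else 2.0) + rng.gauss(0.0, 1.0) for x in xs]
r2_fit, r2_oob = bagged_oob(xs, ys, n_trees=50, rng=rng)
```

Each observation is scored only by the stumps whose bootstrap sample left it out, so `r2_oob` behaves like a validation statistic, whereas `r2_fit` reuses data the ensemble was trained on.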

Fig. 4 shows the ranking of predictors by their importance.
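This excerpt does not say which importance measure underlies the ranking. One measure commonly used with Random Forests is permutation importance: shuffle one predictor at a time and record how much the prediction error increases. The sketch below illustrates that idea only, under stated assumptions: the two-predictor data set and `toy_model` are hypothetical stand-ins, not the fitted RF model or the lagoon data.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model predictions over all rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, rng):
    """Increase in MSE after shuffling each predictor column in turn."""
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between predictor j and the target
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mse(model, permuted, targets) - baseline)
    return importances


def toy_model(row):
    """Hypothetical fitted model: depends on predictor 0 only."""
    return 3.0 * row[0]


# Hypothetical data: the target is driven by predictor 0; predictor 1 is noise.
rng = random.Random(1)
rows = [(rng.random(), rng.random()) for _ in range(300)]
targets = [3.0 * a for a, _ in rows]

importances = permutation_importance(toy_model, rows, targets, rng)
# Predictor indices ordered from most to least important.
ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
```

Shuffling the predictor the model relies on raises the error, while shuffling the ignored predictor leaves it unchanged, which is the behaviour a ranking of this kind summarizes.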

Discussion

We showed that a Random Forest model provides an effective methodology for identifying suitable sites and quantifying site-specific yields for the exploitation of an aquaculture species. Random Forests, for both classification and regression, have already been used in several applied contexts and were recently applied to predict plant and animal habitat suitability (e.g., Iverson et al., 2005, Lawler et al., 2006, Lawler et al., 2009, Benito Garzón et al., 2007, Benito Garzón et

Acknowledgment

The authors thank the Venice Water Authority (Magistrato alle Acque di Venezia) for providing water quality and sediment data. Simone Vincenzi concluded this work while visiting the Center for Stock Assessment Research (CSTAR), a partnership between the Fisheries Ecology Division, Southwest Fisheries Science Center, NOAA Fisheries and the University of California Santa Cruz, supported by a research grant provided by “Fondazione Luigi e Francesca Brusarosco”.

References (50)

  • F. Recknagel. Applications of machine learning to ecological modelling. Ecol. Mod. (2001)
  • X. Santos et al. Inferring habitat-suitability areas with ecological modelling techniques and GIS: a contribution to assess the conservation status of Vipera latastei. Biol. Conserv. (2006)
  • J. Seoane et al. Species-specific traits associated to prediction errors in bird habitat suitability modelling. Ecol. Mod. (2005)
  • C. Solidoro et al. Ecological and economic considerations on fishing and rearing of Tapes phillipinarum in the lagoon of Venice. Ecol. Mod. (2003)
  • C. Solidoro et al. A partition of the Venice Lagoon based on physical properties and analysis of general circulation. J. Mar. Syst. (2004)
  • C.M. Spillman et al. A spatially resolved model of seasonal variations in phytoplankton and clam (Tapes philippinarum) biomass in Barbamarco Lagoon, Italy. Estuar. Coast. Shelf Sci. (2008)
  • G. Umgiesser et al. A finite element model for the Venice Lagoon. Development, set up, calibration and validation. J. Mar. Syst. (2004)
  • S. Vincenzi et al. Estimating clam yield potential in the Sacca di Goro lagoon (Italy) by using a two-part conditional model. Aquaculture (2006)
  • S. Vincenzi et al. A GIS-based habitat suitability model for commercial yield estimation of Tapes philippinarum in a Mediterranean coastal lagoon (Sacca di Goro, Italy). Ecol. Mod. (2006)
  • S. Vincenzi et al. A comparative analysis of three habitat suitability models for commercial yield estimation of Tapes philippinarum in a North Adriatic coastal lagoon (Sacca di Goro, Italy). Mar. Pollut. Bull. (2007)
  • J. Aguilar-Manjarrez et al. Geographical information system (GIS) environmental models for aquaculture development in Sinaloa state, Mexico. Aquacult. Int. (1995)
  • Barillari, A., Boldrin, A., Pellizzato, M., Turchetto, M., 1990. Condizioni Ambientali Nell’Allevamento Di Tapes...
  • M. Benito Garzón et al. Predictive modelling of tree species distributions on the Iberian Peninsula during the Last Glacial Maximum and Mid-Holocene. Ecography (2007)
  • M. Benito Garzón et al. Effects of climate change on the distribution of Iberian tree species. Appl. Veget. Sci. (2008)
  • L. Breiman. Random forests. Mach. Learn. (2001)