Evolutionary Bayesian Network design for high dimensional experiments

https://doi.org/10.1016/j.chemolab.2014.04.013

Highlights

  • We present EBN-design, a novel approach to design high dimensional experiments.

  • EBN-design combines evolutionary computation and statistical models.

  • EBN-design discovers optimum values while testing a very limited number of experimental points.

  • We apply EBN-design to a biochemical study concerning the emergence of vesicles.

  • In all comparisons with other approaches, EBN-design performed best.

Abstract

Laboratory experimentation is increasingly concerned with systems whose dynamical behaviour can be affected by a very large number of variables. Objectives of experimentation on such systems are generally both the optimisation of some experimental responses and efficiency of experimentation in terms of low investment of resources and low impact on the environment. Design and modelling for high dimensional systems with these objectives present hard and challenging problems, to which much current research is devoted. In this paper, we introduce a novel approach based on the evolutionary principle and Bayesian network models. This approach can discover optimum values while testing just a very limited number of experimental points. The very good performance of the approach is shown in both a simulation analysis and a biochemical study concerning the emergence of new functional bio-entities.

Introduction

Designing experiments and modelling data in current scientific research increasingly contend with the problem of high-dimensionality. Natural systems in particular are described by a very large number of variables whose dynamical interaction leads to the emergence of a particular behaviour of the system. Understanding the organisation of such systems and identifying the relationships among variables generally involve complex laboratory experimentation. Strategies to plan experiments have to address the difficulty of deriving efficient designs and optimisation procedures that test only a small set of experimental points, since each point may involve a great investment of resources and sometimes negative environmental effects. Modelling high-dimensional systems also confronts the difficulties of both building nonlinear dependence relations and estimating a number of parameters that increases rapidly with the dimension of the system. These difficulties suggest the importance of alternatives to standard experimental designs and linear regression models.

As a result of these considerations, many authors have developed strategies to achieve efficient designs by reducing the effective number of parameters through the assumption of sparsity (only a few variables are relevant in determining the dynamics of the system). Georgiou et al. [1], [2], Marley et al. [3], and Sun et al. [4], among others, contributed to the construction and evaluation of the large class of supersaturated designs (SSDs), which have been derived as a class of fractional factorial designs for systems with a large number of variables and few experimental points. Building on the pioneering work of Booth and Cox [5], there have been significant developments initially for problems with two-level variables and later for multi-level and mixed-level variables [2]. Based on the classical assumption of the linear main effects model with Gaussian experimental errors, the construction of SSDs is realised through different algorithms and computer search strategies to find optimal designs with respect to a measure built on the information matrix of the model [3].

Another class of designs addressing high dimensionality comprises Exchange Algorithms and their developments [6], [7], [8], [9]. These designs are based on the exchange between the selected design points and points in a candidate list. The exchange is performed with the aim of identifying the design that satisfies a measure of optimality; in these approaches, the exchange at each iteration of the algorithm is generally made so as to increase the determinant of the information matrix.
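The exchange idea can be illustrated with a minimal sketch. It assumes a linear main-effects model with intercept and a two-level candidate list, and it uses a plain greedy point exchange rather than any specific algorithm from [6], [7], [8], [9]. The names `d_criterion`, `exchange` and `candidates` are hypothetical and chosen only for this illustration.

```python
import itertools
import numpy as np

def d_criterion(X):
    """Determinant of the information matrix X'X of a linear model with model matrix X."""
    return float(np.linalg.det(X.T @ X))

def exchange(candidates, n_points, rng, max_sweeps=50):
    """Greedy point exchange: replace a design point by a candidate whenever det(X'X) increases."""
    idx = [int(i) for i in rng.choice(len(candidates), size=n_points, replace=False)]
    best = d_criterion(candidates[idx])
    for _ in range(max_sweeps):
        improved = False
        for pos in range(n_points):              # each point currently in the design
            for cand in range(len(candidates)):  # each point in the candidate list
                trial = idx.copy()
                trial[pos] = cand
                value = d_criterion(candidates[trial])
                if value > best + 1e-9:          # accept only strict improvements
                    idx, best, improved = trial, value, True
        if not improved:                         # no exchange helps: local optimum reached
            break
    return idx, best

# Candidate list: all 2^3 combinations of three two-level factors, plus an intercept column.
levels = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
candidates = np.hstack([np.ones((len(levels), 1)), levels])
design_idx, det_value = exchange(candidates, n_points=4, rng=np.random.default_rng(0))
print(design_idx, det_value)
```

A greedy exchange of this kind can stop at a local optimum, which is one reason practical implementations restart from several random designs.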

Optimal design procedures based on the information matrix, such as SSDs and Exchange Algorithms, are adequate for several problems, but they require prior knowledge of the form of the response function (linear model), which frequently is not available. Model-robust experimental designs based on Exchange Algorithms have recently been proposed [9] to respond to this issue. These designs do not assume a single model form but allow for a set of user-specified models, and the design is derived by maximising the product of the determinants of the information matrices associated with each of the suggested models.

A different way to address the problem of designing experimentation for high dimensional systems is to use computer experiments [10], [11]. In computer experiments, rather than conducting laboratory experimentation or making field observations, simulators based on mathematical models are constructed to study how the model behaves under relevant variables and conditions. Physical processes can be simulated and the simulation code can serve as an efficient means of exploring the properties of the process [12], [13], [14], [15]. This is becoming a popular approach and it is applied in different research fields; however, high dimensionality makes this approach hard to use, since it requires complex models that may be prohibitively expensive to simulate. In designing computer simulations, Latin hypercube sampling has been proposed [16], [17]. These designs generally provide uniform samples for the marginal distribution of the variables and have achieved successful results in a set of research studies. Recent contributions, based on these developments, have proposed to design high dimensional experiments by adopting search procedures built on evolutionary approaches, such as genetic algorithms and particle swarm optimisation techniques [18], [19], [20], [21], [22].
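The marginal uniformity of Latin hypercube samples can be seen in a minimal sketch of the basic construction on the unit cube. This is the plain randomised version, not the orthogonal or nearly orthogonal constructions of [16], [17], and the function name `latin_hypercube` is hypothetical.

```python
import random

def latin_hypercube(n_points, n_dims, seed=None):
    """Basic Latin hypercube sample on [0, 1)^n_dims.

    For every variable, the range is cut into n_points equal strata and each
    stratum receives exactly one point, so each margin is evenly covered.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each stratum [k/n, (k+1)/n).
        column = [(k + rng.random()) / n_points for k in range(n_points)]
        # Shuffle so that strata are paired at random across variables.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-variable columns into a list of points.
    return [tuple(column[i] for column in columns) for i in range(n_points)]

points = latin_hypercube(n_points=5, n_dims=3, seed=1)
for p in points:
    print(p)
```

With five points, each coordinate takes exactly one value in each of the intervals [0, 0.2), [0.2, 0.4), and so on, whatever the number of variables.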

In this paper, we propose a design strategy that is developed according to the evolutionary approach. Our strategy does not involve an a priori choice of the experimental design but evolves the design through a number of experimental generations, moving through different areas of the search space. Each generation of planned experiments is achieved by combining evolutionary principles and inference from statistical models. In this procedure we combine the ability of the evolutionary approach to intelligently navigate the search space with the capacity of statistical models to uncover hidden information. The common practice of choosing the experimental points a priori may in fact be inappropriate for high dimensional problems, because it can lead to a premature and possibly misleading selection of the experimental points. More specifically, our approach is based on the evolution of probabilistic graphical models, PGMs [23], [24]. We focus on a particular class of PGMs, namely Bayesian network models, where nodes in the graph correspond to random variables and arcs between nodes describe the dependence structure that may characterise the set of variables on which we then develop statistical inference.

We study this Evolutionary Bayesian Network design (EBN-design) both in a simulation analysis and laboratory biochemical experimentation concerning the self-organisation of amphiphilic molecules. In both our studies, EBN-design exhibits excellent performance in optimising the response of the systems, in comparison to other common experimental approaches. The paper is organised as follows. Section 2 introduces the design for optimisation and presents the Evolutionary Bayesian Network approach to design high dimensional experiments. Section 3 presents a simulation study to test the efficiency of EBN-design compared with common alternative design of experiment approaches. Section 4 describes a biochemical study concerning the self-organising process of amphiphilic molecules, and Section 5 presents some concluding remarks.

Section snippets

Design for optimisation

An optimisation problem can be described in its general structure as follows.

Let $S$ be a subset of the Euclidean space $\mathbb{R}^d$ and $f$ be a real-valued function on $S$. Let $x = (x_1,\ldots,x_d)$ be the set of variables defined in $S$, and $y$ be the response variable, $y = f(x_1,\ldots,x_d)$. The optimisation problem consists in finding $x^* \in S$ such that $f(x^*) \geq f(x)$ for all $x \in S$ (in maximisation problems). The inferential problem concerns the form of $f$. Experimentation provides the data to construct a function $\hat{f}(x_1,\ldots,x_d)$ that can

Simulation study

We evaluate the performance of EBN-design by developing a simulation study based on the Rosenbrock function [50], frequently used to compare optimisation procedures [51], [52]. This function has the following form:

$$f(x) = \sum_{i=1}^{d-1} \left[ 100\,(x_i^2 - x_{i+1})^2 + (x_i - 1)^2 \right].$$
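As a small illustration, the Rosenbrock function can be evaluated as follows. The function name `rosenbrock` is chosen here for the example only; the checks rely on the well-known fact that the function is non-negative and equals zero at x = (1, …, 1).

```python
def rosenbrock(x):
    """Rosenbrock function for a point x = (x_1, ..., x_d) with d >= 2."""
    return sum(
        100.0 * (x[i] ** 2 - x[i + 1]) ** 2 + (x[i] - 1.0) ** 2
        for i in range(len(x) - 1)
    )

print(rosenbrock([1.0, 1.0, 1.0]))  # global minimum: every term vanishes
print(rosenbrock([0.0, 0.0, 0.0]))  # each of the d - 1 terms contributes (0 - 1)^2 = 1
```

The curved, narrow valley around the minimum is what makes the function a demanding benchmark for search procedures, especially as d grows.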

Algorithm 1

Evolutionary Bayesian Network design

Require: Observational data set

Ensure: The optimal experimental points via EBN

  1: initialise nq, Ngen, threshold
  2: D1 ← first random population of n1 experimental points
  3: evaluate y1 on D1
  4: EBN-design ← {[D1, y1]}
  5: for q in 2:Ngen do
  6:   rank
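The listing of Algorithm 1 is cut off after the ranking step, so the model-guided part of the procedure is not shown here. The following is only a generic sketch of a generation-by-generation design loop with the same outer structure (random first generation, evaluation, archive, ranking). Everything after the ranking is an assumption: truncation selection and Gaussian mutation stand in for the Bayesian-network-guided proposal and do not reproduce it. The names `evolve_design`, `elite_fraction` and `bounds` are hypothetical, and the sketch minimises the Rosenbrock function rather than maximising a response.

```python
import random

def rosenbrock(x):
    return sum(100.0 * (x[i] ** 2 - x[i + 1]) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))

def evolve_design(f, d, n_q=20, n_gen=10, elite_fraction=0.25,
                  bounds=(-2.0, 2.0), seed=0):
    """Generic evolutionary design loop (NOT the EBN-design model step)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # First generation: random experimental points, evaluated and archived.
    population = [[rng.uniform(lo, hi) for _ in range(d)] for _ in range(n_q)]
    archive = [(f(x), x) for x in population]
    for _ in range(2, n_gen + 1):
        # Rank every point tested so far (lower is better in this sketch).
        archive.sort(key=lambda pair: pair[0])
        n_elite = max(1, int(elite_fraction * len(archive)))
        elite = [x for _, x in archive[:n_elite]]
        # ASSUMPTION: Gaussian mutation of elite points replaces the
        # model-guided proposal of the next generation.
        population = [
            [min(hi, max(lo, xi + rng.gauss(0.0, 0.2))) for xi in rng.choice(elite)]
            for _ in range(n_q)
        ]
        archive.extend((f(x), x) for x in population)
    best_value, best_x = min(archive, key=lambda pair: pair[0])
    return best_value, best_x, archive

best_value, best_x, archive = evolve_design(rosenbrock, d=3)
print(len(archive), best_value, best_x)
```

The total number of tested points is n_q × n_gen, which is the quantity an experimenter would fix in advance as the experimental budget.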

Emergence of vesicles by EBN-design

Self-organising processes of amphiphilic molecules under particular conditions may lead to the emergence of new biological entities with specific functionalities that can be extremely relevant for medical and pharmaceutical research [59], [60], [61]. Experimentation in this field generally involves a huge number of parameters, generating an extremely large search space where the target optimum value is then very hard to find. Moreover this experimentation is generally expensive, time consuming

Concluding remarks

In this research we addressed the problem of designing experiments and modelling data from high dimensional systems with the objective of maximising a particular emergent property. Towards this aim, we introduced an evolutionary approach to design the experimental space where the evolution is guided by a Bayesian network model. The model was able to identify the most informative dependence relations among the system variables and their interactions, and achieve the target optimality region.

Conflict of interest

There is no conflict of interest.

Acknowledgements

This work was supported by the EU integrated project Programmable Artificial Cell Evolution (PACE) in FP6-IST-FET-002035, Complex Systems Initiative and by the Fondazione di Venezia, DICE project. The authors would like to acknowledge the European Centre for Living Technology (www.ecltech.org) for providing opportunities for presentation and fruitful discussion of the research. Thanks to the colleagues of the University of Venice and of ProtoLife who gave valuable suggestions for this work.

References (67)

  • Y. Yongjian et al. A new discrete filled function algorithm for discrete global optimization. J. Comput. Appl. Math. (2007)
  • S.F. Woon et al. A critical review of discrete filled function methods in solving nonlinear discrete optimization problems. Appl. Math. Comput. (2010)
  • M. Zambrano-Bigiarini et al. A model-independent particle swarm optimisation software for model calibration. Environ. Model Softw. (2013)
  • F. Caschera et al. Machine learning optimization of evolvable artificial cells.
  • M. Scutari et al. Identifying significant edges in graphical models of molecular networks. Artif. Intell. Med. (2013)
  • S.D. Georgiou. Supersaturated designs: a review of their construction and analysis. J. Stat. Plan. Infer. (2012)
  • F. Sun et al. On construction of optimal mixed-level supersaturated designs. Ann. Stat. (2011)
  • K.H.V. Booth et al. Some systematic supersaturated designs. Technometrics (1962)
  • V.V. Fedorov. Theory of Optimal Experiments. (1972)
  • B. Smucker et al. Exchange algorithms for constructing model-robust experimental designs. J. Qual. Technol. (2011)
  • S. Levy et al. Computer experiments: a review. Adv. Stat. Anal. (2010)
  • S. Kuhnt et al. Design and analysis of computer experiments. Adv. Stat. Anal. (2010)
  • V. Zakharov et al. Statistics of rogue waves in computer experiments. JETP Lett. (2012)
  • T. Mühlenstädt et al. How to choose the simulation model for computer experiments: a local approach. Appl. Stoch. Model. Bus. Ind. (2012)
  • F. Sun et al. Construction of orthogonal Latin hypercube designs. Biometrika (2009)
  • L. Gu et al. Construction of nearly orthogonal Latin hypercube designs. Metrika (2013)
  • J. Cawse. Experimental Design for Combinatorial and High Throughput Material Developments. (2003)
  • M. Pelikan et al. A survey of optimization by building and using probabilistic models. Comput. Optim. Appl. (2002)
  • R. Baragona et al. Evolutionary Statistical Procedures. (2011)
  • R. Poli et al. Particle swarm optimization. Swarm Intell. (2007)
  • M. Forlin et al. Combining probabilistic dependency models and particle swarm optimization for parameter inference in stochastic biological systems.
  • R. Cowell et al. Probabilistic Networks and Expert Systems. (1999)
  • C. Borgelt et al. Graphical Models: Methods for Data Analysis and Mining. (2002)