Evolutionary Bayesian Network design for high dimensional experiments
Introduction
Designing experiments and modelling data in current scientific research increasingly contend with the problem of high dimensionality. Natural systems in particular are described by a very large number of variables whose dynamical interaction leads to the emergence of a particular behaviour of the system. Understanding the organisation of such systems and identifying the relationships among variables generally involve complex laboratory experimentation. Strategies to plan experiments have to address the difficulty of deriving efficient designs and optimisation procedures that test a small set of experimental points, each of which may involve a great investment of resources and sometimes negative environmental effects. Modelling high-dimensional systems also confronts the difficulties of both building nonlinear dependence relations and estimating a number of parameters that increases rapidly with the dimension of the system. These difficulties suggest the importance of alternatives to standard experimental designs and linear regression models.
As a result of these considerations, many authors have developed strategies to achieve efficient designs by reducing the effective number of parameters through the assumption of sparsity (only a few variables are relevant in determining the dynamics of the system). Georgiou et al. [1], [2], Marley et al. [3], and Sun et al. [4], among others, contributed to the construction and evaluation of the large class of supersaturated designs (SSDs), which have been derived as a class of fractional factorial designs for systems with a large number of variables and few experimental points. Building on the pioneering work of Booth and Cox [5], there have been significant developments initially for problems with two-level variables and later for multi-level and mixed-level variables [2]. Based on the classical assumption of the linear main effects model with Gaussian experimental errors, the construction of SSDs is realised through different algorithms and computer search strategies to find optimal designs with respect to a measure built on the information matrix of the model [3].
Another class of designs addressing high dimensionality comprises Exchange Algorithms and their developments [6], [7], [8], [9]. These designs are based on the exchange between the selected design points and points in a candidate list. The exchange is performed with the aim of identifying the design that optimises a chosen criterion; in these approaches the exchange is carried out at each iteration of the algorithm, generally to increase the determinant of the information matrix.
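As an illustration of the exchange idea described above, the following is a minimal sketch and not the procedure of any of the cited works. It assumes a two-level candidate list in three factors, a main-effects model with intercept, and a simple greedy rule that accepts a swap whenever the determinant of the information matrix X'X increases. The function names (`determinant`, `information_det`, `exchange`) are hypothetical.

```python
import itertools
import random


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def information_det(design):
    """det(X'X) for a main-effects model with intercept (an assumption)."""
    rows = [[1.0] + list(point) for point in design]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    return determinant(xtx)


def exchange(candidates, n_points, seed=0):
    """Greedy exchange: swap a design point for a candidate while det(X'X) grows."""
    rng = random.Random(seed)
    design = rng.sample(candidates, n_points)
    best = information_det(design)
    improved = True
    while improved:
        improved = False
        for i in range(n_points):
            for cand in candidates:
                trial = design[:i] + [cand] + design[i + 1:]
                value = information_det(trial)
                if value > best + 1e-9:
                    design, best = trial, value
                    improved = True
    return design, best


# Candidate list: the full 2^3 factorial; select 4 runs out of 8.
candidates = list(itertools.product([-1.0, 1.0], repeat=3))
design, det_value = exchange(candidates, n_points=4)
```

A greedy exchange of this kind can stop at a local optimum, which is one reason practical implementations typically restart from several random initial designs.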
Optimal design procedures based on the information matrix, such as SSDs and Exchange Algorithms, are adequate for several problems, but they require prior knowledge of the form of the response function (a linear model), which is frequently not available. Model-robust experimental designs based on Exchange Algorithms have recently been proposed [9] to respond to this issue. These designs do not assume a single model form but allow for a set of user-specified models, and the design is derived by maximising the product of the determinants of the information matrices associated with each of the suggested models.
A different way to address the problem of designing experimentation for high dimensional systems is to use computer experiments [10], [11]. In computer experiments, rather than conducting laboratory experimentation or making field observations, simulators based on mathematical models are constructed to study how the model behaves under relevant variables and conditions. Physical processes can be simulated and the simulation code can serve as an efficient means of exploring the properties of the process [12], [13], [14], [15]. This approach is becoming popular and is applied in different research fields; however, high dimensionality makes it hard to use, since it requires complex models that may be prohibitively expensive to simulate. For the design of computer simulations, Latin hypercube sampling has been proposed [16], [17]. These designs generally provide uniform samples for the marginal distribution of the variables and have achieved successful results in a number of research studies. Recent contributions, based on these developments, have proposed to design high dimensional experiments by adopting search procedures built on evolutionary approaches, such as genetic algorithms and particle swarm optimisation techniques [18], [19], [20], [21], [22].
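The marginal uniformity of Latin hypercube sampling mentioned above can be made concrete with a short sketch. This is a basic random Latin hypercube on the unit cube, written under the assumption of continuous variables scaled to [0, 1); it is not the orthogonal or nearly orthogonal constructions of [16], [17], and the function name `latin_hypercube` is hypothetical.

```python
import random


def latin_hypercube(n_points, n_dims, seed=0):
    """Return n_points samples in [0, 1)^n_dims.

    Each dimension is split into n_points equal strata, and every
    stratum is used exactly once per dimension.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_points)]


sample = latin_hypercube(n_points=5, n_dims=3)
```

Projecting `sample` onto any single coordinate gives exactly one point in each of the five intervals of width 0.2, which is the sense in which the marginal distributions are sampled uniformly.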
In this paper, we propose a design strategy that is developed according to the evolutionary approach. Our strategy does not involve an a priori choice of the experimental design; instead, it evolves the design through a number of experimental generations, moving through different areas of the search space. Each generation of planned experiments is obtained by combining evolutionary principles and inference from statistical models. In this procedure we combine the ability of the evolutionary approach to intelligently navigate the search space with the capacity of statistical models to uncover hidden information. The common practice of choosing the experimental points a priori may in fact be inappropriate for high dimensional problems, because it can lead to a premature and possibly misleading selection of the experimental points. More specifically, our approach is based on the evolution of probabilistic graphical models (PGMs) [23], [24]. We focus on a particular class of PGMs, namely Bayesian network models, where nodes in the graph correspond to random variables and arcs between nodes describe the dependence structure that may characterise the set of variables on which we then develop statistical inference.
We study this Evolutionary Bayesian Network design (EBN-design) both in a simulation analysis and laboratory biochemical experimentation concerning the self-organisation of amphiphilic molecules. In both our studies, EBN-design exhibits excellent performance in optimising the response of the systems, in comparison to other common experimental approaches. The paper is organised as follows. Section 2 introduces the design for optimisation and presents the Evolutionary Bayesian Network approach to design high dimensional experiments. Section 3 presents a simulation study to test the efficiency of EBN-design compared with common alternative design of experiment approaches. Section 4 describes a biochemical study concerning the self-organising process of amphiphilic molecules, and Section 5 presents some concluding remarks.
Section snippets
Design for optimisation
An optimisation problem can be described in its general structure as follows.
Let S be a subset of the Euclidean space and f be a real-valued function on S. Let x = (x1,…,xd) be the set of variables defined in S, and y be the response variable, y = f(x1,…,xd). The optimisation problem consists in finding x∗ in S such that f(x∗) ≥ f(x) for all x ∈ S (in maximisation problems). The inferential problem concerns the form of f. Experimentation provides the data to construct a function that can
Simulation study
We evaluate the performance of EBN-design by developing a simulation study based on the Rosenbrock function [50], frequently used to compare optimisation procedures [51], [52]. This function has the following form:
Algorithm 1 Evolutionary Bayesian Network design
Require: Observational data set
Ensure: The optimal experimental points via EBN
1: initialise nq, Ngen, threshold
2: D1 ← First random population of n1 experimental points
3: evaluate y1 ∀ D1
4: EBN-design ← {[D1, y1]}
5: for q in 2: Ngen do
6: rank
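The visible steps of Algorithm 1 can be sketched as follows, with several loudly stated assumptions. The Rosenbrock function is given in its commonly cited d-dimensional form, which may differ from the variant used in the study. The response is the negated function, so that the problem is a maximisation as in the Design for optimisation section. The step that generates each new population is a plain stand-in (Gaussian perturbation of the top-ranked points) and is not the Bayesian network step of EBN-design, which is not shown in the excerpt. All names and parameter values (`evolve`, `keep`, `step`, the range [-2, 2]) are hypothetical.

```python
import random


def rosenbrock(x):
    """Commonly cited d-dimensional Rosenbrock function; minimum 0 at (1, ..., 1)."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


def response(x):
    # Assumption: negate so that larger values are better (maximisation).
    return -rosenbrock(x)


def evolve(n_points=20, n_gen=10, dim=4, keep=5, step=0.3, seed=0):
    rng = random.Random(seed)
    # Steps 2-3: first random population D1 and its responses y1.
    population = [
        tuple(rng.uniform(-2.0, 2.0) for _ in range(dim)) for _ in range(n_points)
    ]
    # Step 4: store all tested points together with their responses.
    history = [(x, response(x)) for x in population]
    # Step 5: generations 2, ..., n_gen.
    for _ in range(2, n_gen + 1):
        # Step 6: rank the tested points by response.
        ranked = sorted(history, key=lambda pair: pair[1], reverse=True)
        parents = [x for x, _ in ranked[:keep]]
        # Stand-in for the model-guided step (NOT the Bayesian network):
        # perturb randomly chosen top-ranked points.
        population = [
            tuple(v + rng.gauss(0.0, step) for v in rng.choice(parents))
            for _ in range(n_points)
        ]
        history.extend((x, response(x)) for x in population)
    return history


history = evolve()
best_x, best_y = max(history, key=lambda pair: pair[1])
```

With these settings the run tests 200 points in total; `best_y` is the largest response found and is at most 0, the value attained at (1, ..., 1).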
Emergence of vesicles by EBN-design
Self-organising processes of amphiphilic molecules under particular conditions may lead to the emergence of new biological entities with specific functionalities that can be extremely relevant for medical and pharmaceutical research [59], [60], [61]. Experimentation in this field generally involves a huge number of parameters, generating an extremely large search space where the target optimum value is then very hard to find. Moreover this experimentation is generally expensive, time consuming
Concluding remarks
In this research we addressed the problem of designing experiments and modelling data from high dimensional systems with the objective of maximising a particular emergent property. Towards this aim, we introduced an evolutionary approach to design the experimental space where the evolution is guided by a Bayesian network model. The model was able to identify the most informative dependence relations among the system variables and their interactions, and achieve the target optimality region.
Conflict of interest
There is no conflict of interest.
Acknowledgements
This work was supported by the EU integrated project Programmable Artificial Cell Evolution (PACE) in FP6-IST-FET-002035, Complex Systems Initiative, and by the Fondazione di Venezia, DICE project. The authors would like to acknowledge the European Centre for Living Technology (www.ecltech.org) for providing opportunities for presentation and fruitful discussion of the research. Thanks to the colleagues of the University of Venice and of ProtoLife who gave valuable suggestions for this work.
References (67)
- et al. On multi-level supersaturated designs. J. Stat. Plan. Infer. (2006)
- et al. A comparison of design and model selection methods for supersaturated experiments. Comput. Stat. Data Anal. (2010)
- et al. A review of some exchange algorithms for constructing discrete D-optimal designs. Comput. Stat. Data Anal. (1992)
- et al. The use of a modified Fedorov exchange algorithm to optimise sampling times for population pharmacokinetic experiments. Comput. Methods Prog. Biomed. (2005)
- et al. Screening and metamodeling of computer experiments with functional outputs. Application to thermalhydraulic computations. Reliab. Eng. Syst. Saf. (2012)
- et al. A study of combined roughness and plasticity induced fatigue crack closure for long cracks using a modified strip-yield model. Int. J. Fatigue (2012)
- et al. Evolutionary experiments for self-assembling amphiphilic systems. Chemom. Intell. Lab. Syst. (2008)
- et al. Efficient discovery and optimization of complex high-throughput experiments. Catal. Today (2011)
- et al. Optimised design of energy efficient building facades via evolutionary neural networks. Energy Build. (2011)
- Propagating uncertainty in Bayesian networks by probabilistic logic sampling.
- A new discrete filled function algorithm for discrete global optimization. J. Comput. Appl. Math.
- A critical review of discrete filled function methods in solving nonlinear discrete optimization problems. Appl. Math. Comput.
- A model-independent particle swarm optimisation software for model calibration. Environ. Model Softw.
- Machine learning optimization of evolvable artificial cells.
- Identifying significant edges in graphical models of molecular networks. Artif. Intell. Med.
- Supersaturated designs: a review of their construction and analysis. J. Stat. Plan. Infer.
- On construction of optimal mixed-level supersaturated designs. Ann. Stat.
- Some systematic supersaturated designs. Technometrics
- Theory of Optimal Experiments.
- Exchange algorithms for constructing model-robust experimental designs. J. Qual. Technol.
- Computer experiments: a review. Adv. Stat. Anal.
- Design and analysis of computer experiments. Adv. Stat. Anal.
- Statistics of rogue waves in computer experiments. JETP Lett.
- How to choose the simulation model for computer experiments: a local approach. Appl. Stoch. Model. Bus. Ind.
- Construction of orthogonal Latin hypercube designs. Biometrika
- Construction of nearly orthogonal Latin hypercube designs. Metrika
- Experimental Design for Combinatorial and High Throughput Material Developments.
- A survey of optimization by building and using probabilistic models. Comput. Optim. Appl.
- Evolutionary Statistical Procedures.
- Particle swarm optimization. Swarm Intell.
- Combining probabilistic dependency models and particle swarm optimization for parameter inference in stochastic biological systems.
- Probabilistic Networks and Expert Systems.
- Graphical Models: Methods for Data Analysis and Mining.