
Neurocomputing

Volume 461, 21 October 2021, Pages 281-291

Investigating the performance of multi-objective optimization when learning Bayesian Networks

https://doi.org/10.1016/j.neucom.2021.07.049

Abstract

Bayesian Networks have been widely used over the last decades in many fields to describe statistical dependencies among random variables. In general, learning the structure of such models is a problem of considerable theoretical interest that poses many challenges. On the one hand, it is a well-known NP-complete problem, made harder in practice by the huge search space of possible solutions. On the other hand, the phenomenon of I-equivalence, i.e., different graphical structures underpinning the same set of statistical dependencies, may lead to multimodal fitness landscapes that further hinder maximum likelihood approaches. Despite all these difficulties, greedy search methods based on a likelihood score coupled with a regularization term to account for model complexity have been shown to be surprisingly effective in practice. In this paper, we formulate the task of learning the structure of Bayesian Networks as an optimization problem based on a likelihood score, without a complexity term to regularize it. In particular, we exploit the NSGA-II multi-objective optimization procedure to explicitly account for both the likelihood of a solution and the number of selected arcs, by setting these as the two objective functions of the method. The aim of this work is to investigate the behavior of NSGA-II and analyze the quality of its solutions. We therefore thoroughly examined the optimization results obtained on a wide set of simulated data, considering both the goodness of the inferred solutions in terms of the objective function values achieved and the similarity of the retrieved structures to the ground truth, i.e., the networks used to generate the target data. Our results show that NSGA-II can converge to solutions with better likelihood and fewer arcs than classic approaches, although, paradoxically, these solutions are in many cases less similar to the target network.

Introduction

Bayesian Networks (BNs) [24] are widely used to provide a succinct description of the statistical dependencies among random variables. Their applications are manifold, ranging from diagnostics, discovery of gene regulatory networks, and genetic programming to many other tasks [21], [8], [5], [28], [18], [7], [26], [39], [17], [1]. In any case, learning the structure of a BN is a non-trivial task that still poses many challenges, both from a theoretical and a practical standpoint. The state-of-the-art approaches to tackle this problem are mainly of two kinds. Constraint-based techniques formalize the BN as a set of relations of conditional dependency among random variables, which are in turn learned from data; these methods typically provide a causal interpretation of the underlying structure [30]. Score-based techniques do not attempt to give any causal interpretation to the network, but reformulate the task as an optimization problem with a fitness function usually based on the likelihood adjusted with a complexity term [24], [16]. Regardless of the approach employed to learn a BN, this is a well-known NP-complete problem, due to the huge search space of possible solutions, which effectively prevents any exhaustive search for BNs with more than a few nodes [34], [10].

In addition, this problem is made harder by the phenomenon of I-equivalence, i.e., the occurrence of different graphical structures that underpin the same set of statistical dependencies. I-equivalence may lead to multimodal fitness landscapes, further complicating maximum likelihood approaches. As a consequence, any method for structure learning of BNs may converge to a set of equivalent optimal networks that, albeit structurally different, subsume the same induced distribution over the variables [24].
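
I-equivalence can be seen already in the two-node case: the structures X → Y and Y → X are I-equivalent, since at the maximum-likelihood parameter estimates both factorizations reproduce the empirical joint distribution, so the data cannot distinguish them. A minimal self-contained check (the toy dataset and function names below are illustrative, not the authors' code):

```python
import math
from collections import Counter

# Toy dataset over two binary variables (X, Y).
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

def loglik_x_to_y(data):
    """Maximum-likelihood log-likelihood under the structure X -> Y:
    sum over samples of log P(x) + log P(y | x), with MLE (count) parameters."""
    n = len(data)
    cx = Counter(x for x, _ in data)
    cxy = Counter(data)
    return sum(math.log(cx[x] / n) + math.log(cxy[(x, y)] / cx[x])
               for x, y in data)

def loglik_y_to_x(data):
    """Same score under the reversed structure Y -> X."""
    n = len(data)
    cy = Counter(y for _, y in data)
    cxy = Counter(data)
    return sum(math.log(cy[y] / n) + math.log(cxy[(x, y)] / cy[y])
               for x, y in data)

# Both factorizations collapse to sum of log P(x, y), so the scores coincide.
print(abs(loglik_x_to_y(data) - loglik_y_to_x(data)) < 1e-9)  # True
```

A likelihood-driven search therefore sees both orientations as equally good optima, which is exactly the multimodality discussed above.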

Despite all the difficulties mentioned above, greedy search methods based on a likelihood score coupled with a regularization term to account for model complexity have been shown to be surprisingly effective in practice [16], [4], [32], [38]. In this work, we consider a specific methodology that can be placed within the score-based approaches, in which the task of learning the structure of a BN is formalized as an optimization problem. In particular, we do not make use of a score composed of both a likelihood term and a regularization term to penalize complexity; instead, we directly account for the complexity of the solutions by means of a multi-objective optimization technique. Such a formulation allows us to explicitly take into account the trade-off between likelihood and model complexity. Specifically, we adopt a multi-objective optimization algorithm, the Non-dominated Sorting Genetic Algorithm (NSGA-II) [14], in which we define two competing objective functions: the likelihood of a solution (to be maximized) and the number of selected arcs (to be minimized). The aim of this work is to investigate the behavior of NSGA-II and, in particular, to analyze the quality of the solutions inferred by the method when solving the problem of learning the structure of a BN, formulated as a multi-objective optimization problem. In this regard, we thoroughly examined the results obtained with NSGA-II on a wide set of simulated data, considering both the goodness of the inferred solutions in terms of the objective function values achieved and the similarity of the retrieved structures to the ground truth, namely the models that were used to simulate the target data. We observed that NSGA-II can converge to solutions characterized by better likelihood and fewer arcs than classic Hill Climbing, although in many cases characterized by a lower similarity with the target network.
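
The two competing objectives and the Pareto-dominance relation at the core of NSGA-II can be sketched as follows (a minimal illustration with hypothetical function names and toy values, not the authors' implementation):

```python
def objectives(log_likelihood, n_arcs):
    """Map a candidate network to the two NSGA-II objectives.
    Both are minimized by convention here, so the log-likelihood
    (to be maximized) is negated."""
    return (-log_likelihood, n_arcs)

def dominates(a, b):
    """True if solution a Pareto-dominates solution b: a is no worse
    in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Toy candidates, each scored as (log-likelihood, number of arcs).
sparse_good = objectives(-120.0, 4)   # high likelihood, few arcs
dense_good  = objectives(-118.0, 9)   # slightly higher likelihood, more arcs
sparse_bad  = objectives(-150.0, 4)   # same arcs, clearly worse likelihood

print(dominates(sparse_good, sparse_bad))  # True: strictly better in one objective
print(dominates(sparse_good, dense_good))  # False: a genuine trade-off
```

Non-dominated candidates such as `sparse_good` and `dense_good` end up together on the Pareto front, which is how this formulation exposes the likelihood/complexity trade-off without an explicit regularization term.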

Section snippets

Materials and methods

In this section, we first provide a formal definition of BNs and describe the problem of learning their structure; we then describe the score-based and the population-based approaches, in both the single- and multi-objective optimization settings.

Results

In this work, we represent the candidate BNs as adjacency matrices encoded by means of fixed-length binary strings, as described in [33]. Thus, the search space explored by the evolutionary algorithms consists of all configurations of bits representing valid DAGs. The NSGA-II settings used for the tests reported in this section are listed in Table 1. In all the tests that follow, we used the following Python libraries: DEAP [15], NumPy [20], NetworkX [19]. The likelihood calculations
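
The bit-string encoding of candidate networks can be sketched as follows (a simplified illustration under the assumption of a row-wise n×n adjacency-matrix layout; the exact encoding is the one described in [33], and the function names here are hypothetical):

```python
def bits_to_adjacency(bits, n):
    """Decode a fixed-length binary string into an n x n adjacency matrix;
    a 1 at row i, column j encodes an arc i -> j."""
    assert len(bits) == n * n
    return [[int(bits[i * n + j]) for j in range(n)] for i in range(n)]

def is_dag(adj):
    """Check acyclicity with Kahn's algorithm: repeatedly remove nodes of
    in-degree zero; the graph is a DAG iff every node gets removed."""
    n = len(adj)
    indeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    queue = [j for j in range(n) if indeg[j] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for v in range(n):
            if adj[u][v]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return removed == n

# Three nodes: "001000000" encodes the single arc 0 -> 2 (a valid DAG),
# while "010100000" encodes the cycle 0 -> 1 -> 0 (invalid).
print(is_dag(bits_to_adjacency("001000000", 3)))  # True
print(is_dag(bits_to_adjacency("010100000", 3)))  # False
```

In practice the same validity check can be delegated to NetworkX (e.g. its directed-acyclic-graph test) once the bit string has been decoded into a graph.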

Conclusion

In this paper, we extensively investigated and analyzed the performance of NSGA-II, a multi-objective optimization algorithm, when solving the problem of learning the structure of Bayesian Networks. The ostensible advantage of using NSGA-II to simultaneously optimize the likelihood and the number of arcs is that it does not require any heuristic to regularize the likelihood score. To the best of our knowledge, such a problem formulation of identifying a set of optimal solutions for the BN

CRediT authorship contribution statement

Marco S. Nobile: Conceptualization, Methodology, Software. Paolo Cazzaniga: Conceptualization, Methodology, Software. Daniele Ramazzotti: Conceptualization, Methodology, Software.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


References (41)

  • G. Caravagna et al.

    Algorithmic methods to infer the evolutionary trajectories in cancer progression

    Proceedings of the National Academy of Sciences

    (2016)
  • G. Caravagna et al.

    Learning the structure of Bayesian networks via the bootstrap

    Neurocomputing

    (2021)
  • D.M. Chickering

    Learning Bayesian networks is NP-complete

    Learning from Data: Artificial Intelligence and Statistics V

    (1996)
  • C.C. Coello, M.S. Lechuga, MOPSO: A proposal for multiple objective particle swarm optimization, in: Proceedings of the...
  • C. De Stefano et al.

    A novel mutation operator for the evolutionary learning of Bayesian networks

  • K. Deb et al.

    An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, Part I: Solving problems with box constraints

    IEEE Transactions on Evolutionary Computation

    (2013)
  • K. Deb et al.

    A fast and elitist multiobjective genetic algorithm: NSGA-II

    IEEE Transactions on Evolutionary Computation

    (2002)
  • F.-A. Fortin et al.

    DEAP: Evolutionary algorithms made easy

    Journal of Machine Learning Research

    (2012)
  • J.A. Gámez et al.

    Learning Bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood

    Data Mining and Knowledge Discovery

    (2011)
  • R. Gendelman et al.

    Bayesian network inference modeling identifies TRIB1 as a novel regulator of cell-cycle progression and survival in cancer cells

    Cancer Research

    (2017)

Marco S. Nobile received his PhD in Computer Science from the University of Milano-Bicocca, Italy. He is currently Assistant Professor in Computational Intelligence at the Eindhoven University of Technology, The Netherlands. He is a member of the B4 - Bicocca Bioinformatics, Biostatistics and Bioimaging research center, of the IEEE Bioinformatics and Bioengineering Technical Committee, and of the IEEE CIS Task Force on Advanced Representation in Biological and Medical Search and Optimization. His research interests include Evolutionary Computation, Swarm Intelligence, Interpretable AI, and high performance computing.

Paolo Cazzaniga is Assistant Professor at the University of Bergamo, Italy. He is also a member of the B4 - Bicocca Bioinformatics, Biostatistics and Bioimaging research center. His research interests mainly focus on Computational Systems Biology, Machine Learning and GPU computing.

Daniele Ramazzotti received his Ph.D. in Computer Science from the University of Milano-Bicocca, Milan, Italy, in February 2016. Since then, he has held research positions at several research centers and universities, including Stanford University, where he spent four years. He is currently a Postdoctoral Research Fellow at the School of Medicine and Surgery of the University of Milano-Bicocca. His current research interests involve Bioinformatics, with a specific focus on Cancer Evolution, Statistics, Bayesian Learning, Machine Learning, Algorithmics, and Theories of Causality.
