Investigating the performance of multi-objective optimization when learning Bayesian Networks
Introduction
Bayesian Networks (BNs) [24] are widely used to provide a succinct description of the statistical dependencies among random variables. Their applications are manifold, ranging from diagnostics and the discovery of gene regulatory networks to genetic programming and many other tasks [21], [8], [5], [28], [18], [7], [26], [39], [17], [1]. In any case, learning the structure of a BN is a non-trivial task that still poses many challenges, from both a theoretical and a practical standpoint. The state-of-the-art approaches to tackle this problem are mainly of two kinds. Constraint-based techniques formalize the BN as a set of relations of conditional dependency among random variables, which are in turn learned from data; these methods typically provide a causal interpretation of the underlying structure [30]. Score-based techniques do not attempt to give any causal interpretation to the network, but instead reformulate the task as an optimization problem whose fitness function is usually based on the likelihood, adjusted with a complexity term [24], [16]. Regardless of the approach employed, learning a BN is a well-known NP-complete problem: the huge search space of possible solutions effectively prevents any exhaustive search for BNs with more than a few nodes [34], [10].
In addition, the problem is compounded by the phenomenon of I-equivalence, i.e., the occurrence of different graphical structures that underpin the same set of statistical dependencies. I-equivalence may lead to multimodal fitness landscapes, further complicating maximum likelihood approaches. Because of this, any method for the structure learning of BNs may converge to a set of equivalent optimal networks that, albeit structurally different, subsume the same induced distribution over the variables [24].
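By the Verma-Pearl criterion, two DAGs are I-equivalent exactly when they share the same skeleton and the same v-structures. The check below is an illustrative sketch of that criterion (the function names are ours, not the paper's), using NetworkX to show that the two chains X → Y → Z and X ← Y ← Z are I-equivalent, while the collider X → Y ← Z is not:

```python
import networkx as nx

def i_equivalent(g1, g2):
    """Verma-Pearl criterion: two DAGs are I-equivalent iff they have
    the same skeleton and the same set of v-structures (immoralities)."""
    def skeleton(g):
        # Undirected edge set.
        return set(frozenset(e) for e in g.edges())

    def v_structures(g):
        vs = set()
        for node in g.nodes():
            parents = list(g.predecessors(node))
            for i in range(len(parents)):
                for j in range(i + 1, len(parents)):
                    a, b = parents[i], parents[j]
                    # Non-adjacent parents form a v-structure a -> node <- b.
                    if not g.has_edge(a, b) and not g.has_edge(b, a):
                        vs.add((frozenset((a, b)), node))
        return vs

    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain    = nx.DiGraph([("X", "Y"), ("Y", "Z")])  # X -> Y -> Z
reversed_chain = nx.DiGraph([("Z", "Y"), ("Y", "X")])  # X <- Y <- Z
collider = nx.DiGraph([("X", "Y"), ("Z", "Y")])  # X -> Y <- Z
```

Here `i_equivalent(chain, reversed_chain)` holds while `i_equivalent(chain, collider)` does not, which is precisely why a likelihood-based search cannot distinguish the first two structures from data alone.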
Despite all the difficulties mentioned above, greedy search methods based on a likelihood score coupled with a regularizer term to account for model complexity have proven surprisingly effective in practice [16], [4], [32], [38]. In this work, we consider a specific score-based methodology in which the task of learning the structure of a BN is formalized as an optimization problem. In particular, we do not use a score composed of a likelihood term and a regularizer to penalize complexity; instead, we account for the complexity of the solutions directly, by means of a multi-objective optimization technique. Such a formulation allows us to explicitly take into account the trade-off between likelihood and model complexity. Specifically, we adopt a multi-objective optimization algorithm, the Non-dominated Sorting Genetic Algorithm (NSGA-II) [14], in which we define two competing objective functions: the likelihood of a solution (to be maximized) and the number of selected arcs (to be minimized). The aim of this work is to investigate the behaviour of NSGA-II and, in particular, to analyze the quality of the solutions it infers when the problem of learning the structure of a BN is formulated as a multi-objective optimization problem. To this end, we thoroughly examined the results obtained with NSGA-II on a wide set of simulated data, both by considering the goodness of the inferred solutions in terms of the objective function values achieved and by comparing the retrieved structures with the ground truth, namely the models used to simulate the target data. We observed that NSGA-II can converge to solutions characterized by better likelihood and fewer arcs than classic Hill Climbing, although in many cases with a lower similarity to the target network.
Section snippets
Materials and methods
In this section, we first report a formal definition of BNs and a description of the problem of learning their structure; then we describe the score-based and the population-based approaches, in both the single- and the multi-objective optimization case.
Results
In this work, we represent the candidate BNs as adjacency matrices encoded by means of binary strings with fixed length, as described in [33]. Thus, the search space explored by the evolutionary algorithms consists of all possible configurations of bits representing valid DAGs. The NSGA-II settings used for the tests reported in this section are listed in Table 1. In all tests that follow we used the following Python libraries: DEAP [15], NumPy [20], NetworkX [19]. The likelihood calculations
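The encoding described above can be sketched as follows: a fixed-length bitstring of length n·(n−1) is read as the off-diagonal entries of an n × n adjacency matrix, and NetworkX is used to reject cyclic candidates. This is an illustrative reconstruction under our own assumptions (helper names and the row-major bit ordering are ours, not taken from [33]):

```python
import numpy as np
import networkx as nx

def decode(bits, n):
    """Map a bitstring of length n*(n-1) to the off-diagonal entries
    of an n x n adjacency matrix, in row-major order (assumed here)."""
    adj = np.zeros((n, n), dtype=int)
    it = iter(bits)
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal (self-loops) is never encoded
                adj[i, j] = next(it)
    return adj

def is_valid_dag(adj):
    """A candidate individual is kept only if its graph is acyclic."""
    g = nx.from_numpy_array(adj, create_using=nx.DiGraph)
    return nx.is_directed_acyclic_graph(g)
```

For n = 3, the bitstring `[1, 0, 0, 1, 0, 0]` decodes to the arcs 0 → 1 and 1 → 2 (a valid DAG), whereas adding the bit for arc 2 → 0 closes a cycle and the candidate is rejected.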
Conclusion
In this paper we extensively investigated and analysed the performance of NSGA-II, a multi-objective optimization algorithm, when solving the problem of learning the structure of Bayesian Networks. The ostensible advantage of using NSGA-II to simultaneously optimize the likelihood and the number of arcs is that it does not require any heuristic to regularize the likelihood score. To the best of our knowledge, such a problem formulation of identifying a set of optimal solutions for the BN
CRediT authorship contribution statement
Marco S. Nobile: Conceptualization, Methodology, Software. Paolo Cazzaniga: Conceptualization, Methodology, Software. Daniele Ramazzotti: Conceptualization, Methodology, Software.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (41)
- et al., Causal data science for financial stress testing, Journal of Computational Science (2018)
- et al., Efficient computational strategies to learn the structure of probabilistic graphical models of cumulative phenomena, Journal of Computational Science (2019)
- et al., Who learns better Bayesian network structures: Accuracy and speed of structure learning algorithms, International Journal of Approximate Reasoning (2019)
- et al., Applications of Bayesian network models in predicting types of hematological malignancies, Scientific Reports (2018)
- A new look at the statistical model identification, IEEE Transactions on Automatic Control (1974)
- T. Bäck, Selective pressure in evolutionary algorithms: A characterization of selection mechanisms, in: Evolutionary...
- S. Beretta, M. Castelli, I. Gonçalves, R. Henriques, D. Ramazzotti, Learning the structure of Bayesian networks: A...
- et al., Exposing the probabilistic causal structure of discrimination, International Journal of Data Science and Analytics (2017)
- et al., I-MOPSO: A suitable PSO algorithm for many-objective optimization
- et al., Analysis of prognostic factors for survival after surgery for gallbladder cancer based on a Bayesian network, Scientific Reports (2017)
- Algorithmic methods to infer the evolutionary trajectories in cancer progression, Proceedings of the National Academy of Sciences
- Learning the structure of Bayesian networks via the bootstrap, Neurocomputing
- Learning Bayesian networks is NP-complete, Learning from Data: Artificial Intelligence and Statistics V
- A novel mutation operator for the evolutionary learning of Bayesian networks
- An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints, IEEE Transactions on Evolutionary Computation
- A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation
- DEAP: Evolutionary algorithms made easy, Journal of Machine Learning Research
- Learning Bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood, Data Mining and Knowledge Discovery
- Bayesian network inference modeling identifies TRIB1 as a novel regulator of cell-cycle progression and survival in cancer cells, Cancer Research
Marco S. Nobile received his PhD in Computer Science from the University of Milano-Bicocca, Italy. He is currently Assistant Professor in Computational Intelligence at the Eindhoven University of Technology, The Netherlands. He is a member of the B4 - Bicocca Bioinformatics, Biostatistics and Bioimaging research center. Dr. Nobile is a member of the IEEE Bioinformatics and Bioengineering Technical Committee. IEEE CIS Task Force on advanced representation in biological and medical search and optimization. His research interests include Evolutionary Computation, Swarm Intelligence, Interpretable AI, and high performance computing.
Paolo Cazzaniga is Assistant Professor at the University of Bergamo, Italy. He is also a member of the B4 - Bicocca Bioinformatics, Biostatistics and Bioimaging research center. His research interests mainly focus on Computational Systems Biology, Machine Learning and GPU computing.
Daniele Ramazzotti received his Ph.D. in Computer Science at the University of Milano-Bicocca, Milano (Italy) in February 2016. Since then he has been Research Associate at several important Research Centers and Universities such as Stanford University where he spent the last 4 years. He is currently a Postdoctoral Research Fellow at the School of Medicine and Surgery at the University of Milano-Bicocca. His current research interests involve Bioinformatics with specific focus on Cancer Evolution, Statistics, Bayesian Learning, Machine Learning, Algorithmics and Theories of Causality.