On the Balanced Minimum Evolution polytope
Introduction
Consider a set of vertices, hereafter referred to as taxa. A phylogeny of is an Unrooted Binary Tree (UBT) having as leaf-set. By definition, a phylogeny of has internal vertices having degree 3 and an overall number of edges, of which are external, i.e., incident to a taxon, and are internal, i.e., incident to two internal vertices. The labeling of the internal vertices of a phylogeny is usually less important than the labeling of the leaves, hence it is generally omitted. However, whenever the context requires us to specify a particular vertex under study, we will use the convention to label it as an integer in . Fig. 1(a) shows a possible phylogeny of a set of seven taxa, shaped as a caterpillar. Fig. 1(b) shows a phylogeny with a more complex topology together with a terminology of its components that we will formally define in the next section. In Fig. 1(a) the internal vertices are labeled as indicated above. On the other hand, in Fig. 1(b), the labeling of the vertices of this phylogeny is omitted to better highlight the terminology.
Let be the set of the possible distinct phylogenies of . The cardinality of is known to be a function of the cardinality of and can be easily shown to be equal to [1]. Given a phylogeny of and a taxon , let be the set and let be the topological distance between taxa , , i.e., the number of edges belonging to the (unique) path from taxon to taxon in . For example, by referring to the phylogeny shown in Fig. 1(a), , , and .
Let be a given symmetric distance matrix, whose generic entry represents a measure of dissimilarity between the corresponding pair of taxa . Then, the Balanced Minimum Evolution Problem (BMEP) is to find a phylogeny of that minimizes the following length function [2], [3]: The BMEP is a version of the network design problem [4] defined over UBTs. It was introduced in the literature on phylogenetics in 2000 by Pauplin [5] and systematically investigated from a biological perspective in Desper and Gascuel [6] and Gascuel and Steel [7]. The problem is -hard and inapproximable within , for some constant , unless [8]. This fact has justified the development of a number of implicit enumeration algorithms to exactly solve the problem [3], [9], [10] as well as a number of heuristics to approximate its optimal solution [10], [11], [12].
The current state-of-the-art exact solution algorithm for the BMEP, described in Catanzaro et al. [3], is based on an Integer Linear Programming (ILP) model that exploits some combinatorial connections between the BMEP and the Huffman Coding Problem [13], [14], [15]. Such connections proved fundamental to identify a number of properties that characterize phylogenies, such as the Kraft equalities or the Third Equality (see Section 2 and [3]), and had a central role in the study of tight lower bounds on the optimal solution to the problem. The algorithm is however unable to solve instances of the BMEP containing more than two dozen taxa within one hour computing time. Thus, in the attempt to develop algorithms able to exactly solve increasingly large instances of the problem, particular attention has been recently devoted to the study of its polyhedral combinatorics. The earliest attempts were the works Eickmeyer et al. [16] and Haws et al. [17]. Eickmeyer et al. [16] introduced the convex hull of the BMEP (hereafter also referred to as the BMEP polytope) and characterized its vertices. Haws et al. [17] characterized instead some faces of the polytope in the space of the topological distances. Neither studies led to results directly applicable in implicit enumeration approaches. Moreover, due to the nonlinearity of some specific equations that characterize phylogenies (see Section 2), it is difficult to carry out the analysis of the polytope in the chosen space. Hence, Forcey et al. [18] investigated alternative spaces to perform such analysis and proposed the use of Polymake [19] to assist in this task. Their empirical approach proved successful in unveiling connections between the BMEP polytope and the permutoassociahedron [20], [21], [22], [23] as well as in characterizing the BMEP polytope in low and fixed dimensions. In particular, they succeeded in characterizing all of its facets for five taxa and some of the facets for six taxa. Unfortunately, the very large number of facets to analyze in the latter case (over 90 000, see Section 6.2) prevented the complete description of the polytope.
In this article, we extend the results of Eickmeyer et al. [16] and Forcey et al. [21], [18], by presenting both alternative and novel proofs concerning the polyhedral combinatorics of the BMEP as well as algorithms that may inspire the development of new exact solution approaches to the problem based on implicit enumeration. Some alternative proofs (e.g., the characterization of the dimension of the BMEP polytope) are interesting per se because they enable the analysis of aspects not trivially deducible from previous studies [16], [18], [21]. We also characterize new facets of the BMEP polytope for six or more taxa as well as valid inequalities. We address the problem of consistently generating the vertices of the BMEP polytope as well as the problem of recognizing its vertices. We present a set of necessary and sufficient conditions to address the vertex generation problem and we show how to translate such conditions into a set of nonlinear constraints in order to develop new exact solution approaches to the problem similar to those already proposed in [3] and [10]. We will show that these conditions also suggest a possible polynomial-time oracle to solve the recognition problem as well. The article is organized as follows. In Section 2 we introduce some notation, definitions, and fundamental properties of the topological distances that will prove useful throughout the article. In Section 3, we review Forcey et al. [18] space. In Section 4, we show a possible basis for the space of Forcey et al. [18]. In Sections 5 On the convex hull of the set, 6 Valid inequalities and facets of we discuss some fundamental characteristics of the BMEP polytope and we extend the analysis of Eickmeyer et al. [16] and Forcey et al. [18]. Finally, in Section 7, we address the problem of recognizing and consistently generating the vertices of the BMEP polytope.
Section snippets
Nomenclature of phylogenetics and properties of the topological distances
In this section, we introduce some specific notation and nomenclature of phylogenetics that will prove useful throughout the article. We will give particular emphasis to the concept and the properties of a “path-length sequence” of a phylogeny because it has a central role in the combinatorics of the BMEP.
By referring to the phylogeny shown in Fig. 1(b), we call
- •
a leaf adjacent vertex any internal vertex of a phylogeny that is adjacent to a leaf;
- •
a separator vertex any internal vertex of a
The space
Given a set of taxa, a phylogeny of , and a topological distance between two not necessarily distinct taxa in , we define Note that by definition is a positive integer such that if , and 0 otherwise. Hence, by definition, is a positive integer such that if , and otherwise. Given the path-length matrix associated to , we denote as the symmetric matrix whose generic entry is and as the set of the matrices
A basis for
Given a set of taxa, we define the fundamental caterpillar phylogeny of as the caterpillar phylogeny such that
- •
the first taxon is assigned to the first of the four leaves of the caterpillar that have a sibling;
- •
the th taxon is assigned to the only leaf of the caterpillar at topological distance , for ;
- •
the last two taxa are assigned to the only two leaves at topological distance from the first taxon. Fig. 4 shows an example of .
We denote as the caterpillar
On the convex hull of the set
A question that naturally arises is whether it is possible to characterize some properties of the vertices of Conv(). To address this question, we first observe that the bijection (12) implies that for any phylogeny and any pair . Then, we can rewrite (9) as As (15) implies for all we have that In the light of this result, the following proposition
Known facets of
Forcey et al. [18] provided a complete description of the BMEP polytope for (i.e., when assuming an input set including up to five taxa) and characterized a number of facet-defining inequalities of for (i.e., when assuming an input set including six taxa). For the sake of completeness, in this subsection we briefly report and comment upon their results. In the next subsection, we will present a new set of valid and facet-defining inequalities that will contribute to the
Characterizing and recognizing elements of
In this section, we investigate two fundamental issues concerning the set , namely the problem of consistently generating its elements and the problem of deciding whether a given matrix is an element of (hence, a vertex of ). We present a set of necessary and sufficient conditions to accomplish the first task and we show how to translate such conditions into a set of nonlinear constraints. Moreover, we present a polynomial-time oracle to decide whether a given symmetric matrix
Acknowledgments
The authors are very grateful to the reviewers for their careful and meticulous reading of the article as well as their valuable comments. The authors are also in debt with Dr. Roberto Ronco for helpful insights about the structure of the matrix (11). The first author acknowledges support from the Belgian National Fund for Scientific Research (FRS-FNRS) via the research grant “Crédit de Recherche” ref. S/25-MCF/OL J.0026.17; the Université Catholique de Louvain, Belgium via the “Fonds Spéciaux
References (29)
- et al.
Approximating the balanced minimum evolution problem
Oper. Res. Lett.
(2012) - et al.
Optimal solutions for the balanced minimum evolution problem
Comput. Oper. Res.
(2011) - et al.
Geometry of the space of phylogenetic trees
Adv. Appl. Math.
(2001) The permutoassociahedron, Mac Lane’s coherence theorem and asymptotic zones for the KZ equation
J. Pure Appl. Algebra
(1993)- et al.
Additive evolutionary trees
J. Theoret. Biol.
(1977) Inferring Phylogenies
(2004)Estimating phylogenies from molecular data
- et al.
The balanced minimum evolution problem
INFORMS J. Comput.
(2012) - et al.
The complexity of the network design problem
Networks
(1978) Direct calculation of a tree length using a distance matrix
J. Mol. Evol.
(2000)
Theoretical foundations of the balanced minimum evolution method of phylogenetic inference and its relationship to the weighted least-squares tree fitting
Mol. Biol. Evol.
Neighbor-joining revealed
Mol. Biol. Evol.
Algorithms on Phylogenetic Trees
The minimum evolution problem: Overview and classification
Networks
Cited by (5)
A massively parallel branch-&-bound algorithm for the balanced minimum evolution problem
2023, Computers and Operations ResearchA tutorial on the balanced minimum evolution problem
2022, European Journal of Operational ResearchCitation Excerpt :The next sections will review and classify the outcomes of these efforts, by starting from the combinatorics of the BMEP. The compact statement of the BMEP hides deep optimization aspects that relate its combinatorics to those of well-known problems, such as the Traveling Salesman Problem (TSP) (Lawler, Lenstra, Kan, & Shmoys, 1985) and the Huffman Coding Problem (HCP) (Catanzaro et al., 2015; Catanzaro et al., 2013a; Catanzaro et al., 2012; Catanzaro et al., 2020b; Parker & Ram, 1996). In this section we review these aspects, by starting from Semple & Steel’s interpretation of the BMEP length function (1) and by subsequently discussing the complexity aspects of the problem as well as the recent advances on its polyhedral combinatorics.
An information theory perspective on the balanced minimum evolution problem
2020, Operations Research LettersCitation Excerpt :The BMEP was introduced in the literature on molecular phylogenetics by Desper and Gascuel [7], based on an estimation model proposed by Pauplin [21] 20 years ago. Subsequently, the problem was the subject of extensive research efforts focused on the characterization of its statistical consistency [8,13,14], the study of its combinatorics [4–6,11,12,15,23], and the development of implicit enumeration algorithms [1,4,19] and heuristics [10,13,19] to tackle and solve its instances. In the next example we show how the recursion (24) works.
Composite analysis of web pages in adaptive environment through Modified Salp Swarm algorithm to rank the web pages
2022, Journal of Ambient Intelligence and Humanized ComputingOn the approximability of the fixed-tree balanced minimum evolution problem
2021, Optimization Letters