On the Balanced Minimum Evolution polytope

https://doi.org/10.1016/j.disopt.2020.100570Get rights and content

Abstract

Recent advances on the polyhedral combinatorics of the Balanced Minimum Evolution Problem (BMEP) enabled the characterization of a number of facets of its convex hull (also referred to as the BMEP polytope) as well as the discovery of connections between this polytope and the permutoassociahedron. In this article, we extend these studies, by presenting new results concerning some fundamental characteristics of the BMEP polytope, new facet-defining inequalities in the case of six or more taxa, a number of valid inequalities, and a polynomial time oracle to recognize its vertices. Our aim is to broaden understanding of the polyhedral combinatorics of the BMEP with a view to developing new and more effective exact solution algorithms.

Introduction

Consider a set Γ={1,2,,n} of n4 vertices, hereafter referred to as taxa. A phylogeny T of Γ is an Unrooted Binary Tree (UBT) having Γ as leaf-set. By definition, a phylogeny of Γ has (n2) internal vertices having degree 3 and an overall number of (2n3) edges, n of which are external, i.e., incident to a taxon, and (n3) are internal, i.e., incident to two internal vertices. The labeling of the internal vertices of a phylogeny is usually less important than the labeling of the leaves, hence it is generally omitted. However, whenever the context requires us to specify a particular vertex under study, we will use the convention to label it as an integer in {n+1,n+2,,2n2}. Fig. 1(a) shows a possible phylogeny of a set Γ of seven taxa, shaped as a caterpillar. Fig. 1(b) shows a phylogeny with a more complex topology together with a terminology of its components that we will formally define in the next section. In Fig. 1(a) the internal vertices are labeled as indicated above. On the other hand, in Fig. 1(b), the labeling of the vertices of this phylogeny is omitted to better highlight the terminology.

Let T be the set of the possible distinct phylogenies of Γ. The cardinality of T is known to be a function of the cardinality of Γ and can be easily shown to be equal to (2n5)!!=1×3×5×(2n5) [1]. Given a phylogeny T of Γ and a taxon iΓ, let Γi be the set Γ{i} and let τij be the topological distance between taxa i,jΓ, ij, i.e., the number of edges belonging to the (unique) path from taxon i to taxon j in T. For example, by referring to the phylogeny shown in Fig. 1(a), τ13=3, τ15=5, and τ35=4.

Let D be a given n×n symmetric distance matrix, whose generic entry dijR0+ represents a measure of dissimilarity between the corresponding pair of taxa i,jΓ. Then, the Balanced Minimum Evolution Problem (BMEP) is to find a phylogeny T of Γ that minimizes the following length function [2], [3]: L(T)=iΓjΓidij2τij.The BMEP is a version of the network design problem [4] defined over UBTs. It was introduced in the literature on phylogenetics in 2000 by Pauplin [5] and systematically investigated from a biological perspective in Desper and Gascuel [6] and Gascuel and Steel [7]. The problem is NP-hard and inapproximable within cn, for some constant c>1, unless P=NP [8]. This fact has justified the development of a number of implicit enumeration algorithms to exactly solve the problem [3], [9], [10] as well as a number of heuristics to approximate its optimal solution [10], [11], [12].

The current state-of-the-art exact solution algorithm for the BMEP, described in Catanzaro et al. [3], is based on an Integer Linear Programming (ILP) model that exploits some combinatorial connections between the BMEP and the Huffman Coding Problem [13], [14], [15]. Such connections proved fundamental to identify a number of properties that characterize phylogenies, such as the Kraft equalities or the Third Equality (see Section 2 and [3]), and had a central role in the study of tight lower bounds on the optimal solution to the problem. The algorithm is however unable to solve instances of the BMEP containing more than two dozen taxa within one hour computing time. Thus, in the attempt to develop algorithms able to exactly solve increasingly large instances of the problem, particular attention has been recently devoted to the study of its polyhedral combinatorics. The earliest attempts were the works Eickmeyer et al. [16] and Haws et al. [17]. Eickmeyer et al. [16] introduced the convex hull of the BMEP (hereafter also referred to as the BMEP polytope) and characterized its vertices. Haws et al. [17] characterized instead some faces of the polytope in the space of the topological distances. Neither studies led to results directly applicable in implicit enumeration approaches. Moreover, due to the nonlinearity of some specific equations that characterize phylogenies (see Section 2), it is difficult to carry out the analysis of the polytope in the chosen space. Hence, Forcey et al. [18] investigated alternative spaces to perform such analysis and proposed the use of Polymake [19] to assist in this task. Their empirical approach proved successful in unveiling connections between the BMEP polytope and the permutoassociahedron [20], [21], [22], [23] as well as in characterizing the BMEP polytope in low and fixed dimensions. In particular, they succeeded in characterizing all of its facets for five taxa and some of the facets for six taxa. Unfortunately, the very large number of facets to analyze in the latter case (over 90 000, see Section 6.2) prevented the complete description of the polytope.

In this article, we extend the results of Eickmeyer et al. [16] and Forcey et al. [21], [18], by presenting both alternative and novel proofs concerning the polyhedral combinatorics of the BMEP as well as algorithms that may inspire the development of new exact solution approaches to the problem based on implicit enumeration. Some alternative proofs (e.g., the characterization of the dimension of the BMEP polytope) are interesting per se because they enable the analysis of aspects not trivially deducible from previous studies [16], [18], [21]. We also characterize new facets of the BMEP polytope for six or more taxa as well as valid inequalities. We address the problem of consistently generating the vertices of the BMEP polytope as well as the problem of recognizing its vertices. We present a set of necessary and sufficient conditions to address the vertex generation problem and we show how to translate such conditions into a set of nonlinear constraints in order to develop new exact solution approaches to the problem similar to those already proposed in [3] and [10]. We will show that these conditions also suggest a possible polynomial-time oracle to solve the recognition problem as well. The article is organized as follows. In Section 2 we introduce some notation, definitions, and fundamental properties of the topological distances that will prove useful throughout the article. In Section 3, we review Forcey et al. [18] space. In Section 4, we show a possible basis for the space of Forcey et al. [18]. In Sections 5 On the convex hull of the set, 6 Valid inequalities and facets of we discuss some fundamental characteristics of the BMEP polytope and we extend the analysis of Eickmeyer et al. [16] and Forcey et al. [18]. Finally, in Section 7, we address the problem of recognizing and consistently generating the vertices of the BMEP polytope.

Section snippets

Nomenclature of phylogenetics and properties of the topological distances

In this section, we introduce some specific notation and nomenclature of phylogenetics that will prove useful throughout the article. We will give particular emphasis to the concept and the properties of a “path-length sequence” of a phylogeny because it has a central role in the combinatorics of the BMEP.

By referring to the phylogeny shown in Fig. 1(b), we call

  • a leaf adjacent vertex any internal vertex of a phylogeny that is adjacent to a leaf;

  • a separator vertex any internal vertex of a

The X space

Given a set Γ of n3 taxa, a phylogeny T of Γ, and a topological distance τij between two not necessarily distinct taxa i,jΓ in T, we define xij=2n1τij.Note that by definition τij is a positive integer such that 2τijn1 if ij, and 0 otherwise. Hence, by definition, xij is a positive integer such that 1xij2n3 if ij, and 2n1 otherwise. Given the path-length matrix τ associated to T, we denote X as the n×n symmetric matrix whose generic entry is xij and X as the set of the X matrices

A basis for Space(X)

Given a set Γ of n4 taxa, we define the fundamental caterpillar phylogeny of Γ as the caterpillar phylogeny Tˆ such that

  • the first taxon is assigned to the first of the four leaves of the caterpillar that have a sibling;

  • the ith taxon is assigned to the only leaf of the caterpillar Tˆ at topological distance τ1,i=i, for i[2,n2];

  • the last two taxa are assigned to the only two leaves at topological distance n1 from the first taxon. Fig. 4 shows an example of Tˆ.

We denote Tr,s as the caterpillar

On the convex hull of the set X

A question that naturally arises is whether it is possible to characterize some properties of the vertices of Conv(X). To address this question, we first observe that the bijection (12) implies that τij=(n1log2xij) for any phylogeny TT and any pair i,jΓ. Then, we can rewrite (9) as i,jΓijxij(n1log2xij)=(2n3)2n1.As (15) implies i,jΓijxij=n2n2,for all XX we have that i,jΓijxijlog2xij=i,jΓijxij(n1τij)=(n25n+6)2n2.In the light of this result, the following proposition

Known facets of Conv(X)

Forcey et al. [18] provided a complete description of the BMEP polytope for n5 (i.e., when assuming an input set Γ including up to five taxa) and characterized a number of facet-defining inequalities of Conv(X) for n6 (i.e., when assuming an input set Γ including six taxa). For the sake of completeness, in this subsection we briefly report and comment upon their results. In the next subsection, we will present a new set of valid and facet-defining inequalities that will contribute to the

Characterizing and recognizing elements of X

In this section, we investigate two fundamental issues concerning the set X, namely the problem of consistently generating its elements and the problem of deciding whether a given matrix is an element of X (hence, a vertex of Conv(X)). We present a set of necessary and sufficient conditions to accomplish the first task and we show how to translate such conditions into a set of nonlinear constraints. Moreover, we present a polynomial-time oracle to decide whether a given n×n symmetric matrix

Acknowledgments

The authors are very grateful to the reviewers for their careful and meticulous reading of the article as well as their valuable comments. The authors are also in debt with Dr. Roberto Ronco for helpful insights about the structure of the matrix (11). The first author acknowledges support from the Belgian National Fund for Scientific Research (FRS-FNRS) via the research grant “Crédit de Recherche” ref. S/25-MCF/OL J.0026.17; the Université Catholique de Louvain, Belgium via the “Fonds Spéciaux

References (29)

  • DesperR. et al.

    Theoretical foundations of the balanced minimum evolution method of phylogenetic inference and its relationship to the weighted least-squares tree fitting

    Mol. Biol. Evol.

    (2004)
  • GascuelO. et al.

    Neighbor-joining revealed

    Mol. Biol. Evol.

    (2006)
  • PardiF.

    Algorithms on Phylogenetic Trees

    (2009)
  • CatanzaroD.

    The minimum evolution problem: Overview and classification

    Networks

    (2009)
  • Cited by (5)

    • A tutorial on the balanced minimum evolution problem

      2022, European Journal of Operational Research
      Citation Excerpt :

      The next sections will review and classify the outcomes of these efforts, by starting from the combinatorics of the BMEP. The compact statement of the BMEP hides deep optimization aspects that relate its combinatorics to those of well-known problems, such as the Traveling Salesman Problem (TSP) (Lawler, Lenstra, Kan, & Shmoys, 1985) and the Huffman Coding Problem (HCP) (Catanzaro et al., 2015; Catanzaro et al., 2013a; Catanzaro et al., 2012; Catanzaro et al., 2020b; Parker & Ram, 1996). In this section we review these aspects, by starting from Semple & Steel’s interpretation of the BMEP length function (1) and by subsequently discussing the complexity aspects of the problem as well as the recent advances on its polyhedral combinatorics.

    • An information theory perspective on the balanced minimum evolution problem

      2020, Operations Research Letters
      Citation Excerpt :

      The BMEP was introduced in the literature on molecular phylogenetics by Desper and Gascuel [7], based on an estimation model proposed by Pauplin [21] 20 years ago. Subsequently, the problem was the subject of extensive research efforts focused on the characterization of its statistical consistency [8,13,14], the study of its combinatorics [4–6,11,12,15,23], and the development of implicit enumeration algorithms [1,4,19] and heuristics [10,13,19] to tackle and solve its instances. In the next example we show how the recursion (24) works.

    View full text