Elsevier

Pattern Recognition

Volume 48, Issue 2, February 2015, Pages 344-355

A quantum Jensen–Shannon graph kernel for unattributed graphs

https://doi.org/10.1016/j.patcog.2014.03.028

Highlights

  • We compute a density matrix for a graph using the continuous-time quantum walk.

  • We compute the quantum Jensen–Shannon divergence between graph density matrices.

  • We define a quantum Jensen–Shannon graph kernel using the quantum divergence.

  • We evaluate the performance of our quantum kernel on standard graph datasets.

  • We demonstrate the effectiveness of the proposed quantum kernel.

Abstract

In this paper, we use the quantum Jensen–Shannon divergence as a means of measuring the information theoretic dissimilarity of graphs and thus develop a novel graph kernel. In quantum mechanics, the quantum Jensen–Shannon divergence can be used to measure the dissimilarity of quantum systems specified in terms of their density matrices. We commence by computing the density matrix associated with a continuous-time quantum walk over each graph being compared. In particular, we adopt the closed form solution of the density matrix introduced in Rossi et al. (2013) [27], [28] to reduce the computational complexity and to avoid the cumbersome task of simulating the quantum walk evolution explicitly. With the density matrices for a pair of graphs to hand, the quantum graph kernel between the two graphs is then defined using the quantum Jensen–Shannon divergence between their density matrices. We evaluate the performance of our kernel on several standard graph datasets from both bioinformatics and computer vision. The experimental results demonstrate the effectiveness of the proposed quantum graph kernel.

Introduction

Structural representations have been used for over 30 years in pattern recognition due to their representational power. However, the increased descriptiveness comes at the cost of a greater difficulty in applying standard techniques to them, as these usually require data to reside in a vector space. The famous kernel trick [1] allows the focus to be shifted from the vectorial representation of data, which now becomes implicit, to a similarity representation. This allows standard learning techniques to be applied to data for which no obvious vectorial representation exists. For this reason, in recent years pattern recognition has witnessed an increasing interest in structural learning using graph kernels.

One of the most influential works on structural kernels was the generic R-convolution kernel proposed by Haussler [2]. Here graph kernels are computed by comparing the similarity of each of the decompositions of the two graphs. Depending on how the graphs are decomposed, we obtain different kernels. Generally speaking, most R-convolution kernels count the number of isomorphic substructures in the two graphs. Kashima et al. [3] compute the kernel by decomposing the graph into random walks, while Borgwardt et al. [4] have proposed a kernel based on shortest paths. Here, the similarity is determined by counting the numbers of pairs of shortest paths of the same length in a pair of graphs. Shervashidze et al. [5] have developed a subtree kernel on subtrees of limited size. They compute the number of subtrees shared between two graphs using the Weisfeiler–Lehman graph invariant. Aziz et al. [6] have defined a backtrackless kernel on cycles of limited length. They compute the kernel value by counting the numbers of pairs of cycles of the same length in a pair of graphs. Costa and Grave [7] have defined a so-called neighborhood subgraph pairwise distance kernel by counting the number of pairs of isomorphic neighborhood subgraphs. Recently, Kriege et al. [8] counted the number of isomorphisms between pairs of subgraphs, while Neumann et al. [9] have introduced the concept of propagation kernels to handle partially labeled graphs through the use of continuous-valued vertex attributes.

One drawback of these kernels is that they neglect the locational information of the substructures within a graph. In other words, the similarity does not depend on the relationships between substructures. As a consequence, these kernels cannot establish reliable structural correspondences between the substructures, which limits the precision of the resulting similarity measure. To overcome this problem, Fröhlich et al. [10] introduced optimal assignment kernels, in which each pair of structures is aligned before comparison. However, the introduction of the alignment step results in a kernel that is not positive definite in general [11]. The problem stems from the fact that alignments are not in general transitive: if σ is the vertex-alignment between graph A and graph B, and π is the alignment between graph B and graph C, we cannot in general guarantee that the alignment between graph A and graph C is πσ. On the other hand, when the alignments are transitive, there is a common simultaneous alignment of all the graphs. Under this alignment, the optimal assignment kernel is simply the sum over all the vertex/edge kernels, and it is positive definite since it is a sum of separate positive definite kernels. Although, lacking positive definiteness in general, the optimal assignment kernels cannot be guaranteed to represent an implicit embedding into a Hilbert space, they have nonetheless proven to be very effective in classifying structures. Another example of alignment-based kernels is the edit-distance-based kernel introduced by Neuhaus and Bunke [12], where the alignments obtained from graph-edit distance are used to guide random walks on the structures being compared.

An attractive alternative way to measure the similarity of a pair of graphs is to use the mutual information and compute the classical Jensen–Shannon divergence [13]. In information theory, the Jensen–Shannon divergence is a dissimilarity measure between probability distributions. It is symmetric, always well defined and bounded [13]. Bai and Hancock [14] have used the divergence to define a Jensen–Shannon kernel for graphs. Here, the kernel between a pair of graphs is computed using a nonextensive entropy measure in terms of the classical Jensen–Shannon divergence between probability distributions over the graphs. For a graph, the elements of the probability distribution are computed from the degrees of the corresponding vertices. The entropy associated with the probability distribution of an individual graph can thus be computed directly, without the need to decompose the graph into substructures. Hence, unlike the aforementioned graph kernels, the Jensen–Shannon kernel between a pair of graphs avoids the computationally burdensome task of determining the similarities between all pairs of substructures. Unfortunately, the composite entropy required for the Jensen–Shannon kernel is computed from a product graph formed from the pair of graphs, and reflects no correspondence information between pairs of vertices. As a result, the Jensen–Shannon graph kernel lacks correspondence information between the probability distributions over the graphs, and thus cannot precisely reflect the similarity between graphs.
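As a minimal sketch of the classical divergence underlying this kernel, the following computes the Jensen–Shannon divergence between two distributions; `degree_distribution` is a hypothetical helper illustrating the degree-based distributions mentioned above (an assumption about their exact normalization):

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log2 p_i (zero entries ignored)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = H((p+q)/2) - (H(p) + H(q))/2; symmetric and bounded by 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))

def degree_distribution(adj):
    """Probability distribution over vertices, proportional to vertex degree."""
    deg = np.asarray(adj, dtype=float).sum(axis=1)
    return deg / deg.sum()
```

The divergence vanishes for identical distributions and attains its maximum of 1 (in base-2 logarithms) for distributions with disjoint support.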

There has recently been an increasing interest in quantum computing because of the potential speed-ups over classical algorithms. Examples include Grover's polynomially faster search algorithm [15] and Shor's exponentially faster factorization algorithm [16]. Furthermore, quantum algorithms also offer us a richer structure than their classical counterparts, since they use qubits rather than bits as the basic representational unit [32].

Quantum systems differ from their classical counterparts in that they add the possibility of state entanglement to the classical statistical mixture; this results in an exponential increase in the dimensionality of the state space, which lies at the basis of the quantum speed-up. Pure quantum states are represented as vectors in a complex Hilbert space, while potentially mixed quantum states are represented through the density matrix. Mixed states can then be compared by examining their density matrices. One convenient way to do this is to use the quantum Jensen–Shannon divergence, first introduced by Majtey et al. [13], [18]. Unlike the classical Jensen–Shannon divergence, which is defined as a similarity measure between probability distributions, the quantum Jensen–Shannon divergence is a distance measure between mixed quantum states described by density matrices. Moreover, it can be used to measure both the degree of mixing and the entanglement [13], [18].
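The quantum analogue of the divergence above can be computed from eigenvalues alone. A short sketch, assuming base-2 logarithms so that two orthogonal pure states have divergence 1:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)  # rho is Hermitian, so eigvalsh applies
    lam = lam[lam > 1e-12]         # convention: 0 log 0 = 0
    return float(-np.sum(lam * np.log2(lam)))

def quantum_jsd(rho, sigma):
    """QJSD(rho, sigma) = S((rho+sigma)/2) - (S(rho) + S(sigma))/2."""
    mix = 0.5 * (rho + sigma)
    return mixentropy_difference(rho, sigma, mix)

def mixentropy_difference(rho, sigma, mix):
    return von_neumann_entropy(mix) - 0.5 * (
        von_neumann_entropy(rho) + von_neumann_entropy(sigma))
```

For the orthogonal pure states |0⟩⟨0| and |1⟩⟨1| the mixed state is maximally mixed, so the divergence attains its maximum of 1; for identical states it is 0.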

In this paper, we are interested in computing a kernel between pairs of graphs using the quantum Jensen–Shannon divergence between two mixed states representing the evolution of continuous-time quantum walks on the graphs. The continuous-time quantum walk was introduced as the natural quantum analogue of the classical random walk by Farhi and Gutmann [19], and has been widely used in the design of novel quantum algorithms. Ambainis et al. [20], [21] were among the first to generalize random walks on finite graphs to the quantum world. Furthermore, they have explored the application of quantum walks on graphs to a number of problems [22]. Childs et al. [23], [24] have explored the difference between quantum and classical random walks, and then exploited the increased power of quantum walks as a general model of computation.

Similar to the classical random walk on a graph, the state space of the continuous-time quantum walk is the set of vertices of the graph. However, unlike the classical random walk where the state vector is real-valued and the evolution is governed by a doubly stochastic matrix, the state vector of the continuous-time quantum walk is complex-valued and its evolution is governed by a time-varying unitary matrix. The continuous-time quantum walk possesses a number of interesting properties which are not exhibited by the classical random walk. For instance, the continuous-time quantum walk is reversible and non-ergodic, and does not have a limiting distribution. As a result, the continuous-time quantum walk offers us an elegant way to design quantum algorithms on graphs. For further details on quantum computation and quantum algorithms, we refer the reader to the textbook in [32].
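The evolution described above can be sketched in a few lines. This is an illustrative approximation only: it assumes the graph Laplacian as the Hamiltonian and a uniform initial amplitude vector, and obtains a mixed state by discretely averaging the pure states over a time window, whereas the paper adopts the closed-form density matrix of Rossi et al. [27], [28]:

```python
import numpy as np

def ctqw_density_matrix(adj, T=10.0, steps=100):
    """Time-averaged density matrix of a continuous-time quantum walk.

    The walk evolves as |psi(t)> = exp(-i L t) |psi(0)>, with L the graph
    Laplacian; the mixed state averages |psi(t)><psi(t)| over t in [0, T].
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    laplacian = np.diag(adj.sum(axis=1)) - adj
    lam, V = np.linalg.eigh(laplacian)      # spectral decomposition of L
    psi0 = np.ones(n) / np.sqrt(n)          # uniform initial amplitudes
    coeff = V.T @ psi0
    rho = np.zeros((n, n), dtype=complex)
    for t in np.linspace(0.0, T, steps):
        psi = V @ (np.exp(-1j * lam * t) * coeff)  # unitary evolution
        rho += np.outer(psi, psi.conj())
    return rho / steps
```

Because the evolution is unitary, each pure state has unit norm, so the averaged density matrix is Hermitian, positive semidefinite, and has unit trace.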

There have recently been several attempts to define quantum kernels using the continuous-time quantum walk. For instance, Bai et al. [26] have introduced a novel graph kernel where the similarity between two graphs is defined in terms of the similarity between two quantum walks evolving on the two graphs. The basic idea here is to associate with each graph a mixed quantum state representing the time evolution of a quantum walk. The kernel between two graphs is then defined as the divergence between the corresponding density operators. However, this quantum divergence measure requires the computation of an additional mixed state in which the system has equal probability of being in each of the two original quantum states. Unless this quantum kernel takes into account the correspondences between the vertices of the two graphs, it can be shown not to be permutation invariant. Rossi et al. [27], [28] have attempted to overcome this problem by allowing the two quantum walks to evolve on the union of the two graphs. This exploits the relation between the interferences of the continuous-time quantum walks and the symmetries in the structures being compared. Since both mixed states are defined on the same structure, this quantum kernel addresses the shortcoming of permutation variance. However, the resulting approach is less intuitive, making it particularly hard to prove the positive semidefiniteness of the kernel. Moreover, the union graph is established by simply connecting all vertex pairs of the two graphs, and thus also lacks vertex correspondence information. As a result, this kernel cannot reflect the precise similarity between the two graphs.

The aim of this paper is to overcome the shortcomings of the existing graph kernels by defining a novel quantum kernel. To this end, we develop the work in [26] further. We intend to solve the problems of permutation variance and missing vertex correspondence by introducing an additional alignment step, and thus compute an aligned mixed density matrix. The goal of the alignment is to provide vertex correspondences and a lower bound on the divergence over permutations of the vertices/quantum states. However, we approximate the alignment that minimizes the divergence with the one that minimizes the Frobenius norm between the density operators. This latter problem is solved using Umeyama's graph matching method [29]. Since the aligned density matrix is computed by taking into account the vertex correspondences between the two graphs, the density matrix reflects the locational correspondences between the quantum walks over the vertices of the two graphs. As a result, our new quantum kernel not only overcomes the shortcomings of missing vertex correspondence and permutation variance that arise with the existing kernels based on the classical/quantum Jensen–Shannon divergence, but also overcomes the shortcoming of neglecting the locational correspondence between substructures that arises in the existing R-convolution kernels. Furthermore, in contrast to previous assignment-based kernels, we do not align the structures directly. Rather, we align the density matrices, which are a special case of a Laplacian family signature [30], well known to provide a more robust representation for correspondence estimation. Finally, we adopt the closed form solution of the density matrix introduced in [27], [28] to reduce the computational complexity and to avoid the cumbersome task of explicitly simulating the time evolution of the quantum walk. We evaluate the performance of our kernel on several standard graph datasets from bioinformatics and computer vision.
The experimental results demonstrate the effectiveness of the proposed graph kernel, which achieves both a higher classification accuracy than the unaligned kernel and a performance comparable or superior to other state-of-the-art graph kernels.
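The Frobenius-norm alignment step can be sketched with Umeyama's spectral heuristic. This is a simplified illustration under stated assumptions (real symmetric inputs of equal size; the linear assignment is solved with SciPy's Hungarian-algorithm routine), not the paper's exact procedure:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def umeyama_alignment(a, b):
    """Approximate the permutation P minimising ||P a P^T - b||_F.

    Umeyama's heuristic: eigendecompose both matrices, then match rows of
    the absolute eigenvector matrices via a linear assignment problem.
    """
    _, ua = np.linalg.eigh(np.asarray(a, dtype=float))
    _, ub = np.linalg.eigh(np.asarray(b, dtype=float))
    score = np.abs(ub) @ np.abs(ua).T           # row-similarity matrix
    rows, cols = linear_sum_assignment(-score)  # maximise total similarity
    P = np.zeros_like(score)
    P[rows, cols] = 1.0
    return P
```

Aligning a density-like matrix with a vertex-permuted copy of itself recovers the permutation, so that `P @ a @ P.T` matches `b`; taking absolute values of the eigenvectors makes the heuristic robust to their sign ambiguity.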

In Section 2, we introduce the quantum mechanical formalism that will be used to develop the ideas proposed in the paper. We describe how to compute a density matrix for the mixed quantum state of the continuous-time quantum walk on a graph. With the mixed state density matrix to hand, we compute the von Neumann entropy of the continuous-time quantum walk on the graph. In Section 3, we compute a graph kernel using the quantum Jensen–Shannon divergence between the density matrices for a pair of graphs. In Section 4, we provide an experimental evaluation demonstrating the effectiveness of our quantum kernel. In Section 5, we conclude the paper and suggest future research directions.

Section snippets

Quantum mechanical background

In this section, we introduce the quantum mechanical formalism to be used in this work. We commence by reviewing the concept of a continuous-time quantum walk on a graph. We then describe how to establish a density matrix for the mixed quantum state of the graph through a quantum walk. With the density matrix of the mixed quantum state to hand, we show how to compute a von Neumann entropy for a graph.

A quantum Jensen–Shannon graph kernel

In this section, we propose a novel graph kernel based on the quantum Jensen–Shannon divergence between continuous-time quantum walks on different graphs. Suppose the graphs under consideration are given by the set {G1, …, Ga, …, Gb, …, GN}. We consider the continuous-time quantum walks on a pair of graphs Ga(Va,Ea) and Gb(Vb,Eb), whose mixed state density matrices ρa and ρb can be computed using Eq. (18) or Eq. (19). With the density matrices ρa and ρb to hand we compute the quantum
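Putting the ingredients together, one plausible end-to-end computation can be sketched as follows. The kernel form used here, k(Ga, Gb) = exp(-QJSD(ρa, ρb)), is an assumption for illustration (the paper defines its exact kernel in this section), the density matrices use a Laplacian-Hamiltonian walk with discrete time-averaging rather than the closed-form solution of Eq. (18)/(19), the alignment step is omitted for brevity, and equal-size graphs are assumed:

```python
import numpy as np

def ctqw_density_matrix(adj, T=10.0, steps=100):
    """Time-averaged density matrix of a CTQW with Laplacian Hamiltonian."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    lam, V = np.linalg.eigh(np.diag(adj.sum(axis=1)) - adj)
    coeff = V.T @ (np.ones(n) / np.sqrt(n))    # uniform initial state
    rho = np.zeros((n, n), dtype=complex)
    for t in np.linspace(0.0, T, steps):
        psi = V @ (np.exp(-1j * lam * t) * coeff)
        rho += np.outer(psi, psi.conj())
    return rho / steps

def von_neumann_entropy(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def qjs_kernel(adj_a, adj_b):
    """k(Ga, Gb) = exp(-QJSD(rho_a, rho_b)); assumes equal-size graphs."""
    rho_a = ctqw_density_matrix(adj_a)
    rho_b = ctqw_density_matrix(adj_b)
    mix = 0.5 * (rho_a + rho_b)
    d = von_neumann_entropy(mix) - 0.5 * (
        von_neumann_entropy(rho_a) + von_neumann_entropy(rho_b))
    return float(np.exp(-d))
```

Since the divergence is non-negative and vanishes for identical mixed states, the kernel of a graph with itself is 1 and values lie in (0, 1].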

Experimental evaluation

In this section, we test our graph kernels on several standard graph based datasets. There are three stages to our experimental evaluation. First, we evaluate the stability of the von Neumann entropies obtained from the density matrices with time t. Second, we empirically compare our new kernel with several alternative state-of-the-art graph kernels. Finally, we evaluate the computational efficiency of our new kernel.

Conclusion

In this paper, we have developed a novel graph kernel by using the quantum Jensen–Shannon divergence and the continuous-time quantum walk on a graph. Given a graph, we evolved a continuous-time quantum walk on its structure and we showed how to associate a mixed quantum state with the vertices of the graph. From the density matrix for the mixed state we computed the von Neumann entropy. With the von Neumann entropies to hand, the kernel between a pair of graphs was defined as a function of the

Conflict of interest statement

None declared.

Acknowledgements

Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award.

We thank Prof. Karsten Borgwardt and Dr. Nino Shervashidze for providing the Matlab implementation for the various graph kernel methods, and Dr. Geng Li for providing the graph datasets. We also thank Prof. Horst Bunke and Dr. Peng Ren for the constructive discussion and suggestions.

References (43)

  • M. Neuhaus et al.

    Edit distance-based kernel functions for structural pattern classification

    Pattern Recognit.

    (2006)
  • B. Schölkopf et al.

    Learning with Kernels

    (2002)
  • D. Haussler, Convolution Kernels on Discrete Structures, Technical Report UCS-CRL-99-10, Santa Cruz, CA, USA,...
  • H. Kashima, K. Tsuda, A. Inokuchi, Marginalized kernels between labeled graphs, in: Proceedings of International...
  • K.M. Borgwardt, H.P. Kriegel, Shortest-path kernels on graphs, in: Proceedings of the IEEE International Conference on...
  • N. Shervashidze et al.

    Weisfeiler–Lehman graph kernels

    J. Mach. Learn. Res.

    (2010)
  • F. Aziz et al.

    Backtrackless walks on a graph

    IEEE Trans. Neural Netw. Learn. Syst.

    (2013)
  • F. Costa, K.D. Grave, Fast neighborhood subgraph pairwise distance kernel, in: Proceedings of International Conference...
  • N. Kriege, P. Mutzel, Subgraph matching kernels for attributed graphs, in: Proceedings of International Conference...
  • M. Neumann et al.

    Efficient graph kernels by randomization, Machine Learning and Knowledge Discovery in Databases

    (2012)
  • H. Fröhlich, J.K. Wegner, F. Sieker, A. Zell, Optimal assignment kernels for attributed molecular...
  • J.-P. Vert, The Optimal Assignment Kernel is Not Positive Definite, arXiv:0801.4061,...
  • P. Lamberti et al.

    Metric character of the quantum Jensen–Shannon divergence

    Phys. Rev. A

    (2008)
  • L. Bai et al.

    Graph kernels from the Jensen–Shannon divergence

    J. Math. Imaging Vis.

    (2013)
  • L.K. Grover, A fast quantum mechanical algorithm for database search, in: Proceedings of the ACM Symposium on the...
  • P.W. Shor

    Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer

    SIAM J. Comput.

    (1997)
  • A. Majtey et al.

    Jensen–Shannon divergence as a measure of distinguishability between mixed quantum states

    Phys. Rev. A

    (2005)
  • E. Farhi et al.

    Quantum computation and decision trees

    Phys. Rev. A

    (1998)
  • D. Aharonov, A. Ambainis, J. Kempe, U. Vazirani, Quantum walks on graphs, in: Proceedings of ACM Theory of Computing...
  • A. Ambainis, E. Bach, A. Nayak, A. Vishwanath, J. Watrous, One-dimensional quantum walks, in: Proceedings of ACM Theory...
  • A. Ambainis

    Quantum walks and their algorithmic applications

    Int. J. Quantum Inf.

    (2003)

    Lu Bai received both the B.Sc. and M.Sc. degrees from Faculty of Information Technology, Macau University of Science and Technology, Macau SAR, P.R. China. He is currently pursuing the Ph.D. degree in the University of York, York, UK. He is a member of British Machine Vision Association and Society for Pattern Recognition (BMVA). His current research interests include structural pattern recognition, machine learning, soft computing and approximation reasoning, especially in kernel methods and complexity analysis on (hyper)graphs and networks.

    Luca Rossi received his B.Sc., M.Sc., and Ph.D. in computer science from Ca' Foscari University of Venice, Italy. He is now a Postdoctoral Research Fellow at the School of Computer Science of the University of Birmingham. His current research interests are in the areas of graph-based pattern recognition, machine learning, network science and computer vision.

    Andrea Torsello received his Ph.D. in computer science at the University of York, UK, and is currently an assistant professor at Ca' Foscari University of Venice, Italy. His research interests are in the areas of computer vision and pattern recognition, in particular, the interplay between stochastic and structural approaches and game-theoretic models. Recently, he co-edited a special issue of Pattern Recognition on "Similarity-based pattern recognition." He has published around 40 technical papers in refereed journals and conference proceedings and has been in the program committees of various international conferences and workshops. He was a co-chair of the edition of GbR, a well-established IAPR workshop on Graph-based methods in Pattern Recognition, held in Venice in 2009. Since November 2007 he holds a visiting research position at the Information Technology Faculty of Monash University, Australia.

    Edwin R. Hancock received the B.Sc., Ph.D., and D.Sc. degrees from the University of Durham, Durham, UK. He is now a professor of computer vision in the Department of Computer Science, University of York, York, UK. He has published nearly 150 journal articles and 550 conference papers. He was the recipient of a Royal Society Wolfson Research Merit Award in 2009. He has been a member of the editorial board of the IEEE Transactions on Pattern Analysis and Machine Intelligence, Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. His awards include the Pattern Recognition Society Medal in 1991, outstanding paper awards from the Pattern Recognition Journal in 1997, and the best conference best paper awards from the Computer Analysis of Images and Patterns Conference in 2001, the Asian Conference on Computer Vision in 2002, the International Conference on Pattern Recognition (ICPR) in 2006, British Machine Vision Conference (BMVC) in 2007, and the International Conference on Image Analysis and Processing in 2009. He is a Fellow of the International Association for Pattern Recognition, the Institute of Physics, the Institute of Engineering and Technology, and the British Computer Society. He was appointed as the founding Editor-in-Chief of the Institute of Engineering & Technology Computer Vision Journal in 2006. He was a General Chair for BMVC in 1994 and the Statistical, Syntactical and Structural Pattern Recognition in 2010, Track Chair for ICPR in 2004, and Area Chair at the European Conference on Computer Vision in 2006 and the Computer Vision and Pattern Recognition in 2008. He established the energy minimization methods in the Computer Vision and Pattern Recognition Workshop Series in 1997.
