QuickScorer: Efficient Traversal of Large Ensembles of Decision Trees

Lucchese, Claudio; Nardini, Franco Maria; Orlando, Salvatore; Perego, Raffaele; Tonellotto, Nicola; Venturini, Rossano

doi:10.1007/978-3-319-71273-4_36

Claudio Lucchese²²,
Franco Maria Nardini²²,
Salvatore Orlando^22,23,
Raffaele Perego²²,
Nicola Tonellotto²² &
…
Rossano Venturini^22,24

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10536))

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

3126 Accesses
4 Citations

Abstract

Machine-learnt models based on additive ensembles of binary regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. Evaluating these models is a computationally demanding task as it needs to traverse thousands of trees with hundreds of nodes each. The cost of traversing such large forests of trees significantly impacts their application to big and stream input data, when the time budget available for each prediction is limited to guarantee a given processing throughput. Document ranking in Web search is a typical example of this challenging scenario, where the exploitation of tree-based models to score query-document pairs, and finally rank lists of documents for each incoming query, is the state-of-art method for ranking (a.k.a. Learning-to-Rank). This paper presents QuickScorer, a novel algorithm for the traversal of huge decision trees ensembles that, thanks to a cache- and CPU-aware design, provides a \({\sim } 9 \! \times \) speedup over best competitors.

You have full access to this open access chapter, Download conference paper PDF

Training Efficient Tree-Based Models for Document Ranking

iSOUP-SymRF: Symbolic Feature Ranking with Random Forests in Online Multi-target Regression

Hierarchical Meta-Rules for Scalable Meta-Learning

Keywords

1 Introduction

In this paper we discuss QuickScorer (QS), an algorithm developed to speedup the application of machine-learnt forests of binary regression trees to score and finally rank lists of candidate documents for each query submitted to a Web search engine. QuickScorer was thus developed in the field of Learning-to-Rank (LtR) within the IR community. Nowadays, LtR is commonly exploited by Web search engines within their query processing pipeline, by exploiting massive training datasets consisting of collections of query-document pairs, in turn modeled as vectors of hundreds features, annotated with a relevance label.

The interest in exploiting forests of binary regression trees to rank lists of candidate documents is due to the success of gradient boosting tree algorithms [4]. This kind of algorithms is considered the state-of-the-art LtR solution for addressing complex ranking problems [5]. In search engines, these forests are exploited within a two-stage architecture. While the first stage retrieves a set of possibly relevant documents matching the user query, such expensive LtR-based scorers, optimized for high precision, are exploited in the second stage to re-rank the set of candidate documents coming from the first stage. The time budget available to re-rank the candidate documents is limited, due to the incoming rate of queries and the users’ expectations in terms of response time. Therefore, devising techniques and strategies to speed up document ranking without losing in quality is definitely an urgent research topic in Web search [9].

Strongly motivated by these considerations, the IR community has started to investigate computational optimizations to reduce the scoring time of the most effective LtR rankers based on ensembles of regression trees, by exploiting advanced features of modern CPUs and carefully exploiting memory hierarchies. Among those, the best competitor of QuickScorer is vPRED [1].

We argue that QuickScorer can also be exploited in different time-sensitive scenarios and each time it is needed to use a large forest of binary decision trees, e.g., random forest, for classification/regression purposes and apply it to big and stream data with strict processing throughput requirements.

2 QuickScorer

Given a query-document pair \((q, d_i)\), represented by a feature vector \(\mathbf {x}\), a LtR model based on an additive ensemble of regression trees predicts a relevance score \(s(\mathbf{x})\) used for ranking a set of documents. Typically, a tree ensemble encompasses several binary decision trees, denoted by \(\mathcal {T} = \{T_0, T_1, \ldots \}\). Each internal (or branching) node in \(T_h\) is associated with a Boolean test over a specific feature \(f_{\phi } \in \mathcal {F}\), and a constant threshold \(\gamma \in \mathbb {R}\). Tests are of the form \(\mathbf {x}[\phi ] \le \gamma \), and, during the visit, the left branch is taken iff the test succeeds. Each leaf node stores the tree prediction, representing the potential contribution of the tree to the final document score. The scoring of \(\mathbf {x}\) requires the traversal of all the ensemble’s trees and it is computed as a weighted sum of all the tree predictions.

Algorithm 1 illustrates QS [3, 7]. One important result is that QS computes \(s(\mathbf {x})\) by only identifying the branching nodes whose test evaluates to false, called false nodes. For each false node detected in \(T_h \in \mathcal {T}\), QS updates a bitvector associated with \(T_h\), which stores information that is eventually exploited to identify the exit leaf of \(T_h\) that contributes to the final score \(s(\mathbf {x})\). To this end, QS maintains for each tree \(T_h \in \mathcal {T}\) a bitvector leafidx[ h ], made of \(\varLambda \) bits, one per leaf. Initially, every bit in leafidx[ h ] is set to 1. Moreover, each branching node is associated with a bitvector mask, still of \(\varLambda \) bits, identifying the set of unreachable leaves of \(T_h\) in case the corresponding test evaluates to false. Whenever a false node is visited, the set of unreachable leaves leafidx[ h ] is updated through a logical AND (\(\wedge \)) with mask. Eventually, the leftmost bit set in leafidx[ h ] identifies the leaf corresponding to the score contribution of \(T_h\), stored in the lookup table leafvalues.

To efficiently identify all the false nodes in the ensemble, QS processes the branching nodes of all the trees feature by feature. Specifically, for each feature \(f_\phi \), QS builds a list \(\mathcal{N}_\phi \) of tuples \((\gamma , \texttt {mask}, h)\), where \(\gamma \) is the predicate threshold of a branching node of tree \(T_h\) performing a test over the feature \(f_\phi \), denoted by \(\mathbf {x}[\phi ]\), and mask is the pre-computed mask that identifies the leaves of \(T_h\) that are un-reachable when the associated test evaluates to false. \(\mathcal{N}_\phi \) is statically sorted in ascending order of \(\gamma \). Hence, when processing \(\mathcal{N}_\phi \) sequentially, as soon as a test evaluates to true, i.e., \(\mathbf {x}[\phi ] \le \gamma \), the remaining occurrences surely evaluate to true as well, and their evaluation is thus safely skipped.

We call mask computation the first step of the algorithm during which all the bitvectors leafidx[ h ] are updated, and score computation the second step where such bitvectors are used to retrieve tree predictions.

Compared to the classic tree traversal, QuickScorer introduces a main novelty. The cost of the traversal does not depend on the average length of the root-to-leaf paths, but rather on the average number of false nodes in the trees of the forest. Experiments on large public datasets with large forests, with 64 leaves per tree and up to 20,000 trees, show that a classic traversal evaluates between 50% and 80% of the branching nodes. This is due to the imbalance of the trees built by state-of-the-art LtR algorithms. On the other hand, on the same datasets, QuickScorer always visits less than 30% of the nodes. This results in a largely reduced number of operations and number of memory accesses.

Moreover, QuickScorer exploits a cache- and CPU-aware design. For instance, the values of \((\gamma , \texttt {mask}, h)\) are accessed through a linear scan of the QuickScorer data structures, which favours cache prefetching and limits data dependencies. For each feature, QuickScorer visits only one true node, thus easing the CPU branch predictor and limiting control dependencies. This makes QuickScorer to perform better than competitors also with a special kind of perfectly balanced trees named oblivious [6].

The design of QuickScorer makes it possible to introduce two further improvements. Firstly, for large LtR models, the forest can be split into multiple blocks of trees, sufficiently small to allow the data structure of a single block to entirely fit into the third-level CPU cache. We name BlockWise-QS (BWQS) the resulting variant. This cache-aware algorithm reduces the cache miss ratio from more than 10% to less than 1%. Secondly, the scoring can be vectorized so as to score multiple documents simultaneously. In V-QuickScorer (vQS) [8] vectorization is achieved through AVX 2.0 instructions and 256-bits wide registers. In such setting, up to 8 documents can be processed simultaneously.

Figure 1 compares QS, BWQS, and vQS against the best competitor vPRED. The test was performed on a large dataset, with a model with 64 leaves per tree and varying the number of trees of the forest.

3 Discussion

In this work, we focused on tree ensembles to tackle the LtR problem. Decision tree ensembles are a popular and effective machine learning tool beyond LtR. Their success is witnessed by the Kaggle 2015 competitions, where most of the winning solutions exploited MART models, and by the KDD Cup 2015, where MART-based algorithms were used by all the top 10 teams [2].

In the LtR scenario, the time budget available for applying a model is limited and must be satisfied. Therefore large models, despite being more accurate, cannot be used because of their high evaluation cost. QS, a novel algorithm for the traversal of decision trees ensembles, is an answer to this problem as it provides \({\sim } 9\! \times \) speedup over state-of-the-art competitors. Moreover, the need of efficient traversal strategies goes beyond the LtR scenario, for instance when such models are used to classify big data collections. For all these reasons, we believe that QS can help scientists from the data mining community to speed-up the process of evaluating highly effective tree-based models over big and stream datasets.

References

Asadi, N., Lin, J., de Vries, A.P.: Runtime optimizations for tree-based machine learning models. IEEE TKDE 26(9), 2281–2292 (2014)
Google Scholar
Chen, T., Guestrin, C.: XGBoost: A scalable tree boosting system. In: Proceedings of SIGKDD, pp. 785–794. ACM (2016)
Google Scholar
Dato, D., Lucchese, C., Nardini, F.M., Orlando, S., Perego, R., Tonellotto, N., Venturini, R.: Fast ranking with additive ensembles of oblivious and non-oblivious regression trees. ACM TOIS 35(2), 1–31 (2016)
Article Google Scholar
Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2000)
Article MathSciNet MATH Google Scholar
Gulin, A., Kuralenok, I., Pavlov, D.: Winning The transfer learning track of yahoo!’s learning to rank challenge with YetiRank. In: Yahoo! Learning to Rank Challenge, pp. 63–76 (2011)
Google Scholar
Langley, P., Sage, S.: Oblivious decision trees and abstract cases. In: Working Notes of the AAAI-94 Workshop on Case-Based Reasoning, pp. 113–117. AAAI Press (1994)
Google Scholar
Lucchese, C., Nardini, F.M., Orlando, S., Perego, R., Tonellotto, N., Venturini, R.: QuickScorer: a fast algorithm to rank documents with additive ensembles of regression trees. In: Proceedings of SIGIR, pp. 73–82. ACM (2015)
Google Scholar
Lucchese, C., Nardini, F.M., Orlando, S., Perego, R., Tonellotto, N., Venturini, R.: Exploiting CPU SIMD extensions to speed-up document scoring with tree ensembles. In: Proceedings of SIGIR, pp. 833–836. ACM (2016)
Google Scholar
Segalovich, I.: Machine learning in search quality at Yandex. Presentation at the Industry Track of SIGIR (2010)
Google Scholar

Download references

Author information

Authors and Affiliations

ISTI–CNR, Pisa, Italy
Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto & Rossano Venturini
Ca’ Foscari University of Venice, Venice, Italy
Salvatore Orlando
University of Pisa, Pisa, Italy
Rossano Venturini

Authors

Claudio Lucchese
View author publications
You can also search for this author in PubMed Google Scholar
Franco Maria Nardini
View author publications
You can also search for this author in PubMed Google Scholar
Salvatore Orlando
View author publications
You can also search for this author in PubMed Google Scholar
Raffaele Perego
View author publications
You can also search for this author in PubMed Google Scholar
Nicola Tonellotto
View author publications
You can also search for this author in PubMed Google Scholar
Rossano Venturini
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Claudio Lucchese .

Editor information

Editors and Affiliations

Google Research, Google Inc., Zurich, Switzerland
Yasemin Altun
NASA Ames Research Center, Mountain View, USA
Kamalika Das
Oath, Sunnyvale, USA
Taneli Mielikäinen
Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
Donato Malerba
Institute of Computing Science, Poznan University of Technology, Poznan, Poland
Jerzy Stefanowski
Laboratoire d’ Informatique (LIX), École Polytechnique, Palaiseau, France
Jesse Read
Department of Computer Science, Stanford University, Stanford, USA
Marinka Žitnik
Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
Jožef Stefan Institute, Ljubljana, Slovenia
Sašo Džeroski

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lucchese, C., Nardini, F.M., Orlando, S., Perego, R., Tonellotto, N., Venturini, R. (2017). QuickScorer: Efficient Traversal of Large Ensembles of Decision Trees. In: Altun, Y., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. Lecture Notes in Computer Science(), vol 10536. Springer, Cham. https://doi.org/10.1007/978-3-319-71273-4_36

Download citation

DOI: https://doi.org/10.1007/978-3-319-71273-4_36
Published: 30 December 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71272-7
Online ISBN: 978-3-319-71273-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

QuickScorer: Efficient Traversal of Large Ensembles of Decision Trees

Abstract

Similar content being viewed by others

Training Efficient Tree-Based Models for Document Ranking

iSOUP-SymRF: Symbolic Feature Ranking with Random Forests in Online Multi-target Regression

Hierarchical Meta-Rules for Scalable Meta-Learning

Keywords

1 Introduction

2 QuickScorer

3 Discussion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

QuickScorer: Efficient Traversal of Large Ensembles of Decision Trees

Abstract

Similar content being viewed by others

Training Efficient Tree-Based Models for Document Ranking

iSOUP-SymRF: Symbolic Feature Ranking with Random Forests in Online Multi-target Regression

Hierarchical Meta-Rules for Scalable Meta-Learning

Keywords

1 Introduction

2 QuickScorer

3 Discussion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation