ABSTRACT
Modern search engine ranking pipelines are commonly based on large machine-learned ensembles of regression trees. We propose LEAR, a novel learned technique that reduces the average number of trees a document must traverse to accumulate its score, thus reducing the overall query response time. LEAR exploits a classifier that predicts whether a document can exit the ensemble early because it is unlikely to be ranked among the final top-k results. The early-exit decision occurs at a sentinel point, i.e., after a limited number of trees have been evaluated, and the partial scores computed up to that point are used to filter out non-promising documents. We evaluate LEAR by deploying it in a production-like setting that adopts a state-of-the-art algorithm for ensemble traversal, and we provide a comprehensive experimental evaluation on two public datasets. The experiments show that LEAR significantly improves query-processing efficiency without hindering ranking quality: on the first dataset, LEAR achieves a 3x speedup with no loss in NDCG@10, while on the second the speedup exceeds 5x with a negligible NDCG@10 loss (< 0.05%).
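The sentinel-based early-exit idea can be illustrated with a minimal sketch. This is not LEAR's actual implementation: the ensemble is modeled as a plain list of per-tree scoring functions, the learned classifier is replaced by a simple partial-score threshold, and the names `sentinel` and `exit_threshold` are illustrative assumptions.

```python
def score_with_early_exit(trees, docs, sentinel, exit_threshold):
    """Score each document with an additive ensemble; after `sentinel`
    trees, stop scoring documents whose partial score falls below
    `exit_threshold` (they keep their partial score)."""
    scores = {d: 0.0 for d in docs}
    survivors = list(docs)
    for i, tree in enumerate(trees):
        for d in survivors:
            scores[d] += tree(d)
        if i + 1 == sentinel:
            # Early-exit decision point: filter out non-promising
            # documents so they skip the remaining trees.
            survivors = [d for d in survivors if scores[d] >= exit_threshold]
    return scores

# Toy example: four "trees", each contributing one tenth of the
# document's (single) feature value to the score.
trees = [lambda d: d * 0.1 for _ in range(4)]
docs = [1.0, 5.0, 10.0]
scores = score_with_early_exit(trees, docs, sentinel=2, exit_threshold=1.0)
```

In the toy run, document `1.0` has partial score 0.2 after the sentinel and exits early, while the other two documents traverse all four trees. In LEAR the thresholding is replaced by a trained classifier over the partial scores.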