ABSTRACT
An emerging research area named Learning-to-Rank (LtR) has shown that effective solutions to the ranking problem can leverage machine learning techniques applied to a large set of features capturing the relevance of a candidate document for the user query. Large-scale search systems must, however, answer user queries very quickly, and the computation of the features for candidate documents must comply with strict back-end latency constraints. The number of features thus cannot grow beyond a given limit, and Feature Selection (FS) techniques have to be exploited to find a subset of features that both meets the latency requirements and leads to highly effective trained models. In this paper, we propose three new FS algorithms specifically designed for the LtR context, where hundreds of continuous or categorical features can be involved. We present a comprehensive experimental analysis conducted on publicly available LtR datasets, showing that the proposed strategies outperform a well-known state-of-the-art competitor.
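To make the FS setting concrete, the sketch below shows a generic relevance-minus-redundancy filter baseline of the kind this line of work builds on: each feature is scored by how well it correlates with the relevance labels, and features are picked greedily while penalizing correlation with features already selected. This is a minimal illustration of the general technique, not the three algorithms proposed in the paper; the function name `select_features` and the trade-off parameter `alpha` are illustrative choices.

```python
import numpy as np

def select_features(X, y, k, alpha=0.5):
    """Greedily pick k features that are individually predictive of the
    relevance labels y while penalizing redundancy with features that
    have already been selected (a generic filter-style FS baseline).

    X: (n_samples, n_features) feature matrix; y: relevance labels.
    alpha: weight of the redundancy penalty (illustrative parameter).
    """
    def abs_corr(a, b):
        # Absolute Pearson correlation; 0.0 for constant vectors.
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return abs((a * b).sum() / denom) if denom > 0 else 0.0

    n_features = X.shape[1]
    # Relevance of each feature taken in isolation.
    relevance = np.array([abs_corr(X[:, j], y) for j in range(n_features)])

    selected, candidates = [], set(range(n_features))
    while len(selected) < k and candidates:
        best, best_score = None, -np.inf
        for j in candidates:
            # Redundancy: strongest correlation with any selected feature.
            redundancy = max(
                (abs_corr(X[:, j], X[:, s]) for s in selected), default=0.0
            )
            score = relevance[j] - alpha * redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

On synthetic data where one column equals the labels, that column is selected first; subsequent picks trade relevance against redundancy with it. Production LtR pipelines would replace the Pearson-correlation relevance score with a rank-aware measure (e.g., per-feature NDCG) and respect the per-query structure of the data.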
Fast Feature Selection for Learning to Rank