ABSTRACT
Learning to Rank is the task of learning a ranking function from a set of query-document pairs. A query is typically associated with thousands of documents, but not all of them are informative for the learning phase. Several strategies have been designed to select the most informative documents from the training set; however, most of them focus on reducing the size of the training set to speed up learning, sacrificing effectiveness. A first attempt at sampling without sacrificing effectiveness was made by Selective Gradient Boosting, a learning algorithm that uses a customisable sampling strategy to train effective ranking models. In this work, we propose High_Low_Sampl, a new sampling strategy for selecting negative examples that is applicable to Selective Gradient Boosting without compromising model effectiveness. The proposed strategy lets Selective Gradient Boosting compose a new training set by selecting three document classes from the original one: positive examples, high-ranked negative examples, and low-ranked negative examples. The resulting dataset aims at minimising the mis-ranking risk, i.e., enhancing the discriminative power of the learned model while maintaining generalisation to unseen instances. Through an extensive experimental analysis on publicly available datasets, we show that the proposed selection algorithm makes the most of the negative examples in the training set and leads to models that achieve statistically significant improvements in terms of NDCG over the state of the art.
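The three-class selection described above can be sketched as a per-query filter. The snippet below is a minimal illustration, not the paper's exact procedure: the function name, its parameters (`n_high`, `n_low`), and the random draw from the low-ranked pool are assumptions made for the example.

```python
import numpy as np

def high_low_sampl(scores, labels, n_high, n_low, rng=None):
    """Illustrative High_Low_Sampl-style selection for one query.

    Keeps all positive examples, the n_high negatives ranked highest
    by the current model, and n_low negatives drawn from the remaining
    low-ranked pool. The random low-rank draw is an assumption, not
    necessarily the strategy used in the paper.
    """
    rng = rng or np.random.default_rng(0)
    pos = np.flatnonzero(labels > 0)           # positives are always kept
    neg = np.flatnonzero(labels == 0)
    # Sort negatives by current model score, highest-ranked first.
    neg_sorted = neg[np.argsort(-scores[neg])]
    high = neg_sorted[:n_high]                 # hard negatives near the top
    low_pool = neg_sorted[n_high:]             # low-ranked negatives
    n_low = min(n_low, len(low_pool))
    low = (rng.choice(low_pool, size=n_low, replace=False)
           if n_low else low_pool[:0])
    return np.concatenate([pos, high, low])
```

In a boosting loop, such a filter would be re-applied at each iteration using the ensemble's current scores, so the set of high-ranked ("hard") negatives evolves as the model improves.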
On the Effect of Low-Ranked Documents: A New Sampling Function for Selective Gradient Boosting