Information Retrieval

TSPRank: Bridging Pairwise and Listwise Methods with a Bilinear Travelling Salesman Model

Citation
@inproceedings{10.1145/3690624.3709234,
  author = {Li, Weixian Waylon and Ziser, Yftah and Xie, Yifei and Cohen, Shay B. and Ma, Tiejun},
  title = {TSPRank: Bridging Pairwise and Listwise Methods with a Bilinear Travelling Salesman Model},
  year = {2025},
  isbn = {9798400712456},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3690624.3709234},
  doi = {10.1145/3690624.3709234},
  abstract = {Traditional Learning-To-Rank (LETOR) approaches, including pairwise methods like RankNet and LambdaMART, often fall short by solely focusing on pairwise comparisons, leading to sub-optimal global rankings. Conversely, deep learning based listwise methods, while aiming to optimise entire lists, require complex tuning and yield only marginal improvements over robust pairwise models. To overcome these limitations, we introduce Travelling Salesman Problem Rank (TSPRank), a hybrid pairwise-listwise ranking method. TSPRank reframes the ranking problem as a Travelling Salesman Problem (TSP), a well-known combinatorial optimisation challenge that has been extensively studied for its numerous solution algorithms and applications. This approach enables the modelling of pairwise relationships and leverages combinatorial optimisation to determine the listwise ranking. TSPRank can be directly integrated as an additional component into embeddings generated by existing backbone models to enhance ranking performance. Our extensive experiments across three backbone models on diverse tasks, including stock ranking, information retrieval, and historical events ordering, demonstrate that TSPRank significantly outperforms both pure pairwise and listwise methods. Our qualitative analysis reveals that TSPRank's main advantage over existing methods is its ability to harness global information better while ranking. TSPRank's robustness and superior performance across different domains highlight its potential as a versatile and effective LETOR solution.},
  booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1},
  pages = {707--718},
  numpages = {12},
  keywords = {learning-to-rank, pairwise-listwise ranking, travelling salesman problem},
  location = {Toronto ON, Canada},
  series = {KDD '25}
}

Feb 18, 2025

SynthRank: Synthetic Data Generation of Individual’s Financial Transactions Through Learning to Rank

Feb 1, 2024

BERT Is Not The Count: Learning to Match Mathematical Statements with Proofs

Citation
@inproceedings{li-etal-2023-bert,
  title = "{BERT} Is Not The Count: Learning to Match Mathematical Statements with Proofs",
  author = "Li, Weixian Waylon and Ziser, Yftah and Coavoux, Maximin and Cohen, Shay B.",
  editor = "Vlachos, Andreas and Augenstein, Isabelle",
  booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.eacl-main.260",
  doi = "10.18653/v1/2023.eacl-main.260",
  pages = "3581--3593",
  abstract = "We introduce a task consisting in matching a proof to a given mathematical statement. The task fits well within current research on Mathematical Information Retrieval and, more generally, mathematical article analysis (Mathematical Sciences, 2014). We present a dataset for the task (the MATcH dataset) consisting of over 180k statement-proof pairs extracted from modern mathematical research articles. We find this dataset highly representative of our task, as it consists of relatively new findings useful to mathematicians. We propose a bilinear similarity model and two decoding methods to match statements to proofs effectively. While the first decoding method matches a proof to a statement without being aware of other statements or proofs, the second method treats the task as a global matching problem. Through a symbol replacement procedure, we analyze the {``}insights{''} that pre-trained language models have in such mathematical article analysis and show that while these models perform well on this task with the best performing mean reciprocal rank of 73.7, they follow a relatively shallow symbolic analysis and matching to achieve that performance.",
}

May 1, 2023