BERT Is Not The Count: Learning to Match Mathematical Statements with Proofs

5月 1, 2023·

Weixian Waylon Li

Yftah Ziser

Maximin Coavoux

Shay B. Cohen

· 2 分钟阅读时长

PDF 引用代码数据集海报演示文稿视频 DOI

摘要

We introduce a task consisting in matching a proof to a given mathematical statement. The task fits well within current research on Mathematical Information Retrieval and, more generally, mathematical article analysis (Mathematical Sciences, 2014). We present a dataset for the task (the MATcH dataset) consisting of over 180k statement-proof pairs extracted from modern mathematical research articles. We find this dataset highly representative of our task, as it consists of relatively new findings useful to mathematicians. We propose a bilinear similarity model and two decoding methods to match statements to proofs effectively. While the first decoding method matches a proof to a statement without being aware of other statements or proofs, the second method treats the task as a global matching problem. Through a symbol replacement procedure, we analyze the “insights” that pre-trained language models have in such mathematical article analysis and show that while these models perform well on this task with the best performing mean reciprocal rank of 73.7, they follow a relatively shallow symbolic analysis and matching to achieve that performance.

类型

会议文章

出版物

In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Citation

@inproceedings{li-etal-2023-bert,
    title = "{BERT} Is Not The Count: Learning to Match Mathematical Statements with Proofs",
    author = "Li, Weixian Waylon  and
      Ziser, Yftah  and
      Coavoux, Maximin  and
      Cohen, Shay B.",
    editor = "Vlachos, Andreas  and
      Augenstein, Isabelle",
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eacl-main.260",
    doi = "10.18653/v1/2023.eacl-main.260",
    pages = "3581--3593",
    abstract = "We introduce a task consisting in matching a proof to a given mathematical statement. The task fits well within current research on Mathematical Information Retrieval and, more generally, mathematical article analysis (Mathematical Sciences, 2014). We present a dataset for the task (the MATcH dataset) consisting of over 180k statement-proof pairs extracted from modern mathematical research articles. We find this dataset highly representative of our task, as it consists of relatively new findings useful to mathematicians. We propose a bilinear similarity model and two decoding methods to match statements to proofs effectively. While the first decoding method matches a proof to a statement without being aware of other statements or proofs, the second method treats the task as a global matching problem. Through a symbol replacement procedure, we analyze the {``}insights{''} that pre-trained language models have in such mathematical article analysis and show that while these models perform well on this task with the best performing mean reciprocal rank of 73.7, they follow a relatively shallow symbolic analysis and matching to achieve that performance.",
}

最近更新于 10月 17, 2025