NLP

Can LLMs Have Spatial Intelligence?

Recently, our school celebrated the 60th anniversary of Computer Science & AI. To mark the occasion, the organizers invited Fernando Pereira to deliver a lecture on the connection between form and meaning in language. This subject has captivated the minds of linguists, computer scientists, and cognitive researchers for many years.

Oct 13, 2023

BERT Is Not The Count: Learning to Match Mathematical Statements with Proofs

Citation:

@inproceedings{li-etal-2023-bert,
    title = "{BERT} Is Not The Count: Learning to Match Mathematical Statements with Proofs",
    author = "Li, Weixian Waylon and Ziser, Yftah and Coavoux, Maximin and Cohen, Shay B.",
    editor = "Vlachos, Andreas and Augenstein, Isabelle",
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eacl-main.260",
    doi = "10.18653/v1/2023.eacl-main.260",
    pages = "3581--3593",
    abstract = "We introduce a task consisting in matching a proof to a given mathematical statement. The task fits well within current research on Mathematical Information Retrieval and, more generally, mathematical article analysis (Mathematical Sciences, 2014). We present a dataset for the task (the MATcH dataset) consisting of over 180k statement-proof pairs extracted from modern mathematical research articles. We find this dataset highly representative of our task, as it consists of relatively new findings useful to mathematicians. We propose a bilinear similarity model and two decoding methods to match statements to proofs effectively. While the first decoding method matches a proof to a statement without being aware of other statements or proofs, the second method treats the task as a global matching problem. Through a symbol replacement procedure, we analyze the ``insights'' that pre-trained language models have in such mathematical article analysis and show that while these models perform well on this task with the best performing mean reciprocal rank of 73.7, they follow a relatively shallow symbolic analysis and matching to achieve that performance.",
}
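The abstract mentions a bilinear similarity model for scoring statement-proof pairs. As a rough illustration of what a bilinear scorer computes (this is a generic sketch, not the paper's implementation; the vectors and matrix here are made-up toy values):

```python
def bilinear_score(s, W, p):
    """Bilinear similarity: score(s, p) = s^T W p.

    s, p: embedding vectors (lists of floats), e.g. a statement and a
    candidate proof; W: a learned interaction matrix (list of rows).
    """
    # Compute W @ p, then the dot product with s.
    Wp = [sum(W[i][j] * p[j] for j in range(len(p))) for i in range(len(W))]
    return sum(s[i] * Wp[i] for i in range(len(s)))


# With W as the identity, the bilinear score reduces to a plain dot product.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(bilinear_score([1.0, 2.0], identity, [3.0, 4.0]))  # 1*3 + 2*4 = 11.0
```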

May 1, 2023

Contrastive Learning Note

Related reading: The Beginner’s Guide to Contrastive Learning; SimCSE: Simple Contrastive Learning of Sentence Embeddings; A Simple Framework for Contrastive Learning of Visual Representations; Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

Background: Contrastive learning aims to learn effective representations by pulling semantically close neighbors together and pushing non-neighbors apart. Initially, contrastive learning was applied to computer vision tasks. As shown in the figure below, we expect the model to learn the commonalities between two images that share the same label and the differences between a pair of images with different labels.
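The pull-together/push-apart objective described above is commonly implemented as an InfoNCE-style loss, the form used by SimCSE and SimCLR. A minimal stdlib-only sketch (the temperature value and toy vectors are illustrative assumptions, not taken from any of the posts):

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE loss: -log softmax of the positive pair's similarity.

    The loss is small when the anchor is close to its positive and far
    from the negatives, so minimizing it pulls neighbors together and
    pushes non-neighbors apart.
    """
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_sum)


# A well-matched positive yields a much lower loss than a mismatched one.
anchor = [1.0, 0.0]
print(info_nce(anchor, [1.0, 0.1], negatives=[[0.0, 1.0]]))  # near 0
print(info_nce(anchor, [0.0, 1.0], negatives=[[1.0, 0.1]]))  # large
```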

Aug 8, 2022

Real-Time Speech Recognition Using Python

A step-by-step guide to building a simple real-time speech recogniser

Mar 5, 2020