SynthRank: Synthetic Data Generation of Individual’s Financial Transactions Through Learning to Ranking

2月 1, 2024·

Weixian Waylon Li

Mengyu Wang

Carsten Maple

Tiejun Ma

· 0 分钟阅读时长

PDF 海报演示文稿

摘要

Synthetic data generation is being seen as a solution to the problem of acquiring data to build artificial intelligence models. In particular, obtaining financial transaction data is notably challenging, largely attributed to strict privacy regulations and the sensitivity of personal financial details. In contrast to existing approaches that utilise generative models dedicated to replicating authentic data distributions, this paper proposes SynthRank, a novel approach founded on the learning-to-rank (LETOR) algorithm, for financial transaction synthetic data creation. Focusing specifically on trading transactions, we address the risky trader detection and prediction challenge by leveraging LETOR techniques to generate ranking scores for each attribute of the transaction set. These scores are aggregated into a new vector, constituting the synthetic data. By segmenting data into distinct ranking groups, we produce synthetic data without quantity limitations. Our approach offers privacy protection of individual data, since sensitive information becomes challenging for attackers to infer. Our comprehensive analysis demonstrates that SynthRank not only enhances utility but also preserves the essential distribution characteristics of the original dataset while providing privacy protection.

类型

会议文章

出版物

AI in Finance for Social Impact @ AAAI 2024