-
Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications Globally
Authors:
Agam Shah,
Siddhant Sukhani,
Huzaifa Pardawala,
Saketh Budideti,
Riya Bhadani,
Rudra Gopal,
Siddhartha Somani,
Michael Galarnyk,
Soungmin Lee,
Arnav Hiray,
Akshar Ravichandran,
Eric Kim,
Pranav Aluru,
Joshua Zhang,
Sebastian Jaskowski,
Veer Guda,
Meghaj Tarte,
Liqin Ye,
Spencer Gosden,
Rutwik Routu,
Rachel Yuh,
Sloka Chava,
Sahasra Chava,
Dylan Patrick Kelly,
Aiden Chiang
, et al. (2 additional authors not shown)
Abstract:
Central banks around the world play a crucial role in maintaining economic stability. Deciphering policy implications in their communications is essential, especially as misinterpretations can disproportionately impact vulnerable populations. To address this, we introduce the World Central Banks (WCB) dataset, the most comprehensive monetary policy corpus to date, comprising over 380k sentences fr…
▽ More
Central banks around the world play a crucial role in maintaining economic stability. Deciphering policy implications in their communications is essential, especially as misinterpretations can disproportionately impact vulnerable populations. To address this, we introduce the World Central Banks (WCB) dataset, the most comprehensive monetary policy corpus to date, comprising over 380k sentences from 25 central banks across diverse geographic regions, spanning 28 years of historical data. After uniformly sampling 1k sentences per bank (25k total) across all available years, we annotate and review each sentence using dual annotators, disagreement resolutions, and secondary expert reviews. We define three tasks: Stance Detection, Temporal Classification, and Uncertainty Estimation, with each sentence annotated for all three. We benchmark seven Pretrained Language Models (PLMs) and nine Large Language Models (LLMs) (Zero-Shot, Few-Shot, and with annotation guide) on these tasks, running 15,075 benchmarking experiments. We find that a model trained on aggregated data across banks significantly surpasses a model trained on an individual bank's data, confirming the principle "the whole is greater than the sum of its parts." Additionally, rigorous human evaluations, error analyses, and predictive tasks validate our framework's economic utility. Our artifacts are accessible through the HuggingFace and GitHub under the CC-BY-NC-SA 4.0 license.
△ Less
Submitted 15 May, 2025;
originally announced May 2025.
-
Language Modeling for the Future of Finance: A Quantitative Survey into Metrics, Tasks, and Data Opportunities
Authors:
Nikita Tatarinov,
Siddhant Sukhani,
Agam Shah,
Sudheer Chava
Abstract:
Recent advances in language modeling have led to growing interest in applying Natural Language Processing (NLP) techniques to financial problems, enabling new approaches to analysis and decision-making. To systematically examine this trend, we review 374 NLP research papers published between 2017 and 2024 across 38 conferences and workshops, with a focused analysis of 221 papers that directly addr…
▽ More
Recent advances in language modeling have led to growing interest in applying Natural Language Processing (NLP) techniques to financial problems, enabling new approaches to analysis and decision-making. To systematically examine this trend, we review 374 NLP research papers published between 2017 and 2024 across 38 conferences and workshops, with a focused analysis of 221 papers that directly address finance-related tasks. We evaluate these papers across 11 qualitative and quantitative dimensions, identifying key trends such as the increasing use of general-purpose language models, steady progress in sentiment analysis and information extraction, and emerging efforts around explainability and privacy-preserving methods. We also discuss the use of evaluation metrics, highlighting the importance of domain-specific ones to complement standard machine learning metrics. Our findings emphasize the need for more accessible, adaptive datasets and highlight the significance of incorporating financial crisis periods to strengthen model robustness under real-world conditions. This survey provides a structured overview of NLP research applied to finance and offers practical insights for researchers and practitioners working at this intersection.
△ Less
Submitted 9 April, 2025;
originally announced April 2025.
-
SubjECTive-QA: Measuring Subjectivity in Earnings Call Transcripts' QA Through Six-Dimensional Feature Analysis
Authors:
Huzaifa Pardawala,
Siddhant Sukhani,
Agam Shah,
Veer Kejriwal,
Abhishek Pillai,
Rohan Bhasin,
Andrew DiBiasio,
Tarun Mandapati,
Dhruv Adha,
Sudheer Chava
Abstract:
Fact-checking is extensively studied in the context of misinformation and disinformation, addressing objective inaccuracies. However, a softer form of misinformation involves responses that are factually correct but lack certain features such as clarity and relevance. This challenge is prevalent in formal Question-Answer (QA) settings such as press conferences in finance, politics, sports, and oth…
▽ More
Fact-checking is extensively studied in the context of misinformation and disinformation, addressing objective inaccuracies. However, a softer form of misinformation involves responses that are factually correct but lack certain features such as clarity and relevance. This challenge is prevalent in formal Question-Answer (QA) settings such as press conferences in finance, politics, sports, and other domains, where subjective answers can obscure transparency. Despite this, there is a lack of manually annotated datasets for subjective features across multiple dimensions. To address this gap, we introduce SubjECTive-QA, a human annotated dataset on Earnings Call Transcripts' (ECTs) QA sessions as the answers given by company representatives are often open to subjective interpretations and scrutiny. The dataset includes 49,446 annotations for long-form QA pairs across six features: Assertive, Cautious, Optimistic, Specific, Clear, and Relevant. These features are carefully selected to encompass the key attributes that reflect the tone of the answers provided during QA sessions across different domain. Our findings are that the best-performing Pre-trained Language Model (PLM), RoBERTa-base, has similar weighted F1 scores to Llama-3-70b-Chat on features with lower subjectivity, such as Relevant and Clear, with a mean difference of 2.17% in their weighted F1 scores. The models perform significantly better on features with higher subjectivity, such as Specific and Assertive, with a mean difference of 10.01% in their weighted F1 scores. Furthermore, testing SubjECTive-QA's generalizability using QAs from White House Press Briefings and Gaggles yields an average weighted F1 score of 65.97% using our best models for each feature, demonstrating broader applicability beyond the financial domain. SubjECTive-QA is publicly available under the CC BY 4.0 license
△ Less
Submitted 23 January, 2025; v1 submitted 27 October, 2024;
originally announced October 2024.